Merge branch 'milestone6' into main
bfulroth committed Dec 11, 2023
2 parents 7088626 + 1272432 commit 3c8e50a
Showing 263 changed files with 937 additions and 27,970 deletions.
11 changes: 0 additions & 11 deletions Pipfile

This file was deleted.

20 changes: 0 additions & 20 deletions Pipfile.lock

This file was deleted.

217 changes: 108 additions & 109 deletions README.md

Large diffs are not rendered by default.

187 changes: 187 additions & 0 deletions README_Template.md
@@ -0,0 +1,187 @@
AC215 - \<Project Title>
==============================
### Presentation Video
* \<Link Here>

### Blog Post Link
* \<Link Here>
---


Notes:

The files are empty placeholders only.
Never commit large data files, trained models, or personal API keys/secrets to GitHub.

Project Organization
------------

.
├── .github
│   └── workflows
│       └── cicdworkflow.yaml
├── data # DO NOT UPLOAD DATA
│   ├── interim <- Intermediate preprocessed data
│   │   ├── test.csv
│   │   ├── train.csv
│   │   └── val.csv
│   ├── processed <- Final dataset files for modeling
│   │   ├── file_00-0.tfrec
│   │   ├── file_00-1.tfrec
│   │   ├── file_00-2.tfrec
│   │   └── file_00-3.tfrec
│   └── raw <- Original immutable input data
│       └── training_data.zip
├── LICENSE
├── notebooks <- Jupyter notebooks for EDA and model testing
│   ├── eda.ipynb
│   └── model_testing.ipynb
├── README.md
├── references <- Reference materials such as papers
├── reports <- Folder containing your milestone markdown submissions
│   ├── milestone2.md
│   ├── milestone3.md
│   ├── milestone4.md
│   └── milestone5.md
├── presentations <- Folder containing your midterm presentation
│   └── midterm.pdf
├── requirements.txt
└── src <- Source code and Dockerfiles for data processing and modeling
    ├── data-collector <- Scripts for dataset creation
    │   └── ...
    ├── data-processor <- Code for data processing
    │   └── ...
    ├── model-training <- Model training, evaluation, and prediction code
    │   └── ...
    ├── model-deploy <- Model deployment
    │   └── ...
    ├── workflow <- Scripts for automating data collection, preprocessing, modeling
    │   └── ...
    ├── api-service <- Code for App backend APIs
    │   └── ...
    ├── frontend <- Code for App frontend
    │   └── ...
    └── deployment <- Code for App deployment to GCP
        ├── deploy-create-instance.yml
        ├── deploy-docker-images.yml
        ├── deploy-provision-instance.yml
        ├── deploy-setup-containers.yml
        ├── deploy-setup-webserver.yml
        ├── deploy-k8s-cluster.yml
        ├── inventory.yml
        ├── Dockerfile
        ├── docker-entrypoint.sh
        └── docker-shell.sh

--------

# AC215 - Final Project

**Team Members**
Pavlov Protovief, Paolo Primopadre and Pablo El Padron

**Group Name**
Awesome Group

**Project - Problem Definition**
In this project we aim to develop an application that can identify various species of mushrooms in the wild using computer vision and offer educational content through a chatbot interface.

## Data Description

## Proposed Solution

After building a robust ML pipeline in our previous milestone, we have now built a backend API service and a frontend app. Together they form our user-facing application, which ties together the components built in previous milestones.

**Mushroom App**

A user-friendly React app was built to identify various species of mushrooms in the wild using computer vision models served by the backend. With the app, a user can take a picture of a mushroom and upload it. The app sends the image to the backend API to get prediction results on whether the mushroom is poisonous.
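A minimal sketch of the upload step, written in Python rather than the React client purely for illustration. It assumes a hypothetical endpoint such as `http://localhost:9000/predict` that accepts a multipart form field named `file`; the real route, field name, and response shape are defined in the api-service code and are not given here. Only the standard library is used.

```python
import json
import mimetypes
import urllib.request
import uuid
from typing import Tuple


def build_multipart_image_request(filename: str, image_bytes: bytes,
                                  field: str = "file") -> Tuple[bytes, str]:
    """Encode one image as a multipart/form-data body.

    Returns (body, content_type). The form field name "file" is an
    assumption; it must match whatever the backend endpoint expects.
    """
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + image_bytes + tail, f"multipart/form-data; boundary={boundary}"


def predict(api_url: str, filename: str, image_bytes: bytes) -> dict:
    """POST the image to a (hypothetical) prediction endpoint and decode the JSON reply."""
    body, content_type = build_multipart_image_request(filename, image_bytes)
    request = urllib.request.Request(
        api_url, data=body, method="POST", headers={"Content-Type": content_type}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `predict("http://localhost:9000/predict", "mushroom.jpg", open("mushroom.jpg", "rb").read())` would send one photo, provided such a route exists. Building the body by hand keeps the sketch dependency-free; a real client would more likely use `fetch` with `FormData` in the React app.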

Here are some screenshots of our app:
<img src="images/frontend-1.png" width="800">

<img src="images/frontend-2.png" width="800">

**Kubernetes Deployment**

We deployed our frontend and backend to a Kubernetes cluster, which takes care of load balancing and failover. We used Ansible scripts to create and update the K8s cluster. Ansible lets us manage infrastructure as code, so our app infrastructure is tracked in GitHub and deployments are set up in a highly automated way.

Here is our deployed app on a K8s cluster in GCP:
<img src="images/k8s-cluster.png" width="800">


### Code Structure

The following are the folders from the previous milestones:
```
- data-collector
- data-processor
- model-training
- model-deploy
- api-service
- frontend
- deployment
```

**API Service Container**
This container has all the Python files needed to run and expose the backend APIs.

To run the container locally:
- Open a terminal and go to `awesome-app/src/api-service`
- Run `sh docker-shell.sh`
- Once inside the Docker container, run `uvicorn_server`
- To view and test APIs go to `http://localhost:9000/docs`
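To confirm from a script that the server is up before testing it, a small standard-library probe can request the docs page listed above. This is a sketch that only checks for an HTTP 2xx response on `http://localhost:9000/docs`; it does not exercise any specific API route.

```python
import urllib.error
import urllib.request


def api_is_up(url: str = "http://localhost:9000/docs", timeout: float = 2.0) -> bool:
    """Return True if the URL answers with an HTTP 2xx status, otherwise False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # URLError covers refused connections and HTTP errors (HTTPError subclasses it),
        # OSError covers timeouts, and ValueError covers malformed URLs.
        return False
```

Calling `api_is_up()` after `uvicorn_server` starts should return `True`; before that it returns `False` instead of raising.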

**Frontend Container**
This container contains all the files to develop and build a React app. There are Dockerfiles for both development and production.

To run the container locally:
- Open a terminal and go to `awesome-app/src/frontend`
- Run `sh docker-shell.sh`
- If this is the first time running the container, run `yarn install` inside it
- Once inside the Docker container, run `yarn start`
- Go to `http://localhost:3000` to access the app locally


**Deployment Container**
This container manages building and deploying all our app containers. Deployment targets GCP, and all Docker images are pushed to GCR.

To run the container locally:
- Open a terminal and go to `awesome-app/src/deployment`
- Run `sh docker-shell.sh`
- Build and push Docker images to GCR (Google Container Registry)
```
ansible-playbook deploy-docker-images.yml -i inventory.yml
```

- Create & Deploy Cluster
```
ansible-playbook deploy-k8s-cluster.yml -i inventory.yml --extra-vars cluster_state=present
```
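The two playbook invocations above differ only in the playbook name and the extra variables. A hypothetical Python wrapper (not part of the repository) that assembles the same commands, defaulting to a dry run that only prints them, might look like this; it assumes it is run inside the deployment container, where `ansible-playbook` and `inventory.yml` are available.

```python
import shlex
import subprocess
from typing import Dict, List, Optional


def ansible_command(playbook: str, inventory: str = "inventory.yml",
                    extra_vars: Optional[Dict[str, str]] = None) -> List[str]:
    """Build the argv for an ansible-playbook run against the given inventory."""
    cmd = ["ansible-playbook", playbook, "-i", inventory]
    if extra_vars:
        # --extra-vars accepts space-separated key=value pairs.
        cmd += ["--extra-vars", " ".join(f"{k}={v}" for k, v in extra_vars.items())]
    return cmd


def run_playbook(playbook: str, dry_run: bool = True, **extra_vars: str) -> str:
    """Print the shell-quoted command, execute it unless dry_run, and return the string."""
    cmd = ansible_command(playbook, extra_vars=extra_vars or None)
    line = shlex.join(cmd)
    print(line)
    if not dry_run:
        # check=True raises CalledProcessError if the playbook fails.
        subprocess.run(cmd, check=True)
    return line
```

`run_playbook("deploy-k8s-cluster.yml", cluster_state="present")` prints exactly the create-cluster command shown above; pass `dry_run=False` to actually execute it.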

- View the App
  * Copy the `nginx_ingress_ip` printed in the terminal by the create-cluster command
* Go to `http://<YOUR INGRESS IP>.sslip.io`

- Run ML Tasks in Vertex AI
  * `python cli.py --data_collector` runs only the data collector on Vertex AI
  * `python cli.py --data_processor` runs only the data processor on Vertex AI
  * `python cli.py --pipeline` runs the entire ML pipeline on Vertex AI
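The contents of `cli.py` are not shown here. As a hypothetical sketch of how its three flags could be wired up with `argparse`, the snippet below treats them as mutually exclusive and only resolves which steps would run. The step names used for `--pipeline` are assumed from the folders under `src/`, and the actual Vertex AI job submission is deliberately left out.

```python
import argparse
from typing import List, Optional

# Assumed pipeline order, based on the src/ folder names in this README.
PIPELINE_STEPS = ["data_collector", "data_processor", "model_training", "model_deploy"]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse exactly one of the three task flags."""
    parser = argparse.ArgumentParser(description="Run ML tasks on Vertex AI (sketch)")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--data_collector", action="store_true",
                       help="run just the data collector")
    group.add_argument("--data_processor", action="store_true",
                       help="run just the data processor")
    group.add_argument("--pipeline", action="store_true",
                       help="run the entire ML pipeline")
    return parser.parse_args(argv)


def selected_steps(args: argparse.Namespace) -> List[str]:
    """Translate the parsed flags into the list of steps to submit."""
    if args.pipeline:
        return list(PIPELINE_STEPS)
    if args.data_collector:
        return ["data_collector"]
    return ["data_processor"]
```

A real `cli.py` would loop over `selected_steps(parse_args())` and submit one Vertex AI job (or one pipeline) per step. Making the flags mutually exclusive is a design choice of this sketch, so that a single invocation always means a single, unambiguous task.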

### Deploy using GitHub Actions

Finally, we added CI/CD using GitHub Actions so that deployments, or any other pipeline, can be triggered by GitHub events. Our YAML files are in `.github/workflows`.

`cicdworkflow.yml` - Brief description here

We implemented a CI/CD workflow that uses the deployment container to:
* Invoke docker image building and pushing to GCR on code changes
* Deploy the changed containers to update the k8s cluster
* Run Vertex AI jobs if needed
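The workflow file itself is not reproduced here. One piece of logic such a workflow typically needs is deciding which images to rebuild from the files touched by a push. The hypothetical helper below maps changed paths (for example, the output of `git diff --name-only HEAD~1 HEAD`) to container folders; the set of deployable containers is an assumption based on the folder list above.

```python
from typing import Iterable, List, Sequence

# Assumed: only these src/ folders map to containers deployed on the k8s cluster.
APP_CONTAINERS = ("api-service", "frontend")


def containers_to_rebuild(changed_paths: Iterable[str],
                          src_root: str = "src",
                          containers: Sequence[str] = APP_CONTAINERS) -> List[str]:
    """Return the sorted, de-duplicated containers whose source folder changed."""
    prefix = src_root.strip("/") + "/"
    hits = set()
    for raw in changed_paths:
        path = raw.strip()
        if not path.startswith(prefix):
            continue
        # The first path component under src_root is the container folder;
        # require a following "/" so only files inside that folder count.
        folder, sep, _ = path[len(prefix):].partition("/")
        if sep and folder in containers:
            hits.add(folder)
    return sorted(hits)
```

The returned names can then drive which images the deployment container builds and pushes; `src_root` is a parameter because this repository keeps sources under `app/src` rather than `src`.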

---

## NOTE

**DO NOT KEEP YOUR GCP INSTANCES RUNNING**

Once you are done taking screenshots for the milestone, bring them down.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes.
6 changes: 6 additions & 0 deletions app/src/data/FooDD.dvc
@@ -0,0 +1,6 @@
outs:
- md5: 19e0ef1d6e16de4e1571d63e4fb7a61e.dir
  size: 5025229542
  nfiles: 3887
  hash: md5
  path: FooDD
6 changes: 6 additions & 0 deletions app/src/data/Nutrition5k_Other.dvc
@@ -0,0 +1,6 @@
outs:
- md5: b288772da01471caa0b4c5c089b7c24c.dir
  size: 2329639
  nfiles: 3
  hash: md5
  path: Nutrition5k_Other
6 changes: 6 additions & 0 deletions app/src/data/Nutrition5k_realsense_overhead.dvc
@@ -0,0 +1,6 @@
outs:
- md5: 15ba7c603b87eb7aab4b114de8dff413.dir
  size: 3178878531
  nfiles: 10470
  hash: md5
  path: Nutrition5k_realsense_overhead
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -25,7 +25,7 @@ The idea is that more labels and more images could be added to our raw dataset w

**Screenshot of a successful run of the command in the Google VM**

![](../reports/data_labels_processing_output_1.jpg)
![](../../../reports/data_labels_processing_output_1.jpg)
3) Output should go to Google Bucket directory specified in the filepath constants at the top of the labels_processing.py script.
Outputs are as follows:
-full_cleaned_dish_labels.csv (data before splits)
@@ -34,4 +34,4 @@ Outputs are as follows:
-validation_data.pickle

**Screenshot of a successful csv and train, test, split pkl outputs on the Google VM**
![](../reports/data_labels_processing_output_2.jpg)
![](../../../reports/data_labels_processing_output_2.jpg)
File renamed without changes.
@@ -4,7 +4,7 @@ set -e

export IMAGE_NAME=labels-processing-docker-image
export BASE_DIR=$(pwd)
export SECRETS_DIR=$(pwd)/../secrets/
export SECRETS_DIR=$(pwd)/../../../secrets/
export GCS_BUCKET_URI="gs://snapnutrition_data_bucket"
export GCP_PROJECT="csci-115-398800"
export GCP_REGION="us-central1"
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -90,18 +90,18 @@ git push --atomic origin {GIT_BRANCH_NAME} '{VERSION_TAG}'
```

#### Screenshot of Successful Versioning for Data Label CSVs
![](../reports/data_versioning_1.png)
![](../../../reports/data_versioning_1.png)

### **Switching Dataset Versions**

In our set-up, you would most commonly do this outside of the VM. To grab different versions in a local codebase, follow this link: https://dvc.org/doc/command-reference/checkout

Otherwise, look at our [colab notebook](../notebooks/data_versioning_control_demo.ipynb) on a common use case and how to switch versions to download:
Otherwise, look at our [colab notebook](../../../notebooks/data_versioning_control_demo.ipynb) on a common use case and how to switch versions to download:

#### Screenshot of notebook downloading version 2 of dish labels and Seeing the Test csv Appear!

![](../reports/dvc_notebook_1.png)
Again, full [colab notebook here](../notebooks/data_versioning_control_demo.ipynb)
![](../../../reports/dvc_notebook_1.png)
Again, full [colab notebook here](../../../notebooks/data_versioning_control_demo.ipynb)

### Save Model, Output, and/or Dataset Version Dependencies
To checkpoint dependencies (e.g. a saved model and dataset version), see the detailed instructions in the DVC documentation: https://dvc.org/doc/use-cases/versioning-data-and-models/tutorial
@@ -4,7 +4,7 @@ set -e

export IMAGE_NAME=dvc-docker-image
export BASE_DIR=$(pwd)
export SECRETS_DIR=$(pwd)/../secrets/
export SECRETS_DIR=$(pwd)/../../../secrets/
export GCS_BUCKET_URI="gs://snapnutrition_data_bucket"
export GCP_PROJECT="csci-115-398800"
export GCP_REGION="us-central1"
10 changes: 5 additions & 5 deletions app/README.md → app/src/deployment/README.md
@@ -516,18 +516,18 @@ sudo docker exec -it nginx /bin/bash
**Example of Containers Automatically Registered in Google Container Registry using Ansible**

Ansible CLI output when deploying Docker images
![](../reports/mile_5_cli_deploy_containers.png)
![](../../../reports/mile_5_cli_deploy_containers.png)

Google Container Registry
![](../reports/mile_5_containers_GCR.png)
![](../../../reports/mile_5_containers_GCR.png)

**Example of Automatically Created and Provisioned VM using Ansible**<br>

Ansible CLI output when provisioning instance
![](../reports/mile_5_cli_provision_VM.png)
![](../../../reports/mile_5_cli_provision_VM.png)

Automatically deployed VM running
![](../reports/mile_5_automatically_deployed_vm.png)
![](../../../reports/mile_5_automatically_deployed_vm.png)

SSH into VM shows three containers running (nginx, api-service, frontend)
![](../reports/mile_5_vm_running_3_containers.png)
![](../../../reports/mile_5_vm_running_3_containers.png)
6 changes: 3 additions & 3 deletions app/src/deployment/deploy-docker-images.yml
@@ -13,12 +13,12 @@
community.general.docker_image:
build:
path: /frontend
name: gcr.io/{{ gcp_project }}/snapnutrition-app-frontend-react-c:{{ tag.stdout}}
name: gcr.io/{{ gcp_project }}/snapnutrition-app-frontend-mushroom-inspired-c:{{ tag.stdout}}
source: build
- name: Push frontend image to GCR
community.general.docker_image:
name: gcr.io/{{ gcp_project }}/snapnutrition-app-frontend-react-c:{{ tag.stdout}}
repository: gcr.io/{{ gcp_project }}/snapnutrition-app-frontend-react-c:{{ tag.stdout}}
name: gcr.io/{{ gcp_project }}/snapnutrition-app-frontend-mushroom-inspired-c:{{ tag.stdout}}
repository: gcr.io/{{ gcp_project }}/snapnutrition-app-frontend-mushroom-inspired-c:{{ tag.stdout}}
push: yes
source: local
- name: Build api-service container image
2 changes: 1 addition & 1 deletion app/src/deployment/deploy-k8s-cluster.yml
@@ -135,7 +135,7 @@
run: frontend
spec:
containers:
- image: "gcr.io/{{ gcp_project }}/snapnutrition-app-frontend-react-c:{{ tag.stdout}}"
- image: "gcr.io/{{ gcp_project }}/snapnutrition-app-frontend-mushroom-inspired-c:{{ tag.stdout}}"
imagePullPolicy: IfNotPresent
name: frontend
ports:
2 changes: 1 addition & 1 deletion app/src/deployment/deploy-setup-containers.yml
@@ -39,7 +39,7 @@
- name: Create frontend container
docker_container:
name: "frontend"
image: "gcr.io/{{ gcp_project }}/snapnutrition-app-frontend-react-c:{{ tag.stdout}}"
image: "gcr.io/{{ gcp_project }}/snapnutrition-app-frontend-mushroom-inspired-c:{{ tag.stdout}}"
state: started
recreate: yes
pull: yes
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.