diff --git a/docs/images/tgwui_llava_drag-n-drop_birds.gif b/docs/images/tgwui_llava_drag-n-drop_birds.gif
new file mode 100644
index 00000000..a1030d12
Binary files /dev/null and b/docs/images/tgwui_llava_drag-n-drop_birds.gif differ
diff --git a/docs/images/tgwui_multimodal_llava_spacewalk.png b/docs/images/tgwui_multimodal_llava_spacewalk.png
new file mode 100644
index 00000000..7db6a4a0
Binary files /dev/null and b/docs/images/tgwui_multimodal_llava_spacewalk.png differ
diff --git a/docs/tutorial_llava.md b/docs/tutorial_llava.md
index a2546044..77e6f95f 100644
--- a/docs/tutorial_llava.md
+++ b/docs/tutorial_llava.md
@@ -1,3 +1,160 @@
-# Tutorial - Llava
+# Tutorial - LLaVA
 
-Coming soon.
\ No newline at end of file
+Give your locally running LLM access to vision by running [LLaVA](https://llava-vl.github.io/) on Jetson!
+
+![](./images/tgwui_multimodal_llava_spacewalk.png)
+
+## Clone and set up `jetson-containers`
+
+```
+git clone https://github.com/dusty-nv/jetson-containers
+cd jetson-containers
+sudo apt update; sudo apt install -y python3-pip
+pip3 install -r requirements.txt
+```
+
+## 1. Use the `text-generation-webui` container to test the LLaVA model
+
+!!! abstract "What you need"
+
+    1. One of the following Jetson devices:
+
+        Jetson AGX Orin 64GB
+        Jetson AGX Orin (32GB)
+
+    2. Running one of the following versions of [JetPack 5.x](https://developer.nvidia.com/embedded/jetpack):
+
+        JetPack 5.1.2 (L4T r35.4.1)
+        JetPack 5.1.1 (L4T r35.3.1)
+        JetPack 5.1 (L4T r35.2.1)
+
+    3. Sufficient storage space (preferably with NVMe SSD):
+
+        - `6.2GB` for the `text-generation-webui` container image
+        - Space for models
+            - CLIP model: `1.7GB`
+            - Llava-Llama2 merged model: `7.3GB`
+
+### Use the `text-generation-webui` container for the web UI
+
+Using the `multimodal` extension, you can use the LLaVA model in oobabooga's `text-generation-webui`.
+
+```
+./run.sh $(./autotag text-generation-webui) /bin/bash -c \
+  "python3 /opt/text-generation-webui/download-model.py \
+    --output=/data/models/text-generation-webui \
+    liuhaotian/llava-llama-2-13b-chat-lightning-gptq"
+./run.sh $(./autotag text-generation-webui) /bin/bash -c \
+  "cd /opt/text-generation-webui && python3 server.py --listen \
+    --model-dir=/data/models/text-generation-webui \
+    --model=liuhaotian_llava-llama-2-13b-chat-lightning-gptq \
+    --multimodal-pipeline=llava-llama-2-13b \
+    --extensions=multimodal \
+    --chat \
+    --verbose"
+```
+
+Go to the **Chat** tab, drag and drop an image of your choice into the **Drop Image Here** area, type your question in the text field above it, and hit the **Generate** button.
+
+![](./images/tgwui_llava_drag-n-drop_birds.gif)
+
+### Result
+
+![](./images/tgwui_multimodal_llava_spacewalk.png)
+
+## 2. Use the `llava` container to run `llava.serve.cli`
+
+!!! abstract "What you need"
+
+    1. One of the following Jetson devices:
+
+        Jetson AGX Orin 64GB
+        Jetson AGX Orin (32GB)
+
+    2. Running one of the following versions of [JetPack 5.x](https://developer.nvidia.com/embedded/jetpack):
+
+        JetPack 5.1.2 (L4T r35.4.1)
+        JetPack 5.1.1 (L4T r35.3.1)
+        JetPack 5.1 (L4T r35.2.1)
+
+    3. Sufficient storage space (preferably with NVMe SSD):
+
+        - `6.1GB` for the `llava` container image
+        - Space for models
+            - 7B model: `14GB`, or
+            - 13B model: `26GB`
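The `llava-llama-2-7b-chat` example below passes a Hugging Face access token into the container, since the `meta-llama/Llama-2-7b-chat-hf` base model is gated on Hugging Face Hub. A minimal sketch of supplying that token, assuming you have already accepted the Llama 2 license and generated a token in your Hugging Face account settings (`<your_hf_token>` is a hypothetical placeholder, not a real value):

```
# Sketch: export a Hugging Face access token and hand it to the llava container.
# <your_hf_token> is a placeholder -- substitute a token generated after
# accepting the Llama 2 license on Hugging Face Hub.
export HUGGING_FACE_HUB_TOKEN=<your_hf_token>
./run.sh --env HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN $(./autotag llava)
```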
+
+!!! tip ""
+
+    See the [`jetson-containers` `llava` package README](https://github.com/dusty-nv/jetson-containers/blob/master/packages/llm/llava/README.md) for more information.
+
+### llava-llama-2-7b-chat
+
+```
+./run.sh --env HUGGING_FACE_HUB_TOKEN= $(./autotag llava) \
+  python3 -m llava.serve.cli \
+    --model-path liuhaotian/llava-llama-2-7b-chat-lightning-lora-preview \
+    --model-base meta-llama/Llama-2-7b-chat-hf \
+    --image-file /data/images/hoover.jpg
+```
+
+### llava-llama-2-13b-chat
+
+This may only run on Jetson AGX Orin 64GB.
+
+```
+./run.sh $(./autotag llava) \
+  python3 -m llava.serve.cli \
+    --model-path liuhaotian/llava-llama-2-13b-chat-lightning-preview \
+    --image-file /data/images/hoover.jpg
+```
diff --git a/docs/tutorial_nanosam.md b/docs/tutorial_nanosam.md
index 55c9fe98..0e7509a4 100644
--- a/docs/tutorial_nanosam.md
+++ b/docs/tutorial_nanosam.md
@@ -1,5 +1,7 @@
 # Tutorial - NanoSAM
 
-Coming soon.
+See the NanoSAM repo:
+
+[https://github.com/NVIDIA-AI-IOT/nanosam](https://github.com/NVIDIA-AI-IOT/nanosam)
 
 ![](https://raw.githubusercontent.com/NVIDIA-AI-IOT/nanosam/main/assets/mouse_gif_compressed.gif)
\ No newline at end of file
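The NanoSAM page above only points at the upstream repo. As a starting point, here is a hedged sketch of trying NanoSAM with the same `./run.sh $(./autotag ...)` pattern used for LLaVA; the `nanosam` container package, the example script path, its flags, and the engine file names are assumptions drawn from the NanoSAM README, so verify them there before running.

```
# Hypothetical sketch: run NanoSAM's basic example inside its jetson-containers
# package. The script path, flags, and engine file names are assumptions --
# check the NanoSAM repo README for the exact usage.
./run.sh $(./autotag nanosam) /bin/bash -c \
  "cd /opt/nanosam && python3 examples/basic_usage.py \
    --image_encoder=data/resnet18_image_encoder.engine \
    --mask_decoder=data/mobile_sam_mask_decoder.engine \
    --image=assets/dogs.jpg \
    --output=data/basic_usage_out.jpg"
```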