
Commit 2cd4f81
Merge pull request #153 from dusty-nv/20240516-staging
copy tweaks
dusty-nv authored May 16, 2024
2 parents 0482ad5 + c2c75c4 commit 2cd4f81
Showing 3 changed files with 3 additions and 3 deletions.
docs/tutorial_live-llava.md (2 changes: 1 addition & 1 deletion)
@@ -102,7 +102,7 @@ You can also tag incoming images and add them to the database using the web UI,
 
 ## Video VILA
 
-The VILA-1.5 family of models can understand multiple images per query, enabling video summarization, action & behavior analysis, change detection, and other temporal-based vision functions. The [`vision/video.py`](https://github.com/dusty-nv/NanoLLM/blob/main/nano_llm/vision/video.py){:target="_blank"} example keeps a rolling history of frames:
+The VILA-1.5 family of models can understand multiple images per query, enabling video search/summarization, action & behavior analysis, change detection, and other temporal-based vision functions. The [`vision/video.py`](https://github.com/dusty-nv/NanoLLM/blob/main/nano_llm/vision/video.py){:target="_blank"} example keeps a rolling history of frames:
 
 ``` bash
 jetson-containers run $(autotag nano_llm) \
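
Both changed passages describe the same mechanism: `vision/video.py` keeps a rolling history of frames and sends the whole window with each query. A minimal stand-alone sketch of that idea follows; it is not the NanoLLM API, and `MAX_FRAMES`, `on_frame`, and `model.generate` are illustrative assumptions.

``` python
from collections import deque

MAX_FRAMES = 8                     # size of the rolling window (an assumption)
frames = deque(maxlen=MAX_FRAMES)  # deque evicts the oldest frame automatically

def on_frame(model, frame, prompt="What is happening in this video?"):
    """Append the newest frame, then query with the whole window.

    `model` is a hypothetical stand-in for a multi-image VLM such as
    VILA-1.5, which accepts several images per query.
    """
    frames.append(frame)
    return model.generate(images=list(frames), prompt=prompt)  # hypothetical call
```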
docs/tutorial_nano-vlm.md (2 changes: 1 addition & 1 deletion)
@@ -130,7 +130,7 @@ The [Live Llava](tutorial_live-llava.md) tutorial shows how to enable additional
 
 ## Video Sequences
 
-The VILA-1.5 family of models can understand multiple images per query, enabling video summarization, action & behavior analysis, change detection, and other temporal-based vision functions. By manipulating the KV cache and dropping off the last frame from the chat history, we can keep the stream rolling continuously beyond the maximum context length of the model. The [`vision/video.py`](https://github.com/dusty-nv/NanoLLM/blob/main/nano_llm/vision/video.py){:target="_blank"} example shows how to use this:
+The VILA-1.5 family of models can understand multiple images per query, enabling video search/summarization, action & behavior analysis, change detection, and other temporal-based vision functions. By manipulating the KV cache and dropping off the last frame from the chat history, we can keep the stream rolling continuously beyond the maximum context length of the model. The [`vision/video.py`](https://github.com/dusty-nv/NanoLLM/blob/main/nano_llm/vision/video.py){:target="_blank"} example shows how to use this:
 
 ``` bash
 jetson-containers run $(autotag nano_llm) \
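
The paragraph changed above also mentions the context-length trick: by manipulating the KV cache and dropping the oldest frame from the chat history, the stream can keep rolling past the model's maximum context. A hedged sketch of the bookkeeping side of that idea follows; the class and token costs are assumptions, and the real implementation in `vision/video.py` must also evict the corresponding KV-cache entries, not just the counters shown here.

``` python
from collections import deque

TOKENS_PER_FRAME = 576  # per-image token cost, e.g. a ViT patch count (assumption)
MAX_CONTEXT = 4096      # model context length in tokens (assumption)

class RollingChatHistory:
    """Chat history that drops the oldest frames to stay under the context limit."""

    def __init__(self):
        self.entries = deque()  # (kind, payload, num_tokens) tuples
        self.num_tokens = 0

    def append_frame(self, frame):
        # Evict the oldest entries until the new frame fits in the window;
        # a real implementation would also drop the matching KV-cache rows.
        while self.entries and self.num_tokens + TOKENS_PER_FRAME > MAX_CONTEXT:
            _, _, dropped = self.entries.popleft()
            self.num_tokens -= dropped
        self.entries.append(("image", frame, TOKENS_PER_FRAME))
        self.num_tokens += TOKENS_PER_FRAME
```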
mkdocs.yml (2 changes: 1 addition & 1 deletion)
@@ -106,7 +106,7 @@ nav:
   - Audio:
     - Whisper: tutorial_whisper.md
     - AudioCraft: tutorial_audiocraft.md
-    - VoiceCraft: tutorial_voicecraft.md
+    - VoiceCraft 🆕: tutorial_voicecraft.md
   - Metropolis Microservices:
     - First Steps: tutorial_mmj.md
 # - Tools:
