Issue Description
On Windows, I cannot use multimodal models like llava-v1.6-vicuna-7b:q4_0 or llava-llama-3-8b-v1.1; they fail with the error: "Error running ggml inference: Failed to load clip model: C:\Users\xxx\.cache\nexa\hub\official\llava-v1.6-vicuna-7b\projector-q4_0.gguf. Please refer to our docs to install the nexaai package: https://docs.nexaai.com/getting-started/installation." However, the omniVLM:fp16 model works fine, so I am not sure where the issue is occurring.
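In case it helps triage, here is a minimal Python sketch (not part of the SDK; it only assumes the cache path shown in the error message, with "xxx" being my redacted user name) to check whether the projector GGUF was downloaded completely. A truncated or corrupt file could also surface as a clip-model load failure:

```python
from pathlib import Path

# Path taken from the error message; Path.home() should resolve to
# the same C:\Users\xxx directory.
proj = (Path.home() / ".cache" / "nexa" / "hub" / "official"
        / "llava-v1.6-vicuna-7b" / "projector-q4_0.gguf")

print("exists:", proj.exists())
if proj.exists():
    print("size (bytes):", proj.stat().st_size)
    # Every valid GGUF file starts with the 4-byte ASCII magic b"GGUF".
    with proj.open("rb") as f:
        print("GGUF magic ok:", f.read(4) == b"GGUF")
```

If the file is missing, zero-length, or fails the magic check, the download step would be the likely culprit rather than the Windows clip loader itself.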
Steps to Reproduce
1. $env:CMAKE_ARGS="-DGGML_CUDA=ON -DSD_CUBLAS=ON"; pip install nexaai --prefer-binary --index-url https://github.nexa.ai/whl/cu124 --extra-index-url https://pypi.org/simple --no-cache-dir
2. nexa run llava-v1.6-vicuna-7b:q4_0 (see the diagnostic sketch below)
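Since step 2 fails while omniVLM:fp16 works, comparing what was actually pulled for each model might narrow this down. A hedged sketch (assuming the hub layout inferred from the path in the error message) that lists every cached GGUF and its size:

```python
from pathlib import Path

# Assumed hub layout, inferred from the path in the error message.
hub = Path.home() / ".cache" / "nexa" / "hub" / "official"
if not hub.exists():
    raise SystemExit(f"hub directory not found: {hub}")

# Print every cached GGUF and its size; compare the failing llava
# directories against the working omniVLM one.
for model_dir in sorted(hub.iterdir()):
    if model_dir.is_dir():
        print(model_dir.name)
        for f in sorted(model_dir.glob("*.gguf")):
            print(f"  {f.name}: {f.stat().st_size:,} bytes")
```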
OS
Windows 11
Python Version
3.12.7
Nexa SDK Version
0.0.9.6
GPU (if using one)
NVIDIA RTX 4090 Laptop, CUDA 12.3