
vLLM ZeroDivisionError #3373

Open

mfahadyousaf opened this issue Dec 10, 2024 · 0 comments
🐛 Describe the bug

When trying to follow the guide on running vLLM with Docker, a ZeroDivisionError is raised.

The error occurs because the vLLM launcher file references torch.cuda.is_available instead of calling it as a function, torch.cuda.is_available(). The bare function object is always truthy, so the if condition evaluates to True even when CUDA is not actually available.
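For illustration, a minimal Python snippet showing why the unparenthesized form always takes the CUDA branch (the prints are illustrative, not the launcher's code):

```python
import torch

# Referencing the function object is always truthy, even on a CPU-only host.
if torch.cuda.is_available:       # bug: missing parentheses
    print("this branch runs whether or not CUDA is available")

# Calling the function returns an actual bool that reflects the hardware.
if torch.cuda.is_available():     # correct
    print("CUDA really is available")
```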

Error logs

2024-12-10T13:37:00,743 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2024-12-10T13:37:00,743 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.9/site-packages/ts/model_service_worker.py", line 301, in <module>
2024-12-10T13:37:00,743 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - worker.run_server()
2024-12-10T13:37:00,744 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.9/site-packages/ts/model_service_worker.py", line 266, in run_server
2024-12-10T13:37:00,747 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - self.handle_connection_async(cl_socket)
2024-12-10T13:37:00,745 [INFO ] epollEventLoopGroup-5-1 org.pytorch.serve.wlm.AsyncWorkerThread - 9000 Worker disconnected. WORKER_STARTED
2024-12-10T13:37:00,748 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.9/site-packages/ts/model_service_worker.py", line 220, in handle_connection_async
2024-12-10T13:37:00,749 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - service, result, code = self.load_model(msg)
2024-12-10T13:37:00,750 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.9/site-packages/ts/model_service_worker.py", line 133, in load_model
2024-12-10T13:37:00,750 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - service = model_loader.load(
2024-12-10T13:37:00,750 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.9/site-packages/ts/model_loader.py", line 143, in load
2024-12-10T13:37:00,751 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - initialize_fn(service.context)
2024-12-10T13:37:00,751 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.9/site-packages/ts/torch_handler/vllm_handler.py", line 47, in initialize
2024-12-10T13:37:00,752 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - self.vllm_engine = AsyncLLMEngine.from_engine_args(vllm_engine_config)
2024-12-10T13:37:00,753 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 568, in from_engine_args
2024-12-10T13:37:00,754 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - engine_config = engine_args.create_engine_config()
2024-12-10T13:37:00,755 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.9/site-packages/vllm/engine/arg_utils.py", line 1030, in create_engine_config
2024-12-10T13:37:00,755 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - return EngineConfig(
2024-12-10T13:37:00,756 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "<string>", line 14, in __init__
2024-12-10T13:37:00,757 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.9/site-packages/vllm/config.py", line 1872, in __post_init__
2024-12-10T13:37:00,757 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - self.model_config.verify_with_parallel_config(self.parallel_config)
2024-12-10T13:37:00,758 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/home/venv/lib/python3.9/site-packages/vllm/config.py", line 407, in verify_with_parallel_config
2024-12-10T13:37:00,758 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - if total_num_attention_heads % tensor_parallel_size != 0:
2024-12-10T13:37:00,759 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - ZeroDivisionError: integer division or modulo by zero
2024-12-10T13:37:00,749 [ERROR] W-9000-model_1.0 org.pytorch.serve.wlm.AsyncWorkerThread - Failed to send request to backend
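For reference, the failing check in vllm/config.py reduces to a modulo by zero once tensor_parallel_size ends up as 0; the values below are illustrative only:

```python
total_num_attention_heads = 32  # illustrative value for a model's head count
tensor_parallel_size = 0        # presumably what the launcher passes when no GPU is detected

# The same shape of check as in verify_with_parallel_config:
if total_num_attention_heads % tensor_parallel_size != 0:  # raises ZeroDivisionError
    raise ValueError("total number of attention heads must be divisible by tensor parallel size")
```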

Installation instructions

Yes, I am using Docker, following the instructions mentioned here.

Instead of the original model_id, I am using mistralai/Pixtral-12B-2409.

Command: docker run --rm -ti --shm-size 10g -e HUGGING_FACE_HUB_TOKEN=$token -p 8089:8080 -v data:/data ts/vllm --model_id mistralai/Pixtral-12B-2409 --disable_token_auth

Model Packaging

https://github.com/pytorch/serve/tree/master?tab=readme-ov-file#-quick-start-llm-deployment-with-docker

config.properties

No response

Versions

Python version: 3.9 (64-bit runtime)
Python executable: /home/venv/bin/python

Versions of relevant python libraries:
captum==0.6.0
numpy==1.26.4
nvgpu==0.10.0
pillow==10.3.0
psutil==5.9.8
requests==2.32.3
sentencepiece==0.2.0
torch==2.4.0+cu121
torch-model-archiver-nightly==2024.10.15
torch-workflow-archiver-nightly==2024.10.15
torchaudio==2.4.0+cu121
torchserve-nightly==2024.10.15
torchvision==0.19.0+cu121
transformers==4.47.0
wheel==0.42.0
Warning: torchtext not present

Java Version:

OS: Ubuntu 20.04.6 LTS
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: N/A
CMake version: N/A

Environment:
library_path (LD_/DYLD_): /usr/local/nvidia/lib:/usr/local/nvidia/lib64

Repro instructions

git clone https://github.com/pytorch/serve.git

docker build --pull . -f docker/Dockerfile.vllm -t ts/vllm

docker run --rm -ti --shm-size 10g -e HUGGING_FACE_HUB_TOKEN=$token -p 8089:8080 -v data:/data ts/vllm --model_id mistralai/Pixtral-12B-2409 --disable_token_auth

Possible Solution

Instead of referencing torch.cuda.is_available at Line 83 and Line 93, the launcher should call torch.cuda.is_available().
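A minimal sketch of the corrected guard, assuming the launcher derives tensor_parallel_size from the number of visible GPUs (the function and variable names below are illustrative, not the file's exact code):

```python
import torch

def resolve_tensor_parallel_size(default: int = 1) -> int:
    # Calling is_available() returns a real bool; the bare attribute
    # torch.cuda.is_available is always truthy, which is the reported bug.
    if torch.cuda.is_available() and torch.cuda.device_count() > 0:
        return torch.cuda.device_count()
    return default

print(resolve_tensor_parallel_size())  # 1 on a CPU-only host instead of 0
```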
