🐛 Describe the bug
I am hosting a BERT-like model with the TorchServe config below.
I have 8 GPUs, so this setting gives me 1 worker per GPU.
I then ran load tests with both k6 and Locust; the chart below shows the relationship between the number of workers (1 to 8) and throughput.
As the chart shows, GPU utilization drops as the number of workers increases, so it looks like the load balancing in TorchServe is the source of the inefficiency. Can anyone give me clues on how to improve throughput further?
Error logs
Throughput increases non-linearly with the number of workers.
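To make "non-linear" concrete, here is a small sketch that computes per-worker scaling efficiency (measured throughput divided by perfect linear scaling of the 1-worker figure). The numbers in the example are made up purely for illustration, not my actual measurements:

```python
def scaling_efficiency(throughputs):
    """Efficiency of n workers relative to perfect linear scaling.

    throughputs[i] is the measured requests/sec with i+1 workers;
    a value of 1.0 means perfect linear scaling.
    """
    base = throughputs[0]
    return [t / (base * (i + 1)) for i, t in enumerate(throughputs)]

# Hypothetical numbers (NOT my real measurements), just to show the shape:
example = [100, 180, 240, 280]  # req/s with 1..4 workers
print(scaling_efficiency(example))  # -> [1.0, 0.9, 0.8, 0.7]
```

In my runs the efficiency falls off in a similar way as workers are added.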
Installation instructions
torchserve = "^0.10.0"
Model Packaging
torchserve = "0.10.0"
config.properties
inference_address=http://localhost:8080
management_address=http://localhost:8081
metrics_address=http://localhost:8082
load_models=model_name=weights.mar
async_logging=true
job_queue_size=200
models={ "model_name": { "1.0": { "minWorkers": 8 , "batchSize": 8 , "maxBatchDelay": 10 } } }
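One thing I check with this config is whether all 8 workers actually come up READY and on a GPU, via TorchServe's management API (`GET /models/<name>` on port 8081). Below is a sketch of how I tally that from the response; the exact response shape shown is an assumption about the describe-model output, and `model_name`/the port come from the config above:

```python
import json

def ready_gpu_workers(describe_response):
    """Count READY workers, and how many of them report a GPU, from a
    TorchServe GET /models/<name> response (a list of model-version entries)."""
    ready = on_gpu = 0
    for entry in describe_response:
        for worker in entry.get("workers", []):
            if worker.get("status") == "READY":
                ready += 1
                if worker.get("gpu"):
                    on_gpu += 1
    return ready, on_gpu

# The field layout here is an assumption; fetch the real thing with e.g.
#   requests.get("http://localhost:8081/models/model_name").json()
sample = json.loads("""
[{"modelName": "model_name",
  "workers": [{"id": "9000", "status": "READY", "gpu": true},
              {"id": "9001", "status": "READY", "gpu": true}]}]
""")
print(ready_gpu_workers(sample))  # -> (2, 2)
```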
Versions
$ python serve/ts_scripts/print_env_info.py
Environment headers
Torchserve branch:
torchserve==0.10.0
torch-model-archiver==0.11.0
Python version: 3.11 (64-bit runtime)
Python executable: /home/me/.cache/pypoetry/virtualenvs/pre-deploy-j4GApv9r-py3.11/bin/python
Versions of relevant python libraries:
numpy==1.24.3
nvgpu==0.10.0
pillow==10.4.0
psutil==6.0.0
requests==2.32.3
torch==2.3.1+cu121
torch-model-archiver==0.11.0
torch_tensorrt==2.3.0+cu121
torchserve==0.10.0
torchvision==0.18.1
transformers==4.44.2
wheel==0.44.0
torch==2.3.1+cu121
**Warning: torchtext not present ..
torchvision==0.18.1
**Warning: torchaudio not present ..
Java Version:
OS: Debian GNU/Linux 12 (bookworm)
GCC version: (Debian 12.2.0-14) 12.2.0
Clang version: 14.0.6
CMake version: version 3.25.1
Is CUDA available: Yes
CUDA runtime version: N/A
GPU models and configuration:
GPU 0: NVIDIA A100-SXM4-80GB
GPU 1: NVIDIA A100-SXM4-80GB
GPU 2: NVIDIA A100-SXM4-80GB
GPU 3: NVIDIA A100-SXM4-80GB
GPU 4: NVIDIA A100-SXM4-80GB
GPU 5: NVIDIA A100-SXM4-80GB
GPU 6: NVIDIA A100-SXM4-80GB
GPU 7: NVIDIA A100-SXM4-80GB
Nvidia driver version: 550.54.15
cuDNN version: None
Environment:
library_path (LD_/DYLD_):
Repro instructions
wget http://mar_file.mar
torch-model-archiver ...
torchserve --start
Possible Solution
No response