
Throughput increases non-linearly with number of workers #3338

Open

vandesa003 opened this issue Oct 3, 2024 · 0 comments
🐛 Describe the bug

I am hosting a BERT-like model with the TorchServe config below.

inference_address=http://localhost:8080
management_address=http://localhost:8081
metrics_address=http://localhost:8082
load_models=model_name=weights.mar
async_logging=true
job_queue_size=200

models={ "model_name": {  "1.0": { "minWorkers": 8 , "batchSize": 8 , "maxBatchDelay": 10  }  }  }

I have 8 GPUs, so this setting gives me 1 worker per GPU.
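
To verify the worker placement, the management API can be queried; below is a minimal sketch in Python (assuming the default management port 8081 from the config above and the same placeholder model name; the exact response fields may vary by TorchServe version):

import requests

# Describe the model via the TorchServe management API (default port 8081)
# to confirm how many workers are running and which GPU each one landed on.
# "model_name" is the same placeholder used in the models= entry above.
resp = requests.get("http://localhost:8081/models/model_name")
resp.raise_for_status()

for entry in resp.json():  # one entry per registered model version
    for worker in entry.get("workers", []):
        print(worker.get("id"), worker.get("status"), worker.get("gpu"))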

Then I ran load tests with both k6 and Locust; the chart below shows the relationship between the number of workers (from 1 to 8) and throughput.
[chart: throughput and GPU usage vs. number of workers]

As can be seen in the chart, GPU usage drops as the number of workers increases, so it feels like the load balancing inside TorchServe is causing the inefficiency. Can anyone give me some clues on how to improve the throughput further?
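
For reference, the Locust side of the test was roughly the sketch below (simplified: the payload is a placeholder, and the real test posts tokenizer-ready input for the model's handler):

from locust import HttpUser, constant, task

class InferenceUser(HttpUser):
    # Inference endpoint from config.properties above.
    host = "http://localhost:8080"
    wait_time = constant(0)  # no think time, to keep all workers saturated

    @task
    def predict(self):
        # /predictions/{model_name} is TorchServe's standard inference route;
        # "sample text" stands in for the real model input.
        self.client.post("/predictions/model_name", data="sample text")

A run can then be driven headless with something like locust -f locustfile.py --headless -u 64 -r 16, sweeping the user count for each worker-count setting.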

Error logs

Throughput increases non-linearly with number of workers

Installation instructions

torchserve = "^0.10.0"

Model Packaging

torchserve = "0.10.0"

config.properties

inference_address=http://localhost:8080
management_address=http://localhost:8081
metrics_address=http://localhost:8082
load_models=model_name=weights.mar
async_logging=true
job_queue_size=200

models={ "model_name": { "1.0": { "minWorkers": 8 , "batchSize": 8 , "maxBatchDelay": 10 } } }

Versions

$ python serve/ts_scripts/print_env_info.py

Environment headers

Torchserve branch:

torchserve==0.10.0
torch-model-archiver==0.11.0

Python version: 3.11 (64-bit runtime)
Python executable: /home/me/.cache/pypoetry/virtualenvs/pre-deploy-j4GApv9r-py3.11/bin/python

Versions of relevant python libraries:
numpy==1.24.3
nvgpu==0.10.0
pillow==10.4.0
psutil==6.0.0
requests==2.32.3
torch==2.3.1+cu121
torch-model-archiver==0.11.0
torch_tensorrt==2.3.0+cu121
torchserve==0.10.0
torchvision==0.18.1
transformers==4.44.2
wheel==0.44.0
torch==2.3.1+cu121
**Warning: torchtext not present ..
torchvision==0.18.1
**Warning: torchaudio not present ..

Java Version:

OS: Debian GNU/Linux 12 (bookworm)
GCC version: (Debian 12.2.0-14) 12.2.0
Clang version: 14.0.6
CMake version: version 3.25.1

Is CUDA available: Yes
CUDA runtime version: N/A
GPU models and configuration:
GPU 0: NVIDIA A100-SXM4-80GB
GPU 1: NVIDIA A100-SXM4-80GB
GPU 2: NVIDIA A100-SXM4-80GB
GPU 3: NVIDIA A100-SXM4-80GB
GPU 4: NVIDIA A100-SXM4-80GB
GPU 5: NVIDIA A100-SXM4-80GB
GPU 6: NVIDIA A100-SXM4-80GB
GPU 7: NVIDIA A100-SXM4-80GB
Nvidia driver version: 550.54.15
cuDNN version: None

Environment:
library_path (LD_/DYLD_):

Repro instructions

wget http://mar_file.mar
torch-model-archiver ...
torchserve --start

Possible Solution

No response
