
Clarification on minWorkers and maxWorkers parameters #3339

Open
krzwaraksa opened this issue Oct 3, 2024 · 0 comments
📚 The doc issue

I have some questions related to model parameters:

  1. I know there is no autoscaling in TorchServe, and looking at the code, a model spins up minWorkers workers on startup. maxWorkers appears to be used only when downscaling a model: if currentWorkers > maxWorkers, it will kill currentWorkers - maxWorkers workers (WorkloadManager.java:151). Given that the worker count only changes on a scaleWorkers API call, is there any practical use case for setting minWorkers != maxWorkers? For example, in examples/cloud_storage_stream_inference/config.properties, minWorkers is set to 10 and maxWorkers to 1000. When would we want that?
  2. docs/getting_started.md says: "If you specify model(s) when you run TorchServe, it automatically scales backend workers to the number equal to available vCPUs (if you run on a CPU instance) or to the number of available GPUs (if you run on a GPU instance)." I can't find any evidence of this behavior in the code. Could somebody clarify whether this statement is true and, if so, how it works?
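To make question 1 concrete, here is a minimal Python paraphrase of the behavior described above. This is not TorchServe's actual implementation (which lives in WorkloadManager.java); the function name and structure are purely illustrative, assuming my reading of the code is correct:

```python
# Hedged sketch of the worker-scaling behavior described in point 1.
# target_worker_count is a hypothetical name; the real logic is in Java.

def target_worker_count(current_workers: int, min_workers: int, max_workers: int) -> int:
    """Return the worker count after a scale pass, per the reading above:
    startup brings the pool up to min_workers, and max_workers only
    matters when the current pool exceeds it."""
    if current_workers < min_workers:
        return min_workers        # scale up to the configured minimum
    if current_workers > max_workers:
        return max_workers        # kill current_workers - max_workers workers
    return current_workers        # otherwise leave the pool unchanged

# With the cited example config (minWorkers=10, maxWorkers=1000):
print(target_worker_count(0, 10, 1000))     # startup lands at 10
print(target_worker_count(10, 10, 1000))    # steady state stays at 10
print(target_worker_count(1200, 10, 1000))  # downscale only past 1000
```

Under this reading, nothing between 10 and 1000 workers is ever touched automatically, which is why the practical value of the gap between minWorkers and maxWorkers is unclear to me.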

Thank you!

Suggest a potential alternative/fix

No response
