📚 The doc issue

I have some questions related to model parameters:

I know there is no autoscaling in TorchServe. Looking at the code, a model starts minWorkers workers on startup; maxWorkers seems to be used only when downscaling a model, meaning that if currentWorkers > maxWorkers, it will kill currentWorkers - maxWorkers workers (WorkloadManager.java:151). Given that the worker count only changes on a scaleWorkers API call, is there any practical use case for setting minWorkers != maxWorkers? For example, in examples/cloud_storage_stream_inference/config.properties, minWorkers is set to 10 and maxWorkers to 1000; when do we want that?
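If I read WorkloadManager.java correctly, the downscale rule amounts to simple arithmetic; a minimal sketch of that behavior (class and method names are mine, not TorchServe's):

```java
// Sketch of the worker-downscaling rule described above. The real logic
// lives in TorchServe's WorkloadManager.java and differs in detail.
public class ScaleSketch {
    // How many workers to kill when the current count exceeds maxWorkers;
    // zero when currentWorkers is already within the bound.
    static int workersToKill(int currentWorkers, int maxWorkers) {
        return Math.max(0, currentWorkers - maxWorkers);
    }

    public static void main(String[] args) {
        // With minWorkers=10 and maxWorkers=1000: startup spins up 10 workers,
        // scaleWorkers calls can raise the count, and only counts above 1000
        // are trimmed back.
        System.out.println(workersToKill(1200, 1000)); // 200
        System.out.println(workersToKill(10, 1000));   // 0
    }
}
```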
In docs/getting_started.md it reads: "If you specify model(s) when you run TorchServe, it automatically scales backend workers to the number equal to available vCPUs (if you run on a CPU instance) or to the number of available GPUs (if you run on a GPU instance)." I can't find any evidence of this behavior in the code; could somebody clarify whether this statement is true, and if so, how it works?
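For context, a sketch of the config.properties knobs that seem relevant here (values are illustrative, and my assumption is that default_workers_per_model is the setting behind the quoted sentence):

```properties
# Illustrative config.properties fragment; values are examples, not defaults.
# My reading: when this is unset, TorchServe falls back to the vCPU/GPU
# count, which would explain the getting_started.md sentence quoted above.
default_workers_per_model=4
# Per-model min/max bounds are otherwise set at registration time or via the
# scaleWorkers management API (min_worker / max_worker query parameters).
```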
Thank you!
Suggest a potential alternative/fix
No response