📚 The doc issue

I have some questions related to model parameters:

I know there is no autoscaling in TorchServe. Looking at the code, a model starts minWorkers workers on startup; maxWorkers seems to be used only when downscaling a model, meaning that if currentWorkers > maxWorkers, it will kill currentWorkers - maxWorkers workers (WorkloadManager.java:151). Given that the worker count only changes on a scaleWorkers API call, is there any practical use case for setting minWorkers != maxWorkers? For example, in examples/cloud_storage_stream_inference/config.properties, minWorkers is set to 10 and maxWorkers to 1000; when do we want that?
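If I read WorkloadManager.java correctly, the downscale rule amounts to simple arithmetic; a minimal sketch of that behavior (class and method names are mine, not TorchServe's):

```java
// Sketch of the worker-downscaling rule described above. The real logic
// lives in TorchServe's WorkloadManager.java and differs in detail.
public class ScaleSketch {
    // How many workers to kill when the current count exceeds maxWorkers;
    // zero when currentWorkers is already within the bound.
    static int workersToKill(int currentWorkers, int maxWorkers) {
        return Math.max(0, currentWorkers - maxWorkers);
    }

    public static void main(String[] args) {
        // With minWorkers=10 and maxWorkers=1000: startup spins up 10 workers,
        // scaleWorkers calls can raise the count, and only counts above 1000
        // are trimmed back.
        System.out.println(workersToKill(1200, 1000)); // 200
        System.out.println(workersToKill(10, 1000));   // 0
    }
}
```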
In docs/getting_started.md it reads: "If you specify model(s) when you run TorchServe, it automatically scales backend workers to the number equal to available vCPUs (if you run on a CPU instance) or to the number of available GPUs (if you run on a GPU instance)." I can't find any evidence of this behavior in the code; could somebody clarify whether this statement is true, and if so, how it works?
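For context, a sketch of the config.properties knobs that seem relevant here (values are illustrative, and my assumption is that default_workers_per_model is the setting behind the quoted sentence):

```properties
# Illustrative config.properties fragment; values are examples, not defaults.
# My reading: when this is unset, TorchServe falls back to the vCPU/GPU
# count, which would explain the getting_started.md sentence quoted above.
default_workers_per_model=4
# Per-model min/max bounds are otherwise set at registration time or via the
# scaleWorkers management API (min_worker / max_worker query parameters).
```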
Thank you!
Suggest a potential alternative/fix
No response