
Model & Instance scaling #3358

Open
markcNewell opened this issue Nov 5, 2024 · 0 comments

📚 The doc issue

Hi, I have some questions regarding model scaling.

We're currently running a single TorchServe instance on a Kubernetes cluster, serving a number of models whose loads vary throughout the day. After some digging, I've worked out that TorchServe has no built-in auto-scaling feature (which the minWorkers and maxWorkers settings make a bit misleading). Our only option therefore seems to be horizontal scaling on Kubernetes, as documented at https://github.com/pytorch/serve/blob/master/kubernetes/autoscale.md. However, since each model's load varies independently, we don't really want to scale all of them at the same time.

  1. Are we doing something fundamentally wrong with our setup? Should we perhaps run one TorchServe instance per "group" of models?
  2. If our setup isn't flawed, would it be worth creating a sidecar container or a separate application that monitors the queue time for each model and scales its workers up or down via the management API?
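For context, a minimal sketch of what such a per-model sidecar could look like. It assumes TorchServe's management API is on port 8081 (where `PUT /models/{name}?min_worker=N` adjusts worker counts) and Prometheus-format metrics are on port 8082; the `ts_queue_latency_microseconds` metric name and the latency thresholds are assumptions to verify against your own deployment.

```python
import re
import urllib.request

# Assumed endpoints -- adjust for your deployment.
MANAGEMENT_API = "http://localhost:8081"
METRICS_API = "http://localhost:8082"

def decide_workers(queue_latency_ms, current, min_workers=1, max_workers=8,
                   high_ms=500.0, low_ms=50.0):
    """Pure scaling decision: add one worker when the queue is slow,
    remove one when it is nearly idle, clamped to [min, max]."""
    if queue_latency_ms > high_ms:
        return min(current + 1, max_workers)
    if queue_latency_ms < low_ms:
        return max(current - 1, min_workers)
    return current

def queue_latency_ms(model_name):
    """Scrape the Prometheus metrics endpoint for one model's queue latency.
    The metric name is an assumption; check your /metrics output."""
    text = urllib.request.urlopen(f"{METRICS_API}/metrics").read().decode()
    pattern = (rf'ts_queue_latency_microseconds{{[^}}]*'
               rf'model_name="{model_name}"[^}}]*}} ([0-9.eE+]+)')
    match = re.search(pattern, text)
    return float(match.group(1)) / 1000.0 if match else 0.0

def scale(model_name, workers):
    """Ask TorchServe to adjust the worker count for one model."""
    req = urllib.request.Request(
        f"{MANAGEMENT_API}/models/{model_name}"
        f"?min_worker={workers}&synchronous=true",
        method="PUT")
    urllib.request.urlopen(req)
```

A control loop would then call `queue_latency_ms` and `scale` for each model on a timer, which gives per-model scaling without touching the other models in the instance.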

Thanks

Suggest a potential alternative/fix

No response
