📚 The doc issue
I have created a custom handler for my image-to-image translation Stable Diffusion project, containerized it using Docker, and deployed it on a g6.xlarge instance on AWS. Currently, I am experiencing low throughput and high latency issues. I am testing the TorchServe API by sending 20 requests per minute from two devices using threading (a total of 40 requests per minute).
Device 1:
Device 2:
Config.properties:
If there are 3 workers and the batch size is 5, how will the workers handle the batch? Will the 5 requests (1 batch) be divided among the 3 workers, or will each of the 3 workers handle 5 requests (1 batch) individually?
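My understanding so far (which I'd like confirmed) is that TorchServe hands each aggregated batch to a single worker, and the handler's `handle(data, context)` is called once per batch with the whole batch as a list. A simplified stand-in sketch of that call pattern (not my real handler):

```python
# Simplified stand-in for a TorchServe handler entry point.
# In TorchServe, handle(data, context) runs inside ONE worker process and
# receives an entire batch: `data` is a list of up to batch_size requests.
def handle(data, context=None):
    # The worker processes the whole batch and must return exactly one
    # response per request, in the same order as the input list.
    return [f"processed request {i}" for i, _ in enumerate(data)]

# With batch_size=5, one worker would receive all 5 requests of a batch:
batch = [{"body": b"img"}] * 5
responses = handle(batch)
print(len(responses))  # 5 responses, all produced by the same worker
```

If that is right, the 5 requests of a batch are not divided among the 3 workers; instead, up to 3 batches (of up to 5 requests each) can be processed concurrently, one per worker.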
Will including number_of_netty_threads and netty_client_threads in the config.properties significantly improve throughput and latency?
Due to budget constraints, I can only afford one GPU instance. How can I optimize this setup to achieve better latency and throughput? Any suggestions would be a great help.
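For concreteness, these are the kinds of frontend settings I am asking about; the values below are illustrative placeholders, not my actual config:

```properties
# Illustrative config.properties sketch -- example values, not my real config
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
# Netty I/O threads for the frontend (defaults are derived from CPU count)
number_of_netty_threads=4
netty_client_threads=4
# All workers share the single L4 GPU on the g6.xlarge
default_workers_per_model=3
```

As I understand it, `batchSize` and `maxBatchDelay` themselves are set per model, either when registering via the management API or in the `models` JSON block of config.properties.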