Low Throughput and High Latency with TorchServe Deployment on AWS #3359

dummyuser-123 opened this issue Nov 6, 2024 · 0 comments

📚 The doc issue

I have created a custom handler for my image-to-image translation Stable Diffusion project, containerized it using Docker, and deployed it on a g6.xlarge instance on AWS. Currently, I am experiencing low throughput and high latency. I am testing the TorchServe API by sending 20 requests per minute from each of two devices using threading (40 requests per minute in total).
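
For reference, my client side looks roughly like the sketch below (this is not my exact script; the endpoint host and image paths are placeholders): each device runs one thread that posts an image every ~3 seconds to the prediction endpoint and records the round-trip time.

```python
import threading
import time
import requests

# Placeholder endpoint: replace localhost with the instance's address.
INFERENCE_URL = "http://localhost:8080/predictions/stable-diffusion"

def send_requests(device_name, image_paths):
    """Send one request every ~3 seconds (about 20 requests per minute)."""
    for i, path in enumerate(image_paths):
        with open(path, "rb") as f:
            start = time.time()
            resp = requests.post(INFERENCE_URL, data=f.read(), timeout=300)
        print(f"{device_name} request {i}: {resp.status_code}, {time.time() - start:.1f} s")
        time.sleep(3)

if __name__ == "__main__":
    # Two "devices" simulated with two threads, 20 requests each.
    device1 = threading.Thread(target=send_requests, args=("device-1", ["img1.png"] * 20))
    device2 = threading.Thread(target=send_requests, args=("device-2", ["img2.png"] * 20))
    device1.start(); device2.start()
    device1.join(); device2.join()
```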

Device 1:

| Sr. No | Image Resolution | Time (sec) |
|-------:|------------------|-----------:|
| 0 | 564x705 | 42 |
| 1 | 465x710 | 47 |
| 2 | 848x484 | 48 |
| 3 | 564x705 | 60 |
| 4 | 848x484 | 75 |
| 5 | 848x484 | 76 |
| 6 | 465x710 | 83 |
| 7 | 465x710 | 96 |
| 9 | 565x848 | 100 |
| 8 | 848x484 | 116 |
| 10 | 848x484 | 120 |
| 11 | 465x710 | 127 |
| 12 | 465x710 | 137 |
| 13 | 848x484 | 145 |
| 14 | 563x788 | 149 |
| 15 | 465x710 | 160 |
| 16 | 465x710 | 173 |
| 17 | 564x705 | 173 |
| 18 | 563x788 | 178 |
| 19 | 565x848 | 181 |

Device 2:

| Sr. No | Image Resolution | Time (sec) |
|-------:|------------------|-----------:|
| 0 | 563x788 | 41 |
| 3 | 563x788 | 45 |
| 1 | 563x788 | 59 |
| 2 | 848x484 | 56 |
| 4 | 465x710 | 96 |
| 5 | 465x710 | 93 |
| 7 | 564x705 | 94 |
| 6 | 564x705 | 98 |
| 8 | 848x484 | 91 |
| 9 | 848x484 | 115 |
| 10 | 564x705 | 112 |
| 15 | 565x848 | 140 |
| 14 | 563x788 | 143 |
| 12 | 564x705 | 152 |
| 13 | 565x848 | 149 |
| 11 | 848x484 | 155 |
| 17 | 564x705 | 158 |
| 18 | 848x484 | 155 |
| 16 | 564x705 | 161 |
| 19 | 465x710 | 156 |

Config.properties:

inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8083
enable_envvars_config=true
install_py_dep_per_model=true
load_models=stable-diffusion.mar
model_store=/home/model-server/model-store

models={
    "stable-diffusion": {
        "1.0": {
            "defaultVersion": true,
            "marName": "stable-diffusion.mar",
            "minWorkers": 3,
            "maxWorkers": 4,
            "batchSize": 5,
            "maxBatchDelay": 3000,
            "responseTimeout": 180
        }
    }
}
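
For completeness, here is a rough way to check whether these settings were actually applied; this is not from my deployment, just a sketch that queries the TorchServe management API on port 8081 (per the config above) to describe the registered model.

```python
import json
import requests

# Describe the registered model: the response includes batchSize,
# maxBatchDelay, minWorkers/maxWorkers, and the current worker list.
resp = requests.get("http://localhost:8081/models/stable-diffusion")
print(json.dumps(resp.json(), indent=2))
```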

  1. If there are 3 workers and the batch size is 5, how will the workers handle the batch? Will the 5 requests (1 batch) be divided among the 3 workers, or will each of the 3 workers handle 5 requests (1 batch) individually? (See the handler sketch after these questions for what I mean.)

  2. Will including number_of_netty_threads and netty_client_threads in the config.properties significantly improve throughput and latency?
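
To make question 1 concrete, here is a minimal sketch (not my actual handler; class and field names are placeholders) of the list-in/list-out contract that batching implies for a custom handler.

```python
from ts.torch_handler.base_handler import BaseHandler

class BatchAwareHandler(BaseHandler):
    """Sketch only, to illustrate how a handler sees a batch of requests."""

    def handle(self, data, context):
        # `data` is a list with between 1 and batchSize entries
        # (up to 5 with the config above).
        responses = []
        for row in data:
            payload = row.get("data") or row.get("body")
            # ... run the diffusion pipeline on `payload` here ...
            responses.append({"size_bytes": len(payload)})
        # TorchServe expects one response per request, in the same order.
        return responses
```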

Due to budget constraints, I can only afford one GPU instance. How can I optimize this setup to achieve better latency and throughput? Any suggestions would be a great help.

Suggest a potential alternative/fix

No response
