Describe the feature you'd like
Currently, batch inference is handled by looping through the requests in the batch and feeding them individually to the user-defined transform function (#108). However, this doesn't take full advantage of the GPU's parallelism and compute power, so it yields slower endpoints with low resource utilization.
TorchServe's documentation on batch inference, on the other hand, shows an example where the developer handles this logic and feeds the entire input batch to the model.
For my use case, this is highly desirable to increase the throughput of the model.
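For context, this is roughly the pattern from TorchServe's batch inference documentation: a custom handler receives the whole micro-batch and runs a single forward pass over it. The sketch below is simplified; the class name, JSON input format, and `model.pt` artifact name are illustrative assumptions, not the toolkit's actual handler.

```python
# Sketch of a TorchServe custom handler that processes the whole batch at once.
# Class name, input format, and model artifact name are assumptions.
import json
import os

import torch


class BatchHandler:
    def initialize(self, context):
        props = context.system_properties
        self.device = torch.device(
            "cuda:" + str(props.get("gpu_id"))
            if torch.cuda.is_available() else "cpu"
        )
        # Assumes a TorchScript model packaged as model.pt; adjust to your artifact.
        model_path = os.path.join(props.get("model_dir"), "model.pt")
        self.model = torch.jit.load(model_path, map_location=self.device)
        self.model.eval()

    def handle(self, data, context):
        # With batching enabled, `data` is a list with one entry per request.
        rows = [json.loads(row.get("body") or row.get("data")) for row in data]
        batch = torch.tensor(rows, dtype=torch.float32, device=self.device)

        with torch.no_grad():
            # Single forward pass over the whole batch instead of a per-request loop.
            outputs = self.model(batch)

        # TorchServe expects one response item per request in the batch.
        return outputs.cpu().tolist()
```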
How would this feature be used? Please describe.
Provide `batch_transform_fn` functions. If a user wants to customize the default batch logic, they can provide `batch_input_fn`, `batch_predict_fn`, and `batch_output_fn` functions, which are given the entire batch of requests as input.
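To make the proposal concrete, here is a rough sketch of what a user-provided `inference.py` might look like. The `batch_input_fn` / `batch_predict_fn` / `batch_output_fn` signatures are assumptions that mirror the existing `input_fn` / `predict_fn` / `output_fn` hooks; this API does not exist in the toolkit today.

```python
# Hypothetical inference.py for the proposed API. The batch_* hooks do not exist
# in the toolkit yet; their signatures are assumptions modeled on the existing
# input_fn / predict_fn / output_fn, but operating on the whole batch at once.
import json

import torch


def batch_input_fn(request_bodies, content_types):
    # Deserialize every request in the batch (JSON lists of floats assumed).
    examples = [json.loads(body) for body in request_bodies]
    return torch.tensor(examples, dtype=torch.float32)


def batch_predict_fn(batch, model):
    device = next(model.parameters()).device
    with torch.no_grad():
        # One forward pass over the entire batch instead of a per-request loop.
        return model(batch.to(device))


def batch_output_fn(predictions, accept_types):
    # Return one serialized response per request in the batch.
    return [json.dumps(row) for row in predictions.cpu().tolist()]
```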
Describe alternatives you've considered
I haven't found a way to achieve this with the sagemaker-pytorch-inference-toolkit, so I'm writing a custom Dockerfile that uses torchserve directly.