Describe the feature you'd like
Currently, batch inference is handled by looping through the requests in the batch and feeding them individually to the user-defined transform function (#108). However, this doesn't take full advantage of the GPU's parallelism and compute power, so it yields slower endpoints with low resource utilization.
TorchServe's documentation on batch inference, on the other hand, shows an example where the developer handles this logic and feeds the entire input batch to the model.
For my use case, this is highly desirable to increase the throughput of the model.
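For context, this is roughly the pattern from TorchServe's batch inference documentation: a custom handler receives the whole micro-batch and runs a single forward pass over it. The sketch below is simplified; the class name, JSON input format, and `model.pt` artifact name are illustrative assumptions, not the toolkit's actual handler.

```python
# Sketch of a TorchServe custom handler that processes the whole batch at once.
# Class name, input format, and model artifact name are assumptions.
import json
import os

import torch


class BatchHandler:
    def initialize(self, context):
        props = context.system_properties
        self.device = torch.device(
            "cuda:" + str(props.get("gpu_id"))
            if torch.cuda.is_available() else "cpu"
        )
        # Assumes a TorchScript model packaged as model.pt; adjust to your artifact.
        model_path = os.path.join(props.get("model_dir"), "model.pt")
        self.model = torch.jit.load(model_path, map_location=self.device)
        self.model.eval()

    def handle(self, data, context):
        # With batching enabled, `data` is a list with one entry per request.
        rows = [json.loads(row.get("body") or row.get("data")) for row in data]
        batch = torch.tensor(rows, dtype=torch.float32, device=self.device)

        with torch.no_grad():
            # Single forward pass over the whole batch instead of a per-request loop.
            outputs = self.model(batch)

        # TorchServe expects one response item per request in the batch.
        return outputs.cpu().tolist()
```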
How would this feature be used? Please describe.
Provide `batch_transform_fn` functions. If a user wants to customize the default batch logic, they can provide `batch_input_fn`, `batch_predict_fn`, and `batch_output_fn` functions, which are given the entire batch of requests as input.
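To make the proposal concrete, here is a rough sketch of what a user-provided `inference.py` might look like. The `batch_input_fn` / `batch_predict_fn` / `batch_output_fn` signatures are assumptions that mirror the existing `input_fn` / `predict_fn` / `output_fn` hooks; this API does not exist in the toolkit today.

```python
# Hypothetical inference.py for the proposed API. The batch_* hooks do not exist
# in the toolkit yet; their signatures are assumptions modeled on the existing
# input_fn / predict_fn / output_fn, but operating on the whole batch at once.
import json

import torch


def batch_input_fn(request_bodies, content_types):
    # Deserialize every request in the batch (JSON lists of floats assumed).
    examples = [json.loads(body) for body in request_bodies]
    return torch.tensor(examples, dtype=torch.float32)


def batch_predict_fn(batch, model):
    device = next(model.parameters()).device
    with torch.no_grad():
        # One forward pass over the entire batch instead of a per-request loop.
        return model(batch.to(device))


def batch_output_fn(predictions, accept_types):
    # Return one serialized response per request in the batch.
    return [json.dumps(row) for row in predictions.cpu().tolist()]
```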
Describe alternatives you've considered
I haven't found a way to achieve this with the sagemaker-pytorch-inference-toolkit, so I'm writing a custom Dockerfile that uses torchserve directly.