Describe the bug
I have a model that takes fairly long to load (~40 s). When I run constant traffic against the endpoint and then scale out by adding another instance, I see a short spike of errors. From the log timestamps I could conclude that these errors occurred before model loading had completed.
I found that, on startup, the model_server waits a fixed 1 s (https://github.com/aws/sagemaker-inference-toolkit/blob/master/src/sagemaker_inference/model_server.py#L266) and afterwards only checks whether a matching process exists, returning it without verifying that the model has actually been loaded.
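A minimal sketch of the kind of readiness check that could replace the fixed sleep: poll the container's local `/invocations` endpoint with a warm-up request until it succeeds, which only happens once the model is loaded. This is not the toolkit's actual behavior; the URL, warm-up payload, and timeouts below are assumptions that would need to match the deployed model.

```python
import time
import urllib.error
import urllib.request


def wait_until_model_loaded(
    url="http://localhost:8080/invocations",  # SageMaker container contract endpoint
    payload=b"{}",                            # hypothetical warm-up payload; model-specific
    timeout_s=300,
    poll_interval_s=2,
):
    """Block until the model answers a real inference request.

    Unlike a fixed 1 s sleep, this keeps retrying a warm-up invocation
    until the server responds with 200, so the caller only proceeds
    once the model has actually finished loading.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        req = urllib.request.Request(
            url, data=payload, headers={"Content-Type": "application/json"}
        )
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                if resp.status == 200:
                    return  # model served a request, so it is loaded
        except (urllib.error.URLError, OSError):
            pass  # server not up yet, or model still loading
        time.sleep(poll_interval_s)
    raise TimeoutError(f"Model did not become ready within {timeout_s} s")
```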
To reproduce
1. Deploy a model with a long load time (~40 s) behind an endpoint.
2. Run constant traffic against the endpoint.
3. Scale out by adding another instance.
4. Observe a short spike of errors while the new instance is still loading the model.
Expected behavior
No error spikes on scaling events; a new instance should not receive traffic until the model is fully loaded.
System information
Additional context
Is there a parameter that controls this initial wait for model loading which I might have missed?