Describe the bug
I have a model that takes fairly long to load (~40 s). When I run constant traffic against the endpoint and then scale out by adding another instance, I see a short spike of errors. From the log timestamps I could conclude that these errors occurred before model loading had completed.
I found that, on startup, the model_server waits a fixed 1 s (https://github.com/aws/sagemaker-inference-toolkit/blob/master/src/sagemaker_inference/model_server.py#L266) and afterwards only checks whether a matching process exists, returning it without verifying that the model has actually been loaded.
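A minimal sketch of the kind of readiness check that could replace the fixed sleep: poll the container's local `/invocations` endpoint with a warm-up request until it succeeds, which only happens once the model is loaded. This is not the toolkit's actual behavior; the URL, warm-up payload, and timeouts below are assumptions that would need to match the deployed model.

```python
import time
import urllib.error
import urllib.request


def wait_until_model_loaded(
    url="http://localhost:8080/invocations",  # SageMaker container contract endpoint
    payload=b"{}",                            # hypothetical warm-up payload; model-specific
    timeout_s=300,
    poll_interval_s=2,
):
    """Block until the model answers a real inference request.

    Unlike a fixed 1 s sleep, this keeps retrying a warm-up invocation
    until the server responds with 200, so the caller only proceeds
    once the model has actually finished loading.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        req = urllib.request.Request(
            url, data=payload, headers={"Content-Type": "application/json"}
        )
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                if resp.status == 200:
                    return  # model served a request, so it is loaded
        except (urllib.error.URLError, OSError):
            pass  # server not up yet, or model still loading
        time.sleep(poll_interval_s)
    raise TimeoutError(f"Model did not become ready within {timeout_s} s")
```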
To reproduce
1. Deploy a model with a long load time (~40 s) behind an endpoint.
2. Run constant traffic against the endpoint.
3. Scale out by adding another instance.
4. Observe a short spike of errors while the new instance is still loading the model.
Expected behavior
No error spikes on scaling events; a new instance should not receive traffic until the model is fully loaded.
System information
Additional context
Is there a parameter that controls this initial wait for model loading which I might have missed?