Replace processes when a model is unloaded #36
Conversation
Force-pushed from 4badeec to 79284b5
I have some bugs to solve with this that I discovered from an overnight run of my workers: the hung-worker replacement isn't working correctly.
On Linux, it seems like there is a tremendous amount of memory allocated by something outside Python when you allow one worker to process many jobs on many different models. In order to limit the damage from that behavior, we'll try to replace the processes when the model is scheduled to be unloaded. I realize this probably makes it a _little_ harder to decouple a process from a model, but it's a huge stability improvement. This also switches the model management strategy a tiny bit by allocating a model to every open worker before trying to unload a model. Previously, you would have N processes = `threads` + `queue`, but jobs were very likely to be scheduled only on the first `threads` workers.
In order to clean up jobs in progress we need to know where the job went in the event that we kill a process.
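That bookkeeping could look something like the sketch below (a hypothetical `JobTracker`, not the worker's actual implementation): record which process each job was sent to, so that when a process is killed, its in-flight jobs can be failed or requeued instead of silently lost.

```python
import threading

class JobTracker:
    # Hypothetical bookkeeping: map each worker process (by pid) to the
    # set of job ids currently in flight on it.
    def __init__(self):
        self._lock = threading.Lock()
        self._jobs_by_pid = {}  # pid -> set of job ids

    def assign(self, job_id, pid):
        # Called when a job is dispatched to a worker process.
        with self._lock:
            self._jobs_by_pid.setdefault(pid, set()).add(job_id)

    def finish(self, job_id, pid):
        # Called when a worker reports the job complete.
        with self._lock:
            self._jobs_by_pid.get(pid, set()).discard(job_id)

    def orphaned_jobs(self, pid):
        # Called after killing a process: returns (and forgets) the jobs
        # that were still in flight on it, so they can be cleaned up.
        with self._lock:
            return self._jobs_by_pid.pop(pid, set())
```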
This still isn't quite right, because I've seen BrokenPipeError get tossed in other spots, like just starting inference normally.
Let me know when you feel this branch is feature-stable, and I'll work on promoting it to main.
Given that we are still in a somewhat limited beta, and that I have seen these changes work locally for extended periods of time, I am going to merge this into main: it is generally more efficient and a net improvement overall. Further, it may address some problems that people are encountering fairly frequently.
The code is in the general spirit of the contribution requirements of this project and furthers the goal of worker stability. I have been able to run this branch (rebased on the current changes on main) for an extended period of time. Considering we are in a beta phase, I feel this is stable enough to be on main.