Replace processes when a model is unloaded #36

Merged
tazlin merged 8 commits into Haidra-Org:main from lru-cache on Dec 14, 2023

Conversation

@zten zten commented Nov 27, 2023

On Linux, there seems to be a tremendous amount of memory allocated outside Python when you allow one worker process to handle many jobs across many different models. To limit the damage from that behavior, we'll try to replace a process when its model is scheduled to be unloaded.

I realize this probably makes it a little harder to decouple a process from a model, but it's a huge stability improvement.

This also changes the model management strategy slightly by allocating a model to every open worker before trying to unload one. Previously the worker ran N processes = `threads` + `queue`, but jobs were very likely to be scheduled on only `threads` of those workers.
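
As a rough sketch of the idea (the names and structure below are hypothetical, not the actual horde-worker-reGen code): when a model is scheduled to be unloaded, the process that loaded it is asked to exit and a fresh one is started in its place, so memory allocated outside Python is returned to the OS instead of accumulating in a long-lived worker.

```python
import multiprocessing as mp


def _worker_main(model_name, job_queue):
    # Hypothetical worker loop: serve jobs for one model until told to stop.
    while True:
        job = job_queue.get()
        if job is None:  # sentinel: the model is being unloaded
            return
        # run inference for `job` with `model_name` here (omitted)


def replace_process_for_model(model_name, procs, queues):
    """Terminate the process holding `model_name` and start a fresh one."""
    old = procs.pop(model_name, None)
    if old is not None:
        queues[model_name].put(None)   # ask the worker to exit cleanly
        old.join(timeout=30)
        if old.is_alive():             # hung worker: force the replacement
            old.kill()
            old.join()
    queue = mp.Queue()
    proc = mp.Process(target=_worker_main, args=(model_name, queue), daemon=True)
    proc.start()
    procs[model_name], queues[model_name] = proc, queue
```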

@zten zten force-pushed the lru-cache branch 3 times, most recently from 4badeec to 79284b5 on November 27, 2023 03:28
zten commented Nov 27, 2023

I have some bugs to solve with this that I discovered from an overnight run of my workers; the hung-worker replacement isn't working correctly.

@db0 db0 requested a review from tazlin December 8, 2023 13:14

In order to clean up jobs in progress, we need to know where each job went in the event that we kill a process.
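
As a rough illustration of that bookkeeping (again with hypothetical names, not this PR's actual data structures), the parent can record which process each job was dispatched to, so that jobs running on a killed process can be requeued or reported as faulted:

```python
from collections import defaultdict

# pid -> ids of jobs currently running in that process
jobs_in_flight: dict[int, set[str]] = defaultdict(set)


def record_dispatch(pid: int, job_id: str) -> None:
    jobs_in_flight[pid].add(job_id)


def record_completion(pid: int, job_id: str) -> None:
    jobs_in_flight[pid].discard(job_id)


def cleanup_after_kill(pid: int, requeue) -> None:
    # Called after a worker process is killed; `requeue` is whatever the
    # caller uses to put a job back on the queue or report it as faulted.
    for job_id in jobs_in_flight.pop(pid, set()):
        requeue(job_id)
```
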
zten commented Dec 11, 2023

This still isn't quite right, because I've seen `BrokenPipeError` get thrown in other spots, such as when starting inference normally.
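
One way to contain that (illustrative only; `safe_send` is not a function from this PR) is to guard the parent's pipe writes so a dead or replaced child surfaces as a failed dispatch rather than an unhandled exception:

```python
from multiprocessing.connection import Connection


def safe_send(conn: Connection, message) -> bool:
    """Return False instead of raising if the child on the other end is gone."""
    try:
        conn.send(message)
        return True
    except (BrokenPipeError, OSError):
        # The worker process was killed or replaced; the caller should
        # treat the job as faulted and re-dispatch it.
        return False
```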

tazlin commented Dec 12, 2023

Let me know when you feel this branch is feature-stable, and I'll work on promoting it to main.

tazlin commented Dec 14, 2023

Given that we are still in a somewhat limited beta, and that I have seen these changes work locally for extended periods of time, I am going to merge this into main: it is generally more efficient and overall a net improvement. Further, it may address some problems that people are encountering fairly frequently.

@tazlin tazlin left a comment

The code is in the general spirit of this project's contribution requirements and furthers the goal of worker stability. I have been able to run this branch (rebased on the current changes on main) for an extended period of time. Considering we are in a beta phase, I feel this is stable enough to be on main.

@tazlin tazlin merged commit fcb6a50 into Haidra-Org:main Dec 14, 2023
1 check passed