OOM in GroupedHashAggregateStream::group_aggregate_batch()
#13831
Comments
This is a good find. Given your description, it sounds like the storage for the group values (`group by truncated_time, k8s_deployment_name, message`) is what is taking the memory. The aggregate operator does account for the memory stored in the groups here: However, I believe the memory accounting is only updated after processing an entire batch of values. So for example, if you are using a batch of 8000 rows and each row has an 8k value, that means at least 256 MB will be allocated (and since your query has 3 group columns it may be even higher). I can think of two possible solutions:
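As an illustration of the batch-granularity issue described above, here is a minimal, self-contained sketch. The `Reservation` type is a made-up stand-in for illustration only; it is not DataFusion's `MemoryReservation` API, and the numbers are taken from the scenario in this issue:

```rust
/// Stand-in for a memory reservation against a shared pool (illustrative only).
struct Reservation {
    limit: usize,
    reserved: usize,
}

impl Reservation {
    /// Try to account for `bytes` more memory, failing if it would exceed the limit.
    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        if self.reserved + bytes > self.limit {
            return Err(format!(
                "cannot reserve {} bytes: {} already reserved, limit is {}",
                bytes, self.reserved, self.limit
            ));
        }
        self.reserved += bytes;
        Ok(())
    }
}

fn main() {
    // A 32 MiB budget, and one 8000-row batch of ~8 KiB string values.
    let mut reservation = Reservation { limit: 32 * 1024 * 1024, reserved: 0 };
    let batch: Vec<Vec<u8>> = (0..8000).map(|_| vec![b'x'; 8192]).collect();

    // Group storage grows row by row while the batch is processed; the real
    // allocations happen here, before any accounting takes place.
    let mut group_values: Vec<Vec<u8>> = Vec::new();
    for value in &batch {
        group_values.push(value.clone());
    }

    // The reservation is only consulted once, after the whole batch, so the
    // failure (or an OS-level OOM kill) arrives after ~65 MB is already held.
    let used: usize = group_values.iter().map(|v| v.len()).sum();
    match reservation.try_grow(used) {
        Ok(()) => println!("accounted for {used} bytes"),
        Err(e) => println!("accounting failed only after allocation: {e}"),
    }
}
```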
I think part of the issue is that we are accounting for the memory used by the actual values but the underlying …
n/m, it looks like with …
I believe that won't change the allocation behavior of … I think we have to:
FYI it looks like …
Yes, that will work for the issue, but AFAIK it will lead to …
This might be another option: https://crates.io/crates/fallible_collections
I think the std …
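The comment above is truncated, but the standard library does offer fallible allocation directly on `Vec` via `try_reserve` (stable since Rust 1.57), which covers much of what `fallible_collections` provides. A minimal sketch of that approach, illustrative only and not the change proposed in this issue:

```rust
use std::collections::TryReserveError;

/// Append a value to group storage, failing gracefully (instead of aborting
/// the process) if the underlying buffer cannot grow. Sketch only; not
/// DataFusion code.
fn push_group_value(values: &mut Vec<u8>, value: &[u8]) -> Result<(), TryReserveError> {
    // try_reserve asks the allocator for room for at least `value.len()` more
    // bytes and returns an error instead of aborting on failure.
    values.try_reserve(value.len())?;
    // This extend cannot reallocate, because capacity was secured above.
    values.extend_from_slice(value);
    Ok(())
}

fn main() {
    let mut values = Vec::new();
    match push_group_value(&mut values, b"hello world") {
        Ok(()) => println!("stored {} bytes (capacity {})", values.len(), values.capacity()),
        Err(e) => eprintln!("allocation failed: {e}"),
    }
}
```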
I have confirmed the fix suggested in the issue works:
yields:
The "right" solution is probably the kind of thing that @Rachelint proposed in this PR: (not using a single large allocation, but managing the growth in chunks). However, that was a pretty serious amount of work. It might now be tractable to begin contemplating it again.
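A toy sketch of the chunked-growth idea described above, i.e. growing storage one fixed-size chunk at a time instead of doubling a single contiguous buffer. This is illustrative only and is not the code from the referenced PR (it does not even track value offsets):

```rust
/// A toy chunked byte store: instead of one contiguous Vec that doubles
/// (briefly holding both old and new buffers during reallocation), data is
/// kept in fixed-size chunks so growth only ever allocates one more chunk.
struct ChunkedBytes {
    chunks: Vec<Vec<u8>>,
    chunk_size: usize,
}

impl ChunkedBytes {
    fn new(chunk_size: usize) -> Self {
        Self { chunks: Vec::new(), chunk_size }
    }

    /// Bytes actually allocated (what a memory pool should be charged for).
    fn allocated(&self) -> usize {
        self.chunks.iter().map(|c| c.capacity()).sum()
    }

    fn push(&mut self, mut value: &[u8]) {
        while !value.is_empty() {
            // Start a new chunk when the last one is full (or none exists yet).
            let need_new = self
                .chunks
                .last()
                .map_or(true, |c| c.len() == self.chunk_size);
            if need_new {
                self.chunks.push(Vec::with_capacity(self.chunk_size));
            }
            let chunk = self.chunks.last_mut().unwrap();
            let room = self.chunk_size - chunk.len();
            let take = room.min(value.len());
            chunk.extend_from_slice(&value[..take]);
            value = &value[take..];
        }
    }
}

fn main() {
    let mut store = ChunkedBytes::new(16 * 1024);
    store.push(&vec![0u8; 8192]);
    store.push(&vec![0u8; 8192]);
    // Growth happens one 16 KiB chunk at a time; no doubling reallocation.
    println!("allocated: {} bytes in {} chunks", store.allocated(), store.chunks.len());
}
```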
I have some new ideas about this epic, but I still have no bandwidth to continue pushing it this month due to busy work on the employer's side...
Thank you @Rachelint -- I can't wait to see what you come up with. Good luck with your busy month!
Describe the bug
When attempting to accumulate large text fields with a `group by`, it was observed that `group_aggregate_batch()` can OOM despite ostensibly using the `MemoryPool`.

Query:
On 8x ~50 MB parquet files where the `message` column can contain strings of up to 8192 bytes. When profiled, this was by far the largest use of memory:

When logging, we can see it fails while interning
To Reproduce
2. Set `ulimit -v 1152000`
Expected behavior
`group_aggregate_batch()` doesn't make the assumption:

But instead realizes that adding 1 row to a million doesn't allocate 1,000,001, but rather 2,000,000, when the `Vec` exponentially resizes.

Additional context
Proposed solution:
Add …

Above …
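The proposed code itself is not preserved above, but the point from "Expected behavior" -- that pushing into a full `Vec` roughly doubles the allocation, so accounting by element count under-counts -- can be demonstrated with plain std types. A minimal, illustrative sketch (not the patch from this issue):

```rust
fn main() {
    // Build a Vec with exactly 1,000,000 elements.
    let mut v: Vec<u64> = Vec::with_capacity(1_000_000);
    v.extend(std::iter::repeat(0u64).take(1_000_000));
    println!("len = {}, capacity = {}", v.len(), v.capacity());

    // Pushing a single extra element forces a reallocation. The allocator is
    // not asked for 1,000,001 elements; Vec grows geometrically (the current
    // std implementation doubles), so roughly 2,000,000 elements are allocated
    // and, during the copy, the old 1,000,000-element buffer is still live.
    v.push(0);
    println!("after one push: len = {}, capacity = {}", v.len(), v.capacity());

    // Memory accounting therefore needs to charge for capacity, not length:
    let accounted = v.capacity() * std::mem::size_of::<u64>();
    println!("bytes actually owned: {accounted}");
}
```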