Implement maintains_input_order for AggregateExec #13897
I recommend some tests if possible to avoid breakages during future refactorings
If we add a unit test with an unnecessary |
Unless some downstream rules explicitly check the
It wouldn't remove it, since the current upstream rules (like EnforceSorting) use the EquivalenceProperties::ordering_satisfy() API, which consults the AggregateExec output ordering (or the related property cache of any operator), not the
    let projection_mapping =
        ProjectionMapping::try_new(&group_by.expr, &input.schema())?;
    ...
    let cache = Self::compute_properties(
        &input,
        Arc::clone(&schema),
        &projection_mapping,
        &mode,
        &input_order_mode,
    );

    pub fn compute_properties(
        input: &Arc<dyn ExecutionPlan>,
        schema: SchemaRef,
        projection_mapping: &ProjectionMapping,
        mode: &AggregateMode,
        input_order_mode: &InputOrderMode,
    ) -> PlanProperties {
        // Construct equivalence properties:
        let eq_properties = input
            .equivalence_properties()
            .project(projection_mapping, schema);
OK then, since the change is trivial, let's get this in and then start thinking of a "real-world" plan and rule that would break if this flag was somehow removed.
One problem is it's not obvious why it works, though it's only a one-line function 🤔
Maybe we can mark this method as 'experimental', or add a comment to that effect, if it's not tested?
AFAIK the flag is valid in all cases. What do you mean by topK aggregate?
Let's add a TODO to remind us to add a test exercising this.
@2010YOUY01 does this relieve your concerns?
I'm just worried we will miss some corner cases: whether AggregateExec maintains input order is implementation-dependent, so I think more tests are necessary.
datafusion/datafusion/physical-plan/src/aggregates/mod.rs, lines 588 to 594 in 30660e0
However, since it is not used by the internal optimizer and there is a TODO for testing, it looks good to me.
Could you detail how only the first group key's ordering is maintained while others are invalidated? If that's the case, the output ordering calculation for AggregateExec might also expose some bugs.
@2010YOUY01 I think a concrete example of when input ordering is not preserved even though input order mode is not linear would be very helpful.
Which issue does this PR close?
None
Rationale for this change
maintains_input_order helps with sort pushdown optimization. As explained in the InputOrderMode documentation, given an ordering [a, b] and a grouping [b] (Linear mode), [a, b] will not be satisfied anymore. However, Sorted and PartiallySorted modes maintain the input order in aggregation.
What changes are included in this PR?
maintains_input_order for AggregateExec
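As a rough illustration of the rationale, here is a minimal, self-contained sketch. It is not the code in this PR. The local `InputOrderMode` enum only mirrors the variant names of DataFusion's type, and `classify_mode` and the free function `maintains_input_order` are hypothetical helpers that assume plain column names and ignore equivalence classes and constants. Whether the flag holds in every PartiallySorted case is part of the discussion in the conversation, so treat the last function as the claim under review rather than a settled fact.

```rust
/// Local stand-in that mirrors the variant names of DataFusion's
/// `InputOrderMode`; it is not the real type.
#[derive(Debug, Clone, PartialEq)]
pub enum InputOrderMode {
    /// No prefix of the input ordering is covered by the GROUP BY columns.
    Linear,
    /// Some GROUP BY columns (given by index) follow a prefix of the ordering.
    PartiallySorted(Vec<usize>),
    /// All GROUP BY columns are covered by a prefix of the ordering.
    Sorted,
}

/// Hypothetical, simplified classifier (an assumption for illustration):
/// walk the input ordering from the left and collect the GROUP BY index of
/// each ordering column, stopping at the first column that is not grouped on.
pub fn classify_mode(input_ordering: &[&str], group_by: &[&str]) -> InputOrderMode {
    let mut ordered_indices = Vec::new();
    for col in input_ordering {
        match group_by.iter().position(|g| g == col) {
            Some(idx) => ordered_indices.push(idx),
            None => break,
        }
    }
    if ordered_indices.is_empty() {
        InputOrderMode::Linear
    } else if ordered_indices.len() == group_by.len() {
        InputOrderMode::Sorted
    } else {
        InputOrderMode::PartiallySorted(ordered_indices)
    }
}

/// Sketch of the proposed flag: one entry for the single child of the
/// aggregation, true unless the mode is `Linear`.
pub fn maintains_input_order(mode: &InputOrderMode) -> Vec<bool> {
    vec![!matches!(mode, InputOrderMode::Linear)]
}
```

With an input ordering of [a, b], grouping on [b] classifies as Linear in this sketch because the leading column a is not a group key, so the flag is false. Grouping on [a, c] gives PartiallySorted and grouping on [b, a] gives Sorted, and both report true.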
Are these changes tested?
No, but I'm open to suggestions. I checked BoundedWindowAggExec for inspiration, but it seems this behavior is tested in SQL logic tests. Sort pushdown is implemented for window operations, but not for aggregation. Therefore, I cannot use a similar test as of now, but it can be applied if/when sort pushdown is implemented for AggregateExec.
Are there any user-facing changes?
None