-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cherry pick Rows out in EXPLAIN ANALYZE #670
base: main
Are you sure you want to change the base?
Conversation
* Add "Rows out" print in cdbexplain_showExecStats set gp_enable_explain_rows_out=on; explain (analyze,verbose,format text) select * from tt where a > b; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..431.00 rows=1 width=8) (actual time=2.534..223.679 rows=499500 loops=1) Output: a, b -> Seq Scan on public.tt (cost=0.00..431.00 rows=1 width=8) (actual time=0.145..127.350 rows=166742 loops=1) Output: a, b Filter: (tt.a > tt.b) Rows out: 166500.00 rows avg x 3 workers, 166742 rows max (seg0), 166340 rows min (seg2). ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hiiii, @robozmey welcome!🎊 Thanks for taking the effort to make our project better! 🙌 Keep making such awesome contributions!
@@ -277,6 +277,7 @@ int gp_hashagg_groups_per_bucket = 5; | |||
int gp_motion_slice_noop = 0; | |||
|
|||
/* Cloudberry Database Experimental Feature GUCs */ | |||
bool gp_enable_explain_rows_out = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better to send a param to explain command than using a guc to control if print out the "rows out"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rows out feature created for auto_explain. If we add "rows out" as parameter of explain, then we'll need change auto_explain source, and auto_explain wont be compatible with vanilla
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto_explain wont be compatible with vanilla
Hmm... cannot understand. Could you give more details? Is auto_explain aslo included in vanilla?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto_explain is included in vanilla https://github.com/greenplum-db/gpdb-archive/tree/main/contrib/auto_explain
Also GUC gp_enable_explain_rows_out is made by analogy with GUC gp_enable_explain_allstat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a Parallel Plan test(ex: cbdb_parallel.sql) to ensure the output is as expected?
src/backend/commands/explain_gp.c
Outdated
ntuples_imin); | ||
} | ||
else { | ||
// ExplainOpenGroup("Rows Out", NULL, false, es); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove unused codes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
d6aa761
I'm not sure what we need Parallel Plan test, but I expanded |
In GPDB, there is one process executed the Slice on each segment. And explain use the statistic of that gang of QEs to compute the results. See CBDB style README https://github.com/apache/cloudberry/blob/11333c0b4d3a4962b0a6610ceb5b6d7a12e45ec4/src/backend/optimizer/README.cbdb.parallel for more details. This is a significant difference between CBDB and GPDB. In parallel plan cases, what do the statistic results should be? Do current codes compute results right or it's already as expected? |
Yes, agree! Looks we don't a lot things for the parallel workers. Cloudberry uses motion to gather the slice result both from workers on different segments and from the parallel worker on same segment . So the EXPLAIN ANALYZE command can still work properly to collect results from all workers. But we should confirm the EXPLAIN ANALYZE output format for the plan contains parallel workers. what should it be like?
Do we need to add the parallel worker info into this line? like
|
@fanfuxiaoran Thanks for your analysis and verification.
Sounds good, with a worker info is more reasonable of parallel plan. |
Cherry pick yezzey-gp/ygp@1d41230
Change logs
Add "Rows out" print in cdbexplain_showExecStats
Why are the changes needed?
"Rows out" is useful for auto explain
Does this PR introduce any user-facing change?
If yes, please clarify the previous behavior and the change this PR proposes.
How was this patch tested?
This feature tests in regression
gp_explain
testContributor's Checklist
Here are some reminders and checklists before/when submitting your pull request, please check them:
make installcheck
make -C src/test installcheck-cbdb-parallel
cloudberrydb/dev
team for review and approval when your PR is ready🥳