planner: Prune topN schema for avoiding no-need column exchange #58500

Open
wants to merge 11 commits into
base: master

EricZequan (Contributor) commented Dec 24, 2024

What problem does this PR solve?

Issue Number: ref #54245

Problem Summary:

What is changed and how it works?

Prune TopN's schema to avoid transmitting unneeded columns, by implementing inline projection for TopN.

The implementation is very similar to #19900

To learn more about inline projection, see: #14428


First, we changed the TopN structure to use logicalSchemaProducer so that TopN can carry its own schema information. In pkg/planner/core/task.go, (PhysicalTopN) getPushedDownTopN prunes the TopN schema down to the columns required by the parent plan, so that unnecessary columns are not passed along.
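
The pruning step described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the `Column` type and the `PruneSchema` function are hypothetical stand-ins, and the real planner works with its own schema and column types.

```go
package main

import "fmt"

// Column is a hypothetical stand-in for a planner column.
// It is NOT the type used by the TiDB planner.
type Column struct {
	UniqueID int64
	Name     string
}

// PruneSchema (hypothetical helper) keeps only the columns of schema whose
// UniqueID appears in required, preserving the original column order.
func PruneSchema(schema []Column, required map[int64]struct{}) []Column {
	pruned := make([]Column, 0, len(required))
	for _, col := range schema {
		if _, ok := required[col.UniqueID]; ok {
			pruned = append(pruned, col)
		}
	}
	return pruned
}

func main() {
	// Schema produced below TopN: id, vec, and the computed distance.
	child := []Column{{1, "id"}, {2, "vec"}, {3, "dis"}}
	// Assume the parent plan only needs id.
	required := map[int64]struct{}{1: {}}
	fmt.Println(PruneSchema(child, required)) // prints [{1 id}]
}
```

With the schema pruned this way, the wide `vec` column in the sketch no longer appears in TopN's output, which is the effect the description is after.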

Then, in the executor, we added columnIdxsUsedByChild to TopN to mark the columns that actually need to be output. In the Next method, only the columns marked this way are written to req.
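
As a rough illustration of the executor-side idea, the sketch below sorts rows by one column, keeps the first n, and copies only the marked column indexes into the output. It is a simplified model under stated assumptions: rows are plain `[]float64` slices, and `TopNProject` and `usedIdxs` are hypothetical names that only mirror the role of columnIdxsUsedByChild; the actual executor operates on chunks.

```go
package main

import (
	"fmt"
	"sort"
)

// TopNProject (hypothetical) sorts rows ascending by column byCol, keeps the
// first n rows, and copies only the columns listed in usedIdxs to the output.
// It assumes byCol and every index in usedIdxs are valid for every row.
func TopNProject(rows [][]float64, byCol, n int, usedIdxs []int) [][]float64 {
	// Sort a copy of the slice header so the caller's row order is untouched.
	sorted := make([][]float64, len(rows))
	copy(sorted, rows)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i][byCol] < sorted[j][byCol]
	})
	if n < 0 {
		n = 0
	}
	if n > len(sorted) {
		n = len(sorted)
	}
	out := make([][]float64, 0, n)
	for _, row := range sorted[:n] {
		// Inline projection: emit only the columns the parent needs.
		projected := make([]float64, 0, len(usedIdxs))
		for _, idx := range usedIdxs {
			projected = append(projected, row[idx])
		}
		out = append(out, projected)
	}
	return out
}

func main() {
	// Column 0 is id, column 1 is a precomputed distance.
	rows := [][]float64{{1, 0.9}, {2, 0.1}, {3, 0.5}}
	// Order by distance, keep 2 rows, output only id.
	fmt.Println(TopNProject(rows, 1, 2, []int{0})) // prints [[2] [3]]
}
```

The sort column is still read while ordering, but it is dropped from the output because its index is not in `usedIdxs`.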

VectorDBBench test results:
QPS improvement: roughly 14% on 1M-768D and roughly 29% on 500K-1536D

before:
1M-768D---"results": [{"metrics": {"max_load_count": 0, "load_duration": 0.0, "qps": 72.7114, "serial_latency_p99": 0.1342, "recall": 0.9012}
500K-1536D---"results": [{"metrics": {"max_load_count": 0, "load_duration": 0.0, "qps": 61.2001, "serial_latency_p99": 0.1865, "recall": 0.9335}

after:
1M-768D---"results": [{"metrics": {"max_load_count": 0, "load_duration": 0.0, "qps": 82.7748, "serial_latency_p99": 0.1297, "recall": 0.8865}
500K-1536D---"results": [{"metrics": {"max_load_count": 0, "load_duration": 0.0, "qps": 79.219, "serial_latency_p99": 0.1503, "recall": 0.9171}
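
The summary percentages can be checked against the raw qps values above with a short calculation. The helper name `ImprovementPct` is made up for this sketch; the inputs are the qps figures from the before and after results.

```go
package main

import "fmt"

// ImprovementPct returns the relative change from before to after, in percent.
func ImprovementPct(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// qps values copied from the before/after results above.
	fmt.Printf("1M-768D:    %.1f%%\n", ImprovementPct(72.7114, 82.7748)) // 13.8%
	fmt.Printf("500K-1536D: %.1f%%\n", ImprovementPct(61.2001, 79.219))  // 29.4%
}
```

The same results also report recall, which differs between the two runs (0.9012 to 0.8865 and 0.9335 to 0.9171).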

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Signed-off-by: “EricZequan” <[email protected]>

ti-chi-bot bot commented Dec 24, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign rustin170506 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Dec 24, 2024

tiprow bot commented Dec 24, 2024

Hi @EricZequan. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Signed-off-by: “EricZequan” <[email protected]>
Signed-off-by: “EricZequan” <[email protected]>
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 24, 2024
Signed-off-by: “EricZequan” <[email protected]>
Comment on lines 920 to 925
// └─Byitem: vec_distance(vec, '[1,2,3]')
// └─Schema: id, vec
//
// New: DataSource(id, vec) -> Projection(id, vec->dis) -> TopN(by dis) -> Projection(id)
// └─Byitem: dis
// └─Schema: id, dis
//
// Note that for plan now, TopN has its own schema and does not use the schema of children.
Contributor Author


@breezewish PTAL~

@@ -413,3 +411,36 @@ func TestVectorSearchWithPKForceTiKV(t *testing.T) {
require.Equal(t, output[i].Warn, testdata.ConvertSQLWarnToStrings(tk.Session().GetSessionVars().StmtCtx.GetWarnings()))
}
}

func TestVectorSearchHeavyFunction(t *testing.T) {
Contributor Author


@breezewish PTAL~

Signed-off-by: “EricZequan” <[email protected]>
@EricZequan
Contributor Author

/retest


tiprow bot commented Dec 24, 2024

@EricZequan: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest


Signed-off-by: “EricZequan” <[email protected]>
@breezewish
Member

/ok-to-test

@ti-chi-bot ti-chi-bot bot added the ok-to-test Indicates a PR is ready to be tested. label Dec 24, 2024
Signed-off-by: “EricZequan” <[email protected]>
Signed-off-by: “EricZequan” <[email protected]>

codecov bot commented Dec 25, 2024

Codecov Report

Attention: Patch coverage is 57.00935% with 46 lines in your changes missing coverage. Please review.

Project coverage is 73.5905%. Comparing base (f2db9c4) to head (031b875).
Report is 19 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #58500        +/-   ##
================================================
+ Coverage   73.5476%   73.5905%   +0.0429%     
================================================
  Files          1681       1681                
  Lines        464295     467670      +3375     
================================================
+ Hits         341478     344161      +2683     
- Misses       102000     102661       +661     
- Partials      20817      20848        +31     
Flag | Coverage Δ
integration | 42.8625% <57.0093%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Components | Coverage Δ
dumpling | 52.6910% <ø> (ø)
parser | ∅ <ø> (∅)
br | 45.4612% <ø> (-0.3174%) ⬇️

Signed-off-by: “EricZequan” <[email protected]>
@EricZequan
Contributor Author

/test unit-test


tiprow bot commented Dec 25, 2024

@EricZequan: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test fast_test_tiprow
  • /test fast_test_tiprow_ddlargsv1
  • /test tidb_parser_test

Use /test all to run the following jobs that were automatically triggered:

  • fast_test_tiprow
  • tidb_parser_test

In response to this:

/test unit-test


Signed-off-by: “EricZequan” <[email protected]>
Signed-off-by: “EricZequan” <[email protected]>

tiprow bot commented Dec 25, 2024

@EricZequan: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
fast_test_tiprow | 031b875 | link | true | /test fast_test_tiprow

Full PR test history. Your PR dashboard.



ti-chi-bot bot commented Dec 25, 2024

@EricZequan: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
idc-jenkins-ci-tidb/mysql-test | 031b875 | link | true | /test mysql-test
idc-jenkins-ci-tidb/unit-test | 031b875 | link | true | /test unit-test

Full PR test history. Your PR dashboard.


Labels
ok-to-test Indicates a PR is ready to be tested. release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

2 participants