Fix CAgg migration performance regression #7517

fabriziomello · 2024-12-04T18:43:11Z

When creating the migration plan we need to figure out the boundaries (min and max) of the primary dimension to split the COPY data into multiple transactions to don't exauste the instance resources.

The problem is we're reading the boundaries direct from the CAgg view and when the realtime is enabled it lead to big performance penalty (this was one of the reasons for the new format).

Also in the COPY DATA migration plan step we changed the batch size used that previously was based on the materialization hypertable partition range and now will be the Continuous Aggregate bucked size multiplied by 10 (ten).

Fixed it by querying direct the materialization hypertable instead.

Also in ef2cfe3 we introduced a problem that now we're not explicitly updating the watermark but instead warning the user to manually execute the refresh procedure to update it, but without update the watermark the query performance for realtime Continuous Aggregate can be awful so we introduced a small procedure to update the watermark for the new migrated cagg during the migration.

Disable-check: force-changelog-file

codecov · 2024-12-04T18:54:37Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.22%. Comparing base (59f50f2) to head (04da963).
Report is 666 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #7517      +/-   ##
==========================================
+ Coverage   80.06%   82.22%   +2.15%     
==========================================
  Files         190      231      +41     
  Lines       37181    43373    +6192     
  Branches     9450    10909    +1459     
==========================================
+ Hits        29770    35663    +5893     
- Misses       2997     3383     +386     
+ Partials     4414     4327      -87

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

mkindahl · 2024-12-13T09:53:03Z

There is no reason this change should not go in the release notes, so we should add a changelog entry.

In addition, this is a non-trivial change, so we should probably stick to two reviewers.

mkindahl

LGTM

When creating the migration plan we need to figure out the boundaries (min and max) of the primary dimension to split the COPY data into multiple transactions to don't exauste the instance resources. The problem is we're reading the boundaries direct from the CAgg view and when the realtime is enabled it lead to big performance penalty (this was one of the reasons for the new format). Also in the `COPY DATA` migration plan step we changed the batch size used that previously was based on the materialization hypertable partition range and now will be the Continuous Aggregate bucked size multiplied by 10 (ten). Fixed it by querying direct the materialization hypertable instead. Also in ef2cfe3 we introduced a problem that now we're not explicitly updating the watermark but instead warning the user to manually execute the refresh procedure to update it, but without update the watermark the query performance for realtime Continuous Aggregate can be awful so we introduced a small procedure to update the watermark for the new migrated cagg during the migration.

fabriziomello added performance continuous_aggregate migration force-auto-backport Automatically backport this PR or fix of this issue, even if it's not marked as "bug" labels Dec 4, 2024

fabriziomello self-assigned this Dec 4, 2024

fabriziomello force-pushed the cagg_migrate_fix_performance_regression branch 12 times, most recently from 7035bac to ffedd0a Compare December 11, 2024 14:02

mkindahl approved these changes Dec 13, 2024

View reviewed changes

fabriziomello force-pushed the cagg_migrate_fix_performance_regression branch 2 times, most recently from 7ab9690 to 6da1365 Compare December 17, 2024 13:23

fabriziomello force-pushed the cagg_migrate_fix_performance_regression branch from 6da1365 to 04da963 Compare December 23, 2024 12:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix CAgg migration performance regression #7517

Fix CAgg migration performance regression #7517

fabriziomello commented Dec 4, 2024 •

edited

Loading

codecov bot commented Dec 4, 2024 •

edited

Loading

mkindahl commented Dec 13, 2024

mkindahl left a comment

Fix CAgg migration performance regression #7517

Are you sure you want to change the base?

Fix CAgg migration performance regression #7517

Conversation

fabriziomello commented Dec 4, 2024 • edited Loading

codecov bot commented Dec 4, 2024 • edited Loading

Codecov Report

mkindahl commented Dec 13, 2024

mkindahl left a comment

Choose a reason for hiding this comment

fabriziomello commented Dec 4, 2024 •

edited

Loading

codecov bot commented Dec 4, 2024 •

edited

Loading