Restore improvements #3986
Draft
Michal-Leszczynski wants to merge 15 commits into master from ml/restore
Conversation
Michal-Leszczynski force-pushed the ml/restore branch 5 times, most recently from ec2d3a1 to c14c3dd on August 28, 2024 10:49
This decreases the main function's complexity. Noticed by gocognit: cognitive complexity 66 of func `(*tablesWorker).restore` is high (> 50).
…abled and enabled
In preparation for restoring data, SM should disable tombstone_gc and compaction. They should be re-enabled after the restore finishes.
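For illustration, a minimal Go/CQL sketch of what toggling these settings per table could look like (the function name and the restored defaults are assumptions; real code would save and restore each table's original settings from the schema):

```go
package restore

import (
	"fmt"

	"github.com/gocql/gocql"
)

// setRestoreMode disables tombstone_gc and auto-compaction on a table
// before restore (restoring=true) and re-enables them afterwards.
// ASSUMPTION: re-enabling falls back to the default 'timeout' tombstone_gc
// mode and STCS; SM would have to restore the table's original settings.
func setRestoreMode(s *gocql.Session, keyspace, table string, restoring bool) error {
	gcMode, compactionOn := "timeout", "true"
	if restoring {
		gcMode, compactionOn = "disabled", "false"
	}
	stmts := []string{
		fmt.Sprintf("ALTER TABLE %q.%q WITH tombstone_gc = {'mode': '%s'}", keyspace, table, gcMode),
		fmt.Sprintf("ALTER TABLE %q.%q WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'enabled': %s}", keyspace, table, compactionOn),
	}
	for _, stmt := range stmts {
		if err := s.Query(stmt).Exec(); err != nil {
			return fmt.Errorf("%s: %w", stmt, err)
		}
	}
	return nil
}
```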
The idea is to first index all files to be restored, so that we can create better batches. The restore workload is aggregated first by table, then by remote sstable dir (see the sketch below). From batching we expect:
- a batch contains X*shard_cnt sstables
- a batch contains similarly sized sstables
- a batch can be created from any manifest/table
- no waiting for a manifest/table restore to finish
- workload is evenly distributed across different nodes
- --batch-size is ignored; batches aim to have a size equal to 5% of the expected node workload

New flag --table-parallel allows running multiple download and l&s jobs from the same node (no documentation, as it might not be exposed later on).
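As a picture of that aggregation, hypothetical Go shapes for the indexed workload (illustrative names, not SM's actual types):

```go
package restore

// SSTable is one remote sstable to restore.
type SSTable struct {
	ID    string
	Size  int64    // total size of the sstable's files in bytes
	Files []string // remote paths of the files making up the sstable
}

// SSTableDir groups sstables by the remote directory they live in.
type SSTableDir struct {
	RemotePath string
	SSTables   []SSTable
}

// TableWorkload groups sstable dirs by the destination table.
type TableWorkload struct {
	Keyspace, Table string
	Size            int64 // sum of all sstable sizes for this table
	Dirs            []SSTableDir
}
```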
… and --table-parallel flags
Michal-Leszczynski force-pushed the ml/restore branch from 01cb187 to 98da79f on September 2, 2024 12:37
Based on #3956.
This PR introduces 3 big changes:
Indexing
The restore task no longer iterates over manifests, restoring all the files from a given manifest before proceeding to the next one.
The problem with this approach was that it limited the pool of sstables used for creating batches, and it resulted in nodes spending more time idle between restored manifests.
Now, SM indexes all of the files from all manifests at the beginning, and then creates batches from all of them, without needing to finish one manifest before starting another.
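A sketch of that indexing step, reusing the illustrative types above (assumed names, not SM's actual API):

```go
package restore

import "sort"

// Manifest is a stand-in for one backup manifest's content.
type Manifest struct {
	Tables []TableWorkload
}

// indexWorkload flattens every manifest into a single pool of sstables,
// sorted by size descending, so batching can draw from the whole backup
// instead of draining one manifest at a time.
func indexWorkload(manifests []Manifest) []SSTable {
	var pool []SSTable
	for _, m := range manifests {
		for _, t := range m.Tables {
			for _, d := range t.Dirs {
				pool = append(pool, d.SSTables...)
			}
		}
	}
	sort.Slice(pool, func(i, j int) bool { return pool[i].Size > pool[j].Size })
	return pool
}
```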
Batching
The restore task no longer batches files based on the --batch-size flag.
The problem with this approach was that it didn't take shard_cnt or sstable size into consideration. It was also difficult to guess the right value for this flag without knowing the backup statistics.
Now, SM creates batches containing a multiple of the node's shard_cnt sstables. It keeps adding the biggest (not yet restored) sstables to the batch until it reaches a size of 5% of the expected node workload (total_workload/total_shard_cnt_in_the_cluster*node_shard_cnt).
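As a worked example: restoring a 1 TiB backup into a 3-node cluster with 8 shards per node (24 shards total) gives an expected node workload of 1 TiB / 24 * 8 ≈ 341 GiB, so each batch aims for roughly 17 GiB (5%) while holding a multiple of 8 sstables. A greedy sketch of that loop, reusing the illustrative types above (not SM's actual code):

```go
// nextBatch keeps taking the biggest remaining sstables until the batch
// reaches the target size and holds a multiple of shardCnt sstables.
// pool must be sorted by size descending (as produced by indexWorkload).
func nextBatch(pool []SSTable, shardCnt int, target int64) (batch, rest []SSTable) {
	var size int64
	for len(pool) > 0 && (size < target || len(batch)%shardCnt != 0) {
		batch = append(batch, pool[0])
		size += pool[0].Size
		pool = pool[1:]
	}
	return batch, pool
}
```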
Additional testing flags
These flags are there just for testing purposes. Some of them will be removed and some might stay, depending on how useful we find them:
--table-parallel - how many tables should be restored from a single node at a single time (might be useful for many smaller tables)
--stream-to-all-replicas - runs l&s without the primary_replica_only option and skips the post-restore repair (might be useful in a cluster with a small RF and a big number of replica sets)
--unpin-agent-cpu - unpins the agent from its CPUs for the duration of the restore (it looks like it increases download speed to some extent; see the sketch below)
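For context on --unpin-agent-cpu, a minimal Linux-only sketch of what "unpinning" means, i.e. widening the process affinity mask back to all CPUs via golang.org/x/sys/unix (an illustration of the concept, not the agent's actual code):

```go
package agent

import (
	"runtime"

	"golang.org/x/sys/unix"
)

// unpin lets pid run on every CPU; pid 0 means the calling process.
// Note: runtime.NumCPU reflects the affinity mask at process start, so a
// pinned process would need the machine's real CPU count from elsewhere.
func unpin(pid int) error {
	var set unix.CPUSet
	set.Zero()
	for cpu := 0; cpu < runtime.NumCPU(); cpu++ {
		set.Set(cpu)
	}
	return unix.SchedSetaffinity(pid, &set)
}
```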