pageserver: reorder upload queue when possible #10218

Draft
wants to merge 3 commits into
base: main
from

erikgrinaker
Contributor

@erikgrinaker erikgrinaker commented Dec 20, 2024

Problem

The upload queue currently sees significant head-of-line blocking. For example, index uploads act as upload barriers, and since every layer flush schedules both a layer upload and an index upload, layer uploads are effectively serialized.

Requires #10227.
Requires #10228.
Resolves #10096.

Summary of changes

Allow upload queue operations to bypass the queue if they don't conflict with preceding operations.
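
As a rough illustration of the bypass idea (a sketch under assumptions, not the code in this PR): a scheduler could scan the queue and pick the first operation that is allowed to bypass every operation ahead of it, including those still in progress. The function name `next_ready` and its generic `can_bypass` parameter are hypothetical and exist only for this sketch.

```rust
use std::collections::VecDeque;

/// Hypothetical helper: returns the position of the first queued operation
/// that may run now, i.e. one that can bypass every in-progress operation
/// and every operation queued ahead of it.
///
/// `can_bypass(op, earlier)` is an assumed predicate that returns true when
/// `op` does not conflict with `earlier`.
pub fn next_ready<T>(
    queued: &VecDeque<T>,
    inprogress: &[T],
    can_bypass: impl Fn(&T, &T) -> bool,
) -> Option<usize> {
    (0..queued.len()).find(|&i| {
        let op = &queued[i];
        // Must not conflict with anything already running...
        inprogress.iter().all(|earlier| can_bypass(op, earlier))
            // ...nor with anything still waiting in front of it.
            && queued.iter().take(i).all(|earlier| can_bypass(op, earlier))
    })
}
```

With a predicate that always returns false, every operation acts as a barrier and the queue degenerates to strictly serial execution. A more permissive predicate lets independent operations, such as uploads of unrelated layers, overtake a blocked index upload.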

upload_queue.num_inprogress_deletions == upload_queue.inprogress_tasks.len()
}
/// TODO: consider moving this and other associated logic into UploadOp and UploadQueue.
fn can_bypass(a: &UploadOp, b: &UploadOp) -> bool {
Collaborator

Let's unit test this against the most important invariants:

  • A layer upload must happen before an index that references it
  • A layer deletion must happen after an index that de-references it
  • If a layer name is reused, the second upload must come after an index that de-references the earlier layer of the same name
  • ...whichever others we can think of
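
To make the invariants above concrete, here is a minimal sketch of what such unit tests could look like. It assumes a simplified, hypothetical `Op` enum in place of the real `UploadOp`, where an index upload records which layer names it starts (`adds`) or stops (`removes`) referencing. The conflict rules are deliberately conservative and are not necessarily the ones implemented in this PR.

```rust
/// Hypothetical, simplified stand-in for the pageserver's `UploadOp`.
#[derive(Debug, Clone, PartialEq)]
pub enum Op {
    /// Upload the layer file with this name.
    UploadLayer(String),
    /// Upload an index; `adds`/`removes` are the layer names it starts or
    /// stops referencing relative to the previous index (an assumption of
    /// this sketch).
    UploadIndex { adds: Vec<String>, removes: Vec<String> },
    /// Delete the named layer files.
    Delete(Vec<String>),
}

/// Returns true if `later` may run ahead of, or concurrently with, `earlier`.
pub fn can_bypass(later: &Op, earlier: &Op) -> bool {
    use Op::*;
    match (later, earlier) {
        // Uploads of distinct layers are independent.
        (UploadLayer(a), UploadLayer(b)) => a != b,
        // A layer upload and an index stay ordered if the index touches the layer.
        (UploadLayer(name), UploadIndex { adds, removes })
        | (UploadIndex { adds, removes }, UploadLayer(name)) => {
            !adds.contains(name) && !removes.contains(name)
        }
        // Index uploads are always ordered among themselves (conservative).
        (UploadIndex { .. }, UploadIndex { .. }) => false,
        // A deletion and an index stay ordered if the index touches a deleted layer.
        (Delete(names), UploadIndex { adds, removes })
        | (UploadIndex { adds, removes }, Delete(names)) => {
            names.iter().all(|n| !adds.contains(n) && !removes.contains(n))
        }
        // A deletion and an upload of the same layer name stay ordered.
        (Delete(names), UploadLayer(name)) | (UploadLayer(name), Delete(names)) => {
            !names.contains(name)
        }
        // Deletions do not conflict with each other.
        (Delete(_), Delete(_)) => true,
    }
}

#[test]
fn layer_upload_precedes_referencing_index() {
    let upload = Op::UploadLayer("a".to_string());
    let index = Op::UploadIndex { adds: vec!["a".to_string()], removes: vec![] };
    assert!(!can_bypass(&index, &upload));
}

#[test]
fn deletion_follows_dereferencing_index() {
    let index = Op::UploadIndex { adds: vec![], removes: vec!["a".to_string()] };
    let delete = Op::Delete(vec!["a".to_string()]);
    assert!(!can_bypass(&delete, &index));
}

#[test]
fn reused_layer_name_waits_for_dereferencing_index() {
    let index = Op::UploadIndex { adds: vec![], removes: vec!["a".to_string()] };
    let reupload = Op::UploadLayer("a".to_string());
    assert!(!can_bypass(&reupload, &index));
}
```

The rules in this sketch are symmetric, so the same match arm covers both orderings of a pair. An upload of an unrelated layer `b` can still bypass an index that only touches `a`, which is the case the PR aims to unblock.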

github-actions bot commented Dec 20, 2024

7095 tests run: 6793 passed, 5 failed, 297 skipped (full report)


Failures on Postgres 17

# Run all failed tests locally:
scripts/pytest -vv -n $(nproc) -k "test_sharding_split_compaction[debug-pg17-compact-shard-ancestors-persistent] or test_sharding_split_compaction[debug-pg17-None] or test_timeline_archival_chaos[release-pg17] or test_timeline_archival_chaos[release-pg17] or test_timeline_archival_chaos[release-pg17]"
Flaky tests (5)

Postgres 17

Postgres 14

  • test_physical_replication_config_mismatch_too_many_known_xids: release-arm64

Test coverage report is not available

The comment gets automatically updated with the latest test results
7c3ae0e at 2024-12-23T13:08:56.921Z :recycle:

@skyzh skyzh self-requested a review December 20, 2024 22:56
Development

Successfully merging this pull request may close these issues.

pageserver: improve flush upload queue parallelism
2 participants