
Slight AoU docs adjustment [VS-1366] #8955

Merged · 4 commits · Aug 19, 2024 · Changes from 3 commits
4 changes: 1 addition & 3 deletions in scripts/variantstore/docs/aou/AOU_DELIVERABLES.md

@@ -128,9 +128,7 @@ The pipeline takes in the VDS and outputs a variant annotations table in BigQuery

Unchanged context:
- For both PGEN and VCF extracts of ACAF only:
  - Specify an `extract_overhead_memory_override_gib` of 5 (GiB, up from the default of 3 GiB).
  - Specify a `y_bed_weight_scaling` of 8 (up from the default of 4).

Removed:
- If re-running the extract workflow with call caching enabled, it may be necessary to increase memory in the `ExtractTask` / `PgenExtractTask` tasks. Due to the way call caching works in Cromwell (i.e. the `memory` attribute is not part of the call caching hashes), it is possible to edit the value of the `memory` runtime attribute of a task without breaking call caching. However, do *not* alter the value of the `memory_gib` input parameter as changing that absolutely will break call caching and will cause tens of thousands of shards to re-run needlessly!
  - For Echo ACAF VCF extract, the VCF extract workflow was call-caching re-run with `ExtractTask` memory hard-coded to 100 GiB. 9/9 extract shards which did not complete on the initial run of the workflow succeeded on their first (non-preempted) attempt in the second run of the workflow.
  - For Echo ACAF PGEN extract, the PGEN extract workflow was call-caching re-run with `PgenExtractTask` memory hard-coded to 50 GiB. 20/24 extract shards which did not complete on the initial run of the workflow succeeded on their first (non-preempted) attempt in the second run of the workflow. The remaining 4 shards hit `OutOfMemoryErrors` on their first attempt but succeeded on the second attempt with 50 GiB * 1.5 = 75 GiB of memory thanks to "retry with more memory".

Added:
- When re-running the extract workflow with call caching enabled, it will be necessary to increase memory in the `ExtractTask` / `PgenExtractTask` tasks. Due to the way call caching works in Cromwell (i.e. the `memory` runtime attribute is not part of the call caching hashes), it is possible to edit the value of the `memory` runtime attribute of a task without breaking call caching. However, do *not* alter the value of the `memory_gib` input parameter as changing that absolutely will break call caching and will cause tens of thousands of shards to re-run needlessly! Both VCF and PGEN extracts can have their memory set to `"50 GiB"` for the call-caching re-run. Most extract shards should finish on the first re-run attempt, but a few stragglers will likely OOM and automatically re-run with more memory.
Collaborator commented:
I'm confused. Are you suggesting to edit the WDL to change the value of the memory runtime attribute? Wouldn't that break caching? I must be missing something.

mcovarr marked this conversation as resolved.
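The resolution, for future readers: Cromwell's call cache hash covers a task's inputs (among other things), but not the `memory` runtime attribute. Hard-coding a larger value directly into the task's `runtime` block therefore leaves the cache intact, while changing the `memory_gib` input changes the hash for every shard. A minimal WDL sketch of the distinction follows; the task name, inputs, and docker image are illustrative stand-ins, not the actual `ExtractTask` definition.

```wdl
version 1.0

task ExtractTaskSketch {
    input {
        File interval_list
        # Task inputs are part of Cromwell's call cache hash: changing this
        # value (or its override in the inputs JSON) changes the hash and
        # forces every shard to re-run.
        Int memory_gib = 12
    }

    command <<<
        echo "extracting over ~{interval_list}"
    >>>

    runtime {
        docker: "ubuntu:22.04"
        # Runtime attributes like `memory` are NOT part of the call cache
        # hash, so replacing an expression such as "~{memory_gib} GiB" with
        # a hard-coded "50 GiB" raises memory for the shards that still need
        # to run without invalidating cache hits for completed shards.
        memory: "50 GiB"
    }

    output {
        String status = "done"
    }
}
```

Presumably the real tasks derive the runtime value from `memory_gib` (something like `memory: "~{memory_gib} GiB"`), which is why the safe re-run trick is to overwrite that expression in the `runtime` block rather than to bump the input.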
Unchanged context (continued):
- If you want to collect the monitoring logs from a large number of `Extract` shards, the `summarize_task_monitor_logs.py` script will not work if the task is scattered too wide. Use the `summarize_task_monitor_logs_from_file.py` script instead, which takes a FOFN of GCS paths rather than a space-separated series of localized files.
- These workflows do not use the Terra Data Entity Model to run, so be sure to select the `Run workflow with inputs defined by file paths` workflow submission option.
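For reference, the two input overrides called out in the unchanged context at the top of this hunk would be supplied through the workflow's inputs JSON when using the `Run workflow with inputs defined by file paths` submission option. This is a hedged sketch: `ExtractWorkflow` is a stand-in for the actual workflow name, and the fully qualified input names are assumed rather than quoted from the WDL.

```json
{
  "ExtractWorkflow.extract_overhead_memory_override_gib": 5,
  "ExtractWorkflow.y_bed_weight_scaling": 8
}
```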
