From 3f194569375ec71d64f782ea3cc4ed441b8c2380 Mon Sep 17 00:00:00 2001
From: Miguel Covarrubias
Date: Wed, 14 Aug 2024 16:53:52 -0400
Subject: [PATCH 1/4] Slight docs adjustment [VS-1366]

---
 scripts/variantstore/docs/aou/AOU_DELIVERABLES.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/scripts/variantstore/docs/aou/AOU_DELIVERABLES.md b/scripts/variantstore/docs/aou/AOU_DELIVERABLES.md
index c1ff184b282..d3be5e609a6 100644
--- a/scripts/variantstore/docs/aou/AOU_DELIVERABLES.md
+++ b/scripts/variantstore/docs/aou/AOU_DELIVERABLES.md
@@ -128,9 +128,7 @@ The pipeline takes in the VDS and outputs a variant annotations table in BigQuer
 - For both PGEN and VCF extracts of ACAF only:
   - Specify an `extract_overhead_memory_override_gib` of 5 (GiB, up from the default of 3 GiB).
   - Specify a `y_bed_weight_scaling` of 8 (up from the default of 4).
-  - If re-running the extract workflow with call caching enabled, it may be necessary to increase memory in the `ExtractTask` / `PgenExtractTask` tasks. Due to the way call caching works in Cromwell (i.e. the `memory` attribute is not part of the call caching hashes), it is possible to edit the value of the `memory` runtime attribute of a task without breaking call caching. However, do *not* alter the value of the `memory_gib` input parameter as changing that absolutely will break call caching and will cause tens of thousands of shards to re-run needlessly!
-    - For Echo ACAF VCF extract, the VCF extract workflow was call-caching re-run with `ExtractTask` memory hard-coded to 100 GiB. 9/9 extract shards which did not complete on the initial run of the workflow succeeded on their first (non-preempted) attempt in the second run of the workflow.
-    - For Echo ACAF PGEN extract, the PGEN extract workflow was call-caching re-run with `PgenExtractTask` memory hard-coded to 50 GiB. 20/24 extract shards which did not complete on the initial run of the workflow succeeded on their first (non-preempted) attempt in the second run of the workflow. The remaining 4 shards hit `OutOfMemoryErrors` on their first attempt but succeeded on the second attempt with 50 GiB * 1.5 = 75 GiB of memory thanks to "retry with more memory".
+  - When re-running the extract workflow with call caching enabled, it will be necessary to increase memory in the `ExtractTask` / `PgenExtractTask` tasks. Due to the way call caching works in Cromwell (i.e. the `memory` runtime attribute is not part of the call caching hashes), it is possible to edit the value of the `memory` runtime attribute of a task without breaking call caching. However, do *not* alter the value of the `memory_gib` input parameter as changing that absolutely will break call caching and will cause tens of thousands of shards to re-run needlessly! Both VCF and PGEN extracts can have their memory set to `"50 GiB"` for the call-caching re-run. Most extract shards should finish on the first re-run attempt, but a few stragglers will require more than one attempt.
 - If you want to collect the monitoring logs from a large number of `Extract` shards, the `summarize_task_monitor_logs.py` script will not work if the task is scattered too wide. Use the `summarize_task_monitor_logs_from_file.py` script, instead, which takes a FOFN of GCS paths instead of a space-separated series of localized files.
 - These workflows do not use the Terra Data Entity Model to run, so be sure to select the `Run workflow with inputs defined by file paths` workflow submission option.

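An aside for readers of this series: the "retry with more memory" behavior cited in the removed Echo PGEN bullet above (and implied by the "stragglers" wording the later patches settle on) is Cromwell's memory retry feature. Below is a sketch of how that feature is typically wired up; the config key and workflow option are Cromwell's documented knobs, but the error keys shown and the exact configuration used for the GVS runs are assumptions here, apart from the 1.5 multiplier implied by the 50 GiB * 1.5 = 75 GiB arithmetic in the removed bullet.

```hocon
# Cromwell server configuration (HOCON) sketch -- assumed values, not the
# actual GVS deployment config. These are substrings of a task's stderr that
# mark a failed attempt as out-of-memory, making it eligible for
# "retry with more memory":
system.memory-retry-error-keys = ["OutOfMemory", "Killed"]

# The multiplier itself is supplied per-workflow via the workflow options
# JSON, e.g. { "memory_retry_multiplier": 1.5 }, which is what would turn a
# 50 GiB first attempt into a 75 GiB second attempt as described above.
```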
From e34f1ad69e0675585f7c58a4d8a32b6d0c523f78 Mon Sep 17 00:00:00 2001
From: Miguel Covarrubias
Date: Wed, 14 Aug 2024 17:01:04 -0400
Subject: [PATCH 2/4] fix

---
 scripts/variantstore/docs/aou/AOU_DELIVERABLES.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/variantstore/docs/aou/AOU_DELIVERABLES.md b/scripts/variantstore/docs/aou/AOU_DELIVERABLES.md
index d3be5e609a6..77f6858301a 100644
--- a/scripts/variantstore/docs/aou/AOU_DELIVERABLES.md
+++ b/scripts/variantstore/docs/aou/AOU_DELIVERABLES.md
@@ -128,7 +128,7 @@ The pipeline takes in the VDS and outputs a variant annotations table in BigQuer
 - For both PGEN and VCF extracts of ACAF only:
   - Specify an `extract_overhead_memory_override_gib` of 5 (GiB, up from the default of 3 GiB).
   - Specify a `y_bed_weight_scaling` of 8 (up from the default of 4).
-  - When re-running the extract workflow with call caching enabled, it will be necessary to increase memory in the `ExtractTask` / `PgenExtractTask` tasks. Due to the way call caching works in Cromwell (i.e. the `memory` runtime attribute is not part of the call caching hashes), it is possible to edit the value of the `memory` runtime attribute of a task without breaking call caching. However, do *not* alter the value of the `memory_gib` input parameter as changing that absolutely will break call caching and will cause tens of thousands of shards to re-run needlessly! Both VCF and PGEN extracts can have their memory set to `"50 GiB"` for the call-caching re-run. Most extract shards should finish on the first re-run attempt, but a few stragglers will require more than one attempt.
+  - When re-running the extract workflow with call caching enabled, it will be necessary to increase memory in the `ExtractTask` / `PgenExtractTask` tasks. Due to the way call caching works in Cromwell (i.e. the `memory` runtime attribute is not part of the call caching hashes), it is possible to edit the value of the `memory` runtime attribute of a task without breaking call caching. However, do *not* alter the value of the `memory_gib` input parameter as changing that absolutely will break call caching and will cause tens of thousands of shards to re-run needlessly! Both VCF and PGEN extracts can have their memory set to `"50 GiB"` for the call-caching re-run. Most extract shards should finish on the first re-run attempt, but a few stragglers will need to automatically re-run with more memory.
 - If you want to collect the monitoring logs from a large number of `Extract` shards, the `summarize_task_monitor_logs.py` script will not work if the task is scattered too wide. Use the `summarize_task_monitor_logs_from_file.py` script, instead, which takes a FOFN of GCS paths instead of a space-separated series of localized files.
 - These workflows do not use the Terra Data Entity Model to run, so be sure to select the `Run workflow with inputs defined by file paths` workflow submission option.

From 3d84a1f7fcffbb56b38cd0910c2a312d0d3f795c Mon Sep 17 00:00:00 2001
From: Miguel Covarrubias
Date: Wed, 14 Aug 2024 17:01:44 -0400
Subject: [PATCH 3/4] fix fix

---
 scripts/variantstore/docs/aou/AOU_DELIVERABLES.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/variantstore/docs/aou/AOU_DELIVERABLES.md b/scripts/variantstore/docs/aou/AOU_DELIVERABLES.md
index 77f6858301a..85d04a1ae65 100644
--- a/scripts/variantstore/docs/aou/AOU_DELIVERABLES.md
+++ b/scripts/variantstore/docs/aou/AOU_DELIVERABLES.md
@@ -128,7 +128,7 @@ The pipeline takes in the VDS and outputs a variant annotations table in BigQuer
 - For both PGEN and VCF extracts of ACAF only:
   - Specify an `extract_overhead_memory_override_gib` of 5 (GiB, up from the default of 3 GiB).
   - Specify a `y_bed_weight_scaling` of 8 (up from the default of 4).
-  - When re-running the extract workflow with call caching enabled, it will be necessary to increase memory in the `ExtractTask` / `PgenExtractTask` tasks. Due to the way call caching works in Cromwell (i.e. the `memory` runtime attribute is not part of the call caching hashes), it is possible to edit the value of the `memory` runtime attribute of a task without breaking call caching. However, do *not* alter the value of the `memory_gib` input parameter as changing that absolutely will break call caching and will cause tens of thousands of shards to re-run needlessly! Both VCF and PGEN extracts can have their memory set to `"50 GiB"` for the call-caching re-run. Most extract shards should finish on the first re-run attempt, but a few stragglers will need to automatically re-run with more memory.
+  - When re-running the extract workflow with call caching enabled, it will be necessary to increase memory in the `ExtractTask` / `PgenExtractTask` tasks. Due to the way call caching works in Cromwell (i.e. the `memory` runtime attribute is not part of the call caching hashes), it is possible to edit the value of the `memory` runtime attribute of a task without breaking call caching. However, do *not* alter the value of the `memory_gib` input parameter as changing that absolutely will break call caching and will cause tens of thousands of shards to re-run needlessly! Both VCF and PGEN extracts can have their memory set to `"50 GiB"` for the call-caching re-run. Most extract shards should finish on the first re-run attempt, but a few stragglers will likely OOM and automatically re-run with more memory.
 - If you want to collect the monitoring logs from a large number of `Extract` shards, the `summarize_task_monitor_logs.py` script will not work if the task is scattered too wide. Use the `summarize_task_monitor_logs_from_file.py` script, instead, which takes a FOFN of GCS paths instead of a space-separated series of localized files.
 - These workflows do not use the Terra Data Entity Model to run, so be sure to select the `Run workflow with inputs defined by file paths` workflow submission option.

From 0b9c66a66665ffc6620588b67fe8f59e8341c3ce Mon Sep 17 00:00:00 2001
From: Miguel Covarrubias
Date: Mon, 19 Aug 2024 12:09:20 -0400
Subject: [PATCH 4/4] Update scripts/variantstore/docs/aou/AOU_DELIVERABLES.md

Co-authored-by: George Grant
---
 scripts/variantstore/docs/aou/AOU_DELIVERABLES.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/variantstore/docs/aou/AOU_DELIVERABLES.md b/scripts/variantstore/docs/aou/AOU_DELIVERABLES.md
index 85d04a1ae65..d69edaf4136 100644
--- a/scripts/variantstore/docs/aou/AOU_DELIVERABLES.md
+++ b/scripts/variantstore/docs/aou/AOU_DELIVERABLES.md
@@ -128,7 +128,7 @@ The pipeline takes in the VDS and outputs a variant annotations table in BigQuer
 - For both PGEN and VCF extracts of ACAF only:
   - Specify an `extract_overhead_memory_override_gib` of 5 (GiB, up from the default of 3 GiB).
   - Specify a `y_bed_weight_scaling` of 8 (up from the default of 4).
-  - When re-running the extract workflow with call caching enabled, it will be necessary to increase memory in the `ExtractTask` / `PgenExtractTask` tasks. Due to the way call caching works in Cromwell (i.e. the `memory` runtime attribute is not part of the call caching hashes), it is possible to edit the value of the `memory` runtime attribute of a task without breaking call caching. However, do *not* alter the value of the `memory_gib` input parameter as changing that absolutely will break call caching and will cause tens of thousands of shards to re-run needlessly! Both VCF and PGEN extracts can have their memory set to `"50 GiB"` for the call-caching re-run. Most extract shards should finish on the first re-run attempt, but a few stragglers will likely OOM and automatically re-run with more memory.
+  - When re-running the extract workflow with call caching enabled, it will be necessary to increase memory in the `ExtractTask` / `PgenExtractTask` tasks. Due to the way call caching works in Cromwell (i.e. the `memory` runtime attribute is not part of the call caching hashes), it is possible to edit the value of the `memory` runtime attribute of a task _in the WDL_ without breaking call caching. However, do *not* alter the value of the `memory_gib` input parameter as changing that absolutely will break call caching and will cause tens of thousands of shards to re-run needlessly! Both VCF and PGEN extracts can have their memory set to `"50 GiB"` for the call-caching re-run. Most extract shards should finish on the first re-run attempt, but a few stragglers will likely OOM and automatically re-run with more memory.
 - If you want to collect the monitoring logs from a large number of `Extract` shards, the `summarize_task_monitor_logs.py` script will not work if the task is scattered too wide. Use the `summarize_task_monitor_logs_from_file.py` script, instead, which takes a FOFN of GCS paths instead of a space-separated series of localized files.
 - These workflows do not use the Terra Data Entity Model to run, so be sure to select the `Run workflow with inputs defined by file paths` workflow submission option.

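To make the `memory` vs. `memory_gib` distinction this series documents concrete, here is a minimal WDL sketch. It is not the actual `ExtractTask` / `PgenExtractTask` from the GVS WDLs; the task body and the default value shown are hypothetical stand-ins. The structural point is real, though: Cromwell's call caching hashes a task's inputs (along with its command, output expressions, and Docker image), so editing the `memory_gib` input invalidates cached shards, while runtime attributes such as `memory` are not part of the call caching hashes and can be hard-coded to a larger value for a cached re-run.

```wdl
version 1.0

# Hypothetical sketch -- not the real GVS ExtractTask. It shows where the two
# memory knobs live and why only one of them is safe to touch on a cached re-run.
task ExtractTask {
  input {
    Int memory_gib = 12  # task input: part of the call cache hash; do NOT change on a re-run
  }
  command <<<
    echo "extract one shard"
  >>>
  runtime {
    # Runtime attribute: not part of the call cache hash. Originally this might
    # be derived from the input (memory: memory_gib + " GiB"); for a call-caching
    # re-run it can be hard-coded to a larger value without invalidating the cache:
    memory: "50 GiB"
    docker: "ubuntu:22.04"  # note: the docker image IS hashed, so leave it alone too
  }
}
```

This is exactly the edit the final patch text describes: only the string in the `runtime` section changes, and the `memory_gib` input, and hence every shard's call cache hash, stays untouched.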