broadinstitute · VJalili · Sep 25, 2024 · May 13, 2024 · May 13, 2024 · May 29, 2024
diff --git a/website/docs/modules/evidence_qc.md b/website/docs/modules/evidence_qc.md
@@ -5,6 +5,8 @@ sidebar_position: 2
 slug: eqc
 ---
 
+import { Highlight, HighlightOptionalArg } from "../../src/components/highlight.js"
+
 Runs ploidy estimation, dosage scoring, and optionally VCF QC. 
 The results from this module can be used for QC and batching.
 
@@ -17,9 +19,36 @@ for further guidance on creating batches.
 We also recommend using sex assignments generated from the ploidy 
 estimates and incorporating them into the PED file, with sex = 0 for sex aneuploidies.
 
-### Prerequisites
+The following diagram illustrates the upstream and downstream workflows of the `EvidenceQC` workflow 
+in the recommended invocation order. You may refer to 
+[this diagram](https://github.com/broadinstitute/gatk-sv/blob/main/terra_pipeline_diagram.jpg) 
+for the overall recommended invocation order.
+
+<br/>
+
+```mermaid
+
+stateDiagram
+  direction LR
+
+  classDef inModules stroke-width:0px,fill:#00509d,color:#caf0f8
+  classDef thisModule font-weight:bold,stroke-width:0px,fill:#ff9900,color:white
+  classDef outModules stroke-width:0px,fill:#caf0f8,color:#00509d
+
+  gse: GatherSampleEvidence
+  eqc: EvidenceQC
+  batching: Batching, sample QC, and sex assignment
+
+  gse --> eqc
+  eqc --> batching
+
+  class eqc thisModule
+  class gse inModules
+  class batching outModules
+```
+
+<br/>
 
-- [Gather Sample Evidence](./gse)
 
 ### Inputs
 

diff --git a/website/docs/modules/gather_batch_evidence.md b/website/docs/modules/gather_batch_evidence.md
@@ -5,25 +5,174 @@ sidebar_position: 4
 slug: gbe
 ---
 
-Runs CNV callers (cnMOPs, GATK gCNV) and combines single-sample 
-raw evidence into a batch. See above for more information on batching.
+Runs CNV callers ([cn.MOPS](https://academic.oup.com/nar/article/40/9/e69/1136601), GATK-gCNV) 
+and combines single-sample raw evidence into a batch.
 
-### Prerequisites
+The following diagram illustrates the upstream and downstream workflows of the `GatherBatchEvidence` workflow 
+in the recommended invocation order. You may refer to 
+[this diagram](https://github.com/broadinstitute/gatk-sv/blob/main/terra_pipeline_diagram.jpg) 
+for the overall recommended invocation order.
 
-- GatherSampleEvidence
-- (Recommended) EvidenceQC
-- gCNV training. 
+```mermaid
 
-### Inputs
-- PED file (updated with EvidenceQC sex assignments, including sex = 0 
-  for sex aneuploidies. Calls will not be made on sex chromosomes 
-  when sex = 0 in order to avoid generating many confusing calls 
-  or upsetting normalized copy numbers for the batch.)
-- Read count, BAF, PE, SD, and SR files (GatherSampleEvidence)
-- Caller VCFs (GatherSampleEvidence)
-- Contig ploidy model and gCNV model files (gCNV training)
+stateDiagram
+  direction LR
+
+  classDef inModules stroke-width:0px,fill:#00509d,color:#caf0f8
+  classDef thisModule font-weight:bold,stroke-width:0px,fill:#ff9900,color:white
+  classDef outModules stroke-width:0px,fill:#caf0f8,color:#00509d
 
-### Outputs
+  gbe: GatherBatchEvidence
+  t: TrainGCNV
+  cb: ClusterBatch
+  t --> gbe
+  gbe --> cb
+
+  class gbe thisModule
+  class t inModules
+  class cb outModules
+```
+
+## Inputs
+This workflow takes as input the read counts, BAF, PE, SD, SR, and per-caller VCF files 
+produced in the GatherSampleEvidence workflow, and contig ploidy and gCNV models from 
+the TrainGCNV workflow.
+The inputs of the GatherBatchEvidence workflow are listed below.
+
+
+#### `batch`
+An identifier for the batch.
+
+
+#### `samples`
+Sets the list of sample IDs. 
+
+
+#### `counts`
+Set to the [`GatherSampleEvidence.coverage_counts`](./gse#coverage-counts) output.
+
+
+#### Raw calls
+
+The following inputs set the per-caller raw SV calls, and should be set 
+if the caller was run in the [`GatherSampleEvidence`](./gse) workflow.
+You may set each of the following inputs to the linked output from 
+the GatherSampleEvidence workflow.
+
+
+- `manta_vcfs`: [`GatherSampleEvidence.manta_vcf`](./gse#manta-vcf);
+- `melt_vcfs`: [`GatherSampleEvidence.melt_vcf`](./gse#melt-vcf);
+- `scramble_vcfs`: [`GatherSampleEvidence.scramble_vcf`](./gse#scramble-vcf);
+- `wham_vcfs`: [`GatherSampleEvidence.wham_vcf`](./gse#wham-vcf).
+
+#### `PE_files`
+Set to the [`GatherSampleEvidence.pesr_disc`](./gse#pesr-disc) output.
+
+#### `SR_files`
+Set to the [`GatherSampleEvidence.pesr_split`](./gse#pesr-split) output.
+
+
+#### `SD_files`
+Set to the [`GatherSampleEvidence.pesr_sd`](./gse#pesr-sd) output.
+
+
+#### `matrix_qc_distance`
+You may refer to [this file](https://github.com/broadinstitute/gatk-sv/blob/main/inputs/templates/terra_workspaces/cohort_mode/workflow_configurations/GatherBatchEvidence.json.tmpl)
+for an example value. 
+
+
+#### `min_svsize`
+Sets the minimum size of SVs to include.
+
+
+#### `ped_file`
+A pedigree file describing the familial relationships between the samples in the cohort.
+Please refer to [this section](./#ped_file) for details. 
+
+
+#### `run_matrix_qc`
+Enables or disables running optional QC tasks. 
+
+
+#### `gcnv_qs_cutoff`
+You may refer to [this file](https://github.com/broadinstitute/gatk-sv/blob/main/inputs/templates/terra_workspaces/cohort_mode/workflow_configurations/GatherBatchEvidence.json.tmpl)
+for an example value. 
+
+#### cn.MOPS files
+The workflow needs the following cn.MOPS files.
+
+- `cnmops_chrom_file` and `cnmops_allo_file`: FASTA index files (`.fai`) for the autosomes 
+  (non-sex chromosomes) and the allosomes (chromosomes X and Y), respectively. 
+  The file format is explained [on this page](https://www.htslib.org/doc/faidx.html).
+
+  You may use the following files for these fields:
+
+  ```json
+  "cnmops_chrom_file": "gs://gcp-public-data--broad-references/hg38/v0/sv-resources/resources/v1/autosome.fai"
+  "cnmops_allo_file": "gs://gcp-public-data--broad-references/hg38/v0/sv-resources/resources/v1/allosome.fai"
+  ```
+
+- `cnmops_exclude_list`: 
+  You may use [this file](https://github.com/broadinstitute/gatk-sv/blob/d66f760865a89f30dbce456a3f720dec8b70705c/inputs/values/resources_hg38.json#L10)
+  for this field.
+
+#### GATK-gCNV inputs
+
+The following inputs are configured based on the outputs generated in the [`TrainGCNV`](./gcnv) workflow.
+
+- `contig_ploidy_model_tar`: [`TrainGCNV.cohort_contig_ploidy_model_tar`](./gcnv#contig-ploidy-model-tarball)
+- `gcnv_model_tars`: [`TrainGCNV.cohort_gcnv_model_tars`](./gcnv#model-tarballs)
+
+
+The workflow also exposes a few optional gCNV arguments.
+Their default values are listed in 
+[this file](https://github.com/broadinstitute/gatk-sv/blob/main/inputs/templates/terra_workspaces/cohort_mode/workflow_configurations/GatherBatchEvidence.json.tmpl), 
+and each argument is documented on the 
+[PostprocessGermlineCNVCalls](https://gatk.broadinstitute.org/hc/en-us/articles/360037593411-PostprocessGermlineCNVCalls)
+and
+[GermlineCNVCaller](https://gatk.broadinstitute.org/hc/en-us/articles/360047217671-GermlineCNVCaller) pages.
+
+
+#### Docker images
+
+The workflow needs the following Docker images, the latest versions of which are in 
+[this file](https://github.com/broadinstitute/gatk-sv/blob/main/inputs/values/dockers.json).
+
+  - `cnmops_docker`;
+  - `condense_counts_docker`;
+  - `linux_docker`;
+  - `sv_base_docker`;
+  - `sv_base_mini_docker`;
+  - `sv_pipeline_docker`;
+  - `sv_pipeline_qc_docker`;
+  - `gcnv_gatk_docker`;
+  - `gatk_docker`.
+
+#### Static inputs
+
+You may refer to [this reference file](https://github.com/broadinstitute/gatk-sv/blob/main/inputs/values/resources_hg38.json)
+for values of the following inputs.
+
+ - `primary_contigs_fai`;
+ - `cytoband`;
+ - `ref_dict`;
+ - `mei_bed`;
+ - `genome_file`;
+ - `sd_locs_vcf`.
+
+
+#### Optional Inputs
+The following are a few of the workflow's optional inputs, 
+each with an example value. 
+
+- `"allosomal_contigs": [["chrX", "chrY"]]`
+- `"ploidy_sample_psi_scale": 0.001`
+
+
+
+
+
+## Outputs
 
 - Combined read count matrix, SR, PE, and BAF files
 - Standardized call VCFs
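
A minimal sketch of how the output-to-input mapping documented for `GatherBatchEvidence` in this file could be assembled into an inputs JSON. The mapping table comes from the documentation above; the `WorkflowName.input` key prefix follows the usual WDL inputs convention. The helper name `build_gbe_inputs`, the per-sample dictionary layout, and the example paths are hypothetical, and the remaining required inputs (Docker images, reference files, gCNV models) are deliberately left out.

```python
import json

# GatherBatchEvidence input -> GatherSampleEvidence output, as documented above.
GSE_TO_GBE = {
    "counts": "coverage_counts",
    "manta_vcfs": "manta_vcf",
    "melt_vcfs": "melt_vcf",
    "scramble_vcfs": "scramble_vcf",
    "wham_vcfs": "wham_vcf",
    "PE_files": "pesr_disc",
    "SR_files": "pesr_split",
    "SD_files": "pesr_sd",
}


def build_gbe_inputs(batch, gse_outputs):
    """Collect per-sample GatherSampleEvidence outputs into per-batch arrays.

    `gse_outputs` is a hypothetical dict: sample ID -> {output name: path}.
    An input is omitted when no sample has the output (for example, a caller
    that was not run); a ValueError is raised when only some samples have it.
    """
    samples = sorted(gse_outputs)
    inputs = {
        "GatherBatchEvidence.batch": batch,
        "GatherBatchEvidence.samples": samples,
    }
    for gbe_input, gse_output in GSE_TO_GBE.items():
        paths = [gse_outputs[s].get(gse_output) for s in samples]
        if all(p is None for p in paths):
            continue  # output absent for the whole batch
        if any(p is None for p in paths):
            raise ValueError(f"{gse_output} is missing for some samples")
        inputs[f"GatherBatchEvidence.{gbe_input}"] = paths
    return inputs


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    example = {
        "sampleA": {"coverage_counts": "sampleA.counts.tsv.gz",
                    "manta_vcf": "sampleA.manta.vcf.gz"},
        "sampleB": {"coverage_counts": "sampleB.counts.tsv.gz",
                    "manta_vcf": "sampleB.manta.vcf.gz"},
    }
    print(json.dumps(build_gbe_inputs("batch1", example), indent=2))
```

Keeping the arrays in the same order as `samples` matters here, since each per-sample file has to line up with its sample ID.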

diff --git a/website/docs/modules/gather_sample_evidence.md b/website/docs/modules/gather_sample_evidence.md
@@ -6,20 +6,77 @@ slug: gse
 ---
 
 Runs raw evidence collection on each sample with the following SV callers: 
-Manta, Wham, and/or MELT. For guidance on pre-filtering prior to GatherSampleEvidence, 
+Manta, Wham, Scramble, and/or MELT. For guidance on pre-filtering prior to GatherSampleEvidence, 
 refer to the Sample Exclusion section.
 
-Note: a list of sample IDs must be provided. Refer to the sample ID 
-requirements for specifications of allowable sample IDs. 
+The following diagram illustrates the downstream workflows of the `GatherSampleEvidence` workflow 
+in the recommended invocation order. You may refer to 
+[this diagram](https://github.com/broadinstitute/gatk-sv/blob/main/terra_pipeline_diagram.jpg) 
+for the overall recommended invocation order.
+
+
+```mermaid
+
+stateDiagram
+  direction LR
+
+  classDef inModules stroke-width:0px,fill:#00509d,color:#caf0f8
+  classDef thisModule font-weight:bold,stroke-width:0px,fill:#ff9900,color:white
+  classDef outModules stroke-width:0px,fill:#caf0f8,color:#00509d
+
+  gse: GatherSampleEvidence
+  eqc: EvidenceQC
+  gse --> eqc
+
+  class gse thisModule
+  class eqc outModules
+```
+
+
+## Inputs
+
+#### `bam_or_cram_file`
+A BAM or CRAM file aligned to hg38. An index file (`.bai`) must be provided if using BAM.
+
+#### `sample_id`
+Refer to the [sample ID requirements](/docs/gs/inputs#sampleids) for specifications of allowable sample IDs. 
 IDs that do not meet these requirements may cause errors.
 
-### Inputs
+#### `preprocessed_intervals`
+Picard interval list.
+
+#### `sd_locs_vcf`
+(`sd`: site depth) 
+A VCF file containing allele counts at common SNP loci of the genome, which is used for calculating BAF.  
+For the human genome, you may use [`dbSNP`](https://www.ncbi.nlm.nih.gov/snp/), 
+which contains a complete list of common and clinical human single nucleotide variations, 
+microsatellites, and small-scale insertions and deletions. 
+You may find a link to the file in 
+[this reference](https://github.com/broadinstitute/gatk-sv/blob/main/inputs/values/resources_hg38.json).
 
-- Per-sample BAM or CRAM files aligned to hg38. Index files (.bai) must be provided if using BAMs.
 
-### Outputs
+## Outputs
 
-- Caller VCFs (Manta, MELT, and/or Wham)
 - Binned read counts file
 - Split reads (SR) file
 - Discordant read pairs (PE) file
+
+#### `manta_vcf` {#manta-vcf}
+A VCF file containing variants called by Manta. 
+
+#### `melt_vcf` {#melt-vcf}
+A VCF file containing variants called by MELT. 
+
+#### `scramble_vcf` {#scramble-vcf}
+A VCF file containing variants called by Scramble. 
+
+#### `wham_vcf` {#wham-vcf}
+A VCF file containing variants called by Wham. 
+
+#### `coverage_counts` {#coverage-counts}
+
+#### `pesr_disc` {#pesr-disc}
+
+#### `pesr_split` {#pesr-split}
+
+#### `pesr_sd` {#pesr-sd}
diff --git a/website/docs/modules/index.md b/website/docs/modules/index.md
@@ -36,3 +36,18 @@ consisting of multiple modules to be executed in the following order.
 - **Module 09 (in development)** Visualization, including scripts that generates IGV screenshots and rd plots.
 
 - Additional modules to be added: de novo and mosaic scripts
+
+
+## Pipeline Parameters
+
+Several inputs are shared across different modules of the pipeline, which are explained in this section.
+
+#### `ped_file`
+
+A pedigree file describing the familial relationships between the samples in the cohort.
+The file needs to be in the 
+[PED format](https://gatk.broadinstitute.org/hc/en-us/articles/360035531972-PED-Pedigree-format).
+The file should be updated with [EvidenceQC](./eqc) sex assignments, including 
+`sex = 0` for sex aneuploidies; 
+genotypes on chrX and chrY for samples with `sex = 0` in the PED file will be set to 
+`./.` and these samples will be excluded from sex-specific training steps.
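
A minimal sketch of the PED update described for `ped_file`, assuming a tab-separated, six-column PED file (family, individual, father, mother, sex, phenotype) in which the sex column uses 1 for male, 2 for female, and 0 otherwise. The function name `update_ped_sex` and the `sex_assignments` dictionary are hypothetical; how the assignments are read from the EvidenceQC outputs is not covered here.

```python
VALID_SEX_CODES = {"0", "1", "2"}


def update_ped_sex(ped_text, sex_assignments):
    """Return PED text with the sex column replaced by the given codes.

    `sex_assignments` is a hypothetical dict: sample ID -> 0, 1, or 2,
    where 0 marks a sex aneuploidy. Samples that are not in the dict keep
    their original code. Assumes tab-separated, six-column rows.
    """
    updated = []
    for line in ped_text.splitlines():
        # Pass blank lines and comment lines through unchanged.
        if not line.strip() or line.startswith("#"):
            updated.append(line)
            continue
        fields = line.split("\t")
        if len(fields) != 6:
            raise ValueError(f"expected 6 tab-separated columns: {line!r}")
        sample_id = fields[1]
        if sample_id in sex_assignments:
            sex = str(sex_assignments[sample_id])
            if sex not in VALID_SEX_CODES:
                raise ValueError(f"invalid sex code {sex!r} for {sample_id}")
            fields[4] = sex
        updated.append("\t".join(fields))
    return "".join(row + "\n" for row in updated)


if __name__ == "__main__":
    ped = "fam1\tsampleA\t0\t0\t1\t0\nfam1\tsampleB\t0\t0\t2\t0\n"
    # sampleA is flagged as a sex aneuploidy, so its sex code becomes 0.
    print(update_ped_sex(ped, {"sampleA": 0}), end="")
```

Only the sex column is rewritten, so family structure and phenotype codes are carried over as-is.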