Add documentation for the de-novo pipeline #675

Draft

VJalili wants to merge 3 commits into broadinstitute:main from VJalili:denovo-docs

Member

VJalili commented May 13, 2024

This PR extends the docs in the following areas:

Document input and output of the de-novo pipeline;
Document the method used for de-novo variant calling.

VJalili added 3 commits

May 13, 2024 12:40


Draft de-novo workflow documentation. (18fe0cc)

Move de-novo method description to a new concepts section. (637a2da)

Add a link to the method description. (b76a248)

mwalker174 reviewed

Collaborator

mwalker174 left a comment

I have some initial suggestions here. I think the inputs need to be greatly simplified / cleaned up for use in Terra before we commit any documentation.

website/docs/modules/denovo.md

+              slug: denovo
+              ---
+              The de-novo workflow operates on the annotated multi-sample VCF file created by

Collaborator

mwalker174 Sep 10, 2024

Suggested change

      
-              The de-novo workflow operates on the annotated multi-sample VCF file created by
+              The de novo SV workflow operates on the annotated multi-sample VCF file created by

website/docs/modules/denovo.md

+              ### Inputs
+              - `vcf_file`: output of [AnnotateVcf](./av) called output_vcf.
+                Note thatAll families in the vcf file must be included in the pedigree file

Collaborator

mwalker174 Sep 10, 2024

Suggested change

      
-                Note thatAll families in the vcf file must be included in the pedigree file
+                Note that all families in the vcf file must be included in the pedigree file

website/docs/modules/denovo.md

Comment on lines +19 to +22

+              - `ped_input`: Must have a header as follows:
+                | FamID | IndividualID | FatherID | MotherID | Gender    | Affected |
+                |-|-|-|-|-|-|

Collaborator

mwalker174 Sep 10, 2024

Better to just link to this page: https://gatk.broadinstitute.org/hc/en-us/articles/360035531972-PED-Pedigree-format
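Until the inputs are finalized, the header from the draft above could be checked with a minimal sketch like the following. It assumes a tab-delimited pedigree file whose first line is exactly the six column names quoted in the draft; the function name `check_ped_header` is hypothetical and not part of the pipeline.

```python
import csv
import io

# Column names copied from the draft documentation under review.
EXPECTED_HEADER = ["FamID", "IndividualID", "FatherID", "MotherID", "Gender", "Affected"]


def check_ped_header(handle) -> list[str]:
    """Return the header row if it matches the draft's column names.

    Assumption: the file is tab-delimited and the header is the first line.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)
    if header is None:
        raise ValueError("pedigree file is empty")
    header = [column.strip() for column in header]
    if header != EXPECTED_HEADER:
        raise ValueError(f"unexpected pedigree header: {header}")
    return header


# Usage with an in-memory file; a real run would pass an open file handle.
example = io.StringIO("FamID\tIndividualID\tFatherID\tMotherID\tGender\tAffected\n")
check_ped_header(example)
```

If the documentation ends up linking to the standard PED format page instead, a check like this would need to follow whatever header convention the workflow finally expects.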

website/docs/modules/denovo.md

+                | FamID | IndividualID | FatherID | MotherID | Gender    | Affected |
+                |-|-|-|-|-|-|
+              - `genomic_disorder_input`: a file in BED format that contains regions of genomic disorder;

Collaborator

mwalker174 Sep 10, 2024

Suggested change

      
-              - `genomic_disorder_input`: a file in BED format that contains regions of genomic disorder;
+              - `genomic_disorder_input`: a file in BED format that contains genomic disorder regions;

website/docs/modules/denovo.md

+              - `genomic_disorder_input`: a file in BED format that contains regions of genomic disorder;
+                 variants that overlap these regions will not be removed from the input VCF file.
+              - `contigs`: Should be set to the following list.

Collaborator

mwalker174 Sep 10, 2024

Suggested change

      
-              - `contigs`: Should be set to the following list.
+              - `contigs`: List of reference contig names, e.g. for `hg38`:
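As an illustration of the `hg38` example, the contig list quoted in the draft (chr1 through chr22 plus chrX) can be generated rather than typed by hand. This is a sketch only; the helper name `hg38_contigs` is hypothetical, and whether other contigs should be included is left to the workflow authors.

```python
import json


def hg38_contigs() -> list[str]:
    """Build the contig list shown in the draft: chr1..chr22 followed by chrX."""
    return [f"chr{i}" for i in range(1, 23)] + ["chrX"]


# Print the list as a JSON array, ready to paste into a workflow inputs file.
print(json.dumps(hg38_contigs()))
```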

website/docs/modules/denovo.md

+                [ "chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr20", "chr21", "chr22", "chrX" ]`
+                ```
+              - `python_config`: a text file as the following.

Collaborator

mwalker174 Sep 10, 2024

Suggested change

      
-              - `python_config`: a text file as the following.
+              - `python_config`: a text file defining the following parameters:

website/docs/modules/denovo.md

+                gq_min: '0'
+                ```
+                Note that you value may increase the value of `cohort_AF` if the cohort is small.

Collaborator

mwalker174 Sep 10, 2024

How small?

website/docs/modules/denovo.md

Comment on lines +53 to +54

		a txt file with first column as batch and second column raw file generated from
		module05-ClusterBatch for all callers except depth (clustered_manta_vcf, clustered_melt_vcf, clustered_wham_vcf).

Collaborator

mwalker174 Sep 10, 2024

Suggested change

      
-                a txt file with first column as batch and second column raw file generated from
-                module05-ClusterBatch for all callers except depth (clustered_manta_vcf, clustered_melt_vcf, clustered_wham_vcf).
+                a txt file where the first column is the batch name and second column is the raw file generated from
+                the ClusterBatch workflow for all callers except depth (clustered_manta_vcf, clustered_melt_vcf, clustered_wham_vcf).
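To make the expected layout concrete, here is a minimal sketch of reading such a two-column file. It assumes the columns are tab-separated and that a batch may appear on several lines (one per caller); the function name `read_batch_map` and the example paths are hypothetical.

```python
import csv
import io


def read_batch_map(handle) -> dict[str, list[str]]:
    """Map each batch name (column 1) to its raw files (column 2).

    Assumption: tab-separated columns, no header line, blank lines ignored.
    """
    mapping: dict[str, list[str]] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"expected 2 columns, got {len(row)}: {row}")
        batch, raw_file = (field.strip() for field in row)
        mapping.setdefault(batch, []).append(raw_file)
    return mapping


# Usage with made-up batch names and paths.
example = io.StringIO(
    "batch1\tbatch1.clustered_manta.vcf.gz\n"
    "batch1\tbatch1.clustered_wham.vcf.gz\n"
    "batch2\tbatch2.clustered_manta.vcf.gz\n"
)
batches = read_batch_map(example)
```

Grouping by batch name makes it straightforward to confirm that the batch names here match those used in the other batch inputs.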

website/docs/modules/denovo.md

+                - Must match batch names in batch_bincov, batch_raw_file, and batch_depth_raw_file
+                - These batches and the samples contained in them are relevant in regards to the bincov matrices and raw files
+              - `prefix`: choose any prefix which will become the prefix of output files

Collaborator

mwalker174 Sep 11, 2024

Suggested change

      
-              - `prefix`: choose any prefix which will become the prefix of output files
+              - `prefix`: a prefix for output filenames

website/docs/concepts/denovo.md

		@@ -0,0 +1,44 @@
		---

Collaborator

mwalker174 Sep 11, 2024

This file is a lot of detail. IMO, it would be better to reference the scripts themselves and have those be sufficiently organized/commented that the methods are clear.

Member Author

VJalili commented Sep 11, 2024

Thank you, @mwalker174, for the feedback! I agree that we need the setup of the inputs polished before we add docs; I will update docs after the inputs are updated.
