Add documentation for the de-novo pipeline #675

Draft
wants to merge 3 commits into
base: main
from

Conversation

VJalili
Member

@VJalili VJalili commented May 13, 2024

This PR extends the docs in the following areas:

  • Document input and output of the de-novo pipeline;
  • Document the method used for de-novo variant calling.

Collaborator

@mwalker174 mwalker174 left a comment

I have some initial suggestions here. I think the inputs need to be greatly simplified / cleaned up for use in Terra before we commit any documentation.

slug: denovo
---

The de-novo workflow operates on the annotated multi-sample VCF file created by
Collaborator

Suggested change
The de-novo workflow operates on the annotated multi-sample VCF file created by
The de novo SV workflow operates on the annotated multi-sample VCF file created by

### Inputs

- `vcf_file`: output of [AnnotateVcf](./av) called output_vcf.
Note thatAll families in the vcf file must be included in the pedigree file
Collaborator

Suggested change
Note thatAll families in the vcf file must be included in the pedigree file
Note that all families in the vcf file must be included in the pedigree file

Comment on lines +19 to +22
- `ped_input`: Must have a header as follows:

| FamID | IndividualID | FatherID | MotherID | Gender | Affected |
|-|-|-|-|-|-|
Collaborator

| FamID | IndividualID | FatherID | MotherID | Gender | Affected |
|-|-|-|-|-|-|
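
The header requirement above, together with the note that the pedigree must cover the VCF, could be checked before launching the workflow. The following is a minimal sketch, not part of the pipeline: the function names, the tab delimiter, and the example trio are assumptions, and it compares sample IDs as a stand-in for the family-level requirement.

```python
import csv
import io

# Header columns as listed in the pedigree table in this PR.
EXPECTED_PED_HEADER = [
    "FamID", "IndividualID", "FatherID", "MotherID", "Gender", "Affected",
]


def read_ped(handle, delimiter="\t"):
    """Return pedigree rows as dicts; raise ValueError if the header differs.

    The tab delimiter is an assumption; the PR does not state one.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, None)
    if header != EXPECTED_PED_HEADER:
        raise ValueError(f"unexpected pedigree header: {header}")
    return [dict(zip(header, row)) for row in reader if row]


def samples_missing_from_ped(vcf_samples, ped_rows):
    """Return the VCF sample IDs that have no row in the pedigree, sorted."""
    known = {row["IndividualID"] for row in ped_rows}
    return sorted(set(vcf_samples) - known)


# Made-up trio, for illustration only.
example_ped = io.StringIO(
    "FamID\tIndividualID\tFatherID\tMotherID\tGender\tAffected\n"
    "fam1\tchild1\tdad1\tmom1\t1\t2\n"
    "fam1\tdad1\t0\t0\t1\t1\n"
    "fam1\tmom1\t0\t0\t2\t1\n"
)
ped_rows = read_ped(example_ped)
missing = samples_missing_from_ped(["child1", "dad1", "mom1", "extra1"], ped_rows)
```

In this example `missing` contains only `extra1`, the one sample ID that has no pedigree row.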

- `genomic_disorder_input`: a file in BED format that contains regions of genomic disorder;
Collaborator

Suggested change
- `genomic_disorder_input`: a file in BED format that contains regions of genomic disorder;
- `genomic_disorder_input`: a file in BED format that contains genomic disorder regions;

- `genomic_disorder_input`: a file in BED format that contains regions of genomic disorder;
variants that overlap these regions will not be removed from the input VCF file.

- `contigs`: Should be set to the following list.
Collaborator

Suggested change
- `contigs`: Should be set to the following list.
- `contigs`: List of reference contig names, e.g. for `hg38`:

```
[ "chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr20", "chr21", "chr22", "chrX" ]
```
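
Rather than typing the list by hand, it could be generated. A minimal sketch, assuming the contigs are exactly those quoted above (autosomes 1 to 22 plus `chrX`); the variable names are illustrative only.

```python
import json

# chr1..chr22 plus chrX, matching the list quoted in this PR
# (the quoted list does not include chrY or chrM).
contigs = [f"chr{i}" for i in range(1, 23)] + ["chrX"]

# JSON form, e.g. for pasting into a workflow inputs file.
contigs_json = json.dumps(contigs)
```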

- `python_config`: a text file as the following.
Collaborator

Suggested change
- `python_config`: a text file as the following.
- `python_config`: a text file defining the following parameters:

```
gq_min: '0'
```

Note that you may increase the value of `cohort_AF` if the cohort is small.
Collaborator

How small?

Comment on lines +53 to +54
a txt file with first column as batch and second column raw file generated from
module05-ClusterBatch for all callers except depth (clustered_manta_vcf, clustered_melt_vcf, clustered_wham_vcf).
Collaborator

Suggested change
a txt file with first column as batch and second column raw file generated from
module05-ClusterBatch for all callers except depth (clustered_manta_vcf, clustered_melt_vcf, clustered_wham_vcf).
a txt file where the first column is the batch name and second column is the raw file generated from
the ClusterBatch workflow for all callers except depth (clustered_manta_vcf, clustered_melt_vcf, clustered_wham_vcf).

- Must match batch names in batch_bincov, batch_raw_file, and batch_depth_raw_file
- These batches and the samples contained in them are relevant in regards to the bincov matrices and raw files
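
The two-column batch files and the requirement that batch names match across them could be sanity-checked up front. This is a hedged sketch only: the tab delimiter, the function names, and the example paths are assumptions and are not taken from the pipeline.

```python
import csv
import io


def read_batch_map(handle, delimiter="\t"):
    """Parse a two-column file into {batch_name: file_path}.

    The tab delimiter is an assumption; the PR only says "a txt file".
    """
    mapping = {}
    for line_no, row in enumerate(csv.reader(handle, delimiter=delimiter), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"line {line_no}: expected 2 columns, got {len(row)}")
        batch, path = row
        if batch in mapping:
            raise ValueError(f"line {line_no}: duplicate batch name {batch!r}")
        mapping[batch] = path
    return mapping


def check_same_batches(maps):
    """Raise ValueError unless every mapping in `maps` has the same batch names."""
    name_sets = {label: set(m) for label, m in maps.items()}
    if len({frozenset(s) for s in name_sets.values()}) > 1:
        raise ValueError(f"batch names differ across inputs: {name_sets}")


# Placeholder content; the bucket and file names are invented for illustration.
batch_raw_file = read_batch_map(io.StringIO(
    "batch1\tgs://example-bucket/batch1.manta.vcf.gz\n"
    "batch2\tgs://example-bucket/batch2.manta.vcf.gz\n"
))
batch_bincov = read_batch_map(io.StringIO(
    "batch1\tgs://example-bucket/batch1.bincov.bed.gz\n"
    "batch2\tgs://example-bucket/batch2.bincov.bed.gz\n"
))
check_same_batches({"batch_raw_file": batch_raw_file, "batch_bincov": batch_bincov})
```

Failing early on a mismatched or duplicated batch name is cheaper than discovering it partway through a workflow run.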

- `prefix`: choose any prefix which will become the prefix of output files
Collaborator

Suggested change
- `prefix`: choose any prefix which will become the prefix of output files
- `prefix`: a prefix for output filenames

@@ -0,0 +1,44 @@
---
Collaborator

This file is a lot of detail. IMO, it would be better to reference the scripts themselves and have those be sufficiently organized/commented that the methods are clear.

@VJalili
Member Author

VJalili commented Sep 11, 2024

Thank you, @mwalker174, for the feedback! I agree that we need the setup of the inputs polished before we add docs; I will update docs after the inputs are updated.
