Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update GatherSampleEvidence & TrainGCNV docs #681

Merged
merged 26 commits into from
Sep 25, 2024

Conversation

VJalili
Copy link
Member

@VJalili VJalili commented May 29, 2024

This PR extends the documentation of the GatherSampleEvidence and TrainGCNV workflows. Specifically, it implements the following changes.

  • Extends the documentation on the inputs and outputs of the GatherSampleEvidence, GatherBatchEvidence, and TrainGCNV workflows.
  • Instead of enlisting inputs and outputs, they are formatted as sections. This enables referencing them individually, which is useful when referencing an output of workflow passed as input to another workflow. The header level is set to 4 (i.e., ####); hence, these sections are not rendered in the quick navigation section (top-right panel) to reduce clutter.
  • Adds diagrams for better visualization of up and downstream dependencies of workflows, currently implemented for GatherSampleEvidence, GatherBatchEvidence, EvidenceQC, and TrainGCNV workflows.
  • Add JS & CSS components to enable highlighting text, which is used in this PR to tag optional outputs.

@VJalili VJalili added the documentation Improvements or additions to documentation label May 29, 2024
@VJalili VJalili marked this pull request as ready for review August 2, 2024 17:18
@VJalili VJalili requested a review from mwalker174 August 2, 2024 17:18
Copy link
Collaborator

@mwalker174 mwalker174 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @VJalili for getting these started. I have a few comments below. In general, IMO the documentation can describe each input but should not provide actual values, and we should refer the reader to /inputs for values of specific parameters to cut down on redundancy and avoid inconsistencies.

website/docs/modules/gather_sample_evidence.md Outdated Show resolved Hide resolved
Comment on lines 55 to 59
You may download the file from the following link.

- Per-sample BAM or CRAM files aligned to hg38. Index files (.bai) must be provided if using BAMs.
```shell
gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf
```
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems redundant since all resource files are available in /inputs/values and also feels inconsistent to list it for just this input.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree; this would also lead to having to keep updated references in various places. I changed the link to refer to the reference file, would this be better?

website/docs/modules/gather_sample_evidence.md Outdated Show resolved Hide resolved

gse: GatherSampleEvidence
eqc: EvidenceQC
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we do usually run EvidenceQC prior to gCNV for outlier detection and batching, it isn't strictly required. Do you want these diagrams to reflect our recommended process or strictly input/output relationships?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was aiming for workflow dependency, loosely reflecting and replacing the Prerequisites list we have. I am happy to change it to visualize the recommended execution order if that makes more sense.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just chiming in that if you decide you want to represent execution order, I think we can just use the overall pipeline diagram for that and additional workflow-specific diagrams aren't necessary. If you want to show dependencies, that would add new information, but I do think it could be confusing when those dependencies don't match the execution order...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is replicating the Prerequisites section we had, and apparently, it is their dependencies. I will change it for the recommended execution order, as module dependency is more code-related than useful to the end user. We can add a link to the diagram for the overall picture, and it is informative to have these diagrams to show what is at the upstream and downstream of each module to better navigate in the pipeline.

website/docs/modules/train_gcnv.md Outdated Show resolved Hide resolved
website/docs/modules/train_gcnv.md Outdated Show resolved Hide resolved
website/docs/modules/train_gcnv.md Outdated Show resolved Hide resolved
website/docs/modules/train_gcnv.md Outdated Show resolved Hide resolved
website/docs/modules/gather_batch_evidence.md Outdated Show resolved Hide resolved
website/docs/modules/gather_batch_evidence.md Outdated Show resolved Hide resolved
@VJalili
Copy link
Member Author

VJalili commented Sep 13, 2024

Thank you, @mwalker174, for the review. I like your idea of scoping documentation to descriptions of arguments and keeping references to file links or recommended/default values in separate places. I updated the PR accordingly.

Copy link
Collaborator

@epiercehoffman epiercehoffman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am really looking forward to having the website fully set up as a one-stop-shop for all our documentation. I have some questions about how we want to approach documenting inputs/outputs and dependencies, as well as some more detailed comments/questions/corrections

t: TrainGCNV
gse --> eqc
eqc --> t
eqc --> gbe
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
eqc --> gbe
t --> gbe

This is related to the existing thread on displaying dependencies. None of the outputs of EvidenceQC are used in GatherBatchEvidence (or TrainGCNV technically). EvidenceQC is recommended to use to create batches for TrainGCNV (and the following steps) but if that's what you wanted to represent I would just exclude GatherBatchEvidence from this diagram since it follows TrainGCNV

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is resolved in the updated diagrams; please recheck.

website/docs/modules/gather_batch_evidence.md Show resolved Hide resolved
website/docs/modules/gather_batch_evidence.md Outdated Show resolved Hide resolved
website/docs/modules/gather_batch_evidence.md Outdated Show resolved Hide resolved
The following is the list of the inputs the GatherBatchEvidence workflow takes.


#### `batch`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the plan to have detailed documentation for every input like this? Is that necessary?

Maybe it could be collapsible so it's more approachable for users who do not need that level of detail? Most users will just use the pre-configured default inputs and will only need detailed documentation on the pipeline-level inputs and outputs, and I wouldn't want to make it more difficult for them to navigate the documentation.

One other thing to consider is there are places where we do want users to be able to edit inputs as necessary, and I wouldn't want those inputs to get lost among the others - a separate category that does not collapse maybe?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plan is to document every required input of these modules.

We have discussed a few options for those required inputs that do not have values set on Terra, or set values need to be adjusted, or set values need tweaking for cohort-to-cohort, etc. One of the options is tagging/labeling such inputs (similar to labeling optional/conditional outputs) and we can think of other alternatives. However, that is beyond the scope of this PR as here we are just documenting all the required (at least leaving a placeholder for them), and we will revisit their spotlighting later.

website/docs/modules/gather_batch_evidence.md Outdated Show resolved Hide resolved
website/docs/modules/gather_batch_evidence.md Show resolved Hide resolved
#### `wham_vcf` {#wham-vcf}
A VCF file containing variants called by Wham.

#### `coverage_counts` {#coverage-counts}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there supposed to be descriptions here? Feels inconsistent with the other sections

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a description of these. We discussed leaving them as placeholders to make sure we will populate them. If you have a description, feel free to suggest one.

website/docs/modules/gather_batch_evidence.md Show resolved Hide resolved
@VJalili
Copy link
Member Author

VJalili commented Sep 19, 2024

@mwalker174 this is ready for a re-review. Per our discussion, a few inputs/outputs have not been documented and are left as placeholders for your follow-up PR 🚀 Thanks for your feedback!!

Copy link
Collaborator

@mwalker174 mwalker174 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks let's fix the warning message I sent offline and then get this in.

@VJalili
Copy link
Member Author

VJalili commented Sep 25, 2024

Thank you, @mwalker174! and Emma! for the feedback. Looking forward to further improvements coming next on the modules documentation.

@VJalili VJalili merged commit 83e9464 into broadinstitute:main Sep 25, 2024
3 checks passed
@VJalili VJalili deleted the docs_gather_sample_evidence branch September 25, 2024 19:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants