Skip subsampling if batch size is less than n_samples_subsample (#707)
Adds a check that n_samples_subsample is less than the batch size, i.e. length(samples). If it is not, the subsampling task RandomSubsampleStringArray is skipped.

Also updates documentation to indicate this.
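
As a rough illustration of the updated guard, here is a minimal Python sketch of the condition that decides whether RandomSubsampleStringArray is called. It is a model of the WDL `if` expression changed in this commit, not code from the repository; the function name `should_subsample` is hypothetical, and `None` stands in for an undefined optional WDL input.

```python
from typing import Optional, Sequence


def should_subsample(
    samples: Sequence[str],
    n_samples_subsample: Optional[int] = None,
    sample_ids_training_subset: Optional[Sequence[str]] = None,
) -> bool:
    """Model of the updated `if` condition in wdl/TrainGCNV.wdl (hypothetical helper)."""
    return (
        n_samples_subsample is not None          # defined(n_samples_subsample)
        and n_samples_subsample < len(samples)   # select_first([...]) < length(samples)
        and sample_ids_training_subset is None   # !defined(sample_ids_training_subset)
    )


# A batch of 150 samples with the subsample size set to 100: subsampling runs.
print(should_subsample([f"s{i}" for i in range(150)], 100))
# A batch of 80 samples with the subsample size set to 100: subsampling is skipped.
print(should_subsample([f"s{i}" for i in range(80)], 100))
```

Under this reading, a batch whose size equals `n_samples_subsample` also skips the subsampling task, because the comparison is strict.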
kjaisingh authored Aug 15, 2024
1 parent 54be67f commit f261eed
Showing 2 changed files with 2 additions and 2 deletions.
@@ -166,7 +166,7 @@ Read the full EvidenceQC documentation [here](https://github.com/broadinstitute/

Read the full TrainGCNV documentation [here](https://github.com/broadinstitute/gatk-sv#gcnv-training-1).
* Before running this workflow, create the batches (~100-500 samples) you will use for the rest of the pipeline based on sample coverage, WGD score (from `02-EvidenceQC`), and PCR status. These will likely not be the same as the batches you used for `02-EvidenceQC`.
- * By default, `03-TrainGCNV` is configured to be run once per `sample_set` on 100 randomly-chosen samples from that set to create a gCNV model for each batch. If your `sample_set` contains fewer than 100 samples (not recommended), you will need to edit the `n_samples_subsample` parameter to be less than or equal to the number of samples.
+ * By default, `03-TrainGCNV` is configured to be run once per `sample_set` on 100 randomly-chosen samples from that set to create a gCNV model for each batch. To modify this behavior, you can set the `n_samples_subsample` parameter to the number of samples to use for training.

#### 04-GatherBatchEvidence

2 changes: 1 addition & 1 deletion wdl/TrainGCNV.wdl
@@ -112,7 +112,7 @@ workflow TrainGCNV {
}
}

- if (defined(n_samples_subsample) && !defined(sample_ids_training_subset)) {
+ if (defined(n_samples_subsample) && (select_first([n_samples_subsample]) < length(samples)) && !defined(sample_ids_training_subset)) {
call util.RandomSubsampleStringArray {
input:
strings = write_lines(samples),