Add TSV rename map option to RenameVcfSamples #475

Merged
merged 4 commits into from
Aug 11, 2023

Conversation

epiercehoffman
Collaborator

Updates

Revamp RenameVcfSamples

  • Add option to provide File input to map from old to new sample IDs instead of string arrays
  • Make the check that every sample has a new ID optional (running the check is slower)
  • Fix disk size determination to use GB :)
  • Add test JSON
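
For illustration only, the relationship between the two input styles (parallel string arrays versus a single rename map file) can be sketched in Python. This is not code from the PR: `build_rename_map` and `new_ids` are hypothetical names, and the `old_id<TAB>new_id` one-pair-per-line layout is an assumption about what the TSV map looks like.

```python
def build_rename_map(current_ids, new_ids):
    """Return TSV text with one 'old_id<TAB>new_id' pair per line.

    Hypothetical helper: the pair-per-line layout is an assumption,
    not taken from the PR.
    """
    if len(current_ids) != len(new_ids):
        raise ValueError("current and new sample ID lists differ in length")
    if len(set(current_ids)) != len(current_ids):
        raise ValueError("duplicate entries among current sample IDs")
    # Pair IDs positionally, the way two parallel arrays would be read.
    return "".join(f"{old}\t{new}\n" for old, new in zip(current_ids, new_ids))


if __name__ == "__main__":
    print(build_rename_map(["sample_1", "sample_2"], ["new_1", "new_2"]), end="")
```

A map file has the practical advantage that it can list only the samples to be renamed, whereas parallel arrays must stay aligned entry by entry.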

Testing

  • Validated all WDLs and JSONs with womtool
  • Tested WDL with string arrays & checking that all samples have a new ID specified (test JSON)
  • Tested WDL with File rename map input for a subset of the samples, without the check that all samples have a new ID specified

Comment on lines 69 to 81
if ~{check_rename_all_samples}; then
python /opt/sv-pipeline/scripts/vcf_replacement_samples.py --vcf ~{vcf} --dict ~{sample_id_rename_map} > reheader.list
fi
Collaborator

I'd feel better requiring this check. It should be fast, no?

Collaborator Author

Currently the default behavior is to perform the check (check_rename_all_samples = true). However, if the map renames only a subset of samples, rather than providing an old and new sample ID for every sample in the VCF, the check would fail, so I wanted to provide a way to turn it off in that situation. Does that seem reasonable?

Collaborator

What's the behavior of bcftools when a sample name is invalid? I think it would be better to either rely on bcftools catching such errors or, if it doesn't, use a simple bash command to do the check rather than a script.

Collaborator Author

bcftools does not raise an error when there is a sample ID in the renaming map that doesn't appear in the header.

I didn't see a good way to do the check in bash, but I replaced the script with a very short inline python block (the script wrote an output file that is no longer needed). By default the check is enabled, so it will catch incorrect IDs when it checks that every sample in the header appears in the renaming map. If a user wishes to rename only a subset of samples, they can either 1) disable the check or 2) include all samples in the renaming map anyway.

I tested this with incorrect sample IDs (failed as expected when check=true), missing sample IDs (failed as expected when check=true and succeeded as expected when check=false), and extra sample IDs (succeeded as expected) in the renaming map.
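
As a rough sketch of the check described above (not the actual inline block in the WDL), the logic amounts to confirming that every sample in the VCF header has an entry in the renaming map. The function names are hypothetical, the map is assumed to be `old_id<TAB>new_id` per line, and the header sample list is assumed to have been obtained separately, for example with `bcftools query -l`.

```python
import csv
import io


def load_rename_map(tsv_text):
    """Parse 'old_id<TAB>new_id' lines into a dict (assumed map layout)."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {row[0]: row[1] for row in reader if row}


def samples_missing_from_map(header_samples, rename_map):
    """Return header samples with no entry in the map, in header order."""
    return [s for s in header_samples if s not in rename_map]


def check_rename_all_samples(header_samples, rename_map):
    """Raise if any header sample lacks a new ID; extra map entries are ignored."""
    missing = samples_missing_from_map(header_samples, rename_map)
    if missing:
        raise ValueError("No new ID provided for: " + ", ".join(missing))


if __name__ == "__main__":
    header = ["sample_1", "sample_2"]  # hypothetical header samples
    rename_map = load_rename_map("sample_1\tnew_1\nsample_2\tnew_2\nextra\tnew_3\n")
    check_rename_all_samples(header, rename_map)  # passes: extra IDs are allowed
```

Under these assumptions a mistyped old ID and an omitted ID fail the same way, because either one leaves a header sample without a match, while an extra ID in the map never triggers a failure. That mirrors the three cases tested above.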

Collaborator

@mwalker174 mwalker174 left a comment

Thanks! One last minor suggestion here.

{
"RenameVcfSamples.vcf": {{ test_batch.clean_vcf | tojson }},
"RenameVcfSamples.prefix": {{ test_batch.name | tojson }},
"RenameVcfSamples.current_sample_ids": [
Collaborator

Just thinking about this again. In order to be consistent with other templates we should use {{ test_batch.samples }} here and create a new {{ test_batch.alternate_ids }} or the like for the new ids.

@epiercehoffman epiercehoffman merged commit 9fe9694 into main Aug 11, 2023
5 checks passed
@epiercehoffman epiercehoffman deleted the eph_rename_vcf_samples branch August 11, 2023 20:11