Add TSV rename map option to RenameVcfSamples #475
wdl/RenameVcfSamples.wdl
Outdated
if ~{check_rename_all_samples}; then
  python /opt/sv-pipeline/scripts/vcf_replacement_samples.py --vcf ~{vcf} --dict ~{sample_id_rename_map} > reheader.list
fi
I'd feel better requiring this check. It should be fast, no?
Currently the default behavior is to perform the check (check_rename_all_samples = true). However, if the map renames only a subset of samples (rather than providing an old and new sample ID for every sample in the VCF), the check would fail, so I wanted to provide a way to turn it off in that situation. Does that seem reasonable?
What's the behavior of bcftools when a sample name is invalid? I think it would be better to either rely on bcftools catching such errors or, if it doesn't, use a simple bash command to do the check rather than a script.
bcftools does not raise an error when there is a sample ID in the renaming map that doesn't appear in the header.
I didn't see a good way to do the check in bash, but I replaced the script with a very short inline python block (the script wrote an output file that is no longer needed). By default the check is enabled, so it will catch incorrect IDs when it checks that every sample in the header appears in the renaming map. If a user wishes to rename only a subset of samples, they can either 1) disable the check or 2) include all samples in the renaming map anyway.
I tested this with incorrect sample IDs (failed as expected when check=true), missing sample IDs (failed as expected when check=true and succeeds as expected when check=false), and extra sample IDs (succeeds as expected) in the renaming map.
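For readers following along, here is a minimal sketch of the kind of check described above, not the actual inline block from the PR. It assumes the rename map is a two-column TSV (old ID, new ID) and that the header sample IDs have already been collected into a list, for example from `bcftools query -l`. The function names `read_rename_map`, `missing_from_map`, and `require_all_samples_mapped` are hypothetical.

```python
import csv
import io


def read_rename_map(handle):
    """Parse a two-column TSV (old ID, new ID) into a dict. Assumed format."""
    rename_map = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"expected 2 columns, got {len(row)}: {row}")
        old_id, new_id = row
        rename_map[old_id] = new_id
    return rename_map


def missing_from_map(header_samples, rename_map):
    """Return header samples with no entry in the rename map, in header order."""
    return [s for s in header_samples if s not in rename_map]


def require_all_samples_mapped(header_samples, rename_map):
    """Raise if any header sample lacks a rename entry (hypothetical helper)."""
    missing = missing_from_map(header_samples, rename_map)
    if missing:
        raise ValueError("samples missing from rename map: " + ", ".join(missing))


# Hypothetical usage: IDs below are made up for illustration.
header = ["s1", "s2"]
mapping = read_rename_map(io.StringIO("s1\tn1\ns2\tn2\nextra\tn3\n"))
require_all_samples_mapped(header, mapping)  # extra IDs in the map are tolerated
```

Because the check runs from the header's side, a misspelled old ID in the map surfaces as a header sample with no entry, while extra IDs in the map pass silently, which matches the three test cases listed above.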
Thanks! One last minor suggestion here.
{
  "RenameVcfSamples.vcf": {{ test_batch.clean_vcf | tojson }},
  "RenameVcfSamples.prefix": {{ test_batch.name | tojson }},
  "RenameVcfSamples.current_sample_ids": [
Just thinking about this again. In order to be consistent with other templates we should use {{ test_batch.samples }} here and create a new {{ test_batch.alternate_ids }} or the like for the new IDs.
f73779d to dbdfb59
Updates
Revamp RenameVcfSamples
Testing