Improve Scramble accuracy for BWA and Dragen 3.7.8 #722
Updates methods for Scramble calling. This includes several critical components:

1. `GatherSampleEvidence` now automatically detects from the CRAM header whether Dragen 3.7.8 was used (or treats the input as Dragen 3.7.8 if the `is_dragen_3_7_8` flag is set explicitly). If so, it realigns all soft-clipped reads near Scramble calls with BWA and re-runs Scramble on the realigned reads. This adds very little cost and eliminates this artifact.
2. `ResolveComplexVariants` now prioritizes Manta over Scramble. Deletion sites were a major source of Scramble false positives, and this filter catches such cases missed by the raw call filter from part (1).
3. `ApplyManualVariantFilter` now removes Scramble-only SVAs flagged as `HIGH_SR_BACKGROUND`. In our tests this raised SVA precision from ~0.65 to ~0.85, with a tolerably small loss of sensitivity. In the future, improvements to and/or retraining of the GQ Recalibrator model may make this hard filter unnecessary.

This PR also includes changes to the README to recommend Scramble for MEI calling and deprecate MELT support.
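For illustration only, the header-based Dragen 3.7.8 detection in item 1 could look roughly like the minimal Python sketch below. It is not the pipeline's implementation. It assumes that Dragen records itself in an `@PG` header line whose `ID` or `PN` contains "DRAGEN" and whose `VN` ends in `3.7.8`, which is a guess at the header layout. The functions `header_is_dragen_3_7_8` and `needs_realignment` are hypothetical names. The sketch also assumes that an explicitly set flag takes precedence over header detection.

```python
import re
from typing import Optional

# Assumption: a version string like "05.021.604.3.7.8" identifies Dragen 3.7.8.
_DRAGEN_3_7_8_VN = re.compile(r"(?:^|\.)3\.7\.8$")


def header_is_dragen_3_7_8(header_text: str) -> bool:
    """Hypothetical check: scan @PG lines for a DRAGEN program with VN ending in 3.7.8."""
    for line in header_text.splitlines():
        if not line.startswith("@PG"):
            continue
        # SAM header fields are tab-separated TAG:VALUE pairs after the record type.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        program = (fields.get("ID", "") + " " + fields.get("PN", "")).upper()
        if "DRAGEN" in program and _DRAGEN_3_7_8_VN.search(fields.get("VN", "")):
            return True
    return False


def needs_realignment(header_text: str, is_dragen_3_7_8: Optional[bool] = None) -> bool:
    """Assumption: an explicitly set flag overrides header detection."""
    if is_dragen_3_7_8 is not None:
        return is_dragen_3_7_8
    return header_is_dragen_3_7_8(header_text)
```

For example, calling `needs_realignment("@PG\tID:bwa\tPN:bwa")` with no flag returns `False`, so the BWA realignment and Scramble re-run would be skipped for that input.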
These changes were extensively tested and validated on CRAMs aligned with both BWA and Dragen 3.7.8, including a full run of 1000 Genomes Phase 3 aligned with Dragen 3.7.8.
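The hard filter in item 3 can be illustrated with the minimal Python sketch below. It runs over a simplified, hypothetical record type. It is not how `ApplyManualVariantFilter` is actually implemented. The sketch makes three assumptions. First, SVAs are identified by an ALT of `<INS:ME:SVA>`. Second, "Scramble-only" means the supporting-algorithm set is exactly `{"scramble"}`. Third, `HIGH_SR_BACKGROUND` is available as a flag on the record, whether it comes from INFO or FILTER in the real VCF. `SVRecord` and `drop_scramble_only_high_sr_sva` are illustrative names.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class SVRecord:
    # Hypothetical, simplified stand-in for a VCF record.
    vid: str
    alt: str                    # e.g. "<INS:ME:SVA>" (assumed ALT encoding)
    algorithms: FrozenSet[str]  # callers supporting the variant
    flags: FrozenSet[str]       # e.g. {"HIGH_SR_BACKGROUND"}


def drop_scramble_only_high_sr_sva(records: Iterable[SVRecord]) -> List[SVRecord]:
    """Remove SVAs supported only by Scramble that are flagged HIGH_SR_BACKGROUND."""
    kept = []
    for rec in records:
        is_sva = rec.alt == "<INS:ME:SVA>"
        scramble_only = rec.algorithms == frozenset({"scramble"})
        if is_sva and scramble_only and "HIGH_SR_BACKGROUND" in rec.flags:
            continue  # hard-filtered
        kept.append(rec)
    return kept
```

Under these assumptions, an SVA that Manta also supports, a Scramble-only ALU, and an unflagged Scramble-only SVA all pass through. Only the flagged, Scramble-only SVA is removed, which matches the narrow scope described in item 3.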