Improve Scramble accuracy for BWA and Dragen 3.7.8 #722

mwalker174 · 2024-09-09T16:45:26Z

Updates methods for Scramble calling. This includes several critical components:

Filtering of calls within reference MEIs overlapping indels and small deletion SVs. This was a major source of false positives for Scramble with WGS. The filter identifies such cases from the BAM/CRAM and Manta calls.
Improved SVLEN estimation. The previous method was overly simplistic; length estimation now accounts for the orientation of soft clips and the alignment strand. Note that accuracy remains greatly limited when split reads are found on only one side of the breakpoint.
Deduplication. Scramble seems to emit duplicated calls in many cases. Now calls within a short window are collapsed appropriately.
Dragen 3.7.8 realignment. An apparent regression was introduced somewhere after Dragen 3.4.12 and no later than 3.7.8, in which small indels cause erroneous soft-clipping. This caused a massive number of false positive calls with Scramble. GatherSampleEvidence now automatically detects from the CRAM header whether Dragen 3.7.8 was used (the is_dragen_3_7_8 flag can also be set explicitly). If so, it realigns all soft-clipped reads near Scramble calls using BWA and re-runs Scramble on the realigned reads. This step adds very little cost and eliminates the artifact.
Updates the overlap breakpoint filter in ResolveComplexVariants to prioritize Manta over Scramble. A major source of Scramble false positives was at deletion sites. This filter catches such cases missed in the raw call filter from part (1).
SVA precision was initially low even after implementation of the above filters. A hard filter has been added to ApplyManualVariantFilter to remove Scramble-only SVAs flagged as HIGH_SR_BACKGROUND. This raised SVA precision from ~0.65 to ~0.85 in our tests, with a tolerably small loss of sensitivity. In the future, improvements and/or retraining of the GQ Recalibrator model may obviate the need for this hard filter.
Secondary/supplementary reads are now filtered from consideration in the Scramble tool, since they were bringing in further false positives. The Dockerfile has been updated.
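To illustrate the deduplication item above, here is a minimal sketch of collapsing calls that fall within a short window of each other. It is not the code from this PR: the `Call` record, the `collapse_calls` name, the 50 bp default window, and the rule of keeping the best-supported call per cluster are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Call(NamedTuple):
    """Hypothetical minimal record for one MEI call."""
    chrom: str
    pos: int
    mei_type: str
    support: int  # e.g. number of supporting clipped reads


def collapse_calls(calls: List[Call], window: int = 50) -> List[Call]:
    """Collapse calls of the same type on the same contig that chain
    together within `window` bp, keeping the best-supported call."""
    # Sort so that candidates for the same cluster are adjacent.
    ordered = sorted(calls, key=lambda c: (c.chrom, c.mei_type, c.pos))
    kept: List[Call] = []
    last_pos = None  # position of the most recent call in the open cluster
    for call in ordered:
        same_cluster = (
            bool(kept)
            and last_pos is not None
            and kept[-1].chrom == call.chrom
            and kept[-1].mei_type == call.mei_type
            and call.pos - last_pos <= window
        )
        if same_cluster:
            # Keep whichever member of the cluster has more support.
            if call.support > kept[-1].support:
                kept[-1] = call
        else:
            kept.append(call)
        last_pos = call.pos
    # Return in coordinate order.
    return sorted(kept, key=lambda c: (c.chrom, c.pos))
```

Clustering here is single-linkage on sorted positions, so a run of nearby calls collapses into one even if its total span exceeds the window. A stricter variant could measure distance from the first call of each cluster instead.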

This PR also includes changes to the README to recommend Scramble for MEI calling and deprecate MELT support.

These changes were extensively tested and validated on CRAMs aligned with both BWA and Dragen 3.7.8, including a full run of 1000 Genomes Phase 3 on Dragen 3.7.8.
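The header-based Dragen 3.7.8 detection mentioned in the description could look roughly like the sketch below. This is an assumption-laden illustration, not the logic in GatherSampleEvidence: it assumes the aligner is recorded in a SAM `@PG` header line whose `PN` or `ID` field contains "dragen" and whose `VN` field contains the version string. The function names and the exact matching rule are hypothetical.

```python
import re
from typing import Dict, Iterable


def parse_pg_line(line: str) -> Dict[str, str]:
    """Split a tab-delimited @PG header line into its TAG:value fields."""
    fields: Dict[str, str] = {}
    for token in line.rstrip("\n").split("\t")[1:]:
        tag, sep, value = token.partition(":")
        if sep:
            fields[tag] = value
    return fields


def is_dragen_version(header_lines: Iterable[str], version: str = "3.7.8") -> bool:
    """Return True if any @PG line names Dragen with the given version.

    Assumption: the version appears in the VN field, possibly embedded in a
    longer build string, so it is matched as a substring not adjacent to digits.
    """
    pattern = re.compile(r"(?<![0-9])" + re.escape(version) + r"(?![0-9])")
    for line in header_lines:
        if not line.startswith("@PG"):
            continue
        fields = parse_pg_line(line)
        program = (fields.get("PN", "") + " " + fields.get("ID", "")).lower()
        if "dragen" in program and pattern.search(fields.get("VN", "")):
            return True
    return False
```

The header lines could come from `samtools view -H` on the CRAM. An explicit override such as the is_dragen_3_7_8 flag would simply bypass this check.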

parent 2b9af68 author Mark Walker <[email protected]> 1701883462 -0500 committer Mark Walker <[email protected]> 1718121961 -0400
parent ceb41fc author Mark Walker <[email protected]> 1701883462 -0500 committer Mark Walker <[email protected]> 1715005950 -0400
parent 9ad640d author Mark Walker <[email protected]> 1701883462 -0500 committer Mark Walker <[email protected]> 1704486807 -0500

Add scramble filter
Realignment
Add bwa to mini docker
Build bwa from git
Use /opt/bin
Start in /opt
Runtime parameters
Fix alignment command
bump memory
Make scramble vcf python script
Scramble make vcf task added
Fix realignment bai path
Missing sv_pipeline_docker input
Fix scramble table output
Buffered clustering
Fix reverses
Use samtools view -M; realign from Scramble table instead of vcf
Fix interval generation and only realign soft-clipped reads
Slop 1 base
Slop 150 bases
And merge
Change scramble priority in overlap bp filter; update dockers
Reduce realignment cores to 4
Sort scramble table
Fix wdl
Update json templates
Bump realignment memory to 12gb
Update script to filter on indels
Indel filtering switch
Update
Add MEI intervals and manta/wham filtering
Delete comments
Optimize mei tree loading; consume table directly
Add scramble_min_clipped_reads
Fix reads_index_
Update scramble vcf script args
Add missing vcf script input in GatherSampleEvidence
LINE1 to L1 alt
Give preference to MEI over Manta insertions in breakpoint overlap filter
Update dockers with last build
Update dockers
Reduce memory usage in RealignSoftClippedReads
Bump realignment memory
Add manta_vcf_input in case manta isn't run but scramble is
Increase min_clipped_reads to 5
Bump memory
Update scramble commit
Bump MakeScrambleVcf memory to 4
Reduce min reads to 3
Add mei tree padding and alignment_score_cutoff
Select alignment score cutoff based on aligner
Update dockers
Remove debugging lines
Add scramble_mei_bed
Add scramble_mei_bed to test template
Bump ScramblePart1 to 3 GB
Adjust scramble clipped reads threshold based on coverage
Top level scramble_min_clipped_reads_fraction optional
Update dockers
Add mw_scramble_filter to ResolveComplexVariants in dockstore.yml
Update dockers
Update dockstore yml
Final touches
Update templates; add SVA filter
Revert unnecessary changes
Lint
Update readme
Update single sample pipeline and gathersampleevidence batch
Update readme
Readme grammar
Scramble intermedates description in README
Add comments to scramble vcf script
Minor fix

mwalker174 added 5 commits September 9, 2024 12:17

Fix GatherSampleEvidence optional comparison

8a44017

Fix double declaration of mei_bed in GATKSVPipelineSingleSample

8a63c4f

Update Scramble commit to latest master

0520d12

Update ref panel vcfs

767e8a2

mwalker174 mentioned this pull request Sep 30, 2024

Add genotype filtering Terra workflow configs and documentation #695

Open
