Improve Scramble accuracy for BWA and Dragen 3.7.8 #722

Open
wants to merge 5 commits into
base: main

mwalker174
Collaborator

Updates methods for Scramble calling. This includes several critical components:

  1. Filtering of calls that lie within reference MEIs and overlap indels or small deletion SVs. These were a major source of Scramble false positives in WGS data. The filter identifies such cases using the BAM/CRAM and the Manta calls.
  2. Improved SVLEN estimation. The previous method was overly simplistic; length estimation now accounts for the orientation of soft clips and the alignment strand. Note that accuracy is still greatly limited when split reads are found on only one side of the breakpoint.
  3. Deduplication. Scramble appears to emit duplicate calls in many cases. Calls within a short window are now collapsed into a single call.
  4. Dragen 3.7.8 realignment. An apparent regression, introduced somewhere after Dragen 3.4.12 and no later than 3.7.8, causes small indels to produce erroneous soft-clipping. This led to a massive number of false positive Scramble calls. GatherSampleEvidence now automatically detects from the CRAM header whether Dragen 3.7.8 was used (the is_dragen_3_7_8 flag can also be set explicitly). If so, it realigns all soft-clipped reads near Scramble calls with BWA and re-runs Scramble on the realigned reads. This step adds very little cost and eliminates the artifact.
  5. Updates the overlap breakpoint filter in ResolveComplexVariants to prioritize Manta over Scramble. A major source of Scramble false positives was at deletion sites. This filter catches such cases that are missed by the raw call filter in item 1.
  6. SVA precision was initially low even after implementation of the above filters. A hard filter has been added to ApplyManualVariantFilter to remove Scramble-only SVAs flagged as HIGH_SR_BACKGROUND. This raised SVA precision from ~0.65 to ~0.85 in our tests, with a tolerably small loss of sensitivity. In the future, improvements and/or retraining of the GQ Recalibrator model may obviate the need for this hard filter.
  7. Filter secondary/supplementary reads from consideration in the Scramble tool. These reads were introducing further false positives. The Dockerfile has been updated accordingly.
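
For reviewers, the deduplication in item 3 can be pictured roughly as follows. This is only a minimal sketch, not the code in this PR: the `Call` record, the 10 bp `window` default, and the rule of keeping the call with the most clipped reads are all assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple, Tuple


class Call(NamedTuple):
    """Hypothetical record for one raw Scramble call."""
    chrom: str
    pos: int
    mei_type: str       # e.g. "ALU", "L1", "SVA"
    clipped_reads: int  # supporting soft-clipped reads


def collapse_calls(calls: List[Call], window: int = 10) -> List[Call]:
    """Collapse same-type calls on one contig that lie within `window` bp
    of the previous call in their cluster, keeping the best-supported one.
    The window size and tie-breaking rule are illustrative assumptions."""
    kept: List[Call] = []
    # (chrom, mei_type) -> (index of cluster representative, last position seen)
    last_seen: Dict[Tuple[str, str], Tuple[int, int]] = {}
    for call in sorted(calls, key=lambda c: (c.chrom, c.pos)):
        key = (call.chrom, call.mei_type)
        if key in last_seen and call.pos - last_seen[key][1] <= window:
            idx = last_seen[key][0]
            # Replace the representative only if this call has more support.
            if call.clipped_reads > kept[idx].clipped_reads:
                kept[idx] = call
            last_seen[key] = (idx, call.pos)
        else:
            kept.append(call)
            last_seen[key] = (len(kept) - 1, call.pos)
    return sorted(kept, key=lambda c: (c.chrom, c.pos))
```

Calls of different MEI types at nearby positions are deliberately left separate in this sketch.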
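
The aligner detection in item 4 could be sketched along these lines. This is an illustration under stated assumptions rather than the check used in GatherSampleEvidence: it assumes the header text has already been extracted (for example with `samtools view -H`), and the matching rule (a `@PG` line whose `ID` or `PN` mentions DRAGEN and whose `VN` ends in `3.7.8`) is a guess at what such a header looks like. The function name and `override` parameter are hypothetical; `override` stands in for the explicit is_dragen_3_7_8 flag.

```python
def header_indicates_dragen_3_7_8(header_text: str, override: bool = False) -> bool:
    """Return True if the SAM/CRAM header text suggests Dragen 3.7.8.

    `override` mimics setting the is_dragen_3_7_8 flag explicitly.
    The @PG matching rule below is an assumption, not the PR's logic.
    """
    if override:
        return True
    for line in header_text.splitlines():
        if not line.startswith("@PG"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        program = (tags.get("ID", "") + " " + tags.get("PN", "")).upper()
        version = tags.get("VN", "").strip()
        if "DRAGEN" in program and (version == "3.7.8" or version.endswith(".3.7.8")):
            return True
    return False
```

Keeping the explicit override is useful when the header has been rewritten or stripped by upstream processing.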
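
The read filter in item 7 amounts to a test on the SAM FLAG field, where bit 0x100 marks a secondary alignment and bit 0x800 a supplementary one. The Python below is only a sketch of that test; the actual change lives in the Scramble tool itself, and the function name is hypothetical.

```python
SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment


def is_primary_alignment(flag: int) -> bool:
    """Return True if neither the secondary nor the supplementary bit is set."""
    return (flag & (SECONDARY | SUPPLEMENTARY)) == 0
```

On the command line, `samtools view -F 0x900` applies the same exclusion.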

This PR also includes changes to the README to recommend Scramble for MEI calling and deprecate MELT support.

These changes were extensively tested and validated on CRAMs aligned with both BWA and Dragen 3.7.8, including a full run of 1000 Genomes Phase 3 on 3.7.8.

parent 2b9af68
author Mark Walker <[email protected]> 1701883462 -0500
committer Mark Walker <[email protected]> 1718121961 -0400

parent ceb41fc
author Mark Walker <[email protected]> 1701883462 -0500
committer Mark Walker <[email protected]> 1715005950 -0400

parent 9ad640d
author Mark Walker <[email protected]> 1701883462 -0500
committer Mark Walker <[email protected]> 1704486807 -0500

Add scramble filter

Realignment

Add bwa to mini docker

Build bwa from git

Use /opt/bin

Start in /opt

Runtime parameters

Fix alignment command

bump memory

Make scramble vcf python script

Scramble make vcf task added

Fix realignment bai path

Missing sv_pipeline_docker input

Fix scramble table output

Buffered clustering

Fix reverses

Use samtools view -M; realign from Scramble table instead of vcf

Fix interval generation and only realign soft-clipped reads

Slop 1 base

Slop 150 bases

And merge

Change scramble priority in overlap bp filter; update dockers

Reduce realignment cores to 4

Sort scramble table

Fix wdl

Update json templates

Bump realignment memory to 12gb

Update script to filter on indels

Indel filtering switch

Update

Add MEI intervals and manta/wham filtering

Delete comments

Optimize mei tree loading; consume table directly

Add scramble_min_clipped_reads

Fix reads_index_

Update scramble vcf script args

Add missing vcf script input in GatherSampleEvidence

LINE1 to L1 alt

Give preference to MEI over Manta insertions in breakpoint overlap filter

Update dockers with last build

Update dockers

Reduce memory usage in RealignSoftClippedReads

Bump realignment memory

Add manta_vcf_input in case manta isn't run but scramble is

Increase min_clipped_reads to 5

Bump memory

Update scramble commit

Bump MakeScrambleVcf memory to 4

Reduce min reads to 3

Add mei tree padding and alignment_score_cutoff

Select alignment score cutoff based on aligner

Update dockers

Remove debugging lines

Add scramble_mei_bed

Add scramble_mei_bed to test template

Bump ScramblePart1 to 3 GB

Adjust scramble clipped reads threshold based on coverage

Top level scramble_min_clipped_reads_fraction optional

Update dockers

Add mw_scramble_filter to ResolveComplexVariants in dockstore.yml

Update dockers

Update dockstore yml

Final touches

Update templates; add SVA filter

Revert unnecessary changes

Lint

Update readme

Update single sample pipeline and gathersampleevidence batch

Update readme

Readme grammar

Scramble intermediates description in README

Add comments to scramble vcf script

Minor fix