Add SVStratify and GroupedSVCluster tools #8990

Open: mwalker174 wants to merge 2 commits into master

mwalker174 (Contributor)

Implements two new tools and updates some methods for a revamp of the CombineBatches cross-batch integration module in gatk-sv.

  • SVStratify - tool for splitting a VCF by variant class. Users pass in a configuration table (see tool documentation for an example) specifying one or more stratification groups defined by SVTYPE, SVLEN range, and reference context(s). Contexts are specified as a set of interval lists using the --context-name and --context-intervals arguments. Each variant is matched to its group, which is annotated in the STRAT INFO field. Optionally, the output can be split into multiple VCFs by group, which currently can't be done efficiently with common commands/toolkits.
  • GroupedSVCluster - a hybrid tool combining functionality from SVStratify and SVCluster to perform intra-stratum clustering. This tool is critical for fine-tuned clustering of specific variant types within certain reference contexts. For example, small variants in simple repeats tend to have lower breakpoint accuracy and are typically "reclustered" with looser clustering criteria during call set refinement.
  • SVStratificationEngine - new class for performing stratification.
  • Updates to breakpoint refinement in CanonicalSVCollapser that should improve breakpoint accuracy, particularly in larger call sets. Raw evidence support and variant quality are now considered when choosing a representative breakpoint for a group of clustered SVs.
  • Added FlagFieldLogic type for customizing how the BOTHSIDES_SUPPORT and HIGH_SR_BACKGROUND INFO flags are collapsed during clustering.
  • RD_CN is now used as a backup if CN is not available when determining carrier status for sample overlap.
  • Removed no-sort option in favor of spooled sorting.
  • Bug fix: support for empty EVIDENCE INFO fields.
  • Bug fix in one of the JointGermlineCnvDefragmenter tests.
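
For readers unfamiliar with the stratification step, here is a minimal, self-contained sketch of how a variant might be matched against groups defined by SVTYPE, SVLEN range, and reference context. It is not the SVStratificationEngine code from this PR. The class and record names, the half-open length range, and the rule that both breakpoints must fall inside a context interval are all assumptions made for illustration, and the sketch simply returns every matching group rather than reproducing how the tool resolves multiple matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class StratifySketch {

    /** Hypothetical half-open interval [start, end) on one contig, standing in for an interval-list entry. */
    public record Interval(String contig, int start, int end) {
        boolean contains(String c, int pos) {
            return contig.equals(c) && pos >= start && pos < end;
        }
    }

    /** Hypothetical stratification group: SVTYPE, SVLEN in [minLen, maxLen), and zero or more context names. */
    public record Stratum(String name, String svType, int minLen, int maxLen, Set<String> contexts) {}

    /** Only the variant fields needed for matching. */
    public record Variant(String contig, int start, int end, String svType, int svLen) {}

    /** Assumption: a variant lies in a context when both breakpoints fall inside some interval of that context. */
    static boolean inContext(Variant v, List<Interval> intervals) {
        boolean startHit = false;
        boolean endHit = false;
        for (Interval i : intervals) {
            startHit |= i.contains(v.contig(), v.start());
            endHit |= i.contains(v.contig(), v.end());
        }
        return startHit && endHit;
    }

    /** Returns the names of all groups whose type, length range, and context criteria the variant satisfies. */
    public static List<String> matchStrata(Variant v, List<Stratum> strata,
                                           Map<String, List<Interval>> contextIntervals) {
        List<String> matched = new ArrayList<>();
        for (Stratum s : strata) {
            if (!s.svType().equals(v.svType())) continue;
            if (v.svLen() < s.minLen() || v.svLen() >= s.maxLen()) continue;
            // A group with no contexts matches on type and length alone.
            boolean contextOk = s.contexts().isEmpty();
            for (String ctx : s.contexts()) {
                if (inContext(v, contextIntervals.getOrDefault(ctx, List.of()))) {
                    contextOk = true;
                    break;
                }
            }
            if (contextOk) matched.add(s.name());
        }
        return matched;
    }

    public static void main(String[] args) {
        Map<String, List<Interval>> contexts =
                Map.of("simple_repeat", List.of(new Interval("chr1", 1000, 2000)));
        List<Stratum> strata = List.of(
                new Stratum("small_del_sr", "DEL", 0, 5000, Set.of("simple_repeat")),
                new Stratum("del_any", "DEL", 0, Integer.MAX_VALUE, Set.of()));
        Variant v = new Variant("chr1", 1100, 1400, "DEL", 300);
        // In the real tool the matched group name would be written to the STRAT INFO field.
        System.out.println(matchStrata(v, strata, contexts));
    }
}
```

In this sketch a group that lists several contexts matches if the variant lies in any one of them, which is one plausible reading of "multiple contexts per stratum"; the tool documentation is the authority on the actual overlap criteria.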

 This is a combination of 2 commits.

Implement SVStratify

Interchrom events

Start unit tests

Finish unit tests

Multiple contexts per stratum

Integration tests; fix empty context bug

Test duplicate context name

Handle CPX/CTX, add some tests

Compiler error

Prioritize SR and PE evidence types for representative breakpoint strategy

Add SVStratifiedCluster

Integration tests for SVStratifiedClustering

Unused line

Spooled sorting in cluster tools

Start fixing JointGCNVSegmentation

Fix JointGCNVSegmentation integration test

Rename to GroupedSVCluster

Fix sorting bug

Comment

Use RD_CN for CNV sample overlap; improve testing

Documentation

Add comment about requiresOverlapAndProximity

Allow empty EVIDENCE list

Add STRAT INFO field

Representative breakpoint by variant quality

Handle BOTHSIDES_SUPPORT and HIGH_SR_BACKGROUND in variant collapsing

Add expected exception