Add SVStratify and GroupedSVCluster tools #8990
Implements two new tools and updates some methods for a revamp of the `CombineBatches` cross-batch integration module in gatk-sv.

- `SVStratify` - tool for splitting out a VCF by variant class. Users pass in a configuration table (see tool documentation for an example) specifying one or more stratification groups classified by `SVTYPE`, `SVLEN` range, and reference context(s). The latter are specified as a set of interval lists using the `--context-name` and `--context-intervals` arguments. All variants are matched with their respective group, which is annotated in the `STRAT` INFO field. Optionally, the output can be split into multiple VCFs by group, which is very useful functionality that currently can't be done efficiently with common commands/toolkits.
- `GroupedSVCluster` - a hybrid tool combining functionality from `SVStratify` with `SVCluster` to perform intra-stratum clustering. This tool is critical for fine-tuned clustering of specific variant types within certain reference contexts. For example, small variants in simple repeats tend to have lower breakpoint accuracy and are typically "reclustered" during call set refinement with looser clustering criteria.
- `SVStratificationEngine` - new class for performing stratification.
- Updates to `CanonicalSVCollapser` that should improve breakpoint accuracy, particularly in larger call sets. Raw evidence support and variant quality are now considered when choosing a representative breakpoint for a group of clustered SVs.
- `FlagFieldLogic` type for customizing how the `BOTHSIDE_PASS` and `HIGH_SR_BACKGROUND` INFO flags are collapsed during clustering.
- `RD_CN` is now used as a backup if `CN` is not available when determining carrier status for sample overlap.