Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds VcfComparator tool #8933

Merged
merged 17 commits into from
Aug 6, 2024
Merged

Adds VcfComparator tool #8933

merged 17 commits into from
Aug 6, 2024

Conversation

meganshand
Copy link
Contributor

Tool is useful for WARP testing so that expected differences can be removed from the check.

@gatk-bot
Copy link

gatk-bot commented Jul 30, 2024

Github actions tests reported job failures from actions build 10169605080
Failures in the following jobs:

Test Type JDK Job ID Logs
unit 17.0.6+10 10169605080.12 logs
unit 17.0.6+10 10169605080.1 logs

Copy link
Contributor

@ldgauthier ldgauthier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code I gave you to start with is very much gradware that we used to be able to hide in private classes in GATK3, but that era is over. Maybe @droazen will let us create a different walker category to hide it in?

@@ -469,6 +469,13 @@ static List<Genotype> cleanupGenotypeAnnotations(final VariantContext vc, final
builder.AD(AD);
}

//convert GQ0s that were reblocked back to no-calls for better AN and InbreedingCoeff annotations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like there shouldn't be any edits in this class on this branch if the rebase went properly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

*
* Client tools must implement apply(List<VariantContext> variantContexts, ReferenceContext referenceContext)
*/
public abstract class MultiVariantWalkerGroupedByOverlap extends MultiVariantWalker {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not gonna lie, I have no memory of where this class comes from. Apparently this was two years ago, so maybe it shouldn't be a surprise. I think it's a modified version of MultiVariantWalkerGroupedOnStart. I suspect we needed it because I wanted the star alleles to be returned with their deletions? I just talked myself into keeping this, although we probably don't need it for this exact implementation if you turned off the spandel checks.

@@ -83,6 +85,16 @@ public static boolean isAlleleSpecific(final VariantAnnotation annotation) {
return annotation instanceof AlleleSpecificAnnotation;
}

/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be refactored to use the method below: isAlleleSpecificGatkKey

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

@Test(dataProvider = "getTestFiles")
public void testAnnotationDifferences(String args, String expectedWarnings) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test that puts a huge delta on the VQSLOD (such that -4.8149 and -3.2178 from the existing test are acceptable)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see an argument to adjust that threshold for VQSLOD currently (am I just missing it?). I see tolerances for other attributes though. Should I add one for VQSLOD and AS_VQSLOD?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if that arg isn't there then we don't need to test it. :-/ I mostly want to demonstrate usage of the inexact match that's already there. I guess you could manually edit a QUAL score.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha, I added a QUAL test

}
}

private void validateSingleSampleDeletions(final VariantContext vc, final VariantContext match,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is specific to a particular GATK version update, but I can't remember which one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would imagine that future updates will need adjustments to this tool to account for expected differences. Are we hoping that those changes get committed into master each time? Or is it ok to leave in some of these examples for future users to model their own after?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine leaving old stuff, but it would probably be good to commit whatever we end up using for WARP test comparisons

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. I'll add a note about this to the "how to update gatk in warp" doc.

throw new UserException("Alleles are mismatched at " + actual.getContig() + ":" + actual.getStart() + ": actual has "
+ actual.getAlternateAlleles() + " and expected has " + expected.getAlternateAlleles());

} else if (allowNewStars && hasNewStar(actual, expected)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should be !allowNewStars...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is hard to write a test for without some example input, but I agree with you so I changed it here. If you know of an example VCF with some changed star alleles let me know.

continue;
}
final Object actualValue = actual.get(key);
if (expectedValue instanceof List && actualValue instanceof List || key.equals(GATKVCFConstants.AS_SB_TABLE_KEY)) { //|| (key.contains("RAW") joint in OR with AS_SB_TABLE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember what this comment means -- we can probably remove it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

for (int i = 0; i < iterationEnd; i++) {
if (i >= actualList.size() || i >= expectedList.size()) {
//TODO: the toStrings here don't do a great job
//throw makeVariantExceptionFromDifference(key, actualValue.toString(), expectedValue.toString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this should throw, but maybe I commented it out for testing for some reason...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done (for now at least)

} else {
alleleBalance = ad[1] / ((double) ad[0] + ad[1]);
}
if (alleleBalance >= 0.2 && alleleBalance <= 0.8) { //TODO: this isn't super robust
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why this TODO. This is what the gnomAD criteria state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

@droazen
Copy link
Collaborator

droazen commented Jul 31, 2024

@ldgauthier Well, there is always the @ExperimentalFeature tag if you think it's appropriate in this case. But if @meganshand did a pass to clean up / refactor the old code, @BetaFeature might be the right label....

@meganshand
Copy link
Contributor Author

@ldgauthier back to you!

Copy link
Contributor

@ldgauthier ldgauthier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine leaving this as an ExperimentalFeature with no refactor. I worry a little about there being logic in there to ignore certain differences by default. If you could just do a quick skim with that in mind the I think this is good enough to merge for now.

}

@Test(dataProvider = "getTestFiles")
public void testAnnotationDifferences(String args, String expectedWarnings) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if that arg isn't there then we don't need to test it. :-/ I mostly want to demonstrate usage of the inexact match that's already there. I guess you could manually edit a QUAL score.

}
}

private void validateSingleSampleDeletions(final VariantContext vc, final VariantContext match,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine leaving old stuff, but it would probably be good to commit whatever we end up using for WARP test comparisons

Copy link
Contributor Author

@meganshand meganshand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked in WARP and everywhere that we call this tool we also check that the VCFs are identical with diff (after removing the header). So I think it's ok if we're skipping some differences by default with this tool since the raw diff will catch them. I also skimmed the tool to see what we're skipping and other than some of the defaults for QUAL and inbreedingCoeff, I think most checks are fairly robust.

}

@Test(dataProvider = "getTestFiles")
public void testAnnotationDifferences(String args, String expectedWarnings) throws IOException {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha, I added a QUAL test

}
}

private void validateSingleSampleDeletions(final VariantContext vc, final VariantContext match,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. I'll add a note about this to the "how to update gatk in warp" doc.

@meganshand meganshand merged commit 096be07 into master Aug 6, 2024
20 checks passed
@meganshand meganshand deleted the ms_ldg_VCFcomparator branch August 6, 2024 18:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants