GVS walker to master [VS-964] #8355
6678efe to 87a2a12
Github actions tests reported job failures from actions build 5405388563
87a2a12 to ec0ccd8
* Fix a bug where there were NO variants in a range.
ec0ccd8 to fa74fc7
This looks just about good to me. I left a couple comments, mostly questions and some very minor changes. Beyond those, I have another question and a couple more suggestions:
- Codecov claims a lot of this code doesn't have tests. Am I correct to assume that is because it mostly relies on BigQuery and is therefore prohibitively difficult to write tests for?
- The code is mostly stylistically good, but there are a lot of places where variables, especially method parameters, that should be final are not, so it would be good to do a quick pass and update those if possible.
- Most of this code could benefit from more documentation. Some of it is very well documented, but some of the files don't seem to have much at all. It would be good to at least have doc comments for the public classes and any public attributes and methods that are not immediately self-explanatory by their names.
Thank you for doing the work to get this merged into master.
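As a minimal sketch of the last two suggestions, a helper with `final` parameters and a doc comment might look like the following. The class name, method name, and parameters are hypothetical and are not taken from this PR.

```java
/** Illustrative only: this class and its method are hypothetical, not part of the PR. */
final class FinalParameterSketch {

    /**
     * Builds a fully qualified table name of the form project.dataset.table.
     *
     * @param projectId hypothetical project identifier
     * @param datasetId hypothetical dataset identifier
     * @param tableName hypothetical table name
     * @return the three parts joined with dots
     */
    static String qualifiedTableName(final String projectId,
                                     final String datasetId,
                                     final String tableName) {
        // Parameters are final, so they cannot be reassigned inside the method.
        return projectId + "." + datasetId + "." + tableName;
    }

    public static void main(final String[] args) {
        System.out.println(qualifiedTableName("my-project", "my_dataset", "my_table"));
    }
}
```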
@Argument(
    fullName = "dataset-id",
    doc = "ID of the Google Cloud dataset to use when executing queries",
    optional = true // I guess, but won't it break otherwise or require that a dataset be created with the name temp_tables?
Should this be optional or required?
@Argument(
    fullName = "sample-file",
    doc = "Alternative to `sample-table`. Pass in a (sample_id,sample_name) CSV that describes the full list of samples to extract. No header",
    optional = true,
    mutex={"sample-table"}


)
Suggested change:
@Argument(
    fullName = "sample-file",
    doc = "Alternative to `sample-table`. Pass in a (sample_id,sample_name) CSV that describes the full list of samples to extract. No header",
    optional = true,
    mutex={"sample-table"}
)
@Argument(
    fullName = "print-debug-information",
    doc = "If true, print extra debugging output",
    optional = true)
Suggested change:
@Argument(
    fullName = "print-debug-information",
    doc = "If true, print extra debugging output",
    optional = true
)
//TODO verify what we really need here
annotationEngine = new VariantAnnotatorEngine(makeVariantAnnotations(), null, Collections.emptyList(), false, false);
What do we really need here?
// We want to find the tranche with the smallest target_truth_sensitivity that is
// equal to or greater than our truthSensitivityThreshold.
// e.g. if truthSensitivitySNPThreshold is 99.8 and we have tranches with target_truth_sensitivities
// of 99.5, 99.7, 99.9, and 100.0, we want the 99.9 sensitivity tranche.
Would it make sense to move this to a javadoc comment for this method?
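For illustration, the selection rule described in that comment could be written with a javadoc along these lines. This is only a sketch: the class name, method name, and the use of a plain list of sensitivities are assumptions, and the actual method in the PR is not shown here.

```java
import java.util.List;
import java.util.OptionalDouble;

/** Illustrative only: the names in this class are hypothetical, not taken from the PR. */
final class TrancheSelectionSketch {

    /**
     * Finds the smallest target truth sensitivity that is equal to or greater than the threshold.
     * For example, with a threshold of 99.8 and tranches of 99.5, 99.7, 99.9, and 100.0,
     * the 99.9 tranche is selected.
     *
     * @param trancheSensitivities target truth sensitivities of the available tranches
     * @param threshold the truth sensitivity threshold
     * @return the selected sensitivity, or empty if every tranche is below the threshold
     */
    static OptionalDouble smallestTrancheAtOrAbove(final List<Double> trancheSensitivities,
                                                   final double threshold) {
        return trancheSensitivities.stream()
                .mapToDouble(Double::doubleValue)
                .filter(sensitivity -> sensitivity >= threshold) // drop tranches below the threshold
                .min();                                          // keep the smallest remaining one
    }

    public static void main(final String[] args) {
        final List<Double> tranches = List.of(99.5, 99.7, 99.9, 100.0);
        System.out.println(smallestTrancheAtOrAbove(tranches, 99.8));
    }
}
```

Returning an empty result when no tranche qualifies is a design choice made for this sketch; the real code may handle that case differently.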
@Argument(
    fullName = "vet-avro-file-name",
    doc = "Path to data from Vet table in Avro format",
    optional = true
)
private GATKPath vetAvroFileName = null;

@Argument(
    fullName = "ref-ranges-avro-file-name",
    doc = "Path to data from Vet table in Avro format",
    optional = true
)
private GATKPath refRangesAvroFileName = null;
These have the same description so is one of them inaccurate?
/**
 * Enforce that if cost information is being recorded to the cost-observability-tablename then *all* recorded
 * parameters are set
 */
This comment seems to not fully describe what this method does.
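For reference, the rule that the doc comment does state (all recorded parameters must be set whenever a cost table is given) could be sketched as below. This covers only the documented part; whatever else the real method does is not visible here, and the class, method, and parameter names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative only: the names in this class are hypothetical, not taken from the PR. */
final class CostParameterCheckSketch {

    /**
     * Returns the names of recorded parameters that are unset (null) when a cost table is given.
     * If no cost table is given, nothing is enforced and the result is empty.
     *
     * @param costTableName name of the cost observability table, or null if cost is not recorded
     * @param recordedParameters parameter name mapped to its value (null means unset)
     * @return names of the missing parameters, in the map's iteration order
     */
    static List<String> missingCostParameters(final String costTableName,
                                              final Map<String, String> recordedParameters) {
        final List<String> missing = new ArrayList<>();
        if (costTableName == null) {
            return missing; // cost recording is disabled, so there is nothing to check
        }
        for (final Map.Entry<String, String> entry : recordedParameters.entrySet()) {
            if (entry.getValue() == null) {
                missing.add(entry.getKey());
            }
        }
        return missing;
    }
}
```

A caller could throw a user-facing exception listing the returned names when the list is not empty.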
// if there is an avro file, the BQ specific parameters are unnecessary,
// but they all are required if there is no avro file
// KCIBUL: revisit!!!
// if ((cohortAvroFileName == null && vetAvroFileName == null && refRangesAvroFileName == null) && (projectID == null || (cohortTable == null && vetRangesFQDataSet == null))) {
// throw new UserException("Project id (--project-id) and cohort table (--cohort-extract-table) are required " +
// "if no avro file (--cohort-avro-file-name or --vet-avro-file-name and --ref-ranges-avro-file-name) is provided.");
// }
Should this be removed?
// COHORT_FIELDS
// public static final ImmutableSet<String> COHORT_FIELDS = ImmutableSet.of(
// SchemaUtils.LOCATION_FIELD_NAME,
// SchemaUtils.SAMPLE_ID_FIELD_NAME,
// SchemaUtils.STATE_FIELD_NAME,
// SchemaUtils.REF_ALLELE_FIELD_NAME,
// SchemaUtils.ALT_ALLELE_FIELD_NAME,
// SchemaUtils.CALL_GT,
// SchemaUtils.CALL_GQ,
// SchemaUtils.CALL_RGQ,
// SchemaUtils.QUALapprox,
// SchemaUtils.AS_QUALapprox,
// SchemaUtils.CALL_PL);//, AS_VarDP);
Does this need to be here?
    fullName = StandardArgumentDefinitions.OUTPUT_LONG_NAME,
    doc = "Output VCF file to which annotated variants should be written."
)
protected String outputVcfPathString = null;
The preferred type for file arguments in GATK is GATKPath, so can this be switched to that?
LGTM. I am not overwhelmingly happy with needing to drop this many files for our extract process into GATK in order to make this happen, but for now we're going to move these files over, and we can consider separating them back out later when we move to another repo and have things cleanly carved up. This little island of GVS in core GATK should likely be moved back home then.
@mcovarr @koncheto-broad Do all these files exist as they are here in the ah_var_store branch currently?
Would it be easier/preferable for you all to move those changes into the var store branch and I'll work on the PGEN stuff there instead of in master?
Hi @KevinCLydon, yes actually all of this is on
@mcovarr @koncheto-broad Had a talk today with @KevinCLydon about this PR, and he made a convincing case that the pgen project is more likely to require additional stuff from the GVS branch than from gatk/master. To avoid a nightmare scenario of constantly having to request additional transplants from the GVS branch, I'm on board with the idea of having Kevin try to work off of the GVS branch for the pgen project for now, provided that we can get timely rebases of
Thanks @droazen, I'll go ahead and close this PR. The equivalent GVS walker code is already on
Ready for review!