Fixed support for newer Gencode GTF versions #8351

Merged
merged 5 commits on Jun 28, 2023

jamesemery
Collaborator

Fixes #7166 #7385

This doesn't yet include any of the work needed to support Gencode GFF3 files (#); that will probably come in a subsequent PR, as it requires a much more substantial refactoring of the Gencode datasources code.

@droazen
Collaborator

droazen commented Jun 7, 2023

@jamesemery How "future-proof" is this PR? That is, how likely is it that future releases of Gencode will break the parser again? Has the parser been relaxed to the point where it will tolerate the addition of new fields, etc.?

@jamesemery
Collaborator Author

@droazen I tried to relax as much as I could, but I stopped short of relaxing absolutely everything. I have a test verifying that any of the optional fields we previously special-cased can now hold an arbitrary value (which was the problem that sank us here). However, if future Gencode releases make DRASTIC changes (like adding new top-level transcript types or inventing a reference orientation other than + or -), then all bets are off...
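To make the "tolerant values, strict structure" split above concrete, here is a minimal sketch in Java. It is not the PR's code: `TolerantGtfAttributes`, `parseStrand`, and `parseAttributes` are hypothetical names. It assumes the standard GTF layout, where column 7 holds the strand and column 9 holds semicolon-separated `key "value";` attributes, and it assumes no attribute value contains a semicolon. Attribute values (including unquoted ones such as Gencode's `level 2;`) are accepted verbatim, while the strand stays strict, so a new orientation would still be rejected.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating "tolerant attribute values, strict strand";
// a sketch under stated assumptions, not the GATK GTF codec.
final class TolerantGtfAttributes {

    enum Strand { POSITIVE, NEGATIVE }

    // Strict: only "+" and "-" are accepted, so an unexpected orientation fails fast.
    static Strand parseStrand(String column) {
        switch (column) {
            case "+": return Strand.POSITIVE;
            case "-": return Strand.NEGATIVE;
            default: throw new IllegalArgumentException("Unsupported strand: " + column);
        }
    }

    // Tolerant: parses the attribute column into ordered key/value pairs and never
    // checks values against a fixed vocabulary. Repeated keys (e.g. several "tag"
    // entries) are all kept. Assumes no value contains a ';'.
    static List<Map.Entry<String, String>> parseAttributes(String column) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String raw : column.split(";")) {
            String field = raw.trim();
            if (field.isEmpty()) {
                continue; // blank segment, e.g. around a trailing ';'
            }
            int space = field.indexOf(' ');
            if (space < 0) {
                throw new IllegalArgumentException("Attribute without a value: " + field);
            }
            String key = field.substring(0, space);
            String value = field.substring(space + 1).trim();
            // Strip surrounding quotes when present; unquoted values (e.g. level 2) pass through.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            out.add(new SimpleEntry<>(key, value));
        }
        return out;
    }
}
```

For example, an attribute such as `tag "some_new_tag_value";` simply becomes another `tag` entry instead of causing a parse failure, whereas `parseStrand(".")` still throws.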

Collaborator

jonn-smith left a comment

Looks pretty good. A few questions and change requests.

jamesemery force-pushed the je_fixGTFCodecForGencodeV43Support branch from 58d2262 to 125aa5a on June 27, 2023 17:53
@jamesemery
Collaborator Author

@jonn-smith I've responded to the comments; back to you. I fully got rid of the old unparsed "anonymousOptionalFields" string, which was occasionally relevant for non-blessed fields in Gencode. There is an enum of the known Gencode optional fields, but everything now gets parsed into the same key-value list, which should make it easier to finish phase 2 here.
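A rough sketch of the design described in the comment above: an enum of known optional fields used only for lookup, while every field, known or not, lands in one key-value list instead of a leftover unparsed string. This is a hedged illustration in Java, not the PR's code; `GencodeOptionalFields`, `KnownField`, `valuesOf`, and `unknownKeys` are hypothetical names, and the keys listed are real Gencode GTF attribute names chosen as an illustrative subset, not the PR's actual enum.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical container: one key/value list for all optional fields,
// with an enum used only to look up fields the code knows about.
final class GencodeOptionalFields {

    // Illustrative subset of Gencode optional attribute keys (assumption).
    enum KnownField {
        TAG("tag"), CCDSID("ccdsid"), HAVANA_GENE("havana_gene"),
        HAVANA_TRANSCRIPT("havana_transcript"), PROTEIN_ID("protein_id"), ONT("ont");

        final String gtfKey;

        KnownField(String gtfKey) { this.gtfKey = gtfKey; }

        // Returns the matching constant, or empty for keys the enum does not list.
        static Optional<KnownField> fromGtfKey(String key) {
            for (KnownField f : values()) {
                if (f.gtfKey.equals(key)) return Optional.of(f);
            }
            return Optional.empty();
        }
    }

    // A single key/value pair; known and unknown keys share this representation.
    static final class Field {
        final String key;
        final String value;
        Field(String key, String value) { this.key = key; this.value = value; }
    }

    private final List<Field> fields = new ArrayList<>();

    // Unknown keys are stored like any other field rather than rejected or set aside.
    void add(String key, String value) { fields.add(new Field(key, value)); }

    // All values for a known field, in insertion order (e.g. several "tag" entries).
    List<String> valuesOf(KnownField field) {
        List<String> out = new ArrayList<>();
        for (Field f : fields) {
            if (f.key.equals(field.gtfKey)) out.add(f.value);
        }
        return out;
    }

    // Keys that matched no enum constant; still available to callers.
    List<String> unknownKeys() {
        List<String> out = new ArrayList<>();
        for (Field f : fields) {
            if (!KnownField.fromGtfKey(f.key).isPresent()) out.add(f.key);
        }
        return out;
    }
}
```

Because unrecognized keys are ordinary entries, adding a new known field in this sketch is just a new enum constant; nothing about how the list is populated has to change.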

Collaborator

jonn-smith left a comment

Looks good!

jamesemery merged commit 01e45a2 into master on Jun 28, 2023
20 checks passed
jamesemery deleted the je_fixGTFCodecForGencodeV43Support branch on June 28, 2023 14:01

3 participants