several genes are reported in "PREDICTED_LOF" for a balanced translocation #8852

Closed
Nehir291 opened this issue May 30, 2024 · 4 comments

@Nehir291

bug report

I have a balanced translocation (CTX) case with 475 genes listed under PREDICTED_LOF. However, one CTX breakpoint falls in an intron of a gene, whereas the other breakpoint is intergenic. How is this CTX classified as PREDICTED_LOF for so many genes?

Thanks.

@gokalpcelik
Contributor

The gene whose intron the CTX breakpoint hits is broken in the middle of its coding frame (the continuity of its exons is disrupted), so PREDICTED_LOF is justified for that gene. If both breakpoints had been intergenic, a PREDICTED_LOF assessment would have been questionable.

@Nehir291
Author

Nehir291 commented Jun 3, 2024

Thanks for the response and clarifying that.
The breakpoints do not overlap any exon (one breakpoint is intergenic, the other is intronic).
I still don't see how so many genes end up under PREDICTED_LOF.

@epiercehoffman
Contributor

epiercehoffman commented Jun 27, 2024

Hi @Nehir291, please refer to the SVAnnotate tool documentation for definitions of each annotation: https://gatk.broadinstitute.org/hc/en-us/articles/21905053774363-SVAnnotate. For translocations, PREDICTED_LOF is assigned if a breakpoint falls at any point in the transcript, so an intronic breakpoint would still be annotated as PREDICTED_LOF for any impacted genes. This is because only part of the gene exists on each chromosome after the translocation, which is likely to result in a truncated transcript subject to nonsense-mediated decay.

Perhaps in some cases this definition is overly permissive, such as if only the UTR or one shorter exon is removed by the translocation - those cases could be worth revisiting.

I hope this helps explain the behavior you are seeing. If this does not fully explain all the PREDICTED_LOF annotations you are observing, please share the CTX breakpoints, the annotations, the SVAnnotate version, and the GTF used so we can investigate further.
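
To make the rule above concrete, here is a minimal Python sketch, not SVAnnotate's actual implementation. The function names and the toy transcript coordinates are hypothetical and chosen only for illustration. It shows that, under the definition described above, an intronic breakpoint still counts as PREDICTED_LOF because it lies inside the transcript, while an intergenic one does not.

```python
from typing import List, Tuple


def breakpoint_context(pos: int, tx_start: int, tx_end: int,
                       exons: List[Tuple[int, int]]) -> str:
    """Classify a breakpoint relative to one transcript (inclusive coordinates)."""
    if not (tx_start <= pos <= tx_end):
        return "intergenic"
    for exon_start, exon_end in exons:
        if exon_start <= pos <= exon_end:
            return "exonic"
    return "intronic"


def ctx_predicted_lof(pos: int, tx_start: int, tx_end: int) -> bool:
    """Rule as described in this thread: any breakpoint inside the transcript is LOF."""
    return tx_start <= pos <= tx_end


# Hypothetical transcript spanning 1000-5000 with three exons.
TX_START, TX_END = 1000, 5000
EXONS = [(1000, 1200), (2000, 2300), (4500, 5000)]

for bp in (3000, 9000):
    context = breakpoint_context(bp, TX_START, TX_END, EXONS)
    lof = ctx_predicted_lof(bp, TX_START, TX_END)
    print(bp, context, lof)
```

Note that the exon list does not influence the LOF call at all in this sketch, which is exactly why a UTR-only or single-exon truncation would also be flagged.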

@epiercehoffman
Contributor

Update: after some offline discussion, we found that this was a two-fold issue that has since been fixed.

  1. END was set to END2 in some older VCFs from GATK-SV, so it represented the breakpoint on the second chromosome rather than the first. This has been fixed: several more recent GATK-SV VCFs were checked and have the correct END values for CTX events.
  2. Older versions of SVAnnotate annotated the interval CHROM:POS-END for CTX, expecting END to be very close to POS. When END was set to END2, which could be very large, this produced incorrect intervals and long lists of genes under PREDICTED_LOF. This has been fixed in "Handle CTX_INV subtype in SVAnnotate" (#8693), so SVAnnotate now independently annotates the breakpoints at CHROM:POS and CHROM:END for CTX.
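
The interaction of the two issues can be illustrated with a small Python sketch. This is a toy model, not the SVAnnotate code: the gene table, coordinates, and function names are all hypothetical. It contrasts annotating the whole interval CHROM:POS-END with annotating the two positions independently, and shows how an END that actually holds END2 inflates the gene list.

```python
from typing import Dict, List, Tuple

# Hypothetical gene table: chromosome -> list of (gene name, start, end).
GENES: Dict[str, List[Tuple[str, int, int]]] = {
    "chr1": [
        ("GENE_A", 1_000, 5_000),
        ("GENE_B", 20_000, 30_000),
        ("GENE_C", 900_000, 950_000),
    ],
}


def genes_overlapping(genes, chrom: str, start: int, end: int) -> List[str]:
    """Return genes on chrom that overlap the inclusive interval [start, end]."""
    return [name for name, g_start, g_end in genes.get(chrom, [])
            if g_start <= end and g_end >= start]


def annotate_interval(genes, chrom: str, pos: int, end: int) -> List[str]:
    """Older behavior as described above: annotate everything in CHROM:POS-END."""
    return genes_overlapping(genes, chrom, min(pos, end), max(pos, end))


def annotate_breakpoints(genes, chrom: str, pos: int, end: int) -> List[str]:
    """Newer behavior as described above: annotate POS and END as separate points."""
    hits = (genes_overlapping(genes, chrom, pos, pos)
            + genes_overlapping(genes, chrom, end, end))
    return sorted(set(hits))


# END wrongly holding END2 (a large coordinate from the other chromosome).
print(annotate_interval(GENES, "chr1", 3_000, 1_000_000))
# END correctly set next to POS.
print(annotate_breakpoints(GENES, "chr1", 3_000, 3_000))
```

In the first call every gene between POS and the mis-set END is reported, even though only GENE_A contains a breakpoint.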

For other users encountering this issue in their VCFs produced by older versions of GATK-SV, I recommend rerunning CleanVcf and AnnotateVcf with the latest versions of GATK-SV. A more manual alternative that requires less re-running of workflows would be:

  1. Extract CTX SVs
  2. Set END to POS
  3. Strip out functional consequence annotations
  4. Re-annotate with the latest version of SVAnnotate.
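
Steps 1 to 3 of the manual alternative could be sketched in plain Python as below. This is an illustration under stated assumptions, not a tested GATK-SV utility: the function name `fix_ctx_record` is hypothetical, CTX records are assumed to carry `SVTYPE=CTX` in INFO, END is assumed to be an INFO key, and functional consequence annotations are assumed to be INFO keys starting with `PREDICTED_`. Check these against your own VCF header before relying on it. Step 4, re-running SVAnnotate, is not covered here.

```python
from typing import Optional


def fix_ctx_record(line: str, annotation_prefix: str = "PREDICTED_") -> Optional[str]:
    """Return a repaired VCF record line for CTX records, or None for other records.

    Assumptions (hypothetical, verify against your VCF): SVTYPE and END live in
    the INFO column, and annotation keys start with annotation_prefix.
    """
    fields = line.rstrip("\n").split("\t")
    pos, info = fields[1], fields[7]
    entries = info.split(";")
    if "SVTYPE=CTX" not in entries:
        return None  # step 1: keep only CTX records
    kept = []
    for entry in entries:
        key = entry.split("=", 1)[0]
        if key.startswith(annotation_prefix):
            continue  # step 3: strip functional consequence annotations
        if key == "END":
            entry = f"END={pos}"  # step 2: set END to POS
        kept.append(entry)
    fields[7] = ";".join(kept)
    return "\t".join(fields)


# Made-up example record with END mis-set to END2.
record = ("chr1\t3000\tctx1\tN\t<CTX>\t.\tPASS\t"
          "SVTYPE=CTX;END=1000000;CHR2=chr5;END2=1000000;PREDICTED_LOF=GENE_A,GENE_B")
print(fix_ctx_record(record))
```

Header lines (those starting with `#`) should be passed through unchanged, and the resulting file would then need to be compressed and indexed as usual before re-annotation.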
