Hello dear friends!
Thanks for developing vg, such a useful and magical tool for pangenome graphs.
First, I should say I'm new to manipulating graphs and find the various formats (e.g. .vg, .xg, .gbz, .gam ...) confusing. As a beginner, I need some help:
I have a human pangenome graph containing several genomes, with genome_a as the reference. I want to see the locations of some gene regions of interest in my graph, like Fig. 5d in the HPRC publication. Because regions like MHC are highly complex, the existing gene annotations are not reliable enough to simply draw gene locations from them. Therefore, I turned to the graph to get locally detailed and confident gene annotations. At first, I tried this method (which follows the odgi tutorial):
1. Extract subgraphs with odgi
2. Get the BED file of the genes of interest and inject it into the graph
3. Run odgi untangle on the injected graph to see the locations of genes on each path
However, I found that for genes with CNV, this method often fails to capture all gene copies (usually it finds just one copy), so I started looking for another method. For now, I intend to:
1. Align the sequences of the genes of interest (e.g. HLA genes extracted from GRCh38.p14) to the graph using GraphAligner
2. Use the alignment generated in step 1 to get the gene locations on each haplotype of my graph
For step 2, I initially used vg annotate, but it seems to work only for reference paths (#4158). I then ran a vg surject command, which has not produced results as I write this.
Also, in #4158 the developers suggested:
but if you have the GAF and you have the GFA you can compare the node names that the GAF reads visit against the node names that each GFA path visits, and find the nodes at which each read intersects with each path it touches.
I think I can also use this admittedly crude method to get the gene locations from the GAF file that GraphAligner generated.
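A minimal sketch of that node-comparison idea, in Python, might look like the following. It is not part of vg or GraphAligner, and all function names here are hypothetical. It assumes the GFA carries sequences on its S lines, stores paths as P lines (GFA 1.0) or W lines (GFA 1.1), and that column 6 of the GAF is an oriented node walk such as `>12>13<14`. The reported intervals are node-granular: they cover the full extent of every shared node and are not trimmed to the exact alignment start and end.

```python
import re
from collections import defaultdict

# Matches one oriented step such as ">12" or "<s7" in a GAF path or GFA walk.
STEP_RE = re.compile(r"([<>])([^<>]+)")

def parse_gfa(lines):
    """Return (node_len, paths); paths maps a path name to its list of node IDs."""
    node_len, paths = {}, {}
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if f[0] == "S":
            # Assumption: the sequence is present (not "*") on every S line.
            node_len[f[1]] = len(f[2])
        elif f[0] == "P":
            # P line steps look like "1+,2-,3+"; drop the orientation sign.
            paths[f[1]] = [s[:-1] for s in f[2].split(",")]
        elif f[0] == "W":
            # W line: sample, haplotype, seq ID, start, end, walk.
            # Offsets below are relative to the start of the walk.
            name = "#".join(f[1:4])
            paths[name] = [m.group(2) for m in STEP_RE.finditer(f[6])]
    return node_len, paths

def index_paths(node_len, paths):
    """Map each node ID to every (path, start, end) interval where it occurs."""
    index = defaultdict(list)
    for name, steps in paths.items():
        pos = 0
        for node in steps:
            end = pos + node_len[node]
            index[node].append((name, pos, end))
            pos = end
    return index

def merge(intervals):
    """Merge overlapping or directly adjacent half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]

def locate(gaf_lines, index):
    """Return {(read, path): [(start, end), ...]} for nodes shared by both."""
    hits = defaultdict(list)
    for line in gaf_lines:
        f = line.rstrip("\n").split("\t")
        read = f[0]
        for m in STEP_RE.finditer(f[5]):  # column 6 holds the node walk
            for path, start, end in index.get(m.group(2), ()):
                hits[(read, path)].append((start, end))
    return {key: merge(ivs) for key, ivs in hits.items()}
```

With open file handles for the GFA and GAF, `locate(gaf_file, index_paths(*parse_gfa(gfa_file)))` returns, for each gene and path pair, the path intervals covered by shared nodes. A path that visits the same nodes more than once yields several separate intervals, which is how extra gene copies would show up, though short shared nodes can also produce spurious hits that need filtering.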
I don't know whether the vg surject command I used above can generate a correct alignment file containing the gene locations on each path. Could anybody give me some advice on my process and method, or suggest any other helpful method? Please!
Best wishes! Thanks!
I don't think we have a known good way to get annotations against all the different samples in the graph using vg. Your idea of injecting into the path you have annotations on and then surjecting that sequence to each other path you are interested in, as an alignment, might work OK.
If you actually have assemblies you want annotated, I think we'd probably recommend using the Comparative Annotation Toolkit instead of vg. CAT is designed to annotate new assemblies using alignments and annotations on previous assemblies, and it actually thinks about things like paralogs and ortholog matching and pseudogenization. I'm not sure how well it works on e.g. MHC, but I also wouldn't lean on vg inject and vg surject and the HPRC graphs to get "reliable" annotations for the assemblies.
Maybe @ph09 or @glennhickey can speak to how well CAT's ortholog matchings are likely to agree with the HPRC graph's Minigraph-Cactus alignments?
PLEASE DO NOT MAKE SUPPORT REQUESTS HERE
Please use the Biostars forum instead:
https://www.biostars.org/new/post/?tag_val=vg
OK, I will post on Biostars later.