Statistics pipe #10
base: main
docs/source/conf.py
import datetime

sys.path.insert(0, os.path.abspath("../../src/python/ensembl/genes/metadata"))
Is this copied from the transcriptomic pipeline? Should it be ensembl/genes/statistics?
docs/source/conf.py
copyright_owner = "EMBL-European Bioinformatics Institute"
copyright_dates = "[2016-%d]" % datetime.datetime.now().year
copyright = copyright_dates + " " + copyright_owner
html_baseurl = 'https://ensembl.github.io/ensembl-genes-metadata/'
Should this be the ensembl-genes-nf repo?
docs/source/conf.py
        "Ensembl-genes-metadata Documentation.",
        "Miscellaneous",
    ),
]
Statistics documentation rather than metadata?
    "--ncbi_url",
    type=str,
    help="NCBI dataset url",
    default="https://api.ncbi.nlm.nih.gov/datasets/v2alpha/taxonomy/taxon/",
This is likely to be updated quite often. Do you think it's better to have a hardcoded default or move it to config?
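One way to answer the question above is to keep the hardcoded URL only as a last-resort fallback and let a config file override it, with the command line overriding both. The sketch below assumes a JSON config file shaped like `{"ncbi_url": "..."}`; the file layout and the helper names `load_defaults` and `build_parser` are hypothetical, not part of this PR.

```python
import argparse
import json
from pathlib import Path
from typing import Optional

# Last-resort fallback (value taken from the PR); a config file can override it.
DEFAULT_NCBI_URL = "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/taxonomy/taxon/"


def load_defaults(config_path: Optional[Path]) -> dict:
    """Read CLI defaults from a JSON config file, if one is given and exists.

    The {"ncbi_url": "..."} layout is a hypothetical example.
    """
    defaults = {"ncbi_url": DEFAULT_NCBI_URL}
    if config_path is not None and config_path.is_file():
        with config_path.open() as handle:
            defaults.update(json.load(handle))
    return defaults


def build_parser(defaults: dict) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Statistics pipeline helper")
    parser.add_argument(
        "--ncbi_url",
        type=str,
        help="NCBI Datasets taxonomy URL",
        # Precedence: explicit --ncbi_url > config file > hardcoded fallback.
        default=defaults["ncbi_url"],
    )
    return parser
```

With this layout, an NCBI API version bump only needs a config change, while the script still runs when no config file is present.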
…ne to run the different scripts and add the metakeys into the core
The pipeline can run for multiple species at the same time. By default it runs BUSCO in both genome and protein mode; a parameter can restrict it to a specific mode. The BUSCO lineage is determined by retrieving the list of taxon IDs from NCBI taxonomy and choosing the closest match among the taxonomy IDs available in the BUSCO dataset config file. If a dataset is provided as an input parameter, it is used for all the species in the batch.
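The lineage selection described above can be sketched as follows, under one plausible reading of "closest match": walk the species' lineage from the species itself towards the root and take the first taxon ID that has a BUSCO dataset. The dict mapping taxon IDs to dataset names is an assumed representation of the BUSCO dataset config file, and `choose_busco_lineage` is a hypothetical name. If the lineage is retrieved root-first, reverse it before calling.

```python
from typing import Dict, Iterable, Optional


def choose_busco_lineage(
    lineage_taxon_ids: Iterable[int],
    busco_datasets: Dict[int, str],
    forced_dataset: Optional[str] = None,
) -> Optional[str]:
    """Pick the BUSCO dataset for one species.

    lineage_taxon_ids: taxonomy lineage ordered from the species up to the root.
    busco_datasets: taxon ID -> BUSCO dataset name (assumed config layout).
    forced_dataset: if given, it is used for every species in the batch.
    """
    if forced_dataset:
        return forced_dataset
    # The first lineage entry with a dataset is the most specific available match.
    for taxon_id in lineage_taxon_ids:
        if taxon_id in busco_datasets:
            return busco_datasets[taxon_id]
    # No dataset covers any ancestor of this species.
    return None
```

Because the loop stops at the first hit, a species with both a class-level and a kingdom-level dataset available gets the class-level one.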
The genome file is downloaded with NCBI Datasets.
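The description does not say whether the NCBI Datasets CLI or its REST API is used. The sketch below assumes the `datasets` CLI is on `PATH` and uses its `download genome accession` subcommand with `--include genome` (genome FASTA only) and `--filename` (output zip). The helper names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def datasets_download_cmd(gca: str, out_zip: Path) -> List[str]:
    """Build an NCBI Datasets CLI command that fetches only the genome FASTA."""
    return [
        "datasets", "download", "genome", "accession", gca,
        "--include", "genome",
        "--filename", str(out_zip),
    ]


def download_genome(gca: str, out_zip: Path) -> None:
    # Requires the `datasets` binary on PATH; raises CalledProcessError on failure.
    subprocess.run(datasets_download_cmd(gca, out_zip), check=True)
```

Keeping command construction separate from execution makes the exact invocation easy to log and test without network access.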
The default input is a core database, but the user can also check genome completeness by specifying only the GCA accession.
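A minimal sketch of how the two input types could determine which BUSCO runs are possible. It assumes that a GCA-only input supports only genome mode, since without a core database there is no annotation and so no proteins to assess. `StatsInput` and `planned_busco_modes` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StatsInput:
    core_db: Optional[str] = None  # Ensembl core database name (default input)
    gca: Optional[str] = None      # GCA accession, enough for a genome-only check


def planned_busco_modes(inp: StatsInput, requested_mode: Optional[str] = None) -> List[str]:
    if inp.core_db:
        # Core database available: genome and protein mode by default.
        modes = ["genome", "protein"]
    elif inp.gca:
        # No annotation without a core database, so only the genome can be assessed.
        modes = ["genome"]
    else:
        raise ValueError("provide a core database or a GCA accession")
    if requested_mode is not None:
        if requested_mode not in modes:
            raise ValueError(f"mode {requested_mode!r} is not available for this input")
        modes = [requested_mode]
    return modes
```

If both a core database and a GCA are given, the core database takes precedence here; that ordering is a design choice of the sketch.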
The pipeline can also run OMArk and the Perl statistics from a core database, then upload all the results to the Ensembl Rapid FTP, creating the current directory and setting the permissions for the folder. This may change in the future to match the new structure of the Beta FTP.
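The upload step can be sketched with the standard library, assuming the FTP area is a mounted filesystem path and the results go into a subdirectory literally named `current`. The Rapid FTP layout and the permission mode are not given in the PR, so `publish_results`, the directory name, and `0o775` are all hypothetical.

```python
import os
import shutil
from pathlib import Path
from typing import Iterable

# Hypothetical permission set: owner and group rwx, others r-x.
DIR_MODE = 0o775


def publish_results(result_files: Iterable[Path], ftp_species_dir: Path) -> Path:
    """Copy result files into <ftp_species_dir>/current and set the folder permissions."""
    current = ftp_species_dir / "current"
    current.mkdir(parents=True, exist_ok=True)
    for src in result_files:
        # copy2 also preserves timestamps of the original result files.
        shutil.copy2(src, current / src.name)
    # mkdir's mode argument is filtered by the umask, so set it explicitly afterwards.
    os.chmod(current, DIR_MODE)
    return current
```

Setting the mode with `os.chmod` after creation avoids surprises from the process umask, which would otherwise strip group write permission on many systems.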
The script to apply the BUSCO patches in Beta is still missing.
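The commit summary mentions adding the metakeys into the core database. An Ensembl core database stores these in the `meta` table (`meta_id`, `species_id`, `meta_key`, `meta_value`). The sketch below uses `sqlite3` as a stand-in so it runs anywhere; against the real MySQL core database, the same statements apply with the driver's placeholder style (typically `%s`). The function name and any key names used with it are hypothetical.

```python
import sqlite3
from typing import Dict


def insert_meta_keys(conn: sqlite3.Connection, stats: Dict[str, str], species_id: int = 1) -> None:
    """Replace existing values for the given meta keys, then insert the new ones."""
    with conn:  # one transaction: commit on success, roll back on error
        for key, value in stats.items():
            # Remove stale values so re-running the pipeline does not duplicate keys.
            conn.execute(
                "DELETE FROM meta WHERE species_id = ? AND meta_key = ?",
                (species_id, key),
            )
            conn.execute(
                "INSERT INTO meta (species_id, meta_key, meta_value) VALUES (?, ?, ?)",
                (species_id, key, value),
            )
```

Deleting before inserting makes the load idempotent, which matters when a batch is rerun after a partial failure.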