Statistics pipe #10

Open
wants to merge 72 commits into
base: main

ens-ftricomi
Contributor

The pipeline can run for multiple species at the same time. By default it runs BUSCO in both genome and protein mode; a parameter selects a specific mode instead. The BUSCO lineage is determined by fetching the list of taxon IDs for the species from NCBI Taxonomy and choosing the closest match among the taxonomy IDs available in the BUSCO dataset config file. If a dataset is provided as an input parameter, it is used for all the species in the batch.
The genome file is downloaded with NCBI Datasets.
The default input is a core db, but the user can check the completeness of the genome by specifying only the GCA.
The pipeline can also run OMArk and the Perl statistics from a core db and upload all the results to the Ensembl Rapid FTP, creating the current directory and setting the permissions for the folder. This might change in the future according to the new structure for the Beta FTP.
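
To illustrate the lineage selection described above, here is a minimal sketch and not the code in this PR. It assumes the NCBI lineage arrives as a list of taxon IDs ordered from root to species, and that the BUSCO dataset config can be reduced to a mapping from taxon ID to dataset name. The function name `closest_busco_lineage` and the example mapping are hypothetical.

```python
from typing import Dict, List, Optional


def closest_busco_lineage(
    lineage_taxon_ids: List[int], busco_datasets: Dict[int, str]
) -> Optional[str]:
    """Return the BUSCO dataset of the most specific ancestor that has one.

    Assumption: lineage_taxon_ids is ordered from root to species, so walking
    it in reverse visits the closest ancestors first.
    """
    for taxon_id in reversed(lineage_taxon_ids):
        if taxon_id in busco_datasets:
            return busco_datasets[taxon_id]
    # No ancestor of this species has a BUSCO dataset in the mapping.
    return None


# Illustrative values only: a small subset of a taxon ID -> dataset mapping.
busco_datasets = {
    2759: "eukaryota_odb10",
    7742: "vertebrata_odb10",
    40674: "mammalia_odb10",
}

# Partial lineage for Homo sapiens (9606), ordered root to species.
human_lineage = [131567, 2759, 33208, 7742, 40674, 9443, 9606]

selected = closest_busco_lineage(human_lineage, busco_datasets)
```

With this mapping, the closest available match for the human lineage is the Mammalia dataset, because neither 9606 nor 9443 has an entry.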

The script to apply the BUSCO patches in Beta is still missing.

import datetime

sys.path.insert(0, os.path.abspath("../../src/python/ensembl/genes/metadata"))

Member

Is this copied from the transcriptomic pipeline? Should it be ensembl/gene/statistics?

copyright_owner = "EMBL-European Bioinformatics Institute"
copyright_dates = "[2016-%d]" % datetime.datetime.now().year
copyright = copyright_dates + " " + copyright_owner
html_baseurl = 'https://ensembl.github.io/ensembl-genes-metadata/'
Member

Should this be the ensembl-genes-nf repo?

"Ensembl-genes-metadata Documentation.",
"Miscellaneous",
),
]
Member

Statistics documentation rather than metadata?

"--ncbi_url",
type=str,
help="NCBI dataset url",
default="https://api.ncbi.nlm.nih.gov/datasets/v2alpha/taxonomy/taxon/",
Member

This is likely to be updated quite often. Do you think it's better to have a hardcoded default or move it to config?
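
One possible shape for the config option, as a minimal sketch only: the `--config` flag, the JSON layout with an `ncbi_url` key, and the helper names are assumptions made for illustration and are not part of this PR. The URL from the diff stays as a fallback, so the script still runs without a config file, and an explicit `--ncbi_url` still wins.

```python
import argparse
import json
from pathlib import Path
from typing import List, Optional

# Fallback taken from the default in the diff above.
FALLBACK_NCBI_URL = "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/taxonomy/taxon/"


def load_default_url(config_path: Optional[str]) -> str:
    """Read `ncbi_url` from a JSON config file (hypothetical layout)."""
    if config_path and Path(config_path).is_file():
        with open(config_path) as handle:
            return json.load(handle).get("ncbi_url", FALLBACK_NCBI_URL)
    return FALLBACK_NCBI_URL


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # First pass: read only --config so it can supply the default URL.
    pre = argparse.ArgumentParser(add_help=False)
    pre.add_argument("--config", type=str, default=None, help="JSON config file")
    known, _ = pre.parse_known_args(argv)

    # Second pass: the full parser, with the default resolved from the config.
    parser = argparse.ArgumentParser(parents=[pre])
    parser.add_argument(
        "--ncbi_url",
        type=str,
        help="NCBI dataset url",
        default=load_default_url(known.config),
    )
    return parser.parse_args(argv)
```

Precedence in this sketch is command line, then config file, then the hardcoded fallback, so a change to the API version would only need a config update.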
