Statistics pipe #10

ens-ftricomi · 2024-08-05T13:41:08Z

The pipeline can run on multiple species at the same time. By default, BUSCO runs in both genome and protein mode; a parameter can restrict it to a specific mode. The BUSCO lineage is determined by retrieving the list of taxon IDs for the species from NCBI Taxonomy and choosing the closest match among the taxonomy IDs available in the BUSCO dataset config file. If a dataset is provided as an input parameter, it is used for all species in the batch.
The genome file is downloaded with NCBI Datasets.
The default input is a core database, but the user can check genome completeness by specifying only the GCA accession.
The pipeline can also run OMArk and the Perl statistics scripts on a core database, then upload all results to the Ensembl Rapid FTP site, creating the current directory and setting the folder permissions. This may change in the future to follow the new FTP structure for Beta.
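The lineage selection described above can be sketched as follows. This is a minimal illustration, not the code in `clade_selector.py`: it assumes the species' NCBI lineage has already been fetched as a list of taxon IDs ordered from root to species, and that the BUSCO dataset config has been loaded into a dict mapping taxon ID to dataset name. The function name `select_busco_dataset` is hypothetical.

```python
from typing import Dict, List, Optional


def select_busco_dataset(
    lineage_taxids: List[int],
    dataset_by_taxid: Dict[int, str],
    override: Optional[str] = None,
) -> Optional[str]:
    """Pick a BUSCO dataset for one species (hypothetical helper).

    lineage_taxids: taxon IDs ordered from the root down to the species (assumed).
    dataset_by_taxid: taxon ID -> BUSCO dataset name, as read from the
    dataset config file (assumed structure).
    override: dataset passed as an input parameter, applied to the whole batch.
    """
    # A dataset given as an input parameter wins for every species.
    if override:
        return override
    # Walk from the species back towards the root: the first taxon ID that
    # has a dataset is the deepest, i.e. closest, covered clade.
    for taxid in reversed(lineage_taxids):
        if taxid in dataset_by_taxid:
            return dataset_by_taxid[taxid]
    # No clade in the lineage is covered by the config.
    return None
```

For example, with an illustrative lineage `[2759, 7742, 40674, 9443, 9606]` (Eukaryota through Homo sapiens) and a config covering only 2759, 7742 and 40674, the function returns the dataset mapped to 40674 (Mammalia), the deepest match. Here "closest" means the most specific covered clade; the pipeline itself may define it differently.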

The script to apply the BUSCO patches in Beta is still missing.

leannehaggerty · 2024-08-05T13:57:41Z

docs/source/conf.py

+import datetime
+
+sys.path.insert(0, os.path.abspath("../../src/python/ensembl/genes/metadata"))
+


Is this copied from the transcriptomic pipeline? Should it be ensembl/genes/statistics?
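If the path is switched to the statistics package as the comment suggests, the `sys.path` lines in `docs/source/conf.py` could read as below. This is a sketch that relies on Sphinx evaluating `conf.py` from its own directory, so the relative path resolves to the package that holds `clade_selector.py`; the `STATISTICS_PKG` name is hypothetical.

```python
import os
import sys

# Sphinx evaluates conf.py with docs/source as the working directory, so this
# resolves to <repo>/src/python/ensembl/genes/statistics, the package that
# contains clade_selector.py. STATISTICS_PKG is a hypothetical helper name.
STATISTICS_PKG = os.path.abspath("../../src/python/ensembl/genes/statistics")

# Prepend so autodoc imports these modules ahead of any installed copies.
sys.path.insert(0, STATISTICS_PKG)
```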

leannehaggerty · 2024-08-05T13:57:57Z

docs/source/conf.py

+copyright_owner = "EMBL-European Bioinformatics Institute"
+copyright_dates = "[2016-%d]" % datetime.datetime.now().year
+copyright = copyright_dates + " " + copyright_owner
+html_baseurl = 'https://ensembl.github.io/ensembl-genes-metadata/'


Should this be the ensembl-genes-nf repo?
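To keep the published URL in step with whichever repository hosts the docs, the settings quoted above could derive `html_baseurl` from a single repository name. A sketch, assuming GitHub Pages under the `ensembl` organisation and the `ensembl-genes-nf` repo raised in the question; `GITHUB_ORG` and `REPO_NAME` are hypothetical variables.

```python
import datetime

# Hypothetical single source of truth for where the docs are published.
GITHUB_ORG = "ensembl"
REPO_NAME = "ensembl-genes-nf"  # assumption: the repo suggested in review

# GitHub Pages project sites are served at https://<org>.github.io/<repo>/
html_baseurl = f"https://{GITHUB_ORG}.github.io/{REPO_NAME}/"

# Same copyright string as the original, built with f-strings.
# Sphinx reads the exact name `copyright`, hence shadowing the builtin.
copyright_owner = "EMBL-European Bioinformatics Institute"
copyright_dates = f"[2016-{datetime.datetime.now().year}]"
copyright = f"{copyright_dates} {copyright_owner}"
```

Renaming the repository then touches one line instead of every URL and title in the file.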

leannehaggerty · 2024-08-05T13:59:04Z

docs/source/conf.py

+        "Ensembl-genes-metadata Documentation.",
+        "Miscellaneous",
+    ),
+]


Statistics documentation rather than metadata?
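The trailing `"Miscellaneous"` entry suggests this block is Sphinx's `texinfo_documents` setting, whose tuples are ordered (startdocname, targetname, title, author, dir_entry, description, category). Under that assumption, a statistics-flavoured entry might look like this; the project and title strings are hypothetical placeholders, derived from one name so they cannot diverge.

```python
# Hypothetical naming for the statistics docs; adjust to the final repo name.
project = "ensembl-genes-statistics"
display_name = "Ensembl-genes-statistics"
author = "EMBL-European Bioinformatics Institute"
root_doc = "index"  # Sphinx >= 4 name for the former master_doc

texinfo_documents = [
    (
        root_doc,                          # startdocname
        project,                           # targetname (output file name)
        f"{display_name} Documentation",   # title
        author,                            # author
        project,                           # dir_entry
        f"{display_name} Documentation.",  # description
        "Miscellaneous",                   # category
    ),
]
```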

leannehaggerty · 2024-08-05T14:08:14Z

src/python/ensembl/genes/statistics/clade_selector.py

+        "--ncbi_url",
+        type=str,
+        help="NCBI dataset url",
+        default="https://api.ncbi.nlm.nih.gov/datasets/v2alpha/taxonomy/taxon/",


This is likely to be updated quite often. Do you think it's better to have a hardcoded default or move it to config?
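One way to address this is to keep the current URL only as a last-resort fallback and read the default from a config file, while `--ncbi_url` on the command line still overrides both. This is a sketch: the JSON file name `statistics_config.json`, its `ncbi_url` key, and the helper names are hypothetical, not part of the PR.

```python
import argparse
import json
from pathlib import Path
from typing import List, Optional

# Last-resort fallback, copied from the PR's current hardcoded default.
FALLBACK_NCBI_URL = "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/taxonomy/taxon/"


def load_default_url(config_path: Path) -> str:
    """Return 'ncbi_url' from a JSON config if the file exists (hypothetical key)."""
    if config_path.is_file():
        with config_path.open() as handle:
            return json.load(handle).get("ncbi_url", FALLBACK_NCBI_URL)
    return FALLBACK_NCBI_URL


def parse_args(
    argv: Optional[List[str]] = None,
    config_path: Path = Path("statistics_config.json"),  # hypothetical file
) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Select the BUSCO clade")
    parser.add_argument(
        "--ncbi_url",
        type=str,
        help="NCBI dataset url (default read from config)",
        # argparse uses this only when --ncbi_url is not given.
        default=load_default_url(config_path),
    )
    return parser.parse_args(argv)
```

Precedence is explicit flag, then config value, then fallback, so when the API path changes only the config file needs editing.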

ens-ftricomi added 30 commits April 12, 2024 12:04

first commit

336370e

small fixes

877060a

fix in config

04a95dc

moved config in main dir

dbf48dd

added subworkflow configs

3bd21b7

parameters fix

f2d318f

fix false upper case

e9f7feb

fixed parameters

ef5bc79

error clean cache

7a3375b

bug fix busco workflow

866de80

adjust path for config

268dc22

wrong module name

f68323d

install dependencies

535bce5

wrong path

803f4d9

replaced take with input

8c315b0

bug fix busco output

b1d11e0

typo in varaiable declaration

bcc7819

fix variable declarations

3231ba7

removed double quotes

5bf80d5

fixed config parameters

3812837

fixed channels, busco commands, internal functions to get information from the db

f55f156

omark subpipeline: fixed channels, internal functions to get information from the db

e467a58

fetch file: fixed channels, internal functions to get information from the db

ff3a189

busco pipeline: redefined channels, defined parallelisation

cdc57f5

omark pipeline: redefined channels, defined parallelisation

69a346d

renamed main

cdf341d

added bin folder for scripts

6d072f6

added lib folder for mysql jar

d28d4b2

first commit ensembl statistics

dbad198

changed order of the channels

eb69010

ens-ftricomi added 3 commits May 16, 2024 17:05

added busco dataset param

96645b8

added dataset option and adjusted clade selector using taxonomy id and not classification name

a992df4

fixed busco dataset in a tuple

74cc47d

ens-ftricomi requested review from EreboPSilva, leannehaggerty, vianeyBE and swatiebi August 5, 2024 13:41

ens-ftricomi added 2 commits August 6, 2024 16:31

added plot

bbabbee

replaced diagram image

923a05f

leannehaggerty reviewed Aug 14, 2024

ens-ftricomi added 20 commits August 14, 2024 14:12

cleaning

9c5fe20

force copy in the ftp

aae5447

added busco miniprot version

d2bd020

script to get busco scores, prepare json and patches for core

e09ab26

apply busco patches to the core

8069f1e

ignore metakeys already present

b3e7196

module for loading busco score

09775e1

added option for applying patches

c7171b3

bug fixed python docker image

6ce34d7

tested busco patches

1ba1e40

removed print statements

4c71103

added repo required in enscode

13cb49e

added python script to run beta metakeys, added options in the pipeline to run the different scripts and add the metakeys into the core

aa57d6e

bugfix wrong db name

4e52a16

cleaned unused files

3edf65a

cleaned unused files

74c2a51

updated help

608ff3f

cleaned unused modules

846789e

update option documentation

87471e3

updated plot

8996274
