Add EBI sequences and metadata? #72

Open
pjotrp opened this issue May 30, 2020 · 4 comments

@pjotrp
Collaborator

pjotrp commented May 30, 2020

EBI has about 2,000 sequences that we could bring in too. @AndreaGuarracino, can you take a look at:

https://www.covid19dataportal.org/sequences?db=embl

@LLTommy
Collaborator

LLTommy commented Jun 8, 2020

EBI and NCBI do sync their data, so we should have most (if not all) of those sequences, provided they pass the metadata QC. For example, compare https://www.ebi.ac.uk/ena/browser/view/MN996531 and https://www.ncbi.nlm.nih.gov/nuccore/MN996531.
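
To spot-check that an accession is mirrored on both sides, the two record URLs can be built from the accession alone. This is only a sketch based on the URL patterns of the two links above; the function name `record_urls` is made up for illustration, and it does not check that the records actually exist.

```python
def record_urls(accession):
    """Return the ENA and NCBI nuccore record URLs for one accession.

    The URL patterns are copied from the two example links in this comment.
    """
    return {
        "ena": f"https://www.ebi.ac.uk/ena/browser/view/{accession}",
        "ncbi": f"https://www.ncbi.nlm.nih.gov/nuccore/{accession}",
    }


urls = record_urls("MN996531")
print(urls["ena"])
print(urls["ncbi"])
```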

@AndreaGuarracino
Collaborator

I'm currently working on the sequenced samples:

https://www.ncbi.nlm.nih.gov/sra/?term=txid2697049%5BOrganism%5D

I am almost ready to start the download. I am also parsing the metadata to create a YAML file for each run_accession, so when I am ready to open a PR with the scripts, there will be ontology decisions to make.
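
As a rough illustration of the "one YAML file per run_accession" step, here is a minimal sketch. It is not the actual script: the tab-separated input, the column names other than `run_accession`, the placeholder accessions, and the helper names `to_yaml` and `yaml_per_run` are all assumptions. It uses only the standard library and relies on JSON strings being valid YAML double-quoted scalars; mapping the values onto ontology terms is deliberately left out.

```python
import csv
import io
import json


def to_yaml(record):
    """Render a flat dict of strings as a block-style YAML mapping."""
    # json.dumps yields double-quoted scalars, which are also valid YAML.
    # Keys are assumed to be simple column names that need no quoting.
    return "".join(f"{key}: {json.dumps(value)}\n" for key, value in record.items())


def yaml_per_run(tsv_text, key="run_accession"):
    """Map each run accession to the YAML text for its metadata row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    documents = {}
    for row in reader:
        # Drop empty cells rather than guessing a value for them.
        record = {k: v for k, v in row.items() if v}
        if key not in record:
            continue  # a row without an accession cannot be named
        documents[record[key]] = to_yaml(record)
    return documents


# Hypothetical metadata table; RUN0001 and RUN0002 are placeholders.
example = (
    "run_accession\tcollection_date\tcountry\n"
    "RUN0001\t2020-03-01\tItaly\n"
    "RUN0002\t\tSpain\n"
)
docs = yaml_per_run(example)
for accession, text in docs.items():
    print(f"--- {accession}.yaml")
    print(text, end="")
```

Each value in `docs` could then be written to `<run_accession>.yaml`. Fields that are missing in the source table are simply omitted, so the ontology decisions only have to cover values that are actually present.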

@pjotrp pjotrp added this to the Go live 1.0 milestone Jun 12, 2020
@pjotrp
Collaborator Author

pjotrp commented Oct 29, 2020

Hi @AndreaGuarracino, what is the status of this issue?

@AndreaGuarracino
Collaborator

Technically, this is an infinite issue.

I periodically update the NCBI samples (which are FASTA files), following the guide already published on the blog.

Regarding the EBI samples: to date, I have uploaded 12,892 samples downloaded from SRA to the platform, all of them short-read samples. The last download session was at the beginning of July and produced about 2.4 TB of gzipped FASTQ files (short-read and long-read samples combined). I could download new samples from SRA and push another batch.

@pjotrp pjotrp modified the milestones: Go live 1.0, Later Apr 26, 2021