Platform tutorial: nf-core/rnaseq full #131

Open. Wants to merge 49 commits into base: master

llewellyn-sl
Contributor

Adds a one-page nf-core/rnaseq tutorial covering compute environment recommendations and configuration, importing the pipeline via Seqera Pipelines, adding data via Data Explorer and datasets, pipeline launch and monitoring, results analysis with Data Studios, pipeline optimization, and pipeline requirements based on input dataset size and benchmarking.

llewellyn-sl self-assigned this on Jul 23, 2024

netlify bot commented Jul 23, 2024

Deploy Preview for seqera-docs ready!

| Name | Link |
|------|------|
| 🔨 Latest commit | 143eb56 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/seqera-docs/deploys/6702e8b1e0e04d00083d31a9 |
| 😎 Deploy Preview | https://deploy-preview-131--seqera-docs.netlify.app |

llewellyn-sl marked this pull request as ready for review on October 1, 2024, 18:55
Contributor

adamrtalbot left a comment

In general I like the flow, but it starts to drift into a descriptive document. This means it becomes quite verbose, and it isn't obvious which steps to do next.

I would tighten up the second half to be more focused. You could probably achieve this by hacking bits out without adding too much, so I don't think it's a huge job.

Comment on lines 500 to 512
```console
# Create MDS plot
# a. Display in RStudio
plotMDS(y, col=as.numeric(factor(targets$Group)), labels=targets$Group)
legend("topright", legend=levels(factor(targets$Group)),
col=1:nlevels(factor(targets$Group)), pch=20)

# b. Save MDS plot to file (change `png` to `pdf` to create a PDF file)
png("MDS_plot.png", width = 800, height = 600)
plotMDS(y, col=as.numeric(factor(targets$Group)), labels=targets$Group)
legend("topright", legend=levels(factor(targets$Group)),
col=1:nlevels(factor(targets$Group)), pch=20)
dev.off()
```
Contributor


Let's format the R code. You should be able to use Cmd+Shift+A (Reformat Code) in RStudio or something.

Contributor Author


@adamrtalbot I need to run the R script again to use `salmon.merged.gene_counts_length_scaled.tsv` either way, so I will reformat while I'm in there. Just to confirm, by reformat you mean more than just swapping out `console` with `r` in the Markdown code blocks, right?
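
For reference, one way the plotting block under review could look after formatting is sketched below. This is a minimal, self-contained sketch and not the tutorial's code: the toy matrix `log_expr`, the `groups` factor, and the `draw_mds()` helper are hypothetical stand-ins for `y` and `targets`, and base R's `cmdscale()` is swapped in for edgeR's `plotMDS()` so the example runs without Bioconductor. It writes a PDF to a temporary directory.

```r
# Hypothetical toy data standing in for the tutorial's `y` and `targets`.
set.seed(1)
log_expr <- matrix(
  rnorm(60),
  nrow = 10,
  dimnames = list(paste0("gene", 1:10), paste0("s", 1:6))
)
groups <- factor(rep(c("ctrl", "treated"), each = 3))

# Draw a classical MDS plot of the samples and return the coordinates.
# cmdscale() is used here in place of plotMDS() (an assumption for portability).
draw_mds <- function(log_expr, groups) {
  coords <- cmdscale(dist(t(log_expr)), k = 2)
  plot(coords, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
  text(coords, labels = as.character(groups), col = as.numeric(groups))
  legend(
    "topright",
    legend = levels(groups),
    col = seq_len(nlevels(groups)),
    pch = 20
  )
  invisible(coords)
}

# Save the plot to a file; the same function can be called without pdf()
# and dev.off() to display the plot in RStudio.
out_file <- file.path(tempdir(), "MDS_plot.pdf")
pdf(out_file, width = 8, height = 6)
coords <- draw_mds(log_expr, groups)
invisible(dev.off())
```

Wrapping the drawing calls in one function means the on-screen and saved versions cannot drift apart, which is the main duplication in the block being reviewed.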

Comment on lines 653 to 659
| **Pipeline step** | **Tools** | **Resource needs** | **Description** |
|-------------------------------------|---------------------------|------------------------------|---------------------------------------------------------------------------------------------------|
| **Quality Control (QC)** | FastQC, MultiQC | Moderate CPU, low memory | Initial quality checks of raw reads to assess sequencing quality and identify potential issues. |
| **Read Trimming** | Trim Galore! | Moderate CPU, moderate memory| Removal of adapter sequences and low-quality bases to prepare reads for alignment. |
| **Read Alignment** | HISAT2, STAR | High CPU, high memory | Alignment of trimmed reads to a reference genome, typically the most resource-intensive step. |
| **Quantification** | featureCounts, Salmon | Moderate CPU, moderate memory| Counting the number of reads mapped to each gene or transcript to measure expression levels. |
| **Differential Expression Analysis**| DESeq2, edgeR | Low CPU, moderate memory | Statistical analysis to identify genes with significant changes in expression between conditions. |
Contributor


Shall we just put real numbers here?

1. Read and convert the count data and sample information:

:::info
Replace `<PATH_TO_YOUR_COUNTS_FILE>` and `<PATH_TO_YOUR_SAMPLE_INFO_FILE>` with the paths to your `salmon.merged.gene_counts.tsv` and `sampleinfo.txt` files.
Member


I'm not sure how pedantic we want to be here about correctness in analysis.

There can be important differences in effective gene length across conditions, which we normally account for in analysis. Using `salmon.merged.gene_counts.tsv` will just ignore that effect.

We can either model those length differences (preferable, but would probably add some unnecessary complexity here), or otherwise just use `salmon.merged.gene_counts_length_scaled.tsv` (which is probably the simplest thing to do here).

Contributor


Does nf-core/rnaseq produce transcript-level quantification estimates that we can use (via tximport)?

Member


The workflow does give you the raw outputs from Salmon / Kallisto, which are at the transcript level, and which is what tximport reads.

But it also uses tximport internally to produce those count matrices (`salmon.merged.gene_counts_length_scaled.tsv`, `salmon.merged.gene_counts.tsv`), and provides gene lengths (`salmon.merged.gene_lengths.tsv`) that can be used as offsets directly in downstream analysis.
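
To illustrate why the length-scaled matrix sidesteps the effect described above, here is a small base R sketch of the `lengthScaledTPM` calculation as tximport documents it: abundance is multiplied by the across-sample mean length, then rescaled to each sample's original library size. The toy matrices and the `length_scaled()` helper are hypothetical, and it is an assumption that the pipeline's `salmon.merged.gene_counts_length_scaled.tsv` is produced this way.

```r
genes <- c("geneA", "geneB")
samples <- c("s1", "s2")

# Toy inputs (hypothetical): both genes have identical TPM in both samples,
# but geneA's effective length doubles in s2 (for example, an isoform switch).
tpm <- matrix(c(5e5, 5e5, 5e5, 5e5), nrow = 2, dimnames = list(genes, samples))
len <- matrix(c(1000, 1000, 2000, 1000), nrow = 2, dimnames = list(genes, samples))

# Raw estimated counts are proportional to TPM x effective length,
# so geneA appears to change between samples although its TPM does not.
counts <- matrix(c(1500, 1500, 2000, 1000), nrow = 2, dimnames = list(genes, samples))

# Sketch of the lengthScaledTPM approach: use one length per gene (the mean
# across samples), then rescale each sample to its original library size.
length_scaled <- function(abundance, counts, len) {
  new_counts <- abundance * rowMeans(len)
  t(t(new_counts) * (colSums(counts) / colSums(new_counts)))
}

scaled <- length_scaled(tpm, counts, len)
print(counts) # geneA differs between s1 and s2
print(scaled) # geneA is the same in s1 and s2
```

In this toy case the raw counts suggest a change in `geneA` that is purely a length artifact, while the length-scaled counts do not. Modeling lengths as offsets would address the same artifact without altering the counts.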

Contributor Author


I'd err on the side of pedantic. So I need to run the script again with `salmon.merged.gene_counts_length_scaled.tsv` and update the GIFs and steps. Thanks for the detailed feedback, gents!
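
As a rough sketch of what reading the length-scaled file might look like in base R, the example below writes a tiny stand-in TSV to a temporary file and converts it to an integer count matrix. The column layout (`gene_id`, `gene_name`, then one column per sample), the gene and sample names, and the values are all assumptions for illustration. Rounding is only needed for downstream tools that expect integer counts.

```r
# Write a hypothetical stand-in for salmon.merged.gene_counts_length_scaled.tsv.
tsv_path <- tempfile(fileext = ".tsv")
writeLines(
  c(
    "gene_id\tgene_name\tsampleA\tsampleB",
    "ENSG0001\tGENE1\t10.4\t20.6",
    "ENSG0002\tGENE2\t0\t3.2"
  ),
  tsv_path
)

# Read the table, keeping sample names exactly as written in the header.
raw <- read.delim(tsv_path, check.names = FALSE, stringsAsFactors = FALSE)

# Keep only the sample columns and use gene IDs as row names.
sample_cols <- setdiff(colnames(raw), c("gene_id", "gene_name"))
counts <- as.matrix(raw[, sample_cols, drop = FALSE])
rownames(counts) <- raw$gene_id

# Length-scaled counts are not integers; round for tools that require them.
counts <- round(counts)
storage.mode(counts) <- "integer"
print(counts)
```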

llewellyn-sl and others added 13 commits October 4, 2024 12:27
Co-authored-by: Jonathan Manning <[email protected]>
Signed-off-by: Llewellyn vd Berg <[email protected]>
@@ -18,7 +18,7 @@ Platform offers two methods to import pipelines to your workspace Launchpad —

![Seqera Pipelines overview](assets/seqera-pipelines-overview.gif)

-To import the `nf-core/rnaseq` pipeline:
+To import the `nf-core-rnaseq` pipeline:


Suggested change
-To import the `nf-core-rnaseq` pipeline:
+To import the `nf-core/rnaseq` pipeline:

The GIF above shows `nf-core/rnaseq` as the title. Is there a reason you wanted to change it to `nf-core-rnaseq`?

5 participants