Offline support for Wave in Nextflow #323

Open
ewels opened this issue Nov 7, 2023 · 6 comments

@ewels
Member

ewels commented Nov 7, 2023

Wave in Nextflow is beautifully simple - no need to define container URIs, just the conda package names and we get everything for free. However, for wide adoption (or at least, adoption in @nf-core), we need to support offline usage of pipelines.

For offline work, the process is typically as follows:

  • On an online system:
    • Download Nextflow + required plugins
    • Download pipeline + configs etc
    • Download container images
  • Transfer to an offline system
  • Run

This hinges on Nextflow checking the local container cache (e.g. NXF_SINGULARITY_CACHEDIR) for images before attempting to download them. Singularity image filenames are predictable, so it's easy for us to wrap download functionality into tooling like nf-core download and make sure the images are available.
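
The cache check described here can be sketched roughly as follows. This is an illustration only, not Nextflow's actual code: the filename rule (drop the URI scheme, replace `/` and `:` with `-`, append `.img`) is an assumption about how Nextflow names Singularity images, and `cache_filename` and `find_cached_image` are hypothetical helper names. The sketch reads `NXF_SINGULARITY_CACHEDIR`, the environment variable Nextflow uses for its Singularity cache directory.

```python
import os
import re
from pathlib import Path
from typing import Optional


def cache_filename(container_uri: str) -> str:
    """Derive a local image filename from a container URI.

    Assumed rule (not stated in this thread): drop any scheme such as
    docker://, replace '/' and ':' with '-', and append '.img'.
    """
    name = re.sub(r"^[a-z]+://", "", container_uri)
    name = name.replace("/", "-").replace(":", "-")
    return name + ".img"


def find_cached_image(container_uri: str, cache_dir: Optional[str] = None) -> Optional[Path]:
    """Return the path of the cached image if it exists, else None."""
    cache_dir = cache_dir or os.environ.get("NXF_SINGULARITY_CACHEDIR")
    if not cache_dir:
        return None  # no cache configured, so nothing to look up
    candidate = Path(cache_dir) / cache_filename(container_uri)
    return candidate if candidate.is_file() else None
```

The lookup only works when the container URI is known up front, which is why a predictable URI matters for offline runs.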

However, this assumption breaks with Wave. Currently, Nextflow needs to reach out to the Wave service (online) to find out the container URI and resulting local cache filename. So without an internet connection, it doesn't know where to check locally.

As I see it, we have two options:

  • We give functionality to the Nextflow Wave plugin to be able to figure out container URIs within plugin logic, therefore working offline. This would mean that the container URIs could be built offline and everything would work.
    • Pros: Avoids pinging the Wave service by default when local caches are available. Less stress on the Wave service and more robust in case of downtime.
    • Cons: Potentially lots of work, some features such as auth strings will not work offline.
  • We put the onus on @nf-core instead, building functionality into nf-core download to write container URIs to a Nextflow config file, fetch the container images, and bundle this config with the pipeline somehow so that it works without further configuration by the users.
    • Pros: Likely nothing to do on the Wave / Nextflow side 👀
    • Cons: Less flexible and generic, (mostly) specific to nf-core
@edmundmiller

edmundmiller commented Nov 7, 2023

I think nextflow inspect does that:

$ nextflow inspect main.nf -profile local

{
    "processes": [
        {
            "name": "r2_CELL_CYCLE_SCORING_AND_PCA",
            "container": "wave.seqera.io/wt/4fc019059a1f/wave/build:create_objects--c32b27bc3124db00"
        },
        ...

So we just hook nextflow inspect into nf-core download. When they're running `nf-core download`, they should have an internet connection, right? Worst case, we export the containers on release and commit the JSON updates to the repos!
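
A minimal sketch of that hook, assuming the JSON shape shown in the output above (a top-level `processes` list whose entries have `name` and `container` keys). The function name `inspect_to_config` is hypothetical, and this is not how nf-core download is implemented. It turns the inspect output into a Nextflow config that pins each process to its resolved container URI, so an offline run never has to contact Wave.

```python
import json


def inspect_to_config(inspect_json: str) -> str:
    """Turn `nextflow inspect` JSON output into Nextflow config text.

    Each process gets a withName selector that pins its container URI.
    """
    data = json.loads(inspect_json)
    lines = ["process {"]
    for proc in data.get("processes", []):
        container = proc.get("container")
        if not container:
            continue  # process declares no container, nothing to pin
        lines.append(f"    withName: '{proc['name']}' {{ container = '{container}' }}")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = json.dumps({
        "processes": [{
            "name": "r2_CELL_CYCLE_SCORING_AND_PCA",
            "container": "wave.seqera.io/wt/4fc019059a1f/wave/build:create_objects--c32b27bc3124db00",
        }]
    })
    print(inspect_to_config(example))
```

The `nextflow inspect` command may be able to emit config directly (a `-format config` option), which would make this conversion unnecessary; that is worth checking against the CLI reference for the Nextflow version in use.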

@ewels
Member Author

ewels commented Nov 7, 2023

Yeah exactly, that's essentially my option 2 - fetch the container URIs at the point of download (or release) and have an associated config file that specifies the container URIs.

It basically means that offline users won't be using Wave at all; it's just a regular Nextflow run with containers as usual. Maybe this is the best solution. My main issue with it is that it forces people to use nf-core download.

@pditommaso
Contributor

I'm inclined towards option 2 too. The nextflow inspect command was designed with this possibility in mind.

@edmundmiller

> It basically means that offline users won't be using Wave at all, it's just a regular Nextflow run with containers as usual, but maybe this is the best solution.. My main issue with it is that it forces people to use nf-core download.

Would users need to use Wave at all, besides checking whether an image has been created? I was having an issue where it returned the image name before the image had even been built (i.e. quay.io/nf-core/modules/bowtie:bowtie-1.3.0_samtools-1.16.1--82705d624eee2198). So it should be able to go out and look for that image (I'm guessing right now it's authenticating with the repo through Tower Platform).

But if we could tweak the behavior slightly (it might already be this):

  1. Check if the image repo is public
  2. If the repo is private, auth through platform, and then try to download.

@edmundmiller

What if we ran nextflow inspect in CI for each pipeline on release, and generated a containers.json from it?

Not every commit would be reproducible, but the releases would be downloadable with nf-core download.

I think that's a good compromise. It would vastly simplify the container downloading logic in nf-core download.
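
To illustrate the simplification: with a committed containers.json, finding the images to fetch reduces to reading a list. The sketch below assumes the file has the same shape as the nextflow inspect output shown earlier in the thread. The function name `images_to_download` is hypothetical, and the actual pull step is left out.

```python
import json
from pathlib import Path
from typing import List


def images_to_download(manifest_path: str) -> List[str]:
    """List the unique container URIs recorded in a containers.json file.

    Assumes the shape of `nextflow inspect` output:
    {"processes": [{"name": ..., "container": ...}, ...]}
    """
    data = json.loads(Path(manifest_path).read_text())
    uris = {p["container"] for p in data.get("processes", []) if p.get("container")}
    return sorted(uris)  # sorted so the result is stable between runs
```

Several processes often share one container, so deduplicating avoids pulling the same image more than once.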

@edmundmiller

seqeralabs/nf-aggregate#43 Basically this 😆
