
Scaling sopa to run many images #103

Open
josenimo opened this issue Aug 5, 2024 · 5 comments

josenimo commented Aug 5, 2024

Dear @quentinblampey

I am in the process of changing the provided Snakefile to run many samples (images) at once, replacing the input and output paths to include the {sample} wildcard.
However, I run into an issue with get_input_resolve, since it handles all the patches itself.
Do you have any suggestions here? Should I create my own get_input_resolve that includes the {sample} wildcard?

I tried replacing the paths with relative paths containing wildcards, but then the ChildIOException starts appearing.
Any thoughts are welcome; my current Snakefile is below:

Snakefile
configfile: "ome_tif.yaml"

from utils import WorkflowPaths, Args

paths = WorkflowPaths(config)
args = Args(paths, config)

SAMPLES = ["1", "2", "3", "4"]

localrules: all

rule all:
    input:
        expand("data/quantification/{sample}_quant.csv", sample=SAMPLES)
    shell:
        """
        echo 🎉 Successfully ran sopa
        echo → SpatialData output directory: {paths.sdata_path}
        echo → Explorer output directory: {paths.explorer_directory}
        echo → Open the result in the explorer: 'open {paths.explorer_experiment}'
        """

rule prepare_image:
    input:
        raw_image = "data/input/{sample}.ome.tif"  # per-sample wildcard; expand() here would make every job depend on all inputs
        # raw_image=paths.data_path if config["read"]["technology"] != "uniform" else []
    output:
        projection= "data/projections/{sample}_projection.ome.tif"
        # projection="data/projections/small_image_projection.ome.tif"
    conda:
        "sopa"
    params:
        nuclei_channels=config["projection"]["nuclei_channels"],
        nuclear_scaling_min_max_quantiles=config["projection"]["nuclear_scaling_min_max_quantiles"],
        nuclear_projection_quantile=config["projection"]["nuclear_projection_quantile"],
        membrane_channels=config["projection"]["membrane_channels"],
        membrane_scaling_min_max_quantiles=config["projection"]["membrane_scaling_min_max_quantiles"],
        membrane_projection_quantile=config["projection"]["membrane_projection_quantile"]
    shell:
        """
        python scripts/project.py \
        --input {input.raw_image} \
        --output {output.projection} \
        --channels_nucleus {params.nuclei_channels} \
        --nucleus_min_max_scaling_quantiles {params.nuclear_scaling_min_max_quantiles} \
        --nuclear_quantile {params.nuclear_projection_quantile} \
        --channels_membrane {params.membrane_channels} \
        --membrane_min_max_scaling_quantiles {params.membrane_scaling_min_max_quantiles} \
        --membrane_quantile {params.membrane_projection_quantile}
        """ 


rule to_spatialdata:
    input:
        projected_image="data/projections/{sample}_projection.ome.tif"
    output:
        zarr_object = "data/zarrs/{sample}.zarr"
    conda:
        "sopa"
    resources:
        mem_mb=128_000,
    params:
        args_reader = str(args['read'])
    shell:
        """
        sopa read {input.projected_image} --sdata-path {output.zarr_object} {params.args_reader}
        """

checkpoint patchify_cellpose:
    input:
        zarr_object = "data/zarrs/{sample}.zarr"
    output:
        patches_file = "data/zarrs/{sample}.zarr/.sopa_cache/patches_file_image",
        patches = touch("data/zarrs/{sample}.zarr/.sopa_cache/patches"),
    params:
        args_patchify = str(args["patchify"].where(contains="pixel")),
    conda:
        "sopa"
    shell:
        """
        sopa patchify image {input.zarr_object} {params.args_patchify}
        """

rule patch_segmentation_cellpose:
    input:
        zarr_object = "data/zarrs/{sample}.zarr",
        patches_file = "data/zarrs/{sample}.zarr/.sopa_cache/patches_file_image",
        patches = "data/zarrs/{sample}.zarr/.sopa_cache/patches"
    output:
        cellpose_boundaries = "data/zarrs/{sample}.zarr/.sopa_cache/cellpose_boundaries/", #not sure if this is correct, but to signal the order of events
        cellpose_tmp_dir = "data/zarrs/{sample}.zarr/.sopa_cache/cellpose_boundaries/{wildcards.index}.parquet"
    conda:
        "sopa"
    params:
        args_cellpose = str(args["segmentation"]["cellpose"]),
    shell:
        """
        sopa segmentation cellpose {input.zarr_object} --patch-dir {output.cellpose_tmp_dir} --patch-index {wildcards.index} {params.args_cellpose}
        """


def get_input_resolve(name, dirs=False):
    def _(wildcards):
        with getattr(checkpoints, f"patchify_{name}").get(**wildcards).output.patches_file.open() as f:
            return paths.cells_paths(f.read(), name, dirs=dirs)
    return _

rule resolve_cellpose:
    input:
        zarr_object = "data/zarrs/{sample}.zarr",
        cellpose_tmp_dir = "data/zarrs/{sample}.zarr/.sopa_cache/cellpose_boundaries/"
    output:
        touch("data/zarrs/{sample}.zarr/.sopa_cache/cellpose_boundaries_done"),
    conda:
        "sopa"
    shell:
        """
        sopa resolve cellpose {input.zarr_object} --patch-dir {input.cellpose_tmp_dir}
        """

rule rasterize:
    input:
        zarr_object = "data/zarrs/{sample}.zarr",
        cellpose_boundaries = "data/zarrs/{sample}.zarr/.sopa_cache/cellpose_boundaries_done" #to ensure to run after segmentation is done
    output:
        mask_tif="data/masks/{sample}_mask.tif"
    conda:
        "sopa"
    shell:
        """
        mkdir -p ./data/masks

        python scripts/rasterize.py \
        --input {input.zarr_object} \
        --output {output.mask_tif}
        """

rule quantify:
    input:
        raw_image = "data/input/{sample}.ome.tif",  # per-sample wildcard instead of expand() over all samples
        mask_tif="data/masks/{sample}_mask.tif",
        markers = "data/input/markers.csv"
    output:
        quantification = "data/quantification/{sample}_quant.csv"
    conda:
        "sopa"
    params:
        math = config["quantify"]["math"],
        quantile = config["quantify"]["quantile"],
    shell:
        """
        mkdir -p ./data/quantification

        python scripts/quant.py \
        --image {input.raw_image} \
        --label {input.mask_tif} \
        --markers {input.markers} \
        --output {output.quantification} \
        --math {params.math} \
        --quantile {params.quantile}
        """
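A sample-aware variant of get_input_resolve might look like the sketch below (hypothetical: it assumes the patches_file written by the checkpoint contains the number of patches, and that per-patch outputs live under the sample's .sopa_cache directory, as in the stock workflow; cells_paths_for_sample is an illustrative helper, not the real utils.py function):

```python
# Hypothetical sketch of a sample-aware get_input_resolve.
# Assumption: the checkpoint's patches_file stores the patch count, and
# per-patch outputs live under <zarr>/.sopa_cache/<name>_boundaries/.

def cells_paths_for_sample(sample: str, name: str, n_patches: int) -> list[str]:
    """Build the expected per-patch parquet paths for one sample (assumed layout)."""
    cache = f"data/zarrs/{sample}.zarr/.sopa_cache/{name}_boundaries"
    return [f"{cache}/{i}.parquet" for i in range(n_patches)]

def get_input_resolve(name):
    """Input function for the resolve rule, re-evaluated after the checkpoint runs."""
    def _(wildcards):
        # The {sample} wildcard is forwarded to the checkpoint lookup, so each
        # sample gets its own patches_file and its own patch list.
        ckpt = getattr(checkpoints, f"patchify_{name}").get(**wildcards)
        with open(str(ckpt.output.patches_file)) as f:
            n_patches = int(f.read())
        return cells_paths_for_sample(wildcards.sample, name, n_patches)
    return _
```

The key point is that Snakemake passes the wildcards of the consuming rule into the input function, so the checkpoint lookup and the generated paths both become per-sample.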
@quentinblampey (Collaborator)

Hello,

Have you updated utils.py? I think that adding a wildcard in self.sdata_path here should do the trick, as everything else depends on this variable. But I'm not sure if you'll have a ChildIOException...
Let me know how it goes!


josenimo commented Aug 5, 2024

I have not updated utils.py, will do that :)
Do you mean modifying utils.py itself, or just listing all the data paths in the Snakefile and running it through those? (The latter seems the simplest.)

To be explicit:

SAMPLES = ["/data/input/1.ome.tif", "/data/input/2.ome.tif", "/data/input/3.ome.tif", "/data/input/4.ome.tif"]
rule prepare_image:
    input:
        raw_image= expand("{sample}", sample=SAMPLES)
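If the full paths are the starting point, one option (a sketch; sample_id and PATH_BY_SAMPLE are hypothetical names) is to derive short sample IDs from the paths to use as the {sample} wildcard, keeping a lookup table so an input function can map the wildcard back to the raw file:

```python
from pathlib import Path

SAMPLE_PATHS = [
    "/data/input/1.ome.tif",
    "/data/input/2.ome.tif",
    "/data/input/3.ome.tif",
    "/data/input/4.ome.tif",
]

def sample_id(path: str) -> str:
    """'/data/input/1.ome.tif' -> '1' (strip the directory and the .ome.tif suffix)."""
    return Path(path).name.removesuffix(".ome.tif")

SAMPLES = [sample_id(p) for p in SAMPLE_PATHS]
PATH_BY_SAMPLE = dict(zip(SAMPLES, SAMPLE_PATHS))

# In a rule, an input function can then resolve the wildcard back to the path:
# input: raw_image=lambda wildcards: PATH_BY_SAMPLE[wildcards.sample]
```

This keeps the wildcard values short (so output paths stay clean) while still supporting arbitrary input locations.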

@quentinblampey (Collaborator)

I meant something like self.sdata_path = Path("data/zarrs/{sample}.zarr") in utils.py:L43, but I'm not sure it would work
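As a sketch of that idea (a minimal stand-in, not the real WorkflowPaths class from utils.py): the literal "{sample}" placeholder is kept in every derived path, so Snakemake substitutes it when the strings are used in rule inputs/outputs.

```python
from pathlib import Path

class WorkflowPaths:
    """Minimal hypothetical stand-in for the class in utils.py."""

    def __init__(self, config: dict):
        # The "{sample}" is deliberately left unformatted: Snakemake fills it
        # in when these strings appear in a rule's input/output sections.
        self.sdata_path = Path("data/zarrs/{sample}.zarr")
        self.sopa_cache = self.sdata_path / ".sopa_cache"
        self.smk_patches_file_image = self.sopa_cache / "patches_file_image"

paths = WorkflowPaths({})
```

One caveat, echoing the concern above: outputs nested inside the .zarr directory may still raise ChildIOException, since the .zarr itself is declared as an output.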


josenimo commented Aug 9, 2024

Hey @quentinblampey,

So I have tried a couple of things.
First I tried your suggestion, or trying to pass a {sample} wildcard to the paths creator. Snakemake just creates one set of paths, and thus fails, I am not sure if snakemake is compatible with this kind of object creation setup.
I also tried replacing the paths object, with a .yaml file that could be read by the functions, I managed to make this work for my homebrewed functions, but did not dare to modify the sopa functions to accept a yaml file instead of a path directly.
Not sure where to go from here. At the moment I am running the analysis one image at a time.
The only idea I have left is to perhaps run the sopa python functions directly in the snakefile, haven't given it much thought though.

Happy to hear your comments.
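For completeness, running one image at a time can also be scripted outside Snakemake. The sketch below builds the per-sample sopa CLI calls taken from the Snakefile above and runs them sequentially; the *_args tuples are placeholders for the config-derived flags, and the exact flags for each step may need adjusting:

```python
import subprocess

# Hypothetical sequential driver: one sample at a time, so a failing slide
# does not affect the others.

def commands_for(sample: str, reader_args=(), patchify_args=(), cellpose_args=()):
    """Build the sopa CLI commands for one sample (flags mirror the Snakefile)."""
    zarr = f"data/zarrs/{sample}.zarr"
    projection = f"data/projections/{sample}_projection.ome.tif"
    return [
        ["sopa", "read", projection, "--sdata-path", zarr, *reader_args],
        ["sopa", "patchify", "image", zarr, *patchify_args],
        ["sopa", "segmentation", "cellpose", zarr, *cellpose_args],
        ["sopa", "resolve", "cellpose", zarr],
    ]

def run_all(samples):
    for sample in samples:
        try:
            for cmd in commands_for(sample):
                subprocess.run(cmd, check=True)
        except subprocess.CalledProcessError as err:
            # One failing slide should not stop the others.
            print(f"sample {sample} failed: {err}")

if __name__ == "__main__":
    run_all(["1", "2", "3", "4"])
```

This trades Snakemake's incremental re-runs for simplicity, which may be acceptable when each image is processed independently anyway.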

@quentinblampey (Collaborator)

Hey, sorry to hear it's still not working. I have to say I still find Snakemake confusing sometimes, so I don't really know how to make this work without many trials and errors.

One question: why do you want to run everything at once? I personally prefer to run Sopa on one slide at a time; this way, if one slide fails, it will not make the other images fail.

Also, I personally use InquirerPy to prompt and execute the pipeline, which makes it very easy to run multiple slides. See this example here, which asks the user to choose a config and an existing data directory. With a similar idea, you could also use a multi-select to run multiple snakemake commands at once. I have also made this prompt available to other users at my institute, so it's pretty easy for them to run Sopa.
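A multi-select version of that idea could look roughly like this (hypothetical sketch: InquirerPy's checkbox prompt is real, but the --config override assumes the Snakefile is adapted to read a single sample from the config rather than a hardcoded SAMPLES list):

```python
import subprocess

def snakemake_command(sample: str) -> list[str]:
    """Build the snakemake invocation for one sample (flags are placeholders)."""
    return [
        "snakemake",
        "--configfile", "ome_tif.yaml",
        "--config", f"sample={sample}",
        "--cores", "all",
    ]

def main():
    # Imported lazily so the helper above stays usable without InquirerPy.
    from InquirerPy import inquirer

    samples = inquirer.checkbox(
        message="Which samples should be processed?",
        choices=["1", "2", "3", "4"],
    ).execute()
    for sample in samples:
        subprocess.run(snakemake_command(sample), check=True)

if __name__ == "__main__":
    main()
```

Each selected sample gets its own snakemake run, so one failing slide cannot break the DAG for the others.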
