usegalaxy.org/ludwig_applications.yaml{.lock} #880

paulocilasjr · 2024-10-31T15:49:34Z

Add Ludwig-based Deep Learning Tools and Config File Generator

This PR introduces a suite of tools based on the Ludwig framework, allowing users to create and use deep-learning models without writing extensive code.

Included Tools:

5 Ludwig-based tools:
Each tool serves a unique purpose within the deep learning model lifecycle, from data preprocessing to model evaluation.
Note: the tools are currently running through Docker.
1 Config File Generator:
This additional tool, though not Ludwig-based, assists users in creating the required config.yaml files. These configuration files are essential for several Ludwig tools, providing a streamlined setup process.

Installation sequence for `tool-installers`

Test using @galaxybot test this
Inspect CI output for expected changes
Deploy using @galaxybot deploy this if test install was successful
Merge this PR

natefoo

Well written tools, mostly looks good to me. A few thoughts, none of them critical:

Explicitly overriding $TMP* like this is probably not a good idea: Galaxy defaults to tmp space in the job directory, and admins often override this as needed for particular destinations or tools.
I see Ludwig has this option to disable multithreading, which is exposed in the tool as an option for reproducibility purposes. But if Ludwig has an option for controlling the number of threads, I don't see it. What may happen is it gets scheduled on a node with 64 cores but is only allocated a fraction of those, assumes it can use them all, and blows up. If there is any way to pass in the number of cores, that would be great, otherwise we might have to get creative in scheduling.
Minor, but ${dataset.element_identifier} is typically the dataset name I believe? So this can result in some weirdness when creating those symlinks, but should be safe at least since they are quoted.
Also minor, but a lot of those `pwd` calls can probably just be replaced by `.`, unless Ludwig changes the cwd internally.
This might fail under Pulsar, I am not sure if there is a "preferred" way of looking at tool stdout like this but the IUC channel probably has an answer.
Should this be a yaml.safe_dump()?
The quoting construction in ludwig_visualize.yml is a bit creative but I think it is OK; it would be great if an IUC person could have a look at that as well.
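To illustrate the thread-count point above: Galaxy exports the number of allocated cores to the job environment as `GALAXY_SLOTS`. The following is only a minimal sketch of how a wrapper script could read that value and cap the usual numeric-library thread pools. The helper names `galaxy_slots` and `limit_threads` are hypothetical, and whether Ludwig itself honors these environment variables is an assumption that would need to be checked.

```python
import os


def galaxy_slots(default: int = 1) -> int:
    """Return the core count Galaxy allocated to this job.

    Falls back to `default` when GALAXY_SLOTS is unset or is not a
    positive integer.
    """
    raw = os.environ.get("GALAXY_SLOTS", "")
    try:
        slots = int(raw)
    except ValueError:
        return default
    return slots if slots > 0 else default


def limit_threads(n: int) -> None:
    """Cap common thread pools (hypothetical helper).

    These variables are read by OpenMP-, MKL- and OpenBLAS-backed
    libraries, so they must be set before those libraries are imported.
    Assumption: the Ludwig backend respects them.
    """
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
        os.environ[var] = str(n)


if __name__ == "__main__":
    n = galaxy_slots()
    limit_threads(n)
    print(f"using {n} thread(s)")
```

If the backend is PyTorch, calling `torch.set_num_threads(n)` after import would be another way to apply the same limit.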

bgruening · 2024-11-04T23:11:06Z

@paulocilasjr thanks! A few comments from my side:

The tests could also use some asserts, as the sim_size comparison is not very strict.
format="auto" on outputs should be avoided if possible, since it disables certain features in workflows.
Is there any reason you need to use extra_files_path?

bernt-matthias · 2024-11-05T09:05:13Z

Use . instead of pwd
element_identifier needs to be sanitized (see here for an example)
Is ludwig_model a proper Galaxy datatype?
If you unzip here https://github.com/goeckslab/Galaxy-Ludwig/blob/00f7da552741b6d3f576833329a9e18173ae5fd8/tools/ludwig_evaluate.xml#L16 you do not have any control over the output path name - I think this should be controlled.
Seems that the tool uses multithreading but does not allow any control over it. This will lead to problems.
It would be great to add min and max to numeric parameters.
Typo: "Randonness"
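As a rough illustration of the element_identifier point above, sanitizing usually means replacing every character outside a small whitelist before the name is used as a symlink or file name. This is a sketch in plain Python, not the example linked in the review; the function name `sanitize_identifier` and the fallback value are hypothetical. In the wrapper itself the same substitution would typically be done in the Cheetah command block.

```python
import re


def sanitize_identifier(name: str, fallback: str = "dataset") -> str:
    """Turn a user-controlled dataset name into a safe file name."""
    # Replace anything that is not a letter, digit, dot, underscore or dash.
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", name)
    # Drop leading dots and dashes so the result is neither hidden,
    # a relative path component, nor mistaken for a command-line option.
    cleaned = cleaned.lstrip(".-")
    # Never return an empty name.
    return cleaned or fallback


if __name__ == "__main__":
    for raw in ("my data (1).csv", "../../etc/passwd", "-rf", ""):
        print(repr(raw), "->", repr(sanitize_identifier(raw)))
```

Because path separators are replaced as well, the result cannot escape the job working directory when it is used as a symlink target name.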

99ced7f

paulocilasjr requested a review from a team as a code owner October 31, 2024 15:49

natefoo reviewed Nov 4, 2024
