
Progress on rasdaman (Deep Learning) UDFs #2

Closed
KathiSchleidt opened this issue May 16, 2023 · 21 comments
Labels: FAIRiCUBE Hub (FAIRiCUBE Hub main interface development)

@KathiSchleidt

What's the status on creating rasdaman UDFs?
The requirements were discussed in Bremen and should be clear; if not, please ask!
Details are in the UC2 presentation from Bremen.

@robknapen changed the title from "Progress on rasdaman UDFs" to "Progress on rasdaman (Deep Learning) UDFs" on May 16, 2023
@ocampos16

@KathiSchleidt as of right now we are still working on the following:

  1. Linking Rob's Python PyTorch implementation into the UDF mechanism. The idea is to replace the existing C++ implementation so that Python can be used instead; this will definitely simplify future UDF implementations as well as reduce development time (a sketch of the kind of Python entry point involved is appended below).
  2. Saving a trained model as a collection in rasdaman for further reference from other UDFs.
  3. Designing a catalog mechanism for listing and linking which models can be used with which UDFs.

We will keep you updated with our results as they come.
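
Regarding item 1, a minimal sketch of the kind of Python entry point such a UDF bridge might call. The function name, the array-in/array-out contract, and the use of TorchScript are assumptions for illustration, not the actual rasdaman interface:

```python
# Hypothetical sketch only: the kind of Python entry point a rasdaman
# UDF wrapper could call. The name, signature, and TorchScript choice
# are assumptions; the actual rasdaman binding may differ.
import numpy as np
import torch

def udf_predict(model_path: str, tile: np.ndarray) -> np.ndarray:
    """Run a saved PyTorch model on one datacube tile."""
    model = torch.jit.load(model_path)  # TorchScript keeps the UDF self-contained
    model.eval()
    with torch.no_grad():
        x = torch.from_numpy(tile).float().unsqueeze(0)  # add batch dimension
        y = model(x)
    return y.squeeze(0).numpy()
```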

@KathiSchleidt

@ocampos16

  1. Very cool! I think being able to create Python-based UDFs will make this much easier for "normal" users! :)
  2. ah... what's a collection in rasdaman?
  3. This work should be coordinated with what @sMorrone is doing on D4.3 Processing Resource Metadata

More generally (and maybe contained in points 2 & 3), how can a user see which UDFs are available? Or can users only access their own UDFs?

@ocampos16 commented May 17, 2023

@KathiSchleidt

  1. Indeed, I believe the same; that is why we are focusing all efforts on this solution.
  2. It means storing the model inside rasdaman. A collection in rasdaman is equivalent to a table in a relational database; a sketch of how a serialized model maps onto one follows after this list.
  3. @sMorrone maybe we can have a quick concall to discuss how we relate your catalog with what rasdaman could provide.
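
As a rough illustration of item 2 (the collection name, type, and ingestion step below are invented; only the idea of storing a serialized model as a flat array comes from the discussion), a trained PyTorch model can be flattened into the 1-D byte array such a collection would hold:

```python
# Sketch: serialize a trained PyTorch model into a flat uint8 array,
# i.e. the kind of 1-D payload a rasdaman collection could store.
import io
import numpy as np
import torch

def model_to_bytes(model: torch.nn.Module) -> np.ndarray:
    buf = io.BytesIO()
    torch.save(model.state_dict(), buf)  # standard PyTorch serialization
    return np.frombuffer(buf.getvalue(), dtype=np.uint8)

# A matching 1-D collection might then be created with something like
# (rasql, names invented):  CREATE COLLECTION model_store GreySet1
```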

More generally (and maybe contained in points 2 & 3), how can a user see which UDFs are available?
-> There is a query in the rasdaman query language, rasql, that is specifically designed to list all available UDFs, regardless of the user. I believe that in a web environment WCS, WCPS, or WMS would be preferred; I need to check this part with @pebau because it involves a standard, and if it is not covered there we need to think of another solution.
Or can users only access their own UDFs?
-> So far any user can access all the UDFs via rasql and WCPS; is this acceptable to you?
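
For context, rasdaman also exposes rasql over HTTP, so such a listing could in principle be scripted. In the sketch below the servlet path, the credentials, and especially the listing statement itself are placeholders to be confirmed against the deployed version:

```python
# Sketch: issue a rasql query over HTTP. Endpoint path, credentials,
# and the listing statement are placeholders, not confirmed syntax.
import requests

resp = requests.get(
    "https://fairicube.rasdaman.com/rasdaman/rasql",  # assumed servlet path
    params={
        "username": "rasguest",      # assumed read-only demo account
        "password": "rasguest",
        "query": "LIST FUNCTIONS",   # placeholder: exact statement TBC
    },
    timeout=30,
)
print(resp.text)
```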

@KathiSchleidt

On providing a listing of available UDFs: to my view, WCPS GetCapabilities would be my first candidate, in addition to exposing them via the processing resource metadata. Please include me on the call sorting this!
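
For reference, a plain OGC WCS GetCapabilities request against the rasdaman OWS endpoint would look like this; the endpoint path follows the usual rasdaman deployment layout and is an assumption here, and whether UDFs would actually be advertised in the response is exactly the open question above:

```python
# Standard OGC WCS GetCapabilities request; the endpoint path is an
# assumption following the usual rasdaman deployment layout.
import requests

resp = requests.get(
    "https://fairicube.rasdaman.com/rasdaman/ows",
    params={"service": "WCS", "version": "2.0.1", "request": "GetCapabilities"},
    timeout=30,
)
print(resp.text[:1000])  # start of the capabilities XML document
```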

On all users being able to access existing UDFs: works for me. We should check with the UC partners just to be sure, but I'm pretty sure we won't have the same issues with sensitive models that we have with sensitive data.

@robknapen

ML models trained on sensitive data might need restricted access as well, for instance depending on the user agreement of the data (what derived products are allowed is often not clearly specified for ML models), or on whether the training of the model has sufficiently hidden the sensitive (input) data points (otherwise an ML expert might be able to extract them from the model, as a kind of reverse engineering).

@robknapen

@ocampos16 Out of curiosity (also relates to 'how to catalogue' and 'what might be restricted'): Do you intend to treat a trained model as a whole, or to split it up into the computational graph and the trained parameters?

@pebau commented May 19, 2023

@robknapen (chiming in here) dissecting a model is a rabbit hole from our perspective, and I can see no advantage - we would always treat a model as a black box.

@pebau commented May 19, 2023

@robknapen

ML models trained on sensitive data might need restricted access as well.

Accepted, at some point access control will be necessary - just not at this stage, where we have only one model anyway :)

@KathiSchleidt

@robknapen turning @pebau's statement around, do you see a situation where we provide the same model with 2 sets of trained parameters?

@robknapen

Sure; for example, the same CNN model that we used so far can be trained for other (semantic segmentation) tasks (similar ones, though, since the model architecture expects 28 features as input), or it can be trained for a different region. Both would use the same model architecture (= computational graph) but learn different weights. Splitting these two is the basis for what is known as transfer learning in ML. So for inference you can have one model architecture and load it with matching weights and biases for a number of similar prediction tasks. [For sure this is more difficult to implement than a pure black-box approach, and there might be no short-term benefits.]

Libraries such as TensorFlow, Keras, and PyTorch all have methods that support this way of working with deep learning models. The usually long training times make it a rather common approach to quickly start experimenting.
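
A minimal PyTorch sketch of that split, with the architecture as code and the learned weights as a separate artifact (the model class here is a stand-in, not the actual CNN from the use case):

```python
# Minimal sketch of separating architecture (computational graph) from
# learned weights. SegmentationCNN is a stand-in for the actual model.
import torch
import torch.nn as nn

class SegmentationCNN(nn.Module):
    def __init__(self, in_features: int = 28, n_classes: int = 5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_features, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, n_classes, kernel_size=1),
        )

    def forward(self, x):
        return self.net(x)

# After training for region/task A, persist only the parameters:
model_a = SegmentationCNN()
torch.save(model_a.state_dict(), "weights_region_a.pt")

# Same architecture, reloaded with those weights as a warm start for
# a similar task (the basis of transfer learning):
model_b = SegmentationCNN()
model_b.load_state_dict(torch.load("weights_region_a.pt"))
```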

@pebau commented Sep 5, 2023

Status: PyTorch-based UDFs work; JupyterHub is almost installed (we need Rob's help for completion -> Mohit will contact him).

@KathiSchleidt

@robknapen am I correct that if you have a model trained on 2 different datasets, you'd provide this as 2 different models (most of the info the same, but different input data, maybe different spatial validity)?

@robknapen

@KathiSchleidt Yes, the models learn to represent the different datasets. When they are 'too different', it will result in distinct models. When the datasets are different but still similar, a single, more robust, model can be trained on them. So there can be exceptions :-)

@KathiSchleidt

@robknapen any insight as to what impact these exceptions have on the a/p resource metadata? There, we have the following fields foreseen:

  • Input data: URI 1..* : Link to input data/metadata, helpful for a better understanding of context and domain.
  • Characteristics of input data: CharacterString 1 : This field contains a textual description of the main characteristics of each input dataset of the resource. It will also include, e.g., a description of sampling techniques, the version of the data (if multiple versions are available), and, in the case of ML resources, the percentages of the training, validation, and testing sets. This field may contain details on the suitability of the resource for the chosen geographic area and thematic context.

Can you use these to describe what you'd need to know?
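
To make the two fields concrete, one filled-in entry might look like this (all URIs and values are invented for illustration):

```python
# Illustrative only: one a/p resource metadata entry using the two
# fields above. All URIs and values are invented examples.
metadata_entry = {
    "input_data": [
        "https://example.org/catalog/land-cover-cube",  # URI, cardinality 1..*
    ],
    "characteristics_of_input_data": (
        "Raster cube with 28 features; data version 2023-04; "
        "70/15/15 split into training/validation/testing sets; "
        "considered suitable for the pilot region and thematic context."
    ),
}
```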

@robknapen

@KathiSchleidt I think so. In some cases I would mention an existing (trained) model (or its saved weights) as ‘input data’, and use ‘characteristics’ to explain how it was used.

(Maybe we need a better minimum length for ‘characteristics’? 1 character doesn’t seem very helpful to me. I would prefer either 0, or enforcing some longer text (200+ characters?).)

@KathiSchleidt

@robknapen

  • shouldn't we differentiate between:
    • Input data: the data the model has been trained on
    • Configuration/weights: how the model has been parameterized
  • on Characteristics of input data: this is of type CharacterString, so free text. This has worried me, as it is difficult to explain the individual inputs in such a single block, but my request to align the cardinality with Input data was not taken into account

@sMorrone

  • Should we add an entry for model configuration/weights?
  • Should we align the cardinality of the input data description with that of the input data?

@robknapen

@KathiSchleidt Yes, we can split it into configuration/initialisation data and input (training) data, to make the difference in purpose clearer.

@sMorrone commented Sep 12, 2023

@KathiSchleidt

  • we agree on adding an entry for model configuration/weights & will do so asap
  • pertaining to "align the cardinality of the input data description with that of the input data": the current solution we implemented (a couple of months ago) uses bulleted lists in which each entry is paired with its characteristics. The picture below shows the current online a/p metadata request form.

(screenshot: current online a/p metadata request form)

When the metadata is displayed in the catalog, this solution results in what can be seen in the picture below.
(screenshot: catalog display of the metadata)

@robknapen @KathiSchleidt does this work for you?

@pebau commented Dec 15, 2023

Summarizing the status of rasdaman UDFs:

  • trained models + datacube regions of interest can be passed to PyTorch for evaluation using the UDF mechanism; the corresponding UDF package nn is deployed and offers the function predict() for this purpose (a sketch of an invocation follows after this list).
  • general Python UDFs can be created through a CREATE FUNCTION statement and by copying the code into the rasdaman UDF space (those users who have worked on this already have a login; other prospective users please contact us to create a login).
  • an example model provided by WER has been deployed as a proof of concept on https://fairicube.rasdaman.com .
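
For illustration, an invocation of the deployed predict() via WCPS might look roughly as follows. Only the endpoint and the package/function names come from the status above; the coverage name, the subsetting, and the exact query syntax are assumptions:

```python
# Rough sketch of calling the deployed nn.predict() UDF through WCPS.
# Coverage name, subsetting, and exact UDF syntax are assumptions.
import requests

wcps_query = """
for $c in (demo_cube)
return encode(nn.predict($c[Lat(52:53), Lon(8:9)]), "tiff")
"""

resp = requests.post(
    "https://fairicube.rasdaman.com/rasdaman/ows",
    data={"service": "WCS", "version": "2.0.1",
          "request": "ProcessCoverages", "query": wcps_query},
    timeout=120,
)
with open("prediction.tiff", "wb") as f:
    f.write(resp.content)
```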

Let me know if you feel something is missing on PyTorch UDFs.

@jetschny

Jivitesh is now assigned to look into the Python UDF implementation (testing and verification). This will provide another UC view and can serve as validation.

@jetschny added the "FAIRiCUBE Hub" label (FAIRiCUBE Hub main interface development) on Feb 7, 2024
@jetschny

In light of the new issue #57, which formulates the requirements for more ML models in short, I will close this ticket.
