
MLFLOW MR & HuggingFace & KFMR logical model comparison #94

mzhl1111 opened this issue May 15, 2024 · 5 comments
mzhl1111 commented May 15, 2024

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

Finalize the design for the schema of the middle layer that transforms an outer model registry's logical model into the KFMR logical model.

MLflow ref

| Schema | KFMR | MLflow | Existing Hugging Face integration |
|---|---|---|---|
| Registered Model | Git repo, CR repo/name, Hugging Face repo, name string | Name string + (MLflow artifact) version can offer a run ID (can be seen as a repo inside MLflow) | Hugging Face repo |
| Model Version | A git tag, a container image tag, a string | Represents user input | User input |
| Model Artifact | ONNX, pickle, .ckpt file | Download by API (pass the URL) | Download with API |
| Doc Artifact | A README.md file | If it exists as an artifact of the same run, it can be downloaded | If it exists in the repo, it can be downloaded |
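One way to read the table is as a single middle-layer record. A hypothetical sketch only; the field names below are illustrative and not the agreed KFMR schema:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical middle-layer record implied by the comparison table above.
# Field names are illustrative, not the agreed KFMR schema.
@dataclass
class LogicalModel:
    registered_model: str                       # repo/name string (Git repo, HF repo, MLflow name)
    model_version: str                          # git tag, image tag, or free-form string
    artifact_uri: str                           # URL to the model file (ONNX, pickle, .ckpt, ...)
    model_format: Optional[str] = None          # e.g. "sklearn", "pytorch"
    model_format_version: Optional[str] = None
    doc_uri: Optional[str] = None               # README.md location, if any

m = LogicalModel("mlflow:my-model", "3",
                 "s3://bucket/0/run/artifacts/model/model.pkl",
                 model_format="sklearn")
```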

Discussion needed

  • MLflow has multiple flavors, and when storing artifacts it automatically generates a folder that includes:
    1. Files for environment setup, including requirements.txt, python_env.yaml and conda.yaml. Do we need this type of data for KServe?
    2. An MLmodel file that stores model metadata, including the schema that can be converted to model_format and model_format_version. Do we grab this information from the artifact, or keep it as required user input? (Example screenshot omitted.)
    3. The model metadata can be retrieved using:

```python
model_version_details = client.get_model_version(name='some_model_name', version=int(some_ver))
model = mlflow.pyfunc.load_model(model_version_details.source)
model.metadata.flavors['some_flavor']  # flavors has two keys; use the one other than "python_function"
```

Note: the structure inside the map can be different
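Building on the snippet above: once metadata.flavors is in hand as a plain dict, the native (non-python_function) entry can be picked out and mapped to model_format / model_format_version. A minimal sketch; the sample dict mimics an sklearn MLmodel, but exact keys vary per flavor:

```python
# Sketch: pick the "native" flavor out of an MLmodel flavors mapping and derive
# model_format / model_format_version. The sample dict mimics the shape of
# model.metadata.flavors for an sklearn model; exact keys vary per flavor.
def native_flavor(flavors):
    # flavors usually has two entries: "python_function" plus the native one
    name = next(k for k in flavors if k != "python_function")
    meta = flavors[name]
    # many flavors record the framework version under "<name>_version"
    return name, meta.get(f"{name}_version")

sample = {
    "python_function": {"loader_module": "mlflow.sklearn"},
    "sklearn": {"pickled_model": "model.pkl", "sklearn_version": "1.4.2"},
}
print(native_flavor(sample))  # → ('sklearn', '1.4.2')
```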

  • Assessment of a mapping table from "MLflow flavour" to "KServe model format"

    1. There is no documentation about whether a "flavour" is supported by the MLflow Model Registry; the docs only cover whether a specific version of a "flavour" is compatible with auto_logging.
  • Accessibility of the MLflow download link for downstream usage of the URI (e.g. this format: "mlflow-artifacts:/0/ebbbb937c23449d695f8146c4a8241ff/artifacts/sklearn-model")

    1. It is not possible to directly use the so-called artifact_uri that MLflow offers (see Reason).
    2. New finding that we can try: https://stackoverflow.com/a/71688558
    3. If we assume the user sets the artifact destination to public storage like S3:

```shell
mlflow server \
  --backend-store-uri postgresql://user:password@localhost:5432/mlflowdb \
  --artifacts-destination s3://bucket \
  --host 0.0.0.0 \
  --port 5000
```

    Then the download URL can be obtained directly (screenshots omitted).
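A sketch of the URI translation this setup allows: assuming the server was started with --artifacts-destination s3://bucket, an mlflow-artifacts:/ URI can be rewritten to the backing-store location. The one-to-one path mapping is an assumption to verify against the actual deployment:

```python
# Sketch: rewrite an "mlflow-artifacts:/..." URI to the backing-store location,
# assuming the tracking server was launched with --artifacts-destination.
# The one-to-one path mapping is an assumption; verify against your deployment.
SCHEME = "mlflow-artifacts:/"

def to_storage_uri(artifact_uri, artifacts_destination):
    if not artifact_uri.startswith(SCHEME):
        raise ValueError(f"not an mlflow-artifacts URI: {artifact_uri}")
    relative = artifact_uri[len(SCHEME):].lstrip("/")
    return f"{artifacts_destination.rstrip('/')}/{relative}"

uri = "mlflow-artifacts:/0/ebbbb937c23449d695f8146c4a8241ff/artifacts/sklearn-model"
print(to_storage_uri(uri, "s3://bucket"))
# → s3://bucket/0/ebbbb937c23449d695f8146c4a8241ff/artifacts/sklearn-model
```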
/assign @mzhl1111

dhirajsb added a commit to dhirajsb/model-registry-kfp that referenced this issue May 22, 2024
* Create dependabot.yml

Enable dependabot version updates.

* Update dependabot.yml

Added Python pip package-ecosystem.

mzhl1111 commented May 22, 2024

Draft version: mzhl1111@abd4242

Updated (unit test added): mzhl1111@a5a2edd

tarilabs pushed a commit to tarilabs/model-registry that referenced this issue Jul 25, 2024
[pull] main from kubeflow:main

mzhl1111 commented Aug 5, 2024

To track the model artifact from MLflow, we need to follow these steps:

Get the "Source" URL in MLflow.

This can be done directly via the MLflow API (green in the original screenshot), but it leads us only to the parent directory of the model artifact (gray). The example is a registered model in PyTorch format. What we want is the file highlighted in blue, so we also need the metadata. (Screenshots omitted.)

To locate the model artifact, we need to concatenate the source URL with the filename we get from the metadata.

Issues we have:

Different model formats/flavors store this information in different metadata structures.

The following is the sklearn format, and it uses different keys in its metadata to store the filename of the model artifact. (Screenshot omitted.)

The model artifact name in the metadata may change.

I found the torch model artifact changed from pickled_model.pth to model.pth, and right now this is hard-coded in our code, because we can only get '/data' from the metadata and concatenate it with the source URL.

If we used the MLflow runtime as the downstream format for KServe, which KServe supports, we would not need this hard-coding to get an absolute path to the model artifact.
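The concatenation described above could be sketched as a small per-flavor lookup. The metadata keys below ("pickled_model", "model_data") match common MLmodel layouts but are assumptions to verify, and their instability across flavors and versions is exactly the fragility this comment points out:

```python
# Sketch: resolve the model-file path by joining the registry "source" URL with
# the filename recorded in the flavor metadata. The keys ("pickled_model",
# "model_data") match common MLmodel layouts but are assumptions to verify;
# their instability across flavors/versions is the issue described above.
def resolve_artifact(source, flavor, flavor_meta):
    if flavor == "sklearn":
        filename = flavor_meta["pickled_model"]           # e.g. "model.pkl"
    elif flavor == "pytorch":
        # pytorch stores a data directory; the file inside it has changed
        # names across versions (pickled_model.pth vs model.pth)
        filename = flavor_meta["model_data"] + "/model.pth"
    else:
        raise KeyError(f"no filename rule for flavor {flavor!r}")
    return source.rstrip("/") + "/" + filename

src = "mlflow-artifacts:/0/abc123/artifacts/model"
print(resolve_artifact(src, "sklearn", {"pickled_model": "model.pkl"}))
# → mlflow-artifacts:/0/abc123/artifacts/model/model.pkl
```

Using the MLflow runtime end to end, as suggested above, would remove the need for this lookup entirely.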

github-actions bot commented Nov 4, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Nov 24, 2024
@tarilabs tarilabs reopened this Nov 24, 2024