Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] For loops in nested pipelines have clashing component names #11484

Open
m-dz opened this issue Dec 23, 2024 · 0 comments
Open

[bug] For loops in nested pipelines have clashing component names #11484

m-dz opened this issue Dec 23, 2024 · 0 comments

Comments

@m-dz
Copy link

m-dz commented Dec 23, 2024

Environment

  • Python 3.10.13

  • KFP SDK version:
    2.11.0 (current on 23/12/2014)

Steps to reproduce

Simple example with:

  • 1 "main pipeline" having a loop which runs "pipeline sub"
  • "Pipeline sub" with 2 pipelines "aaa" and "bbb" with for loops each

What I think is happening when pipelines are being merged:

  1. Pipeline "aaa" has a for loop which is renamed to for-loop-1 (irrespective of its name in the pipeline definition)
  2. This aaa is merged to the "main" pipeline, which already has component specs for "for-loop-1", and now is getting new comp specs for "for-loop-1-2"
  3. Pipeline "bbb" has a for loop which is again renamed to for-loop-1 (again, pipeline definition name is ignored)
  4. "bbb" is merged to "main", which OVERWRITES comp specs for "for-loop-1-2"
  5. Now, we have the following components:
    • "comp-for-loop-1"
    • "comp-for-loop-1-2" with "bbb"'s task
    • "comp-pipeline-aaa" referring to "comp-for-loop-1-2" (incorrectly)
    • "comp-pipeline-bbb" referring to "comp-for-loop-1-2" again

Expected result

  1. Each for loop will have its own component
  2. Each for loop task will refer to a correct component

Expected behaviour can be obtained by making two relatively simple changes in the TasksGroup class:

  1. Adding self.name = name in https://github.com/kubeflow/pipelines/blob/master/sdk/python/kfp/dsl/tasks_group.py#L69 , which provides a name for the component to use
  2. Removing the _make_name_unique method (https://github.com/kubeflow/pipelines/blob/master/sdk/python/kfp/dsl/tasks_group.py#L87) from it, which stops the name to be overwritten

Materials and reference

The following 2 files reproduce the issue:

# pipelines/pipeline_main.py

from typing import List

from kfp.compiler import compiler
from kfp.dsl import pipeline, component, ParallelFor

from pipelines.pipeline_sub import pipeline_sub


@component(base_image='python:3.10')
def get_ids_for_main_loop() -> List[int]:
    return [1,2,3]


@pipeline(name="main")
def main_pipeline():
    get_ids_for_main_loop_task = get_ids_for_main_loop()

    # This is getting `comp-for-loop-1`
    main_loop = ParallelFor(
        items=get_ids_for_main_loop_task.output,
        name="main-loop-zzz",
    )
    with main_loop as _ignore:
        pipeline_sub()


if __name__ == '__main__':
    compiler.Compiler().compile(
        pipeline_func=main_pipeline,
        package_path='pipeline_main.yaml',
    )

# pipelines/pipeline_sub.py

from typing import List

from kfp.dsl import ParallelFor, component, pipeline


@component(base_image='python:3.10')
def get_ids_for_loop_aaa() -> List[int]:
    return [1,2,3]


@component(base_image='python:3.10')
def get_ids_for_loop_bbb() -> List[int]:
    return [7,8,9]


@component(base_image='python:3.10')
def print_ids_aaa(_id: int):
    print(f'Id aaa: {_id}')


@component(base_image='python:3.10')
def print_ids_bbb(_id: int):
    print(f'Id bbb: {_id}')


@pipeline(name="pipeline_aaa")
def pipeline_aaa():
    get_ids_for_loop_aaa_task = get_ids_for_loop_aaa()

    loop_aaa = ParallelFor(
        items=get_ids_for_loop_aaa_task.output,
        name="loop-aaa-zzz",
    )
    with loop_aaa as _id:
        print_ids_aaa(_id=_id)


@pipeline(name="pipeline_bbb")
def pipeline_bbb():
    get_ids_for_loop_bbb_task = get_ids_for_loop_bbb()

    loop_bbb = ParallelFor(
        items=get_ids_for_loop_bbb_task.output,
        name="loop-bbb-zzz",
    )
    with loop_bbb as _id:
        print_ids_bbb(_id=_id)


@pipeline(name="pipeline_sub")
def pipeline_sub():
    pipeline_aaa_task = pipeline_aaa()  # This is getting `comp-for-loop-1-2` when merged to main
    pipeline_bbb_task = pipeline_bbb()  # This is OVERWRITING `comp-for-loop-1-2` when merged to main...
    pipeline_bbb_task.after(pipeline_aaa_task)

Labels

/area components


Impacted by this bug? Give it a 👍.

@m-dz m-dz added the kind/bug label Dec 23, 2024
@m-dz m-dz changed the title [bug] <Bug Name> [bug] For loops in nested pipelines have clashing names Dec 23, 2024
@m-dz m-dz changed the title [bug] For loops in nested pipelines have clashing names [bug] For loops in nested pipelines have clashing component names Dec 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant