Expose available ResourceFlavors from the ClusterQueue in the LocalQueue status. #3143

Open
wants to merge 2 commits into
base: main
from

Conversation

mbobrovskyi
Contributor

@mbobrovskyi mbobrovskyi commented Sep 26, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

Expose available ResourceFlavors from the ClusterQueue in the LocalQueue status.

Which issue(s) this PR fixes:

Fixes #3122

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Exposed available ResourceFlavors from the ClusterQueue in the LocalQueue status.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. labels Sep 26, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mbobrovskyi
Once this PR has been reviewed and has the lgtm label, please assign alculquicondor for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 26, 2024

netlify bot commented Sep 26, 2024

Deploy Preview for kubernetes-sigs-kueue ready!

Name Link
🔨 Latest commit d63f561
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/66f651775d9c6200087c1d85
😎 Deploy Preview https://deploy-preview-3143--kubernetes-sigs-kueue.netlify.app

To edit notification comments on pull requests, go to your Netlify site configuration.

@mbobrovskyi
Contributor Author

/cc @alculquicondor

availableFlavors := set.New[kueue.ResourceFlavorReference]()
for _, rg := range cqImpl.ResourceGroups {
	for _, fl := range rg.Flavors {
		if _, ok := c.resourceFlavors[fl]; ok {
Contributor Author

@mbobrovskyi mbobrovskyi Sep 26, 2024

The question is: should we check that the ResourceFlavor exists, or is the list of flavors referenced by the CQ enough? In the first case, we would also need to reconcile LocalQueues whenever ResourceFlavors are updated.

@mimowo @alculquicondor WDYT?
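
To make the two options in this question concrete, here is a minimal, self-contained sketch. It is not the code from this PR: `resourceGroup` and the helper `availableFlavors` are hypothetical stand-ins for the cached ClusterQueue data, and `ResourceFlavorReference` is redeclared locally so the example compiles on its own. Passing a non-nil `existing` map corresponds to checking that the ResourceFlavor exists; passing `nil` corresponds to trusting the list on the CQ.

```go
package main

import (
	"fmt"
	"sort"
)

// ResourceFlavorReference is redeclared here (assumption) so the sketch does
// not depend on the kueue API package.
type ResourceFlavorReference string

// resourceGroup is a hypothetical stand-in for a ClusterQueue resource group.
type resourceGroup struct {
	Flavors []ResourceFlavorReference
}

// availableFlavors returns the sorted, de-duplicated flavors referenced by the
// given resource groups. When existing is non-nil, flavors missing from it are
// skipped; when existing is nil, every referenced flavor is returned.
func availableFlavors(groups []resourceGroup, existing map[ResourceFlavorReference]struct{}) []ResourceFlavorReference {
	seen := make(map[ResourceFlavorReference]struct{})
	for _, rg := range groups {
		for _, fl := range rg.Flavors {
			if existing != nil {
				if _, ok := existing[fl]; !ok {
					continue // referenced by the CQ, but no such ResourceFlavor
				}
			}
			seen[fl] = struct{}{}
		}
	}
	out := make([]ResourceFlavorReference, 0, len(seen))
	for fl := range seen {
		out = append(out, fl)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	groups := []resourceGroup{
		{Flavors: []ResourceFlavorReference{"on-demand", "spot"}},
		{Flavors: []ResourceFlavorReference{"spot", "gpu"}},
	}
	existing := map[ResourceFlavorReference]struct{}{"on-demand": {}, "spot": {}}

	fmt.Println(availableFlavors(groups, existing)) // [on-demand spot]
	fmt.Println(availableFlavors(groups, nil))      // [gpu on-demand spot]
}
```

With the existence check, the result changes whenever a ResourceFlavor is created or deleted, which is why that option would also require reconciling LocalQueues on ResourceFlavor updates.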

Member

Just adding a few thoughts: in a quick example where the RF does not exist but the CQ and LQ have been created, the LQ status looks like this:

Status:
  Admitted Workloads:  0
  Conditions:
    Last Transition Time:  2024-09-27T02:46:32Z
    Message:               Can't submit new workloads to clusterQueue
    Observed Generation:   1
    Reason:                ClusterQueueIsInactive
    Status:                False
    Type:                  Active
  Flavor Usage:
    Name:  default-flavor
    Resources:
      Name:   cpu
      Total:  0
      Name:   memory
      Total:  0
  Flavors Reservation:
    Name:  default-flavor
    Resources:
      Name:             cpu
      Total:            0
      Name:             memory
      Total:            0
  Pending Workloads:    0
  Reserving Workloads:  0

The source of truth for populating the LQ status is the CQ, and the LQ already bubbles up explicitly that the ClusterQueue is inactive. Following the same approach here, it is probably enough to read the flavors referenced in the CQ spec to populate the status, since we already bubble up the errors from the CQ. To be more explicit, what about renaming the field to availableFlavorsInClusterQueue, so it is clear that it refers to a snapshot of the ClusterQueue's spec rather than the status of the ResourceFlavors on the cluster?

Contributor Author

> One thing that we may change to be more explicit could be to rename the field to be - availableFlavorsInClusterQueue to be more explicit that it refers to the snapshot from the cluster queue's spec, rather than the resource flavour's status on cluster?

Yeah, availableFlavorsInClusterQueue sounds better. Thanks!

Contributor

@varshaprasad96 thank you for the input.

In the KEP I propose a slightly more flexible API: a list of extensible objects (see related threads: one and two).

That way we could convey more information in the future (say, about the RF status). Because the model is more flexible, I don't think we need to indicate the source of the information in the field name. We could describe the meaning in the API comment and shorten the name. WDYT?

@mbobrovskyi
Contributor Author

/retest pull-kueue-test-integration-main

@k8s-ci-robot
Contributor

@mbobrovskyi: The /retest command does not accept any targets.
The following commands are available to trigger required jobs:

  • /test pull-kueue-build-image-main
  • /test pull-kueue-test-e2e-main-1-28
  • /test pull-kueue-test-e2e-main-1-29
  • /test pull-kueue-test-e2e-main-1-30
  • /test pull-kueue-test-e2e-main-1-31
  • /test pull-kueue-test-integration-main
  • /test pull-kueue-test-kjobctl
  • /test pull-kueue-test-multikueue-e2e-main
  • /test pull-kueue-test-scheduling-perf-main
  • /test pull-kueue-test-unit-main
  • /test pull-kueue-verify-main

Use /test all to run the following jobs that were automatically triggered:

  • pull-kueue-build-image-main
  • pull-kueue-test-e2e-main-1-28
  • pull-kueue-test-e2e-main-1-29
  • pull-kueue-test-e2e-main-1-30
  • pull-kueue-test-e2e-main-1-31
  • pull-kueue-test-integration-main
  • pull-kueue-test-multikueue-e2e-main
  • pull-kueue-test-scheduling-perf-main
  • pull-kueue-test-unit-main
  • pull-kueue-verify-main

In response to this:

/retest pull-kueue-test-integration-main

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mbobrovskyi
Contributor Author

mbobrovskyi commented Sep 26, 2024

/test pull-kueue-test-integration-main

Due to #3144.

@mbobrovskyi mbobrovskyi force-pushed the feature/expose-flavors-in-local-queue-status branch from c9a0f42 to 61629ef Compare September 26, 2024 13:40
@tenzen-y
Member

Has there been any decision to implement this API at the LocalQueue level?

@alculquicondor
Contributor

I don't see any high-level concerns. Do you have any?

@mbobrovskyi mbobrovskyi force-pushed the feature/expose-flavors-in-local-queue-status branch from 61629ef to eba6ab5 Compare September 26, 2024 23:55
@mbobrovskyi mbobrovskyi changed the title from "Expose available ResourceFlavors in LocalQueue Status." to "Expose available ResourceFlavors from the ClusterQueue in the LocalQueue status." Sep 27, 2024
@mimowo
Contributor

mimowo commented Oct 1, 2024

I don't have any high-level concerns about the approach. However, from the process perspective it might be better to include a small one-pager KEP for it? WDYT @alculquicondor @tenzen-y ?

@alculquicondor
Contributor

Yeah, it doesn't hurt

@tenzen-y
Member

tenzen-y commented Oct 1, 2024

> I don't have any high-level concerns about the approach. However, from the process perspective it might be better to include a small one-pager KEP for it? WDYT @alculquicondor @tenzen-y ?

Basically, I agree with this enhancement.
However, I would like to explore the possibilities for the API, since we may want to expose other information related to the ResourceFlavor, such as topology or nodeSelector.

@tenzen-y
Member

tenzen-y commented Oct 1, 2024

> > I don't have any high-level concerns about the approach. However, from the process perspective it might be better to include a small one-pager KEP for it? WDYT @alculquicondor @tenzen-y ?
>
> Basically, I agree with this enhancement. However, I would like to explore the possibilities for the API, since we may want to expose other information related to the ResourceFlavor, such as topology or nodeSelector.

I am not saying that we should expose that information, but it would be worth evaluating.

@mbobrovskyi
Contributor Author

> I don't have any high-level concerns about the approach. However, from the process perspective it might be better to include a small one-pager KEP for it? WDYT @alculquicondor @tenzen-y ?

Created #3181.

Labels
  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • kind/feature (Categorizes issue or PR as related to a new feature.)
  • release-note (Denotes a PR that will be considered when it comes time to generate release notes.)
  • size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expose Flavors in LocalQueue Status
7 participants