generate DRA job configs from a Jinja template #34010

Draft: bart0sh wants to merge 4 commits into master from PR060-generate-job-configs

bart0sh
Contributor

@bart0sh bart0sh commented Dec 19, 2024

This is a quick-and-dirty attempt to generate dynamic-resource-allocation-canary.yaml from a Jinja template, mostly as an RFC.

/cc @pohly @kannon92 @SergeyKanzhelev @haircommander

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/config Issues or PRs related to code in /config size/L Denotes a PR that changes 100-499 lines, ignoring generated files. area/jobs sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Dec 19, 2024
@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch from 7c3a83c to 2f75bbd Compare December 19, 2024 13:37
@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch 3 times, most recently from d030275 to c5999e8 Compare December 19, 2024 14:13
[ci-node-e2e-cgrpv1-crio-dra]
job_type = pr
description = Runs E2E node tests for Dynamic Resource Allocation beta features with CRI-O using cgroup v1
cluster = k8s-infra-prow-build
Contributor

Is there a reason for not using the eks-prow-build-cluster?

If not, then cluster can go to DEFAULT.
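
For context on what moving a key "to DEFAULT" would mean: the .conf snippet above looks like INI syntax, and if gen.py reads it with Python's configparser (an assumption; the thread does not say so), a key in the [DEFAULT] section is inherited by every job section unless that section overrides it. A minimal sketch with illustrative values:

```python
import configparser

# The section names appear in this PR; the values and the DEFAULT layout
# are illustrative assumptions, not the PR's actual config.
CONF = """
[DEFAULT]
cluster = eks-prow-build-cluster

[ci-node-e2e-cgrpv1-crio-dra]
job_type = pr

[ci-kind-dra]
cluster = k8s-infra-prow-build
"""

parser = configparser.ConfigParser()
parser.read_string(CONF)

# Sections without their own "cluster" key fall back to the DEFAULT value;
# sections() itself does not list DEFAULT.
clusters = {name: parser[name]["cluster"] for name in parser.sections()}
print(clusters)
```

With this layout, a per-job cluster line is only needed where a job really differs from the default.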

Contributor Author

The only reason is that this is how they are set up in the current job. I'll get rid of the cluster variable and use eks-prow-build-cluster for all jobs.

BTW, there is a difference in the kind jobs:

@@ -80,20 +74,15 @@
         command:
         - runner.sh
         args:
-        - /bin/bash
+        - /bin/sh
         - -xc
-        - |
-          set -ex
-          make WHAT="github.com/onsi/ginkgo/v2/ginkgo k8s.io/kubernetes/test/e2e/e2e.test"
-          curl -sSL https://kind.sigs.k8s.io/dl/latest/linux-amd64.tgz | tar xvfz - -C "${PATH%%:*}/" kind
-          kind build node-image --image=dra/node:latest .
-          trap 'kind export logs "${ARTIFACTS}/kind"; kind delete cluster' EXIT
-          # Which DRA features exist can change over time.
-          features=( $(grep '"DRA' pkg/features/kube_features.go | sed 's/.*"\(.*\)"/\1/') )
-          echo "Enabling DRA feature(s): ${features[*]}."
-          # Those additional features are not in kind.yaml, but they can be added at the end.
-          kind create cluster --retain --config <(cat test/e2e/dra/kind.yaml; for feature in ${features}; do echo "  ${feature}: true"; done) --image dra/node:latest
-          KUBERNETES_PROVIDER=local KUBECONFIG=${HOME}/.kube/config GINKGO_PARALLEL_NODES=8 E2E_REPORT_DIR=${ARTIFACTS} GINKGO_TIMEOUT=1h hack/ginkgo-e2e.sh -ginkgo.label-filter="Feature: containsAny DynamicResourceAllocation && Feature: isSubsetOf { Alpha, Beta, DynamicResourceAllocation$(for feature in ${features}; do echo , ${feature}; done)} && !Flaky && !Slow"
+        - >
+          make WHAT="github.com/onsi/ginkgo/v2/ginkgo k8s.io/kubernetes/test/e2e/e2e.test" &&
+          curl -sSL https://kind.sigs.k8s.io/dl/latest/linux-amd64.tgz | tar xvfz - -C "${PATH%%:*}/" kind &&
+          kind build node-image --image=dra/node:latest . &&
+          trap 'kind export logs "${ARTIFACTS}/kind"; kind delete cluster' EXIT &&
+          kind create cluster --retain --config test/e2e/dra/kind.yaml --image dra/node:latest &&
+          KUBERNETES_PROVIDER=local KUBECONFIG=${HOME}/.kube/config GINKGO_PARALLEL_NODES=8 E2E_REPORT_DIR=${ARTIFACTS} GINKGO_TIMEOUT=2h30m hack/ginkgo-e2e.sh -ginkgo.label-filter='Feature: containsAny DynamicResourceAllocation && Feature: isSubsetOf { Beta, DynamicResourceAllocation } && !Flaky'

Is it possible to use the same arguments for both? If so, which one?

Contributor

I unified that in #33993 with an if check:

if ${with_all_features:-false}; then
  # Which DRA features exist can change over time.
  features=( $(grep '"DRA' pkg/features/kube_features.go | sed 's/.*"\(.*\)"/\1/') )
  echo "Enabling DRA feature(s): ${features[*]}."
  # Those additional features are not in kind.yaml, but they can be added at the end.
  kind create cluster --retain --config <(cat test/e2e/dra/kind.yaml; for feature in ${features}; do echo "  ${feature}: true"; done) --image dra/node:latest
  KUBERNETES_PROVIDER=local KUBECONFIG=${HOME}/.kube/config GINKGO_PARALLEL_NODES=8 E2E_REPORT_DIR=${ARTIFACTS} GINKGO_TIMEOUT=1h hack/ginkgo-e2e.sh -ginkgo.label-filter="Feature: containsAny DynamicResourceAllocation && Feature: isSubsetOf { Alpha, Beta, DynamicResourceAllocation$(for feature in ${features}; do echo , ${feature}; done)} && !Flaky && !Slow"
else
  kind create cluster --retain --config test/e2e/dra/kind.yaml --image dra/node:latest
  KUBERNETES_PROVIDER=local KUBECONFIG=${HOME}/.kube/config GINKGO_PARALLEL_NODES=8 E2E_REPORT_DIR=${ARTIFACTS} GINKGO_TIMEOUT=2h30m hack/ginkgo-e2e.sh -ginkgo.label-filter='Feature: containsAny DynamicResourceAllocation && Feature: isSubsetOf { Beta, DynamicResourceAllocation } && !Flaky'
fi
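
To make the grep | sed step in the first branch easier to follow, here is a rough Python equivalent of the feature discovery. It keeps lines containing `"DRA` and extracts the last double-quoted string on each. The sample input is illustrative and not copied from pkg/features/kube_features.go, and the function name is hypothetical.

```python
import re

# Illustrative stand-in for a few lines of kube_features.go (an assumption).
SAMPLE = (
    '\tDRAAdminAccess featuregate.Feature = "DRAAdminAccess"\n'
    '\tSomethingElse featuregate.Feature = "SomethingElse"\n'
    '\tDRAResourceClaimDeviceStatus featuregate.Feature = "DRAResourceClaimDeviceStatus"\n'
)


def dra_features(source: str) -> list[str]:
    """Mimic: grep '"DRA' FILE | sed 's/.*"\\(.*\\)"/\\1/'."""
    features = []
    for line in source.splitlines():
        if '"DRA' in line:  # the grep filter
            # The sed substitution: keep only the last quoted string.
            features.append(re.sub(r'.*"(.*)"', r"\1", line))
    return features


print(dra_features(SAMPLE))
```

The shell version then appends one `<feature>: true` line per discovered name to the kind config.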

Contributor Author

Thanks, applied. PTAL.

# on a kind cluster with containerd updated to a version with CDI support.
#
# Compared to ci-kind-dra, this one enables all DRA-related features.
[ci-kind-dra-all]
Contributor

Can we make it so that we have common settings for normal periodics, normal presubmits, and canary presubmits?

There's still going to be a lot of duplication if we have to have three copies of this section and the ones below.

The same applies to the actual .jinja template. The entries in the periodics and presubmits should be built from a single source.

Contributor Author

Thank you. This makes sense. Will do.

Contributor Author

Updated. Now gen.py generates three files (dynamic-resource-allocation-canary.yaml, dynamic-resource-allocation-pull.yaml and dynamic-resource-allocation-ci.yaml) from dynamic-resource-allocation.conf and dynamic-resource-allocation.jinja.

PTAL.
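
A minimal sketch of the single-source idea, not the actual gen.py: one config plus one job template produces all three outputs. Assumptions are labeled in the comments. In particular, string.Template stands in for Jinja so the sketch runs with only the standard library, and the KINDS mapping and render_all name are hypothetical.

```python
import configparser
from string import Template

# Hypothetical stand-in for the .jinja template (string.Template, not Jinja).
JOB = Template("  - name: $name\n    cluster: $cluster\n")

# Illustrative config; section names follow the ci-* names used in this PR.
CONF = """
[DEFAULT]
cluster = eks-prow-build-cluster

[ci-kind-dra]
[ci-kind-dra-all]
"""

# Output kind -> (YAML header, job-name prefix). The prefixes match job names
# seen in this thread; treating them as a fixed mapping is an assumption.
KINDS = {
    "ci": ("periodics:\n", "ci-"),
    "pull": ("presubmits:\n  kubernetes/kubernetes:\n", "pull-kubernetes-"),
    "canary": ("presubmits:\n  kubernetes/kubernetes:\n", "canary-"),
}


def render_all(conf_text: str) -> dict[str, str]:
    """Render one YAML text per output kind from a single config."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    outputs = {}
    for kind, (header, prefix) in KINDS.items():
        jobs = [
            JOB.substitute(
                name=prefix + section.removeprefix("ci-"),
                cluster=parser[section]["cluster"],
            )
            for section in parser.sections()
        ]
        outputs[f"dynamic-resource-allocation-{kind}.yaml"] = header + "".join(jobs)
    return outputs


for filename, text in render_all(CONF).items():
    print(f"# {filename}\n{text}")
```

Because every output is derived from the same sections, a periodic cannot exist without its presubmit and canary counterparts.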

@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch 3 times, most recently from 3259e4d to 499379c Compare December 20, 2024 12:38
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 20, 2024
@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch from 499379c to 2e1e253 Compare December 20, 2024 15:08
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 20, 2024
Contributor

@pohly pohly left a comment

Looks very promising.

How to handle indentation was my biggest concern when thinking about how to use Jinja. I am not sure whether this is addressed here (I need to check the test results).

# limitations under the License.

.PHONY: generate
generate-jobs:
Contributor

Doesn't match: .PHONY declares generate, but the target is named generate-jobs.

job_type = node
description = Runs E2E node tests for Dynamic Resource Allocation beta features with CRI-O using cgroup v1
testgrid_dashboards = sig-node-cri-o, sig-node-dynamic-resource-allocation
skip_report = false
Contributor

Is there any job with skip_report = true? I don't think this needs to be configurable.

Contributor Author

$ git grep -B5 'skip_report: true'
sig-node-presubmit.yaml-  - name: pull-kubernetes-e2e-inplace-pod-resize-containerd-main-v2
sig-node-presubmit.yaml-    cluster: k8s-infra-prow-build
sig-node-presubmit.yaml-    optional: true
sig-node-presubmit.yaml-    always_run: false
sig-node-presubmit.yaml-    run_if_changed: 'test/e2e/node/pod_resize.go|pkg/kubelet/kubelet.go|pkg/kubelet/kubelet_pods.go|pkg/kubelet/kuberuntime/kuberuntime_manager.go'
sig-node-presubmit.yaml:    skip_report: true

Contributor

I meant, "for our jobs". We should only make those things configurable which we need to be configurable - it'll be shorter and more readable.

Contributor Author

done

testgrid_dashboards = sig-node-cri-o, sig-node-dynamic-resource-allocation
skip_report = false
image_config_file = /home/prow/go/src/k8s.io/test-infra/jobs/e2e_node/crio/latest/image-config-cgroupv1-serial.yaml
inject_ssh_public_key = true
Contributor

Same here: this can depend on the job type in the template.

Contributor Author

I don't think it can. Not all presubmit jobs have this. As far as I remember, it depends on the distro/image.

{%- if "containerd" in job_name %}
{%- set testgrid_dashboards = testgrid_dashboards + ", sig-node-containerd" %}
{%- endif %}
- name: {{job_name}}
Contributor

So the indentation is the same for both periodics and presubmits?

The test bot seems to be stuck, but I suspect that a YAML linter would complain about that.

Contributor Author

yep, fortunately the indentation is the same for presubmits and periodics:

presubmits:
  kubernetes/kubernetes:
  - name: pull-kubernetes-e2e-containerd-gce
periodics:
  # This jobs runs e2e.test with a focus on tests for the Dynamic Resource Allocation feature (currently beta)
  # on a kind cluster with containerd updated to a version with CDI support.
  - name: ci-kind-dra

Contributor

No, if the lists were indented in the canonical way, it would be:

periodics:
- name: ci-kind-dra

YAML doesn't care, but there are style checkers which might.

Contributor Author

Well, I took that snippet from the existing YAML.
And CI doesn't complain about wrong indentation for this file.

Contributor

Right. It runs yamllint, but that doesn't care, so we are good.

Contributor Author

As a last resort we can reindent in gen.py if it's really needed. It will be a little bit ugly though.
I suspect/hope that periodic and presubmit configs have the same indentation level on purpose.
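
If reindenting ever did become necessary, a sketch of what it could look like with the standard library is below. The reindent helper is hypothetical and not part of the PR's gen.py. It strips the common leading indentation from a rendered job block and applies a new one.

```python
import textwrap


def reindent(block: str, spaces: int) -> str:
    """Remove the block's common indentation, then indent by `spaces`."""
    return textwrap.indent(textwrap.dedent(block), " " * spaces)


# A job rendered with the presubmit-style two-space list indentation.
rendered = "  - name: ci-kind-dra\n    cluster: eks-prow-build-cluster\n"

# The "canonical" periodics form puts list items at column zero.
print("periodics:\n" + reindent(rendered, 0), end="")
```

Both forms are the same data to a YAML parser; the helper only changes the layout.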

Contributor

From a YAML perspective, the nesting level is different.

I don't remember where anymore, but there are other jobs where the indentation is different, which is very annoying when copy-pasting from presubmit to periodic or vice versa. That made me think that it's enforced. It's not, so it indeed makes much more sense to use the same indentation even if it's not "quite right" for periodics.

testgrid-tab-name: {{job_name}}
description: {{description}}
testgrid-alert-email: {{testgrid_alert_email}}
fork-per-release: "true"
Contributor

Canaries shouldn't get forked.

Contributor Author

done
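
One way to express that rule in generator code is sketched below. It assumes canary jobs can be recognized by a "canary-" name prefix, as in the job names in this PR. The annotations function is hypothetical, and the actual change may instead use a conditional in the template.

```python
def annotations(job_name: str, description: str) -> dict[str, str]:
    """Build testgrid annotations; only non-canary jobs get forked."""
    result = {
        "testgrid-tab-name": job_name,
        "description": description,
    }
    # Assumption: the "canary-" prefix identifies canary jobs.
    if not job_name.startswith("canary-"):
        result["fork-per-release"] = "true"
    return result


print(annotations("canary-kind-dra", "example description"))
print(annotations("ci-kind-dra", "example description"))
```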

@@ -0,0 +1,115 @@
{%- if beginning %}
Contributor

Can this file be moved into a templates directory, as in kops?

When I look at the PR sidebar, I currently see four files with the identical dynamic-resource-allocation... prefix in their names. Even if we shorten that to dra-, keeping the source file separate would make it stand out more.

Contributor Author

Sure, it can be moved. Should I move the .conf file as well?

Personally, I'd prefer a flat structure with shorter names, e.g.
dra.conf
dra.jinja
dra-canary.yaml
dra-pull.yaml
dra-ci.yaml

And I hope that this approach can be used for all sig-node jobs and the final list of files will be something like this:
jobs.conf
jobs.jinja
jobs-canary.yaml
jobs-pull.yaml
jobs-ci.yaml

@bart0sh
Contributor Author

bart0sh commented Dec 20, 2024

@pohly @kannon92 @SergeyKanzhelev @haircommander

Looks very promising.

Thank you. After addressing the review comments, I'm going to remove the -pull and -ci YAMLs from this PR, so that we test only -canary.
It would be great if SIG Node folks could look at this and confirm that this approach is at least acceptable.

I personally like it. Using it would allow us to

  • have a presubmit job for every periodic
  • keep them synchronized
  • easily generate canary jobs for testing purposes (e.g. kubetest2)
  • make fewer mistakes, as job configs are generated automatically
  • do less typing and copy-pasting :)
    etc.

WDYT guys?

@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch 2 times, most recently from 4234630 to 28eda1b Compare December 20, 2024 21:33
@bart0sh
Contributor Author

bart0sh commented Dec 21, 2024

/test pull-test-infra-verify-lint

@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch from 5f2ee95 to d21a1da Compare December 21, 2024 00:34
@bart0sh
Contributor Author

bart0sh commented Dec 21, 2024

/test pull-test-infra-verify-lint

@bart0sh
Contributor Author

bart0sh commented Dec 21, 2024

/test pull-test-infra-unit-test

@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch 7 times, most recently from 8cfad8b to 362ab7c Compare December 22, 2024 20:01
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Dec 22, 2024
@bart0sh
Contributor Author

bart0sh commented Dec 23, 2024

/test pull-test-infra-unit-test

@bart0sh
Contributor Author

bart0sh commented Dec 23, 2024

/test pull-test-infra-verify-lint

@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch from 362ab7c to 238c7ec Compare December 24, 2024 13:03
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 24, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bart0sh
Once this PR has been reviewed and has the lgtm label, please assign mpherman2 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@bart0sh bart0sh changed the title from "generate job config from Jinja templates" to "generate DRA canary job config from a Jinja template" Dec 24, 2024
@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch from 238c7ec to 07dd545 Compare December 24, 2024 13:05
@bart0sh
Contributor Author

bart0sh commented Dec 25, 2024

/retest

@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch from 07dd545 to f79d989 Compare December 25, 2024 15:59
@bart0sh
Contributor Author

bart0sh commented Dec 25, 2024

@pohly here is a diff between current DRA presubmits and DRA canary jobs:

--- dra-presubmits.orig.yaml	2024-12-25 17:06:04.254985370 +0200
+++ dra-canary.yaml	2024-12-25 17:52:31.159139987 +0200
@@ -1,26 +1,24 @@
 presubmits:
   kubernetes/kubernetes:
-  - name: pull-kubernetes-kind-dra
-    cluster: k8s-infra-prow-build
+  - name: canary-kind-dra
+    cluster: eks-prow-build-cluster
     skip_branches:
     - release-\d+\.\d+  # per-release image
-    annotations:
-      testgrid-dashboards: sig-node-presubmits, sig-node-dynamic-resource-allocation
-      testgrid-tab-name: pr-kind-dra
-    decorate: true
-    path_alias: k8s.io/kubernetes
-    # Not relevant for most PRs.
     always_run: false
-    # This covers most of the code related to dynamic resource allocation.
-    # Periodic variant: ci-kind-dra
     run_if_changed: /(dra|dynamicresources|resourceclaim|deviceclass|resourceslice|resourceclaimtemplate|dynamic-resource-allocation|pkg/apis/resource|api/resource)/.*.go
     optional: true
-    decoration_config:
-      timeout: 90m
     labels:
       preset-service-account: "true"
       preset-dind-enabled: "true"
       preset-kind-volume-mounts: "true"
+    annotations:
+      testgrid-dashboards: sig-node-dynamic-resource-allocation, sig-node-presubmits
+      description: Runs E2E tests for Dynamic Resource Allocation beta features against a Kubernetes master cluster created with sigs.k8s.io/kind
+      testgrid-alert-email: [email protected],[email protected]
+    decorate: true
+    decoration_config:
+      timeout: 90m
+    path_alias: k8s.io/kubernetes
     spec:
       containers:
       - image: gcr.io/k8s-staging-test-infra/kubekins-e2e:v20241218-d4b51bc3e8-master
@@ -28,95 +26,83 @@
         - runner.sh
         args:
         - /bin/sh
-        - -xc
-        - >
-          make WHAT="github.com/onsi/ginkgo/v2/ginkgo k8s.io/kubernetes/test/e2e/e2e.test" &&
-          curl -sSL https://kind.sigs.k8s.io/dl/latest/linux-amd64.tgz | tar xvfz - -C "${PATH%%:*}/" kind &&
-          kind build node-image --image=dra/node:latest . &&
-          trap 'kind export logs "${ARTIFACTS}/kind"; kind delete cluster' EXIT &&
-          kind create cluster --retain --config test/e2e/dra/kind.yaml --image dra/node:latest &&
-          KUBERNETES_PROVIDER=local KUBECONFIG=${HOME}/.kube/config GINKGO_PARALLEL_NODES=8 E2E_REPORT_DIR=${ARTIFACTS} GINKGO_TIMEOUT=1h hack/ginkgo-e2e.sh -ginkgo.label-filter='Feature: containsAny DynamicResourceAllocation && Feature: isSubsetOf { Beta, DynamicResourceAllocation } && !Flaky && !Slow'
-
+        - -xce
+        - |
+          make WHAT="github.com/onsi/ginkgo/v2/ginkgo k8s.io/kubernetes/test/e2e/e2e.test"
+          curl -sSL https://kind.sigs.k8s.io/dl/latest/linux-amd64.tgz | tar xvfz - -C "${PATH%%:*}/" kind
+          kind build node-image --image=dra/node:latest .
+          trap 'kind export logs "${ARTIFACTS}/kind"; kind delete cluster' EXIT
+          # Which DRA features exist can change over time.
+          features=( $(grep '"DRA' pkg/features/kube_features.go | sed 's/.*"\(.*\)"/\1/') )
+          echo "Enabling DRA feature(s): ${features[*]}."
+          # Those additional features are not in kind.yaml, but they can be added at the end.
+          kind create cluster --retain --config <(cat test/e2e/dra/kind.yaml; for feature in ${features}; do echo "  ${feature}: true"; done) --image dra/node:latest
+          KUBERNETES_PROVIDER=local KUBECONFIG=${HOME}/.kube/config GINKGO_PARALLEL_NODES=8 E2E_REPORT_DIR=${ARTIFACTS} GINKGO_TIMEOUT=1h hack/ginkgo-e2e.sh -ginkgo.label-filter="Feature: containsAny DynamicResourceAllocation && Feature: isSubsetOf { Alpha, Beta, DynamicResourceAllocation$(for feature in ${features}; do echo , ${feature}; done)} && !Flaky && !Slow"
         # docker-in-docker needs privileged mode
         securityContext:
           privileged: true
         resources:
-          requests:
-            # these are both a bit below peak usage during build
-            # this is mostly for building kubernetes
-            memory: "9000Mi"
-            # during the tests more like 3-20m is used
-            cpu: 2000m
           limits:
-            memory: "9000Mi"
-            cpu: 2000m
+            cpu: 2
+            memory: 9Gi
+          requests:
+            cpu: 2
+            memory: 9Gi
 
-  - name: pull-kubernetes-kind-dra-all
-    cluster: k8s-infra-prow-build
+  - name: canary-kind-dra-all
+    cluster: eks-prow-build-cluster
     skip_branches:
     - release-\d+\.\d+  # per-release image
-    annotations:
-      testgrid-dashboards: sig-node-presubmits, sig-node-dynamic-resource-allocation
-      testgrid-tab-name: pr-kind-dra-all
-    decorate: true
-    path_alias: k8s.io/kubernetes
-    # Not relevant for most PRs.
     always_run: false
-    # This covers most of the code related to dynamic resource allocation.
-    # Periodic variant: ci-kind-dra-all
     run_if_changed: /(dra|dynamicresources|resourceclaim|deviceclass|resourceslice|resourceclaimtemplate|dynamic-resource-allocation|pkg/apis/resource|api/resource)/.*.go
-    # The tests might still be flaky or this job might get triggered accidentally for
-    # an unrelated PR.
     optional: true
-    decoration_config:
-      timeout: 90m
     labels:
       preset-service-account: "true"
       preset-dind-enabled: "true"
       preset-kind-volume-mounts: "true"
+    annotations:
+      testgrid-dashboards: sig-node-dynamic-resource-allocation, sig-node-presubmits
+      description: Runs E2E tests for Dynamic Resource Allocation alpha and beta features against a Kubernetes master cluster created with sigs.k8s.io/kind
+      testgrid-alert-email: [email protected],[email protected]
+    decorate: true
+    decoration_config:
+      timeout: 90m
+    path_alias: k8s.io/kubernetes
     spec:
       containers:
       - image: gcr.io/k8s-staging-test-infra/kubekins-e2e:v20241218-d4b51bc3e8-master
         command:
         - runner.sh
         args:
-        - /bin/bash
-        - -xc
+        - /bin/sh
+        - -xce
         - |
-          set -ex
           make WHAT="github.com/onsi/ginkgo/v2/ginkgo k8s.io/kubernetes/test/e2e/e2e.test"
           curl -sSL https://kind.sigs.k8s.io/dl/latest/linux-amd64.tgz | tar xvfz - -C "${PATH%%:*}/" kind
           kind build node-image --image=dra/node:latest .
           trap 'kind export logs "${ARTIFACTS}/kind"; kind delete cluster' EXIT
-          # Which DRA features exist depends on the PR that is being tested.
+          # Which DRA features exist can change over time.
           features=( $(grep '"DRA' pkg/features/kube_features.go | sed 's/.*"\(.*\)"/\1/') )
           echo "Enabling DRA feature(s): ${features[*]}."
           # Those additional features are not in kind.yaml, but they can be added at the end.
           kind create cluster --retain --config <(cat test/e2e/dra/kind.yaml; for feature in ${features}; do echo "  ${feature}: true"; done) --image dra/node:latest
           KUBERNETES_PROVIDER=local KUBECONFIG=${HOME}/.kube/config GINKGO_PARALLEL_NODES=8 E2E_REPORT_DIR=${ARTIFACTS} GINKGO_TIMEOUT=1h hack/ginkgo-e2e.sh -ginkgo.label-filter="Feature: containsAny DynamicResourceAllocation && Feature: isSubsetOf { Alpha, Beta, DynamicResourceAllocation$(for feature in ${features}; do echo , ${feature}; done)} && !Flaky && !Slow"
-
         # docker-in-docker needs privileged mode
         securityContext:
           privileged: true
         resources:
-          requests:
-            # these are both a bit below peak usage during build
-            # this is mostly for building kubernetes
-            memory: "9000Mi"
-            # during the tests more like 3-20m is used
-            cpu: 2000m
           limits:
-            memory: "9000Mi"
-            cpu: 2000m
+            cpu: 2
+            memory: 9Gi
+          requests:
+            cpu: 2
+            memory: 9Gi
 
-  - name: pull-kubernetes-node-e2e-crio-cgrpv1-dra
-    cluster: k8s-infra-prow-build
+  - name: canary-node-e2e-cgrpv1-crio-dra
+    cluster: eks-prow-build-cluster
     skip_branches:
     - release-\d+\.\d+  # per-release image
     always_run: false
-    # Automatically testing with one container runtime in one configuration is sufficient to detect basic problems in kubelet early.
-    # CRI-O was picked because it was solid for testing so far.
-    # Periodic variant: ci-node-e2e-crio-cgrpv1-dra-features
     run_if_changed: (/dra/|/dynamicresources/|/resourceclaim/|/deviceclass/|/resourceslice/|/resourceclaimtemplate/|/dynamic-resource-allocation/|/pkg/apis/resource/|/api/resource/|/test/e2e_node/dra_).*\.(go|yaml)
     optional: true
     skip_report: false
@@ -126,8 +112,9 @@
       preset-pull-kubernetes-e2e: "true"
       preset-pull-kubernetes-e2e-gce: "true"
     annotations:
-      testgrid-dashboards: sig-node-cri-o, sig-node-presubmits, sig-node-dynamic-resource-allocation
-      testgrid-tab-name: pr-node-kubelet-crio-cgrpv1-dra
+      testgrid-dashboards: sig-node-dynamic-resource-allocation, sig-node-presubmits, sig-node-cri-o
+      description: Runs E2E node tests for Dynamic Resource Allocation beta features with CRI-O using cgroup v1
+      testgrid-alert-email: [email protected],[email protected]
     decorate: true
     decoration_config:
       timeout: 90m
@@ -156,23 +143,21 @@
         env:
         - name: IGNITION_INJECT_GCE_SSH_PUBLIC_KEY_FILE
           value: "1"
+        - name: GOPATH
+          value: /go
         resources:
-          requests:
-            cpu: 4
-            memory: 6Gi
           limits:
-            cpu: 4
-            memory: 6Gi
+            cpu: 2
+            memory: 9Gi
+          requests:
+            cpu: 2
+            memory: 9Gi
 
-  - name: pull-kubernetes-node-e2e-crio-cgrpv2-dra
-    cluster: k8s-infra-prow-build
+  - name: canary-node-e2e-cgrpv2-crio-dra
+    cluster: eks-prow-build-cluster
     skip_branches:
     - release-\d+\.\d+  # per-release image
     always_run: false
-    # Automatically testing with one container runtime in one configuration is sufficient to detect basic problems in kubelet early.
-    # CRI-O was picked because it was solid for testing so far.
-    # Periodic variant: ci-node-e2e-cgrpv2-crio-dra
-    # run_if_changed: (/dra/|/dynamicresources/|/resourceclaim/|/deviceclass/|/resourceslice/|/resourceclaimtemplate/|/dynamic-resource-allocation/|/pkg/apis/resource/|/api/resource/|/test/e2e_node/dra_).*\.(go|yaml)
     optional: true
     skip_report: false
     labels:
@@ -181,8 +166,9 @@
       preset-pull-kubernetes-e2e: "true"
       preset-pull-kubernetes-e2e-gce: "true"
     annotations:
-      testgrid-dashboards: sig-node-cri-o, sig-node-presubmits, sig-node-dynamic-resource-allocation
-      testgrid-tab-name: pr-node-kubelet-crio-cgrpv2-dra
+      testgrid-dashboards: sig-node-dynamic-resource-allocation, sig-node-presubmits, sig-node-cri-o
+      description: Runs E2E node tests for Dynamic Resource Allocation beta features with CRI-O using cgroup v2
+      testgrid-alert-email: [email protected],[email protected]
     decorate: true
     decoration_config:
       timeout: 90m
@@ -211,31 +197,32 @@
         env:
         - name: IGNITION_INJECT_GCE_SSH_PUBLIC_KEY_FILE
           value: "1"
+        - name: GOPATH
+          value: /go
         resources:
-          requests:
-            cpu: 4
-            memory: 6Gi
           limits:
-            cpu: 4
-            memory: 6Gi
+            cpu: 2
+            memory: 9Gi
+          requests:
+            cpu: 2
+            memory: 9Gi
 
-  - name: pull-kubernetes-node-e2e-containerd-1-7-dra
-    cluster: k8s-infra-prow-build
+  - name: canary-node-e2e-containerd-1-7-dra
+    cluster: eks-prow-build-cluster
     skip_branches:
     - release-\d+\.\d+  # per-release image
     always_run: false
-    # Automatically testing with one container runtime in one configuration is sufficient to detect basic problems in kubelet early.
-    # CRI-O was picked because it was solid for testing so far.
-    # Periodic variant: ci-node-e2e-containerd-1-7-dra
-    # run_if_changed: (/dra/|/dynamicresources/|/resourceclaim/|/deviceclass/|/resourceslice/|/resourceclaimtemplate/|/dynamic-resource-allocation/|/pkg/apis/resource/|/api/resource/|/test/e2e_node/dra_).*\.(go|yaml)
     optional: true
     skip_report: false
     labels:
       preset-service-account: "true"
       preset-k8s-ssh: "true"
+      preset-pull-kubernetes-e2e: "true"
+      preset-pull-kubernetes-e2e-gce: "true"
     annotations:
-      testgrid-dashboards: sig-node-presubmits, sig-node-dynamic-resource-allocation
-      testgrid-tab-name: pr-node-kubelet-containerd-dra
+      testgrid-dashboards: sig-node-dynamic-resource-allocation, sig-node-presubmits, sig-node-containerd
+      description: Runs E2E node tests for Dynamic Resource Allocation beta features with containerd
+      testgrid-alert-email: [email protected],[email protected]
     decorate: true
     decoration_config:
       timeout: 90m
@@ -254,16 +241,16 @@
         args:
         - --deployment=node
         - --gcp-zone=us-west1-b
-        - '--node-test-args=--feature-gates=DynamicResourceAllocation=true --service-feature-gates=DynamicResourceAllocation=true --runtime-config=api/beta=true --container-runtime-endpoint=unix:///run/containerd/containerd.sock --container-runtime-process-name=/usr/bin/containerd --container-runtime-pid-file= --kubelet-flags="--cgroup-driver=systemd --cgroups-per-qos=true --cgroup-root=/ --runtime-cgroups=/system.slice/containerd.service" --extra-log="{\"name\": \"containerd.log\", \"journalctl\": [\"-u\", \"containerd\"]}"'
+        - '--node-test-args=--feature-gates=DynamicResourceAllocation=true --service-feature-gates=DynamicResourceAllocation=true --runtime-config=api/beta=true --container-runtime-endpoint=unix:///run/containerd/containerd.sock --container-runtime-process-name=/usr/bin/containerd --container-runtime-pid-file= --kubelet-flags="--cgroup-driver=systemd --cgroups-per-qos=true --cgroup-root=/ --runtime-cgroups=/system.slice/containerd.service --kubelet-cgroups=/system.slice/kubelet.service" --extra-log="{\"name\": \"containerd.log\", \"journalctl\": [\"-u\", \"containerd\"]}"'
         - --node-tests=true
         - --provider=gce
         - '--test_args=--timeout=1h --label-filter="Feature: containsAny DynamicResourceAllocation && Feature: isSubsetOf { Beta, DynamicResourceAllocation } && !Flaky && !Slow"'
         - --timeout=65m
         - --node-args=--image-config-file=/home/prow/go/src/k8s.io/test-infra/jobs/e2e_node/dra/image-config-containerd-1.7.yaml
         resources:
-          requests:
-            cpu: 4
-            memory: 6Gi
           limits:
-            cpu: 4
-            memory: 6Gi
+            cpu: 2
+            memory: 9Gi
+          requests:
+            cpu: 2
+            memory: 9Gi
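
One detail in the resources hunks is worth checking when reading this diff. Kubernetes quantities use binary suffixes for Mi/Gi and a decimal milli suffix for m, so cpu: 2000m and cpu: 2 are the same value, while memory: "9000Mi" and memory: 9Gi are not. A small sketch that handles only the suffixes appearing here (it is not a full quantity parser, and the quantity function is hypothetical):

```python
from fractions import Fraction

# Only the suffixes that occur in the diff above; Kubernetes supports more.
SUFFIXES = {"m": Fraction(1, 1000), "Mi": 2**20, "Gi": 2**30}


def quantity(text: str) -> Fraction:
    """Convert a quantity string such as "2000m" or "9Gi" to a plain number."""
    for suffix, factor in SUFFIXES.items():
        if text.endswith(suffix):
            return Fraction(text[: -len(suffix)]) * factor
    return Fraction(text)


print(quantity("2000m") == quantity("2"))     # CPU values
print(quantity("9000Mi") == quantity("9Gi"))  # memory values
print((quantity("9Gi") - quantity("9000Mi")) / 2**20)  # difference in Mi
```

So the generated canary jobs request slightly more memory than the original kind presubmits, in addition to the larger changes for the node e2e jobs.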

@bart0sh
Contributor Author

bart0sh commented Dec 25, 2024

/retest

@bart0sh bart0sh changed the title from "generate DRA canary job config from a Jinja template" to "generate DRA job configs from a Jinja template" Dec 25, 2024
Labels
area/config Issues or PRs related to code in /config area/jobs cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
Status: PRs - Needs Reviewer
4 participants