Migrate CRI-O jobs away from `kubernetes_e2e.py` #32567

saschagrunert · 2024-05-06T09:04:12Z

The kubernetes_e2e.py script is deprecated and we should use kubetest2 instead.

All affected tests are listed in https://testgrid.k8s.io/sig-node-cri-o

cc @kubernetes/sig-node-cri-o-test-maintainers

Ref: https://github.com/kubernetes/test-infra/tree/master/scenarios, #20760

haircommander · 2024-05-06T13:57:33Z

/sig node

k8s-triage-robot · 2024-08-04T14:13:56Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

saschagrunert · 2024-08-05T07:01:42Z

/remove-lifecycle stale

kannon92 · 2024-08-21T17:20:38Z

/triage accepted
/priority important-longterm

elieser1101 · 2024-09-05T12:07:51Z

Does this still need help? Can I start looking at it?

saschagrunert · 2024-09-05T12:14:26Z

@elieser1101 I'd appreciate your eyes on that. 🙏

elieser1101 · 2024-09-05T12:33:50Z

/assign

bart0sh · 2024-12-11T09:41:22Z

@kannon92 @elieser1101 @haircommander
Looking at the failing splitfs and imagefs PR jobs, I noticed random test failures inside a container with this error message:

sh: error while loading shared libraries: /lib/libc.so.6: cannot apply additional memory protection after relocation: Permission denied

This seems to be a core issue causing jobs to fail.

Unfortunately I can't reproduce it in my environment. Here is how I run kubetest2 for splitfs tests:

$ GCE_SSH_PUBLIC_KEY_FILE=/home/ed/.ssh/google_compute_engine.pub \
  KUBE_SSH_USER=core \
  IGNITION_INJECT_GCE_SSH_PUBLIC_KEY_FILE=1 \
  JENKINS_GCE_SSH_PRIVATE_KEY_FILE=/home/ed/.ssh/google_compute_engine \
  kubetest2-gce --test=node --down=false -- \
    --parallelism=8 \
    --gcp-zone=us-west1-b \
    --gcp-project=service-mesh-296815 \
    --repo-root=. \
    --image-config-file=/home/prow/go/src/k8s.io/test-infra/jobs/e2e_node/crio/latest/image-config-cgroupv2-splitfs.yaml \
    --delete-instances=false \
    --test-args='--container-runtime-endpoint=unix:///var/run/crio/crio.sock --container-runtime-process-name=/usr/local/bin/crio --container-runtime-pid-file= --kubelet-flags="--cgroup-driver=systemd --cgroups-per-qos=true --cgroup-root=/ --runtime-cgroups=/system.slice/crio.service --kubelet-cgroups=/system.slice/kubelet.service" --extra-log="{\"name\": \"crio.log\", \"journalctl\": [\"-u\", \"crio\"]}"' \
    --skip-regex='\[Flaky\]|\[Slow\]|\[Serial\]' \
    --focus-regex='\[NodeConformance\]|\[NodeFeature:.+\]|\[NodeFeature\]' \
    2>&1 | tee /tmp/log
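
To make the `--focus-regex` and `--skip-regex` values in the command above more concrete, here is a minimal sketch of the selection rule as Ginkgo generally applies it: a spec runs when its full name matches the focus pattern and does not match the skip pattern. The `select_specs` helper and the spec names are hypothetical, and `grep -E` only approximates Go regular-expression matching.

```shell
#!/bin/sh
# Hypothetical helper: reads spec names on stdin and prints the ones that
# would be selected. $1 = focus regex, $2 = skip regex (may be empty).
select_specs() {
    if [ -z "$2" ]; then
        # An empty skip pattern means "skip nothing".
        grep -E -- "$1"
    else
        grep -E -- "$1" | grep -Ev -- "$2"
    fi
}

# Same patterns as in the kubetest2 command above.
FOCUS='\[NodeConformance\]|\[NodeFeature:.+\]|\[NodeFeature\]'
SKIP='\[Flaky\]|\[Slow\]|\[Serial\]'

# Made-up spec names, for illustration only.
printf '%s\n' \
    'Pods should start [NodeConformance]' \
    'Eviction should evict pods [NodeFeature:Eviction] [Serial]' \
    'Probes should restart [NodeConformance] [Flaky]' \
    'Networking should resolve DNS' |
    select_specs "$FOCUS" "$SKIP"
# prints: Pods should start [NodeConformance]
```

The empty-skip branch matters because piping through `grep -Ev ''` would drop every line, which is the opposite of what an empty skip pattern is meant to do.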

I suspect that this could be caused by the host/VM kernel and container resource restrictions, but I don't know how to specify the upper-level instance image (gcr.io/k8s-staging-test-infra/kubekins-e2e:v20241128-8df65c072f-master) and container resources (cpu 4 and memory 6Gi) when running kubetest2 locally.

Any ideas how to proceed further?

haircommander · 2024-12-11T20:41:52Z

Do you have access to the nodes you've provisioned, @bart0sh? Can I poke around? Basically, we want to be able to run `ausearch -m AVC -ts recent` after the failure to see what was being blocked. Then we can update the SELinux policy we create in Ignition to include the new option.
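
As a rough sketch of how the `ausearch` output could be condensed once a failure is caught, the hypothetical `summarize_avc` helper below lists the distinct permission sets that appear in AVC denial records. The sample record is made up for illustration and is not taken from the failing jobs; `grep -o` and `sed -E` are assumed to be available on the node.

```shell
#!/bin/sh
# Hypothetical helper: reads audit records on stdin and prints each distinct
# denied permission set (the text between the braces) once.
summarize_avc() {
    grep -Eo 'denied[[:space:]]+\{[^}]*\}' |
        sed -E 's/denied[[:space:]]+\{[[:space:]]*//; s/[[:space:]]*\}$//' |
        sort -u
}

# Intended use on a node, right after a failure:
#   ausearch -m AVC -ts recent | summarize_avc

# Made-up record, for illustration only.
printf '%s\n' \
    'type=AVC msg=audit(1700000000.000:1): avc:  denied  { read write } for  pid=1 comm="example"' |
    summarize_avc
# prints: read write
```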

bart0sh · 2024-12-11T20:59:03Z

@haircommander yes, I have access to the nodes, but I can't reproduce the error there :(

bart0sh · 2024-12-11T21:01:17Z

By the way, decreasing parallelism seems to help a bit: https://testgrid.k8s.io/sig-node-presubmits#pr-crio-cgrpv2-splitfs-e2e-kubetest2&width=90

elieser1101 · 2024-12-11T21:11:25Z

I triggered the job a couple of times and it seems to have improved (it still failed at some point), though it also takes longer to complete. We could test the imagefs one with the same approach, @bart0sh, but I guess the right fix includes the SELinux change?

bart0sh · 2024-12-11T21:24:52Z

I'm not sure about it. The SELinux configuration is the same for the kubetest2 jobs and the old jobs, but only the kubetest2 jobs fail.

bart0sh · 2024-12-12T14:39:06Z

@elieser1101

> We could test the imagefs one with the same approach

Decreasing parallelism improved the imagefs test. Previously I didn't see any successful job runs. With the change I can see at least one so far.

bart0sh · 2024-12-13T10:27:56Z

I'm still wondering why I can't reproduce `error while loading shared libraries: /lib/libc.so.6: cannot apply additional memory protection after relocation: Permission denied` in my environment. I'm using the same kubetest2 command-line parameters and the same image configs, .ign file, instance type, GCP zone, etc. Even using the `--processes=100` command-line option doesn't help to trigger the error.

bart0sh · 2024-12-13T23:15:50Z

Unfortunately, using a more powerful instance didn't change much for the imagefs job. I can still see the same error in the logs.

bart0sh · 2024-12-18T01:58:19Z

@elieser1101 I can see a lot of green kubetest2 jobs in the test grid. Is there anything that prevents replacing the kubernetes_e2e.py jobs with them? I did it for the splitfs and imagefs jobs as I was involved in fixing them. I can do it for the rest of the jobs if needed.

elieser1101 · 2024-12-18T12:45:45Z

@bart0sh Thank you very much for the splitfs/imagefs work, that was a great finding.

What would come next is to validate that the kubetest2 jobs are actually working. I noticed that some of the jobs are completing but are skipping all the specs. We would like to ensure we are running the jobs properly before replacing the kubernetes_e2e.py jobs.
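
One way to spot a job that completes while skipping every spec is to look at Ginkgo's summary line, which has the form `Ran N of M Specs in S seconds`. The sketch below is only an illustration under that assumption: the `ran_any_specs` helper and the log contents are hypothetical, and a real job log may contain several summary lines (this sketch looks at the last one only).

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the last Ginkgo summary line in the
# given log reports at least one executed spec. $1 = path to a build log.
ran_any_specs() {
    ran=$(grep -Eo 'Ran [0-9]+ of [0-9]+ Specs' "$1" | tail -n 1 | awk '{print $2}')
    [ -n "$ran" ] && [ "$ran" -gt 0 ]
}

# Made-up log content, for illustration only.
log=$(mktemp)
echo 'Ran 0 of 412 Specs in 1.234 seconds' > "$log"

if ran_any_specs "$log"; then
    echo "specs were executed"
else
    echo "no specs were executed; check the focus/skip patterns"
fi
rm -f "$log"
```

A log with no summary line at all is also treated as a failure, since in that case there is no evidence that any spec ran.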

At the moment I'm looking at the DRA ones, which were missing some kubetest2 features and this

k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label May 6, 2024

k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 6, 2024

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 4, 2024

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 5, 2024

SergeyKanzhelev added this to SIG Node CI/Test Board Aug 11, 2024

github-project-automation bot moved this to Triage in SIG Node CI/Test Board Aug 11, 2024

kannon92 moved this from Triage to Issues - To do in SIG Node CI/Test Board Aug 21, 2024

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Aug 21, 2024

k8s-ci-robot assigned elieser1101 Sep 5, 2024

bart0sh mentioned this issue Dec 11, 2024

pr-crio-cgrpv2-splitfs-e2e-kubetest2: decrease parallelism #33926

Merged

bart0sh mentioned this issue Dec 11, 2024

pr-crio-cgrpv2-imagefsfs-e2e-kubetest2: decrease parallelism #33929

Merged

This was referenced Dec 11, 2024

DRA focus regex replace --label-filter #33934

Closed

missing --label-filter kubernetes-sigs/kubetest2#285

Closed

bart0sh mentioned this issue Dec 13, 2024

pr-crio-cgrpv2-imagefsfs-e2e-kubetest2: use n1-standard-4 machine #33944

Merged

This was referenced Dec 13, 2024

use --label-filter for dra tests #33947

Closed

accept GINKGO_FLAGS for test-e2e-node.sh kubernetes/kubernetes#129215

Merged

dra uses --ginkgo-flags #33948

Open

add ginkgo-flags to node tester kubernetes-sigs/kubetest2#286

Merged

bart0sh mentioned this issue Dec 18, 2024

replace pull-crio-cgrpv2-imagefs-separatedisktest with kubetest2 job #33994

Merged

This was referenced Dec 18, 2024

splitfs-separate-disk-kubetest2: use empty skip-regex #34005

Merged

replace pull-crio-cgroupv2-splitfs-separate-disk with kubetest2 job #34021

Merged
