Container failed to start because of cgroups issue #7510
Labels: bug, triage/needs-investigation
Description
Observed Behavior:
After a node has been running pods for some time, it suddenly fails to start new pods.
The instance type is m7a.medium.
This is what I'm seeing in the pod:
The node did run some tasks successfully, but when this issue happens, the pod gets stuck in the "Running" state, and the only way out of it is to manually delete the node and then wait for a new node to be scheduled and run the pod.
This is not the first time it has happened; it usually occurs after a while, not immediately. I don't understand what is causing this cgroups issue.
I'm not sure whether it is related to Karpenter; it could be an EKS bug (maybe an AMI bug?). But I can say for sure that this issue started when we began using Karpenter. We also hadn't used this instance type before, so that could be related as well.
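As a rough way to confirm which cgroup version the affected node is actually running (a minimal sketch; the node name below is just a placeholder), I can open a shell on the node with kubectl debug and check the cgroup mount:

kubectl debug node/<node-name> -it --image=busybox
# inside that shell, switch to the host filesystem:
chroot /host
# "cgroup2fs" means cgroup v2; "tmpfs" means the legacy cgroup v1 hierarchy:
stat -fc %T /sys/fs/cgroup/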
What can cause this error?
In the Kubernetes events I see:
Cgroup v1 support is in maintenance mode, please migrate to Cgroup v2.
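If I understand that warning correctly, the kubelet emits it when the node is still running on cgroup v1. As a quick, hedged check (the node name is again a placeholder), the OS image reported by the node hints at the default: Amazon Linux 2 based EKS AMIs boot with cgroup v1 by default, while Amazon Linux 2023 uses cgroup v2:

# print the OS image of the node that emitted the warning:
kubectl get node <node-name> -o jsonpath='{.status.nodeInfo.osImage}{"\n"}'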
In EC2NodeClass I have:
Expected Behavior:
Pod should run successfully.
Versions:
Chart Version: karpenter-1.0.6
Kubernetes Version (kubectl version):
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.0", GitCommit:"ab69524f795c42094a6630298ff53f3c3ebab7f4", GitTreeState:"clean", BuildDate:"2021-12-07T18:16:20Z", GoVersion:"go1.17.3", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"31", GitVersion:"v1.31.2-eks-7f9249a", GitCommit:"1316e23bda3256fab6fbead2f22f6811dde77fb6", GitTreeState:"clean", BuildDate:"2024-10-23T23:38:37Z", GoVersion:"go1.22.8", Compiler:"gc", Platform:"linux/amd64"}