Pod stuck in ContainerCreating after upgrading cluster to 1.29 #2980
Comments
Hey @zendesk-yumingdeng, I noticed the log message `It looks like all IPs are exhausted`. Could you let me know how many pods are running on the new node that was brought up? Also, what kind of node is it in terms of capacity? TIA!
Hi, we are facing the same issue. In a specific environment we have 3 nodes running c6g/c7g medium instances. The VPC CNI is used with security groups per pod. If we have 22 pods running, the 23rd cannot start with the same error:
Logs from ipam:
According to this document, each of these instances should accommodate 8 pods per node. However, another page uses a different formula. This issue started when we migrated the cluster from K8s 1.28 to 1.30 and bumped the CNI version from 1.14.1 to 1.18.2.
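For reference, the commonly documented per-node pod limit derived from ENI limits is `maxPods = ENIs * (IPs per ENI - 1) + 2`. Below is a minimal sketch in Go, assuming the c6g/c7g.medium limits of 2 ENIs and 4 IPv4 addresses per ENI (verify against current AWS documentation for your instance type); note that security groups per pod uses branch ENIs, whose limits this formula does not capture.

```go
package main

import "fmt"

// maxPods implements the commonly documented VPC CNI formula:
//
//	maxPods = ENIs * (IPv4 addresses per ENI - 1) + 2
//
// One IPv4 address per ENI is reserved as the ENI's primary address, and the
// +2 accounts for host-network pods such as aws-node and kube-proxy.
func maxPods(enis, ipsPerENI int) int {
	return enis*(ipsPerENI-1) + 2
}

func main() {
	// Assumed limits for c6g/c7g.medium: 2 ENIs with 4 IPv4 addresses each.
	// Check the AWS instance-type limits before relying on these numbers.
	fmt.Println(maxPods(2, 4)) // 8
}
```

The result matches the 8 pods per node quoted above for these instance sizes.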
We are having the same issue: pods get stuck because all of the network interfaces already have the maximum number of IPs assigned. The CNI plugin should not allow pods to be scheduled on nodes that don't have any IP capacity (k8s 1.30, CNI 1.18.2).
Same issue on 1.29 and 1.18.2. It would be nice if the plugin did not try to place pods on nodes with little or no IP capacity.
Seeing the same thing. Does anyone know what causes it or has a fix?
I fixed it by setting:
Same issue on 1.29 and 1.18.3.
The original issue here was:
It means that not enough IPs were available on the node.
For others who are experiencing this, does it occur after any upgrade?
Between VPC CNI 1.14.x and later versions, there have been changes to reduce the number of EC2 API calls (#2640) that sometimes inadvertently interfered with the previous behavior. Using the proper values for
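The warm-pool settings usually tuned here are the documented VPC CNI environment variables such as WARM_IP_TARGET and MINIMUM_IP_TARGET. Below is a simplified Go sketch, not the actual ipamd implementation, of how such targets can translate into an IP deficit that ipamd then tries to fill from EC2; the function name and example numbers are illustrative assumptions.

```go
package main

import "fmt"

// warmPoolDeficit is a simplified sketch (not the actual ipamd code) of how
// warm-pool targets can decide whether additional IPs should be requested
// from EC2. total is the number of IPs attached to the node's ENIs and used
// is the number currently assigned to pods.
func warmPoolDeficit(total, used, warmIPTarget, minimumIPTarget int) int {
	available := total - used
	deficit := warmIPTarget - available
	// A MINIMUM_IP_TARGET-style floor keeps the pool above a fixed size
	// regardless of how many IPs are in use.
	if short := minimumIPTarget - total; short > deficit {
		deficit = short
	}
	if deficit < 0 {
		return 0
	}
	return deficit
}

func main() {
	// Assumed example values: 8 IPs attached, 7 in use,
	// WARM_IP_TARGET=2, MINIMUM_IP_TARGET=8.
	fmt.Println(warmPoolDeficit(8, 7, 2, 8)) // 1: one more IP would be requested
}
```

Per the VPC CNI documentation, very low WARM_IP_TARGET values can increase EC2 API calls on busy nodes, while leaving the targets unset generally keeps the default warm-ENI behavior.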
@emcay I think aws/containers-roadmap#2189 is related to what you are describing.
We had this issue come back after upgrading recently. We're now on k8s 1.31, CNI 1.19.0 (previously k8s 1.30, CNI 1.18.2). The only non-default configuration we run is
We're keeping an eye out and will do.
What happened:
We are experiencing something similar to #2970, after upgrading our in-house clusters to 1.29.
After a new node is brought up (note this does not happen to every node), some pods that were scheduled to the node are stuck in the `ContainerCreating` status with the below event message:

`aws-cni` pod logs on the node:

`/var/log/aws-routed-eni/plugin.log`:

`/var/log/aws-routed-eni/ipamd.log`:
Environment:
- Kubernetes version (`kubectl version`): v1.29.6
- OS (`cat /etc/os-release`): Ubuntu 22.04.4 LTS
- Kernel (`uname -a`): Linux 6.5.0-1022-aws #22~22.04.1-Ubuntu SMP Fri Jun 14 19:23:09 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux