Observed Behavior:
Karpenter refused to drain a node (instance type: m7i.12xlarge) even though it was clearly underutilized (only 8 pods running), giving the reason: state node is nominated for a pending pod. When I run kubectl get pods --all-namespaces --field-selector=status.phase=Pending, I see that there are no pending pods.
Expected Behavior:
Karpenter should disrupt this node, drain it, and schedule these pods onto another node, or at least report the correct reason why it is unable to drain the node.
Reproduction Steps (Please include YAML):
nodepool.yaml
Versions:
Kubernetes Version (kubectl version): 1.30
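The nodepool.yaml itself isn't reproduced here, so purely as a hypothetical sketch of the kind of NodePool this scenario involves (assuming the karpenter.sh/v1 API; the name, instance-type requirement, and disruption settings are illustrative, not the reporter's actual configuration):

```yaml
# Hypothetical NodePool -- not the reporter's actual nodepool.yaml.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        # Pin to the instance type mentioned in the report (illustrative only)
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m7i.12xlarge"]
  disruption:
    # Allow consolidation of empty or underutilized nodes
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
```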
Hi @vb-atelio,
Can you share detailed logs from when this happened? How did you determine that the node was underutilized? Did you monitor node usage during this period? If yes, can you please share it?
@jigisha620
I have the same problem. I'll try to describe it:
A node is marked for deletion due to expiration, but it hosts a pod with the karpenter.sh/do-not-disrupt annotation and an attached volume. Karpenter waits for the volume to detach before proceeding with the node deletion. (Karpenter will wait indefinitely while the pod is running, and Karpenter won't evict this pod (ref); a sketch of such a pod is shown after these steps.)
At the same time, Karpenter nominates the pod from the node marked for deletion to another node. For example, the nomination logic can be found here.
The new node receiving the nominated pod might be empty or underutilized, but due to the presence of the nominated pod, Karpenter cannot disrupt it.
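For context on the annotation mentioned in the first step above: karpenter.sh/do-not-disrupt is set on the pod itself. A minimal sketch of such a pod, with the attached volume coming from a PVC (the name, image, and claim are hypothetical):

```yaml
# Illustrative pod -- carries the do-not-disrupt annotation and mounts a PVC,
# matching the scenario described in the comment above.
apiVersion: v1
kind: Pod
metadata:
  name: stateful-worker                      # hypothetical name
  annotations:
    karpenter.sh/do-not-disrupt: "true"      # blocks voluntary disruption while the pod runs
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0    # placeholder image
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-pvc                  # hypothetical PVC providing the attached volume
```

While a pod like this keeps running, the expiring node can't finish draining, and (per the description above) the replacement node the pod was nominated to stays protected from consolidation.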