
Cannot disrupt Node: state node is nominated for a pending pod #7521

Open
vb-atelio opened this issue Dec 11, 2024 · 3 comments
Labels
bug Something isn't working needs-triage Issues that need to be triaged

Comments

@vb-atelio

Description

Observed Behavior:
Karpenter refused to drain a node (instance type: m7i.12xlarge) that is clearly underutilized (it has 8 pods running), giving the reason: state node is nominated for a pending pod. However, running kubectl get pods --all-namespaces --field-selector=status.phase=Pending shows that there are no pending pods.

Expected Behavior:
Karpenter should disrupt this node, drain it, and reschedule its pods on another node, or at least report the correct reason why it cannot drain the node.

Reproduction Steps (Please include YAML):
nodepool.yaml

Name:         default
Namespace:
Labels:       <none>
Annotations:  compatibility.karpenter.sh/v1beta1-nodeclass-reference: {"name":"default"}
              karpenter.sh/nodepool-hash: 12063359807553009501
              karpenter.sh/nodepool-hash-version: v3
API Version:  karpenter.sh/v1
Kind:         NodePool
Metadata:
  Creation Timestamp:  2024-10-29T06:33:00Z
  Generation:          24
  Resource Version:    47771428
  UID:                 857db43c-c406-4952-8648-d363b9079f63
Spec:
  Disruption:
    Budgets:
      Nodes:               50%
    Consolidate After:     0s
    Consolidation Policy:  WhenEmptyOrUnderutilized
  Limits:
    Count:   50
    Cpu:     4k
    Memory:  4000Gi
  Template:
    Metadata:
      Labels:
        Type:  karpenter
    Spec:
      Expire After:  720h
      Node Class Ref:
        Group:  karpenter.k8s.aws
        Kind:   EC2NodeClass
        Name:   default
      Requirements:
        Key:       karpenter.sh/capacity-type
        Operator:  In
        Values:
          on-demand
        Key:       node.kubernetes.io/instance-type
        Operator:  In
        Values:
          m7i.12xlarge

Versions:

  • Chart Version: 1.0.2
  • Kubernetes Version (kubectl version): 1.30
@vb-atelio vb-atelio added bug Something isn't working needs-triage Issues that need to be triaged labels Dec 11, 2024
@jigisha620
Contributor

Hi @vb-atelio,
Can you share detailed logs from when this happened? How did you determine that the node was underutilized? Did you monitor node usage during this period? If yes, can you please share it?

@tufitko

tufitko commented Dec 19, 2024

@jigisha620
I have the same problem. I'll try to describe it:

A node is marked for deletion due to expiration, but it hosts a pod with the karpenter.sh/do-not-disrupt annotation and an attached volume. Karpenter waits for the volume to detach before proceeding with the node deletion. It waits indefinitely while the pod is running, and it won't evict this pod (ref).

At the same time, Karpenter nominates the pod on the node marked for deletion to another node. The nomination logic can be found here.

The new node receiving the nominated pod might be empty or underutilized, but because a pod is nominated to it, Karpenter cannot disrupt it.

Karpenter version: 1.1.1
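The sequence described above can be illustrated with a toy Python model. It is not Karpenter's code: the classes, the functions, and the rule that nominations are rebuilt on every loop are hypothetical simplifications. Only the karpenter.sh/do-not-disrupt annotation and the blocking message come from this thread. The model also suggests why a Pending field-selector query can come back empty if this is the cause: the nominated pod is still Running on the node being deleted.

```python
from dataclasses import dataclass, field
from typing import Optional

# Annotation and message are taken from the thread; every other name is hypothetical.
DO_NOT_DISRUPT = "karpenter.sh/do-not-disrupt"
BLOCKED_REASON = "state node is nominated for a pending pod"


@dataclass
class Pod:
    name: str
    phase: str = "Running"
    annotations: dict = field(default_factory=dict)


@dataclass
class Node:
    name: str
    pods: list = field(default_factory=list)
    marked_for_deletion: bool = False
    nominated_for: set = field(default_factory=set)  # pods expected to land here


def drain_step(node: Node) -> None:
    """Evict what can be evicted from a deleting node; do-not-disrupt pods stay running."""
    if node.marked_for_deletion:
        node.pods = [p for p in node.pods if p.annotations.get(DO_NOT_DISRUPT) == "true"]


def provisioning_step(nodes: list, target: Node) -> None:
    """Toy provisioner (assumption): nominations expire, so they are rebuilt each loop.
    Pods still on a deleting node count as needing a new home even though their
    phase is Running; all of them are nominated to `target`."""
    for n in nodes:
        n.nominated_for.clear()
    for n in nodes:
        if n.marked_for_deletion:
            for p in n.pods:
                target.nominated_for.add(p.name)


def disruption_blocked_reason(node: Node) -> Optional[str]:
    """Toy disruption check: a node with any nomination is not a candidate."""
    return BLOCKED_REASON if node.nominated_for else None


# Scenario from the comment: node-a expired and hosts a do-not-disrupt pod.
node_a = Node(
    "node-a",
    pods=[Pod("db-0", annotations={DO_NOT_DISRUPT: "true"}), Pod("web-1")],
    marked_for_deletion=True,
)
node_b = Node("node-b")  # empty, so it should be a consolidation candidate

for _ in range(3):  # the outcome is the same after any number of loops
    drain_step(node_a)
    provisioning_step([node_a, node_b], target=node_b)

pending = [p.name for n in (node_a, node_b) for p in n.pods if p.phase == "Pending"]
print("remaining on node-a:", [p.name for p in node_a.pods])
print("pending pods:", pending)
print("node-b blocked:", disruption_blocked_reason(node_b))
```

In this model node-a can never finish terminating while db-0 runs, and node-b is re-nominated on every loop, so it stays blocked even though no pod is in the Pending phase.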

@tufitko

tufitko commented Dec 23, 2024

@jigisha620 any updates here?
