Use NodeRepair featureGate #7491

Open
fe80 opened this issue Dec 6, 2024 · 2 comments
Labels
documentation Improvements or additions to documentation


fe80 commented Dec 6, 2024

Description

How can the docs be improved?

Hello,

I'm trying to use the NodeRepair feature gate. It is correctly enabled on my controller pod (FEATURE_GATES: SpotToSpotConsolidation=false,NodeRepair=true), but it doesn't work the way I expected.

Is enabling the feature gate all that is needed for it to work? I'm pretty sure we need to wait 30 minutes before the node is considered unready, but I can't find any documentation on this point.

Also, does it look at the Node object or at nodeclaims.karpenter.sh? I ask because my node is Ready but my NodeClaim is not:

╰─➤ k get node ip-10-34-6-212.eu-west-3.compute.internal
NAME                                        STATUS   ROLES    AGE    VERSION
ip-10-34-6-212.eu-west-3.compute.internal   Ready    <none>   129m   v1.30.6-eks-94953ac

╰─➤ k get nodeclaims.karpenter.sh std-linux-cpu-h4g4s                       
NAME                  TYPE          CAPACITY   ZONE         NODE                                        READY     AGE
std-linux-cpu-h4g4s   m7i.8xlarge   spot       eu-west-3a   ip-10-34-6-212.eu-west-3.compute.internal   Unknown   129m

Regards,

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
fe80 added the documentation and needs-triage labels Dec 6, 2024

engedaam (Contributor) commented Dec 10, 2024

The node repair feature works by looking at node readiness. Karpenter only acts against nodes that have not been ready for 30 minutes. The NodeClaim showing Unknown is a known bug that we are planning to push a fix for. The issue there is how Karpenter marks the status, rather than any problem with the NodeClaim itself: #7494
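
As an illustration of the rule described above, here is a minimal sketch of the readiness check, not Karpenter's actual implementation. It assumes the node is available as a dict parsed from `kubectl get node <name> -o json`, that the 30-minute threshold is the one quoted in this comment, and that the clock starts at the Ready condition's `lastTransitionTime`. The names `is_repair_candidate`, `parse_k8s_time`, and `NOT_READY_TOLERATION` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: 30 minutes, as quoted in the comment above (not read from Karpenter's source).
NOT_READY_TOLERATION = timedelta(minutes=30)


def parse_k8s_time(value: str) -> datetime:
    """Parse a Kubernetes timestamp such as 2024-12-06T10:00:00Z as UTC."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def is_repair_candidate(node: dict, now: datetime) -> bool:
    """Return True if the node's Ready condition has been non-True for 30+ minutes."""
    conditions = node.get("status", {}).get("conditions", [])
    ready = next((c for c in conditions if c.get("type") == "Ready"), None)
    # No Ready condition reported, or the node is Ready: nothing to repair.
    if ready is None or ready.get("status") == "True":
        return False
    not_ready_for = now - parse_k8s_time(ready["lastTransitionTime"])
    return not_ready_for >= NOT_READY_TOLERATION


# Example: a node whose Ready condition turned False 45 minutes ago.
node = {
    "status": {
        "conditions": [
            {"type": "Ready", "status": "False", "lastTransitionTime": "2024-12-06T10:00:00Z"}
        ]
    }
}
now = datetime(2024, 12, 6, 10, 45, tzinfo=timezone.utc)
print(is_repair_candidate(node, now))  # prints True
```

Note that this check only reads the Node object. The READY column of the NodeClaim plays no part in it, which matches the explanation above.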

engedaam removed the needs-triage label Dec 10, 2024

fe80 (Author) commented Dec 13, 2024

Hello,

With 1.1.1, I still get Unknown as long as my node still has the startupTaint:

╰─➤ k get nodeclaims.karpenter.sh std-linux-core-dpxmc                      
NAME                   TYPE         CAPACITY   ZONE         NODE                                          READY     AGE
std-linux-core-dpxmc   t3a.medium   spot       eu-west-3b   ip-10-157-24-133.eu-west-3.compute.internal   Unknown   51m

╰─➤ k get node ip-10-157-24-133.eu-west-3.compute.internal
NAME                                          STATUS   ROLES    AGE   VERSION
ip-10-157-24-133.eu-west-3.compute.internal   Ready    <none>   50m   v1.30.7-eks-59bf375

╰─➤ k get nodes ip-10-157-24-133.eu-west-3.compute.internal -o json | jq '.spec.taints | length'      
1

> The node repair feature works by looking at the node readiness

So is there no way to automatically destroy the NodeClaim if its READY status is not True?
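
The `jq '.spec.taints | length'` check above only counts taints. To see which startup taints are still on the node, something like the following sketch may help. It assumes the node JSON from `kubectl get node <name> -o json` has been parsed into a dict and that you pass in the startup taints declared under the NodePool's `spec.template.spec.startupTaints`. The helper name `remaining_startup_taints` and the taint key `example.com/agent-not-ready` are hypothetical, and matching on key and effect is an assumption made for this example rather than Karpenter's documented matching rule.

```python
def remaining_startup_taints(node: dict, startup_taints: list[dict]) -> list[dict]:
    """Return the taints on the node that match one of the declared startup taints."""
    # spec.taints is absent (or null) when the node has no taints at all.
    node_taints = node.get("spec", {}).get("taints") or []
    # Assumption: a taint is identified by its key and effect.
    declared = {(t["key"], t.get("effect")) for t in startup_taints}
    return [t for t in node_taints if (t.get("key"), t.get("effect")) in declared]


# Example with a hypothetical startup taint that an agent has not yet removed.
node = {"spec": {"taints": [{"key": "example.com/agent-not-ready", "effect": "NoExecute"}]}}
startup = [{"key": "example.com/agent-not-ready", "effect": "NoExecute"}]
print(remaining_startup_taints(node, startup))
```

An empty result would mean that none of the declared startup taints remain on the node.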
