Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Node name instead of Machine name on removal #46

Merged
merged 2 commits into from
Aug 28, 2024

Conversation

HomayoonAlimohammadi
Copy link
Contributor

@HomayoonAlimohammadi HomayoonAlimohammadi commented Aug 27, 2024

Summary

This PR fixes the issue with CAPI rolling upgrades. The problem was that we were passing the wrong node-name to k8sd remove-node endpoint.

Description

CAPI rolling upgrades were failing, we narrowed down the problem, figured that the node removal process is not happening as expected (i.e. you can reproduce the problem by scaling down the control plane replicas of the workload cluster). Finally it was evident that the /capi/remove-node endpoint of k8sd was returning an error indicating that it can not find the node it was called to delete. That is because we were passing the machine.Name to k8sd /capi/remove-node endpoint, while as can be seen in the output of kubectl get nodes (of the workload cluster), we need to pass the node name.

@HomayoonAlimohammadi HomayoonAlimohammadi requested a review from a team as a code owner August 27, 2024 18:00
@addyess
Copy link

addyess commented Aug 27, 2024

@HomayoonAlimohammadi i came to review this -- but if you could please add some description and references to why this change is necessary. Sorry, i don't have enough context to see how this is an obvious change. thanks

return fmt.Errorf("machine %s has no node reference", machine.Name)
}

nodeName := machine.Status.NodeRef.Name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work finding the root cause. I only have one question, otherwise this looks good. Is Status.NodeRef set automatically by CAPI's Machine controller?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this is part of the CAPI contract.

@HomayoonAlimohammadi
Copy link
Contributor Author

@HomayoonAlimohammadi i came to review this -- but if you could please add some description and references to why this change is necessary. Sorry, i don't have enough context to see how this is an obvious change. thanks

I'm really sorry @addyess, you're right, the PR was rushed out of excitement! :D
Tried to update the description accordingly, lemme know if I can elaborate further.

pkg/ck8s/workload_cluster.go Outdated Show resolved Hide resolved
pkg/ck8s/workload_cluster.go Outdated Show resolved Hide resolved
Copy link
Contributor

@bschimke95 bschimke95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@HomayoonAlimohammadi HomayoonAlimohammadi merged commit f9d6a96 into main Aug 28, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants