Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support to reconcile allocated Pod IPs. #3113

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

hbhasker
Copy link

What type of PR is this?

improvement

Which issue does this PR fix?:

Fixes #3109

What does this PR do / Why do we need it?:
The CNI today only reconciles its datastore with existing pods at startup but never again. Sometimes its possible that IPAMD goes out of sync with the kubelet's view of the pods running on the node if it fails or is temporarily unreachable by the CNI plugin handling the DelNetwork call from the contrainer runtime.

In such cases the CNI continues to consider the pods IP allocated and will not free it as it will never see a DelNetwork again. This results in CNI failing to assign IP's to new pods.

This change adds a reconcile loop which periodically (once a minute) reconciles its allocated IPs with existence of pod's veth devices. If the veth device is not found then it free's up the corresponding allocation making the IP available for reuse.

Fixes #3109

Testing done on this change:

Added unit tests.

Will this PR introduce any new dependencies?:

No

Will this break upgrades or downgrades? Has updating a running cluster been tested?:
No

Does this change require updates to the CNI daemonset config files to work?:

No

Does this PR introduce any user-facing change?:

No

- Reconcile allocated Pod IPs against existing host veth devices periodically to clean up stale IP allocations [Fixes #3109].

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@hbhasker hbhasker requested a review from a team as a code owner November 13, 2024 17:50
@orsenthil
Copy link
Member

The CNI today only reconciles its datastore with existing pods at startup but never again.

This is done periodically. That's how the datastore keeps track of the ips used, available and does a ENI call if the additional ENI is needed.

@hbhasker
Copy link
Author

The CNI today only reconciles its datastore with existing pods at startup but never again.

This is done periodically. That's how the datastore keeps track of the ips used, available and does a ENI call if the additional ENI is needed.

I guess I meant reconcile the allocated IPs against running pods. It doesn't check in steady state that a pod is still around when its IP is allocated. The assumption is that if the pod was deleted the IPAMD would have been called by the CNI plugin. But I think there are some cases where this assumption breaks down, like if the CNI->ipamd grpc call fails for any reason. There are probably other edge cases but I couldn't fully reason out what could cause this to happen.

@hbhasker
Copy link
Author

ping for a review!

@@ -77,6 +77,9 @@ func _main() int {
// Pool manager
go ipamContext.StartNodeIPPoolManager()

// Pod IP allocation reconcile loop to clear up dangling pod IPs.
go ipamContext.PodIPAllocationReconcileLoop()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an initial review, I am hesitant to add this additional 1 minute delay for the IP Sync in Reconcile Loop. I am not sure on the need for this. I will try understand the problem you encountered and see if there is any other approach to resolve this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what you mean by 1 minute delay. This is not adding to the reconcile loop that reconciles pod IP allocation against IPs assigned to the node but instead is just cleaning up pod IPs that may not be in-use anymore because the corresponding pod is gone. This is a separate goroutine that runs a loop that just does that and should not add any delay to the regular IP reconcile loop.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, the need for this separate routine itself is introducing some concerns here. We will see why this is strictly needed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any update on this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping

The CNI today only reconciles its datastore with existing pods at
startup but never again. Sometimes its possible that IPAMD goes
out of sync with the kubelet's view of the pods running on the
node if it fails or is temporarily unreachable by the CNI plugin
handling the DelNetwork call from the contrainer runtime.

In such cases the CNI continues to consider the pods IP allocated
and will not free it as it will never see a DelNetwork again. This
results in CNI failing to assign IP's to new pods.

This change adds a reconcile loop which periodically (once a minute)
reconciles its allocated IPs with existence of pod's veth devices. If
the veth device is not found then it free's up the corresponding
allocation making the IP available for reuse.

Fixes aws#3109
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CNI ipamd reconcilation of in-use addresses
2 participants