Fix issues with handling unmanaged ENIs with IPv6 only #3122

Merged
merged 1 commit into from
Dec 3, 2024

Contributor

@gavinbunney gavinbunney commented Nov 22, 2024

What type of PR is this?
bug

Which issue does this PR fix?:

We have unmanaged trunk ENIs attached to our worker nodes, which are designed for IPv6-only networks. These ENIs are tagged with node.k8s.amazonaws.com/no_manage, but the listing of attached ENIs happens before the IPAMD process filters them out. As a result, metadata retrieval fails when looking up the IPv4 address details for these ENIs, and the aws-k8s-agent process exits with an initialization failure.

In this log, eni-084b51xxxxxx / 0e:7c:f9:xx:xx:xx is the ENI managed by aws-vpc-cni, and eni-0b2cf8b6xxxxxx / 0e:f3:73:xx:xx:xx is the ENI we manage ourselves:

{"level":"debug","ts":"2024-11-22T16:49:26.314Z","caller":"awsutils/awsutils.go:1317","msg":"Total number of interfaces found: 2 "}
{"level":"debug","ts":"2024-11-22T16:49:26.314Z","caller":"awsutils/awsutils.go:567","msg":"Found ENI MAC address: 0e:7c:f9:xx:xx:xx"}
{"level":"debug","ts":"2024-11-22T16:49:26.316Z","caller":"awsutils/awsutils.go:567","msg":"Found ENI: eni-084b51xxxxxx, MAC 0e:7c:f9:xx:xx:xx, device 0"}
{"level":"debug","ts":"2024-11-22T16:49:26.318Z","caller":"awsutils/awsutils.go:567","msg":"Found IPv6 addresses associated with interface. This is not efa-only interface"}
{"level":"debug","ts":"2024-11-22T16:49:26.322Z","caller":"awsutils/awsutils.go:567","msg":"Found ENI MAC address: 0e:f3:73:xx:xx:xx"}
{"level":"debug","ts":"2024-11-22T16:49:26.324Z","caller":"awsutils/awsutils.go:567","msg":"Found ENI: eni-0b2cf8b6xxxxxx, MAC 0e:f3:73:xx:xx:xx, device 1"}
{"level":"debug","ts":"2024-11-22T16:49:26.326Z","caller":"awsutils/awsutils.go:567","msg":"Found IPv6 addresses associated with interface. This is not efa-only interface"}
{"level":"warn","ts":"2024-11-22T16:49:26.327Z","caller":"awsutils/imds.go:376","msg":"failed to retrieve network/interfaces/macs/0e:f3:73:xx:xx:xx/subnet-ipv4-cidr-block from instance metadata EC2MetadataError: failed to make EC2Metadata request\n<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\n\t\t \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\">\n <head>\n  <title>404 - Not Found</title>\n </head>\n <body>\n  <h1>404 - Not Found</h1>\n </body>\n</html>\n\n\tstatus code: 404, request id: "}
{"level":"error","ts":"2024-11-22T16:49:26.327Z","caller":"aws-k8s-agent/main.go:42","msg":"Initialization failure: ipamd init: failed to retrieve attached ENIs info: DescribeAllENIs: failed to get local ENI metadata: get attached ENIs: failed to retrieve ENI metadata for ENI: 0e:f3:73:xx:xx:xx: EC2MetadataError: failed to make EC2Metadata request\n<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\n\t\t \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\">\n <head>\n  <title>404 - Not Found</title>\n </head>\n <body>\n  <h1>404 - Not Found</h1>\n </body>\n</html>\n\n\tstatus code: 404, request id: "}

What does this PR do / Why do we need it?:

This PR handles missing IPv4 information for IPv6-only ENIs more gracefully, using the same pattern as the IPv6 address lookup.
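A minimal sketch of the idea, not the code from this PR: the IPv4-specific metadata lookup runs only when the ENI actually reports IPv4 addresses, so an IPv6-only ENI no longer fails on a 404. The `fakeIMDS` type, the `eniInfo` struct, the `describeENI` function, and the `errNotFound` sentinel are all hypothetical names invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for the 404 that IMDS returns for a missing key.
// Hypothetical sentinel, not the error type used by the CNI.
var errNotFound = errors.New("404 - Not Found")

// fakeIMDS is a hypothetical in-memory metadata source keyed by ENI MAC.
type fakeIMDS struct {
	ipv4s   map[string][]string
	v4CIDRs map[string]string
}

// LocalIPv4s returns the IPv4 addresses of the ENI, or nil if it has none.
func (f fakeIMDS) LocalIPv4s(mac string) []string { return f.ipv4s[mac] }

// SubnetIPv4CIDR fails with errNotFound when the ENI has no IPv4 subnet.
func (f fakeIMDS) SubnetIPv4CIDR(mac string) (string, error) {
	cidr, ok := f.v4CIDRs[mac]
	if !ok {
		return "", errNotFound
	}
	return cidr, nil
}

// eniInfo is a hypothetical, trimmed-down view of the ENI metadata.
type eniInfo struct {
	MAC          string
	IPv4s        []string
	SubnetV4CIDR string
}

// describeENI queries the IPv4 subnet only when IPv4 addresses exist.
func describeENI(imds fakeIMDS, mac string) (eniInfo, error) {
	info := eniInfo{MAC: mac, IPv4s: imds.LocalIPv4s(mac)}
	ipv4Available := len(info.IPv4s) > 0
	if ipv4Available {
		cidr, err := imds.SubnetIPv4CIDR(mac)
		if err != nil {
			return eniInfo{}, fmt.Errorf("subnet IPv4 CIDR for %s: %w", mac, err)
		}
		info.SubnetV4CIDR = cidr
	}
	return info, nil
}

func main() {
	imds := fakeIMDS{
		ipv4s:   map[string][]string{"mac-dual": {"10.0.0.5"}},
		v4CIDRs: map[string]string{"mac-dual": "10.0.0.0/24"},
	}
	for _, mac := range []string{"mac-dual", "mac-v6only"} {
		info, err := describeENI(imds, mac)
		fmt.Printf("%+v err=%v\n", info, err)
	}
}
```

With this gating, the IPv6-only ENI comes back with empty IPv4 fields instead of aborting the whole listing.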

Testing done on this change:

Added unit tests to cover the new paths. Running in our EKS cluster, the ENIs are now retrieved successfully (in this log, eni-084b51xxxxxx / 0e:7c:f9:xx:xx:xx is the ENI managed by aws-vpc-cni, and eni-0b2cf8b6xxxxxx / 0e:f3:73:xx:xx:xx is the ENI we manage ourselves):

{"level":"debug","ts":"2024-11-22T17:45:13.815Z","caller":"awsutils/awsutils.go:1325","msg":"Total number of interfaces found: 2 "}
{"level":"debug","ts":"2024-11-22T17:45:13.815Z","caller":"awsutils/awsutils.go:567","msg":"Found ENI MAC address: 0e:7c:f9:8b:06:fd"}
{"level":"debug","ts":"2024-11-22T17:45:13.817Z","caller":"awsutils/awsutils.go:567","msg":"Found ENI: eni-084b51xxxxxx, MAC 0e:7c:f9:xx:xx:xx, device 0"}
{"level":"debug","ts":"2024-11-22T17:45:13.818Z","caller":"awsutils/awsutils.go:567","msg":"Found IPv6 addresses associated with interface. This is not efa-only interface"}
{"level":"debug","ts":"2024-11-22T17:45:13.819Z","caller":"awsutils/awsutils.go:567","msg":"Found ENI MAC address: 0e:f3:73:xx:xx:xx"}
{"level":"debug","ts":"2024-11-22T17:45:13.821Z","caller":"awsutils/awsutils.go:567","msg":"Found ENI: eni-0b2cf8b6xxxxxx, MAC 0e:f3:73:xx:xx:xx, device 1"}
{"level":"debug","ts":"2024-11-22T17:45:13.822Z","caller":"awsutils/awsutils.go:567","msg":"Found IPv6 addresses associated with interface. This is not efa-only interface"}
{"level":"info","ts":"2024-11-22T17:45:14.003Z","caller":"ipamd/ipamd.go:424","msg":"Got network card index 0 for ENI eni-084b51xxxxxx"}
{"level":"info","ts":"2024-11-22T17:45:14.004Z","caller":"ipamd/ipamd.go:424","msg":"eni-084b51xxxxxx is of type: interface"}
{"level":"info","ts":"2024-11-22T17:45:14.004Z","caller":"ipamd/ipamd.go:424","msg":"Got network card index 0 for ENI eni-0b2cf8b6xxxxxx"}
{"level":"info","ts":"2024-11-22T17:45:14.004Z","caller":"ipamd/ipamd.go:424","msg":"eni-0b2cf8b6xxxxxx is of type: trunk"}
{"level":"debug","ts":"2024-11-22T17:45:14.004Z","caller":"ipamd/ipamd.go:385","msg":"DescribeAllENIs success: ENIs: 2, tagged: 2"}

Will this PR introduce any new dependencies?:
n/a

Will this break upgrades or downgrades? Has updating a running cluster been tested?:
Tested by upgrading in place without issues.

Does this change require updates to the CNI daemonset config files to work?:
n/a

Does this PR introduce any user-facing change?:
n/a

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@gavinbunney gavinbunney requested a review from a team as a code owner November 22, 2024 19:23
@gavinbunney
Contributor Author

@orsenthil @jaydeokar Would you be able to take a look? We are trying to work around issues with our primarily IPv6 network, so it would be good to get this merged.

@orsenthil
Member

@gavinbunney - I will review this shortly. It is on my radar.

- // This assumes we only have one trunk attached to the node..
- if interfaceType == "trunk" {
+ // The primary trunk eni requires an IPv4 address
+ if interfaceType == "trunk" && len(eniMetadata.IPv4Addresses) > 0 {
Member

This is the only change that concerns me a bit. The previous code assumed there was only one trunk, and now we are asserting that there is one trunk with an IPv4 address.

What if you drop the && len(eniMetadata.IPv4Addresses) > 0 assertion? Wouldn't the rest of the changes be sufficient? I see that getENIMetadata is now protected, so you won't run into the issue with ipamd.

Contributor Author

The issue is in the caller in ipamd: when it checks for a trunk ENI, it uses the returned metadataResult.TrunkENI to search for it. In our case, it would pick up the ENI we manage ourselves and not the aws-vpc-cni one. For reference:

if metadataResult.TrunkENI != "" {
	for _, eni := range metadataResult.ENIMetadata {
		if eni.ENIID == metadataResult.TrunkENI {
			if err := c.setupENI(eni.ENIID, eni, true, false); err == nil {
				log.Infof("ENI %s set up", eni.ENIID)
				return true
			} else {
				log.Debugf("failed to setup ENI %s: %v", eni.ENIID, err)
				return false
			}
		}
	}
}

An alternative might be to move the ENI filtering into awsutils.go itself, so that unmanaged ENIs are dropped as soon as they are fetched from the AWS APIs. The aws-vpc-cni would then not need to handle unusual setups (like the other fix in this PR for ENIs with no IPv4 addresses).
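The early-filtering alternative could look roughly like the sketch below. It is an illustration only: the `eni` struct and the `managedOnly` helper are hypothetical, and treating a tag value of "true" as the marker for an unmanaged ENI is an assumption made here.

```go
package main

import "fmt"

// Tag key quoted in the PR description.
const noManageTagKey = "node.k8s.amazonaws.com/no_manage"

// eni is a hypothetical, trimmed-down ENI record.
type eni struct {
	ID   string
	Tags map[string]string
}

// isUnmanaged reports whether the ENI is tagged as not managed by the CNI.
// Assumption: the value "true" marks the ENI as unmanaged.
func isUnmanaged(e eni) bool {
	return e.Tags[noManageTagKey] == "true"
}

// managedOnly drops unmanaged ENIs right after they are listed, so later
// steps never see them.
func managedOnly(enis []eni) []eni {
	kept := make([]eni, 0, len(enis))
	for _, e := range enis {
		if !isUnmanaged(e) {
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	attached := []eni{
		{ID: "eni-cni"},
		{ID: "eni-self-managed", Tags: map[string]string{noManageTagKey: "true"}},
	}
	for _, e := range managedOnly(attached) {
		fmt.Println(e.ID)
	}
}
```

Filtering at this point would need the tags to be available before any per-ENI metadata lookup, which is the ordering problem described in the PR description.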

Member

The function func (c *IPAMContext) checkForTrunkENI() bool is only called from

if c.enablePodENI && c.dataStore.GetTrunkENI() == "" {

and is gated through c.enablePodENI.

I assume you had ENABLE_POD_ENI and IPv6 enabled, and instead of the AWS-created trunk, this call returned your trunk, which caused a problem for you, one that is not shown in the traceback above. Is that correct?


In,
https://github.com/aws/amazon-vpc-resource-controller-k8s/blob/b5a054e051e0a223001c3008a6f1245fbbbcf176/pkg/aws/ec2/api/helper.go#L125C1-L130C3

when we create a trunk interface, we do not specify EnablePrimaryIpv6, which guarantees that there will always be an IPv4 address. So checking for an IPv4 address won't break the behavior.

The condition if interfaceType == "trunk" && len(eniMetadata.IPv4Addresses) > 0 is therefore safe, but it introduces special-case logic for a unique cluster configuration.

How about populating tagMap[eniMetadata.ENIID] = convertSDKTagsToTags(ec2res.TagSet) early and checking whether the trunk has the no_manage tag? Would that be more explicit?

Other approaches, like filtering early or even documenting this a bit further for clarity, are fine with me.
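To make the tag-based suggestion concrete, here is a hedged sketch of trunk selection that skips trunks carrying the no_manage tag. The `attachedENI` struct and `pickTrunk` function are hypothetical, and the "true" tag value is an assumption; the real code path goes through the tag map and metadata result discussed above.

```go
package main

import "fmt"

// Tag key quoted in the PR description.
const noManageTagKey = "node.k8s.amazonaws.com/no_manage"

// attachedENI is a hypothetical, trimmed-down ENI record.
type attachedENI struct {
	ID            string
	InterfaceType string
	Tags          map[string]string
}

// pickTrunk returns the ID of the first trunk ENI that is not tagged
// no_manage, or "" when there is none.
func pickTrunk(enis []attachedENI) string {
	for _, e := range enis {
		if e.InterfaceType != "trunk" {
			continue
		}
		// Assumption: the value "true" marks the ENI as unmanaged.
		if e.Tags[noManageTagKey] == "true" {
			continue
		}
		return e.ID
	}
	return ""
}

func main() {
	enis := []attachedENI{
		{ID: "eni-self-managed", InterfaceType: "trunk", Tags: map[string]string{noManageTagKey: "true"}},
		{ID: "eni-cni-trunk", InterfaceType: "trunk"},
	}
	fmt.Println(pickTrunk(enis)) // prints "eni-cni-trunk"
}
```

Compared with the IPv4-address check, this keys the decision on an explicit marker rather than on a side effect of how the trunk was created.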

Contributor Author

Ah, got it, thanks. Let me have another look at filtering it out. I'll pull that change out of this PR in the meantime, so we can get the IPv4 fixes merged.

Member

Fixing this in master here - #3156

Member

@orsenthil orsenthil left a comment

LGTM

@orsenthil orsenthil merged commit 5daa885 into aws:master Dec 3, 2024
4 checks passed
@gavinbunney gavinbunney deleted the gavin/unmanaged-eni branch December 3, 2024 17:47
}
var ec2ip4s []*ec2.NetworkInterfacePrivateIpAddress
var subnetV4Cidr string
if ipv4Available {
Member

One problem I realized much later: the way we check ipv4Available and ipv6Available seems to depend on the order in which the IMDS fields are returned.

	for _, field := range macImdsFields {
		if field == "local-ipv4s" {
			imdsIPv4s, err := cache.imds.GetLocalIPv4s(ctx, eniMAC)
			if err != nil {
				awsAPIErrInc("GetLocalIPv4s", err)
				return ENIMetadata{}, err
			}
			if len(imdsIPv4s) > 0 {
				ipv4Available = true
				log.Debugf("Found IPv4 addresses associated with interface. This is not efa-only interface")
				break
			}
		}
		if field == "ipv6s" {
			imdsIPv6s, err := cache.imds.GetIPv6s(ctx, eniMAC)
			if err != nil {
				awsAPIErrInc("GetIPv6s", err)
			} else if len(imdsIPv6s) > 0 {
				ipv6Available = true
				log.Debugf("Found IPv6 addresses associated with interface. This is not efa-only interface")
				break
			}
		}
	}

Previously we assumed the ENI (or the primary) would always have an IPv4 address and went with that approach. Now that we check ipv4Available explicitly, if IMDS happens to return ipv6s before local-ipv4s, the break in the code above stops the loop before IPv4 is detected, and this introduces a bug.

We need to remove the break statements in the loop above to remain compatible with the previous behavior, that is, to always get the IPv4 address for the primary ENI.
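A small sketch of the order-independent version of that loop, under simplifying assumptions: the IMDS calls are replaced by plain slices, and `detectFamilies` is a hypothetical name. Without the break statements, both flags are set regardless of which field is listed first.

```go
package main

import "fmt"

// detectFamilies walks every metadata field and records which address
// families are present. It never exits early, so the result does not
// depend on the order of fields.
func detectFamilies(fields, ipv4s, ipv6s []string) (ipv4Available, ipv6Available bool) {
	for _, field := range fields {
		switch field {
		case "local-ipv4s":
			if len(ipv4s) > 0 {
				ipv4Available = true
			}
		case "ipv6s":
			if len(ipv6s) > 0 {
				ipv6Available = true
			}
		}
	}
	return ipv4Available, ipv6Available
}

func main() {
	ipv4s := []string{"10.0.0.5"}
	ipv6s := []string{"2001:db8::1"}

	// The same ENI, with the fields listed in both possible orders.
	orders := [][]string{
		{"local-ipv4s", "ipv6s"},
		{"ipv6s", "local-ipv4s"},
	}
	for _, fields := range orders {
		v4, v6 := detectFamilies(fields, ipv4s, ipv6s)
		fmt.Printf("fields=%v ipv4Available=%t ipv6Available=%t\n", fields, v4, v6)
	}
}
```

Both orderings report IPv4 and IPv6 as available for a dual-stack ENI, which preserves the earlier behavior of always picking up the IPv4 address on the primary ENI.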

orsenthil added a commit to orsenthil/amazon-vpc-cni-k8s that referenced this pull request Dec 18, 2024
orsenthil added a commit to orsenthil/amazon-vpc-cni-k8s that referenced this pull request Dec 18, 2024
orsenthil added a commit that referenced this pull request Dec 18, 2024
orsenthil added a commit that referenced this pull request Dec 18, 2024
2 participants