-
Notifications
You must be signed in to change notification settings - Fork 748
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix issues with handling unmanaged ENIs with IPv6 only #3122
Conversation
@orsenthil @jaydeokar Would you be able to take a look? We are attempting to work around issues with our primarily-ipv6 network so would be good to get this merged in |
@gavinbunney - I will review this shortly. It is my radar. |
pkg/awsutils/awsutils.go
Outdated
// This assumes we only have one trunk attached to the node.. | ||
if interfaceType == "trunk" { | ||
// The primary trunk eni requires an IPv4 address | ||
if interfaceType == "trunk" && len(eniMetadata.IPv4Addresses) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only change and assumption (even the previous that assumed only one trunk and now we are making assertion that one trunk with ipv4 address) that is causing a bit of concern to me.
What if you don't have this && len(eniMetadata.IPv4Addresses) > 0
assertion. Wouldn't the rest of the changes be sufficient? I see that getENIMetadata
is now protected and you wont run into issue with ipamd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is in the caller from ipamd when it's checking for a trunk eni, it uses the passed back metadataResult.TrunkENI
to search for it; in our case, it would pickup our managed ENI and not the aws-vpc-cni one - ref:
amazon-vpc-cni-k8s/pkg/ipamd/ipamd.go
Lines 1317 to 1329 in 64748b4
if metadataResult.TrunkENI != "" { | |
for _, eni := range metadataResult.ENIMetadata { | |
if eni.ENIID == metadataResult.TrunkENI { | |
if err := c.setupENI(eni.ENIID, eni, true, false); err == nil { | |
log.Infof("ENI %s set up", eni.ENIID) | |
return true | |
} else { | |
log.Debugf("failed to setup ENI %s: %v", eni.ENIID, err) | |
return false | |
} | |
} | |
} | |
} |
An alternative might be to pull up the filtering of ENIs into the awsutils.go
file itself so it's filtered right when fetching from the AWS APIs, so they are dropped beforehand, so the aws-vpc-cni wouldn't need to worry about unique setup (like the other fix in this PR where there are no IPv4 addresses).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function func (c *IPAMContext) checkForTrunkENI() bool
is only called from
amazon-vpc-cni-k8s/pkg/ipamd/ipamd.go
Line 644 in 5c471fe
if c.enablePodENI && c.dataStore.GetTrunkENI() == "" { |
and is gated through c.enablePodENI
.
I assume you had ENABLE_POD_ENI
and IPV6
, and instead of AWS created trunk, this call returned your trunk and that caused a problem for you. A problem that is not shown in the traceback above. Is that correct?
when we create a trunk interface, we do not specify EnablePrimaryIpv6
which will guarantee that there will always be IPv4 Address. So, checking for IPV4 address won't break the behavior.
if interfaceType == "trunk" && len(eniMetadata.IPv4Addresses) > 0
is, but this introducing some special information due a to unique cluster configuration.
How about using tagMap[eniMetadata.ENIID] = convertSDKTagsToTags(ec2res.TagSet)
early and verifying that the trunk as no_manage
tag? Would that be more explicit ?
Other approaches like filtering early, or even documenting a bit further for clarity is fine with me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah got it thanks. Let me have another look at filtering it out. I'll pull that change out of this PR in the meantime, so can get the IPv4 fixes merged in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixing this in master here - #3156
a634305
to
516bc3b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
516bc3b
to
f07558b
Compare
} | ||
var ec2ip4s []*ec2.NetworkInterfacePrivateIpAddress | ||
var subnetV4Cidr string | ||
if ipv4Available { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One problem I realized much later is, the way in which we check for ipv4Available
and ipv6Available
- seems to depend on the order in which imdsFields are returned.
for _, field := range macImdsFields {
if field == "local-ipv4s" {
imdsIPv4s, err := cache.imds.GetLocalIPv4s(ctx, eniMAC)
if err != nil {
awsAPIErrInc("GetLocalIPv4s", err)
return ENIMetadata{}, err
}
if len(imdsIPv4s) > 0 {
ipv4Available = true
log.Debugf("Found IPv4 addresses associated with interface. This is not efa-only interface")
break
}
}
if field == "ipv6s" {
imdsIPv6s, err := cache.imds.GetIPv6s(ctx, eniMAC)
if err != nil {
awsAPIErrInc("GetIPv6s", err)
} else if len(imdsIPv6s) > 0 {
ipv6Available = true
log.Debugf("Found IPv6 addresses associated with interface. This is not efa-only interface")
break
}
}
}
Previously we assumed the ENI (or primary) always will have an IPV4 and went with that approach. Now, since we are checking ipv4Available
explicitly, if for some reason order of checking of imds returned IPV6 and not ipv4 (due to break condition in the above code. then this introduces bug.
We will need to remove the break condition in the above loop to remain compatible with the previous behavior, that is, to always get the IPV4 address for the primary ENI.
This reverts commit 7b46f6b.
This reverts commit 7b46f6b.
What type of PR is this?
bug
Which issue does this PR fix?:
We have unmanaged trunk ENIs attached to our worker nodes which are designed for ipv6 only networks. These ENIs are tagged with
node.k8s.amazonaws.com/no_manage
, however the listing of attached ENIs happens before the IPAMD process filters those ENIs. As such, the metadata retrieval process fails when looking up the ipv4 address details for these ENIs, and causes theaws-k8s-agent
process to exit with an initialization failure.In this log,
eni-084b51xxxxxx
/0e:7c:f9:xx:xx:xx
is the aws-vpc-cni managed ENI, andeni-0b2cf8b6xxxxxx
/0e:f3:73:xx:xx:xx
is our managed eni:What does this PR do / Why do we need it?:
This PR better handles missing IPv4 information for ENIs which are IPv6 only (using same pattern that the IPv6 IPs lookup uses)
Testing done on this change:
Added unit tests to cover the new paths. Running in our EKS cluster the ENIs are now retrieved successfully (In this log,
eni-084b51xxxxxx
/0e:7c:f9:xx:xx:xx
is the aws-vpc-cni managed ENI, andeni-0b2cf8b6xxxxxx
/0e:f3:73:xx:xx:xx
is our managed eni):Will this PR introduce any new dependencies?:
n/a
Will this break upgrades or downgrades? Has updating a running cluster been tested?:
Tested with upgrading inplace without issues
Does this change require updates to the CNI daemonset config files to work?:
n/a
Does this PR introduce any user-facing change?:
n/a
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.