
CSI driver does not account for ENIs dynamically attached by the VPC CNI when calculating allocatable count #2249

Open
prad9192 opened this issue Nov 29, 2024 · 2 comments
Labels: kind/bug


prad9192 commented Nov 29, 2024

/kind bug

What happened?

The allocatableCount reported by CSINode (spec.drivers[].allocatable.count) on EKS clusters does not reflect the number of ENIs actually attached to the node. The calculation only considers the ENIs attached at node bootstrap and ignores ENIs that the VPC CNI attaches dynamically afterwards. As a result, allocatableCount stays fixed even as the VPC CNI attaches more ENIs to accommodate new workloads.

What you expected to happen?

The allocatableCount should reflect the actual number of ENIs attached to the node, including both the ENIs present at bootstrap and those dynamically attached by the VPC CNI, so that CSINode reports an accurate allocatable.count.

    apiVersion: storage.k8s.io/v1
    kind: CSINode
    metadata:
      annotations:
        storage.alpha.kubernetes.io/migrated-plugins: kubernetes.io/aws-ebs,kubernetes.io/azure-disk,kubernetes.io/azure-file,kubernetes.io/cinder,kubernetes.io/gce-pd,kubernetes.io/vsphere-volume
      creationTimestamp: "2024-10-21T09:02:47Z"
      name: xxxxx
      ownerReferences:
        - apiVersion: v1
          kind: Node
          name: xxxxx
          uid: 2da9c59d-1ac2-42bd-9e06-4ec01127153e
    spec:
      drivers:
        - allocatable:
            count: 25
          name: ebs.csi.aws.com
          nodeID: i-xxxx
          topologyKeys:
            - kubernetes.io/os
            - topology.ebs.csi.aws.com/zone
            - topology.kubernetes.io/zone

How to reproduce it (as minimally and precisely as possible)?

  1. Create an EKS cluster with nodes of a specific instance type (e.g., the r6 family).
  2. Observe the initial allocatableCount reported on the CSINode resource. This value is based on the instance's maximum attachment limit minus the ENIs and EBS volumes attached at bootstrap.
  3. Deploy workloads that require the VPC CNI to attach additional ENIs to the nodes.
  4. Observe that the allocatableCount remains static even though the actual number of attached ENIs has increased.
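The arithmetic behind steps 2 to 4 can be sketched as a simplified model. This is not the driver's code: it assumes a Nitro instance type whose 28 attachment slots are shared by ENIs and EBS volumes, and one slot used by the root volume. The function names and ENI counts (2 at boot, 4 later) are hypothetical; with these inputs the model happens to give 25, the value in the CSINode above, but the real counts on the affected node are not stated.

```python
"""Simplified model of the static allocatable count (illustration only).

Assumptions, not taken from the driver source: 28 attachment slots shared by
ENIs and EBS volumes, and one slot used by the root volume at boot.
All names are hypothetical.
"""

SHARED_ATTACHMENT_LIMIT = 28  # assumed shared limit for this instance type


def allocatable_at_startup(enis_at_boot: int, volumes_at_boot: int) -> int:
    """Count computed once at driver startup and then frozen in CSINode."""
    return SHARED_ATTACHMENT_LIMIT - enis_at_boot - volumes_at_boot


def allocatable_actual(enis_now: int, volumes_at_boot: int) -> int:
    """Slots really left for CSI volumes given the current ENI count."""
    return SHARED_ATTACHMENT_LIMIT - enis_now - volumes_at_boot


reported = allocatable_at_startup(enis_at_boot=2, volumes_at_boot=1)
real = allocatable_actual(enis_now=4, volumes_at_boot=1)  # VPC CNI added 2 ENIs
overcommit = reported - real  # volumes the scheduler may place but cannot attach

print(f"reported={reported} actual={real} overcommit={overcommit}")
```

In this model the reported count stays at 25 while only 23 slots remain, so up to 2 volumes can be scheduled onto the node without a free slot, which is consistent with the attach failures described below.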

Anything else we need to know?:

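This leads to inaccurate resource reporting and makes workloads difficult to manage. Because the scheduler still counts attachment slots that are in fact occupied by ENIs, it can place volumes on a node that cannot attach them, producing errors like the following: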

  Warning  FailedAttachVolume  80s (x12 over 56m)  attachdetach-controller  (combined from similar events): AttachVolume.Attach failed for volume "pvc-0c2a501a-bb06-4c9b-95aa-4cda4fb6aac2" : rpc error: code = Internal desc = Could not attach volume "vol-07b6e18e94978a87f" to node "i-0240e6b849f452539": WaitForAttachmentState AttachVolume error, expected device but be attached but was attaching, volumeID="vol-07b6e18e94978a87f", instanceID="i-0240e6b849f452539", Device="/dev/xvdam", err=operation error EC2: AttachVolume, https response error StatusCode: 400, RequestID: 7edb421b-9dc2-4001-af1e-73d628fabfb5, api error VolumeInUse: vol-07b6e18e94978a87f is already attached to an instance

Environment

  • Kubernetes version (use kubectl version):
    • Client Version: v1.31.2
    • Kustomize Version: v5.4.2
    • Server Version: v1.29.10-eks-7f9249a
  • Driver version: Amazon EBS CSI Driver version: v1.34.0-eksbuild.1
k8s-ci-robot added the kind/bug label Nov 29, 2024
ElijahQuinones (Member) commented Dec 2, 2024

Hi @prad9192

This is a known issue that we are actively working on. The CSINode allocatable property is currently immutable, so we cannot change it after startup to account for newly attached ENIs. We have proposed KEP-4876 to address this and are targeting Kubernetes 1.33 for alpha. For workarounds, please see our FAQ.
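As a rough illustration only, once the allocatable property can change, a node-side loop could periodically recount attached ENIs and update the advertised count. This is not the design of KEP-4876 or the driver's implementation; the `CSINodeDriverEntry` type, `refresh_allocatable` function, and the 28-slot / 1-reserved inputs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CSINodeDriverEntry:
    """Hypothetical stand-in for one spec.drivers[] entry in a CSINode."""
    name: str
    allocatable_count: int


def refresh_allocatable(
    entry: CSINodeDriverEntry,
    count_enis: Callable[[], int],
    shared_limit: int,
    reserved: int,
) -> bool:
    """Recompute the count from the current ENI total; return True if it changed."""
    new_count = max(shared_limit - count_enis() - reserved, 0)  # never negative
    if new_count == entry.allocatable_count:
        return False
    entry.allocatable_count = new_count
    return True


# Simulated refresh ticks: the VPC CNI goes from 2 to 4 attached ENIs.
entry = CSINodeDriverEntry(name="ebs.csi.aws.com", allocatable_count=25)
observed_enis = iter([2, 4])
for _ in range(2):
    changed = refresh_allocatable(
        entry, lambda: next(observed_enis), shared_limit=28, reserved=1
    )
    print(changed, entry.allocatable_count)
```

The first tick leaves the count at 25; the second lowers it to 23 after the extra ENIs appear, which is the behavior the reporter expected.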

prad9192 (Author) commented

/status
