Add documentation about cilium and aws-ebs-csi-driver #7451

Open
sylr opened this issue Nov 27, 2024 · 2 comments
Labels
documentation Improvements or additions to documentation lifecycle/stale triage/needs-information Marks that the issue still needs more information to properly triage

Comments

sylr commented Nov 27, 2024

Description

How can the docs be improved?

We have had issues in our clusters caused by dangling EBS volume attachments on Karpenter-disrupted nodes.

Because of the karpenter.sh/disrupted:NoSchedule taint that Karpenter applies to disrupted nodes, the cilium and aws-ebs-csi-driver agents were evicted before the aws-ebs-csi-driver could properly unmount the PVC and remove the volumeattachments.storage.k8s.io objects.

To resolve our issue, we had to add the following toleration to each of the cilium and aws-ebs-csi-driver daemonsets:

- effect: NoSchedule
  key: karpenter.sh/disrupted
  operator: Exists
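
To illustrate why this toleration covers the taint, here is a minimal sketch of Kubernetes toleration matching, written as a simplified model rather than the actual scheduler code. The `Taint`, `Toleration`, and `tolerates` names are hypothetical and exist only for this example; the matching rules follow the documented semantics (an empty effect matches any effect, an empty key with `Exists` matches any key).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    # Hypothetical, simplified stand-in for a node taint.
    key: str
    effect: str
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    # Hypothetical, simplified stand-in for a pod toleration.
    key: str = ""
    operator: str = "Equal"
    value: str = ""
    effect: str = ""


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Return True if the toleration matches the taint."""
    # An empty effect on the toleration matches every taint effect.
    if toleration.effect and toleration.effect != taint.effect:
        return False
    # An empty key (used with Exists) matches every taint key.
    if toleration.key and toleration.key != taint.key:
        return False
    if toleration.operator == "Exists":
        return True
    # With Equal, the values must also match.
    return toleration.operator == "Equal" and toleration.value == taint.value


# The taint Karpenter applies to disrupted nodes, as described above.
disrupted = Taint(key="karpenter.sh/disrupted", effect="NoSchedule")

# The toleration from this issue.
issue_toleration = Toleration(
    key="karpenter.sh/disrupted", operator="Exists", effect="NoSchedule"
)

# A "tolerate everything" toleration: Exists with no key and no effect.
tolerate_all = Toleration(operator="Exists")

print(tolerates(issue_toleration, disrupted))
print(tolerates(tolerate_all, disrupted))
```

Both tolerations match the taint in this model. The narrower one only affects scheduling decisions for this specific Karpenter taint, while the key-less `Exists` form tolerates any taint a node may carry.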
@sylr sylr added documentation Improvements or additions to documentation needs-triage Issues that need to be triaged labels Nov 27, 2024

z0rc commented Nov 28, 2024

aws-ebs-csi-driver installed as an EKS addon already tolerates all taints. aws-ebs-csi-driver installed as a Helm chart does so as well.

In addition, Karpenter v1 watches VolumeAttachments to ensure they are detached prior to shutdown; read up at https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/docs/faq.md#6-minute-delays-in-attaching-volumes.

WRT Cilium and tolerations: the taint is already explained at https://karpenter.sh/docs/concepts/disruption/#termination-controller. If you believe it's worth mentioning specific services there, PRs are welcome.

@jonathan-innis jonathan-innis added triage/needs-information Marks that the issue still needs more information to properly triage and removed needs-triage Issues that need to be triaged labels Dec 10, 2024

This issue has been inactive for 14 days. StaleBot will close this stale issue after 14 more days of inactivity.
