Unable to discover AMI by alias #7544

Open
maxsxu opened this issue Dec 19, 2024 · 3 comments
Labels: bug (Something isn't working), triage/accepted (Indicates that the issue has been accepted as a valid issue)

Comments


maxsxu commented Dec 19, 2024

Description

Observed Behavior:

{"level":"ERROR","time":"2024-12-19T15:32:00.620Z","logger":"controller","message":"Reconciler error","commit":"3298d91","controller":"nodeclass.status","controllerGroup":"karpenter.k8s.aws","controllerKind":"EC2NodeClass","EC2NodeClass":{"name":"default"},"namespace":"","name":"default","reconcileID":"a7f7f4f5-0c9b-4c29-a11f-263dafa81493","error":"getting amis, getting AMI queries, failed to discover AMIs for alias \"al2023@v20241213\""}

Expected Behavior:

Able to discover AMI by alias.

Reproduction Steps (Please include YAML):

Using the following EC2NodeClass causes this issue:

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  role: test-cluster-ng-role
  amiSelectorTerms:
    - alias: al2023@v20241213   # Or al2023@latest
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "test-cluster"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "test-cluster"

This results in the following status:

status:
  conditions:
  - lastTransitionTime: "2024-12-19T17:58:19Z"
    message: object is awaiting reconciliation
    observedGeneration: 1
    reason: AwaitingReconciliation
    status: Unknown
    type: AMIsReady
  - lastTransitionTime: "2024-12-19T17:58:19Z"
    message: ""
    observedGeneration: 1
    reason: SubnetsReady
    status: "True"
    type: SubnetsReady
  - lastTransitionTime: "2024-12-19T17:58:19Z"
    message: ""
    observedGeneration: 1
    reason: SecurityGroupsReady
    status: "True"
    type: SecurityGroupsReady
  - lastTransitionTime: "2024-12-19T17:58:19Z"
    message: ""
    observedGeneration: 1
    reason: InstanceProfileReady
    status: "True"
    type: InstanceProfileReady
  - lastTransitionTime: "2024-12-19T17:58:19Z"
    message: AMIsReady=Unknown
    observedGeneration: 1
    reason: ReconcilingDependents
    status: Unknown
    type: Ready
  instanceProfile: test-cluster_4961034478281494142
  securityGroups:
  - id: sg-0acaf62b6e189b666
    name: test-cluster-node-20241218154546808000000001
  subnets:
  - id: subnet-0e4f85f692b99ffff
    zone: us-west-1c
    zoneID: usw1-az3
  - id: subnet-0494a6c43245d6666
    zone: us-west-1a
    zoneID: usw1-az1
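
In the status above, every dependency condition is "True" except AMIsReady, which is what keeps the aggregate Ready condition at Unknown. A minimal Python sketch of that triage follows. The conditions are transcribed by hand from the status above, and the helper name `blocking_conditions` is hypothetical, not part of Karpenter:

```python
# Conditions transcribed from the EC2NodeClass status shown above.
conditions = [
    {"type": "AMIsReady", "status": "Unknown", "reason": "AwaitingReconciliation"},
    {"type": "SubnetsReady", "status": "True", "reason": "SubnetsReady"},
    {"type": "SecurityGroupsReady", "status": "True", "reason": "SecurityGroupsReady"},
    {"type": "InstanceProfileReady", "status": "True", "reason": "InstanceProfileReady"},
    {"type": "Ready", "status": "Unknown", "reason": "ReconcilingDependents"},
]


def blocking_conditions(conditions, aggregate="Ready"):
    """Return (type, reason) for each non-aggregate condition whose status is not "True"."""
    return [
        (c["type"], c["reason"])
        for c in conditions
        if c["type"] != aggregate and c["status"] != "True"
    ]


print(blocking_conditions(conditions))
```

Only AMIsReady is reported, which matches the controller error about AMI discovery.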

The workaround for me is to specify the AMI ID and AMI family with the following spec:

spec:
  amiFamily: AL2023
  amiSelectorTerms:
  - id: ami-0784a08b412b69c00

Versions:

  • Chart Version: 1.1.1
  • Kubernetes Version (kubectl version): 1.30.7
maxsxu added the bug and needs-triage labels Dec 19, 2024
Contributor

jmdeal commented Dec 19, 2024

Are you able to check your CloudTrail logs for failed GetParameter calls made by Karpenter? Can you also verify that you have sufficient permissions for ssm:GetParameter on your KarpenterControllerPolicy (ref)?
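
One way to do the CloudTrail check is to export the events and filter them locally. This is a rough sketch only: it assumes the records are already loaded as Python dicts and relies on the standard CloudTrail record fields `eventSource`, `eventName`, `errorCode`, and `errorMessage`. The function name and the sample records are made up for illustration and are not taken from this issue:

```python
def failed_ssm_get_parameter(records):
    """Keep CloudTrail records for ssm:GetParameter calls that returned an error."""
    return [
        r
        for r in records
        if r.get("eventSource") == "ssm.amazonaws.com"
        and r.get("eventName") == "GetParameter"
        and r.get("errorCode")  # errorCode is only present on failed calls
    ]


# Made-up records for illustration only.
sample = [
    {
        "eventSource": "ssm.amazonaws.com",
        "eventName": "GetParameter",
        "errorCode": "AccessDenied",
        "errorMessage": "(illustrative) caller is not authorized",
    },
    {"eventSource": "ssm.amazonaws.com", "eventName": "GetParameter"},
    {"eventSource": "ec2.amazonaws.com", "eventName": "DescribeImages"},
]

for record in failed_ssm_get_parameter(sample):
    print(record["errorCode"], record["errorMessage"])
```

Any record this returns for the Karpenter controller role would point to an authorization problem rather than a missing AMI.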

jmdeal added the triage/needs-information label and removed the needs-triage label Dec 19, 2024
Author

maxsxu commented Dec 20, 2024

Thanks @jmdeal 👍 I've fixed it. The root cause was that a permissions boundary restricted my Karpenter role from performing SSM actions.
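
For context on why the controller policy alone was not enough: with a permissions boundary attached, an action is effectively allowed only if both the identity policy and the boundary allow it. The Python sketch below models just that intersection. It is deliberately simplified (no explicit denies, resources, conditions, or organization policies, and `fnmatch` wildcards are only an approximation of IAM matching). The policy contents are hypothetical and are not the policies from this cluster:

```python
from fnmatch import fnmatchcase


def allowed_by(patterns, action):
    """True if any action pattern (with * wildcards) matches, case-insensitively."""
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def effective_allow(identity_allows, boundary_allows, action):
    """An action must be allowed by the identity policy AND the permissions boundary."""
    return allowed_by(identity_allows, action) and allowed_by(boundary_allows, action)


# Hypothetical policies: the identity policy grants ssm:GetParameter,
# but the boundary does not include any ssm actions.
identity_policy = ["ssm:GetParameter", "ec2:*"]
permissions_boundary = ["ec2:*", "iam:PassRole"]

print(effective_allow(identity_policy, permissions_boundary, "ssm:GetParameter"))
print(effective_allow(identity_policy, permissions_boundary, "ec2:DescribeImages"))
```

Under these assumed policies the first call is denied and the second is allowed, which mirrors the situation described above: the controller policy looked correct, yet the SSM call still failed.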

I suggest appending the permission errors to the controller log, so that it is clearer what happened.
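
To illustrate the suggestion, here is a small sketch of keeping the underlying cause when an error is re-raised. Karpenter itself is written in Go, so this Python version only shows the idea and is not the controller's code. `AccessDenied`, `get_parameter`, `discover_amis`, and the `/example/...` parameter path are all hypothetical stand-ins:

```python
class AccessDenied(Exception):
    """Stand-in for an authorization failure returned by a parameter lookup."""


def get_parameter(name):
    # Hypothetical lookup that always fails, to simulate a denied call.
    raise AccessDenied(f"not authorized to perform ssm:GetParameter on {name}")


def discover_amis(alias):
    try:
        return get_parameter(f"/example/{alias}")  # placeholder path, not a real one
    except AccessDenied as err:
        # Append the cause to the message and chain it, instead of dropping it.
        raise RuntimeError(
            f"failed to discover AMIs for alias {alias!r}, {err}"
        ) from err


try:
    discover_amis("al2023@v20241213")
except RuntimeError as err:
    print(err)
```

With the cause appended, the logged message names the denied action instead of stopping at "failed to discover AMIs for alias".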

Contributor

jmdeal commented Dec 20, 2024

Agreed, they shouldn't be omitted. Do you want to update the title to reflect the issue, e.g. something along the lines of "authorization failures aren't provided in logs for amiSelectorTerm aliases"?

jmdeal added the triage/accepted label and removed the triage/needs-information label Dec 20, 2024
jmdeal self-assigned this Dec 23, 2024