dry run changes #7525
✅ Deploy Preview for karpenter-docs-prod canceled.
@@ -25,6 +25,7 @@ const (
ConditionTypeAMIsReady = "AMIsReady"
ConditionTypeInstanceProfileReady = "InstanceProfileReady"
ConditionTypeValidationSucceeded = "ValidationSucceeded"
ConditionTypeNotDegraded = "NodeclassNotDegraded"
Given that an authorization failure is a hard stop (as in, we definitely can't create any resources without the proper permissions), I see this as a validation failure, not a Degraded case.
In general, I would think about the classification as:
- Degraded: We suspect that there is an issue with the launch config but aren't 100% sure. This tells the user to look at the NodePool, but we keep launching nodes and keep trying.
- ValidationFailed: We know that there is an issue with the launch config. This tells the user to look at the NodePool, and we stop launching nodes with this NodePool.
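The split above could be sketched roughly as follows. This is only an illustration: `ErrUnauthorized`, `ErrLaunchFailed`, `Outcome`, and `classify` are hypothetical names that do not exist in this PR, and only the two condition strings are taken from the diff.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors standing in for whatever a dry-run launch returns.
var (
	ErrUnauthorized = errors.New("unauthorized operation")
	ErrLaunchFailed = errors.New("launch failed for an unclear reason")
)

// Outcome records which condition to mark false (empty when healthy) and
// whether the controller should keep launching nodes.
type Outcome struct {
	Condition     string
	KeepLaunching bool
}

// classify maps a launch error onto the two buckets described in the review.
func classify(err error) Outcome {
	switch {
	case err == nil:
		return Outcome{KeepLaunching: true}
	case errors.Is(err, ErrUnauthorized):
		// Known misconfiguration: treat as a validation failure and stop launching.
		return Outcome{Condition: "ValidationSucceeded", KeepLaunching: false}
	default:
		// Only a suspicion: surface it as degraded but keep trying.
		return Outcome{Condition: "NodeclassNotDegraded", KeepLaunching: true}
	}
}

func main() {
	errs := []error{nil, fmt.Errorf("create fleet: %w", ErrUnauthorized), ErrLaunchFailed}
	for _, err := range errs {
		fmt.Printf("%v -> %+v\n", err, classify(err))
	}
}
```

Using `errors.Is` keeps the hard-stop check working even when the provider wraps the underlying error.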
return reconcile.Result{}, nil
}
nodeClaims := &karpv1.NodeClaimList{}
if err := d.kubeClient.List(ctx, nodeClaims, nodeclaimutils.ForNodeClass(nodeClass)); err != nil {
I don't think we should fetch real NodeClaims for this. We shouldn't require actual nodes to have been launched to determine that there is an issue with our authorization. We should be able to "mock" a NodeClaim for the launch and then use it to execute the dry-run.
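A minimal sketch of what such a mocked NodeClaim might look like. `NodeClass` and `NodeClaim` here are simplified stand-ins for `v1.EC2NodeClass` and `karpv1.NodeClaim`, and `mockNodeClaim` plus the label key are hypothetical. Building the object in memory also sidesteps indexing `nodeClaims.Items[0]`, which panics when the list is empty.

```go
package main

import "fmt"

// NodeClass and NodeClaim are simplified stand-ins (assumptions) for
// v1.EC2NodeClass and karpv1.NodeClaim; the real types carry far more fields.
type NodeClass struct {
	Name string
}

type NodeClaim struct {
	Name         string
	NodeClassRef string
	Labels       map[string]string
}

// mockNodeClaim builds an in-memory NodeClaim from the NodeClass alone.
// Nothing is read from or written to the API server.
func mockNodeClaim(nc NodeClass) NodeClaim {
	return NodeClaim{
		Name:         "dry-run-" + nc.Name,
		NodeClassRef: nc.Name,
		// Hypothetical label marking the object as synthetic.
		Labels: map[string]string{"example.com/dry-run": "true"},
	}
}

func main() {
	claim := mockNodeClaim(NodeClass{Name: "default"})
	fmt.Printf("%+v\n", claim)
}
```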
@@ -242,8 +242,13 @@ func (p *DefaultProvider) launchInstance(ctx context.Context, nodeClass *v1.EC2N
} else {
createFleetInput.OnDemandOptions = &ec2types.OnDemandOptionsRequest{AllocationStrategy: ec2types.FleetOnDemandAllocationStrategyLowestPrice}
}

createFleetInput.DryRun = lo.ToPtr(true)
This updates the existing provider, so every launch would now run under dry-run. I don't think you want to change this call for everything; you just want to pass an option to the create call that runs it under dry-run, so we can check that it would succeed.
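One way to express "pass an option to the create" is Go's functional-options pattern, sketched below. `CreateOption`, `WithDryRun`, and `buildCreateFleetInput` are hypothetical names, and `CreateFleetInput` is a local stand-in reduced to the `DryRun` field; none of this is the provider's actual API.

```go
package main

import "fmt"

// CreateFleetInput is a stand-in (assumption) for the EC2 SDK input type,
// reduced to the single field this sketch needs.
type CreateFleetInput struct {
	DryRun *bool
}

// createOptions collects per-call settings; the zero value means a real launch.
type createOptions struct {
	dryRun bool
}

// CreateOption mutates createOptions (hypothetical type).
type CreateOption func(*createOptions)

// WithDryRun asks for the request to be sent with DryRun set.
func WithDryRun() CreateOption {
	return func(o *createOptions) { o.dryRun = true }
}

// buildCreateFleetInput applies the options; DryRun stays nil unless requested,
// so existing callers are unaffected.
func buildCreateFleetInput(opts ...CreateOption) CreateFleetInput {
	o := createOptions{}
	for _, opt := range opts {
		opt(&o)
	}
	input := CreateFleetInput{}
	if o.dryRun {
		v := true
		input.DryRun = &v
	}
	return input
}

func main() {
	normal := buildCreateFleetInput()
	probe := buildCreateFleetInput(WithDryRun())
	fmt.Println(normal.DryRun == nil, probe.DryRun != nil && *probe.DryRun)
}
```

The default path leaves `DryRun` unset, so only the caller that opts in sends a dry-run request.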
if err := d.kubeClient.List(ctx, nodeClaims, nodeclaimutils.ForNodeClass(nodeClass)); err != nil {
return reconcile.Result{}, fmt.Errorf("listing nodeclaims that are using nodeclass, %w", err)
}
_, err := d.instanceProvider.Create(ctx, nodeClass, &nodeClaims.Items[0], nodeClass.Spec.Tags, lo.Must(d.resolveInstanceTypes(ctx, &nodeClaims.Items[0], nodeClass)))
This would require an update to the CloudProvider interface, so I'm not 100% convinced that it's a good idea, but it would be kinda nice if we could just run cloudProvider.Create() under dry-run and execute all of the underlying config just the same. I guess what this would really take is injecting an EC2API that always executes everything under dry-run. That would make sure that we run through the full "launch sequence" without launching an actual instance.
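The injected-client idea could look roughly like the decorator below. `FleetAPI` is a hypothetical, narrowed stand-in for the EC2 client interface, and `dryRunAPI`, `fakeEC2`, `probe`, and `ErrDryRunOperation` are made up for this sketch. The fake backend imitates EC2's documented behavior of answering a permitted dry-run with a `DryRunOperation` error.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Stand-ins (assumptions) for the EC2 SDK request and response types.
type CreateFleetInput struct{ DryRun *bool }
type CreateFleetOutput struct{}

// FleetAPI is a hypothetical, narrowed view of the EC2 client interface.
type FleetAPI interface {
	CreateFleet(ctx context.Context, in *CreateFleetInput) (*CreateFleetOutput, error)
}

// dryRunAPI decorates a FleetAPI and forces DryRun on a copy of every request,
// leaving the caller's input untouched.
type dryRunAPI struct{ inner FleetAPI }

func (d dryRunAPI) CreateFleet(ctx context.Context, in *CreateFleetInput) (*CreateFleetOutput, error) {
	forced := *in
	t := true
	forced.DryRun = &t
	return d.inner.CreateFleet(ctx, &forced)
}

// ErrDryRunOperation mimics the error EC2 returns when a dry-run would succeed.
var ErrDryRunOperation = errors.New("DryRunOperation")

// fakeEC2 is a test double that counts real launches.
type fakeEC2 struct{ launched int }

func (f *fakeEC2) CreateFleet(_ context.Context, in *CreateFleetInput) (*CreateFleetOutput, error) {
	if in.DryRun != nil && *in.DryRun {
		return nil, ErrDryRunOperation
	}
	f.launched++
	return &CreateFleetOutput{}, nil
}

// probe issues one launch request through whichever client it is given.
func probe(api FleetAPI) error {
	_, err := api.CreateFleet(context.Background(), &CreateFleetInput{})
	return err
}

func main() {
	backend := &fakeEC2{}
	err := probe(dryRunAPI{inner: backend})
	fmt.Println(errors.Is(err, ErrDryRunOperation), backend.launched)
}
```

Because the wrapper satisfies the same interface, the rest of the launch path would run unchanged while no instance is created.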
return reconcile.Result{}, nil
}

func (d *Degraded) resolveInstanceTypes(ctx context.Context, nodeClaim *karpv1.NodeClaim, nodeClass *v1.EC2NodeClass) ([]*corecloudprovider.InstanceType, error) {
It's not ideal that this change forces us to rewrite this function again.
Fixes #N/A
Description
How was this change tested?
Does this change impact docs?
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.