Adaptive schedule strategy for UnitedDeployment #1720
base: master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
… adapter
1. adapter.go: GetReplicaDetails returns the pods in the subset
2. xxx_adapter.go: return-pods implementation ⬆️
3. allocator.go: safeReplica handling
4. pod_condition_utils.go: extract the PodUnscheduledTimeout function from workloadspread
5. reschedule.go: PodUnscheduledTimeout function extracted
6. subset.go: add some fields to the Subset object to carry related information
7. subset_control.go: store subset pods in the Subset object
8. uniteddeployment_controller.go:
   1. add a requeue feature to check failed pods
   2. subset unschedulable status management
9. uniteddeployment_types.go: API change
10. uniteddeployment_update.go: sync unschedulable status to the CR

Signed-off-by: AiRanthem <[email protected]>
Signed-off-by: AiRanthem <[email protected]>
numSubset := len(ac.Spec.Topology.Subsets)
minReplicasMap := make(map[string]int32, numSubset)
maxReplicasMap := make(map[string]int32, numSubset)
notPendingReplicasMap := getSubsetRunningReplicas(nameToSubset)
Please change the local variable names and log keys, e.g. notPendingReplicasMap, notPendingReplicas, to runningReplicas...
if controllerutil.AddFinalizer(instance, UnitedDeploymentFinalizer) {
	klog.InfoS("adding UnitedDeploymentFinalizer")
Add the UnitedDeployment name to this log line.
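For example (illustrative; klog.KObj is one convenient way to attach the object reference):

```go
if controllerutil.AddFinalizer(instance, UnitedDeploymentFinalizer) {
	// Identify which UnitedDeployment the finalizer is being added to.
	klog.InfoS("adding UnitedDeploymentFinalizer", "unitedDeployment", klog.KObj(instance))
}
```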
// to avoid memory leak
klog.InfoS("cleaning up UnitedDeployment", "unitedDeployment", request)
ResourceVersionExpectation.Delete(instance)
if err = r.updateUnitedDeploymentInstance(instance); err != nil {
If the only clean-up work is deleting the expectation, it is not necessary to add a finalizer. We can just delete the expectation in the event handler; that saves the work of adding and removing the finalizer.
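A rough sketch of that alternative, assuming controller-runtime v0.15+ handler signatures; the handler wiring and enqueue call are illustrative, only ResourceVersionExpectation and the UnitedDeployment type are from this PR:

```go
import (
	"context"

	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/util/workqueue"
	"sigs.k8s.io/controller-runtime/pkg/event"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"

	appsv1alpha1 "github.com/openkruise/kruise/apis/apps/v1alpha1"
)

// udEventHandler drops the expectation when the UnitedDeployment is deleted,
// so no finalizer round-trip is needed just for this clean-up.
var udEventHandler = handler.Funcs{
	DeleteFunc: func(ctx context.Context, e event.DeleteEvent, q workqueue.RateLimitingInterface) {
		if ud, ok := e.Object.(*appsv1alpha1.UnitedDeployment); ok {
			ResourceVersionExpectation.Delete(ud)
		}
		q.Add(reconcile.Request{NamespacedName: types.NamespacedName{
			Namespace: e.Object.GetNamespace(),
			Name:      e.Object.GetName(),
		}})
	},
}
```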
}
// make sure latest version is observed
ResourceVersionExpectation.Observe(instance)
We can observe the instance in the update handler of the event handler instead.
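Sketch, continuing the handler.Funcs example above (shape illustrative):

```go
UpdateFunc: func(ctx context.Context, e event.UpdateEvent, q workqueue.RateLimitingInterface) {
	if ud, ok := e.ObjectNew.(*appsv1alpha1.UnitedDeployment); ok {
		// Record the latest observed resourceVersion here instead of at the top of Reconcile.
		ResourceVersionExpectation.Observe(ud)
	}
	// ... enqueue the request as before ...
},
```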
@@ -124,3 +125,95 @@ func TestReconcile(t *testing.T) {
	defer c.Delete(context.TODO(), instance)
	g.Eventually(requests, timeout).Should(gomega.Receive(gomega.Equal(expectedRequest)))
}
func TestUnschedulableStatusManagement(t *testing.T) { |
Please rewrite the unit test using subtests and without gomega.
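A possible shape for that, as a self-contained sketch; countPendingPods below is only a stand-in for the real status-management logic under test:

```go
package uniteddeployment

import (
	"testing"

	corev1 "k8s.io/api/core/v1"
)

// countPendingPods is a placeholder for the logic exercised by the real test.
func countPendingPods(pods []*corev1.Pod) int32 {
	var n int32
	for _, p := range pods {
		if p.Status.Phase == corev1.PodPending {
			n++
		}
	}
	return n
}

func TestUnschedulableStatusManagement(t *testing.T) {
	cases := []struct {
		name        string
		phases      []corev1.PodPhase
		wantPending int32
	}{
		{name: "no pending pods", phases: []corev1.PodPhase{corev1.PodRunning}, wantPending: 0},
		{name: "one pending pod", phases: []corev1.PodPhase{corev1.PodRunning, corev1.PodPending}, wantPending: 1},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			var pods []*corev1.Pod
			for _, phase := range tc.phases {
				pods = append(pods, &corev1.Pod{Status: corev1.PodStatus{Phase: phase}})
			}
			if got := countPendingPods(pods); got != tc.wantPending {
				t.Errorf("pending pods = %d, want %d", got, tc.wantPending)
			}
		})
	}
}
```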
if err != nil {
	r.recorder.Event(instance.DeepCopy(), corev1.EventTypeWarning, fmt.Sprintf("Failed%s", eventTypeDupSubsetsDelete), err.Error())
	return nil, fmt.Errorf("fail to manage duplicate Subset of UnitedDeployment %s/%s: %s", instance.Namespace, instance.Name, err)
}
// If the Fixed scheduling strategy is used, the unschedulable state for all subsets remains false and
// the UnschedulableStatus of Subsets is not managed.
if instance.Spec.Topology.ScheduleStrategy.IsAdaptive() {
Consider moving this code block to Reconcile. getNameToSubset seems to be a simple function that returns the Subset struct for a given name; it should not contain complex subset management logic.
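For instance, something along these lines in Reconcile (argument lists are illustrative; only getNameToSubset, IsAdaptive and manageUnschedulableStatusForExistingSubset are names taken from this PR):

```go
// getNameToSubset stays a plain lookup; the adaptive bookkeeping moves up into Reconcile.
nameToSubset, err := r.getNameToSubset(instance, control, expectedRevision)
if err != nil {
	return reconcile.Result{}, err
}
if instance.Spec.Topology.ScheduleStrategy.IsAdaptive() {
	r.manageUnschedulableStatusForExistingSubset(instance, nameToSubset)
}
```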
	subset.Status.UnschedulableStatus.PendingPods++
}
if checkAfter > 0 {
	durationStore.Push(unitedDeploymentKey, checkAfter)
Would it be better to enqueue only the pod that will be the earliest to time out?
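A minimal sketch of that idea; the helper and the use of CreationTimestamp are hypothetical, only durationStore.Push and unitedDeploymentKey come from the diff. The call site would then be durationStore.Push(unitedDeploymentKey, earliestTimeout(pendingPods, timeout, time.Now())).

```go
import (
	"time"

	corev1 "k8s.io/api/core/v1"
)

// earliestTimeout returns the shortest remaining time before any still-pending pod
// exceeds the unschedulable timeout, or 0 if no pod is still inside the window.
func earliestTimeout(pods []*corev1.Pod, timeout time.Duration, now time.Time) time.Duration {
	var earliest time.Duration
	for _, pod := range pods {
		remaining := timeout - now.Sub(pod.CreationTimestamp.Time)
		if remaining <= 0 {
			continue // already timed out; handled in the current reconcile
		}
		if earliest == 0 || remaining < earliest {
			earliest = remaining
		}
	}
	return earliest
}
```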
@@ -79,6 +80,17 @@ func (r *ReconcileUnitedDeployment) manageSubsets(ud *appsv1alpha1.UnitedDeploym
	if updateErr == nil {
		SetUnitedDeploymentCondition(newStatus, NewUnitedDeploymentCondition(appsv1alpha1.SubsetUpdated, corev1.ConditionTrue, "", ""))
	} else {
// If using an Adaptive scheduling strategy, when the subset is scaled out leading to the creation of new Pods,
// future potential scheduling failures need to be checked for rescheduling.
var newPodCreated = false
Is it really necessary to trigger the requeue here, given that manageUnschedulableStatusForExistingSubset has already pushed a duration?
@@ -132,6 +144,11 @@ func (r *ReconcileUnitedDeployment) manageSubsetProvision(ud *appsv1alpha1.Unite
	return nil
})
if createdErr == nil {
	// When a new subset is created, regardless of whether it contains newly created Pods,
Is it really necessary to trigger the requeue here, given that manageUnschedulableStatusForExistingSubset has already pushed a duration?
@@ -348,10 +449,14 @@ func (r *ReconcileUnitedDeployment) classifySubsetBySubsetName(ud *appsv1alpha1.
	return mapping
}

-func (r *ReconcileUnitedDeployment) updateStatus(instance *appsv1alpha1.UnitedDeployment, newStatus, oldStatus *appsv1alpha1.UnitedDeploymentStatus, nameToSubset *map[string]*Subset, nextReplicas, nextPartition *map[string]int32, currentRevision, updatedRevision *appsv1.ControllerRevision, collisionCount int32, control ControlInterface) (reconcile.Result, error) {
+func (r *ReconcileUnitedDeployment) updateStatus(instance *appsv1alpha1.UnitedDeployment, newStatus, oldStatus *appsv1alpha1.UnitedDeploymentStatus, nameToSubset *map[string]*Subset, nextReplicas, nextPartition *map[string]int32, currentRevision, updatedRevision *appsv1.ControllerRevision, collisionCount int32, control ControlInterface) error {
There are too many parameters in updateStatus. Consider moving calculateStatus to Reconcile, and making updateStatus accept only the instance plus the new and old status.
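A sketch of that shape; calculateStatus's parameter list, the comparison via apiequality (k8s.io/apimachinery/pkg/api/equality), and the assumption that the reconciler embeds client.Client are illustrative, only the function names come from this code:

```go
// Inside Reconcile, after the subsets and revisions have been resolved:
newStatus := r.calculateStatus(instance, nameToSubset, nextReplicas, nextPartition,
	currentRevision, updatedRevision, collisionCount, control)
if err := r.updateStatus(instance, newStatus, oldStatus); err != nil {
	return reconcile.Result{}, err
}
```

```go
// updateStatus only persists the already-computed status.
func (r *ReconcileUnitedDeployment) updateStatus(instance *appsv1alpha1.UnitedDeployment, newStatus, oldStatus *appsv1alpha1.UnitedDeploymentStatus) error {
	if apiequality.Semantic.DeepEqual(newStatus, oldStatus) {
		return nil // nothing changed; skip the API call
	}
	instance.Status = *newStatus
	return r.Status().Update(context.TODO(), instance)
}
```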
Ⅰ. Describe what this PR does
Added an adaptive scheduling strategy to UnitedDeployment. During scale-up, if some Pods in a subset turn out to be unschedulable, those Pods are rescheduled to other subsets. During scale-down, if elastic allocation is used (i.e., the subset is configured with min/max replicas), each subset retains its ready Pods as far as possible without exceeding its maximum capacity, rather than strictly scaling down in reverse order of the subset list.
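To make the scale-down behavior concrete, here is a simplified sketch of the elastic allocation idea; this is not the PR's actual allocator.go logic, and the names and tie-breaking rules are illustrative:

```go
// allocate distributes total replicas over subsets in order: each subset first keeps
// its ready Pods (capped by its max), and only the leftover replicas follow subset order.
func allocate(total int32, order []string, ready, max map[string]int32) map[string]int32 {
	result := make(map[string]int32, len(order))
	remaining := total

	// First pass: retain ready Pods wherever they already are, up to each subset's max.
	for _, name := range order {
		keep := min32(min32(ready[name], max[name]), remaining)
		result[name] = keep
		remaining -= keep
	}
	// Second pass: assign whatever is left in subset order, still respecting max.
	for _, name := range order {
		extra := min32(max[name]-result[name], remaining)
		result[name] += extra
		remaining -= extra
	}
	return result
}

func min32(a, b int32) int32 {
	if a < b {
		return a
	}
	return b
}
```

In contrast, the fixed strategy would simply fill subsets in order and scale down in reverse order of the subset list, regardless of where the ready Pods are.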
Ⅱ. Does this pull request fix one issue?
fixes #1673
Ⅲ. Describe how to verify it
Use the yaml below to create a UD with subset-b unschedulable.
subset-c -> subset-b -> subset-a
Ⅳ. Special notes for reviews