Investigate Node Affinity Scheduling #179
Comments
/assign |
I have been researching this alongside my normal day-to-day work. I should have some ideas to discuss shortly. |
There are discussions in SIG Cloud Provider concerning affinity and anti-affinity scheduling as it pertains to the underlying infrastructure. Going to table this for now and see if we can provide input to that effort before taking on this feature/functionality. |
/lifecycle frozen |
Going to start taking a look at this again. We are revisiting since the consensus from SIG Cloud Provider was to do it outside of core k8s. Should still target a post-1.0 release. |
I have been working on this proposal since the beginning of the year (attached). I am working on putting together a patch for the same. |
@sujeet-banerjee I read through the doc, and it looks like it is written in relation to Cluster API. Maybe I am missing something... There definitely needs to be an understanding of VMs and which physical hosts they are on, but the issue is that the scheduler doesn't know about the backing infrastructure when it comes time to schedule pods on those worker nodes. The VMs themselves can move around within the cluster because of DRS, node failures, and so on. It is also about more than compute; this also concerns fault domains on storage such as vSAN.

As an example, assume the VMs are distributed in an ideal configuration per your doc. If you target a StatefulSet at a certain region/zone, it is possible that all of its pods land on different (or even the same) VMs that nevertheless sit in the same vSAN fault domain. If that particular fault domain dies, you lose all your data. This is one of the problems this issue is intended to address.

The doc takes a very Cluster-API-centric view, looking from the infrastructure upwards, but the proposed component needs to look at workload placement from the pod view downwards. The proposal may make pod scheduling easier by having VMs sit on hosts in an ideal fashion, but at the end of the day we are still talking about pod scheduling and workload placement within those VMs. |
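To make the fault-domain failure mode above concrete, here is a minimal sketch (not from the original discussion) of how a workload could ask to be spread across fault domains once nodes carry a suitable topology label. The label key `example.vmware.com/vsan-fault-domain` is an invented placeholder; keeping such a label truthful while DRS moves VMs around is exactly the gap this issue describes.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	// Hypothetical topology label; some vSphere-aware component would have to
	// keep it in sync with the fault domain of the node's backing VM.
	const faultDomainKey = "example.vmware.com/vsan-fault-domain"

	// Pod template of the kind you would embed in a StatefulSet.
	tmpl := corev1.PodTemplateSpec{
		ObjectMeta: metav1.ObjectMeta{Labels: map[string]string{"app": "db"}},
		Spec: corev1.PodSpec{
			Affinity: &corev1.Affinity{
				PodAntiAffinity: &corev1.PodAntiAffinity{
					// Require that no two "app=db" pods are scheduled onto nodes
					// that share the same fault-domain label value.
					RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{{
						LabelSelector: &metav1.LabelSelector{
							MatchLabels: map[string]string{"app": "db"},
						},
						TopologyKey: faultDomainKey,
					}},
				},
			},
			Containers: []corev1.Container{{
				Name:  "db",
				Image: "example/db:latest",
			}},
		},
	}

	out, err := yaml.Marshal(tmpl)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The anti-affinity rule itself is standard Kubernetes; the part that does not exist today is whatever keeps the node label aligned with the actual vSAN fault domain, which is why the scheduler alone cannot solve this. |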
Hi folks, is there any update here? |
Is this a BUG REPORT or FEATURE REQUEST?:
/kind feature
What happened:
Based in part on the work done here:
https://github.com/vmware/vsphere-affinity-scheduling-plugin
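As a hedged illustration of the general approach (this is not the linked plugin's actual implementation), the sketch below assumes a component running with in-cluster credentials that stamps each node with a label describing the ESXi host backing its VM. `lookupHostForNode`, the label key, and the returned host name are all hypothetical placeholders for a real vCenter query (e.g. via govmomi).

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// Hypothetical label key surfacing VM placement to the scheduler.
const hostLabelKey = "example.vmware.com/esxi-host"

// lookupHostForNode is a placeholder; a real implementation would ask
// vCenter which ESXi host currently runs the VM backing this node.
func lookupHostForNode(nodeName string) (string, error) {
	return "esxi-host-01", nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	ctx := context.Background()
	nodes, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		panic(err)
	}

	for _, node := range nodes.Items {
		host, err := lookupHostForNode(node.Name)
		if err != nil {
			panic(err)
		}
		// Merge-patch only the label so concurrent updates to the rest of
		// the node object are not clobbered.
		patch := []byte(fmt.Sprintf(`{"metadata":{"labels":{%q:%q}}}`, hostLabelKey, host))
		if _, err := client.CoreV1().Nodes().Patch(ctx, node.Name,
			types.MergePatchType, patch, metav1.PatchOptions{}); err != nil {
			panic(err)
		}
	}
}
```

With host (or fault-domain) labels in place, existing mechanisms such as node affinity and pod anti-affinity could consume them; re-evaluating placement after DRS moves a VM is the harder part and is out of scope for this sketch.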