Investigate Node Affinity Scheduling #179

Open
davidvonthenen opened this issue Apr 3, 2019 · 8 comments
Labels
kind/feature: Categorizes issue or PR as related to a new feature.
lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@davidvonthenen
Contributor

Is this a BUG REPORT or FEATURE REQUEST?:
/kind feature

What happened:
Based in part on the work done here:
https://github.com/vmware/vsphere-affinity-scheduling-plugin

@k8s-ci-robot added the kind/feature label Apr 3, 2019
@davidvonthenen
Contributor Author

/assign

@davidvonthenen
Contributor Author

I have been researching this alongside my normal day-to-day work. I should have some ideas to discuss shortly.

@davidvonthenen
Contributor Author

There are discussions in SIG Cloud Provider concerning affinity and anti-affinity scheduling as it pertains to the underlying infrastructure. I'm going to table this for now and see whether we can provide input to that effort before taking on this feature/functionality.

@frapposelli added this to the Next milestone Jun 5, 2019
@frapposelli added the priority/important-longterm label Jul 3, 2019
@frapposelli
Member

/lifecycle frozen

@k8s-ci-robot added the lifecycle/frozen label Jul 3, 2019
@davidvonthenen
Contributor Author

davidvonthenen commented Aug 5, 2019

Going to start taking a look at this again. We are revisiting it since the consensus from SIG Cloud Provider was to do it outside of core k8s.

This should still be targeted for a post-1.0 release.

@sujeet-banerjee

sujeet-banerjee commented Aug 6, 2019

I have been working on this proposal since the beginning of the year (attached). I am working on putting together a patch for it.
Spec_changes_for_AntiAffinity.docx
Test_n_Demo.pdf

@davidvonthenen
Contributor Author

@sujeet-banerjee I read through the doc, and it looks like it relates to Cluster API. Maybe I am missing something... There definitely needs to be an understanding of the VMs and which physical hosts they are on, but the issue is that the scheduler doesn't know about the backing infrastructure when it comes time to schedule pods on those worker nodes. The VMs themselves can move around within the cluster because of DRS, node failure, etc. It's also about more than compute; this also concerns fault domains on storage, such as VSAN.

As an example, let's assume you have the VMs distributed based on your doc in an ideal configuration. If you target a StatefulSet to be deployed to a certain region/zone, it's possible that all pods get placed on different (or even the same) VMs but still end up in the same VSAN fault domain. If that particular fault domain dies, you will end up losing all your data. This is one of the problems this issue plans to address.
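
To make the failure mode concrete, here is a minimal Python sketch. Everything in it is hypothetical: the node names, the fault domain names, and the `NODE_FAULT_DOMAIN` mapping are made up for illustration, and nothing here calls a real Kubernetes or vSphere API. It only shows that replicas can sit on different VMs and still share one fault domain.

```python
from collections import defaultdict

# Hypothetical inventory: which VSAN fault domain backs each worker-node VM.
# In a real cluster this mapping would have to come from the infrastructure,
# and it can change over time (for example when DRS moves a VM).
NODE_FAULT_DOMAIN = {
    "worker-1": "fd-a",
    "worker-2": "fd-a",
    "worker-3": "fd-b",
}


def pods_by_fault_domain(pod_to_node, node_fault_domain):
    """Group pod names by the fault domain of the node each pod runs on."""
    groups = defaultdict(list)
    for pod, node in pod_to_node.items():
        groups[node_fault_domain[node]].append(pod)
    return dict(groups)


def single_point_of_failure(pod_to_node, node_fault_domain):
    """Return the fault domain holding every pod, or None if pods are spread."""
    groups = pods_by_fault_domain(pod_to_node, node_fault_domain)
    if len(groups) == 1:
        return next(iter(groups))
    return None


# Three replicas on two different VMs, yet all in one fault domain.
placement = {"db-0": "worker-1", "db-1": "worker-2", "db-2": "worker-1"}
print(single_point_of_failure(placement, NODE_FAULT_DOMAIN))
```

With this placement the check reports `fd-a`: the scheduler sees distinct nodes, but losing that one fault domain takes out every replica.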

The doc takes a very Cluster API-centric view of the problem, looking from the infrastructure upwards, but this proposed component needs to look at workload placement from the pod view downward. Maybe this proposal can make pod scheduling easier by having VMs sit on hosts in an ideal fashion, but at the end of the day we are still talking about pod scheduling and workload placement within those VMs.
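
As a rough illustration of what "from the pod view downward" could mean, the Python sketch below picks a node for a new replica by looking at which fault domain each candidate node sits in. It is an assumption-laden toy, not a proposed design: `pick_node`, the node names, and the fault domain names are all hypothetical, and a real component would need live infrastructure data rather than a static dictionary.

```python
from collections import Counter


def pick_node(candidate_nodes, node_fault_domain, existing_placement):
    """Pick the candidate whose fault domain currently holds the fewest replicas.

    Ties are broken by node name so the result is deterministic.
    """
    # Count how many existing replicas live in each fault domain.
    load = Counter(node_fault_domain[node] for node in existing_placement.values())
    return min(candidate_nodes, key=lambda node: (load[node_fault_domain[node]], node))


# Hypothetical mapping of worker-node VMs to storage fault domains.
node_fault_domain = {"worker-1": "fd-a", "worker-2": "fd-a", "worker-3": "fd-b"}
# One replica already runs in fd-a.
existing = {"db-0": "worker-1"}

print(pick_node(["worker-1", "worker-2", "worker-3"], node_fault_domain, existing))
```

Here the next replica goes to `worker-3`, since `fd-b` holds no replicas yet. The decision starts from the workload and only then consults where the VMs live.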

@frapposelli removed this from the Next milestone Sep 4, 2019
@davidvonthenen added this to the Next milestone Feb 5, 2020
@s0uky

s0uky commented Oct 12, 2023

Hi folks, is there any update here?

5 participants