-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add go vet
to the Makefile and fix its errors
#691
base: main
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@anishasthana I think |
Failed again. Same failing test "MCAD CPU Preemption Test"... 😢 It seems it is failing in other PRs as well. |
Hmm @asm582 are you aware of any recent issues with CI? |
I observed when porting the test suite to MCAD v2 that many of the tests only work as expected if the capacity of the cluster is calculated to be about 2 CPUs. If the cluster has more capacity, then the tests that expect certain AppWrappers to not be runnable or to be preempted because higher-priority AppWrappers will have taken all the available capacity fail. The failing test here is one of those. Looking at the logs, I see that the controller thinks the available capacity is about 6 CPUs
In the v2 test suite, I am redoing all of these tests to be based on a % of total CPU capacity instead of using hardwired resource values to avoid this fragility. |
Thank you for the PR. I am curious, if we are doing %-based CPU calculation then are we diverging from the way Scheduler does accounting? and would that be ok? |
I usually set 2 CPUs and 8 GB RAM as resource setting in podman or docker desktop and they would run to completion locally. I am not sure if we changed resource requirement in the CI system |
To be clear, tests like this are what are fragile in my opinion: https://github.com/project-codeflare/multi-cluster-app-dispatcher/blob/main/test/e2e/queue.go#L97-L125 because they are using hardwired CPU requests (not CPU requests that are calculated dynamically by the test as a fraction of the available capacity on the actual cluster when it is run). Here it creates an AppWrapper with a 1100 CPU request (2 pods at 550 each) and then assumes that if it creates a 854 CPU request (2 pods at 427 each..the function name is misleading) it won't fit. If docker is limited to 2 CPUs, then the worker nodes have 4000 CPU capacity, Kubernetes + the mcad operator take around 2100 CPU and the math works as expected. The slightest change in available resource (or in the resources used by the system pods) and the test doesn't work as expected. |
@dgrove-oss thanks for the detailed explanation of the root cause of the CI failure! |
Issue link
What changes have been made
go vet
to the MakefileVerification steps
Checks