Brupop is stuck at RebootedIntoUpdate state #650
Comments
I'll take a look. Do you mind sharing which version of Brupop you're using?
This configuration means that your host has "staged" the update: it's installed to the alternate disk partition and Bottlerocket is ready to flip to it upon reboot. The host is now attempting to move into the RebootedIntoUpdate state. In order to enter the rebooted state, the host first needs to be drained (cordoned, with its pods evicted) and then rebooted into the staged image.
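To see which state each node's shadow resource is in (and which state it's trying to reach), you can list Brupop's BottlerocketShadow resources. A minimal sketch; the namespace below assumes the default install and may differ in your cluster:

```
# Each node gets a BottlerocketShadow resource tracking its current and target
# update state. The namespace assumes the default Brupop install.
kubectl get bottlerocketshadows -n brupop-bottlerocket-aws
```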
Yes, I think so. Brupop respects your PDB, so at the moment it's probably attempting to evict a protected pod, but Kubernetes is not allowing any disruptions. The reason why would become clearer if you shared your PDB's spec, along with more information about which pods are running where and their current status. If you want more logs from Brupop's side, the drain would be completed by one of Brupop's components, so that's where the relevant messages would show up.
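For example, that state can be gathered with standard kubectl commands (the names below are placeholders, nothing Brupop-specific):

```
# Show the PDB's current status (allowed disruptions, healthy pod counts).
kubectl describe pdb <pdb-name> -n <namespace>

# Show where the protected pods are scheduled and what state they're in.
kubectl get pods -n <namespace> -o wide
```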
PDB specs for two of my services (both are running on the same node, which is stuck in the RebootedIntoUpdate state):
Is there any solution for this kind of situation? It's very unlikely that all services will always be running in the desired state, especially in lower environments where people frequently run experiments and tests. It looks like if a service's current running pod count is 0, then its allowed disruptions will also be 0, and Brupop never completes its upgrade task and stays stuck.
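To illustrate the situation being described (hypothetical PDB and service names, not taken from this cluster): when every pod selected by a PDB with `minAvailable: 1` is unhealthy, the PDB reports zero allowed disruptions and every eviction is refused:

```
# Hypothetical example: a PDB with minAvailable: 1 whose selected pods are all down.
kubectl get pdb example-service-pdb
# NAME                  MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
# example-service-pdb   1               N/A               0                     10d
```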
Brupop's interface to PDBs is indirect: it makes an eviction request to the Kubernetes API, and that API responds differently depending on the state of the target pod, its PDBs, etc. Here's the code that handles draining and PDBs.

So basically, there isn't much additional information exchanged during this interaction: Brupop assumes that the PDB's configuration must be satisfied and therefore waits to attempt the eviction again later, when the cluster state may have changed such that the PDB is no longer violated. My advice here would be that the cluster needs to return to a state in which Brupop's drain no longer appears to Kubernetes to be disrupting a PDB. Perhaps the unhealthy service should trigger a rollback to a healthy state?
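If you want to observe the same behavior outside of Brupop, a manual eviction-based drain of the stuck node goes through the same API and the same PDB check (the node name below is a placeholder):

```
# Drain uses the eviction API, so a PDB with zero allowed disruptions blocks it,
# and kubectl keeps retrying with messages like:
#   "Cannot evict pod as it would violate the pod's disruption budget."
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
```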
Another alternative for lower-stakes dev environments could be to specifically remove the PDBs in those environments.
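For example, in a throwaway dev or test cluster (names are placeholders):

```
# Remove the PDB that is blocking evictions in a dev/test environment.
kubectl delete pdb <pdb-name> -n <namespace>
```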
I understand that Brupop respects the PDB, but what's the point of respecting the PDB of a service whose pods are all in CrashLoopBackOff (0 running, 100+ restarts)? It's very unlikely that every pod will be in the Running state all the time, and when pods are not running, the allowed disruptions will always be 0, so Brupop gets stuck in the upgrade process. In my case, Brupop is stuck because of pods in CrashLoopBackOff status. Note: the example below is different from the CrashLoopBackOff case.
Brupop has been stuck at RebootedIntoUpdate for a long time, and nothing related to this status or error shows up in the logs. Logs:

Note: out of 3 nodes, 1 node was updated successfully; this was the second node. We also have PDBs. Will it not work with PDBs?

Is this because of this configuration?
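(For reference, Brupop's component logs can be pulled with something like the following; the namespace assumes the default install and the pod name is a placeholder.)

```
# List the Brupop pods and fetch logs from one of them.
kubectl get pods -n brupop-bottlerocket-aws
kubectl logs -n brupop-bottlerocket-aws <brupop-pod-name>
```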