Added
- Added an optional ServiceMonitor for prometheus to the Helm chart (#559 thanks @dani-CO-CN!)
- Added the ability to overwrite resource limits and requests to the Helm chart (#560 thanks @dani-CO-CN!)
- Added the ability to overwrite priorityClass to the Helm chart (#567 thanks @dani-CO-CN!)
- Added the ability to apply additional pod labels to the Helm chart (#617 thanks @danielvincenzi!)
Fixed
- Fixed a bug which could cause nodes to remain cordoned after an update (#631)
- Fixed a bug that could cause the agent to require a few restarts before stabilizing (#568)
Misc
- Numerous dependency upgrades and documentation fixes (#638, #622, #602, #585 thanks @umizoom!, #550 thanks @mooneeb!)
Added
- Added the ability to configure log formatting and filtering options via Helm values (#503, #512, #516)
- Added the ability to configure pod placement via Helm values (#513, #516)
Fixed
- Improved rate-limiting and backoff between the agent and the local Bottlerocket update API (#496, #505)
- Improved rate-limiting and backoff between the agent and Brupop's apiserver (#505)
- Added backoff to requests to Kubernetes watch APIs (#506)
- Fixed an issue that caused Prometheus metrics to include stale data (#511)
- Fixed incorrect resource requests for the agent pod in the Helm chart (#504)
- Removed unnecessary rbac permissions granted to the apiserver and controller (#507)
Misc
Added
- Refactored deployment and yaml generation to use Helm templating (#126)
- Reworked the update time window into a cron expression based scheduler (#343), (#428)
Fixed
- Removed unnecessary dep on older "time" crate from chrono (#415)
- Fixed metrics not working for ipv6 clusters (#406)
- Makefile: refactors brupop-image target (#418)
- Cargo: use env vars when calling cargo (#462)
Misc
Added
- Removed OpenSSL in favor of Rust-based TLS using rustls (#401)
- Updated TLS configurations to use leaf certs generated from root CA for brupop API server and agent (#340)
- Added resource request limits for all containers (#327)
Fixed
- Exposed the failure output for the apiclient when an error occurs (#342)
- kube clients are now created using the in-cluster DNS configuration (#373)
- Removed deprecated Rust library APIs (#403)
- Integration tests now use IMDSv2 calls (#405)
Misc
- Numerous dependency upgrades and documentation fixes
- GitHub action workflows now use larger 16 core runners (#356)
Added
- Mechanism to constrain updates to a certain update time window (#241)
- Option to exclude node before draining (#231)
- Port configuration (#315)
- Support for concurrent updates (#238)
- Automatic prometheus scraping annotations for controller's service (#269)
- Use ca.crt in SSL (#260)
- Reload certificates periodically to ensure no service loss (#280)
- Replaced bunyan style logging with human readable logs (#298)
- Support webhook conversions from v2 to v1 (to support the Kubernetes pinwheel model) (#308)
- Support integration tests in AWS China region (#317) (#318)
Fixed
- Upgraded Bottlerocket SDK to consume fix for OpenSSL CVE-2022-3602 and CVE-2022-3786 (#331)
- Gracefully exit Brupop agent when rebooting node (#218)
- Clean up bottlerocketshadows when Brupop resources are removed from the cluster (#235)
- Clarify crossbeam license (#250)
- Made error handling module specific (#279) (#291)
Misc
- Numerous dependency updates
- Fixed clippy linting / warnings (#267)
- Clear and remove GitHub actions cache (#268) (#286)
- Added step to integration tests to automatically add and delete cert-manager (#320)
- Added GitHub action step to catch changes to deployment manifest (#321)
Added
- Add support to protect controller from becoming unschedulable (#14)
- Apply common k8s labels to all created resources (#113)
- Support SSL communication between brupop-agent and brupop-apiserver (#127)
- Handle update-reboot failures / "crash loops" (#161), (#123)
- Update README for setting up SSL (#211)
Fixed
- Remove empty categories in Custom Resource spec (#205)
Added
- Add README on integration test tool (#166)
- Add integration testing subcommand Monitor which monitors new nodes for successful updates (#130)
- Support integration test for IPv6 cluster (#186)
- Improve integration testing subcommand Integration-test to create the Bottlerocket nodes via nodegroups (#162)
Fixed
Fixed:
- Fixed an issue where Node drains would hang indefinitely on StatefulSet Pods (#168), (#179)
- Added more restrictive checking of TokenReviewStatus during apiserver auth
Added:
- Added support for IPv6 cluster (#178)
Bottlerocket Update Operator (Brupop) 0.2.0 is a complete overhaul and rewrite of the update operator. It will, by default, continue to rely on Bottlerocket’s client-side update API to determine when to perform an update on any given node — foregoing any complex deployment velocity controls, and instead relying on the wave system built-in to update Bottlerocket. Compared to Brupop 0.1.0, Brupop 0.2.0 not only improves performance, but also increases observability while scoping down permissions required by the update operator agent.
When installed, the Bottlerocket update operator starts a controller deployment on one node, an agent daemon set on every Bottlerocket node, and an Update Operator API Server deployment. The controller orchestrates updates across your cluster, while the agent is responsible for periodically querying for Bottlerocket updates, draining the node, and performing the update when asked by the controller. Instead of having the independent controller and agent cooperate and pass messages via RPC, Brupop 0.2.0 associates a Custom Resource (called BottlerocketShadow) with each Bottlerocket node containing status information about the node, as well as a desired state. The agent performs all cluster object mutation operations via the API Server. The API Server uses Service Account Token Volume Projection for authorization instead of the usual Kubernetes RBAC system, so that no node is granted permissions sufficient to modify any other node.
Brupop 0.2.0 also integrates with Prometheus by exposing an HTTP endpoint from which Prometheus can gather metrics, allowing customers insight into the actions that the operator is taking.
Fixed:
- Fixed a bug preventing nodes from being drained of certain pod deployments (#74)
- Add more detailed context handling (#71)
- Increased the amount of logging across the entirety of the operator (#68)
- Added Prometheus metrics support (#132)
- Added the ability to monitor cluster state by querying custom resources with kubectl (#101), (#85)
- Simplified license scan and build process to use a single Dockerfile (#147)
Removed:
- Deprecated updog platform integration in favor of Bottlerocket API (#60)
- Use ECR Public image instead of region-specific image (#65)
- Reduced memory and CPU limits for Agent pod. (#55)
- Updated kubernetes client version (#70)
- Updated Bottlerocket SDK version (#63)
To use the update API, nodes must be labeled with the 2.0.0 interface version:
bottlerocket.aws/updater-interface-version=2.0.0
To configure the use of the update API on all nodes in a cluster:
- Ensure desired nodes are on bottlerocket v0.4.1 or later
- Set the updater-interface-version to 2.0.0 on nodes:
  kubectl label node --overwrite=true $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}') bottlerocket.aws/updater-interface-version=2.0.0
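After labeling, it can help to confirm which nodes actually carry the interface version. The sketch below is an illustration only: `verify_cmd` is a hypothetical helper, not part of the operator, and it prints the command instead of running it so it can be reviewed first. It relies on the `-L` (label columns) flag of `kubectl get`.

```shell
#!/bin/sh
# Label key taken from the instructions above.
LABEL_KEY="bottlerocket.aws/updater-interface-version"

# Hypothetical helper: builds the verification command as a string.
# -L adds the label's value as an extra column in the node listing.
verify_cmd() {
  echo "kubectl get nodes -L ${LABEL_KEY}"
}

# Print the command for review.
verify_cmd
```

To run it against a cluster, the printed command can be pasted into a shell or executed with `eval "$(verify_cmd)"`.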
- Add SELinux process label allowing API accesses by agent (#40)
- Fix deduplication filter in cases that could deadlock agent (#41)
- Add missing backtick in README instructions (#25)
- Add license info to the operator container images (#6)
- Specify passing -c to watch in README instructions for monitoring node status (#27)
- Bump bottlerocket-sdk version to v0.10.1 for building the update operator's binaries. (#21)
- Bump the version of the golang image to 1.14.1 to match the Go toolchain version in the bottlerocket-sdk. (#31)
This release includes a breaking change for users upgrading from v0.1.2:
- Change platform-version label to updater-interface-version for indicating updater interface version (#30)
Please apply the new label on your bottlerocket nodes if you wish to use v0.1.3 of the update operator:
kubectl label node $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}') bottlerocket.aws/updater-interface-version=1.0.0
To remove the deprecated label from the nodes:
kubectl label nodes --all "bottlerocket.aws/platform-version"-
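For clusters where only some nodes should be migrated, the two label changes above can be applied per node. A minimal sketch, assuming the node names are passed as arguments: `migration_cmds` is a hypothetical helper that only prints the `kubectl` commands (add the new label, then drop the deprecated one) so they can be inspected before being run.

```shell
#!/bin/sh
# Label keys taken from the release notes above.
OLD_KEY="bottlerocket.aws/platform-version"
NEW_KEY="bottlerocket.aws/updater-interface-version"

# Hypothetical helper: for each node name given, print the command that
# applies the new label and the command that removes the deprecated one.
migration_cmds() {
  for node in "$@"; do
    echo "kubectl label node ${node} ${NEW_KEY}=1.0.0"
    # A trailing "-" after the key tells kubectl to remove that label.
    echo "kubectl label node ${node} ${OLD_KEY}-"
  done
}

# Example with two placeholder node names.
migration_cmds node-a node-b
```

Printing instead of executing keeps the sketch safe to run anywhere; piping the output to `sh` would apply the changes.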
Initial release of bottlerocket-update-operator, a Kubernetes operator that coordinates Bottlerocket updates on hosts in a cluster.
See the README for additional information.