KKP 2.26 ships a lot of major version upgrades for the Helm charts, most notably Velero, cert-manager, Dex, and the components of the seed MLA stack (Loki, Prometheus, Alertmanager, and more).

Some of these updates require manual intervention or at least checking whether a given KKP system is affected by upstream changes. Please read the following sections carefully before beginning the upgrade.

### Velero 1.14

Velero was updated from 1.10 to 1.14, which includes a number of significant improvements internally. In the same breath, KKP also replaced the custom Helm chart with the [official upstream Helm chart](https://github.com/vmware-tanzu/helm-charts/tree/main/charts/velero) for Velero.

Due to labelling changes, an in-place upgrade of Velero is not possible. It's recommended to delete the previous Helm release and install the chart fresh. Existing backups will not be affected by the switch, but the Velero configuration created by Helm (such as `BackupStorageLocations`) will be deleted and re-created.
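
As a rough sketch of that re-install, assuming the release is called `velero`, lives in the `velero` namespace, and the chart is taken from the KKP release archive (adjust names and paths to your environment):

```bash
# delete the old release; existing backups in the object storage are not
# affected, only the Helm-managed Kubernetes objects are removed
$ helm --namespace velero delete velero

# install the new, upstream-based chart with your migrated values.yaml
$ helm --namespace velero upgrade --install velero ./charts/backup/velero --values values.yaml
```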

#### Configuration Changes

The switch to the upstream Helm chart requires adjusting the `values.yaml` used to install Velero. Most existing settings have a 1:1 representation in the new chart:

* `velero.podAnnotations` is now `velero.annotations`
* `velero.serverFlags` is now `velero.configuration.*` (each CLI flag is its own field in the YAML file, e.g. `serverFlags: ["--log-format=json"]` would become `configuration.logFormat: "json"`)
* `velero.uploaderType` is now `velero.configuration.uploaderType`; note that the default has changed from restic to Kopia, see the next section for more information.
* `velero.credentials` is now `velero.credentials.*`
* `velero.schedulesPath` is not available anymore, since putting additional files into a Helm chart before installing it is a rather unusual process. Instead, specify the desired schedules directly inside the `values.yaml` in `velero.schedules`.
* `velero.backupStorageLocations` is now `velero.configuration.backupStorageLocation`
* `velero.volumeSnapshotLocations` is now `velero.configuration.volumeSnapshotLocation`
* `velero.defaultVolumeSnapshotLocations` is now `velero.configuration.defaultBackupStorageLocation`

{{< tabs name="Velero Helm Chart Upgrades" >}}
{{% tab name="old Velero Chart" %}}
```yaml
velero:
  podAnnotations:
    iam.amazonaws.com/role: arn:aws:iam::1234:role/velero

  backupStorageLocations:
    aws:
      provider: aws
      objectStorage:
        bucket: myclusterbackups
      config:
        region: eu-west-1

  # optionally define some of your volumeSnapshotLocations as the default;
  # each element in the list must be a string of the form "provider:location"
  defaultVolumeSnapshotLocations:
    - aws:aws

  # see https://velero.io/docs/v1.10/api-types/volumesnapshotlocation/
  volumeSnapshotLocations:
    aws:
      provider: aws
      config:
        region: eu-west-1

  uploaderType: restic

  serverFlags:
    - --log-format=json

  credentials:
    aws:
      accessKey: itsme
      secretKey: andthisismypassword

  schedulesPath: schedules/*
```
{{% /tab %}}
{{% tab name="new Velero Chart" %}}
```yaml
velero:
  annotations:
    iam.amazonaws.com/role: arn:aws:iam::1234:role/velero

  # schedules are no longer loaded from external files, but must be included inline
  schedules:
    hourly-cluster:
      schedule: 0 * * * *
      template:
        includeClusterResources: true
        includedNamespaces:
          - '*'
        snapshotVolumes: false
        ttl: 168h # 7 days

  configuration:
    uploaderType: restic
    logFormat: json

    backupStorageLocation:
      - name: aws
        provider: aws
        objectStorage:
          bucket: myclusterbackups
        config:
          region: eu-west-1

    volumeSnapshotLocation:
      - name: aws
        provider: aws
        config:
          region: eu-west-1

    defaultBackupStorageLocation: aws

  credentials:
    useSecret: true
    name: aws-credentials
    secretContents:
      cloud: |
        [default]
        aws_access_key_id=itsme
        aws_secret_access_key=andthisismypassword
```
{{% /tab %}}
{{< /tabs >}}

#### Kopia replaces restic

The default file backup solution in Velero is now Kopia, replacing the previous implementation based on restic. From a Velero user's perspective this is mostly an implementation detail, and commands like `velero backup create` will continue to work as before. However, Kopia stores its data in a new repository inside the backup storage location. For example, if you used restic before, switch to Kopia, and use an S3 bucket for storage, you will end up with 3 directories in the bucket: `backups`, `restic` and `kopia`.

When migrating to Kopia, new backups will be made using it, but existing backups made using restic can still be restored. Once no old restic backups are required anymore, the `restic` directory in the backup storage can be deleted.
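
For illustration, a hypothetical listing of the example bucket after the switch could look like this (output shape assumed from the `aws` CLI):

```bash
$ aws s3 ls s3://myclusterbackups/
                           PRE backups/
                           PRE kopia/
                           PRE restic/

# once no restic-based backups need to be restorable anymore,
# the old repository can be removed:
$ aws s3 rm --recursive s3://myclusterbackups/restic/
```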

KKP's wrapper `velero` Helm chart does not configure the backup tool, so it defaults to Kopia. If you wish to continue using restic, set `velero.configuration.uploaderType` to `restic` in your Helm `values.yaml` file. Note that restic support will eventually be removed from Velero, so a switch will be necessary at some point.
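
In `values.yaml` terms, sticking with restic looks like this:

```yaml
velero:
  configuration:
    # keep using restic for file-level backups; remove this line to
    # switch to Velero's new default, Kopia
    uploaderType: restic
```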

{{% notice note %}}
If you decide to switch to Kopia and do not need the restic repository anymore, consider rotating the repository password by updating the `velero-repo-credentials` Secret in the `velero` namespace. This should only be done before a new repository is created by Velero; changing it later will make all previously created repositories unavailable.
{{% /notice %}}
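
A minimal sketch of such a rotation, assuming the password is kept in the Secret's `repository-password` key:

```bash
$ kubectl --namespace velero delete secret velero-repo-credentials
$ kubectl --namespace velero create secret generic velero-repo-credentials \
    --from-literal=repository-password="$(openssl rand -base64 32)"
```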

### cert-manager 1.14

The configuration syntax for cert-manager has changed slightly.

* Breaking: If you have the `.featureGates` value set in `values.yaml`, the feature gates defined there will no longer be passed to the cert-manager webhook, only to the cert-manager controller. Use the `webhook.featureGates` field instead to define feature gates to be enabled on the webhook (see the example below).
* Potentially breaking: Webhook validation of CertificateRequest resources is stricter now: all `KeyUsages` and `ExtendedKeyUsages` must be defined directly in the CertificateRequest resource; the encoded CSR can never contain more usages than defined there.
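
For example, a gate that previously applied to both components would now need to be set twice (gate name purely illustrative; the exact placement of the top-level key depends on how you wrap the chart):

```yaml
certManager:
  # applies to the cert-manager controller only
  featureGates: "ExperimentalCertificateSigningRequestControllers=true"
  webhook:
    # must now be set separately for the webhook
    featureGates: "ExperimentalCertificateSigningRequestControllers=true"
```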

### oauth2-proxy (IAP) 7.6

This upgrade includes one breaking change:

* A change to how auth routes are evaluated using the flags `skip-auth-route`/`skip-auth-regex`: the new behaviour uses the regex you specify to evaluate the full path including query parameters (see the illustration below). For more details please read the [detailed PR description](https://github.com/oauth2-proxy/oauth2-proxy/issues/2271).
* The environment variable `OAUTH2_PROXY_GOOGLE_GROUP` has been deprecated in favor of `OAUTH2_PROXY_GOOGLE_GROUPS`. The next major release will remove this option.
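
As an illustration of the first point (route and patterns invented for this example):

```bash
# up to 7.5, only the path was evaluated, so this matched
# a request for /api/public?lang=en:
--skip-auth-route="^/api/public$"

# since 7.6 the query string is part of the evaluated string,
# so anchored patterns have to account for it:
--skip-auth-route="^/api/public(\?.*)?$"
```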

#### Loki & Promtail 2.9 (Seed MLA)
### Loki & Promtail 2.9 (Seed MLA)

The Loki upgrade from 2.5 to 2.9 might be the most significant bump in this KKP release. Due to the amount of changes, it's necessary to delete the existing `loki` StatefulSet and let Helm recreate it. Deleting the StatefulSet will not touch the PVCs, and the new StatefulSet's pods will reuse the existing PVCs after the upgrade.
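
A sketch of that deletion, assuming Loki runs under its default name in the `logging` namespace:

```bash
# the PVCs (and therefore the stored logs) are left untouched by this
$ kubectl --namespace logging delete statefulset loki
# afterwards, upgrade the Helm release as usual to have the
# StatefulSet re-created
```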

Before upgrading, review your `values.yaml` for Loki, as a number of syntax changes were made:
* `loki.singleBinary.persistence.enableStatefulSetAutoDeletePVC` is set to `false` to ensure that when the StatefulSet is deleted, the PVCs will not also be deleted. This allows for easier upgrades in the future, but if you scale down Loki, you will have to manually delete the leftover PVCs (see the snippet below).
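
The corresponding `values.yaml` snippet:

```yaml
loki:
  singleBinary:
    persistence:
      # keep PVCs when the StatefulSet is deleted or scaled down;
      # leftover PVCs then have to be cleaned up manually
      enableStatefulSetAutoDeletePVC: false
```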

### Alertmanager 0.27 (Seed MLA)

This version removes the `v1` API, which had been deprecated since 2019. If you have custom integrations with Alertmanager, ensure none of them use the now-removed API.
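
A quick way to spot affected integrations, assuming Alertmanager's in-cluster service name in the seed MLA stack:

```bash
# the v1 endpoints were removed and now fail:
$ curl -fsS http://alertmanager.monitoring.svc.cluster.local:9093/api/v1/status
# integrations must use the v2 API instead:
$ curl -fsS http://alertmanager.monitoring.svc.cluster.local:9093/api/v2/status
```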

### blackbox-exporter 0.25 (Seed MLA)

This version changes the `proxy_connect_header` configuration structure to match Prometheus (see [PR](https://github.com/prometheus/blackbox_exporter/pull/1008)); update your `values.yaml` accordingly if you configured this option.
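
A sketch of the structural change for a module using an authenticated proxy (module name and header value invented):

```yaml
# before (blackbox-exporter < 0.25): one value per header
modules:
  http_2xx:
    prober: http
    http:
      proxy_connect_header:
        Proxy-Authorization: Basic dXNlcjpwYXNz

# after (0.25, matching the Prometheus format): a list of values per header
modules:
  http_2xx:
    prober: http
    http:
      proxy_connect_header:
        Proxy-Authorization:
          - Basic dXNlcjpwYXNz
```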

### helm-exporter 1.2.16 (Seed MLA)

KKP 2.26 removes the custom Helm chart and now reuses the official [upstream chart](https://shanestarcher.com/helm-charts/). Before upgrading you must delete the existing Helm release in your cluster:

```bash
$ helm --namespace monitoring delete helm-exporter
```

Afterwards you can install the new release from the chart.
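
For example (chart path within the KKP release archive assumed):

```bash
$ helm --namespace monitoring upgrade --install helm-exporter ./charts/monitoring/helm-exporter --values values.yaml
```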

### kube-state-metrics 2.12 (Seed MLA)

As is typical for kube-state-metrics, the upgrade is simple, but the devil is in the details. There have been many minor changes since v2.8; please review [the changelog](https://github.com/kubernetes/kube-state-metrics/releases) carefully if you build upon metrics provided by kube-state-metrics:

* The deprecated experimental VerticalPodAutoscaler metrics are no longer supported and have been removed. It's recommended to use CustomResourceState metrics to gather metrics from custom resources like the Vertical Pod Autoscaler.
* Label names were adjusted to adhere to OTel-Prometheus standards, so existing label names that do not follow them may be replaced by ones that do. Please refer to [the PR](https://github.com/kubernetes/kube-state-metrics/pull/2004) for more details.
* Label and annotation metrics aren't exposed by default anymore, to reduce the memory usage of the default configuration of kube-state-metrics. Before this change, they only included the name and namespace of the objects, which is not relevant to users who have not opted in to these metrics (see the flags below).
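
If you do rely on label or annotation metrics, they can be re-enabled selectively via kube-state-metrics' allowlist flags (values purely illustrative):

```bash
--metric-labels-allowlist=pods=[app.kubernetes.io/name],nodes=[*]
--metric-annotations-allowlist=deployments=[kubernetes.io/change-cause]
```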

### node-exporter 1.7 (Seed MLA)

This new version comes with a few minor backwards-incompatible changes:

* the ntp collector was deprecated
* the supervisord collector was deprecated

### Prometheus 2.51 (Seed MLA)

Prometheus had many improvements and some changes to the remote-write functionality that might affect you:

* Scraping:
  * Do experimental timestamp alignment even if tolerance is bigger than 1% of scrape interval

### nginx-ingress-controller 1.10

nginx v1.10 brings quite a few potentially breaking changes:

* dropped support for GeoIP (legacy), only GeoIP2 is supported
* The automatically generated `NetworkPolicy` from nginx 1.9.3 is now disabled by default; refer to https://github.com/kubernetes/ingress-nginx/pull/10238 for more information. A values snippet to re-enable it is shown below.
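
If you relied on that NetworkPolicy, it can be re-enabled via the chart's values (upstream ingress-nginx layout assumed; the exact placement depends on how KKP wraps the chart):

```yaml
controller:
  networkPolicy:
    # re-create the NetworkPolicy that nginx 1.9.3 generated automatically
    enabled: true
```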

### Dex 2.39

The validation of username and password in the LDAP connector is much more strict now. Dex uses the [EscapeFilter](https://pkg.go.dev/gopkg.in/ldap.v1#EscapeFilter) function to check for special characters in credentials and prevent injections by denying such requests. Please ensure this is not an issue before upgrading.

Additionally, the custom `oauth` Helm chart in KKP has been deprecated and will be replaced with a new Helm chart, `dex`, which is based on the [official upstream chart](https://github.com/dexidp/helm-charts/tree/master/charts/dex). Administrators are advised to begin migrating to the new chart as soon as possible.

#### Migration Procedure

Most importantly, with this change the Kubernetes namespace where Dex is installed also changes. Previously Dex was installed into the `oauth` namespace, but the new chart is meant to be installed into the `dex` namespace. This is the default the KKP installer will choose; if you install KKP manually, you can place Dex into any namespace.
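
As a rough sketch of the shape such a migration can take (release names and chart path assumed, adjust to your environment):

```bash
# remove the deprecated oauth release ...
$ helm --namespace oauth delete oauth

# ... and install the new dex chart into the dex namespace
$ helm --namespace dex upgrade --install --create-namespace dex ./charts/dex --values values.yaml
```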
