From 55b17384cdd4042ffca8dcb8b5c7dec8f6729df9 Mon Sep 17 00:00:00 2001
From: bjee19 <139261241+bjee19@users.noreply.github.com>
Date: Tue, 24 Oct 2023 11:31:37 -0700
Subject: [PATCH] Add event batch processing results and rerun reconfig test (#1186) (#1188)

Cherry Picking https://github.com/nginxinc/nginx-gateway-fabric/pull/1186 onto 1.0 release
---
 tests/reconfig/results/1.0.0/1.0.0.md | 78 +++++++++++++++++++++++++++
 tests/reconfig/results/v1.0.0.md      | 61 ---------------------
 tests/reconfig/setup.md               | 38 ++++++++-----
 3 files changed, 102 insertions(+), 75 deletions(-)
 create mode 100644 tests/reconfig/results/1.0.0/1.0.0.md
 delete mode 100644 tests/reconfig/results/v1.0.0.md

diff --git a/tests/reconfig/results/1.0.0/1.0.0.md b/tests/reconfig/results/1.0.0/1.0.0.md
new file mode 100644
index 000000000..30524405a
--- /dev/null
+++ b/tests/reconfig/results/1.0.0/1.0.0.md
@@ -0,0 +1,78 @@
+# Reconfiguration testing Results
+
+
+- [Reconfiguration testing Results](#reconfiguration-testing-results)
+  - [Test environment](#test-environment)
+  - [Results Tables](#results-tables)
+    - [NGINX Reloads and Time to Ready](#nginx-reloads-and-time-to-ready)
+    - [Event Batch Processing](#event-batch-processing)
+  - [NumResources -> Total Resources](#numresources---total-resources)
+  - [Observations](#observations)
+
+
+## Test environment
+
+GKE cluster:
+
+- Node count: 3
+- Instance Type: e2-medium
+- k8s version: 1.27.3-gke.100
+- Zone: us-central1-c
+- Total vCPUs: 6
+- Total RAM: 12GB
+- Max pods per node: 110
+
+NGF deployment:
+
+- NGF version: edge - git commit 29b45e38bacd7c4f22834938105e3cda4f29f6d1
+- NGINX Version: 1.25.2
+
+## Results Tables
+
+### NGINX Reloads and Time to Ready
+
+| Test number | NumResources | TimeToReadyTotal (s) | TimeToReadyAvgSingle (s) | NGINX reloads | NGINX reload avg time (ms) | <= 500ms | <= 1000ms |
+|-------------|--------------|----------------------|--------------------------|---------------|----------------------------|----------|-----------|
+| 1 | 30 | 1 | 1 | 2 | 191 | 100% | 100% |
+| 1 | 150 | 2 | 2 | 2 | 440 | 50% | 100% |
+| 2 | 30 | 50 | <1 | 93 | 162 | 100% | 100% |
+| 2 | 150 | 208 | <1 | 396 | 281 | 96.46% | 100% |
+| 3 | 30 | 1 | 1 | 93 | 129 | 100% | 100% |
+| 3 | 150 | 1 | 1 | 453 | 130 | 100% | 100% |
+
+
+### Event Batch Processing
+
+| Test number | NumResources | Event Batch Total | Event Batch Processing avg time (ms) | <= 500ms | <= 1000ms |
+|-------------|--------------|-------------------|--------------------------------------|----------|-----------|
+| 1 | 30 | 69 | 6.232 | 100% | 100% |
+| 1 | 150 | 309 | 3.638 | 99.68% | 100% |
+| 2 | 30 | 465 | 38.759 | 100% | 100% |
+| 2 | 150 | 1941 | 68.539 | 98.51% | 100% |
+| 3 | 30 | 374 | 36.834 | 99.73% | 99.73% |
+| 3 | 150 | 1812 | 40.411 | 99.94% | 99.94% |
+
+
+## NumResources -> Total Resources
+| NumResources | Gateways | Secrets | ReferenceGrants | Namespaces | application Pods | application Services | HTTPRoutes | Total Resources |
+| ------------ | -------- | ------- | --------------- | ---------- | ---------------- | -------------------- | ---------- | --------------- |
+| x | 1 | 1 | 1 | x+1 | 2x | 2x | 3x | |
+| 30 | 1 | 1 | 1 | 31 | 60 | 60 | 90 | 244 |
+| 150 | 1 | 1 | 1 | 151 | 300 | 300 | 450 | 1204 |
+
+## Observations
+
+1. We are reloading after reconciling a ReferenceGrant even when there is no Gateway. This is because we treat every
+   upsert/delete of a ReferenceGrant as a change. This means we will regenerate NGINX config every time a ReferenceGrant
+   is created, updated (generation must change), or deleted, even if it does not apply to the accepted Gateway.
+
+   Issue filed: https://github.com/nginxinc/nginx-gateway-fabric/issues/1124
+
+2. We are reloading after reconciling a HTTPRoute even when there is no accepted Gateway and no config being generated.
+
+   Issue filed: https://github.com/nginxinc/nginx-gateway-fabric/issues/1123
+
+3. Majority of NGINX reloads were in the <= 500ms bucket, with all of them being in the <= 1000ms bucket. An increase
+   in the reload time based on number of configured resources resulting in NGINX configuration changes was observed.
+
+4. No errors (NGF or NGINX) were observed in any test run.
diff --git a/tests/reconfig/results/v1.0.0.md b/tests/reconfig/results/v1.0.0.md
deleted file mode 100644
index 803ede268..000000000
--- a/tests/reconfig/results/v1.0.0.md
+++ /dev/null
@@ -1,61 +0,0 @@
-# Reconfiguration testing Results
-
-
-- [Reconfiguration testing Results](#reconfiguration-testing-results)
-  - [Test environment](#test-environment)
-  - [Results Table](#results-table)
-  - [NumResources -\> Total Resources](#numresources---total-resources)
-  - [Observations](#observations)
-
-
-## Test environment
-
-GKE cluster:
-
-- Node count: 3
-- Instance Type: e2-medium
-- k8s version: 1.27.4-gke.900
-- Zone: europe-west2-b
-- Total vCPUs: 6
-- Total RAM: 12GB
-- Max pods per node: 110
-
-NGF deployment:
-
-- NGF version: edge - git commit 72b6c6ef8915c697626eeab88fdb6a3ce15b8da0
-- NGINX Version: 1.25.2
-
-## Results Table
-
-| Test number | NumResources | TimeToReadyTotal (s) | TimeToReadyAvgSingle (s) | NGINX reloads | NGINX reload avg time (ms) |
-| ----------- | ------------ | -------------------- | ------------------------ | ------------- | -------------------------- |
-| 1 | 30 | 5 | 5 | 2 | 166 |
-| 1 | 150 | 7 | 7 | 2 | 353 |
-| 2 | 30 | 21 | <1 | 30 | 142 |
-| 2 | 150 | 123 | <1 | 46 | 190 |
-| 3 | 30 | <1 | <1 | 93 | 137 |
-| 3 | 150 | 1 | 1 | 453 | 127 |
-
-## NumResources -> Total Resources
-| NumResources | Gateways | Secrets | ReferenceGrants | Namespaces | application Pods | application Services | HTTPRoutes | Total Resources |
-| ------------ | -------- | ------- | --------------- | ---------- | ---------------- | -------------------- | ---------- | --------------- |
-| x | 1 | 1 | 1 | x+1 | 2x | 2x | 3x | |
-| 30 | 1 | 1 | 1 | 31 | 60 | 60 | 90 | 244 |
-| 150 | 1 | 1 | 1 | 151 | 300 | 300 | 450 | 1204 |
-
-## Observations
-
-1. We are reloading after reconciling a ReferenceGrant even when there is no Gateway. This is because we treat every
-   upsert/delete of a ReferenceGrant as a change. This means we will regenerate NGINX config every time a ReferenceGrant
-   is created, updated (generation must change), or deleted, even if it does not apply to the accepted Gateway.
-
-   Issue filed: https://github.com/nginxinc/nginx-gateway-fabric/issues/1124
-
-2. We are reloading after reconciling a HTTPRoute even when there is no accepted Gateway and no config being generated.
-
-   Issue filed: https://github.com/nginxinc/nginx-gateway-fabric/issues/1123
-
-3. All reloads were in the <500ms bucket. A slight increase in the reload time based on number of configured resources
-   resulting in NGINX configuration changes was observed.
-
-4. No errors (NGF or NGINX) were observed in any test run.
diff --git a/tests/reconfig/setup.md b/tests/reconfig/setup.md
index 4ad3e7048..52ded6150 100644
--- a/tests/reconfig/setup.md
+++ b/tests/reconfig/setup.md
@@ -13,8 +13,8 @@
 
 ## Goals
 
-- Measure how long it takes NGF to reconfigure NGINX when a number of Gateway API and referenced core Kubernetes
-  resources are created at once.
+- Measure how long it takes NGF to reconfigure NGINX and update statuses when a number of Gateway API and
+  referenced core Kubernetes resources are created at once.
 - Two runs of each test should be ran with differing numbers of resources. Each run will deploy:
   - a single Gateway, Secret, and ReferenceGrant resources
   - `x+1` number of namespaces
@@ -38,7 +38,8 @@
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v0.8.1/standard-install.yaml
    ```
 
-3. Deploy NGF from edge using Helm install (NOTE: For Test 1, deploy AFTER resources):
+3. Deploy NGF from edge using Helm install and wait for LoadBalancer Service to be ready
+   (NOTE: For Test 1, deploy AFTER resources):
 
    ```console
    helm install my-release oci://ghcr.io/nginxinc/charts/nginx-gateway-fabric --version 0.0.0-edge \
@@ -65,10 +66,20 @@
    kubectl port-forward $GW_POD -n nginx-gateway 9113:9113 &
    ```
 
-6. Measure Time To Ready as described in each test, get the reload count, and get the average NGINX reload duration.
-   The average reload duration can be computed by taking the `nginx_gateway_fabric_nginx_reloads_milliseconds_sum`
-   metric value and dividing it by the `nginx_gateway_fabric_nginx_reloads_milliseconds_count` metric value.
-7. For accuracy, repeat the test suite once or twice, take the averages, and look for any anomolies or outliers.
+6. Measure NGINX Reloads and Time to Ready Results
+   1. TimeToReadyTotal as described in each test - NGF logs.
+   2. TimeToReadyAvgSingle which is the average time between updating any resource and the
+      NGINX configuration being reloaded - NGF logs.
+   3. NGINX Reload count - metrics.
+   4. Average NGINX reload duration - metrics.
+      1. The average reload duration can be computed by taking the `nginx_gateway_fabric_nginx_reloads_milliseconds_sum`
+         metric value and dividing it by the `nginx_gateway_fabric_nginx_reloads_milliseconds_count` metric value.
+7. Measure Event Batch Processing Results
+   1. Event Batch Total - metrics.
+   2. Average Event Batch Processing duration - metrics.
+      1. The average event batch processing duration can be computed by taking the `nginx_gateway_fabric_event_batch_processing_milliseconds_sum`
+         metric value and dividing it by the `nginx_gateway_fabric_event_batch_processing_milliseconds_count` metric value.
+8. For accuracy, repeat the test suite once or twice, take the averages, and look for any anomalies or outliers.
 
 ## Tests
 
@@ -79,8 +90,8 @@
       e.g. `cd scripts && bash create-resources-gw-last.sh 30`. The script will deploy backend apps and services, wait
       60 seconds for them to be ready, and deploy 1 Gateway, 1 RefGrant, 1 Secret, and HTTPRoutes.
    2. Deploy NGF
-   3. Check logs for time it takes from start-up -> config written and NGINX reloaded. Get reload count and average reload
-      duration from metrics and logs.
+   3. Measure TimeToReadyTotal as the time it takes from start-up -> config written and
+      NGINX reloaded. Measure the other results as described in steps 6-7 of the [Setup](#setup) section.
 
 ### Test 2: Start NGF, deploy Gateway, create many resources attached to GW
 
@@ -89,9 +100,8 @@
    2. Run the provided script with the required number of resources,
       e.g. `cd scripts && bash create-resources-routes-last.sh 30`. The script will deploy backend apps and services, wait
       60 seconds for them to be ready, and deploy 1 Gateway, 1 Secret, 1 RefGrant, and HTTPRoutes at the same time.
-   3. Check logs for time it takes from NGF receiving first resource update -> final config written, and NGINX's final
-      reload. Check logs for average individual HTTPRoute TTR also. Get reload count and average reload duration from
-      metrics and logs.
+   3. Measure TimeToReadyTotal as the time it takes from NGF receiving the first HTTPRoute resource update -> final
+      config written and NGINX reloaded. Measure the other results as described in steps 6-7 of the [Setup](#setup) section.
 
 ### Test 3: Start NGF, create many resources attached to a Gateway, deploy the Gateway
 
@@ -101,5 +111,5 @@
       e.g. `cd scripts && bash create-resources-gw-last.sh 30`. The script will deploy the namespaces, backend apps and
       services, 1 Secret, 1 ReferenceGrant, and the HTTPRoutes; wait 60 seconds for the backend apps to be ready, and
       then deploy 1 Gateway for all HTTPRoutes.
-   3. Check logs for time it takes from NGF receiving gateway resource -> config written and NGINX reloaded. Get reload
-      count and average reload duration from metrics and logs.
+   3. Measure TimeToReadyTotal as the time it takes from NGF receiving gateway resource -> config written and NGINX reloaded.
+      Measure the other results as described in steps 6-7 of the [Setup](#setup) section.
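
Reviewer note (not part of the patch): setup steps 6-7 compute each average by dividing a `_sum` metric by its `_count` metric, and the "NumResources -> Total Resources" table follows a fixed per-`x` formula. The Python sketch below shows both calculations. The metric name prefixes are taken from setup.md; everything else is an assumption made for illustration: the helper names, the `class="nginx"` label, and the sample values (which are made up and are not measured results). It assumes the metrics were saved as Prometheus text exposition output, for example from the endpoint exposed by the port-forward on 9113.

```python
import re

# One sample line of the Prometheus text format: name, optional {labels}, value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{[^}]*\})?\s+(\S+)')

# Made-up sample scrape for illustration only; label and values are hypothetical.
SAMPLE = """\
# TYPE nginx_gateway_fabric_nginx_reloads_milliseconds histogram
nginx_gateway_fabric_nginx_reloads_milliseconds_bucket{class="nginx",le="500"} 2
nginx_gateway_fabric_nginx_reloads_milliseconds_sum{class="nginx"} 300
nginx_gateway_fabric_nginx_reloads_milliseconds_count{class="nginx"} 2
nginx_gateway_fabric_event_batch_processing_milliseconds_sum{class="nginx"} 90
nginx_gateway_fabric_event_batch_processing_milliseconds_count{class="nginx"} 12
"""


def metric_total(text, name):
    """Sum all samples named `name` across label sets; None if absent."""
    total, found = 0.0, False
    for line in text.splitlines():
        if line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        match = SAMPLE_RE.match(line)
        if match and match.group(1) == name:
            total += float(match.group(2))
            found = True
    return total if found else None


def histogram_average(text, prefix):
    """Average in ms: <prefix>_sum divided by <prefix>_count."""
    total = metric_total(text, prefix + "_sum")
    count = metric_total(text, prefix + "_count")
    if total is None or not count:  # missing metric or zero observations
        return None
    return total / count


def total_resources(x):
    """1 Gateway + 1 Secret + 1 ReferenceGrant + (x+1) Namespaces
    + 2x Pods + 2x Services + 3x HTTPRoutes."""
    return 3 + (x + 1) + 2 * x + 2 * x + 3 * x


if __name__ == "__main__":
    for prefix in (
        "nginx_gateway_fabric_nginx_reloads_milliseconds",
        "nginx_gateway_fabric_event_batch_processing_milliseconds",
    ):
        print(prefix, histogram_average(SAMPLE, prefix))
    print(total_resources(30), total_resources(150))
```

Summing across label sets before dividing is a design choice of this sketch; if a scrape ever contained several label sets for one metric, a per-label average might be more appropriate.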