
refactor: reduce memory usage #596

Merged: 7 commits from the reduce-memory-usage branch merged into kubewarden:main on Dec 14, 2023

Conversation

flavio
Member

@flavio flavio commented Nov 23, 2023

Reduce the memory consumption of Policy Server when multiple instances of the same Wasm module are loaded.

Thanks to this change, a worker will have only one instance of PolicyEvaluator (and hence of the wasmtime stack) per unique Wasm module.

For example, if a user has the apparmor policy deployed 5 times (with different names, settings, ...), only one instance of PolicyEvaluator will be allocated for it.

Warning: the optimization works at the worker level, meaning that workers do NOT share these PolicyEvaluator instances with each other.

This commit helps to address issue kubewarden/kubewarden-controller#528

Note 1: this depends on kubewarden/policy-evaluator#390
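
To make the deduplication concrete, here is a minimal sketch of the idea, not the actual PR code: the `Worker` struct, the `evaluator_for` helper, and the choice of keying the cache by a module digest are assumptions made for illustration; the real types live in the policy-evaluator and policy-server crates.

```rust
use std::collections::HashMap;

// Stand-in for the real policy-evaluator type; just enough to make the sketch compile.
struct PolicyEvaluator;

// One worker owns its own cache: evaluators are NOT shared across workers.
struct Worker {
    // Key: something that uniquely identifies the Wasm module (e.g. its digest).
    evaluators: HashMap<String, PolicyEvaluator>,
}

impl Worker {
    fn evaluator_for(&mut self, module_key: &str) -> &mut PolicyEvaluator {
        // Five policies backed by the same module hit the same entry,
        // so only one wasmtime stack is allocated per unique module.
        self.evaluators
            .entry(module_key.to_owned())
            .or_insert_with(|| PolicyEvaluator)
    }
}
```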

Benchmark data

I've collected data about the amount of memory consumed by one instance of Policy Server. The test was done with the following policies loaded.

policies.yml
pod-privileged-protect-mode:
  url: ghcr.io/kubewarden/tests/pod-privileged:v0.2.1
  policyMode: protect
  allowedToMutate: false
  settings: {}

pod-privileged-protect-mode2:
  url: ghcr.io/kubewarden/tests/pod-privileged:v0.2.1
  policyMode: protect
  allowedToMutate: false
  settings: {}

pod-privileged-protect-mode3:
  url: ghcr.io/kubewarden/tests/pod-privileged:v0.2.1
  policyMode: protect
  allowedToMutate: false
  settings: {}

pod-privileged-protect-mode4:
  url: ghcr.io/kubewarden/tests/pod-privileged:v0.2.1
  policyMode: protect
  allowedToMutate: false
  settings: {}

pod-privileged-monitor-mode:
  url: ghcr.io/kubewarden/tests/pod-privileged:v0.2.1
  policyMode: monitor
  allowedToMutate: false
  settings: {}

sleep:
  url: ghcr.io/kubewarden/tests/sleeping-policy:v0.1.0
  policyMode: protect
  allowedToMutate: false
  settings:
    sleepMilliseconds: 2000

disallow-service-loadbalancer:
  url: ghcr.io/kubewarden/tests/disallow-service-loadbalancer:v0.1.5
  policyMode: protect
  allowedToMutate: false

flux:
  url: ghcr.io/kubewarden/tests/go-wasi-template:v0.1.0
  policyMode: protect
  allowedToMutate: true
  settings:
    requiredAnnotations:
      "fluxcd.io/cat": "felix"

flux2:
  url: ghcr.io/kubewarden/tests/go-wasi-template:v0.1.0
  policyMode: protect
  allowedToMutate: true
  settings:
    requiredAnnotations:
      "fluxcd.io/cat": "felix"

verify1:
  url: ghcr.io/kubewarden/test/verify-image-signatures:v0.2.8
  policyMode: protect
  allowedToMutate: true
  settings:
    signatures:
      - image: "*"
        githubActions:
          owner: "kubewarden"
          repo: "app-example"

verify2:
  url: ghcr.io/kubewarden/test/verify-image-signatures:v0.2.8
  policyMode: protect
  allowedToMutate: true
  settings:
    signatures:
      - image: "*"
        githubActions:
          owner: "kubewarden"
          repo: "app-example"

verify3:
  url: ghcr.io/kubewarden/test/verify-image-signatures:v0.2.8
  policyMode: protect
  allowedToMutate: true
  settings:
    signatures:
      - image: "*"
        githubActions:
          owner: "kubewarden"
          repo: "app-example"

verify4:
  url: ghcr.io/kubewarden/test/verify-image-signatures:v0.2.8
  policyMode: protect
  allowedToMutate: true
  settings:
    signatures:
      - image: "*"
        githubActions:
          owner: "kubewarden"
          repo: "app-example"

This configuration file will load 13 policies:

  • 4 instances of the "verify signatures" policy
  • 5 instances of the "pod privileged" policy
  • 2 instances of the go-wasi template policy (remember: this is a 20 MB policy!)
  • 1 rego policy
  • 1 regular rust policy

Measurement process

The Policy Server has been started with an increasing number of workers, from 1 up to 8.
For each worker count, 10 memory samples have been taken.
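
The PR does not say which tool was used for the sampling. Purely as an illustration, here is one way to sample the resident memory (VmRSS) of a running Policy Server process on Linux; the function name and the MB conversion are my own assumptions.

```rust
use std::fs;

// Return the resident set size (VmRSS) of a process in MB, read from /proc.
fn rss_mb(pid: u32) -> Option<f64> {
    let status = fs::read_to_string(format!("/proc/{pid}/status")).ok()?;
    let kib: f64 = status
        .lines()
        .find(|line| line.starts_with("VmRSS:"))? // e.g. "VmRSS:   123456 kB"
        .split_whitespace()
        .nth(1)?
        .parse()
        .ok()?;
    Some(kib / 1024.0)
}
```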

The table below shows the data collected.

| Workers | Current (MB) | Optimized (MB) | Improvement % | Current StdDev | Optimized StdDev |
|--------:|-------------:|---------------:|--------------:|---------------:|-----------------:|
| 1 | 621.00 | 530.38 | -14.59 | 30.86 | 23.26 |
| 2 | 807.35 | 586.15 | -27.40 | 39.24 | 18.24 |
| 3 | 983.33 | 649.17 | -33.98 | 23.23 | 30.79 |
| 4 | 1172.28 | 752.59 | -35.80 | 30.85 | 27.50 |
| 5 | 1358.67 | 831.02 | -38.84 | 22.24 | 27.58 |
| 6 | 1535.16 | 914.64 | -40.42 | 30.53 | 21.74 |
| 7 | 1694.57 | 980.81 | -42.12 | 26.48 | 17.79 |
| 8 | 1897.01 | 1054.93 | -44.39 | 27.44 | 22.69 |

This is a chart that shows the overall memory reduction brought by this change:

[chart: memory optimization]

@flavio
Member Author

flavio commented Nov 23, 2023

I've rebased against the latest changes merged into the main branch.

Contributor

@fabriziosestito fabriziosestito left a comment


LGTM

Member

@viccuad viccuad left a comment


LGTM! Top!

@viccuad viccuad changed the title from "reduce memory usage" to "refactor: reduce memory usage" on Nov 24, 2023
@flavio
Member Author

flavio commented Nov 24, 2023

I've merged the changes into the main branch of policy-evaluator and updated this PR to consume policy-evaluator from a specific commit of the original repository.

I would prefer not to tag a new version of policy-evaluator right now. @fabriziosestito is going to open a new PR against policy-evaluator that will remove the majority of the locks. I would prefer to have his changes merged into the policy-evaluator main branch first, and only then tag a new version of the crate.

@flavio
Member Author

flavio commented Nov 24, 2023

Moving to blocked: waiting for the next, lock-free iteration of policy-evaluator before merging this PR.

@flavio flavio self-assigned this Nov 28, 2023
This allows a better organization of the code

Signed-off-by: Flavio Castelli <[email protected]>
This allows us to uniquely identify a precompiled wasm module

Signed-off-by: Flavio Castelli <[email protected]>
This is also going to be useful for performing the k6 tests.

Signed-off-by: Flavio Castelli <[email protected]>
Reduce the memory consumption of Policy Server when multiple instances
of the same Wasm module are loaded.

Thanks to this change, a worker will have only one instance of
`PolicyEvaluator` (and hence of the wasmtime stack) per unique Wasm module.

For example, if a user has the `apparmor` policy deployed 5 times
(with different names, settings, ...), only one instance of `PolicyEvaluator`
will be allocated for it.

Note: the optimization works at the worker level, meaning that workers
do NOT share these `PolicyEvaluator` instances with each other.

This commit helps to address issue kubewarden/kubewarden-controller#528

Signed-off-by: Flavio Castelli <[email protected]>
@flavio
Member Author

flavio commented Dec 12, 2023

@fabriziosestito , @jvanz , @viccuad : I've rebased my changes against the main branch. I still have to look into 3 integration tests that are currently failing, but you can start reviewing the changes.

Contributor

@fabriziosestito fabriziosestito left a comment


LGTM, just a couple of comments added

overwrite: false
supplemental_groups:
  rule: RunAsAny
  overwrite: false
Contributor


Why is this needed? We are going to remove the venom tests soonish.

Member Author


I'm using them inside the k6 tests. This file should definitely be moved to the load-testing repo once we remove venom.

src/workers/evaluation_environment.rs (review thread resolved)
@flavio flavio force-pushed the reduce-memory-usage branch 2 times, most recently from 94873a1 to 5bf6b3e on December 13, 2023 08:27
src/workers/pool.rs (outdated review thread, resolved)
This commit significantly changes the internal architecture of Policy
Server workers.

This code takes advantage of the `PolicyEvaluatorPre` structure defined by
the latest `policy-evaluator` crate.
`PolicyEvaluatorPre` has different implementations, one per type of
policy we support (waPC, Rego, WASI). Under the hood it holds a
`wasmtime::InstancePre` instance that is used to quickly spawn a
WebAssembly environment.

`PolicyEvaluatorPre` instances are managed by the `EvaluationEnvironment`
structure. `EvaluationEnvironment` takes care of deduplicating the WebAssembly
modules defined inside of the `policies.yml` file, ensuring only one
`PolicyEvaluatorPre` is created per unique Wasm module.

The `EvaluationEnvironment` struct provides `validate` and
`validate_settings` methods. These methods create a fresh `PolicyEvaluator`
instance by rehydrating its `PolicyEvaluatorPre` instance. Once the
WebAssembly evaluation is done, the `PolicyEvaluator` instance is
discarded.
This is a big change compared to the previous approach, where each WebAssembly
instance was a long-lived object.

This new architecture assures that each evaluation is done inside of a freshly
created WebAssembly environment, which guarantees:
- Policies leaking memory have a smaller impact on the memory
  consumption of the Policy Server process
- Policy evaluation always starts with a clean slate, which helps prevent
  bugs caused by policies that are not written to be stateless

In terms of memory optimizations, the `EvaluationEnvironment` is now an
immutable object. That allows us to have one single instance of `EvaluationEnvironment`
shared across all the worker threads, without using mutexes or locks.
This significantly reduces the amount of memory required by the Policy Server instance,
without impacting system performance.

As an added bonus, a lot of code has been simplified during this
transition. More code can be removed by future PRs.

Signed-off-by: Flavio Castelli <[email protected]>
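
A minimal sketch of the flow described in the commit message above, using stand-in types: the real `PolicyEvaluatorPre`, `EvaluationEnvironment`, and their method signatures live in the policy-evaluator and policy-server crates and differ from this; the two-map layout and the `rehydrate`/`validate` names are assumptions made for illustration.

```rust
use std::collections::HashMap;

// Stand-ins for the real policy-evaluator types, just enough to compile.
// The real PolicyEvaluatorPre wraps a wasmtime::InstancePre under the hood.
struct PolicyEvaluatorPre;
struct PolicyEvaluator;

impl PolicyEvaluatorPre {
    // Rehydrate a fresh, short-lived Wasm environment from the pre-instantiated module.
    fn rehydrate(&self) -> PolicyEvaluator {
        PolicyEvaluator
    }
}

// Immutable after construction, so it can be shared by all workers without locks.
struct EvaluationEnvironment {
    // Many policies can point at the same Wasm module...
    policy_to_module: HashMap<String, String>,
    // ...but only one PolicyEvaluatorPre exists per unique module.
    module_to_pre: HashMap<String, PolicyEvaluatorPre>,
}

impl EvaluationEnvironment {
    fn validate(&self, policy_id: &str, _request: &str) -> Result<bool, String> {
        let module = self
            .policy_to_module
            .get(policy_id)
            .ok_or_else(|| format!("policy not found: {policy_id}"))?;
        let pre = self
            .module_to_pre
            .get(module)
            .ok_or_else(|| format!("module not found: {module}"))?;
        // A fresh evaluator is built for this single request and dropped right
        // after, so the memory of the Wasm sandbox is released.
        let _evaluator = pre.rehydrate();
        // ... the actual Wasm evaluation would happen here ...
        Ok(true)
    }
}
```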
Introduce custom error types for `EvaluationEnvironment` and make use of
them to properly identify policies that cannot be found.

This is required to properly return a 404 response inside of the UI.

This regression was detected by the integration tests.

Signed-off-by: Flavio Castelli <[email protected]>
@flavio
Member Author

flavio commented Dec 13, 2023

@fabriziosestito , @jvanz , @viccuad : you can take another look at the PR. I've applied the changes you requested with the latest commits.

Member

@viccuad viccuad left a comment


🚀

@flavio
Member Author

flavio commented Dec 14, 2023

Before merging this PR I want to summarize what happened.

Optimization number 1

The PR started with one optimization technique: ensure each worker of Policy Server (where a worker is a thread dedicated to Wasm evaluation) has just one instance of PolicyEvaluator (a WebAssembly sandbox) per WebAssembly module. That means that if a user has the same policy defined 5 times, each time with a different configuration, only one PolicyEvaluator will be created inside a worker.
However, that still means the same PolicyEvaluator is duplicated across workers, which explains why memory consumption is directly proportional to the number of workers.

Optimization number 2

This is an iteration on the previous one. We drastically changed the way we manage WebAssembly runtimes: we now rely on wasmtime::InstancePre to keep a deduplicated list of all the policies defined by the user.
This collection of instances is shared across all the workers, which leads to huge memory savings.

In addition to that, we changed the evaluation mode to be "on demand". That means that, once an AdmissionReview is received, a fresh PolicyEvaluator is created. The creation time is significantly reduced by using wasmtime::InstancePre.
Once the evaluation is done, the PolicyEvaluator instance is discarded. This leads to the deallocation of the memory used by the WebAssembly sandbox wrapped by PolicyEvaluator.

This architecture has two main advantages:

  • When dealing with policies that leak memory: reduce the impact they have on the long running Policy Server process
  • Ensure each evaluation starts with a clean slate. This prevents bugs caused by policies that mistakenly leave stale state behind between evaluations
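
As an illustration of the lock-free sharing described above (the names and setup here are mine, not the PR's code): because the environment is never mutated after startup, a plain `Arc` is enough to hand it to every worker thread.

```rust
use std::sync::Arc;
use std::thread;

// Stand-in for the immutable evaluation environment described above.
struct EvaluationEnvironment;

fn main() {
    // Built once at startup; shared read-only by every worker, no Mutex/RwLock needed.
    let env = Arc::new(EvaluationEnvironment);

    let workers: Vec<_> = (0..4)
        .map(|_| {
            let env = Arc::clone(&env);
            thread::spawn(move || {
                // Each worker would evaluate incoming AdmissionReviews against `env`.
                let _ = &env;
            })
        })
        .collect();

    for worker in workers {
        worker.join().unwrap();
    }
}
```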

Benchmarks

I've updated the table and the chart shown when the PR was opened to also include the numbers of the second, and final, optimization.

The benchmarks have been done on the same machine, with the same set of policies and the same measurement process.

| Workers | Current (MB) | Optimized #1 (MB) | Optimized #2 (MB) | Improvement #1 % | Improvement #2 % | Current StdDev | Optimized #1 StdDev | Optimized #2 StdDev |
|--------:|-------------:|------------------:|------------------:|-----------------:|-----------------:|---------------:|--------------------:|--------------------:|
| 1 | 621.00 | 530.38 | 526.073 | -14.59 | -15.29 | 30.86 | 23.26 | 20.15 |
| 2 | 807.35 | 586.15 | 535.173 | -27.40 | -33.71 | 39.24 | 18.24 | 13.49 |
| 3 | 983.33 | 649.17 | 532.808 | -33.98 | -45.82 | 23.23 | 30.79 | 17.44 |
| 4 | 1172.28 | 752.59 | 529.324 | -35.80 | -54.85 | 30.85 | 27.50 | 21.26 |
| 5 | 1358.67 | 831.02 | 536.239 | -38.84 | -60.53 | 22.24 | 27.58 | 25.76 |
| 6 | 1535.16 | 914.64 | 533.919 | -40.42 | -65.22 | 30.53 | 21.74 | 18.04 |
| 7 | 1694.57 | 980.81 | 524.528 | -42.12 | -69.05 | 26.48 | 17.79 | 19.15 |
| 8 | 1897.01 | 1054.93 | 541.903 | -44.39 | -71.43 | 27.44 | 22.69 | 9.08 |

This is the updated chart:

[chart: policy-server-memory-opt2]

As you can see, the memory consumption has been significantly reduced. Moreover, the memory usage stays constant regardless of the number of workers instantiated.

Adapt the code to the new API exposed by the latest version of
policy-evaluator.

On top of that, add some extra unit tests.

Signed-off-by: Flavio Castelli <[email protected]>
@flavio flavio merged commit fee9dd7 into kubewarden:main Dec 14, 2023
8 checks passed
@flavio flavio deleted the reduce-memory-usage branch December 14, 2023 18:13