Add a script to find regression test threshold for harnesses #4724
Description of changes:
This change adds a find_threshold script which finds an appropriate threshold for test harnesses, taking non-determinism between runs into account. The script runs the test 100 times, records the instruction counts, and computes the range of results as a percentage; it then outputs the recommended threshold, so that when the threshold is exceeded the result can be attributed with confidence to a performance regression. This keeps our tests from being flaky while still detecting regressions accurately.
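For illustration only, here is a minimal sketch (in Rust, not the actual script) of how a threshold could be derived from repeated instruction counts; the function name and the sample counts below are hypothetical:

```rust
// Hypothetical sketch: the recommended threshold is the observed spread of
// instruction counts across repeated runs, expressed as a percentage of the
// minimum count.
fn recommended_threshold(counts: &[u64]) -> f64 {
    let min = *counts.iter().min().expect("at least one run is required") as f64;
    let max = *counts.iter().max().expect("at least one run is required") as f64;
    (max - min) / min * 100.0
}

fn main() {
    // Example instruction counts from repeated runs of one harness (made up).
    let counts = vec![1_000_000u64, 1_004_500, 998_750, 1_002_300];
    println!("recommended threshold: {:.2}%", recommended_threshold(&counts));
}
```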
This change also adds a contribution section to the README which describes how to run the script when contributing new test harnesses to the regression crate.
Finally, this change sets the thresholds for the existing tests according to the output of the script. The results are included here:
Call-outs:
- The script relies on the `find_instruction_count` function implemented in the regression tests, because it scrapes the test artifacts upon completion to store the instruction count results. The regression test is run 100 times, and on each iteration the instruction counts are recorded in a .csv file stored in `tests/regression`.
- The test artifacts live at `target/regression_artifacts/#commit_id/#test_name.annotated`, paths which vary across tests and commits.
- This change also corrects the `diff_percentage` variable in `DiffProfile::assert_performance` by multiplying the previous value by 100. The previous value did not accurately reflect the change as a percentage, since it divided the difference by the total count, which yields only a fraction of the actual percent value to compare against and to output in case of a regression (see the sketch below).
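A minimal sketch of the corrected calculation; the names and the exact denominator are illustrative, not the real code in `DiffProfile::assert_performance`:

```rust
// Illustrative only: the raw ratio (difference divided by the baseline count)
// is a fraction such as 0.03; multiplying by 100.0 expresses it as a
// percentage (3.0) that can be compared against the configured threshold.
fn diff_percentage(prev_count: u64, current_count: u64) -> f64 {
    (current_count as f64 - prev_count as f64) / prev_count as f64 * 100.0
}

fn main() {
    let prev = 1_000_000u64;
    let current = 1_030_000u64;
    println!("diff: {:.2}%", diff_percentage(prev, current)); // prints 3.00%
}
```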
Testing:

This change has not been formally tested. I ran the script to find the threshold values for the existing tests and have included those changes in this PR.
Is this a refactor change? If so, how have you proved that the intended behavior hasn't changed?
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.