Add a script to find regression test threshold for harnesses #4724

Open · wants to merge 5 commits into main
Conversation

@kaukabrizvi (Contributor) commented on Aug 21, 2024

Description of changes:

This change adds a find_threshold script that determines an appropriate threshold for a test harness, taking non-determinism between runs into account. The script runs the test 100 times, records the instruction count of each run, and computes the range of results as a percentage of the minimum; it then outputs a recommended threshold, so that when the threshold is exceeded we can be confident the result is a genuine performance regression. This keeps our tests from being flaky while preserving accuracy in regression detection.
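As a rough illustration, here is a minimal Rust sketch of that computation, assuming the per-run instruction counts have already been collected into a slice; the function name and signature are illustrative, not the script's actual API:

```rust
/// Minimal sketch of the threshold computation described above. The name and
/// signature are illustrative assumptions, not taken from the PR itself.
fn recommended_threshold(counts: &[u64]) -> f64 {
    let min = *counts.iter().min().expect("need at least one sample");
    let max = *counts.iter().max().expect("need at least one sample");
    // The observed range of instruction counts, expressed as a percentage of
    // the minimum (e.g. for Handshake: 3048 / 79302470 * 100 ≈ 0.00384).
    (max - min) as f64 / min as f64 * 100.0
}
```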

This change also adds a contribution section to the README, describing how to run the script when contributing new test harnesses to the regression crate.

Finally, this change sets the thresholds for the existing tests according to the script's output. The results are included here:

| Test | Min | Max | Range | % Threshold |
| --- | --- | --- | --- | --- |
| Handshake Range | 79302470 | 79305518 | 3048 | 0.00384351 |
| Set Config Range | 446288077 | 447345779 | 1057702 | 0.23699983 |
| Session Resumption Range | 84545387 | 84549002 | 3615 | 0.00427581 |

Call-outs:

  • The script uses the same find_instruction_count function implemented in the regression tests, because it scrapes the test artifacts upon completion to collect the instruction-count results. The regression test is run 100 times, and on each iteration the instruction counts are appended to a .csv file stored in tests/regression.
  • To find the test output file, the user must supply the commit id and test_name when invoking the script. This is necessary because the storage schema currently places test output at target/regression_artifacts/#commit_id/#test_name.annotated, which varies across tests and commits; a sketch of resolving this path follows this list.
  • The changes also multiply the diff_percentage value in DiffProfile::assert_performance by 100. The previous value divided the difference by the total count without scaling, which yields a fraction rather than a percentage, so it did not accurately reflect the change when compared against the threshold or reported on a regression. The before/after computation is sketched below.
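For the second call-out, a hedged sketch of resolving the artifact path under that schema; this helper is hypothetical and only mirrors the layout described above:

```rust
use std::path::PathBuf;

/// Hypothetical helper mirroring the storage schema described above
/// (target/regression_artifacts/#commit_id/#test_name.annotated); it is
/// not taken from the PR itself.
fn annotated_output_path(commit_id: &str, test_name: &str) -> PathBuf {
    PathBuf::from("target/regression_artifacts")
        .join(commit_id)
        .join(format!("{test_name}.annotated"))
}
```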
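And a sketch of the diff_percentage fix from the last call-out; the parameter names are assumptions, since the body of DiffProfile::assert_performance is not shown in this description:

```rust
/// Sketch of the corrected computation; parameter names are assumptions.
fn diff_percentage(prev_count: u64, new_count: u64) -> f64 {
    // Before this change the expression lacked the final `* 100.0`, so it
    // produced a fraction (e.g. 0.0024) rather than a percentage (0.24),
    // understating the change when compared against a percent threshold.
    (new_count as f64 - prev_count as f64) / prev_count as f64 * 100.0
}
```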

Testing:

This change has not been formally tested. I ran the script to find the threshold values for the existing tests and have included those changes in this PR.

Is this a refactor change? If so, how have you proved that the intended behavior hasn't changed?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@kaukabrizvi marked this pull request as ready for review on August 22, 2024, 16:35

This PR has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
