
This page contains resources and things to be aware of when profiling.

Do not make conclusions from profiling in debug mode

There is typically a 3-6x speed difference between debug and release mode.

Performance optimizations in release mode do not improve performance uniformly across the program, so the bottleneck in debug mode might not be the same as in release mode. With Cargo, this means profiling the binary built by cargo build --release, not the default debug build.

Confirmation bias

A typical programmer mistake is basing decisions on a hypothesis formed prior to testing.

In general, there are more ways a hypothesis can be wrong than there are ways it can be right.

When beliefs are formed prior to testing, it is easy to design tests that confirm the hypothesis, while lacking tests that check for cases where the hypothesis would be false.

Example:

Alice believes that drawing triangles clockwise is faster than drawing them counter-clockwise. She thinks of a specific use case where clockwise might be faster, and then writes a test to check it. However, she forgets to think of other use cases where counter-clockwise might be faster.

Optimize bottlenecks

Profiling is about finding where optimization is most beneficial.

The most essential thing about bottlenecks: some improvements are orders of magnitude more important than others.

Just because improving X is possible does not mean it is the most important thing to do.

Example:

Alice knows that drawing triangles clockwise rather than counter-clockwise might improve the performance of that code path by 50%. However, the code path is only called once, so the actual improvement is 0.0005% of total time. If Alice had used profiling to find the bottleneck first, she could have improved performance by 0.05%. Optimizing the bottleneck is 100x more important in this case.
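
A back-of-the-envelope sketch of this arithmetic (a minimal example; the fractions are assumed from the story above, where the clockwise path accounts for 0.001% of total time and the bottleneck for 0.1%):

// Overall improvement = (fraction of total time) x (local improvement).
// This is the arithmetic behind Amdahl's law, applied to the example above.
fn overall_improvement(fraction_of_total: f64, local_improvement: f64) -> f64 {
    fraction_of_total * local_improvement
}

fn main() {
    // Clockwise drawing: 0.001% of total time, improved by 50%.
    println!("{:.4}%", 100.0 * overall_improvement(0.00001, 0.5)); // 0.0005%
    // Bottleneck: 0.1% of total time, improved by 50%.
    println!("{:.4}%", 100.0 * overall_improvement(0.001, 0.5)); // 0.0500%
}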

Performance is often not the only goal

Whenever there is a finite amount of multiple resources, optimizing for a single goal can lead to waste of the other resources.

Because performance is easy to measure, it tends to overshadow other goals whose importance is harder to communicate.

Examples:

  • Ergonomics
  • Conceptually simple design
  • Finishing within 2 years? 5 years?
  • Cost of brain cycles when maintaining the project
  • Portability

Different setup, different performance characteristics

This is a major reason why a single profiling test should NOT be used blindly to make major decisions:

  • Some hardware architectures are faster at certain operations than others
  • Background tasks create noise while profiling
  • Different operating systems have non-deterministic differences in multi-threaded behavior
  • Bugs in graphics drivers
  • Different versions of the Rust compiler
  • Non-deterministic application logic

Example:

Alice has a desktop background that changes with an animated transition every 5 minutes. This causes the program to slow down during one profiling run, leading Alice to draw the wrong conclusion from the results.

Benchmark mode

All performance measurements have errors, but the size of the error varies with the method used.

One of the few reliable ways to test an entire game engine is to run the game loop in "benchmark mode".

Benchmark mode runs the game loop without thread sleeps or user input. If the application logic is deterministic, then two runs of the same program should take approximately the same amount of time.

Example:

// Render exactly 2000 frames in benchmark mode, then exit.
let mut frames = 0..2000;
for e in events.bench_mode(true) {
    if e.render_args().is_some() {
        if frames.next().is_none() { break; }
    }
}
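
To turn a benchmark run into a number, the loop can be wrapped in a timer. A minimal sketch using std::time::Instant (the events value is assumed to come from the surrounding application, as in the example above):

let start = std::time::Instant::now();
// ... run the 2000-frame benchmark loop from the example above ...
let elapsed = start.elapsed();
println!("2000 frames in {:?} ({:?} per frame)", elapsed, elapsed / 2000);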

Caveats:

  • Only measures the performance for a specific scene
  • Can only be used when non-deterministic behavior does not influence the results

Tools for profiling

OSX: Instruments

Profiling in Rust

hprof is one way to profile Rust code.

  • Notice that measuring time adds overhead, so this method can be unreliable for fine-grained measurements
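
As a rough illustration of the scope-based timing idea, here is a hand-rolled sketch using std::time::Instant (this is not hprof's actual API, and expensive_work is a hypothetical stand-in for real application code):

use std::time::Instant;

fn main() {
    // Time a named scope by hand; a profiling library automates this
    // and aggregates the results across many scopes and frames.
    let start = Instant::now();
    expensive_work();
    println!("expensive_work took {:?}", start.elapsed());
}

// Hypothetical workload standing in for real application code.
fn expensive_work() {
    let _sum: u64 = (0..1_000_000).sum();
}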

Analysis

Making decisions based on profiling can be hard, but here are some rules of thumb:

  • Changing a single setting and comparing cases that are otherwise equal can lead to useful insights
  • There is always an unknown number of factors that affect the results, so don't draw conclusions like "X will always be faster than Y for all cases" when the cases are unknown
  • If behavior is non-deterministic, then "X is faster than Y" should be interpreted as "X is usually faster than Y"
  • Some algorithms can be tested automatically for O(N) behavior (described below)

Testing for O(N)

In some cases, you want to know how performance scales with the number of inputs to an algorithm.

This can be done using this spreadsheet.

The method works as follows:

  1. Find the coefficients in the equation f(x) = a + b * x, where f(x) is the time to perform an action on x = N items
  2. If the algorithm is O(N), predict what b should be for another test case and compare

The graph in the spreadsheet shows whether the algorithm scales above or below linear.

If the prediction accuracy is 100%, then the b factor can be used to predict performance for larger N.
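
A minimal sketch of fitting a and b with ordinary least squares, in case you want to do the fit in code instead of the spreadsheet (the sample data is made up for illustration):

// Fit f(x) = a + b * x to measured (x, time) samples with least squares.
fn fit_linear(samples: &[(f64, f64)]) -> (f64, f64) {
    let n = samples.len() as f64;
    let sx: f64 = samples.iter().map(|&(x, _)| x).sum();
    let sy: f64 = samples.iter().map(|&(_, y)| y).sum();
    let sxx: f64 = samples.iter().map(|&(x, _)| x * x).sum();
    let sxy: f64 = samples.iter().map(|&(x, y)| x * y).sum();
    let b = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    let a = (sy - b * sx) / n;
    (a, b)
}

fn main() {
    // Hypothetical measurements: (number of items N, seconds).
    let samples = [(100.0, 0.011), (200.0, 0.021), (400.0, 0.041)];
    let (a, b) = fit_linear(&samples);
    println!("a = {}, b = {}", a, b);
    // If the algorithm is O(N), b should stay roughly stable
    // when the fit is repeated on a second data set.
}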

Run 3 times, delete slowest one

When sharing data, this technique is useful for reducing background noise from other tasks running on the same computer.
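
A minimal sketch of this technique (run_benchmark is a hypothetical stand-in for the actual measurement):

use std::time::{Duration, Instant};

// Hypothetical measurement; replace the body with the code under test.
fn run_benchmark() -> Duration {
    let start = Instant::now();
    let _sum: u64 = (0..1_000_000).sum();
    start.elapsed()
}

fn main() {
    let mut runs: Vec<Duration> = (0..3).map(|_| run_benchmark()).collect();
    runs.sort();
    runs.pop(); // delete the slowest run
    println!("kept runs: {:?}", runs);
}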