Working on optimizations is more complex than it looks at first. Any optimization must be measured to make sure that, in practice, it actually speeds up the application. Problem: it is very hard to obtain stable benchmark results.
The stability of a benchmark (performance measurement) is essential to be able to compare two versions of the code and compute the difference (faster or slower?). An unstable benchmark is useless: it risks giving a false result when comparing performance, which can lead to bad decisions.
I'm going to show you the Python project "perf", which helps to launch benchmarks but also to analyze them: compute the mean and the standard deviation over multiple runs, render a histogram to visualize the probability curve, compare multiple results, re-run a benchmark to collect more samples, etc.
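To give a first idea of what this looks like, here is a minimal sketch of a benchmark script using perf's Runner API. The busy_loop function is just a made-up workload for the example, and the exact method names and command-line options may differ between perf versions, so check the documentation of the version you have installed:

    import perf

    def busy_loop():
        # Deliberately trivial workload: the point is to measure it,
        # not to make it fast.
        for _ in range(100000):
            pass

    # The Runner spawns multiple worker processes and collects many samples,
    # which is what makes the mean and standard deviation meaningful.
    runner = perf.Runner()
    runner.bench_func('busy_loop', busy_loop)

Running the script with an output option such as "python3 bench.py -o result.json" stores the samples as JSON; the perf command line (for example "python3 -m perf hist result.json" or "python3 -m perf compare_to ref.json patched.json") can then render the histogram or compare two results.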
The use case is to measure small, isolated optimizations on CPython and make sure that they don't introduce a performance regression.