Interested in benchmarking and profiling your code?
My new blog post walks you through it, from high-level benchmarking to digging deeper with profiling tools.
It's quite high level, so I avoided explaining the various lower-level macros there, but I will cover them in another post if there is interest.
Let me know your thoughts, and enjoy reading!
19 Likes
sijo
Very nice! A few remarks:
- You might want to rename this topic: it sounds like you're asking for help profiling your code… maybe "A new tutorial on benchmarking and profiling" or similar?
- Regarding the first flamegraph screenshots and this part:

  > My screenshot does not show the full width, but when you run it you can see that the `filter!` function takes more than 98% of the time. Which means it is the function we want to optimize.

  It's a bit confusing to have a screenshot that doesn't illustrate the point (on the screenshot it looks like `filter!` spends almost all its time in `!=` and `<=`). Maybe use a screenshot showing the full width of `filter!`, and if the text is unreadable then you could add the current screenshot as a "magnifier"?
- The last runtime plot, which "looks quite funny" as you say, can make the reader skeptical that the third solution is really doing what it should do… Maybe a good opportunity to show that a logarithmic scale can be useful?
EDIT: just to clarify, for the first remark I meant the title here on Discourse.
4 Likes
Thanks for your thoughts. Will add those!
In this code

```julia
# convert from nanoseconds to seconds
push!(ys, mean(t).time / 10^9)
```
Why are you using `mean`? It's inconsistent with the behaviour of `@btime`, which uses the minimum. And it behaves worse than `median`, which is my second choice for such estimates.
I find `min` a bit strange, but it depends on what you want to measure, I guess. Is `mean` wrong?

I chose `mean` to get the average running time of the function. One could add error bars around it in the plot.
Well, the consensus is that the minimum is the most adequate metric for measuring actual code performance, because the time of a code execution is always "time of the code itself + some random nonnegative noise from the operating system". Since the second term is always nonnegative, taking the minimum gives you the closest estimate of the real execution time. The median is slightly worse, and the mean is the worst of them all, since it is easily skewed: imagine that in 10 runs you get 9 measurements of 1 ms and one of 10 s. The mean would be about 1 s, which is definitely not representative of the actual execution time.
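The arithmetic of that example is easy to check with the `Statistics` stdlib (the 1 ms / 10 s numbers are the ones from above):

```julia
using Statistics

# Nine runs at 1 ms plus one 10 s outlier, all in seconds
runs = vcat(fill(0.001, 9), 10.0)

mean(runs)     # ≈ 1.0009 s — dominated by the single outlier
median(runs)   # 0.001 s
minimum(runs)  # 0.001 s
```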
But anyway, whether you agree with that or not, it's inconsistent to use and compare `@btime` and `mean` to profile the same code. It should be either one or the other.
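A minimal sketch of sticking to the minimum everywhere, so the plotted numbers match what `@btime` reports (assuming BenchmarkTools.jl is available; `work` and the array size are placeholders, not from the post):

```julia
using BenchmarkTools

# Placeholder workload standing in for the function being benchmarked
work(v) = sum(abs2, v)

t = @benchmark work(v) setup = (v = rand(1000))

# minimum(t) matches the estimator @btime uses; convert nanoseconds to seconds
best_seconds = minimum(t).time / 10^9
```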
7 Likes
Thanks for the clarification @Skoffer. Will make the changes accordingly.
1 Like
lmiq
Once I benchmarked some code with my laptop in the freezer. It was clearly faster. I will test that again and compare the minimum, median, and average times obtained relative to room temperature, hoping to show that thermodynamic noise, not only operating-system noise, enters into the equation.
5 Likes
Surely my statement is a simplification. Another reason for "negative operating system time" can be frequency-governor management (I hope this term is correct): the operating system can change the CPU frequency on demand, so it is possible that during a benchmark the frequency goes up and the overall execution time decreases. So yes, this formula is a simplification.
jzr
Or, possibly, garbage collection, which occurs in bursts. If some GC is inevitable, it is reasonable to use the median too, because it gives you a more realistic timing for practical purposes.
These are just informative statistics, there isn’t a single best one. That said, if you have to pick one, then in general minimum is a reasonable choice.
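A toy illustration of how a GC burst moves the different estimators (the 1 ms / 21 ms numbers are made up for the sketch):

```julia
using Statistics

# 95 steady runs of 1 ms, plus 5 runs that hit a hypothetical 20 ms GC pause
all_runs = vcat(fill(1.0, 95), fill(21.0, 5))   # milliseconds

minimum(all_runs)  # 1.0 — ignores the GC bursts entirely
median(all_runs)   # 1.0 — robust to the occasional burst
mean(all_runs)     # 2.0 — folds in the amortized GC cost
```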
5 Likes