Julia has several ways to define loops or repeat code, each with advantages and disadvantages. I am looking at the performance of some also inspired by previous posts on this forum.
#!/usr/bin/env julia
# https://discourse.julialang.org/t/map-vs-loops-array-comprehensions-in-julia-1-0/16137
# Note, this thread is from 2018.
import BenchmarkTools
import Random
function loop_with_map(x)
x = map(a -> 2 * a, x)
return x
end
function loop_with_map_dot(x)
x .= map(a -> 2 * a, x) # Slowest
return x
end
function loop_list_comprehension(x)
# A list comprehension creates a new local scope and is therefore slow.
x = [ 2 * val for val in x]
return x
end
function loop_list_comprehension_dot(x)
# A list comprehension creates a new local scope and is therefore slow.
x .= [ 2 * val for val in x]
return x
end
function for_loop(x)
for i in range(1, length(x))
x[i] = 2 * x[i]
end
return x
end
function for_loop_opt(x)
for i in eachindex(x)
@inbounds x[i] = 2 * x[i]
end
return x
end
function loop_via_broadcast(x)
@. x = 2 * x
return x
end
function setup_function(n)
return Random.rand(Int64, n)
end
for n in [100, 1000, 10_000, 100_000, 1_000_000]
println("\n\nn = ", n)
BenchmarkTools.@btime loop_with_map(x_vec) setup = (x_vec = setup_function($n)) # Slow
BenchmarkTools.@btime loop_with_map_dot(x_vec) setup = (x_vec = setup_function($n)) # Very slow
BenchmarkTools.@btime loop_list_comprehension(x_vec) setup = (x_vec = setup_function($n)) # Slow
BenchmarkTools.@btime loop_list_comprehension_dot(x_vec) setup = (x_vec = setup_function($n)) # Very slow
BenchmarkTools.@btime for_loop(x_vec) setup = (x_vec = setup_function($n)) # Fast
BenchmarkTools.@btime for_loop_opt(x_vec) setup = (x_vec = setup_function($n)) # Fast
BenchmarkTools.@btime loop_via_broadcast(x_vec) setup = (x_vec = setup_function($n)) # Fast
end
Output:
2024_08_16 14:18:30 ~/temp $ loop_performance.jl
n = 100
89.105 ns (1 allocation: 896 bytes)
163.902 ns (1 allocation: 896 bytes)
138.964 ns (1 allocation: 896 bytes)
163.821 ns (1 allocation: 896 bytes)
12.209 ns (0 allocations: 0 bytes)
11.278 ns (0 allocations: 0 bytes)
12.198 ns (0 allocations: 0 bytes)
n = 1000
1.049 μs (1 allocation: 7.94 KiB)
1.180 μs (1 allocation: 7.94 KiB)
1.119 μs (1 allocation: 7.94 KiB)
1.140 μs (1 allocation: 7.94 KiB)
80.231 ns (0 allocations: 0 bytes)
80.838 ns (0 allocations: 0 bytes)
81.080 ns (0 allocations: 0 bytes)
n = 10000
7.885 μs (2 allocations: 78.17 KiB)
11.209 μs (2 allocations: 78.17 KiB)
8.743 μs (2 allocations: 78.17 KiB)
11.228 μs (2 allocations: 78.17 KiB)
1.239 μs (0 allocations: 0 bytes)
1.215 μs (0 allocations: 0 bytes)
1.306 μs (0 allocations: 0 bytes)
n = 100000
116.843 μs (2 allocations: 781.30 KiB)
171.743 μs (2 allocations: 781.30 KiB)
119.491 μs (2 allocations: 781.30 KiB)
174.575 μs (2 allocations: 781.30 KiB)
15.298 μs (0 allocations: 0 bytes)
14.419 μs (0 allocations: 0 bytes)
15.221 μs (0 allocations: 0 bytes)
n = 1000000
1.428 ms (2 allocations: 7.63 MiB)
2.420 ms (2 allocations: 7.63 MiB)
1.420 ms (2 allocations: 7.63 MiB)
2.573 ms (2 allocations: 7.63 MiB)
696.670 μs (0 allocations: 0 bytes)
664.293 μs (0 allocations: 0 bytes)
698.923 μs (0 allocations: 0 bytes)
2024_08_16 14:21:31 ~/temp $
Questions:
loop_with_map
: map
has been reported to be slow before, but a long time ago. I am surprised that it is slow today. Am I doing something wrong?
loop_with_map_dot
: This uses broadcasting to place the result of the calculation back in x
. The idea is that no new memory for x
has to be allocated; memory allocation is slow. Obviously this makes loop_with_map_dot
slower, not faster. I think the reason is that map
already allocates memory and that the broadcast makes that the old memory of x
has to be filled instead of just making x
point to the memory created by map
. More on broadcasting below. Correct?
loop_list_comprehension
: List comprehensions create their own scope which makes them slow. But I expected this creation of a new scope to take fixed time, i.e. independent of the number of times the loop is executed. What is going on?
loop_list_comprehension_dot
: Just the same as in loop_with_map_dot
, the broadcast serves to re-use the memory that x
points to, but as a result all of the values have to be pumped over. Correct?
for_loop_opt
: Skips index checks. Why is this faster than loop_via_broadcast
? It suggests (falsely?) that loop_via_broadcast
does a range check while this can’t fail since there is only one container involved. If two with different lengths were involved, I can imagine that a range check makes sense.
loop_via_broadcast
: here the re-use of the memory of x
actually works. This has been clarified here. Now I wonder if this is the only time this works, in essence.
Elrod
2
Yes.
You can use map!
if you want a map that updates the input, instead of allocating a result vector. Definitely prefer map!(f, x)
over x .= map(f, x)
.
2 Likes
mkitti
3
A two argument map!
does not exist in Base
.
2 Likes
Elrod
4
map!(f, x, x)
, then.
E.g.
map!(x -> x + x, x, x)
1 Like
mkitti
5
The docs warn against the arguments sharing memory.
│ Warning
│
│ Behavior can be unexpected when any mutated argument
│ shares memory with any other argument.
8 Likes
Link to #50824 that added that aliasing warning and has discussion and links to some related issues. While map!(f, x, x)
is not currently guaranteed to work, it is relatively uncontroversial that it had ought to work in most circumstances. If you’re feeling slightly brave, you’ll probably never run into issues when the input is identical to the output (but more complicated aliasing could still be problematic). But broadcasting does guarantee that simple aliasing like x .= 2 .* x
will work, so probably safer to use that (see also broadcast!
).
Simple counter-example with severe consequences:
julia> v = zeros(2); map!(x->x+1, v, v); v
2-element Vector{Float64}:
1.0
1.0
julia> using SparseArrays
julia> v = spzeros(2); map!(x->x+1, v, v); v
2-element SparseVector{Float64, Int64} with 2 stored entries:
[1] = 2.0
[2] = 2.0
5 Likes
Professionally, I never feel brave, sorry. My company makes extremely expensive machines and downtime of these machines is also very expensive… and rework of the products of these machines is also costly. So I am looking for the safest, securest ways for loops without loosing performance so I can promote such ways to colleagues.
4 Likes
SparseArrays
has a specialized method for map!
, which seems to cause the issue (but could potentially be improved, at least for the trivial case of ===
output with an input). But in any case,
julia> v = spzeros(2); broadcast!(x->x+1, v, v); v
2-element SparseVector{Float64, Int64} with 2 stored entries:
[1] = 1.0
[2] = 1.0
so broadcast!
is a suitable replacement for map!
.
2 Likes
DNF
12
In that case, use one of the two above, with some added remarks:
- Add a trailing
!
to the function names to indicate mutation.
- Remove
@inbounds
, it already happens automatically with eachindex
.
- you can use the nice update operator
*=
, so that a = 2 * a
becomes a *= 2
.
1 Like
DNF
13
I don’t know about this. Isn’t it just the allocation that’s the problem?
1 Like
Yes, broadcasting is much more conservative when it comes to aliasing — broadcast!
doesn’t have that warning, but you’ll also see that it (sometimes) allocates more than you’d expect. It errs on the side of (sometimes) making surprising defensive copies, which you may not want in cases where you’re chasing peak performance — which points us back on topic a bit more here.
In this case, with arrays that use the default broadcast implementation, the loop_via_broadcast
sees that it iterates both the LHS and RHS “in step” with eachother and thus new outputs won’t overwrite yet-to-be-read inputs… and so the implementation needn’t make that defensive copy.
The most important takeaways when looking at all of these different ways of performing 2x
is that:
- Assignment (the single
=
) is just about names that you’re choosing to use to describe things. Choosing to change what a name means doesn’t have anything to do with “memory re-use”.
- All arguments are always evaluated — separately! — before being passed to a function. It can sometimes be tricky to see what makes for a separate function call — especially in the face of broadcast dots and other syntax sugars, though.
Finally, and perhaps most mind-bendingly,
- The superpower of Julia is that the code you write can be as fast — or faster — than its “builtins.” Broadcasting is complicated, far more complicated than your simple loop. And simple is often fast.
6 Likes
The weak evidence about scope in list comprehensions is here.. Checking this is above my pay grade. Also, I think a simple function has its own scope, so I doubt now my own statement. I got it from Discourse somewhere though, but I can’t find it back.
loop_with_map
and loop_list_comprehension
actually do not mutate (to also my surprise), the others do so.
1 Like
That is baffling indeed. Great catch. Although off-topic, if anybody understands this, please share!
loop_with_map
and loop_list_comprehension
are both of the form:
function f(x)
x = #... something
end
This is just deciding you have a better use for the name x
within the context of that function body. What x
happened to be (or not) before the re-assignment has no effect. It’s just a name for us humans, no different than initially using the name “Matt” for me and then deciding that you’d rather use it for yourself. You’re not changing anything about either me or you by changing our nicknames!
This is just deciding you have a better use for the name x
within the context of that function body. What x
happened to be (or not) before the re-assignment has no effect. It’s just a name for us humans, no different than initially using the name “Matt” for me and then deciding that you’d rather use it for yourself. You’re not changing anything about either me or you by changing our nicknames!
[/quote]
Correct, I spoke in haste. If x
is assigned a new object in the function, the old object is unchanged and found back after the function is terminated. If the object is essentially re-used, it is mutated and the changed values are found back on the caller side. I understand this.
sgaure
20
List comprehensions create a closure, i.e. an anonymous function, so definitely a new scope. However, a new scope does not slow anything down (except possibly the compiler, though I doubt it’s very significant).
To actually answer your question: Just write the loop. It is simple, expressive and fast. Everything else (like broadcasting) is eye candy/convenience in the end. You can still (and actually) should use the appropriate iteration function.
So the best version of your function (which incorporates the tips by @DNF):
function for_loop_opt!(x)
for i in eachindex(x)
x[i] *= 2
end
return x
end
This
- is easy to read and understand even when you are not a Julia professional
- does not allocate
- has no possible out-of-bounds memory access (
@inbounds
is dangerous!)
- actually infers the
inbounds
property if the compiler can prove it (e.g. for standard Array
s)
2 Likes
DNF
22
But I was specifically referring to for_loop_opt
and loop_via_broadcast
, which both do.