I was testing Julia with different numbers of threads by starting it with julia -t 1 or julia -t 7. I ran versioninfo() and it confirms that I really do get 1 or 7 threads on a basic desktop setup. I then benchmarked a simple matrix multiplication on a decently sized matrix, which I thought would already run in parallel naturally (based on a Stack Overflow answer). However, I get identical benchmarks with @benchmark or @time, so I'm very confused now. The code is very simple: a 2,000 x 50,000 matrix of 0s, 1s, and 2s called M, and I computed G = M * M'. Both runs took almost exactly 223 seconds, whereas I assumed it would be roughly 7 times faster with 7 threads. I'm just confused and looking for general help on why this didn't happen. Is -t 1 being ignored and all cores used anyway?
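Roughly, this is what I ran (a minimal sketch; rand(0:2, …) here is just a stand-in for however M was actually generated):
julia> M = rand(0:2, 2_000, 50_000);  # stand-in for the 2,000 x 50,000 matrix of 0s, 1s, and 2s
julia> @time G = M * M';  # ~223 seconds for me, with either -t 1 or -t 7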
lmiq
2
This probably answers your question: Frequently Asked Questions · DifferentialEquations.jl
But the short answer is that BLAS multithreading is controlled independently of Julia's multithreading.
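For example (a minimal sketch; the printed values are only illustrative and depend on your machine, and BLAS.get_num_threads requires a reasonably recent Julia):
julia> using LinearAlgebra
julia> Threads.nthreads()        # Julia threads, controlled by julia -t
1
julia> BLAS.get_num_threads()    # BLAS (OpenBLAS) threads, a separate setting
8
julia> BLAS.set_num_threads(4)   # changes only the BLAS thread pool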
4 Likes
Thank you for your answer. I did find this option, but I just tested again: I started up with 1 vs. 7 threads, then set BLAS.set_num_threads(1) or BLAS.set_num_threads(7), and I get identical results again (~223 seconds). I still don't understand what Julia is doing. Any more ideas?
lmiq
4
This is what I get here:
julia> using LinearAlgebra  # needed for BLAS.set_num_threads
julia> A = rand(10^4, 10^4);
julia> BLAS.set_num_threads(1)
julia> @time A * A';
20.189948 seconds (3 allocations: 762.940 MiB)
julia> BLAS.set_num_threads(4)
julia> @time A * A';
7.785689 seconds (3 allocations: 762.940 MiB, 1.11% gc time)
1 Like
Hmmm, I’ll try again. Thank you.
Wow, so I simplified again, to almost exactly your code, and cannot find a difference in time… I don’t know what I’m doing wrong here. No matter what combination of julia -t and BLAS.set_num_threads() I use, it comes up with the same time. I’m very baffled by this.
My next question: how can I determine, for a given run, how many cores it actually used? Is there a macro for that? Maybe the Profile.jl package?
@lmiq When I used your example it worked… yet when I try my example again there is no difference. This is very frustrating. Not sure what is going on.

I commented below with my example where I don’t see a difference… Any ideas? Maybe it’s due to the type of matrix?
jling
9
What version of Julia are you using? Pasting the result of versioninfo() would be helpful.
Would appreciate any help. Thanks.

lmiq
11
What type of matrix is M? How exactly are you generating it?
Matrix{Int64} (alias for Array{Int64, 2})

lmiq
13
Maybe BLAS doesn’t have methods for integer matrices. Try using Octavian, maybe? (It is sensitive to the number of Julia threads.)
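Something along these lines, as a minimal sketch (assuming Octavian.jl is installed; Octavian.matmul is its exported multiply, which runs on Julia's threads and, unlike BLAS, also works with integer element types):
julia> using Octavian
julia> M = rand(0:2, 2_000, 50_000);
julia> @time G = Octavian.matmul(M, M');  # multithreaded; start Julia with julia -t N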
DNF
14
This is why minimal working examples are useful for getting a quick and helpful resolution of your issues.
1 Like
Definitely not. You only get BLAS acceleration for floating-point types.
I would just convert it to Matrix{Float32} (which is exact for integers up to 16777216 = 2^24) or Matrix{Float64} (which is exact for integers up to 9007199254740992 = 2^53). These will benefit from optimized multithreaded BLAS, SIMD, etcetera.
For your matrix sizes, Float32 is almost 200x faster on my machine:
julia> M = rand(0:2, 2000,50000);
julia> using BenchmarkTools;
julia> @btime $M * $M';
126.120 s (5 allocations: 30.55 MiB)
julia> A = Matrix{Float32}(M);
julia> M * M' == A * A' # Float32 is exact for integers of this size
true
julia> @btime $A * $A';
688.669 ms (2 allocations: 15.26 MiB)
6 Likes
Wow, I had no idea. This doesn’t make any intuitive sense to me… Is there a package that implements multithreaded matrix multiplication for integers? Thank you for your help.
Yup, amazing. Thank you very much. Works great. I had no idea; I’ll look in the documentation and see if that is mentioned, maybe I missed something.
I appreciate your help! Solved below.