Good catch, thanks a lot @jling! I should have noticed this; it’s one of the first tips @ChrisRackauckas taught in his course!
Surprisingly, when I iterate over each column in my `render()` function with a tiny scene (`scene_2_spheres`), it is consistently slower by a large margin than iterating over each row (I compared 5 times). (BTW, I also fixed a minor logic bug: I kept applying a random offset, Brownian-motion-style, instead of simply applying a delta per sample around each image pixel.)
function render(scene::HittableList, cam::Camera, image_width=400,
                n_samples=1)
    # Image
    aspect_ratio = 16.0f0/9.0f0 # TODO: use cam.aspect_ratio for consistency
    image_height = convert(Int64, floor(image_width / aspect_ratio))
    # Render
    img = zeros(RGB{Float32}, image_height, image_width)
    # Compared to C++, Julia:
    # 1. is column-major, elements in a column are contiguous in memory
    #    this is the opposite to C++.
    # 2. uses i=row, j=column as a convention.
    # 3. is 1-based, so no need to subtract 1 from image_width, etc.
    # 4. The array is Y-down, but `v` is Y-up
    # Usually iterating over 1 column at a time is faster, but
    # surprisingly, in this case the opposite pattern arises...
    for j in 1:image_width, i in 1:image_height # iterate over each column
    #for i in 1:image_height, j in 1:image_width # iterate over each row
        accum_color = @SVector[0f0,0f0,0f0]
        u = convert(Float32, j/image_width)
        v = convert(Float32, (image_height-i)/image_height) # i is Y-down, v is Y-up!
        for s in 1:n_samples
            if s == 1 # 1st sample is always centered, for 1-sample/pixel
                δu = 0f0; δv = 0f0
            else
                # claforte: I think the C++ version had a bug, the rand offset was
                # between [0,1] instead of centered at 0, e.g. [-0.5, 0.5].
                δu = convert(Float32, (rand()-0.5f0) / image_width)
                δv = convert(Float32, (rand()-0.5f0) / image_height)
            end
            ray = get_ray(cam, u+δu, v+δv)
            accum_color += ray_color(ray, scene)
        end
        img[i,j] = rgb_gamma2(accum_color / n_samples)
    end
    img
end

# `for j in 1:image_width, i in 1:image_height # iterate over each column`: 10.489 ms
# `for i in 1:image_height, j in 1:image_width # iterate over each row`: 8.129 ms
@btime render(scene_2_spheres(), default_camera(), 96, 16) # 16 samples
# Iterate over each column: 614.820 μs
# Iterate over each row: 500.334 μs
@btime render(scene_2_spheres(), default_camera(), 96, 1) # 1 sample
I guess this can easily be explained by `get_ray` or `ray_color` thrashing a cache… but that was a surprisingly large margin… maybe there’s still a bug in my code somewhere?
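One way to probe the cache hypothesis might be to separate the loop order from the ray tracing itself. The sketch below is not part of the renderer: `shade`, `fill_by_columns!` and `fill_by_rows!` are hypothetical names, `shade` is only a cheap stand-in for `get_ray` + `ray_color`, and timing uses Base's `@elapsed` to stay dependency-free.

```julia
# Hypothetical stand-in for the per-pixel work (get_ray + ray_color).
shade(u::Float32, v::Float32) = sqrt(u * u + v * v)

# Column order: the inner loop (i) walks down one column, contiguous in memory.
function fill_by_columns!(img::Matrix{Float32})
    h, w = size(img)
    for j in 1:w, i in 1:h
        img[i, j] = shade(Float32(j / w), Float32((h - i) / h))
    end
    img
end

# Row order: the inner loop (j) jumps between columns, a stride of h elements.
function fill_by_rows!(img::Matrix{Float32})
    h, w = size(img)
    for i in 1:h, j in 1:w
        img[i, j] = shade(Float32(j / w), Float32((h - i) / h))
    end
    img
end

a = zeros(Float32, 2048, 2048)
b = zeros(Float32, 2048, 2048)
fill_by_columns!(a); fill_by_rows!(b)   # warm-up, so compilation is not timed
t_cols = @elapsed fill_by_columns!(a)
t_rows = @elapsed fill_by_rows!(b)
println("columns: ", t_cols, " s, rows: ", t_rows, " s")
```

With such a cheap `shade` on a large array, column order would normally be expected to win. If the order only flips once the real `ray_color` is plugged in, the single write to `img[i,j]` per pixel is probably not what drives the difference. For steadier numbers, `@btime` with `$`-interpolated arguments could replace `@elapsed`.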
Anyway, as expected, this change makes no noticeable difference in larger scenes that have hundreds of spheres, like `scene_random_spheres`.
For now I’ll keep the faster loop, and test again later, when I optimize for SIMD. 
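As an aside on the sampling fix mentioned near the top, the difference between accumulating offsets and drawing a fresh delta per sample can be sketched in one dimension. `walk_offsets` and `fresh_offsets` are hypothetical helpers, not code from the renderer, and the offsets are expressed in pixel units rather than divided by the image size.

```julia
using Random

# Random-walk style (the old behaviour as described in this post):
# each random step is added to the previous offset, so samples drift away.
function walk_offsets(rng, n)
    offsets = Float32[]
    u = 0f0
    for _ in 1:n
        u += rand(rng, Float32) - 0.5f0
        push!(offsets, u)
    end
    offsets
end

# Per-sample delta: every offset is drawn independently around the pixel
# center, so it always stays within half a pixel.
fresh_offsets(rng, n) = [rand(rng, Float32) - 0.5f0 for _ in 1:n]

rng = MersenneTwister(1)
walk = walk_offsets(rng, 10_000)
fresh = fresh_offsets(rng, 10_000)
println("max |offset|, random walk: ", maximum(abs, walk))
println("max |offset|, fresh delta: ", maximum(abs, fresh))
```

The fresh deltas are bounded by half a pixel by construction, while the accumulated offsets have no such bound and tend to wander further as the sample count grows.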