Post by ascotti on Dec 28, 2018 22:00:19 GMT
Hi! Today I have dedicated a fair amount of time to improving the rendering speed and want to share what I learned.
To start with, I think the most useful optimization is caching the inverse of the transform matrix, so that we don't have to compute inverse(transform) every time it is needed. This brings huge benefits.
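In case it helps, here is a minimal Python sketch of the caching idea. It is not my actual code: the Shape class, the inverse_transform property and the small Gauss-Jordan inverse() are names made up for the illustration. The point is only that the inverse is computed when the transform is assigned, not when it is read.

```python
from typing import List

Matrix = List[List[float]]


def identity(n: int = 4) -> Matrix:
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]


def inverse(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (the expensive call)."""
    n = len(m)
    ident = identity(n)
    # Work on the augmented matrix [m | I]; the right half becomes m^-1.
    a = [[float(v) for v in row] + ident[i] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is not invertible")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [rv - f * cv for rv, cv in zip(a[r], a[col])]
    return [row[n:] for row in a]


class Shape:
    """Hypothetical shape that keeps its inverse transform cached."""

    def __init__(self) -> None:
        self._transform = identity()
        self._inverse = identity()  # the inverse of identity is identity

    @property
    def transform(self) -> Matrix:
        return self._transform

    @transform.setter
    def transform(self, m: Matrix) -> None:
        self._transform = m
        self._inverse = inverse(m)  # computed once per assignment

    @property
    def inverse_transform(self) -> Matrix:
        return self._inverse  # no matrix work on the hot path
```

The intersection code would then read shape.inverse_transform for every ray instead of calling inverse(shape.transform).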
Adding a bit of tuning here and there (e.g. the code that multiplies a matrix and a vector) and support for multiple threads, I thought I got the tracer running at a decent speed. The "Reflections and refractions" image, rendered at 800x400 with 64 samples per pixel, took 4 minutes and 14 seconds using 4 threads.
After the triangles chapter I went back to the code and used a profiler on it.
A bit surprisingly, most of the rendering time was spent managing memory and in particular the intersection lists. These lists get continuously created, used and abandoned, which puts a lot of stress on the memory management library and on the garbage collector if you have one.
For example, I was creating a new list at every invocation of intersect_world(), but once we have taken the hit from it and run prepare_comps(), the list is not used anymore and could be recycled for the next intersect.
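A rough sketch of the recycling, again in Python and again with made-up names (IntersectionBuffer, intersect_into and the simplified Sphere are assumptions for the example, not the book's API or my real code). The buffer is reset instead of reallocated, and old slots are overwritten in place. In Python the floats are still objects, so the saving is smaller than in a language with value types; this only shows the pattern.

```python
import math
from typing import List, Optional, Tuple

Vec = Tuple[float, float, float]


class IntersectionBuffer:
    """Reusable storage for (t, object) pairs; hypothetical helper."""

    def __init__(self) -> None:
        self.ts: List[float] = []
        self.objs: List[object] = []
        self.count = 0  # number of valid entries; slots past it are stale

    def reset(self) -> None:
        self.count = 0  # keep the storage, just forget the contents

    def add(self, t: float, obj: object) -> None:
        if self.count < len(self.ts):
            self.ts[self.count] = t  # overwrite a recycled slot
            self.objs[self.count] = obj
        else:
            self.ts.append(t)  # grow only when the buffer is too small
            self.objs.append(obj)
        self.count += 1

    def hit(self) -> Optional[Tuple[float, object]]:
        """Return the entry with the lowest non-negative t, or None."""
        best = -1
        for i in range(self.count):
            t = self.ts[i]
            if t >= 0.0 and (best < 0 or t < self.ts[best]):
                best = i
        return None if best < 0 else (self.ts[best], self.objs[best])


class Sphere:
    """Simplified sphere (center and radius, no transform) for the demo."""

    def __init__(self, center: Vec, radius: float) -> None:
        self.center = center
        self.radius = radius

    def intersect_into(self, origin: Vec, direction: Vec,
                       buf: IntersectionBuffer) -> None:
        # Ray/sphere quadratic; adds zero or two entries to buf.
        ox, oy, oz = (origin[i] - self.center[i] for i in range(3))
        dx, dy, dz = direction
        a = dx * dx + dy * dy + dz * dz
        b = 2.0 * (ox * dx + oy * dy + oz * dz)
        c = ox * ox + oy * oy + oz * oz - self.radius ** 2
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            return
        root = math.sqrt(disc)
        buf.add((-b - root) / (2.0 * a), self)
        buf.add((-b + root) / (2.0 * a), self)


def intersect_world(shapes, origin: Vec, direction: Vec,
                    buf: IntersectionBuffer) -> IntersectionBuffer:
    buf.reset()  # recycle the caller's buffer instead of building a list
    for shape in shapes:
        shape.intersect_into(origin, direction, buf)
    return buf
```

The catch is that the caller has to finish with the buffer (take the hit, run prepare_comps()) before the next intersect_world() call overwrites it, and with multiple threads each thread needs its own buffer.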
To make a long story short, I removed almost every memory allocation from the rendering code, i.e. from all the code triggered by color_at(): basically these are the intersection lists and the information from prepare_comps(). I did this one little step at a time, making sure all tests kept passing (I had to temporarily comment out a few at times).
The result: the same picture now renders in 1 minute and 18 seconds, down from 4 minutes and 14 seconds... wow, that's a huge improvement! And the profile is now starting to show actual raytracing code in the top positions, which is good :-)
Well... I'll go back to the smooth triangles now, hope some of these tips can be useful to you too!