Performance improvements

ascotti
Junior Member

Posts: 56

Performance improvements Dec 28, 2018 22:00:19 GMT Jamis, Richard Moss, and 1 more like this

Post by ascotti on Dec 28, 2018 22:00:19 GMT

Hi! Today I have dedicated a fair amount of time to improving the rendering speed and want to share what I learned.

First of all, I think the most useful optimization is caching the inverse of each object's transform matrix, so that inverse(transform) is not recomputed for every ray. This alone brings huge benefits.
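
For anyone who wants to see the idea spelled out, here is a minimal sketch in Python. It is not code from the book or from my tracer: Shape, set_transform, to_object_space and inverse4 are hypothetical names, and the inverse_calls counter exists only to make the saving visible. The point is simply that the inverse is computed when the transform changes and only read when a ray arrives.

```python
# Hypothetical names throughout: Shape, set_transform, to_object_space and
# inverse4 are illustrative, not taken from the book or any poster's code.

def inverse4(m):
    """Invert a 4x4 matrix (list of rows) with Gauss-Jordan elimination."""
    n = 4
    # Augment each row with the matching row of the identity matrix.
    a = [[float(v) for v in m[r]] + [1.0 if r == c else 0.0 for c in range(n)]
         for r in range(n)]
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is not invertible")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


class Shape:
    def __init__(self):
        self.inverse_calls = 0  # demonstration counter only
        self.set_transform(IDENTITY)

    def set_transform(self, m):
        # The expensive inverse is computed once per change of transform...
        self.transform = m
        self.inverse = inverse4(m)
        self.inverse_calls += 1

    def to_object_space(self, point):
        # ...and only read here, which runs once per ray.
        v = (point[0], point[1], point[2], 1.0)
        return tuple(sum(row[i] * v[i] for i in range(4))
                     for row in self.inverse[:3])
```

With this layout, rendering a million rays against one shape still calls inverse4 only as often as the transform is assigned.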

After adding a bit of tuning here and there (e.g. in the code that multiplies a matrix by a vector) and support for multiple threads, I thought I had the tracer running at a decent speed. The "Reflections and refractions" image, rendered at 800x400 with 64 samples per pixel, took 4 minutes and 14 seconds using 4 threads.

After the triangles chapter I went back to the code and used a profiler on it.

A bit surprisingly, most of the rendering time was spent managing memory and in particular the intersection lists. These lists get continuously created, used and abandoned, which puts a lot of stress on the memory management library and on the garbage collector if you have one.

For example, I was creating a new list at every invocation of intersect_world(). But once we have taken the hit out of it and called prepare_comps(), the list is not used anymore and can be recycled for the next intersection test.
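
A rough sketch of the recycling pattern, in Python and under simplifying assumptions: Sphere is a unit sphere at the origin, rays are plain (origin, direction) tuples, intersections are (t, shape) tuples, and the extra `out` parameter is my own illustrative addition to intersect_world() and local_intersect(). How much this saves depends on the language and runtime; the shape of the code is what matters here.

```python
import math

# Hypothetical sketch: a unit sphere at the origin, intersections stored as
# (t, shape) tuples, and an explicit `out` list supplied by the caller.

class Sphere:
    def local_intersect(self, origin, direction, out):
        # Append hits to the caller's list instead of returning a new one.
        a = sum(d * d for d in direction)
        b = 2.0 * sum(o * d for o, d in zip(origin, direction))
        c = sum(o * o for o in origin) - 1.0
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            return
        root = math.sqrt(disc)
        out.append(((-b - root) / (2.0 * a), self))
        out.append(((-b + root) / (2.0 * a), self))


def intersect_world(shapes, origin, direction, out):
    out.clear()  # recycle the list left over from the previous ray
    for shape in shapes:
        shape.local_intersect(origin, direction, out)
    out.sort(key=lambda x: x[0])  # sort by t only
    return out


def hit(xs):
    # First intersection with non-negative t, or None if there is none.
    for t, shape in xs:
        if t >= 0.0:
            return (t, shape)
    return None
```

The caller creates one scratch list per thread and passes it to every intersect_world() call. The catch is that everything needed from the list (the hit, the data for prepare_comps()) has to be extracted before the next call overwrites it, which matters for recursive reflection and refraction rays.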

To make a long story short, I removed almost every memory allocation from the rendering code, i.e. from all the code triggered by color_at(): basically the intersection lists and the information returned by prepare_comps(). I did this one little step at a time, making sure all tests kept passing (I had to temporarily comment out a few along the way).

The same picture now renders in 1 minute and 18 seconds... wow, that's a huge improvement! And the profile is now starting to show actual raytracing code in the top positions, which is good :-)

Well... I'll go back to the smooth triangles now, hope some of these tips can be useful to you too!

Jamis
Administrator

Posts: 310

Performance improvements Dec 28, 2018 23:37:17 GMT

Post by Jamis on Dec 28, 2018 23:37:17 GMT

That's great! Thanks for sharing. I think I'll go have a look in my own code at reusing some of these data structures...

seye
New Member

Posts: 1

Performance improvements Jan 26, 2019 17:35:00 GMT Jamis and ascotti like this

Post by seye on Jan 26, 2019 17:35:00 GMT

To echo what ascotti has said, I got an approximately 100x speedup (after profiling) from just caching the objects' inverse transformation matrices, so it would definitely be a useful addition to any future editions. Oddly enough, I'd suspected that the memory management and sorting of the intersection lists would be the killer, but that doesn't seem to be too bad. I might try tweaking it later as fine tuning, but at the moment I'm not going to get too fancy and will just trust my GC.

gbordelon
New Member

Posts: 17

Performance improvements Oct 19, 2019 10:38:15 GMT Jamis likes this

Post by gbordelon on Oct 19, 2019 10:38:15 GMT

My tracer gained some speed after caching bounding boxes. Once the world has been created and the BVH established, the bounds shouldn't change, so all the bounding boxes only need to be computed once.
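
A small sketch of what cached bounds can look like, written in Python rather than my C and with made-up names (Leaf, Group, finalize, bounds_computed). Boxes are (min_point, max_point) tuples, and groups are assumed to be non-empty and unchanged after the scene is built.

```python
# Hypothetical sketch: boxes are (min_point, max_point) tuples, groups are
# assumed non-empty, and nothing is moved after the scene has been built.

class Leaf:
    def __init__(self, lo, hi):
        self._bounds = (lo, hi)

    def bounds(self):
        return self._bounds


class Group:
    def __init__(self, children):
        self.children = list(children)
        self._bounds = None
        self.bounds_computed = 0  # demonstration counter only

    def bounds(self):
        # Lazy variant: compute on first use, then reuse the stored box.
        if self._bounds is None:
            boxes = [c.bounds() for c in self.children]
            lo = tuple(min(b[0][i] for b in boxes) for i in range(3))
            hi = tuple(max(b[1][i] for b in boxes) for i in range(3))
            self._bounds = (lo, hi)
            self.bounds_computed += 1
        return self._bounds

    def finalize(self):
        # Eager variant: call once after the scene is built so that no box
        # is computed while rendering.
        for c in self.children:
            if isinstance(c, Group):
                c.finalize()
        self.bounds()
```

If a group can still change after finalize(), the cached box has to be reset to None at that point, otherwise the BVH will cull against stale bounds.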

I also gained a huge perf boost by removing dynamic intersection arrays. Each object has its own array of intersections, which gets reused. This isn't thread-safe, so instead of adding synchronization I created a way to recursively make a complete copy of a world, so that each thread gets its own world (and its own set of intersection arrays). Obviously lights, textures and other read-only structures don't need to be duplicated.
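
To illustrate the per-thread copy, here is a sketch in Python instead of C. All the class names are invented for the example, and the "rendering" is a stand-in that just touches each shape's scratch list. It leans on Python's copy.deepcopy with a __deepcopy__ override so that read-only lights are shared while shapes and their buffers are duplicated; in C the same split would be done by hand in the copy routine.

```python
import copy
import threading

# Hypothetical sketch: Light, Shape, World, clone_for_thread and render_rows
# are illustrative names; the loop in render_rows is placeholder work.

class Light:
    def __init__(self, intensity):
        self.intensity = intensity

    def __deepcopy__(self, memo):
        # Read-only during rendering, so every thread can share this object.
        return self


class Shape:
    def __init__(self, name):
        self.name = name
        self.scratch = []  # per-object intersection buffer, not thread-safe


class World:
    def __init__(self, shapes, lights):
        self.shapes = shapes
        self.lights = lights


def clone_for_thread(world):
    # Shapes (and their scratch lists) are duplicated; lights are shared.
    return copy.deepcopy(world)


def render_rows(world, rows, results):
    local = clone_for_thread(world)  # this thread's private world
    for y in rows:
        for shape in local.shapes:
            shape.scratch.clear()
            shape.scratch.append(y)  # stand-in for real intersection work
        results[y] = sum(len(s.scratch) for s in local.shapes)
```

Each worker thread calls render_rows() with the shared world and its own set of rows, so no scratch buffer is ever visible to two threads.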

mariog
New Member

Posts: 9

Performance improvements Oct 25, 2019 13:47:23 GMT Jamis likes this

Post by mariog on Oct 25, 2019 13:47:23 GMT

Hi,

I followed ascotti's performance tips for my renderer (which is written in Java): I parallelized it and cached the inverse transform matrix, and saw a massive boost in performance. The image below is 1024x768. Over multiple renders, the best time was 1.1 seconds and the worst was 2.3 seconds (down from 32 seconds) on a Dell G3 laptop with 6 cores at 2.6 GHz. Amazing! Thanks.

Last Edit: Oct 25, 2019 15:00:01 GMT by mariog

gbordelon
New Member

Posts: 17

Performance improvements Oct 29, 2019 3:55:22 GMT

Post by gbordelon on Oct 29, 2019 3:55:22 GMT

My tracer got a big perf boost for scenes with lots of triangles and a BVH. When a group's local_intersect function is collecting all its children's intersections for a shadow calculation, it can stop after finding the first intersection with t > 0. That optimization halved the render time for a scene with about 9k triangles.
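
Here is the early exit as a sketch, in Python rather than C and with hypothetical names (is_shadowed taking a light_distance argument, a hit_distances method, and a PlaneZ test shape). One assumption goes beyond what I described above: the sketch also requires t to be smaller than the distance to the light, which is the usual shadow condition, so that objects beyond the light don't count.

```python
# Hypothetical sketch: each child exposes hit_distances(origin, direction),
# returning the t values where the ray crosses it. All names are illustrative.

EPSILON = 1e-5


class PlaneZ:
    """Plane z = k, present only to exercise is_shadowed()."""

    def __init__(self, k):
        self.k = k
        self.tested = 0  # demonstration counter only

    def hit_distances(self, origin, direction):
        self.tested += 1
        dz = direction[2]
        if abs(dz) < EPSILON:
            return []  # ray is parallel to the plane
        return [(self.k - origin[2]) / dz]


def is_shadowed(children, origin, direction, light_distance):
    """Return True as soon as any child blocks the path to the light."""
    for child in children:
        for t in child.hit_distances(origin, direction):
            # Only hits between the point and the light cast a shadow.
            if EPSILON < t < light_distance:
                return True  # early exit: no list is built or sorted
    return False
```

This shortcut only applies to shadow rays, where any blocker is enough. Camera, reflection and refraction rays still need the nearest hit.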

My tracer is written in C. After adding -O2 to the makefile's cc args, my tracer got a huge perf boost. I also added -march=native to the cc args but I have not compared perf yet.

Jamis
Administrator

Posts: 310

Performance improvements Nov 8, 2019 4:18:24 GMT

Post by Jamis on Nov 8, 2019 4:18:24 GMT

gbordelon -- dropping all intersections that are behind the eye will speed things up a bit, but those intersections are useful when doing refraction (so you can find the index of refraction of the material that contains the eye) and necessary when doing CSG (so that you know which shape contains each intersection).

jdunlap
New Member

Posts: 18

Performance improvements Jan 22, 2021 18:55:19 GMT

Post by jdunlap on Jan 22, 2021 18:55:19 GMT

For my tracer, written in C#, I got a huge performance boost by caching inverse transforms. The Refraction/Reflection render from the book (Jamis posted the YAML for it elsewhere on the forum) went from slightly over 8 minutes to 10.4 seconds. That is a huge improvement! I was already caching bounding boxes for groups, although I just save them on first calculation instead of pre-calculating them all. I might switch, since pre-calculation would remove a null check from each call. It's not a big time consumer, but it can add up.
