Second Performance Test: nearly 400% faster!



After observing the behavior of the previous test, I rearranged the threading architecture for even bigger performance gains. This build runs at more than 400 FPS with 100,000 entities... on Intel integrated graphics!


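I've had more luck designing for concurrency, running different parts of the engine at the same time, than with parallelism, splitting a single task across several threads. (The images below are taken from here.)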

Splitting the octree recursion into separate threads produced only modest gains. It's difficult to optimize because the sparse octree is unpredictable, so the work can't easily be divided evenly between threads.
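To illustrate why octree recursion is awkward to split, here is a minimal C++ sketch. It assumes a hypothetical `OctreeNode` type and uses a simple entity count as a stand-in for real culling work; it is not Leadwerks code. Each non-empty top-level child is handed to its own `std::async` task, and because the tree is sparse, one task can end up doing almost all of the work.

```cpp
#include <array>
#include <cstddef>
#include <future>
#include <memory>
#include <vector>

// Hypothetical sparse octree node: any of the 8 children may be null.
struct OctreeNode {
    std::array<std::unique_ptr<OctreeNode>, 8> children;
    std::size_t entityCount = 0;  // entities stored directly in this node
};

// Serial depth-first traversal: total number of entities in the subtree.
std::size_t CountEntities(const OctreeNode* node) {
    if (node == nullptr) return 0;
    std::size_t total = node->entityCount;
    for (const auto& child : node->children) total += CountEntities(child.get());
    return total;
}

// Parallel variant: each non-empty top-level child is traversed on its own task.
// Subtree sizes in a sparse octree can differ wildly, so the tasks are often
// badly unbalanced and the speedup is limited by the largest subtree.
std::size_t CountEntitiesParallel(const OctreeNode& root) {
    std::vector<std::future<std::size_t>> tasks;
    for (const auto& child : root.children) {
        if (child) {
            const OctreeNode* subtree = child.get();
            tasks.push_back(std::async(std::launch::async,
                                       [subtree] { return CountEntities(subtree); }));
        }
    }
    std::size_t total = root.entityCount;
    for (auto& task : tasks) total += task.get();  // wait for every subtree
    return total;
}
```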


Splitting different parts of the engine up into multiple threads did result in a massive performance boost.
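A minimal C++ sketch of that second approach, assuming two hypothetical stages (culling and rendering) connected by a blocking queue. Each stage runs on its own `std::thread`, so one frame can be culled while the previous one is being drawn. The names `VisibilityList`, `StageQueue`, and `RunPipeline` are illustrative only and are not part of the Leadwerks API.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical per-frame output of the culling stage.
struct VisibilityList {
    int frame = 0;
    std::vector<int> visibleIds;
};

// Minimal blocking queue that hands finished work from one stage to the next.
template <typename T>
class StageQueue {
public:
    void Push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(item));
        }
        ready_.notify_one();
    }
    void Close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }
    // Blocks until an item is available; returns nullopt once closed and drained.
    std::optional<T> Pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return std::nullopt;
        T item = std::move(items_.front());
        items_.pop();
        return item;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
    bool closed_ = false;
};

// Runs a "culling" stage and a "render" stage on separate threads.
// Returns the frame numbers in the order the render stage consumed them.
std::vector<int> RunPipeline(int frameCount) {
    StageQueue<VisibilityList> queue;
    std::vector<int> renderedFrames;  // written only by the render thread

    std::thread culling([&] {
        for (int f = 0; f < frameCount; ++f) {
            VisibilityList list;
            list.frame = f;
            list.visibleIds = {f, f + 1};  // placeholder culling result
            queue.Push(std::move(list));
        }
        queue.Close();  // tell the render stage no more frames are coming
    });

    std::thread render([&] {
        while (auto list = queue.Pop()) {
            renderedFrames.push_back(list->frame);  // placeholder for draw calls
        }
    });

    culling.join();
    render.join();
    return renderedFrames;
}
```

The queue here is unbounded; a real engine would more likely cap how many frames one stage is allowed to run ahead of the next.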


The same test in Leadwerks 4 runs at about 9 FPS, making Leadwerks 5 more than 45 times faster under heavy loads like this.

Alpha subscribers can try the test out here.

Recommended Comments

When you're doing this threading, it's really more about the processor than the gfx card, isn't it? Are these threads on the CPU or the GPU?

CPU. The rendering code is already very optimized, and this is about eliminating all overhead on the CPU side.

As each thread is doing a smaller subset of the work, you are probably getting more cache hits. Are you also using thread affinity on your busiest threads to stop them from context switching?
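For reference, a Linux-only sketch of pinning a thread to one core with glibc's `pthread_setaffinity_np`, which is a GNU extension (on Windows the rough equivalent is `SetThreadAffinityMask`). The helper names are hypothetical. Note that pinning keeps a thread from migrating between cores; it does not by itself stop other threads from being scheduled on the same core.

```cpp
#include <pthread.h>
#include <sched.h>
#include <thread>

// Linux/glibc only: the *_np affinity calls are GNU extensions
// (g++ defines _GNU_SOURCE by default, which exposes them).

// Returns the lowest-numbered CPU the calling thread may run on, or -1 on error.
int FirstAllowedCpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &set)) return cpu;
    }
    return -1;
}

// Restricts the calling thread to a single CPU. Returns true on success.
bool PinCurrentThreadToCpu(int cpu) {
    if (cpu < 0 || cpu >= CPU_SETSIZE) return false;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0;
}

// Number of CPUs the calling thread is currently allowed to run on (-1 on error).
int AllowedCpuCount() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0) return -1;
    return CPU_COUNT(&set);
}

// Starts `work` on a new thread that first pins itself to `cpu`.
// Pinning is best effort here: if it fails, the work still runs unpinned.
template <typename Work>
std::thread StartPinnedWorker(int cpu, Work work) {
    return std::thread([cpu, work] {
        PinCurrentThreadToCpu(cpu);
        work();
    });
}
```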


I got the culling time down to an insanely low amount, and it would actually be much slower if I split it up into multiple threads.
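One generic way to express that trade-off in code is a size threshold below which the work simply stays on one thread. This is a C++ sketch, assuming integer summation as a stand-in for culling and a hypothetical `kParallelThreshold` that would have to be tuned by profiling; it is not how Leadwerks decides.

```cpp
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical cutoff: below this many items, launching and joining tasks is
// assumed to cost more than it saves, so the work stays on the calling thread.
constexpr std::size_t kParallelThreshold = 10000;

// Sums values[begin, end) on the calling thread.
long long SumRange(const std::vector<int>& values, std::size_t begin, std::size_t end) {
    return std::accumulate(values.begin() + static_cast<std::ptrdiff_t>(begin),
                           values.begin() + static_cast<std::ptrdiff_t>(end), 0LL);
}

// Splits the work across `taskCount` tasks only when there is enough of it.
long long SumMaybeParallel(const std::vector<int>& values, std::size_t taskCount) {
    if (values.size() < kParallelThreshold || taskCount < 2) {
        return SumRange(values, 0, values.size());  // small job: no threading overhead
    }
    const std::size_t chunk = values.size() / taskCount;
    std::vector<std::future<long long>> tasks;
    for (std::size_t t = 0; t < taskCount; ++t) {
        const std::size_t begin = t * chunk;
        // The last task also takes any remainder left by the integer division.
        const std::size_t end = (t + 1 == taskCount) ? values.size() : begin + chunk;
        tasks.push_back(std::async(std::launch::async,
                                   [&values, begin, end] { return SumRange(values, begin, end); }));
    }
    long long total = 0;
    for (auto& task : tasks) total += task.get();
    return total;
}
```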


Sorry, I assumed that each task was on its own thread, running independently. I was suggesting that if you had more active threads than CPUs, you would experience contention for those CPUs. You would see this in the performance stats as context switches, each of which causes the current context to be saved and another to be loaded. If this happens a lot, you are losing useful CPU processing power.
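A common way to avoid that kind of oversubscription is to size a worker pool from `std::thread::hardware_concurrency()`, leaving room for threads the program already keeps busy. A minimal C++ sketch; `WorkerThreadCount` and its parameter are hypothetical.

```cpp
#include <algorithm>
#include <thread>

// Chooses how many worker threads to create so that busy threads do not
// outnumber hardware threads (which would force the OS to time-slice them).
// `threadsAlreadyBusy` counts threads the program already runs elsewhere,
// such as a main or render thread.
unsigned WorkerThreadCount(unsigned threadsAlreadyBusy) {
    unsigned hw = std::thread::hardware_concurrency();
    if (hw == 0) hw = 1;  // value unknown: assume a single hardware thread
    unsigned spare = hw > threadsAlreadyBusy ? hw - threadsAlreadyBusy : 0u;
    return std::max(1u, spare);  // keep at least one worker so tasks still run
}
```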

