After observing the behavior of the previous test, I rearranged the threading architecture for even bigger performance gains. This build runs at more than 400 FPS with 100,000 entities... on Intel integrated graphics!
I've had more luck with concurrency in design than with parallelism. (Images below are taken from here.)
Splitting the octree recursion up into separate threads produced only modest gains. It's difficult to optimize because the sparse octree is unpredictable.
Splitting different parts of the engine up into multiple threads did result in a massive performance boost.
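To make the distinction concrete, here is a minimal C++ sketch of the "concurrency in design" idea: game logic and rendering each run on their own thread and only meet at a small shared snapshot. This is not Leadwerks code. The names EntityState, SharedSnapshot, and runEngine are hypothetical, and the mutex-protected copy is just the simplest hand-off I could show, not necessarily what the engine does.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for per-entity state; not an actual engine type.
struct EntityState { float x = 0.0f; };

// Latest world snapshot, handed from the logic thread to the rendering thread.
class SharedSnapshot {
public:
    void publish(const std::vector<EntityState>& states) {
        std::lock_guard<std::mutex> lock(mutex_);
        states_ = states;
        ++version_;
    }
    // Copies the latest snapshot into `out` and returns its version number.
    std::size_t fetch(std::vector<EntityState>& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        out = states_;
        return version_;
    }
private:
    std::mutex mutex_;
    std::vector<EntityState> states_;
    std::size_t version_ = 0;
};

struct RunStats {
    std::size_t logicTicks;
    std::size_t renderFrames;
    std::size_t lastVersionSeen;
};

// Runs `logicTicks` logic updates while a separate thread "renders" freely.
RunStats runEngine(std::size_t entityCount, std::size_t logicTicks) {
    SharedSnapshot snapshot;
    std::atomic<bool> running{true};
    std::size_t frames = 0;
    std::size_t lastVersion = 0;

    // Rendering thread: never waits for a logic tick, it just takes
    // whatever snapshot is newest and draws it.
    std::thread render([&] {
        std::vector<EntityState> local;
        while (running.load()) {
            lastVersion = snapshot.fetch(local);
            ++frames;  // a real renderer would cull and draw `local` here
        }
        lastVersion = snapshot.fetch(local);  // one last frame after logic stops
        ++frames;
    });

    // Logic thread (the caller's thread here): updates the world at its own pace.
    std::vector<EntityState> world(entityCount);
    for (std::size_t tick = 0; tick < logicTicks; ++tick) {
        for (auto& e : world) e.x += 1.0f;
        snapshot.publish(world);
    }

    running.store(false);
    render.join();

    // The final fetch ran after every publish, so it saw the last version.
    assert(lastVersion == logicTicks);
    return {logicTicks, frames, lastVersion};
}
```

Calling runEngine(100000, 60) runs 60 logic ticks while the other thread renders as many frames as it can in the meantime. The point is that neither loop is throttled by the other, which is a different thing from trying to split one loop's work across cores.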
The same test in Leadwerks 4 runs at about 9 FPS, making Leadwerks 5 more than 45 times faster under heavy loads like this.
Alpha subscribers can try the test out here.