I have resumed work on voxel-based global illumination using voxel cone step tracing in Leadwerks Game Engine 5 beta with our Vulkan renderer. I previously put about three months of work into this with some promising results, but it is a very difficult system and I wanted to focus on Vulkan. Some of the features we have gained since then, like Pixmaps and DXT decompression, make the voxel GI system easier to finish.
I previously considered implementing Nvidia's raytracing techniques for Vulkan, but the performance is terrible, even on the highest-end graphics cards. Voxel-based GI looks great and runs fast with basically no performance penalty.
Below we have a section of the scene voxelized and lit with direct lighting. Loading the Sponza scene from GLTF format made it easy to display all materials and textures correctly.
I found that the fastest way to manage voxel data was by storing the data in one big STL vector, and storing an STL set of occupied cells. (An STL set is like a map with only keys.) I found the fastest way to perform voxel raycasting was actually just to walk through the voxel data with optimized C++ code. This was much faster than my previous attempts to use octrees, and much simpler too! The above scene took about 100 milliseconds to calculate direct lighting on a single CPU core, which is three times faster than my previous attempts. This means that CPU-based GI lighting may well be possible, which is my preferred approach. It's easier to implement, easy to parallelize, more flexible, more reliable, uses less video memory, transfers less data to the GPU, and doesn't draw any GPU power away from rendering the rest of the scene.
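To make the storage and raycasting idea a bit more concrete, here is a minimal sketch of a flat voxel grid with a set of occupied cells, plus a ray walk that steps from cell to cell. This is not the engine's actual code. The names VoxelGrid, fill, and raycast are made up for illustration, the packed RGBA layout with 0 meaning "empty" is an assumption, and the traversal is the standard Amanatides-Woo grid walk, which may differ from the optimized routine described above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <set>
#include <vector>

// Hypothetical names throughout; this is an illustration, not the Leadwerks API.
struct VoxelGrid {
    int size;                          // cells per axis
    std::vector<std::uint32_t> color;  // one packed RGBA value per cell, 0 = empty (assumed layout)
    std::set<std::size_t> occupied;    // flat indices of the filled cells

    explicit VoxelGrid(int n) : size(n), color(std::size_t(n) * n * n, 0) {}

    bool inBounds(int x, int y, int z) const {
        return x >= 0 && y >= 0 && z >= 0 && x < size && y < size && z < size;
    }
    std::size_t index(int x, int y, int z) const {
        assert(inBounds(x, y, z));
        return (std::size_t(z) * size + y) * size + x;
    }
    void fill(int x, int y, int z, std::uint32_t rgba) {
        assert(rgba != 0);  // 0 is reserved for "empty"
        const std::size_t i = index(x, y, z);
        color[i] = rgba;
        occupied.insert(i);
    }
    bool solid(int x, int y, int z) const { return color[index(x, y, z)] != 0; }
};

// Walks the grid one cell at a time along a ray until it finds a filled cell
// or leaves the grid. origin is in voxel units; dir need not be normalized.
bool raycast(const VoxelGrid& g, const float origin[3], const float dir[3], int hit[3]) {
    const float inf = std::numeric_limits<float>::infinity();
    int cell[3], step[3];
    float tMax[3], tDelta[3];
    for (int a = 0; a < 3; ++a) {
        cell[a] = int(std::floor(origin[a]));
        if (dir[a] > 0) {
            step[a] = 1;
            tDelta[a] = 1.0f / dir[a];
            tMax[a] = (cell[a] + 1 - origin[a]) / dir[a];
        } else if (dir[a] < 0) {
            step[a] = -1;
            tDelta[a] = -1.0f / dir[a];
            tMax[a] = (cell[a] - origin[a]) / dir[a];
        } else {
            step[a] = 0;  // the ray never crosses a boundary on this axis
            tDelta[a] = inf;
            tMax[a] = inf;
        }
    }
    while (g.inBounds(cell[0], cell[1], cell[2])) {
        if (g.solid(cell[0], cell[1], cell[2])) {
            hit[0] = cell[0]; hit[1] = cell[1]; hit[2] = cell[2];
            return true;
        }
        // Advance along the axis whose next cell boundary is closest.
        int a = 0;
        if (tMax[1] < tMax[a]) a = 1;
        if (tMax[2] < tMax[a]) a = 2;
        if (step[a] == 0) return false;  // zero-length direction
        cell[a] += step[a];
        tMax[a] += tDelta[a];
    }
    return false;
}
```

In this sketch the ray test only ever reads the flat vector, so each step is a single array lookup, while the set is there for passes that need to visit only the filled cells, such as a lighting loop.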
The challenge will be in minimizing the delay between when an object moves, when GI is recalculated, and when the data is uploaded to the GPU and appears onscreen. I am guessing a delay somewhere around 200 milliseconds will be acceptable. It should also be considered that only an onscreen object will have a perceived delay if the reflection is slow to appear. An offscreen object will have no perceived delay because you can only see the reflection. Using screen-space reflections on pixels that can use them is one way to mitigate that problem, but if possible I would prefer to use one uniform system instead of mixing two rendering techniques.
If this does not work, then I will upload a DXT-compressed texture containing the voxel data to the GPU. There are several stages at which the data can be handed off, so the question is which one works best.
My design has changed a bit, but this is a pretty graphic.
Using the pixmap class I will be able to load low-resolution versions of textures into system memory, decompress them to a readable format, and use that data to colorize the voxels according to the textures and UV coordinates of the vertices that are fed into the voxelization process.
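As a rough sketch of that colorizing step, the code below looks up a color in an already-decompressed RGBA image using texture coordinates interpolated across a triangle. It does not use the real pixmap class. Image, Color, sample, and voxelColor are hypothetical names, and the nearest-neighbor lookup with repeat wrapping and the barycentric weights are assumptions about how the lookup might be done.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a decompressed low-resolution texture:
// tightly packed 8-bit RGBA pixels, row by row.
struct Image {
    int width;
    int height;
    std::vector<std::uint8_t> rgba;  // width * height * 4 bytes
};

struct Color {
    std::uint8_t r, g, b, a;
};

// Nearest-neighbor lookup with repeat wrapping (an assumed sampling mode).
Color sample(const Image& img, float u, float v) {
    assert(img.width > 0 && img.height > 0);
    assert(img.rgba.size() == std::size_t(img.width) * img.height * 4);
    u -= std::floor(u);  // wrap into [0, 1)
    v -= std::floor(v);
    // Clamp guards against u or v rounding up to exactly 1.0.
    const int x = std::min(int(u * img.width), img.width - 1);
    const int y = std::min(int(v * img.height), img.height - 1);
    const std::uint8_t* p = &img.rgba[(std::size_t(y) * img.width + x) * 4];
    return Color{p[0], p[1], p[2], p[3]};
}

// Blends the three vertex UVs with barycentric weights (w0 + w1 + w2 == 1)
// for the point on the triangle that falls inside a voxel, then samples there.
Color voxelColor(const Image& img,
                 const float uv0[2], const float uv1[2], const float uv2[2],
                 float w0, float w1, float w2) {
    const float u = w0 * uv0[0] + w1 * uv1[0] + w2 * uv2[0];
    const float v = w0 * uv0[1] + w1 * uv1[1] + w2 * uv2[1];
    return sample(img, u, v);
}
```

A low-resolution image is probably enough here, since each voxel covers far more surface area than a single texel of the full-size texture would.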