Blog Entries posted by Josh

  1. Josh
    I had to spend several weeks just eliminating light leaks and other artifacts, and getting the results I wanted in a variety of scenes. The results are looking good. Everyone who tries implementing this technique has problems with light leaks but I have fortunately been able to avoid this with careful planning:

    Now that I have nice results with a single volume texture centered at the origin, it's time to add additional stages. The idea is to have a cascading series of volume textures around the camera, where each volume is twice the dimensions (and eight times the volume) of the previous one. This allows us to cover a large area of the scene, while using lower-resolution data for parts of the scene that are further away from the camera:

    Putting this into practice, in this shot you can see some ambient occlusion and reflections in the trench. (The metalness is turned up really high to make it more easily visible, so don't worry about the bright white specks.)

    A little bit further from the center of the scene, we see another trench. This appears in the second GI volume with half the voxel resolution of the previous image, so it has some small artifacts. However, this volume will also be further from the camera, so those artifacts won't be noticeable:

    The transition between stages is good enough. If you look carefully at the floor between stage 0 and 1, the reflection of the window is lost in the lower-resolution stage 1. On the wall in between stage 1 and 2 the boundary is visible. However, the camera is going to be further away so those artifacts won't be as apparent as they are now.

    From the outside, we can see four 64x64x64 volume textures can be used to cover the entire train station, with a base voxel size of 12.5 centimeters.

    To cover the same area with a single volume texture we would need a 512x512x512 texture. Using several cascaded volume textures brings our memory usage down to less than 1% of what it would be otherwise:
    64*64*64*4 = 1048576
    512*512*512 = 134217728
    1048576 / 134217728 * 100 = 0.78125%
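    The arithmetic above can be reproduced with a small C++ sketch. The helper names are hypothetical and not part of Ultra Engine; the only inputs taken from the post are the 64x64x64 resolution, the four stages, the doubling per stage, and the 12.5 cm base voxel size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: world-space width (in meters) covered by one cascade stage.
// Stage 0 uses the base voxel size; each later stage doubles the voxel size.
double StageExtent(int resolution, double baseVoxelSize, int stage)
{
    assert(stage >= 0 && resolution > 0);
    return resolution * baseVoxelSize * double(1u << stage);
}

// Total voxel count for `stages` cascaded cubes, each `resolution` voxels wide.
std::uint64_t CascadeVoxelCount(std::uint64_t resolution, std::uint64_t stages)
{
    return resolution * resolution * resolution * stages;
}

// Voxel count of one cube that keeps the finest voxel size but spans the outermost stage.
std::uint64_t SingleVolumeVoxelCount(std::uint64_t resolution, int stages)
{
    assert(stages >= 1);
    const std::uint64_t side = resolution << (stages - 1);
    return side * side * side;
}
```

    With these inputs, stage 0 spans 8 meters and stage 3 spans 64 meters, which is the same width a 512-voxel cube would cover at 12.5 cm per voxel.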
    There is still a lot of room for optimization. We can perform the voxelization step for all four stages in one single pass using multi-target rendering, like the pointlight shadow shader does. We could also distribute the GI stage updates so that only one gets drawn each frame, since even objects in motion probably won't cause a change in the voxelized result every single frame. Right now I am just focusing on optimizing the shader and rendering everything each frame, so I can deal with a worst case scenario before I start adding techniques that will make it harder to measure performance.
    Without doing any performance tests, the rendering seems quite fast. I am not even using this in a real-time application right now, but I definitely get a feel for how responsive the viewport rendering is to mouse movement, and it seems to be very snappy so far. I can definitely bog it down if I turn up the settings too high.
    There are a lot of settings that are very scalable for performance / quality such as voxel resolution, number of cascaded stages, maximum ray steps, and number of light bounces.
    I've done some initial tests trying to make the volumes move around with the camera, and that produced a lot of new and strange artifacts I didn't anticipate. So I think we can expect a few more weeks of slow but steady progress as I dive into this even deeper.
    There are a lot of academic papers and demos out there that show this technique, but delivering a complete solution that produces good results for any scene is quite a challenge, and I've been working on this for four years, with the last three months spent pretty much full-time, just on this feature!
    But I am glad to do this, because my love for you is so great and I want you to be happy.

  2. Josh
    Finally, finally, finally, finally, for the first time since I started working on this feature several years ago, finally we have real-time global illumination with a second light bounce. Below you can see the direct light hitting the floor, bouncing up to the ceiling, and then being reflected back down onto the floor again.

    Performance is still good and I have not started fine-tuning optimization yet. I was just trying to get the effect working at all, which was quite difficult to do, but now it works great.
    In the shot above, ambient light is set to black. Although the GI does light up a lot of the room, there is probably still a place for a small amount of flat ambient light, which can then be darkened by the ambient occlusion effect.
    Light leakage is really not an issue. The algorithm does a good job keeping dark indoor areas dark, and only lets light in where it should be.

    While areas with more sunlight exposure are considerably brighter:

    Everything else about Ultra Engine is great, but I think this is going to be the thing that really attracts a lot of people. I am very glad I get to include a feature so groundbreaking and amazing as this.
  3. Josh

    Now that I have the downsampled reflection data working, I can start casting rays. The cone step tracing is not a 100% perfect representation of physical light, but it gives a very favorable balance of quality and performance. Somehow I came up with a few formulas that eliminate light leaks and other artifacts.

    Quite honestly I did not think the results would be this good. Indoor / outdoor scenes with thin walls are very difficult to prevent light leaks in, but somehow it's working very nicely, even with a blurry reflection. I am seeing some banding, but I'm sure that can be ironed out.

    It's good to test reflections first because global illumination is just a bunch of reflection raycasts.

    Adding in the GI component will be the next step.
  4. Josh
    For downsampling of GI voxel data, I found that a compute shader offers the best performance. The first step was to add support for compute shaders into Ultra Engine. 
    I've never used these before but I was able to get them working pretty quickly. I think the user API will look something like this:
        //Load compute shader
        auto module = LoadShaderModule("Shaders/Compute/test.comp.spv");
        auto shader = CreateShader();
        shader->SetModule(module, SHADER_COMPUTE);

        //Create work group
        int workercount = 8;
        auto workgroup = CreateWorkgroup(shader, workercount, workercount, workercount);
    This is not using my final formula. It's not even using bilinear sampling, it's just using a single sample from the previous mipmap. But it's basically working:

    Next I will fix up the downfilter sampling and then we can start doing GI and reflections again. Basically, we are in exactly the same place we were a month ago, except now the voxelization process is instantaneous, and will work with animated models and objects in motion. Below is a shot of the results I was getting previously:

    It will be very interesting to see this working with animated models. Maybe we can remake the first level of Quake 2 and compare it to the RTX version.
    I'd like to thank Vilém Otte for his helpful advice with this.
  5. Josh

    After testing and some discussion with other programmers, I decided to try performing voxelization on the GPU instead of the CPU. The downside is the memory usage is much higher than a sparse voxel octree, but I found that sparse voxel octrees were very slow when it came to soft reflections, although the results of the sharp raycast were impressive:
    You can read the details of GPU voxelization here if you wish.

    Initially I thought the process would require rendering the scene in one pass for each slice of the volume texture, but it is actually possible to voxelize a 3D scene in just one single pass. Implementation was fairly difficult, but it's finally working:

    The voxelization process is very fast, fast enough to do in real-time. That means the problems I had earlier with reflection lag (shown below) should be eliminated with this approach. Animated characters should work with absolutely no problem as well:
    I am experiencing some flickering when multiple triangles contribute to a voxel. Because the volume texture is written to in random order, one imageStore() operation might overwrite the result of another one, and the "winning" pixel (last one drawn) can change from frame to frame. So that's something that needs to be solved.
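    One possible way to make the result independent of write order (an assumption on my part, not something this post says the engine does) is to replace the plain store with an atomic "keep the largest value" update, since max gives the same answer no matter which fragment arrives last. The CPU-side C++ sketch below only illustrates that idea; `StoreMax` and `ResolveVoxel` are hypothetical names, and the packed 32-bit value stands in for whatever per-voxel data is being written.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Store `value` only if it is larger than what the voxel already holds.
// Max is commutative, so the final content does not depend on write order.
// In this sketch 0 is reserved to mean "empty voxel".
void StoreMax(std::atomic<std::uint32_t>& voxel, std::uint32_t value)
{
    assert(value != 0);
    std::uint32_t current = voxel.load(std::memory_order_relaxed);
    while (current < value &&
           !voxel.compare_exchange_weak(current, value, std::memory_order_relaxed))
    {
        // A failed exchange reloads `current`; the loop condition re-checks it.
    }
}

// Write every fragment into a single voxel and return the surviving value.
std::uint32_t ResolveVoxel(const std::vector<std::uint32_t>& fragments)
{
    std::atomic<std::uint32_t> voxel{0};
    for (std::uint32_t f : fragments) StoreMax(voxel, f);
    return voxel.load();
}
```

    The trade-off is that the voxel keeps one "winning" contribution rather than a blend of all of them, so this removes the flicker but does not average overlapping triangles.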
  6. Josh

    I've got cone step tracing working now with the sparse voxel octree implementation. I actually found that different routines work best depending on whether the surface is rough or smooth. For sharp reflections, a precise voxel raytrace works best:

    For rough surfaces, cone step tracing can be used. There are some issues to work out and I need to revisit the downsampling routine, but it's basically working:

    Here's a video showing the sharp raycast in motion. Performance is quite good with this:
  7. Josh

    I've moved on to one of the final steps for voxel cone step tracing, which is downsampling the lit voxels in a way that approximates a large area of rays being cast. You can read more about the details of this technique here.
    This artifact looks like a mirror that is sunken below the surface of some kind of frame. It was appearing because the mesh surface was inside the voxel, and neighboring voxels were being intersected. The solution was to move the ray starting point out of the voxel the point was in, using the normal's largest axis to determine which direction to move:
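    A minimal C++ sketch of that fix follows. It assumes an axis-aligned voxel grid with its origin at (0,0,0), and the small `epsilon` bias is a hypothetical value chosen for illustration; the function names are not from the engine.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<float, 3>;

// Index (0 = X, 1 = Y, 2 = Z) of the normal component with the largest magnitude.
int DominantAxis(const Vec3& n)
{
    int axis = 0;
    for (int i = 1; i < 3; ++i)
        if (std::fabs(n[i]) > std::fabs(n[axis])) axis = i;
    return axis;
}

// Move the ray start just outside the voxel that contains `point`, stepping
// along the normal's dominant axis in the direction the normal faces.
Vec3 OffsetRayStart(const Vec3& point, const Vec3& normal, float voxelSize)
{
    assert(voxelSize > 0.0f);
    const int axis = DominantAxis(normal);
    const float cell = std::floor(point[axis] / voxelSize);
    const float epsilon = voxelSize * 0.001f; // hypothetical bias to clear the voxel face
    Vec3 start = point;
    if (normal[axis] >= 0.0f)
        start[axis] = (cell + 1.0f) * voxelSize + epsilon; // exit through the upper face
    else
        start[axis] = cell * voxelSize - epsilon;          // exit through the lower face
    return start;
}
```

    Only one coordinate changes, so the ray still starts directly above the surface point as seen along the dominant axis.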

    Once that was fixed the artifact disappeared. This series of images shows reflections read from each LOD level. The first image is full resolution, and each image after that gets lower-res and blockier. Notice the lighting in the reflections is much more accurate than in previous images.

    Because the downsampling routine does not yet consider the alpha value, the geometry has a tendency to grow as it is downsampled. The next step is to determine an equation that will consider the alpha component of each voxel, and use that to start fading the shapes out as the bigger voxels start spanning areas that are both solid and empty. This is the magic optimization that makes cone step tracing an imperfect but fast approximation of ray tracing for real-time rendering.
    A naive approach to downsampling would just take the average of the 8 child node colors. This would also result in a lot of light leaking. Instead, I took an average of the closest four children, then performed an alpha blend with the furthest four children, for each axis. When we add transparency into the downsampling and raycasting routine, the reflection gets more confusing, but it's generally correct. Most importantly, the skybox is not leaking through the reflection.
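    The idea of averaging the near half and blending it over the far half can be sketched in C++ as below. This is only my reading of the description, not the engine's formula: it assumes premultiplied-alpha colors, a child numbering where bit 0 is X, bit 1 is Y and bit 2 is Z, and a standard "over" blend. All names are hypothetical.

```cpp
#include <array>
#include <cassert>

// Premultiplied-alpha color (an assumption of this sketch).
struct RGBA { float r, g, b, a; };

// Plain average of the four children whose bit `axis` equals `side` (0 or 1).
RGBA AverageHalf(const std::array<RGBA, 8>& children, int axis, int side)
{
    RGBA sum{0.0f, 0.0f, 0.0f, 0.0f};
    for (int i = 0; i < 8; ++i)
    {
        if (((i >> axis) & 1) != side) continue;
        sum.r += children[i].r; sum.g += children[i].g;
        sum.b += children[i].b; sum.a += children[i].a;
    }
    return { sum.r * 0.25f, sum.g * 0.25f, sum.b * 0.25f, sum.a * 0.25f };
}

// Downsample for one viewing direction: the half nearest the viewer is blended
// over the half behind it, so an opaque near wall hides whatever lies beyond.
RGBA DownsampleDirectional(const std::array<RGBA, 8>& children, int axis, int nearSide)
{
    assert(axis >= 0 && axis < 3 && (nearSide == 0 || nearSide == 1));
    const RGBA nearAvg = AverageHalf(children, axis, nearSide);
    const RGBA farAvg = AverageHalf(children, axis, 1 - nearSide);
    const float t = 1.0f - nearAvg.a; // how much of the far half shows through
    return { nearAvg.r + farAvg.r * t, nearAvg.g + farAvg.g * t,
             nearAvg.b + farAvg.b * t, nearAvg.a + farAvg.a * t };
}
```

    Calling it once per axis and sign gives six results per parent voxel, which lines up with the six directional images mentioned in this post. A dark opaque wall in front of a bright sky stays dark, whereas a plain eight-way average would come out mid-gray.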

    I think there's a lot I can experiment with here. I'm using six images, with a lighting calculation for the positive and negative direction on each axis, but since it's only an approximation it might be possible to merge that into one image. The transparent areas are hitting interior faces of the voxels, which looks strange, but it is what I told it to do, and I am not sure what the alternative is. I've never actually seen any voxel cone step tracing demo that was this precise. Normally the reflection is not shown very clearly. So it's hard to know how I can improve it, but it's getting there.
  8. Josh

    I've now got basic specular reflections working with the sparse voxel octree system. This uses much less memory than a voxel grid or even a compressed volume texture. It also supports faster optimized ray tests, for higher quality reflections and higher resolution. Some of the images in this article were not possible to produce in my initial implementation that used volume textures.
    This shot shows the reflection of just the diffuse color. Notice the red column is visible in three reflections, but not in the reflected floor. It would be possible to add a secondary bounce to add reflections in reflections:

    With direct lighting added to the reflection, and the resolution turned up a bit, we can see the ray tracing is getting quite detailed. Of course, we prefer to use blurred downsampled data for voxel ray tracing, but the results so far indicate there is enough data to produce a good final result:

    In the shot below we are using a voxel size of about three centimeters, in a 1024x1024x1024 sparse voxel octree. A simple voxel grid would require 32 GB of video memory, but our structure fits into less than 240 MB.

    Turning the reflectivity up for all materials doesn't really look good and creates a confusing look, but it's still interesting to see. The amount of detail we see in the voxel reflections is quite good. The voxels are so high resolution we can even see the texture details of the original polygon mesh!

    The speed of the octree traversal routine is very important here, and I am in contact with some university faculty to see about implementing something special to give you the maximum possible performance.
    The next step is to downsample the octree data to display blurrier reflections. This will also be used for calculating GI.
  9. Josh

    While seeking a way to increase performance of octree ray traversal, I came across a lot of references to this paper:
    Funnily enough, the first page of the paper perfectly describes my first two attempted algorithms. I started with a nearest neighbor approach and then implemented a top-down recursive design:
    GLSL doesn't support recursive function calls, so I had to create a function that walks up and down the octree hierarchy without calling itself. This was an interesting challenge. You basically have to use a while loop and store your variables at each level in an array. Use a level integer to indicate the current level you are working at, and everything works out fine.
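    Before the shader version that follows, here is a CPU-side C++ sketch of the same pattern for anyone who wants to step through it in a debugger. It is not the engine's code: the `Node` layout, the `solid` flag, `kMaxLevels`, and the slab-test helper are all assumptions made for illustration. Like the shader, it reports whether the ray touches any filled voxel rather than finding the nearest one.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <utility>
#include <vector>

using Vec3 = std::array<float, 3>;

// Hypothetical node layout: child index 0 means "no child" (the root occupies slot 0).
struct Node
{
    int child[8] = {};
    bool solid = false; // true for a filled leaf voxel
};

// Standard slab test: does the ray origin + t * dir (t >= 0) touch the box [lo, hi]?
bool RayHitsBox(const Vec3& lo, const Vec3& hi, const Vec3& origin, const Vec3& dir)
{
    float tmin = 0.0f, tmax = 1.0e30f;
    for (int i = 0; i < 3; ++i)
    {
        if (dir[i] == 0.0f)
        {
            if (origin[i] < lo[i] || origin[i] > hi[i]) return false;
            continue;
        }
        float t0 = (lo[i] - origin[i]) / dir[i];
        float t1 = (hi[i] - origin[i]) / dir[i];
        if (t0 > t1) std::swap(t0, t1);
        tmin = std::max(tmin, t0);
        tmax = std::min(tmax, t1);
        if (tmin > tmax) return false;
    }
    return true;
}

constexpr int kMaxLevels = 16; // assumed upper bound on octree depth

// No recursion: the per-level state lives in small arrays indexed by `level`.
bool RayHitsOctree(const std::vector<Node>& nodes, const Vec3& rootMin, float rootSize,
                   const Vec3& origin, const Vec3& dir)
{
    assert(!nodes.empty());
    const Vec3 rootMax = { rootMin[0] + rootSize, rootMin[1] + rootSize, rootMin[2] + rootSize };
    if (!RayHitsBox(rootMin, rootMax, origin, dir)) return false;

    int node[kMaxLevels]; // node being expanded at each level
    int next[kMaxLevels]; // next child slot to try at each level
    Vec3 lo[kMaxLevels];  // minimum corner of that node
    int level = 0;
    float size = rootSize;
    node[0] = 0; next[0] = 0; lo[0] = rootMin;

    while (level >= 0)
    {
        if (next[level] == 8)
        {
            // Every child at this level has been tried: walk back up.
            --level;
            size *= 2.0f;
            continue;
        }
        const int slot = next[level]++;
        const int childIndex = nodes[node[level]].child[slot];
        if (childIndex == 0) continue; // empty space is skipped without descending

        // Child bounds: bit 0 of the slot selects +X, bit 1 +Y, bit 2 +Z.
        const float half = size * 0.5f;
        Vec3 cmin, cmax;
        for (int i = 0; i < 3; ++i)
        {
            cmin[i] = lo[level][i] + (((slot >> i) & 1) ? half : 0.0f);
            cmax[i] = cmin[i] + half;
        }
        if (!RayHitsBox(cmin, cmax, origin, dir)) continue;
        if (nodes[childIndex].solid) return true;

        // Descend: store the child's state at the next level.
        assert(level + 1 < kMaxLevels);
        ++level;
        size = half;
        node[level] = childIndex; next[level] = 0; lo[level] = cmin;
    }
    return false;
}
```

    The arrays play the role of the call stack a recursive version would use, which is why the depth has to be bounded up front.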
        while (true)
        {
            childnum = n[level];
            n[level]++;
            childindex = svotnodes[nodeindex].child[childnum];
            if (childindex != 0)
            {
                pos[level + 1] = pos[level] - qsize;
                pos[level + 1] += coffset[childnum] * hsize;
                bounds.min = pos[level + 1] - qsize;
                bounds.max = bounds.min + hsize;
                if (AABBIntersectsRay2(bounds, p0, dir))
                {
                    if (level == maxlevels - 2)
                    {
                        if (SVOTNodeGetDiffuse(childindex).a > 0.5f) return true;
                    }
                    else
                    {
                        parent[level] = nodeindex;
                        nodeindex = childindex;
                        level++;
                        n[level] = 0;
                        childnum = 0;
                        size *= 0.5f;
                        hsize = size * 0.5f;
                        qsize = size * 0.25f;
                    }
                }
            }
            while (n[level] == 8)
            {
                level--;
                if (level == -1) return false;
                nodeindex = parent[level];
                childnum = n[level];
                size *= 2.0f;
                hsize = size * 0.5f;
                qsize = size * 0.25f;
            }
        }
    I made an attempt to implement the technique described in the paper above, but something was bothering me. The octree traversal was so slow that even if I was able to speed it up four times, it would still be slower than Leadwerks with a shadow map.
    I can show you very simply why. If a shadow map is rendered with the triangle below, the GPU has to process just three vertices, but if we used voxel ray tracing, it would require about 90 octree traversals. I think we can assume the post-vertex pipeline triangle rasterization process is effectively free, because it's a fixed function feature GPUs have been doing since the dawn of time:

    The train station model uses 4 million voxels in the shot below, but it has about 40,000 vertices. In order for voxel direct lighting to be on par with shadow maps, the voxel traversal would have to be about 100 times faster than processing a single vertex. The numbers just don't make sense.

    Basically, voxel shadows are limited by the surface area, and shadow maps are limited by the number of vertices. Big flat surfaces that cover a large area use very few vertices but would require many voxels to be processed. So for the direct lighting component, I think shadow maps are still the best approach. I know Crytek is claiming to get better performance with voxels, but my experience indicates otherwise.
    Another aspect of shadow maps I did not fully appreciate before is the fact they give high resolution when an object is near the light source, and low resolution further away. This is pretty close to how real light works, and would be pretty difficult to match with voxels, since their density does not increase closer to the light source.

    There are also issues with moving objects, skinned animation, tessellation, alpha discard, and vertex shader effects (waving leaves, etc.). All of these could be tolerated, but I'm sure shadow maps are much faster, so it doesn't make sense to continue on that route.
    I feel I have investigated this pretty thoroughly and now I have a strong answer why voxels cannot replace shadow maps for the direct shadows. I also developed a few pieces of technology that will continue to be used going forward, like our own improved mesh voxelization and the sparse octree traversal routine (which will be used for reflections). And part of this forced me to implement Vulkan dynamic rendering, to get rid of render passes and simplify the code.
    Voxel GI and reflections are still in the works, and I am farther along than ever now. Direct lighting is being performed on the voxel data, but now I am using the shadow maps to light the voxels. The next step is to downsample the lit voxel texture, then perform a GI pass, downsample again, and perform the second GI pass / light bounce. Because the octree is now sparse, we will be able to use a higher resolution with faster performance than the earlier videos I showed. And I hope to finally be able to show GI with a second bounce.
  10. Josh

    The VK_KHR_dynamic_rendering extension has made its way into Vulkan 1.2.203 and I have implemented this in Ultra Engine. What does it do?
    Instead of creating renderpass objects ahead of time, dynamic rendering allows you to just specify the settings you need as you are filling in command buffers with rendering instructions. From the Khronos working group:
    In my experience, post-processing effects are where this hurt the most. The engine has a user-defined stack of post-processing effects, so there are many configurations possible. You had to store and cache a lot of renderpass objects for all possible combinations of settings. It's not impossible but it made things very, very complicated. Basically, you have to know every little detail of how the renderpass object is going to be used in advance. I had several different functions like the code below, for initializing renderpasses that were meant to be used at various points in the rendering routine.
        bool RenderPass::InitializePostProcess(shared_ptr<GPUDevice> device, const VkFormat depthformat, const int colorComponents, const bool lastpass)
        {
            this->clearmode = clearmode;
            VkFormat colorformat = __FramebufferColorFormat;
            this->colorcomponents = colorComponents;
            if (depthformat != 0) this->depthcomponent = true;
            this->device = device;

            std::array< VkSubpassDependency, 2> dependencies;

            dependencies[0] = {};
            dependencies[0].srcSubpass = VK_SUBPASS_EXTERNAL;
            dependencies[0].dstSubpass = 0;
            dependencies[0].srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
            dependencies[0].srcAccessMask = 0;
            dependencies[0].dstStageMask = VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT;
            dependencies[0].dstAccessMask = VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;

            dependencies[1] = {};
            dependencies[1].srcSubpass = VK_SUBPASS_EXTERNAL;
            dependencies[1].dstSubpass = 0;
            dependencies[1].srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
            dependencies[1].srcAccessMask = 0;
            dependencies[1].dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
            dependencies[1].dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;

            renderPassInfo = {};
            renderPassInfo.sType = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO;
            renderPassInfo.attachmentCount = colorComponents;
            renderPassInfo.dependencyCount = colorComponents;
            if (depthformat == VK_FORMAT_UNDEFINED)
            {
                dependencies[0] = dependencies[1];
            }
            else
            {
                renderPassInfo.attachmentCount++;
                renderPassInfo.dependencyCount++;
            }
            renderPassInfo.pDependencies = dependencies.data();

            colorAttachment[0] = {};
            colorAttachment[0].format = colorformat;
            colorAttachment[0].samples = VK_SAMPLE_COUNT_1_BIT;
            colorAttachment[0].initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
            colorAttachment[0].loadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
            colorAttachment[0].storeOp = VK_ATTACHMENT_STORE_OP_STORE;
            colorAttachment[0].stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
            colorAttachment[0].stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
            colorAttachment[0].finalLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
            if (lastpass) colorAttachment[0].finalLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;

            VkAttachmentReference colorAttachmentRef = {};
            colorAttachmentRef.attachment = 0;
            colorAttachmentRef.layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;

            depthAttachment = {};
            VkAttachmentReference depthAttachmentRef = {};
            if (depthformat != VK_FORMAT_UNDEFINED)
            {
                colorAttachmentRef.attachment = 1;
                depthAttachment.format = depthformat;
                depthAttachment.samples = VK_SAMPLE_COUNT_1_BIT;
                depthAttachment.loadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
                depthAttachment.initialLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;// VK_IMAGE_LAYOUT_UNDEFINED;
                depthAttachment.storeOp = VK_ATTACHMENT_STORE_OP_STORE;
                depthAttachment.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
                depthAttachment.stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
                depthAttachment.finalLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;
                depthAttachmentRef.attachment = 0;
                depthAttachmentRef.layout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;
            }

            colorAttachment[0].initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
            depthAttachment.initialLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;// VK_IMAGE_LAYOUT_UNDEFINED;

            subpasses.push_back( {} );
            subpasses[0].pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS;
            subpasses[0].colorAttachmentCount = colorComponents;
            subpasses[0].pColorAttachments = &colorAttachmentRef;
            subpasses[0].pDepthStencilAttachment = NULL;
            if (depthformat != VK_FORMAT_UNDEFINED) subpasses[0].pDepthStencilAttachment = &depthAttachmentRef;

            VkAttachmentDescription attachments[2] = { colorAttachment[0], depthAttachment };
            renderPassInfo.subpassCount = subpasses.size();
            renderPassInfo.pAttachments = attachments;
            renderPassInfo.pSubpasses = subpasses.data();

            VkAssert(vkCreateRenderPass(device->device, &renderPassInfo, nullptr, &pass));
            return true;
        }
    This gives you an idea of just how many render passes I had to create in advance:
        // Initialize Render Passes
        shadowpass[0] = make_shared<RenderPass>();
        shadowpass[0]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), { VK_FORMAT_UNDEFINED }, depthformat, 0, true);//, CLEAR_DEPTH, -1);
        shadowpass[1] = make_shared<RenderPass>();
        shadowpass[1]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), { VK_FORMAT_UNDEFINED }, depthformat, 0, true, true, true, 0);

        if (MULTIPASS_CUBEMAP)
        {
            cubeshadowpass[0] = make_shared<RenderPass>();
            cubeshadowpass[0]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), { VK_FORMAT_UNDEFINED }, depthformat, 0, true, true, true, CLEAR_DEPTH, 6);
            cubeshadowpass[1] = make_shared<RenderPass>();
            cubeshadowpass[1]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), { VK_FORMAT_UNDEFINED }, depthformat, 0, true, true, true, 0, 6);
        }

        //shaderStages[0] = TEMPSHADER->shaderStages[0];
        //shaderStages[4] = TEMPSHADER->shaderStages[4];

        posteffectspass = make_shared<RenderPass>();
        posteffectspass->InitializePostProcess(dynamic_pointer_cast<GPUDevice>(Self()), VK_FORMAT_UNDEFINED, 1, false);

        raytracingpass = make_shared<RenderPass>();
        raytracingpass->InitializeRaytrace(dynamic_pointer_cast<GPUDevice>(Self()));

        lastposteffectspass = make_shared<RenderPass>();
        lastposteffectspass->InitializeLastPostProcess(dynamic_pointer_cast<GPUDevice>(Self()), depthformat, 1, false);

        lastcameralastposteffectspass = make_shared<RenderPass>();
        lastcameralastposteffectspass->InitializeLastPostProcess(dynamic_pointer_cast<GPUDevice>(Self()), depthformat, 1, true);

        {
            std::vector<VkFormat> colorformats = { __FramebufferColorFormat ,__FramebufferColorFormat, VK_FORMAT_R8G8B8A8_SNORM, VK_FORMAT_R32_SFLOAT };
            for (int earlyZPass = 0; earlyZPass < 2; ++earlyZPass)
            {
                for (int clearflags = 0; clearflags < 4; ++clearflags)
                {
                    renderpass[clearflags][earlyZPass] = make_shared<RenderPass>();
                    renderpass[clearflags][earlyZPass]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), { VK_FORMAT_UNDEFINED }, depthformat, 1, false, false, false, clearflags, 1, earlyZPass);

                    renderpassRGBA16[clearflags][earlyZPass] = make_shared<RenderPass>();
                    renderpassRGBA16[clearflags][earlyZPass]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), colorformats, depthformat, 4, false, false, false, clearflags, 1, earlyZPass);

                    firstrenderpass[clearflags][earlyZPass] = make_shared<RenderPass>();
                    firstrenderpass[clearflags][earlyZPass]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), { VK_FORMAT_UNDEFINED }, depthformat, 1, false, true, false, clearflags, 1, earlyZPass);

                    lastrenderpass[clearflags][earlyZPass] = make_shared<RenderPass>();
                    lastrenderpass[clearflags][earlyZPass]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), { VK_FORMAT_UNDEFINED }, depthformat, 1, false, false, true, clearflags, 1, earlyZPass);

                    //for (int d = 0; d < 2; ++d)
                    {
                        for (int n = 0; n < 5; ++n)
                        {
                            if (n == 2 or n == 3) continue;

                            rendertotexturepass[clearflags][n][earlyZPass] = make_shared<RenderPass>();
                            rendertotexturepass[clearflags][n][earlyZPass]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), colorformats, depthformat, n, true, false, false, clearflags, 1, earlyZPass);

                            firstrendertotexturepass[clearflags][n][earlyZPass] = make_shared<RenderPass>();
                            firstrendertotexturepass[clearflags][n][earlyZPass]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), colorformats, depthformat, n, true, true, false, clearflags, 1, earlyZPass);

                            // lastrendertotexturepass[clearflags][n] = make_shared<RenderPass>();
                            // lastrendertotexturepass[clearflags][n]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), depthformat, n, true, false, true, clearflags);
                        }
                    }
                }
            }
        }
    With dynamic rendering, you still have to fill in most of the same information, but you can just do it based on whatever the current state of things is, instead of looking for an object that hopefully matches the exact settings you want:
        VkRenderingInfoKHR renderinfo = {};
        renderinfo.sType = VK_STRUCTURE_TYPE_RENDERING_INFO_KHR;
        renderinfo.renderArea = scissor;
        renderinfo.layerCount = 1;
        renderinfo.viewMask = 0;
        renderinfo.colorAttachmentCount = 1;

        targetbuffer->colorAttachmentInfo[0].imageLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
        targetbuffer->colorAttachmentInfo[0].clearValue.color.float32[0] = 0.0f;
        targetbuffer->colorAttachmentInfo[0].clearValue.color.float32[1] = 0.0f;
        targetbuffer->colorAttachmentInfo[0].clearValue.color.float32[2] = 0.0f;
        targetbuffer->colorAttachmentInfo[0].clearValue.color.float32[3] = 0.0f;
        targetbuffer->colorAttachmentInfo[0].imageView = targetbuffer->imageviews[0];
        renderinfo.pColorAttachments = targetbuffer->colorAttachmentInfo.data();

        targetbuffer->depthAttachmentInfo.clearValue.depthStencil.depth = 1.0f;
        targetbuffer->depthAttachmentInfo.clearValue.depthStencil.stencil = 0;
        targetbuffer->depthAttachmentInfo.imageLayout = VK_IMAGE_LAYOUT_DEPTH_ATTACHMENT_OPTIMAL;
        renderinfo.pDepthAttachment = &targetbuffer->depthAttachmentInfo;

        device->vkCmdBeginRenderingKHR(cb->commandbuffer, &renderinfo);
    Then there is the way render passes affect the image layout state. With the TransitionImageLayout command, it is fairly easy to track the current state of the image layout, but render passes automatically switch the image layout after completion to a predefined state. Again, not impossible to handle, in and of itself, but when you add these things into the complexity of designing a full engine, things start to get ugly.
        void GPUCommandBuffer::EndRenderPass()
        {
            vkCmdEndRenderPass(commandbuffer);
            for (int k = 0; k < currentrenderpass->layers; ++k)
            {
                for (int n = 0; n < currentrenderpass->colorcomponents; ++n)
                {
                    if (currentdrawbuffer->colortexture[n]) currentdrawbuffer->colortexture[n]->imagelayout[0][currentdrawbuffer->baseface + k] = currentrenderpass->colorAttachment[n].finalLayout;
                }
                if (currentdrawbuffer->depthtexture != NULL and currentrenderpass->depthcomponent == true) currentdrawbuffer->depthtexture->imagelayout[0][currentdrawbuffer->baseface + k] = currentrenderpass->depthAttachment.finalLayout;
            }
            currentdrawbuffer = NULL;
            currentrenderpass = NULL;
        }
    Another example where this was causing problems was with user-defined texture buffers. One beta tester wanted to implement some interesting effects that required rendering to some HDR color textures, but the system was so static it couldn't handle a user-defined color format in a texture buffer. Again, this is not impossible to overcome, but the practical outcome is I just didn't have enough time because resources are finite.
    It's interesting that this extension also removes the need to create a Vulkan framebuffer object. I guess that means you can just start rendering to any combination of textures you want, so long as they use a format that is renderable by the hardware. Vulkan certainly changes a lot of conceptions we had in OpenGL.
    So this extension does eliminate a significant source of problems for me, and I am happy it was implemented.
  11. Josh
    Previously I described how I was able to save the voxel data into a sparse octree and correctly look up the right voxel in a shader. This shot shows that each triangle is being rasterized separately, i.e. the triangle bounding box is being correctly trimmed to avoid a lot of overlapping voxels:

    Calculating direct lighting using the sparse octree was very difficult, and took me several days of debugging. I'm not 100% sure what the problem was, other than it seems GLSL code is not quite as flexible as C++. I actually had the same exact function working in GLSL and C++, and it worked perfectly in C++ but gave wrong results in GLSL! Of course I did not have a debugger for my GLSL code, so I ended up having to write a lot of if statements and output a pixel color based on the result. In the end I finally tracked the problem down to some data stored in an array and changed the way the routine worked, but what the exact issue was I'll never know.
    With the sparse voxel octree, we only have about 400,000 pixels to draw when we process direct lighting. Rendering all voxels in a 256x256x256 volume texture would require about 16 million pixels to be drawn. So the sparse approach requires us to draw only about 2% of the pixels we would have to otherwise. Using shadow maps, on a 1920x1080 screen we would have to calculate about 2,000,000 shadow intersections. Although we are not comparing the same exact things, this does make me optimistic for the final performance results. Basically, instead of calculating shadow visibility for each pixel, we can just calculate per voxel, and your voxels are always going to be quite a bit bigger than screen pixels. So the whole issue of balancing shadow map resolution with screen resolution goes away.
    Ray traversal is very fast because it skips large chunks of empty space, instead of checking every single grid space for a voxel.
    The voxel resolution below is not very high, I am only using one octree, and there's currently no blending / filtering, but that will all come in time.

    Leadwerks 1 and 3D World Studio used lightmaps for lighting. Later versions of Leadwerks used deferred lighting and shadowmaps. Being able to roll out another cutting-edge lighting technology in Ultra Engine is icing on the cake for the new engine. I expect this will allow particle shadows and transparent glass with colored shadows, as well as real-time global illumination and reflections, all with great performance on most hardware.
  12. Josh

    My initial implementation of mesh voxelization for ray tracing used this code. It was good for testing, but has some problems:
    • It's slow, using an unnecessary and expensive x * y * z loop
    • No support for per-voxel color based on a texture lookup
    • There are mathematical mistakes that cause inaccuracy, and the math has to be perfect
    My solution addresses these problems and only uses an x * y loop to generate the voxels. It does this by identifying the major (largest magnitude) axis of the triangle normal and using the other two axes for the X and Y axis, then finding the Z position of the triangle at each grid point along the surface.
    In previous screenshots, you could see some black faces that were caused by geometry that lies outside the bounds of the voxel geometry. Some of this was caused because I was voxelizing the mesh in local space and then transforming the resulting voxels to world space. That doesn't work, because the voxel position can end up rounding off to a different coordinate than the triangle it's supposed to enclose. The best solution is to have a low-res LOD model that is used to generate the voxel data. (It's important to make sure the voxel geometry still contains the full-resolution model.)
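    The major-axis idea can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not the engine's voxelizer: the triangle is already in voxel-grid units, one sample is taken at each cell center, and the names Vec3, Voxel and VoxelizeTriangle are hypothetical. A production version would also need conservative coverage for cells the triangle only partly touches.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <vector>

using Vec3 = std::array<float, 3>;
using Voxel = std::array<int, 3>;

// Voxelizes one triangle (in voxel-grid units) using a 2D loop only.
std::vector<Voxel> VoxelizeTriangle(const Vec3& a, const Vec3& b, const Vec3& c)
{
    const Vec3 e1 = { b[0] - a[0], b[1] - a[1], b[2] - a[2] };
    const Vec3 e2 = { c[0] - a[0], c[1] - a[1], c[2] - a[2] };
    const Vec3 n = { e1[1] * e2[2] - e1[2] * e2[1],
                     e1[2] * e2[0] - e1[0] * e2[2],
                     e1[0] * e2[1] - e1[1] * e2[0] };

    // Major axis m = largest-magnitude normal component; u and v are the other two axes.
    int m = 0;
    if (std::fabs(n[1]) > std::fabs(n[m])) m = 1;
    if (std::fabs(n[2]) > std::fabs(n[m])) m = 2;
    const int u = (m + 1) % 3, v = (m + 2) % 3;

    std::vector<Voxel> voxels;
    const float d = n[m]; // equals the 2D determinant of the projected edges
    if (d == 0.0f) return voxels; // degenerate triangle

    const int u0 = (int)std::floor(std::min({ a[u], b[u], c[u] }));
    const int u1 = (int)std::floor(std::max({ a[u], b[u], c[u] }));
    const int v0 = (int)std::floor(std::min({ a[v], b[v], c[v] }));
    const int v1 = (int)std::floor(std::max({ a[v], b[v], c[v] }));

    for (int i = u0; i <= u1; ++i)
    {
        for (int j = v0; j <= v1; ++j)
        {
            // Barycentric weights of the cell center in the projected triangle
            const float pu = i + 0.5f - a[u], pv = j + 0.5f - a[v];
            const float w1 = (pu * e2[v] - pv * e2[u]) / d;
            const float w2 = (e1[u] * pv - e1[v] * pu) / d;
            if (w1 < 0.0f || w2 < 0.0f || w1 + w2 > 1.0f) continue;

            // Position of the triangle along the major axis at this grid point
            const float depth = a[m] + w1 * e1[m] + w2 * e2[m];
            Voxel voxel;
            voxel[u] = i;
            voxel[v] = j;
            voxel[m] = (int)std::floor(depth);
            voxels.push_back(voxel);
        }
    }
    return voxels;
}
```

    Because the loop runs over the two minor axes, the projected triangle has its largest possible footprint, so each covered column yields exactly one voxel along the major axis.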
    In the shot below you can see every single surface has a voxel to retrieve the color from. There are no texture lookups being performed in this shot, just colored voxels that were generated by reading the image pixel at each voxel position and then stored in a GPU buffer.

    This means we can safely calculate which lights are visible at each voxel and store those light IDs in a texture to be retrieved instead of performing a shadowmap lookup. It also means we can calculate global illumination at each voxel, store it in a texture, and just do a single texture lookup to grab the GI lighting when the actual model is rendered, instead of calculating GI each frame.
    Onwards and upwards!
  13. Josh

    Previously I noted that since Voxel global illumination involves calculation of direct lighting, it would actually be possible to do away with shadow maps altogether, and use voxels for direct and global illumination. This can eliminate the problems of image-based shadows like shadow acne and adjusting the shadow map size. I also believe this method will turn out a lot faster than shadow map rendering, and you know how I like fast performance. 
    The sparse voxel octree node structure consumes 48 bytes and looks like this:
    struct SparseVoxelOctreeTreeNode
    {
        uint32_t index, parent, color, emission;
        uint32_t child[2][2][2];
    };

    It might be possible to eliminate the index and parent values, but the structure size has to be aligned to 16 bytes in GPU memory anyways, so I don't see it getting any smaller.
    In my test scenario, the sparse voxel octree creates 353,345 voxels, which consumes 14% of the memory of the uncompressed data, but is only a little bit smaller than compressed volume textures, and I could see the SVO data being bigger than a compressed 3D texture.
    Uncompressed, diffuse + emission
    256*256*256*4 + 256*256*256*3 = 67108864 + 50331648 = 117440512 bytes = 112 MB
    DXT5 compressed diffuse + DXT1 compressed emission
    16777216 + 8388608 = 25165824 bytes = 24 MB
    Sparse Voxel Octree
    353345 * 48 = 16960560 bytes = 16.2 MB
    That's just for the voxelized triangles' diffuse color. We still need six textures to store direct lighting, one for each direction on each axis. Since these are rendered to in a shader, I don't see any texture compression happening here:
    256 * 256 * 256 * 4 * 6 bytes = 384 MB
    If we store the lit sparse voxels in 1024x512 RGBA textures (x6), that consumes a lot less memory:
    1024 * 512 * 4 * 6 bytes = 12 MB
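    The 48-byte figure and the alignment argument can be checked at compile time on the C++ side. This is a small sketch; AlignUp is a hypothetical helper, and the layout assumes an ordinary platform where uint32_t is four bytes with no padding between members.

```cpp
#include <cstddef>
#include <cstdint>

struct SparseVoxelOctreeTreeNode
{
    uint32_t index, parent, color, emission; // 4 x 4 bytes = 16 bytes
    uint32_t child[2][2][2];                 // 8 x 4 bytes = 32 bytes
};

static_assert(sizeof(SparseVoxelOctreeTreeNode) == 48, "node should be 48 bytes");
static_assert(sizeof(SparseVoxelOctreeTreeNode) % 16 == 0, "node should be 16-byte aligned in size");
static_assert(offsetof(SparseVoxelOctreeTreeNode, child) == 16, "children start after the four header values");

// Rounds a byte count up to the next multiple of the alignment.
constexpr std::size_t AlignUp(std::size_t bytes, std::size_t alignment)
{
    return (bytes + alignment - 1) / alignment * alignment;
}

// Dropping index and parent leaves 40 bytes of data, which still rounds up to 48.
static_assert(AlignUp(40, 16) == 48, "removing two values does not shrink the aligned size");
```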
    So overall, we do see a very significant reduction in memory usage when we make the octree sparse. It's also going to be A LOT more efficient to render to one 512x1024 texture buffer (with six color attachments), instead of rendering 256 separate slices of a volume texture.
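    The memory figures in this entry are simple products, and a few constexpr helpers reproduce them. The helper names are hypothetical; the only inputs are the numbers quoted above (a 256 voxel resolution, 353,345 nodes of 48 bytes, and six 1024x512 RGBA lighting textures).

```cpp
#include <cstdint>

// Bytes used by a dense cubic volume texture, optionally several copies of it.
constexpr uint64_t DenseVolumeBytes(uint64_t resolution, uint64_t bytesPerVoxel, uint64_t copies = 1)
{
    return resolution * resolution * resolution * bytesPerVoxel * copies;
}

// Bytes used by a set of identical 2D textures.
constexpr uint64_t Texture2DBytes(uint64_t width, uint64_t height, uint64_t bytesPerPixel, uint64_t copies = 1)
{
    return width * height * bytesPerPixel * copies;
}

constexpr double ToMB(uint64_t bytes) { return bytes / (1024.0 * 1024.0); }

// RGBA diffuse + RGB emission, uncompressed
constexpr uint64_t uncompressedBytes = DenseVolumeBytes(256, 4) + DenseVolumeBytes(256, 3);
// DXT5 averages 1 byte per voxel, DXT1 half a byte per voxel
constexpr uint64_t compressedBytes = DenseVolumeBytes(256, 1) + DenseVolumeBytes(256, 1) / 2;
// Sparse voxel octree: node count times node size
constexpr uint64_t octreeBytes = 353345ull * 48ull;
// Six direct-lighting targets, dense versus sparse
constexpr uint64_t denseLightingBytes = DenseVolumeBytes(256, 4, 6);
constexpr uint64_t sparseLightingBytes = Texture2DBytes(1024, 512, 4, 6);
```

    ToMB(denseLightingBytes) evaluates to 384 and ToMB(sparseLightingBytes) to 12, which is where most of the savings come from.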
    Looking up a single value in a sparse voxel octree is more complex than a simple texture sampler, because it has to iterate through all the levels of the tree, down to the terminal node. However, ray traversal should be much faster with the sparse voxel octree, because the algorithm can efficiently skip over large empty spaces. An interesting challenge is the fact that GLSL does not support recursive function calls, so recursive functions have to be written in creative ways. This one isn't too bad, but when you have branching pathways in a ray traversal, it can get pretty complicated:
    bool SVOTGetNodeColor(in uint x, in uint y, in uint z, out vec4 diffuse, out vec3 emission)
    {
        diffuse = vec4(0,0,0,0);
        emission = vec3(0,0,0); // write the out parameter on every path
        uint index = 1;
        int maxlevels = 10;
        uint size = 256;
        if (x >= size || y >= size || z >= size) return true;
        uint hsize;
        uint px,py,pz,childindex;
        for (int n = 0; n < maxlevels - 1; n++)
        {
            // Pick the child octant that contains the point, then descend into it
            hsize = size / 2;
            px = uint(x >= hsize);
            py = uint(y >= hsize);
            pz = uint(z >= hsize);
            index = svotnodes[index - 1].child[px * 4 + py * 2 + pz];
            if (index == 0) return false;
            x -= px * hsize;
            y -= py * hsize;
            z -= pz * hsize;
            size = hsize;
        }
        diffuse = SVOTNodeGetDiffuse(index);
        return true;
    }

    In this shot, I am rendering the original mesh geometry and doing a texture lookup in the sparse voxel octree to find the color stored for the voxel at each point. There are a few places where the surface appears black, meaning that the point being rendered lies outside the bounds of any voxel saved. Maybe there is a problem with the precision of the triangle voxelization routine. I will have to look into this further.
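    The same loop-based descent can be tried out on the CPU. The sketch below is a simplified stand-in rather than the engine's data structure: the node stores only a color, children are 1-based indices with 0 meaning "empty" as in the shader, and the names SparseOctree, Insert and Lookup are hypothetical. Unlike the shader function, out-of-range coordinates are reported as misses here.

```cpp
#include <array>
#include <cstdint>
#include <vector>

struct OctreeNode
{
    uint32_t color = 0;
    std::array<uint32_t, 8> child{}; // 1-based node indices, 0 means "no child"
};

struct SparseOctree
{
    uint32_t size;                 // grid resolution per axis, a power of two
    std::vector<OctreeNode> nodes; // nodes[0] is the root, addressed as index 1

    explicit SparseOctree(uint32_t resolution) : size(resolution), nodes(1) {}

    // Creates the chain of nodes leading to one voxel and stores its color.
    void Insert(uint32_t x, uint32_t y, uint32_t z, uint32_t color)
    {
        if (x >= size || y >= size || z >= size) return;
        uint32_t index = 1;
        for (uint32_t half = size / 2; half >= 1; half /= 2)
        {
            const uint32_t px = x >= half, py = y >= half, pz = z >= half;
            const uint32_t slot = px * 4 + py * 2 + pz;
            uint32_t next = nodes[index - 1].child[slot];
            if (next == 0)
            {
                nodes.emplace_back();
                next = static_cast<uint32_t>(nodes.size());
                nodes[index - 1].child[slot] = next;
            }
            index = next;
            x -= px * half; y -= py * half; z -= pz * half;
        }
        nodes[index - 1].color = color;
    }

    // Iterative descent, no recursion: halve the cell and pick an octant each step.
    bool Lookup(uint32_t x, uint32_t y, uint32_t z, uint32_t& color) const
    {
        color = 0;
        if (x >= size || y >= size || z >= size) return false;
        uint32_t index = 1;
        for (uint32_t half = size / 2; half >= 1; half /= 2)
        {
            const uint32_t px = x >= half, py = y >= half, pz = z >= half;
            index = nodes[index - 1].child[px * 4 + py * 2 + pz];
            if (index == 0) return false; // empty space, stop early
            x -= px * half; y -= py * half; z -= pz * half;
        }
        color = nodes[index - 1].color;
        return true;
    }
};
```

    With a resolution of 256 the loop runs eight times, so a single inserted voxel costs eight nodes plus the root.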

    The important point is that this image is being rendered without the use of a volume texture. You are seeing the sparse voxel octree being successfully sent to and navigated within the fragment shader.
    The next step will be to take the diffuse colors and render direct lighting into a separate texture. That means my clustered forward rendering implementation will not get used, but developing that did prepare me for what I must do next. Instead of placing all the lights into a grid oriented around the camera frustum, I need to place them in a grid in world space, with the camera at the center of the grid. This is actually quite a lot simpler. 
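    Placing something in a world-space grid centered on the camera comes down to an offset and a floor. The following is a rough sketch with hypothetical names and parameters (WorldToGridCell, cellSize, cellsPerAxis); a real light has a radius and would be added to every cell it overlaps, not just the one containing its position.

```cpp
#include <array>
#include <cmath>

// Maps a world-space position to a cell of a cubic grid centered on the camera.
// Returns false when the position lies outside the grid.
bool WorldToGridCell(const std::array<float, 3>& position,
                     const std::array<float, 3>& camera,
                     float cellSize, int cellsPerAxis,
                     std::array<int, 3>& cell)
{
    const float halfExtent = 0.5f * cellSize * cellsPerAxis;
    for (int axis = 0; axis < 3; ++axis)
    {
        // Shift so the grid's minimum corner sits at zero, then divide into cells
        const float local = position[axis] - camera[axis] + halfExtent;
        const int index = (int)std::floor(local / cellSize);
        if (index < 0 || index >= cellsPerAxis) return false;
        cell[axis] = index;
    }
    return true;
}
```

    No frustum math is involved, which is why this is simpler than a grid oriented around the camera frustum.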
  14. Josh

    This is an update on the progress of our voxel raytracing system. VXRT is designed to provide all the reflection information that PBR materials use. If a picture is worth a thousand words, then this counts as a 5,000-word article.
    Direct lighting:

    Global illumination:

    Specular reflection:

    Skybox component:

    Final combined image:

  15. Josh

    The Ultra Engine editor is designed to be expandable and modifiable.  Lua script is integrated into the editor and can be used to write editor extensions and even modify the scene or the editor itself in real-time. 
    We can create a scene object entirely in code and make it appear in the scene browser tree:
    box = CreateBox(editor.world)
    box.name = "box01"
    o = CreateSceneObject(box) --make editor recognize the entity and add it to the scene browser
    o:SetSelected(true)
    We can even modify the editor itself and start adding new features to the interface:
    editor.sidepanel.tabber:AddItem("Roads")
    p = CreatePanel(0,0,300,500,editor.sidepanel.tabber)
    button = CreateButton("Create",20,20,100,30,p)
    Of course, you would usually want to put all this code in a script file and either run the script by selecting the Script > Run Script... menu item, or placing the script in the "Scripts/Start" folder so it automatically gets run at startup. But it sure is cool to be able to experiment live with Lua right in the console and see the result of your code instantly.
  16. Josh

    A while back I wrote enthusiastically about Basis Universal super compression. KTX2 is a texture file format from Khronos, makers of the Vulkan and glTF specifications. Like DDS files, KTX2 can store multiple mipmaps, as well as memory-compressed texture formats like DXT5 and BC7. However, KTX2 now supports Basis compressed data as well, which makes it the all-in-one universal texture format. glTF has an official extension for KTX2 textures in glTF files, so it can be combined with Draco mesh compression to compress your overall game model sizes:

    Additionally, KTX2 includes information about clamp and filter settings. The reason I implemented the .tex texture format in Leadwerks was that DDS lacks these features and I wanted them stored in the texture file.
    I've added built-in KTX2 texture loading and saving, so you can easily save and load these files. I plan to make KTX2 the recommended texture file format for Ultra Engine.

  17. Josh

    Google Draco is a library that aims to do for mesh data what MP3 and OGG did for music. It does not reduce memory usage once a mesh is loaded, but it could reduce file sizes and improve download times. Although mesh data does not tend to use much disk space, I am always interested in optimization. Furthermore, some of the NASA models I work with are very high-poly, and do take up significant disk space. Google offers a very compelling chart showing a compression ratio of about 95%:

    However, there is not much information given about the original mesh. Is it an ASCII .obj file? Of course that would be much bigger than binary data. I wanted to get a clear look at what kind of compression ratios I could expect, within the context of glTF files. I found a fairly high-poly model on SketchFab here to work with.

    This model has 2.1 million triangles and 1 million vertices. That should be plenty to test with.
    Now, glTF is actually three different file formats. Normal glTF files store JSON data and come with an extra .bin file for binary data. This stores things like vertex positions and animation data, stuff you probably won't want to edit by hand. The .glb version of the format combines JSON and binary data into a single file, which can be viewed but not edited in a text editing program. Finally, there is also base64 glTF, which stores JSON together with binary data with base64 encoding in a single file. The base64 data looks like gibberish, but the file can be opened in a text editor, modified, and resaved without destroying the binary data.
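    The base64 variant works because every three bytes of binary data are rewritten as four printable characters, so a text editor never sees raw binary. Below is a minimal encoder sketch in C++; the function names are hypothetical, and the "data:application/octet-stream;base64," prefix is the common data URI form used for embedded glTF buffers.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Standard base64: 3 input bytes become 4 output characters, padded with '='.
std::string Base64Encode(const std::vector<uint8_t>& data)
{
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((data.size() + 2) / 3 * 4);
    for (std::size_t i = 0; i < data.size(); i += 3)
    {
        const std::size_t remaining = data.size() - i;
        uint32_t triple = uint32_t(data[i]) << 16;
        if (remaining > 1) triple |= uint32_t(data[i + 1]) << 8;
        if (remaining > 2) triple |= uint32_t(data[i + 2]);
        out += table[(triple >> 18) & 63];
        out += table[(triple >> 12) & 63];
        out += remaining > 1 ? table[(triple >> 6) & 63] : '=';
        out += remaining > 2 ? table[triple & 63] : '=';
    }
    return out;
}

// Wraps binary buffer data in a data URI suitable for a JSON string value.
std::string BufferToDataUri(const std::vector<uint8_t>& data)
{
    return "data:application/octet-stream;base64," + Base64Encode(data);
}
```

    The cost of staying text-editable is size: the encoded data is about a third larger than the binary it represents.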
    I was very curious to see what advantage Google Draco mesh compression would offer. Would it make glTF files significantly smaller, so that your games take up less space and have faster download times?
    To answer this question, I imported the model into Blender and exported several versions. I only exported position, normal, and texture coordinates. I also loaded the uncompressed .glb file in Ultra Engine and resaved it with simple mesh quantization.

    As you can see, mesh quantization (using one byte for each normal component, plus one byte for padding, and two bytes for each texture coordinate component) combined with regular old ZIP compression comes in significantly smaller than Draco compression at the maximum compression level. It's not in the chart, but I also tried ZIP-compressing the smallest Draco file, and that was still bigger at 28.8 MB.
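    For readers unfamiliar with quantization, here is a sketch of what the byte layout described above could look like. The encodings are an assumption: signed normalized bytes for normals and unsigned normalized 16-bit values for texture coordinates (texture coordinates outside the 0 to 1 range would need a different encoding, such as half floats). All names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

struct PackedNormal { int8_t x, y, z, padding; }; // 3 bytes of data + 1 byte of padding
struct PackedTexCoord { uint16_t u, v; };          // 2 bytes per component

static_assert(sizeof(PackedNormal) == 4, "normal packs into 4 bytes instead of 12");
static_assert(sizeof(PackedTexCoord) == 4, "texcoord packs into 4 bytes instead of 8");

// Maps [-1, 1] to [-127, 127].
inline int8_t PackSnorm8(float value)
{
    value = std::max(-1.0f, std::min(1.0f, value));
    return static_cast<int8_t>(std::lround(value * 127.0f));
}

// Maps [0, 1] to [0, 65535].
inline uint16_t PackUnorm16(float value)
{
    value = std::max(0.0f, std::min(1.0f, value));
    return static_cast<uint16_t>(std::lround(value * 65535.0f));
}

inline PackedNormal PackNormal(float x, float y, float z)
{
    return PackedNormal{ PackSnorm8(x), PackSnorm8(y), PackSnorm8(z), 0 };
}

inline PackedTexCoord PackTexCoord(float u, float v)
{
    return PackedTexCoord{ PackUnorm16(u), PackUnorm16(v) };
}
```

    Under these assumptions the normal and texture coordinate data drop from 20 bytes to 8 bytes per vertex before any ZIP compression is applied.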
    You can look at the models yourself here:
    Based on this test, it appears that Google Draco is only marginally smaller than an uncompressed quantized mesh, and still slightly bigger when ZIP compression is applied to both. Unless someone can show me otherwise, it does not appear that Google Draco mesh compression offers the 95% reduction in file sizes they seem to promise.
    This model was made up of several sub-objects. I collapsed the model and resaved it, and Draco now produces compression more like I was expecting to see:
    Presumably this means whatever data structure they use takes up a certain amount of space (probably an n-dimensional tree), and having fewer of these structures is more optimal.
    Here is the corrected comparison chart. This is good. Draco shrank this model to 7% the size of the uncompressed .glb export:

    This will be very useful for 3D scans and CAD models, as long as they don't contain a lot of articulated subobjects. Original model is on the left, Draco compressed model is on the right:

  18. Josh

    The glTF importer took a very long time to develop, but it was much easier to write a glTF save routine. In one day I got an exporter working with support for everything except skinning and animation. To save a model in glTF format, just call Model::Save("mymodel.gltf") and it will work! Entire scenes can also be saved in glTF format. Here is a model that was loaded from Leadwerks MDL, MAT, and TEX files and saved as glTF. The textures are converted to PNG files. (Microsoft has an official extension for adding DDS textures into the file, and I plan to implement that next.)

    Take a look at the exported file in your favorite modeling application:
    So you can load a model from any format supported by import plugins, and then save it as glTF flawlessly. Or, you can set up an automatic conversion in your project settings, so that the editor will automatically convert files from one format to another any time they are added or resaved in your project.

    Ultra Engine uses the most widely compatible file formats available, and loads assets directly from the standard computer file system, so your game assets are always easy to access, modify, and replace.
  19. Josh

    In Leadwerks, required files were always a slightly awkward issue. The engine requires a BFN texture and a folder of shaders, in order to display anything. One of my goals is to make the Ultra Engine editor flexible enough to work with any game. It should be able to load the folder of an existing game, even if it doesn't use Ultra Engine, and display all the models and scenes with some accuracy. Of course the Quake game directory isn't going to include a bunch of Ultra Engine shaders, so what to do?
    One solution could be to load shaders and other files from the editor directory, but this introduces other issues. My solution is to build shaders, shader families, and the default BRDF texture into the engine itself. This is done with a utility that reads a list of files to include, then loads each one and turns it into an array in C++ code that gets compiled into the engine. The code looks like this:
    if (rpath == RealPath("Shaders/Sky.json"))
    {
        static const std::array<uint64_t, 62> data = {0x61687322090a0d7bULL,0x6c696d6146726564ULL,0xd7b090a0d3a2279ULL,0x746174732209090aULL,0x9090a0d3a226369ULL,0x66220909090a0d7bULL,0xa0d3a2274616f6cULL,0x9090a0d7b090909ULL,0x555141504f220909ULL,0x909090a0d3a2245ULL,0x90909090a0d7b09ULL,0x6c75616665642209ULL,0x909090a0d3a2274ULL,0x909090a0d7b0909ULL,0x6573616222090909ULL,0x90909090a0d3a22ULL,0x909090a0d7b0909ULL,0x7265762209090909ULL,0x5322203a22786574ULL,0x532f737265646168ULL,0x762e796b532f796bULL,0x227670732e747265ULL,0x9090909090a0d2cULL,0x6d67617266220909ULL,0x5322203a22746e65ULL,0x532f737265646168ULL,0x662e796b532f796bULL,0x227670732e676172ULL,0x909090909090a0dULL,0x9090909090a0d7dULL,0x7d090909090a0d7dULL,0xd2c7d0909090a0dULL,0x756f64220909090aULL,0x90a0d3a22656c62ULL,0x909090a0d7b0909ULL,0x45555141504f2209ULL,0x90909090a0d3a22ULL,0x9090909090a0d7bULL,0x746c756166656422ULL,0x90909090a0d3a22ULL,0x90909090a0d7b09ULL,0x2265736162220909ULL,0x9090909090a0d3aULL,0x90909090a0d7b09ULL,0x7472657622090909ULL,0x685322203a227865ULL,0x6b532f7372656461ULL,0x34365f796b532f79ULL,0x732e747265762e66ULL,0x9090a0d2c227670ULL,0x7266220909090909ULL,0x3a22746e656d6761ULL,0x7265646168532220ULL,0x6b532f796b532f73ULL,0x72662e6634365f79ULL,0xd227670732e6761ULL,0x7d0909090909090aULL,0x7d09090909090a0dULL,0xd7d090909090a0dULL,0x90a0d7d0909090aULL,0xa0d7d090a0d7d09ULL,0xcdcdcdcdcdcdcd7dULL };
        auto buffer = CreateBuffer(489);
        buffer->Poke(0,(const char*)data.data(),489);
        return CreateBufferStream(buffer);
    }

    An unsigned 64-bit integer is used for the data type, as this results in the smallest generated code file size.
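    The packing step of such a utility might look like the sketch below. This is not the actual tool: PackBytes and UnpackBytes are hypothetical names, the byte order is little-endian (which matches the first word above, whose low byte 0x7b is the opening brace of the JSON file), and unused bytes in the last word are zeroed here.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Packs raw file bytes into 64-bit words, least significant byte first.
std::vector<uint64_t> PackBytes(const std::string& bytes)
{
    std::vector<uint64_t> words((bytes.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < bytes.size(); ++i)
    {
        words[i / 8] |= uint64_t(uint8_t(bytes[i])) << (8 * (i % 8));
    }
    return words;
}

// Recovers the original bytes; the size is stored separately, as in the generated code.
std::string UnpackBytes(const std::vector<uint64_t>& words, std::size_t size)
{
    std::string bytes(size, '\0');
    for (std::size_t i = 0; i < size; ++i)
    {
        bytes[i] = char((words[i / 8] >> (8 * (i % 8))) & 0xff);
    }
    return bytes;
}
```

    A 489-byte file needs 62 words, which matches the array size in the generated code. Copying the array memory directly, as the generated code does, gives the same bytes on a little-endian machine; the shift-based unpacking here does not depend on the host byte order.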
    Files are searched for in the following order:
    1. A file on the hard drive in the specified path.
    2. A file from a loaded package with the specified relative path.
    3. A file built into the engine.
    Therefore, if your game includes a modified version of a shader, the shader module will still be loaded from the file in your game directory. However, if you don't include any shaders at all, the engine will just fall back on its own set of shaders compiled into the core engine.
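    The search order is a first-match-wins chain. Here is a sketch of that pattern with in-memory stand-ins for the three sources; FileSource, ResolveFile and MapSource are hypothetical names and are not part of the engine API.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// A source returns true and fills 'contents' when it can provide the file.
using FileSource = std::function<bool(const std::string& path, std::string& contents)>;

// Tries each source in order and stops at the first one that has the file.
bool ResolveFile(const std::string& path, const std::vector<FileSource>& sources, std::string& contents)
{
    for (const auto& source : sources)
    {
        if (source(path, contents)) return true;
    }
    return false;
}

// Stand-in source backed by a map, used here in place of a disk, package, or built-in table.
FileSource MapSource(std::map<std::string, std::string> files)
{
    return [files](const std::string& path, std::string& contents)
    {
        auto it = files.find(path);
        if (it == files.end()) return false;
        contents = it->second;
        return true;
    };
}
```

    Passing the sources as { disk, package, builtin } gives the behavior described above: a file in the game directory overrides everything, and the built-in copy is only used when nothing else provides the path.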
    This gives Ultra Engine quite a lot more flexibility in loading scenes and models, and allows creation of 3D applications that can work without any required files at all, while still allowing for user control over the game shaders.
    The screenshot here shows the Ultra Engine editor loading a Leadwerks project folder and displaying 3D graphics using the Ultra Engine renderer, even though the Leadwerks project does not contain any of the shaders and other files Ultra Engine needs to run:

  20. Josh

    The new editor is being designed to be flexible enough to work with any game, so it can be used for modding as well as game development with our new 3D engine. Each project has configurable settings that can be used to handle what the editor actually does when you run the game. In the case of a game like Quake, this will involve running a few executables to first compile the map you are working on into a BSP structure, then perform lightmaps and pre-calculate visibility.

    You can also set up your own custom workflow to automatically convert textures and models, using either the import / export capabilities of the editor plugins, or an external executable. In Leadwerks, this was all hard-coded with the FBX to MDL converter and a few other converters, but in the new editor it's totally configurable.

  21. Josh

    Many games store 3D models, textures, and other game files in some type of compressed package format. These can be anything from a simple ZIP file to a custom multi-file archive system. This has the benefit of making the install size of the game smaller, and can prevent users from accessing the raw files. Often times undocumented proprietary file formats are used to optimize loading time, although with DDS and glTF this is not such a problem anymore.
    Leadwerks uses built-in support for encrypted ZIP files. In our new engine I wanted the ability to load game files from a variety of package formats, so our tools would have compatibility with many different games. To support this I implemented a new plugin type for package files. Packages can be used like this:
    auto pak = LoadPackage("data.zip");
    auto dir = pak->LoadDir("");
    for (auto file : dir)
    {
        if (pak->FileType(file) == 1)
        {
            auto stream = pak->ReadFile(file);
        }
    }

    I created a package plugin for loading Valve package files and added support for browsing packages in the new editor, alongside regular old folders. Ever wanted to see what the insides of some of your favorite games look like? Now you can:

    This can work not just for textures, but for materials, 3D models, and even scene or map files.
    I envision this system being flexible enough to support a wide variety of games, so that the new editor can be used not just as a game development tool but as a tool for modding games, even for games that don't have any official tools. All it takes is the right set of plugins to pull all those weird specialized file formats into our editor and export again in a game-ready format.
  22. Josh

    At last I have been able to work the plugin system into the new editor and realize my dreams.
    The editor automatically detects supported file formats and generates thumbnails for them. (Thumbnails are currently compatible with the Leadwerks system, so Leadwerks can read these thumbnail files and vice-versa.) If no support for a file format is found, the program just defaults to whatever icon or thumbnail Windows shows.
    The options dialog includes a tab where you can examine each plugin in detail. I plan to allow disabling of individual plugins, like how it works in 3ds Max.
    It's completely possible that this editor could be used to mod existing games with the right set of plugins. I want to try doing this with Source games and see how easily I can load levels up. In the image below, the new editor is browsing an unmodified Leadwerks project, using a plugin to provide support for loading Leadwerks TEX files.

    I wrote about some of these ideas a while ago:
  23. Josh

    I've been wracking my brain trying to decide what I want to show at the upcoming conference, and decided I should get the new editor in a semi-workable state. I started laying out the interface two days ago. To my surprise, the whole process went very fast and I discovered some cool design features along the way.
    With the freedom and control I have with the new user interface system, I was able to make the side panel extend all the way to the top and bottom of the window client area. This gives you a little more vertical space to work with.
    The object bar on the left also extends higher and goes all the way down the side, so there is room for more buttons now.
    The toolbar only spans the width of the viewport area, and has only the most common buttons you will need.
    Right now, I am showing all files in the project, not just game files. If it's a model or texture file the editor will generate a rendered thumbnail, but for other files it just retrieves the thumbnail image from Windows for that file type.

    All in all I am very pleased with the design and pleasantly surprised how quickly I am able to implement editor features.
  24. Josh

    Ultra App Kit 1.2 is now available on our site and on Steam. This is a bug fix update that resolves numerous small issues reported in the bug reports forum.
    To download the latest version, see My Purchases.
  25. Josh

    One of my goals in Ultra Engine is to avoid "black box" file formats and leave all game assets in common file formats that can be easily read in a variety of programs. For this reason, I put a lot of effort into the Pixmap class and the DDS load and save capabilities.
    In Ultra Engine animated textures can be stored in a volume texture. To play the animation, the W component of the UVW texcoord is scrolled. The fragment shader will sample the volume texture between the nearest two slices on the Z axis of the texture, resulting in a smooth transition between frames using linear interpolation. There's no need to constantly swap a lot of textures in a material, as all animation frames are packed away in a single DDS file.
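    The blend between frames is what linear filtering along the third texture axis does in hardware. As a rough CPU-side illustration, assuming repeat wrapping and texel centers at (i + 0.5) / sliceCount, the two slices and the blend weight for a given W coordinate can be computed like this (SliceBlend and SelectSlices are hypothetical names):

```cpp
#include <cmath>

struct SliceBlend
{
    int slice0;   // nearest slice below the sample position
    int slice1;   // next slice, wrapping back to 0 after the last frame
    float weight; // 0 = all slice0, 1 = all slice1
};

// Finds the two animation frames that bracket texture coordinate w.
SliceBlend SelectSlices(float w, int sliceCount)
{
    // Texel centers sit at (i + 0.5) / sliceCount, so shift by half a texel
    const float position = w * sliceCount - 0.5f;
    const float base = std::floor(position);
    SliceBlend blend;
    blend.weight = position - base;
    const int index = (int)base;
    blend.slice0 = ((index % sliceCount) + sliceCount) % sliceCount; // wrap negatives too
    blend.slice1 = (blend.slice0 + 1) % sliceCount;
    return blend;
}

// Linear interpolation of one channel between the two frames.
inline float BlendChannel(float a, float b, float weight)
{
    return a + (b - a) * weight;
}
```

    Scrolling W at a constant rate steps through the frames in order, and the wrap from the last slice back to slice 0 is what makes the animation loop.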
    The code below shows how multiple animation frames can be loaded and saved into a 3D texture:
    int framecount = 128;
    std::vector<shared_ptr<Pixmap> > pixmaps(framecount);
    for (int n = 0; n < framecount; ++n)
    {
        pixmaps[n] = LoadPixmap("https://raw.githubusercontent.com/Leadwerks/Documentation/master/Assets/Materials/Animations/water1_" + String(n) + ".png");
    }
    SaveTexture("water1.dds", TEXTURE_3D, pixmaps, framecount);

    Here is the animation playing in the engine:

    The resulting DDS file is 32 MB for a 256x256x128 RGBA texture:
    You can open this DDS file in Visual Studio and view it. Note that the properties indicate this is the first slice of 128, verifying that our texture does contain the animation data:

    Adding Mipmaps
    The DDS format supports mipmaps in volume textures. A volume mipmap is just a lower-resolution image of the original, with all dimensions half the size of the previous level, with a minimum dimension of 1. They are stored in the DDS file in descending order of size. The code below is a little complicated, but it will reliably compute mipmaps for any volume texture. Note the code is creating another STL vector called "mipchain" where all slices of all mipmaps are stored in order:
    auto plg = LoadPlugin("Plugins/FITextureLoader.dll");
    int framecount = 128;
    std::vector<shared_ptr<Pixmap> > pixmaps(framecount);
    for (int n = 0; n < framecount; ++n)
    {
        pixmaps[n] = LoadPixmap("https://raw.githubusercontent.com/Leadwerks/Documentation/master/Assets/Materials/Animations/water1_" + String(n) + ".png");
    }

    //Build mipmaps
    iVec3 size = iVec3(pixmaps[0]->size.x, pixmaps[0]->size.y, pixmaps.size());
    auto mipchain = pixmaps;
    while (true)
    {
        auto osize = size;
        size.x = Max(1, size.x / 2);
        size.y = Max(1, size.y / 2);
        size.z = Max(1, size.z / 2);
        for (int n = 0; n < size.z; ++n)
        {
            //Average each pair of adjacent slices from the previous level
            auto a = pixmaps[n * 2 + 0];
            auto b = (osize.z > 1) ? pixmaps[n * 2 + 1] : a; //only one slice left, so reuse it
            auto mipmap = CreatePixmap(osize.x, osize.y, pixmaps[0]->format);
            for (int x = 0; x < osize.x; ++x)
            {
                for (int y = 0; y < osize.y; ++y)
                {
                    int rgba0 = a->ReadPixel(x, y);
                    int rgba1 = b->ReadPixel(x, y);
                    int rgba = RGBA((Red(rgba0) + Red(rgba1)) / 2, (Green(rgba0) + Green(rgba1)) / 2, (Blue(rgba0) + Blue(rgba1)) / 2, (Alpha(rgba0) + Alpha(rgba1)) / 2);
                    mipmap->WritePixel(x, y, rgba);
                }
            }
            //Shrink the averaged slice to this level's width and height
            mipmap = mipmap->Resize(size.x, size.y);
            pixmaps[n] = mipmap;
            mipchain.push_back(mipmap);
        }
        if (size == iVec3(1, 1, 1)) break;
    }
    SaveTexture("water1.dds", TEXTURE_3D, mipchain, framecount);

    The resulting DDS file is a little bigger (36.5 MB) because it includes the mipmaps.
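    The extra size follows directly from the mip dimensions. This standalone sketch (hypothetical names, no engine calls) lists the levels of a volume mip chain and sums their uncompressed size:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct MipLevel { int width, height, depth; };

// Halves every dimension per level, clamping at 1, until the level is 1x1x1.
std::vector<MipLevel> BuildMipChain(int width, int height, int depth)
{
    std::vector<MipLevel> chain;
    MipLevel level = { width, height, depth };
    chain.push_back(level);
    while (level.width > 1 || level.height > 1 || level.depth > 1)
    {
        level.width = std::max(1, level.width / 2);
        level.height = std::max(1, level.height / 2);
        level.depth = std::max(1, level.depth / 2);
        chain.push_back(level);
    }
    return chain;
}

// Total uncompressed size of all levels in bytes.
uint64_t ChainBytes(const std::vector<MipLevel>& chain, uint64_t bytesPerPixel)
{
    uint64_t total = 0;
    for (const auto& level : chain)
    {
        total += uint64_t(level.width) * level.height * level.depth * bytesPerPixel;
    }
    return total;
}
```

    For a 256x256x128 RGBA texture this gives nine levels and 38,347,924 bytes, about 36.57 MB of pixel data. Each level holds one eighth the data of the one before it, so the whole chain adds only about 14% to the base level.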
    We can open this DDS file in Visual Studio and verify that the mipmaps are present and look correct:

    Texture Compression
    Volume textures can be stored in compressed texture formats. This is particularly useful for volume textures, since they are so big. Compressing all the mipmaps in a texture before saving can be easily done by replacing the last line of code in the previous example with the code below. We're going to use BC5 compression because this is a normal map.
    //Compress all images
    for (int n = 0; n < mipchain.size(); ++n)
    {
        mipchain[n] = mipchain[n]->Convert(TEXTURE_BC5);
    }
    SaveTexture("water1.dds", TEXTURE_3D, mipchain, framecount);

    The resulting DDS file is just 9.14 MB, about 25% of the size of our uncompressed DDS file.
    When we open this file in Visual Studio, we can verify the texture format is BC5 and the blue channel has been removed. (Only the red and green channels are required for normal maps, as the Z component can be reconstructed in the fragment shader.) Other types of textures may use a different compression format.
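    The compressed size can be estimated the same way. BC5 stores each 4x4 block of pixels in 16 bytes, and in a volume texture each slice is compressed as its own set of 2D blocks. The sketch below uses hypothetical helper names and ignores the small DDS header:

```cpp
#include <algorithm>
#include <cstdint>

// Size of one block-compressed level: blocks are 4x4 pixels within each slice.
uint64_t BlockCompressedBytes(int width, int height, int depth, uint64_t bytesPerBlock)
{
    const uint64_t blocksX = (uint64_t(width) + 3) / 4;
    const uint64_t blocksY = (uint64_t(height) + 3) / 4;
    return blocksX * blocksY * bytesPerBlock * uint64_t(depth);
}

// Size of the full mip chain, halving each dimension down to 1x1x1.
uint64_t BlockCompressedChainBytes(int width, int height, int depth, uint64_t bytesPerBlock)
{
    uint64_t total = 0;
    while (true)
    {
        total += BlockCompressedBytes(width, height, depth, bytesPerBlock);
        if (width == 1 && height == 1 && depth == 1) break;
        width = std::max(1, width / 2);
        height = std::max(1, height / 2);
        depth = std::max(1, depth / 2);
    }
    return total;
}
```

    With 16 bytes per block, the 256x256x128 chain comes to 9,587,008 bytes, or about 9.14 MB. The smallest levels still occupy a full block each, which is why the total is slightly more than a quarter of the uncompressed pixel data.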

    This method can be used to make animated water, fire, lava, explosions and other effects packed away into a single DDS file that can be easily read in a variety of programs.