Blog Entries posted by Josh

  1. Josh
    Previously, I showed how to create a terrain data set from a single 32768x32768 heightmap. The files have been uploaded to our GitHub account here. We will load data directly from the GitHub repository with our load-from-URL feature because this makes it very easy to share code examples. Also, even if you fly around the terrain for a long time, you are unlikely to ever need to download the complete data set. Think about Google Earth. How long would it take you to view the entire planet at full resolution? It's more ground than you can cover, so there is no need to download the whole set.
    Creating a streaming terrain is similar to a regular terrain. We set the terrain resolution to 32768, which is the maximum resolution we have height data for. The terrain is split into patches of 64x64 tiles. We also supply a URL and a callback function to load sections of terrain data.
        auto terrain = CreateTerrain(world, 32768, 64, "https://github.com/Leadwerks/Documentation/raw/master/Assets/Terrain/32768", FetchPatchInfo);
    Let's take a look at the FetchPatchInfo callback function. The function receives the terrain and a structure that contains information about the section we want to grab.
        void FetchPatchInfo(shared_ptr<StreamingTerrain> terrain, TerrainPatchInfo& patchinfo)
    The TerrainPatchInfo structure looks like this:
        struct TerrainPatchInfo
        {
            iVec2 position;
            iVec2 size;
            int level;
            shared_ptr<Pixmap> heightmap;
            shared_ptr<Pixmap> normalmap;
        };
    The most important parts of the TerrainPatchInfo structure are the position (iVec2), size (iVec2), and level (int) members. The level indicates the resolution level we are grabbing info for. Like model LODs, zero is the highest-resolution level, and as the level gets higher the resolution gets lower. At ground level we are likely to be viewing level 0 data. If we are looking down on the terrain from space we are viewing the highest level, with the lowest resolution. Since our terrain is 32768x32768 and our patch size is 64, we can go up nine levels of detail before the terrain data fits into a single 64x64 heightmap. If you need to, take a look back at how we generated LOD data in my earlier article.
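    The nine-level figure can be checked with a few lines of arithmetic. This is a minimal sketch, not engine code: CountLODSteps is a hypothetical helper name, and it assumes both the terrain resolution and the patch size are powers of two.

```cpp
#include <cassert>

// Number of times the terrain resolution must be halved before the whole
// terrain fits into a single patch. Hypothetical helper; both arguments
// are assumed to be powers of two.
int CountLODSteps(int terrainResolution, int patchSize)
{
    assert(patchSize > 0 && terrainResolution >= patchSize);
    int steps = 0;
    while (terrainResolution > patchSize)
    {
        terrainResolution /= 2; // each LOD level halves the resolution
        ++steps;
    }
    return steps;
}
```

    For a 32768 terrain with 64x64 patches this counts nine halvings, so the levels run from 0 to 9.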
    The position parameter tells us where on the terrain the patch lies. The meaning of this value changes with the level we are at. Let's consider a small 4x4 terrain. At level 0, the maximum resolution, the patch positions are laid out like this:

    If we go up one level (1) we have a terrain made up of just four patches. Notice that the patch with position (1,1) is now in the lower-right corner of the terrain, even though in the previous image the patch with position (1,1) is in the upper-left quadrant.

    At the highest LOD level (2) there is just a single patch with position (0,0):

    As you can see, the patch position by itself doesn't give us an accurate picture of where the patch is located. We need the LOD level to know what this value actually means.
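    One way to make the relationship concrete is to convert a patch position and level into the area it covers on the level 0 grid. The sketch below is an illustration only: PatchRect and PatchFootprint are hypothetical names, and it assumes each level exactly doubles the area a patch covers along each axis.

```cpp
#include <cassert>

// Origin and edge length of a patch, measured in level 0 patches.
struct PatchRect { int x, y, size; };

// Footprint of a patch on the level 0 grid. Each level doubles the edge
// length a patch covers, so position and size both scale by 2^level.
PatchRect PatchFootprint(int px, int py, int level)
{
    assert(px >= 0 && py >= 0 && level >= 0);
    const int scale = 1 << level;
    return PatchRect{ px * scale, py * scale, scale };
}
```

    With the 4x4 example, position (1,1) at level 1 maps to origin (2,2) with an edge length of 2, which is the lower-right quadrant, while the same position at level 0 stays at (1,1) in the upper-left quadrant.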
    The size parameter tells us the size of the pixel data the patch expects to use. This will be the terrain patch size plus one. Why is it bigger than the terrain patch size we indicated in the CreateTerrain command? This is because an NxN patch of tiles uses (N+1)x(N+1) vertices. The 4x4 patch of tiles below uses 5x5 vertices. Since height and normal data is read per-vertex we need our texture data to match this layout. (This is not optimal for textures, which work best at power-of-two resolutions, but don't worry. The engine will copy these patches into a big texture atlas which is a power-of-two size.)

    To load height data, we just convert our level, x, and y values into a string and load the appropriate heightmap from our online repository. We have control over this because we previously saved all our heightmaps with the same naming convention.
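    Because the same path is built for the current tile and for its neighbors, it can help to put the naming convention in one place. This is a sketch using std::string rather than the engine's WString class, and PatchPath is a hypothetical helper name.

```cpp
#include <cassert>
#include <string>

// Builds "<root>/LOD<level>/<x>_<y>.dds", the naming convention used when
// the tiles were saved. Hypothetical helper using std::string.
std::string PatchPath(const std::string& root, int level, int x, int y)
{
    assert(level >= 0 && x >= 0 && y >= 0);
    return root + "/LOD" + std::to_string(level) + "/" +
           std::to_string(x) + "_" + std::to_string(y) + ".dds";
}
```

    The neighbor to the right is then PatchPath(root, level, x + 1, y), and the one below is PatchPath(root, level, x, y + 1).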
    First we will load a 64x64 pixmap and copy that to the patch height data. This will cover most of the pixels except a one-pixel line along the right and lower edge:
        //Create path to heightmap file
        WString heightmappath = terrain->datapath + L"/LOD" + WString(patchinfo.level) + L"/" + WString(patchinfo.position.x) + L"_" + WString(patchinfo.position.y) + L".dds";

        //Create the patch heightmap
        patchinfo.heightmap = CreatePixmap(patchinfo.size.x + 1, patchinfo.size.y + 1, TEXTURE_RED16);

        //Load most of the patch
        auto pixmap = LoadPixmap(heightmappath, 0, 0, LOAD_QUIET);
        if (pixmap)
        {
            Assert(pixmap->size.x + 1 == patchinfo.heightmap->size.x);
            Assert(pixmap->size.y + 1 == patchinfo.heightmap->size.y);
            pixmap->CopyRect(0, 0, pixmap->size.x, pixmap->size.y, patchinfo.heightmap, 0, 0);
        }
    Next we need to fill in the right edge of the height data. If we have not reached the edge of the terrain, we can load the next tile to the right and copy its left edge into the right edge of our height data. The CountPatches() method will tell us how many patches the terrain has at this resolution. If we have reached the edge of the terrain, we just copy the column of pixels that is one pixel from the right edge:
        iVec2 patches = terrain->CountPatches(patchinfo.level);
        if (patchinfo.position.x < patches.x - 1)
        {
            //Copy left edge of the tile to the right of this one to the right edge of the patch
            WString path = terrain->datapath + L"/LOD" + WString(patchinfo.level) + L"/" + WString(patchinfo.position.x + 1) + L"_" + WString(patchinfo.position.y) + L".dds";
            auto pixmap = LoadPixmap(path, 0, 0, LOAD_QUIET);
            if (pixmap) pixmap->CopyRect(0, 0, 1, pixmap->size.y, patchinfo.heightmap, patchinfo.heightmap->size.x - 1, 0);
        }
        else
        {
            //Edge of terrain reached, so copy the pixels second to last from the edge to the edge
            for (int y = 0; y < patchinfo.heightmap->size.y; ++y)
            {
                patchinfo.heightmap->WritePixel(patchinfo.heightmap->size.x - 1, y, patchinfo.heightmap->ReadPixel(patchinfo.heightmap->size.x - 2, y));
            }
        }
    We will do basically the same thing to fill in the bottom edge:
        if (patchinfo.position.y < patches.y - 1)
        {
            //Copy top edge of the tile beneath this one to the bottom edge of the patch
            WString path = terrain->datapath + L"/LOD" + WString(patchinfo.level) + L"/" + WString(patchinfo.position.x) + L"_" + WString(patchinfo.position.y + 1) + L".dds";
            auto pixmap = LoadPixmap(path, 0, 0, LOAD_QUIET);
            if (pixmap) pixmap->CopyRect(0, 0, pixmap->size.x, 1, patchinfo.heightmap, 0, patchinfo.heightmap->size.y - 1);
        }
        else
        {
            //Edge of terrain reached, so copy the pixels second to last from the edge to the edge
            for (int x = 0; x < patchinfo.heightmap->size.x; ++x)
            {
                patchinfo.heightmap->WritePixel(x, patchinfo.heightmap->size.y - 1, patchinfo.heightmap->ReadPixel(x, patchinfo.heightmap->size.y - 2));
            }
        }
    We also have to fill in the very lower-right pixel:
        if (patchinfo.position.x < patches.x - 1 and patchinfo.position.y < patches.y - 1)
        {
            //Copy the upper-left pixel of the diagonally adjacent tile to the lower-right pixel of the patch
            WString path = terrain->datapath + L"/LOD" + WString(patchinfo.level) + L"/" + WString(patchinfo.position.x + 1) + L"_" + WString(patchinfo.position.y + 1) + L".dds";
            auto pixmap = LoadPixmap(path, 0, 0, LOAD_QUIET);
            if (pixmap) pixmap->CopyRect(0, 0, 1, 1, patchinfo.heightmap, patchinfo.heightmap->size.x - 1, patchinfo.heightmap->size.y - 1);
        }
        else
        {
            //Write the lower-right pixel
            patchinfo.heightmap->WritePixel(patchinfo.heightmap->size.x - 1, patchinfo.heightmap->size.y - 1, patchinfo.heightmap->ReadPixel(patchinfo.heightmap->size.x - 2, patchinfo.heightmap->size.y - 2));
        }
    We have our height data completely loaded now. Next we're going to generate the normal map ourselves. Your data set might have normal maps already stored, but I prefer to generate these myself because normals need to be very precise and they must be generated for the exact height the terrain is being scaled to, in order for tessellated vertices to appear correctly. (Normals actually can't be scaled because it's a rotation problem, not a vector problem.)
        //Calculate the normal map - I'm not 100% sure on the height factor
        patchinfo.normalmap = patchinfo.heightmap->MakeNormalMap(TerrainHeight / pow(2, patchinfo.level), TEXTURE_RGBA);
    One thing to be very careful of is that the FetchPatchInfo callback is called on its own thread, and may be called several times at once on different threads to load data for different tiles. Any code executed in this function must be thread-safe! The nice thing about this is the engine does not stall if a patch of terrain is taking a long time to load.
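    As an illustration of what thread-safe means here: each tile can be requested by up to four neighboring patches, so a shared cache of decoded tiles is a natural addition, and any such shared state has to be guarded. The class below is a sketch only. TileCache is a hypothetical name, it is not part of the engine, and it stores plain 16-bit samples instead of Pixmap objects.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical cache of decoded tiles shared by all callback invocations.
// Every access takes the mutex, so concurrent callbacks cannot corrupt the map.
class TileCache
{
public:
    // Returns true and copies the tile into 'out' if it is cached.
    bool Find(const std::string& path, std::vector<unsigned short>& out)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = tiles_.find(path);
        if (it == tiles_.end()) return false;
        out = it->second; // copy while the lock is held
        return true;
    }

    // Stores (or replaces) the pixels for a tile.
    void Store(const std::string& path, const std::vector<unsigned short>& pixels)
    {
        assert(!path.empty());
        std::lock_guard<std::mutex> lock(mutex_);
        tiles_[path] = pixels;
    }

private:
    std::mutex mutex_;
    std::map<std::string, std::vector<unsigned short>> tiles_;
};
```

    Copying the data out under the lock keeps the example simple. A real implementation would more likely hand out shared pointers to immutable tiles to avoid the copy.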
    That's all it takes to get a streaming terrain up and running. You can replace the FetchPatchInfo callback with your own function and load data from any source. Texture layers / colors are something I am still working out, but this gives you all you need to display flat terrains with streaming data for big games and multi-domain simulations. Here are the results:
    Next we will begin warping the terrain geometry into arbitrary shapes in order to display planetary data with various projection methods.
  2. Josh
    Being able to support huge worlds is great, but how do you fill them up with content? Loading an entire planet into memory all at once isn't possible, so we need a system that allows us to stream terrain data in and out of memory dynamically. I wanted a system that could load data from any source, including local files on the hard drive or online GIS sources. Fortunately, I developed most of this system last spring and I am ready to finish it up now.
    Preparing Terrain Data
    The first step is to create a set of data to test with. I generated a 32768x32768 terrain using L3DT. This produces a 2.0 GB heightmap. The total terrain data with normals, terrain layers, and other data would probably exceed 10 GB, so we need to split this up into smaller pieces.
    Loading a 2 GB file into memory might be okay, but we have some special functionality in the new engine that can help with this. First, some terminology: A Stream is an open file that can be read from or written to. A Buffer is a block of memory that can have values poked / peeked at a specific offset. (This is called a "Bank" in Leadwerks 4.) A BufferStream is a block of memory with an internal position value that allows reading and writing with Stream commands. We also have the new StreamBuffer class, which allows you to use Buffer commands on a file on the hard drive! The advantage here is you can treat a StreamBuffer like it's a big block of memory without actually loading the entire file into memory at once.
    Our Pixmap class allows easy manipulation, copying, and conversion of pixel data. The CreatePixmap() function can accept a Buffer as the source of the pixel data. The StreamBuffer class is derived from the Buffer class, so we can create a StreamBuffer from a file and then create a 32768x32768 pixmap without actually loading the data into memory like so:
        auto stream = ReadFile("Terrain/32768/32768.r16");
        auto buffer = CreateStreamBuffer(stream, 0, stream->GetSize());
        auto pixmap = CreatePixmap(32768, 32768, TEXTURE_R16, buffer);
    So at this point we have a 32768x32768 heightmap that can be manipulated without actually using any memory.
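    The idea behind a file-backed buffer can be shown with the standard library alone. The function below is a sketch of the concept, not the StreamBuffer implementation: ReadHeightSample is a hypothetical name, and it assumes a raw, row-major file of 16-bit samples in the machine's native byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Reads one 16-bit height sample from a raw, row-major heightmap file by
// seeking to its offset, so the rest of the file is never loaded.
// Returns false if the file cannot be opened or the read fails.
bool ReadHeightSample(const std::string& path, int width, int x, int y, uint16_t& value)
{
    assert(width > 0 && x >= 0 && x < width && y >= 0);
    std::ifstream file(path, std::ios::binary);
    if (!file) return false;
    const std::streamoff offset =
        (static_cast<std::streamoff>(y) * width + x) *
        static_cast<std::streamoff>(sizeof(uint16_t));
    file.seekg(offset);
    file.read(reinterpret_cast<char*>(&value), sizeof(value));
    return static_cast<bool>(file);
}
```

    Reopening the file on every call is wasteful. A real file-backed buffer would keep the stream open and only seek.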
    Next we are going to split the pixmap up into a bunch of smaller pixmaps and save each one as a separate file. To do this, we will create a single 1024x1024 pixmap:
        auto dest = CreatePixmap(1024, 1024, TEXTURE_R16);
    Then we simply walk through the original heightmap, copy a 1024x1024 patch of data to our small heightmap, and save each patch as a separate file in .dds format:
        CreateDir("Terrain/32768/LOD0");
        for (int x = 0; x < pixmap->size.x / 1024; ++x)
        {
            for (int y = 0; y < pixmap->size.y / 1024; ++y)
            {
                pixmap->CopyRect(x * 1024, y * 1024, 1024, 1024, dest, 0, 0);
                dest->Save("Terrain/32768/LOD0/" + String(x) + "_" + String(y) + ".dds");
            }
        }
    We end up with a set of 1024 smaller heightmap files. (I took this screenshot while the program was still processing, so at the time there were only 411 files saved.)

    Creating LODs
    When you are working with large terrains it is necessary to store data at multiple resolutions. The difference between looking at the Earth from orbit and at human-scale height is basically like the difference between macroscopic and microscopic viewing. (Google Earth demonstrates this pretty well.) We need to take our full-resolution data and resample it into a series of lower-resolution data sets. We can do that all in one go with the following code:
        int num = 32; // =32768/1024
        int lod = 0;
        while (num > 1) // stop once the previous level is a single tile
        {
            CreateDir("Terrain/32768/LOD" + String(lod + 1));
            for (int x = 0; x < num / 2; ++x)
            {
                for (int y = 0; y < num / 2; ++y)
                {
                    //Load the four tiles from the previous level that make up this tile
                    auto pm00 = LoadPixmap("Terrain/32768/LOD" + String(lod) + "/" + String(x * 2 + 0) + "_" + String(y * 2 + 0) + ".dds");
                    auto pm10 = LoadPixmap("Terrain/32768/LOD" + String(lod) + "/" + String(x * 2 + 1) + "_" + String(y * 2 + 0) + ".dds");
                    auto pm01 = LoadPixmap("Terrain/32768/LOD" + String(lod) + "/" + String(x * 2 + 0) + "_" + String(y * 2 + 1) + ".dds");
                    auto pm11 = LoadPixmap("Terrain/32768/LOD" + String(lod) + "/" + String(x * 2 + 1) + "_" + String(y * 2 + 1) + ".dds");

                    //Downsample each tile to half size
                    pm00 = pm00->Resize(512, 512);
                    pm10 = pm10->Resize(512, 512);
                    pm01 = pm01->Resize(512, 512);
                    pm11 = pm11->Resize(512, 512);

                    //Combine the four quarters into one 1024x1024 tile
                    pm00->CopyRect(0, 0, 512, 512, dest, 0, 0);
                    pm10->CopyRect(0, 0, 512, 512, dest, 512, 0);
                    pm01->CopyRect(0, 0, 512, 512, dest, 0, 512);
                    pm11->CopyRect(0, 0, 512, 512, dest, 512, 512);

                    dest->Save("Terrain/32768/LOD" + String(lod + 1) + "/" + String(x) + "_" + String(y) + ".dds");
                }
            }
            num /= 2;
            lod++;
        }
    The LOD1 folder then contains 256 1024x1024 heightmaps. The LOD2 folder contains 64, and so on, all the way to LOD 5 which contains the entire terrain downsampled into a single 1024x1024 heightmap:
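    The file counts per level follow directly from halving the number of tiles per side at each step. A short sketch makes the progression explicit. TilesPerLOD is a hypothetical helper name, and the tile count per side is assumed to be a power of two.

```cpp
#include <cassert>
#include <vector>

// Number of tiles stored at each LOD, starting with LOD 0. The number of
// tiles per side halves at every level until a single tile remains.
std::vector<int> TilesPerLOD(int tilesPerSide)
{
    assert(tilesPerSide > 0);
    std::vector<int> counts;
    for (int n = tilesPerSide; n >= 1; n /= 2)
    {
        counts.push_back(n * n);
    }
    return counts;
}
```

    For 32 tiles per side this gives 1024, 256, 64, 16, 4, and 1 tiles for LOD 0 through LOD 5.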

    Now we have a multi-resolution data set that can be dynamically loaded into the engine. (If we were loading data from an online GIS data set it would probably already be set up like this.) The next step will be to set up a custom callback function that handles the data loading.
  3. Josh
    In my work with NASA we visualize many detailed CAD models in VR. These models may consist of tens of millions of polygons and thousands of articulated sub-objects. This often results in rendering performance that is bottlenecked by the vertex rather than the fragment pipeline. I recently performed some research to determine how to maximize our rendering speed in these situations.
    Leadwerks 4 used separate vertex buffers, but in Leadwerks 5 I have been working exclusively with interleaved vertex buffers. Data is interleaved and packed tightly. I always knew this could make a small improvement in speed, but I underestimated how important this is. Each byte in the data makes a huge impact. Now vertex colors and the second texture coordinate set are two vertex attributes that are almost never used. I decided to eliminate these. If required, this data can be packed into a 1D texture, applied to a material, and then read in a custom vertex shader, but I don't think the cost of keeping this data in the default vertex structure is justified. By reducing the size of the vertex structure I was able to make rendering speed in vertex-heavy scenarios about four times faster.
    Our vertex structure has been cut down to a convenient 32 bytes:
        struct Vertex
        {
            Vec3 position;
            short texcoords[2];
            signed char normal[3];
            signed char displacement;
            signed char tangent[4];
            unsigned char boneweights[4];
            unsigned char boneindices[4];
        };
    I created a separate vertex buffer for rendering shadow maps, which only require position data. I decided to copy the position data into this and store it separately. This requires about 15% more vertex memory usage, but results in a much more compact vertex structure for faster shadow rendering. I may pack the vertex texture coordinates in there, since that would result in a 16-byte-aligned structure. I did not see any difference in performance on my Nvidia card and I suspect this is the same cost as a 12-byte structure on most hardware.
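    The byte counts mentioned above can be verified at compile time. This sketch assumes Vec3 is three 32-bit floats and that short is 16 bits, which holds on common desktop platforms. ShadowVertex and ShadowVertexWithUV are hypothetical names for the two position-only layouts discussed.

```cpp
struct Vec3 { float x, y, z; }; // assumed layout: three 32-bit floats

struct Vertex
{
    Vec3 position;                // 12 bytes
    short texcoords[2];           //  4 bytes
    signed char normal[3];        //  3 bytes
    signed char displacement;     //  1 byte
    signed char tangent[4];       //  4 bytes
    unsigned char boneweights[4]; //  4 bytes
    unsigned char boneindices[4]; //  4 bytes
};

// Hypothetical layouts for the shadow-map vertex buffer.
struct ShadowVertex { Vec3 position; };
struct ShadowVertexWithUV { Vec3 position; short texcoords[2]; };

static_assert(sizeof(Vertex) == 32, "main vertex should be 32 bytes");
static_assert(sizeof(ShadowVertex) == 12, "position-only vertex should be 12 bytes");
static_assert(sizeof(ShadowVertexWithUV) == 16, "position plus texcoords should be 16 bytes");
```

    If a compiler inserted padding or a member changed type, the build would fail instead of silently changing the vertex stride.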
    Using unsigned shorts instead of unsigned integers for mesh indices increases performance by 11%.
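    Sixteen-bit indices can only address 65,536 vertices, so a mesh has to be checked before its index buffer is narrowed. The function below is a sketch of that check. NarrowIndices is a hypothetical name and not an engine function.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Converts 32-bit indices to 16-bit indices. Returns false and leaves 'out'
// empty if any index is too large to fit in 16 bits.
bool NarrowIndices(const std::vector<uint32_t>& indices, std::vector<uint16_t>& out)
{
    out.clear();
    out.reserve(indices.size());
    for (uint32_t index : indices)
    {
        if (index > UINT16_MAX)
        {
            out.clear();
            return false;
        }
        out.push_back(static_cast<uint16_t>(index));
    }
    assert(out.size() == indices.size());
    return true;
}
```

    Meshes that fail the check either keep 32-bit indices or get split into several smaller meshes.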
    A vertex-limited scene is one in which our default setting of using an early Z-pass can be a disadvantage, so I added an option to disable this on a per-camera basis.
    Finally, I found that vertex cache optimization tools can produce a significant performance increase. I implemented two different libraries. In order to do this, I added a new plugin function for filtering a mesh:
        int FilterMesh(char* filtername, char* params, GMFSDK::GMFVertex*& vertices, uint32_t& vertex_count, uint32_t*& indices, uint32_t& indice_count, int polygonpoints);
    This allows you to add new mesh processing routines such as flipping the indices of a mesh, calculating normals, or performing mesh modifications like bending, twisting, distorting, etc. Both libraries resulted in an additional 100% increase in framerate in vertex-limited scenes.
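    To give an idea of the kind of routine such a filter could wrap, here is a sketch of the simplest one mentioned, flipping the indices of a mesh. FlipTriangleWinding is a hypothetical standalone function that works on a std::vector. It does not use the plugin interface, and it assumes an indexed triangle list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Reverses the winding order of every triangle in an indexed triangle list
// by swapping the second and third index of each triangle.
void FlipTriangleWinding(std::vector<uint32_t>& indices)
{
    assert(indices.size() % 3 == 0);
    for (std::size_t i = 0; i + 2 < indices.size(); i += 3)
    {
        std::swap(indices[i + 1], indices[i + 2]);
    }
}
```

    Applying the function twice restores the original order, which makes it easy to test.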
    What will this help with? These optimizations will make a real difference when rendering CAD models and point cloud data.
  4. Josh
    A beta update is available.
    The ray tracing system is now using a smaller 128x128x128 grid. There is still only one single grid that does not move. Direct lighting calculation has been moved to the GPU. The GI will appear darker and won't look very good. Additional shader work is needed to make the data look right, and I probably need to implement a compute shader for parts of it. The system is now dynamic, although it currently has a lot of latency. GI renders only get triggered when something moves, so if everything is still the GI data will not be updated. There is a lot of work left to do, but I wanted to get the structure of the program in place first and then refine everything.
    I tested the TEX loader plugin and it appeared to work fine with bluegrid.tex, so I did not investigate any further.
    I started to implement a more sophisticated custom pixel shader function but I realized I didn't really know how to do it. Should the normal map lookup take place before this function, or be skipped entirely? What if the user modified the texture coordinates? The whole thing is not as simple as I thought and I need to think about it more.
    With those stipulations stated, this is a good intermediate update.
  5. Josh
    So far the new voxel ray tracing system I am working on is producing amazing results. I expect the end result will look like Minecraft RTX, but without the enormous performance penalty of RTX ray tracing.
    I spent the last several days getting the voxel update speed fast enough to handle dynamic reflections, but the more I dig into this the more complicated it becomes. Things like a door sliding open are fine, but small objects moving quickly can be a problem. The worst case scenario is when the player is carrying an object in front of them. In the video below, the update speed is fast, but the limited resolution of the voxel grid makes the reflections flash quite a lot. This is due to the reflection of the barrel itself. The gun does not contribute to the voxel data, and it looks perfectly fine as it moves around the scene, aside from the choppy reflection of the barrel in motion.
    The voxel resolution in the above video is set to about 6 centimeters. I don't see increasing the resolution as an option that will go very far. I think what is needed is a separation of dynamic and static objects. A sparse voxel octree will hold all static objects. This needs to be precompiled and it cannot change, but it will handle a large amount of geometry with low memory usage. For dynamic objects, I think a per-object voxel grid should be used. The voxel grid will move with the object, so reflections of moving objects will update instantaneously, eliminating the problem we see above.
    We are close to having a very good 1.0 version of this system, and I may wrap this up soon, with the current limitations. You can disable GI reflections on a per-object basis, which is what I would recommend doing with dynamic objects like the barrels above. The GI and reflections are still dynamic and will adjust to changes in the environment, like doors opening and closing, elevators moving, and lights moving and turning on and off. (If those barrels above weren't moving, showing their reflections would be absolutely no problem, as I have demonstrated in previous videos.)
    In general, I think ray tracing is going to be a feature you can take advantage of to make your games look incredible, but it is something you have to tune. The whole "Hey Josh I created this one weird situation just to cause problems and now I expect you to account for this scenario AAA developers would purposefully avoid" approach will not work with ray tracing. At least not in the 1.0 release. You're going to want to avoid the bad situations that can arise, but they are pretty easy to prevent. Perhaps I can combine screen-space reflections with voxels for reflections of dynamic objects before the first release.
    If you are smart about it, I expect your games will look like this:
    I had some luck with real-time compression of the voxel data into BC3 (DXT5) format. It adds some delay to the updating, but if we are not trying to show moving reflections much then that might be a good tradeoff. Having only 25% of the data being sent to the GPU each frame is good for performance.
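    The 25% figure matches what BC3 gives when the uncompressed data uses four bytes per voxel. That uncompressed format is an assumption on my part; the source format is not stated above. BC3 stores each 4x4 block of texels in 16 bytes, which works out to one byte per texel. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for an uncompressed volume at 4 bytes per voxel (assumed RGBA8).
std::size_t RGBA8Bytes(std::size_t w, std::size_t h, std::size_t d)
{
    return w * h * d * 4;
}

// Bytes needed for the same volume in BC3. Each slice is split into 4x4
// blocks of 16 bytes. Width and height are assumed to be multiples of 4.
std::size_t BC3Bytes(std::size_t w, std::size_t h, std::size_t d)
{
    assert(w % 4 == 0 && h % 4 == 0);
    return (w / 4) * (h / 4) * 16 * d;
}
```

    For a 128x128x128 grid this is 8 MiB uncompressed against 2 MiB in BC3.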
    Another change I am going to make is a system that triggers voxel refreshes, instead of constantly updating the data no matter what. If you sit still and nothing is moving, then the voxel data won't get recalculated and processed, which will make the performance even faster. This makes sense if we expect most of the data to not change each frame.
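    A triggered refresh is essentially a dirty flag. The class below is a sketch of that pattern and not the engine's implementation: VoxelRefreshTrigger and its methods are hypothetical names.

```cpp
#include <cassert>

// Hypothetical dirty-flag tracker. Objects report changes as they move, and
// the renderer rebuilds the voxel data only on frames where a change was reported.
class VoxelRefreshTrigger
{
public:
    // Called whenever something that contributes to the voxel data changes.
    void NotifyChanged() { dirty_ = true; }

    // Called once per frame. Returns true if the voxel data should be
    // rebuilt this frame, and clears the flag.
    bool ConsumeRefresh()
    {
        const bool refresh = dirty_;
        dirty_ = false;
        return refresh;
    }

private:
    bool dirty_ = true; // start dirty so the data is built once at startup
};
```

    Any number of changes within one frame collapse into a single rebuild, and frames with no changes cost nothing.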
    I haven't run any performance benchmarks yet, but from what I am seeing I think the performance penalty for using this system will be basically zero, even on integrated graphics. Considering what a dramatic upgrade in visuals this provides, that is very impressive.
    In the future, I think I will be able to account for motion in voxel ray tracing, as well as high-definition polygon raytracing for sharp reflections, but it's not worth delaying the release of the engine. Hopefully in this article I showed there are many factors, and many approaches we can use to try to optimize for different aspects of the effect. For the 1.0 release of our new engine, I think we want to emphasize performance above all else.
  6. Josh

    Crowdfunding campaigns are a great way to kick off marketing for a game or product, with several benefits.
      • Free promotion to your target audience.
      • Early validation of an idea before you create the product.
      • A successful crowdfunding campaign demonstrates organic consumer interest, which makes bloggers and journalists much more willing to give your project coverage.
    Oh yeah, there's also the financial aspect, but that's actually the least important part. If you make $10,000 in crowdfunding, you can leverage that campaign to make far more than that amount in sales of your final product. I did over a million dollars in sales on Steam starting with a $40,000 Kickstarter project.
    There are two types of crowdfunding projects. The first is something you don't really want to do unless you get paid enough to make it worthwhile. For this type of project you should set a goal for the minimum amount of money you would be able to finish the project for. There is more uncertainty with this type of campaign, but if you don't meet your goal you don't have to deliver anything. Failing early can be a good thing, because there's nothing worse than building a product and having nobody buy it. With the successful Leadwerks for Linux Kickstarter campaign, people were asking for Linux support and I said "Okay, put your money where your mouth is" and they did.
    The second type of project is something you would probably do anyways, and a crowdfunding campaign just gives you a way to test demand and make some extra cash early. For this type of project you should set a relatively low goal, something you think you can earn quickly. If your campaign fails, that puts you in an awkward position because then you have to either cancel the project or admit you didn't actually need the money. A successful campaign does put you on the hook with a delivery date and a firm description of the product, so make sure your goals are realistic and attainable within your planned time frame.
    For a campaign to be successful you need to prepare. Don't just kick off a campaign without having an existing fanbase. You need to build an email list of people interested in your project before the campaign starts. But if you haven't done that yet, there is another way...
    With my crowdfunding campaign for the new engine coming up in October, there is an opportunity for others to latch on to the success of the upcoming campaign. I have an extensive email list I don't use very often, and my more formal blog articles regularly get 20,000+ views. Plus I now have some reach on Steam, and a lot more customers than back in 2013. I expect my campaign will hit its target goal within the first few days. Once my goal is reached, it would be easy for me to post an announcement saying "Oh hey, check out these other projects built with my technology" and add links on my project page. Your project could link back to mine and to others, and we can create a network of projects utilizing the new game engine technology. I think my new campaign will be very successful, and jumping onto that will probably give you a better result than you would get otherwise.
    Another thing to consider is that with the new ray-tracing technology, even simple scenes look incredible. I think there is a temporary window of opportunity where games that utilize this type of technology will stand out and automatically get more attention because the graphics look so dramatically better. My final results will make your game look like the shot from Minecraft RTX below, but the voxel method I am using will run fast on all hardware:

    So if you have a game project made with the new engine, or something that would look good in the new engine, there is an opportunity to piggyback your crowdfunding campaign off of mine. What makes a good game pitch? Demonstrating gameplay, having a playable demo, a track record of past published games, and gameplay videos all make a much better case than pages of bullet points. (I like animated GIFs because they show a lot more than a static screenshot but they are dead simple and fun.) You need to inspire the audience to believe in your concept, and for them to believe in your ability to deliver. So put your best foot forward!
  7. Josh
    I've been working to make my previously demonstrated voxel ray tracing system fully dynamic. Getting the voxel data to update fast enough was a major challenge, and it forced me to rethink the design. In the video below you can see the voxel data being updated at a sufficient speed. Lighting has been removed, as I need to change the way this runs.
    I plan to keep two copies of the data in memory and let the GPU interpolate smoothly in between them, in order to smooth out the motion. Next I need to add the direct lighting and GI passes back in, which will add an additional small delay but hopefully be within a tolerable threshold.
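    The two-copy idea amounts to a linear blend between the previous and the current voxel snapshot. On the GPU this would live in a shader. The CPU sketch below only illustrates the math, and BlendVoxels is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Blends two snapshots of voxel data. t = 0 returns 'previous' and
// t = 1 returns 'current'. Both snapshots must have the same size.
std::vector<float> BlendVoxels(const std::vector<float>& previous,
                               const std::vector<float>& current, float t)
{
    assert(previous.size() == current.size());
    assert(t >= 0.0f && t <= 1.0f);
    std::vector<float> result(previous.size());
    for (std::size_t i = 0; i < previous.size(); ++i)
    {
        result[i] = previous[i] + (current[i] - previous[i]) * t;
    }
    return result;
}
```

    Here t would be the fraction of the voxel update interval that has elapsed, so the displayed data glides toward the newest snapshot instead of jumping to it.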
  8. Josh
    A new beta update is available. The raytracing implementation has been sped up significantly. The same limitations of the current implementation still apply, but the performance will be around 10x faster, as the most expensive part of the raytrace shader has been precomputed and cached.
    The Material::SetRefraction method has also been exposed to Lua. The Camera::SetRefraction method is now called "SetRefractionMode".
    The results are so good, I don't have any plans to use any kind of screen-space reflection effect.
  9. Josh
    An update is available for beta testers.
    All Lua errors should now display the error message and open the script file and go to the correct line the error occurs on.
    The voxel raytracing system is now accessible. To enable it, just call Camera:SetGIMode(true).
    At this time, only a single voxel grid with dimensions of 32 meters, centered at the origin is in use. The voxel grid will only be generated once, at the time the SetGIMode() method is called. Only the models that have already been loaded will be included when the voxel grid is built. Building takes several seconds in debug mode but less than one second in release. Raytraced GI and reflections do not take into account material properties yet, so there is no need to adjust PBR material settings at this time. Skyboxes and voxels are not currently combined. Only one or the other is shown. Performance is much faster than Nvidia RTX but still has a lot of room for improvement. If it is too slow for you right now, use a smaller window resolution. It will get faster as I work on it more. The raytracing stuff makes such a huge difference that I wanted to get a first draft out to the testers as quickly as possible. I am very curious to see what you are able to do with it.
  10. Josh
    PBR materials look nice, but their reflections are only as good as the reflection data you have. Typically this is done with hand-placed environment probes that take a long time to lay out, and display a lot of visual artifacts. Nvidia's RTX raytracing technology is interesting, but it struggles to run old games on a super-expensive GPU. My goal in Leadwerks 5 is to have automatic reflections and global illumination that don't require any manual setup, with fast performance.
    I'm on the final step of integrating our voxel raytracing data into the standard lighting shader and the results are fantastic. I found I could compress the 3D textures in BC3 format in real-time and save a ton of memory that way. However, I discovered that only about 1% of the 3D voxel texture actually has any data in it! That means there could be a lot of room for improvement with a sparse voxel octree texture of some sort, which could allow greater resolution. In any case, the remaining implementation of this feature will be very interesting. (I believe the green area on the back wall is an artifact caused by the BC3 compression.)
    I think I can probably render the raytracing component of the scene in a separate smaller buffer and then denoise it like I did with SSAO to make the performance hit negligible. Another interesting thing is that the raytracing automatically creates its own ambient occlusion effect.
    Here is the current state, showing the raytraced component only. It works great with our glass refraction effects.
    Next I will start blending it into the PBR material lighting calculation a little better.
    Here's an updated video that shows it worked into the lighting more:
  11. Josh
    An update is available that adds the new refraction effect. It's very easy to create a refractive transparent material:
        auto mtl = CreateMaterial();
        mtl->SetTransparent(true);
        mtl->SetRefraction(0.02);
    The default FPS example shows some nice refraction, with two overlapping layers of glass, with lighting on all layers. It looks great with some of @TWahl's PBR materials.

    If you want to control the strength of the refraction effect on a per-pixel basis, add an alpha channel to your normal map.
    I've configured the launch.json for Visual Studio Code so that the currently selected file is passed to the program in the command line. By default, the game executable will run the "Scripts/Main.lua" file. If, however, the currently selected Lua file in the VSCode IDE is a file located in "Scripts/Examples", the executable will launch that one instead. This design allows you to quickly run a different script without overwriting Main.lua, but won't accidentally run a different script if you are working on something else.

    The whole integration with Visual Studio Code has gotten really nice.

    A new option "frameBufferColorFormat" has been added to the Config/settings.json file to control the default color format for texture buffers. I have it set to 37 (VK_FORMAT_R8G8B8A8_UNORM), but you can set it to 91 (VK_FORMAT_R16G16B16A16_UNORM) for high-def color, although you probably won't see any difference without an additional tone mapping post-processing effect.
    Slow performance in the example game has been fixed. There are a few things going on here. Physics weren't actually the problem, it was the Lua debugger. The biggest problem was an empty Update() function that all the barrels had in their script. Now, this should not really be a problem, but I suspect the routine in vscode-debugger.lua that finds the matching chunk name is slow and can be optimized quite a lot. I did not want to make any additional changes to it right now, but in the future I think this can be further improved. But anyways, the FPS example will be nice and snappy now and runs normally.
    Application shut down will be much faster now, as I did some work to clean up the way the engine cleans itself up upon termination.
  12. Josh
    Heat haze is a difficult problem. A particle emitter is created with a transparent material, and each particle warps the background a bit. The combined effect of lots of particles gives the whole background a nice shimmering wavy appearance. The problem is that when two particles overlap one another they don't blend together, because the last particle drawn is using the background of the solid world for the refracted image. This can result in a "popping" effect when particles disappear, as well as apparent seams on the edges of polygons.

    In order to do transparency with refraction the right way, we are going to render all our transparent objects into a separate color texture and then draw that texture on top of the solid scene. We do this in order to accommodate multiple layers of transparency and refraction. Now, the correct way to handle multiple layers would be to render the solid world, render the first transparency object, then switch to another framebuffer and use the previous framebuffer color attachment for the source of your refraction image. This could be done per-object, although it could get very expensive, flipping back and forth between two framebuffers, but that still wouldn't be enough.
    If we render all the transparent surfaces into a single image, we can blend their normals, refractive index, and other properties, and come up with a single refraction vector that combines the underlying surfaces in the best way possible.
    To do this, the transparent surface color is rendered into the first color attachment. Unlike deferred lighting, the pixels at this point are fully lit.

    The screen normals are stored in an additional color attachment. I am using world normals in this shot but later below I switched to screen normals:

    These images are drawn on top of the solid scene to render all transparent objects at once. Here we see the green box in the foreground is appearing in the refraction behind the glass dragon.

    To prevent this from happening, we need to add another color texture to the framebuffer and render the pixel Z position into it. I am using the R32_SFLOAT format. I use the separate blend mode feature in Vulkan, and set the blend mode to minimum so that the smallest value always gets saved in the texture. The Z-position is divided by the camera far range in the fragment shader, so that the saved values are always between 0 and 1. The clear color for this attachment is set to 1,1,1,1, so any value written into the buffer will replace the background. Note this is the depth of the transparent pixels, not the whole scene, so the area in the center where the dragon is occluded by the box is pure white, since those pixels were not drawn.

    In the transparency pass, the Z position of the transparent pixel is compared to the Z position at the refracted texcoords. If the refracted position is closer to the camera than the transparent surface, the refraction is disabled for that pixel and the background directly behind the pixel is shown instead. There is some very slight red visible in the refraction, but no green.
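    As a rough CPU-side sketch of the logic in the last two paragraphs (the real work happens in shaders and Vulkan blend state), the three steps are: normalize the depth, keep the minimum with a "min" blend, and reject a refracted sample that lies in front of the transparent surface. All names here are hypothetical and the exact comparison the engine uses is an assumption.

```cpp
#include <algorithm>
#include <cassert>

struct Vec2 { float x, y; };

// Depth is stored as z / farRange, so saved values fall between 0 and 1
float NormalizeDepth(float z, float farRange)
{
    return std::min(1.0f, std::max(0.0f, z / farRange));
}

// Equivalent of a "minimum" blend mode: the nearest depth written so far wins.
// The buffer is cleared to 1.0, so any written value replaces the background.
float BlendMinDepth(float stored, float incoming)
{
    return std::min(stored, incoming);
}

// If whatever sits at the refracted coordinates is closer to the camera than
// the transparent surface, refraction is disabled for this pixel and the
// background directly behind the pixel is sampled instead.
Vec2 ChooseSampleCoords(Vec2 screenCoords, Vec2 refractedCoords,
                        float surfaceDepth, float depthAtRefractedCoords)
{
    if (depthAtRefractedCoords < surfaceDepth) return screenCoords;
    return refractedCoords;
}
```

    In the dragon scene this is what keeps the green foreground box out of the refraction: its depth is smaller than the glass surface depth, so the unrefracted coordinates are used.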

    Now let's see how well this handles heat haze / distortion. We want to prevent the problem when two particles overlap. Here is what a particle emitter looks like when rendered to the transparency framebuffer, this time using screen-space normals. The particles aren't rotating so there are visible repetitions in the pattern, but that's okay for now.

    And finally here is the result of the full render. As you can see, the seams and popping are gone, and we have a heavy but smooth distortion effect. Particles can safely overlap without causing any artifacts, as their normals are just blended together and combined to create a single refraction angle.

  13. Josh
    One of the downsides of deferred rendering is it isn't very good at handling transparent surfaces. Since we have moved to a new forward renderer, one of my goals in Leadwerks 5 is to have easy hassle-free transparency with lighting and refraction that just works.
    Pre-multiplied alpha provides a better blending equation than traditional alpha blending. I'm not going to go into the details here, but it makes it so the transparent surface can be brighter than the underlying surface, as you can see on the vehicle's windshield here:

    I've been working for a while to build an automatic post-processing step into the engine that occurs when a transparency object is onscreen. If no transparent objects are onscreen, then the post-processing step can be skipped.
    You can also call Camera::SetRefraction(false) and just use regular GPU-blended transparency with no fancy refraction of the background, but I plan to enable it by default.
    To use this effect, there is absolutely nothing you have to do except to create a material, make it transparent, and apply it to a mesh somewhere.
        auto mtl = CreateMaterial();
        mtl->SetTransparent(true);
        mtl->SetColor(1,1,1,0.5);
    The lower the alpha value of the material color, the more see-through it is. You can use an alpha value of zero to make a refractive predator-like effect.
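    For readers who want the detail skipped earlier in this entry, here is a minimal sketch of the two standard blending equations, written as plain C++ rather than GPU blend state. The Color struct and function names are made up for illustration, and the engine's actual blend settings are not shown in this post.

```cpp
#include <cassert>

struct Color { float r, g, b, a; };

// Traditional alpha blending: the source color is scaled by its alpha,
// so the result can never be brighter than the brighter of the two inputs.
Color BlendTraditional(const Color& src, const Color& dst)
{
    const float k = 1.0f - src.a;
    return { src.r * src.a + dst.r * k,
             src.g * src.a + dst.g * k,
             src.b * src.a + dst.b * k,
             src.a + dst.a * k };
}

// Pre-multiplied alpha: the source color is added as-is and alpha only
// controls how much of the background shows through. A surface with low
// alpha can therefore still add light on top of what is behind it.
Color BlendPremultiplied(const Color& src, const Color& dst)
{
    const float k = 1.0f - src.a;
    return { src.r + dst.r * k,
             src.g + dst.g * k,
             src.b + dst.b * k,
             src.a + dst.a * k };
}
```

    With a fully see-through source (alpha 0) that still carries some color, such as a reflection on a windshield, the traditional equation returns the background unchanged while the pre-multiplied one returns the background plus the reflection.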
  14. Josh
    A new update is available that improves Lua integration in Visual Studio Code and fixes Vulkan validation errors.
    The SSAO effect has been improved with a denoise filter. Similar to Nvidia's RTX raytracing technology, this technique smooths the results of the SSAO pass, resulting in a better appearance.

    It also requires far fewer samples, and the SSAO pass can be run at a lower resolution. I lowered the number of SSAO samples from 64 to 8 and decreased the area of the image to 25%, and it looks better than the SSAO in Leadwerks 4, which could appear somewhat grainy. With default SSAO and bloom effects enabled, I see no difference in framerate compared to the performance when no post-processing effects are in use.
    I upgraded my install of the Vulkan SDK to 1.2 and a lot of validation errors were raised. They are all fixed now. The image layout transition stuff is ridiculously complicated, and I can see no reason why this is even a feature! This could easily be handled by the driver just storing the current state and switching whenever needed, which is exactly what I ended up doing with my own wrapper class. In theory, everything should work perfectly on all supported hardware now since the validation layers say it is correct.
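    The idea of a wrapper that remembers the current layout and only transitions when needed might look something like the sketch below. It is deliberately free of Vulkan calls so it can run on its own; the enum, class, and method names are hypothetical, and in a real renderer the recorded transition would become an image memory barrier submitted with vkCmdPipelineBarrier.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-ins for a handful of image layouts
enum class ImageLayout { Undefined, ColorAttachment, ShaderReadOnly, TransferSrc, TransferDst };

struct LayoutTransition { ImageLayout from, to; };

// Hypothetical wrapper: stores the current layout of one image and records
// a transition only when the requested layout differs from it.
class TrackedImage
{
public:
    // Returns true if a transition had to be recorded
    bool RequireLayout(ImageLayout wanted)
    {
        if (wanted == current_) return false;       // already in the right state
        transitions_.push_back({ current_, wanted }); // a barrier would be issued here
        current_ = wanted;
        return true;
    }

    ImageLayout Current() const { return current_; }
    const std::vector<LayoutTransition>& Transitions() const { return transitions_; }

private:
    ImageLayout current_ = ImageLayout::Undefined;
    std::vector<LayoutTransition> transitions_;
};
```

    Calling code then just states the layout it needs before each use, and redundant requests cost nothing.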
    You can now explicitly state the validation layers you want loaded, in settings.json, although there isn't really any reason to do this:
        "vkValidationLayers":
        {
            "debug": [ "VK_LAYER_LUNARG_standard_validation", "VK_LAYER_KHRONOS_validation" ]
        }
    Debugging Lua in Visual Studio Code is improved. The object type will now be shown so you can more easily navigate debug information.

    That's all for now!
  15. Josh
    A new update is available to beta testers. This makes some pretty big changes so I wanted to release this before doing any additional work on the post-processing effects system.
    Terrain Fixed
    Terrain system is working again, with an example for Lua and C++.
    New Configuration Options
    New settings have been added in the "Config/settings.json" file:
        "MultipassCubemap": false,
        "MaxTextures": 512,
        "MaxCubemaps": 16,
        "MaxShadowmaps": 64,
        "MaxIntegerTextures": 32,
        "MaxUIntegerTextures": 32,
        "MaxCubeShadowmaps": 64,
        "MaxVolumeTextures": 16,
        "LuaErrorCommand": "code",
        "LuaErrorCommandArguments": "-g \"$(CurrentFile)\":$(LineNumber) \"$(AppDir)\""
    The max texture values will allow you to reduce the array size the engine requires for textures. If you have gotten an error message about "not enough texture units" this setting can be used to bring your application down under the limit your hardware has.
    The Lua settings define the command that is run when a Lua error occurs. By default this will open Visual Studio Code and display the file and line number an error occurs on.
    String Classes
    I've implemented two string classes for better string handling. The String and WString classes are derived from both std::string / std::wstring AND the Object class, which means they can be used in a variable that accepts an object (like the Event.source member). 8-bit character strings will automatically convert to wide strings, but not the other way. All the Load commands used to have two overloads, one for narrow and one for wide strings. That has been replaced with a single command that accepts a WString, so you can call LoadTexture("brick.dds") without having to specify a wide string like this: L"brick.dds".
    The global string functions like Trim, Right, Mid, etc. have been added as methods on the two string classes. Eventually the global functions will be phased out.
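    To make the design concrete, here is a stripped-down sketch of what such classes could look like. It is not the engine's implementation: the Object base is reduced to a virtual destructor, the narrow-to-wide conversion simply widens each character (so it assumes ASCII input), and PathLength and MakeEventSource are hypothetical stand-ins for a Load command and an event source.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Minimal stand-in for the engine's Object base class
class Object { public: virtual ~Object() = default; };

// Narrow string that is also an Object
class String : public std::string, public Object
{
public:
    String() = default;
    String(const char* s) : std::string(s) {}
    String(const std::string& s) : std::string(s) {}
};

// Wide string that is also an Object. Narrow strings convert to it
// implicitly; there is deliberately no conversion in the other direction.
class WString : public std::wstring, public Object
{
public:
    WString() = default;
    WString(const wchar_t* s) : std::wstring(s) {}
    WString(const std::wstring& s) : std::wstring(s) {}
    // Assumption: plain character widening, which is only correct for ASCII
    WString(const std::string& s) : std::wstring(s.begin(), s.end()) {}
    WString(const char* s) : WString(std::string(s)) {}
};

// Hypothetical stand-in for a Load command: one overload taking a WString
std::size_t PathLength(const WString& path) { return path.size(); }

// A string can be stored anywhere an Object is accepted
std::shared_ptr<Object> MakeEventSource(const String& s)
{
    return std::make_shared<String>(s);
}
```

    Because the conversion is implicit, PathLength("brick.dds") and PathLength(L"brick.dds") both compile against the single WString overload.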
    Lua Integration in Visual Studio Code
    Lua integration in Visual Studio Code is just about finished and it's amazing! Errors are displayed, debugging works great, and console output is displayed, just like any serious modern programming language. Developing with Lua in Leadwerks 5 is going to be a blast!

    Lua launch options are now available for Debug, Release, Debug 64f, and Release 64f.
    I feel the Lua support is good enough now that the .bat files are not needed. It's easier just to open VSCode and copy the example you want to run into Main.lua. These are currently located in "Scripts/Examples" but they will be moved into the documentation system in time.
    The black console window is going away and all executables are by default compiled as a windowed application, not a console app. The console output is still available in Visual Studio in the debug output, or it can be piped to a file with a .bat launcher.
    See the notes here on how to get started with VSCode and Lua.
  16. Josh
    A new update is available that adds post-processing effects in Leadwerks 5 beta.

    To use a post-processing effect, you load it from a JSON file and apply it to a camera like so:
        auto fx = LoadPostEffect("Shaders/PostEffects/SSAO.json");
        camera->AddPostEffect(fx);
    You can add as many effects as you want, and they will be executed in sequence.
    The JSON structure looks like this for a simple effect:
        {
            "postEffect":
            {
                "subpasses":
                [
                    {
                        "shader":
                        {
                            "vertex": "Shaders/PostEffects/PostEffect.vert.spv",
                            "fragment": "Shaders/PostEffects/SSAO.frag.spv"
                        }
                    }
                ]
            }
        }
    Multiple subpasses are supported for custom blurring and chains of shaders. This Gaussian blur effect uses several intermediate buffers to blur and downsample the image:
        {
            "postEffect":
            {
                "buffers":
                [
                    { "size": [0.5, 0.5] },
                    { "size": [0.25, 0.25] },
                    { "size": [0.125, 0.125] }
                ],
                "subpasses":
                [
                    {
                        "target": 0,
                        "shader":
                        {
                            "vertex": "Shaders/PostEffects/PostEffect.vert.spv",
                            "fragment": "Shaders/PostEffects/blurx.frag.spv"
                        }
                    },
                    {
                        "target": 1,
                        "shader":
                        {
                            "vertex": "Shaders/PostEffects/PostEffect.vert.spv",
                            "fragment": "Shaders/PostEffects/blury.frag.spv"
                        }
                    },
                    {
                        "target": 2,
                        "shader":
                        {
                            "vertex": "Shaders/PostEffects/PostEffect.vert.spv",
                            "fragment": "Shaders/PostEffects/blurx.frag.spv"
                        }
                    },
                    {
                        "shader":
                        {
                            "vertex": "Shaders/PostEffects/PostEffect.vert.spv",
                            "fragment": "Shaders/PostEffects/blury.frag.spv"
                        }
                    }
                ]
            }
        }
    A new file is located in "Config/settings.json". This file contains information for the engine when it initializes. You can specify a default set of post-processing effects that will automatically be loaded whenever a camera is created. If you don't want any post-processing effects you can either change this file, or call Camera::ClearPostEffects() after creating a camera.
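    Returning to the "buffers" array in the Gaussian blur effect: the sketch below shows how fractional sizes like these could be turned into pixel dimensions. It assumes the values are fractions of the framebuffer size and that results are truncated to whole pixels with a minimum of one; neither detail is confirmed in this post, and the names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct BufferScale { float x, y; };        // fractional size, as in "size": [0.5, 0.5]
struct BufferSize  { int width, height; }; // resolved size in pixels

// Assumption: each scale is a fraction of the framebuffer dimensions
std::vector<BufferSize> ResolveBufferSizes(int framebufferWidth, int framebufferHeight,
                                           const std::vector<BufferScale>& scales)
{
    std::vector<BufferSize> sizes;
    for (const BufferScale& s : scales)
    {
        // Truncate to whole pixels, never going below 1x1
        const int w = std::max(1, static_cast<int>(framebufferWidth * s.x));
        const int h = std::max(1, static_cast<int>(framebufferHeight * s.y));
        sizes.push_back({ w, h });
    }
    return sizes;
}
```

    Under those assumptions, a 1920x1080 framebuffer would give the three blur targets sizes of 960x540, 480x270, and 240x135.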
    Customizable properties are not yet supported but I plan to add these so you can modify the look of an effect on-the-fly.
    Fixed physics bug reported by @wadaltmon
    Other changes:
    • EnablePhysics is renamed to SetPhysicsMode.
    • EnableGravity is renamed to SetGravityMode.
    • EnableSweptCollision is renamed to SetSweptCollision.
    • COLLISION_CHARACTER is renamed to COLLISION_PLAYER.
  17. Josh
    A new update is available for beta testers.
    The dCustomJoints and dContainers DLLs are now optional if your game is not using any joints (even if you are using physics).
    The following methods have been added to the collider class. These let you perform low-level collision tests yourself:
        Collider::ClosestPoint
        Collider::Collide
        Collider::GetBounds
        Collider::IntersectsPoint
        Collider::Pick
    The PluginSDK now supports model saving and an OBJ save plugin is provided. It's very easy to convert models this way using the new Model::Save() method:
        auto plugin = LoadPlugin("Plugins/OBJ.dll");
        auto model = LoadModel(world,"Models/Vehicles/car.mdl");
        model->Save("car.obj");
    Or create models from scratch and save them:
        auto box = CreateBox(world,10,2,10);
        box->Save("box.obj");
    I have used this to recover some of my old models from Leadwerks 2 and convert them into GLTF format:

    There is additional documentation now on the details of the plugin system and all the features and options.
    Thread handling is improved so you can run a simple application that handles 3D objects and exits out without ever initializing graphics.
    Increased strictness of headers for private and public members and methods.
    Fixed a bug where directional lights couldn't be hidden. (Check out the example for the CreateLight command in the new docs.)
    All the Lua scripts in the "Scripts\Start" folder are now executed when the engine initializes, instead of when the first script is run. These will be executed for all programs automatically, so it is useful for automatically loading plugins or workflows. If you don't want to use Lua at all, you can delete the "Scripts" folder and the Lua DLL, but you will need to load any required plugins yourself with the LoadPlugin command.
    Shadow settings are simplified. In Leadwerks 4, entities could be set to static or dynamic shadows, and lights could use a combination of static, dynamic, and buffered modes. You can read the full explanation of this feature in the documentation here. In Leadwerks 5, I have distilled that down to two commands. Entity::SetShadows accepts a boolean, true to cast shadows and false not to. Additionally, there is a new Entity::MakeStatic method. Once this is called on an entity it cannot be moved or changed in any way until it is deleted. If MakeStatic() is called on a light, the light will store an intermediate cached shadowmap of all static objects. When a dynamic object moves and triggers a shadow redraw, the light will copy the static shadow buffer to the shadow map and then draw any dynamic objects in its range. For example, if a character walks across a room with a single point light, the character model has to be drawn six times but the static scene geometry doesn't have to be redrawn at all. This can result in an enormous reduction of rendered polygons. (This is something id Software's Doom engine does, although I implemented it first.)
    In the documentation example the shadow polygon count is 27000 until I hit the space key to make the light static. The light then renders the static scene (everything except the fan blade) into an image, and thereafter that cached image is copied to the shadow map before the dynamic scene objects are drawn. This causes the number of shadow polygons rendered to drop by a lot, since the whole scene does not have to be redrawn each frame.
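    The saving from the cached static shadowmap can be illustrated with a small polygon-count model. This is a simplification and not engine code: the class is hypothetical, it assumes every polygon is drawn for every shadow face, and the polygon counts in the usage note are invented for the example.

```cpp
#include <cassert>

// Hypothetical model of a shadow-casting light with an optional static cache
class ShadowCastingLight
{
public:
    void MakeStatic() { isStatic_ = true; }

    // Returns the number of polygons drawn for one shadow redraw.
    // faces is 6 for a point light (one per cubemap face).
    long RedrawShadows(long staticPolygons, long dynamicPolygons, int faces)
    {
        if (!isStatic_)
        {
            // No cache: the whole scene is redrawn for every face
            return faces * (staticPolygons + dynamicPolygons);
        }

        long drawn = 0;
        if (!cacheValid_)
        {
            // First redraw after MakeStatic(): render static geometry once
            drawn += faces * staticPolygons;
            cacheValid_ = true;
        }
        // Copy the cached static shadowmap, then draw only dynamic objects
        drawn += faces * dynamicPolygons;
        return drawn;
    }

private:
    bool isStatic_ = false;
    bool cacheValid_ = false;
};
```

    With 1000 static and 100 dynamic polygons and a six-face point light, each redraw costs 6600 polygons before MakeStatic() and 600 once the cache has been built.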

    I've started using animated GIFs in some of the documentation pages and I really like it. For some reason GIFs feel so much more "solid" and stable. I always think of web videos as some iframe thing that loads separately, lags and doesn't work half the time, and is embedded "behind" the page, but a GIF feels like it is a natural part of the page.

    My plan is to put 100% of my effort into the documentation and make that as good as possible. Well, if there is an increased emphasis on one thing, that necessarily means a decreased emphasis on something else. What am I reducing? I am not going to create a bunch of web pages explaining what great features we have, because the documentation already does that. I also am not going to attempt to make "how to make a game" tutorials. I will leave that to third parties, or defer it into the future. My job is to make attractive and informative primary reference material for people who need real usable information, not to teach non-developers to be developers. That is my goal with the new docs.
  18. Josh
    A new update is available to beta testers.
    I updated the project to the latest Visual Studio 16.6.2 and adjusted some settings. Build speeds are massively improved. A full rebuild of your game in release mode will now take less than ten seconds. A normal debug build, where just your game code changes, will take about two seconds. (I found that "Whole program optimization" completely does not work in the latest VS and when I disabled it everything was much faster. Plus there's the precompiled header I added a while back.)
    Delayed DLL loading is enabled. This makes it so the engine only loads DLLs when they are needed. If they aren't used by your application, they don't have to be included. If you are not making a VR game, you do not need to include the OpenVR DLL. You can create a small utility application that requires no DLLs in as little as 500 kilobytes. It was also found that the dContainers lib from Newton Dynamics is not actually needed, although the corresponding DLLs are (if your application uses physics).
    A bug in Visual Studio was found that requires all release builds add the setting "/OPT:NOREF,NOICF,NOLBR" in the linker options:
    A new StringObject class derived from both the WString and Object classes is added. This allows the FileSystemWatcher to store the file path in the event source member when an event occurs. A file rename event will store the old file name in the event.extra member.
    The Entity::Pick syntax is changed slightly, removing the X and Y components for the vector projected in front of the entity. See the new documentation for details.
    The API is being finalized and the new docs system has a lot of finished C++ pages. There's a lot of new stuff documented in there like message dialogs, file and folder request dialogs, world statistics, etc. The Buffer class (which replaces the LE4 "Bank" class) is official and documented. The GUI class has been renamed to "Interface".
    Documentation has been organized by area of functionality instead of class hierarchy. It feels more intuitive to me this way.

    I've also made progress using SWIG to make a wrapper for the C# programming language, with the help of @klepto2 and @carlb. It's not ready to use yet, but the feature has gone from "unknown" to "okay, this can be done". (Although SWIG also supports Lua, I think Sol2 is better suited for this purpose.)
  19. Josh
    The Leadwerks 5 beta has been updated.
    A new FileSystemWatcher class has been added. This can be used to monitor a directory and emit events when a file is created, deleted, renamed, or overwritten. See the documentation for details and an example. Texture reloading now works correctly. I have only tested reloading textures, but other assets might work as well.
    CopyFile() will now work with URLs as the source file path, turning it into a download command.
    Undocumented class methods and members not meant for end users are now made private. The goal is for 100% of public methods and members to be documented so there is nothing that appears in intellisense that you aren't allowed to use.
    Tags, key bindings, and some other experimental features are removed. I want to develop a more cohesive design for this type of stuff, not just add random ways to do things differently.
    Other miscellaneous small fixes.
  20. Josh
    I am happy to show you a preview of the new documentation system I am working on:

    Let's take a look at what is going on here:
    • It's dark, so you can stare lovingly at it for hours without going blind.
    • You can switch between languages with the links in the header.
    • Lots of internal cross-linking for easy access to relevant information.
    • Extensive, all-inclusive documentation, including Enums, file formats, constructors, and public members.
    • Data is fetched from a Github repository and allows user contributions.
    I am actually having a lot of fun creating this. It is very fulfilling to be able to go in and build something with total attention to detail.
  21. Josh
    All this month I have been working on a sort of academic paper for a conference I will be speaking at towards the end of the year. This paper covers details of my work for the last three years, and includes benchmarks that demonstrate the performance gains I was able to get as a result of the new design, based on an analysis of modern graphics hardware.
    I feel like my time spent has not been very efficient. I have not written any code in a while, but it's not like I was working that whole time. I had to just let the ideas settle for a bit.
    Activity doesn't always mean progress.
    Anyways, I am wrapping up now, and am very pleased with the results. It all turned out much much better than I was expecting.
  22. Josh
    I have been spending most of my time on something else this month in preparation for the release of the Leadwerks 5 SDK. However, I did add one small feature today that has very big implications for the way the engine works. You can load a file from a web URL:
        local tex = LoadTexture("https://www.github.com/Leadwerks/Documentation/raw/master/Assets/brickwall01.dds")
    Why is this a big deal? Well, it means you can post code snippets that can be copied and pasted without requiring download of any extra files. That means the documentation can include examples that use files that aren't required to be in the user's project directory:
    The documentation doesn't have to have any awkward zip files you are instructed to download like here, because any files that are needed in any examples can simply be linked directly to by the URL. So basically the default blank template can really be blank and doesn't need to include any "sample" files at all. If you have something like a model that has separate material and texture files, it should be possible to just link to the model file's URL and the rest of the associated files will be grabbed automatically.
  23. Josh
    I've moved the GI calculation over to the GPU and our Vulkan renderer in Leadwerks Game Engine 5 beta now supports volume textures. After a lot of trial and error I believe I am closing in on our final techniques. Voxel GI always involves a degree of light leakage, but this can be mitigated by setting a range for the ambient GI. I also implemented a hard reflection which was pretty easy to do. It would not be much more difficult to store the triangles in a lookup table for each voxel in order to trace a finer polygon-based ray for results that would look the same as Nvidia RTX but perform much faster.
    The video below is only performing a single GI bounce at this time, and it is displaying the lighting on the scene voxels, not on the original polygons. I am pretty pleased with this progress and I think the final results will look great and run fast. In addition, the need for environment probes placed in the scene will soon forever be a thing of the past.

    There is still a lot of work to do on this, but I would say that this feature just went from something I was very overwhelmed and intimidated by to something that is definitely under control and feasible.
    Also, we might not use cascaded shadow maps (for directional lights) at all but instead rely on a voxel raytrace for directional light shadows. If it works, that would be my preference because CSMs waste so much space and drawing a big outdoors scene 3-4 times can be taxing.
  24. Josh
    I implemented light bounces and can now run the GI routine as many times as I want. When I use 25 rays per voxel and run the GI routine three times, here is the result. (The dark area in the middle of the floor is actually correct. That area should be lit by the sky color, but I have not yet implemented that, so it appears darker.)

    It's sort of working but obviously these results aren't usable yet. Making matters more difficult is the fact that people love to show their best screenshots and love to hide the problems their code has, so it is hard to find something reliable to compare my results to.
    I also found that the GI pass, unlike all previous passes, is very slow. Each pass takes about 30 seconds in release mode! I could try to optimize the C++ code but something tells me that even optimized C++ code would not be fast enough. So it seems the GI passes will probably need to be performed in a shader. I am going to experiment with some ideas I have for better-quality GI results first, though.
  25. Josh
    The polygon voxelization process for our voxel GI system now takes vertex, material, and base texture colors into account. The voxel algorithm does not yet support a second color channel for emission, but I am building the whole system with that in mind. When I visualize the results of the voxel building the images are pretty remarkable! Of course the goal is to use this data for fast global illumination calculations but maybe they could be used to make a whole new style of game graphics.

    Direct lighting calculations on the CPU are fast enough that I am going to stick with this approach until I have to use the GPU. If several cascading voxel grids were created around the camera, and each updated asynchronously on its own thread, that might give us the speed we need to relieve the GPU from doing any extra work. The final volume textures could be compressed to DXT1 (12.5% their original size) and sent to the GPU.
    After direct lighting has been calculated, the next step is to downsample the voxel grid. I found the fastest way to do this is to iterate through just the solid voxels. This is how my previous algorithm worked:
        for (x = 0; x < size / 2; ++x)
        {
            for (y = 0; y < size / 2; ++y)
            {
                for (z = 0; z < size / 2; ++z)
                {
                    //Downsample this 2x2x2 block
                }
            }
        }
    A new faster approach works by "downsampling" the set of solid voxels by dividing each value by two. There are some duplicated values but that's fine:
        for (const iVec3& i : solidvoxels)
        {
            downsampledgrid->solidvoxels.insert(iVec3(i.x / 2, i.y / 2, i.z / 2));
        }
        for (const iVec3& i : downsampledgrid->solidvoxels)
        {
            //Downsample this 2x2x2 block
        }
    We can then iterate through just the solid voxels when performing the downsampling. A single call to memset will set all the voxel data to black / empty before the downsampling begins. This turns out to be much much faster than iterating through every voxel on all three axes.
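    Here is a self-contained version of the set-halving step that can be compiled on its own. The iVec3 struct below is a minimal stand-in for the engine's type, and a std::set is assumed as the container since duplicates need to collapse; coordinates are assumed to be non-negative, because integer division truncates toward zero.

```cpp
#include <cassert>
#include <set>
#include <tuple>

// Minimal stand-in for the engine's integer vector type
struct iVec3
{
    int x, y, z;
    // Ordering so the type can be stored in a std::set
    bool operator<(const iVec3& o) const
    {
        return std::tie(x, y, z) < std::tie(o.x, o.y, o.z);
    }
};

// Halve every solid voxel coordinate. Up to eight fine voxels map onto the
// same coarse voxel; the set silently discards the duplicates.
std::set<iVec3> DownsampleSolidVoxels(const std::set<iVec3>& solidvoxels)
{
    std::set<iVec3> coarse;
    for (const iVec3& i : solidvoxels)
    {
        coarse.insert(iVec3{ i.x / 2, i.y / 2, i.z / 2 });
    }
    return coarse;
}
```

    The cost of this pass scales with the number of solid voxels instead of the full grid volume, which is where the speedup comes from when most of the grid is empty.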
    Here are the results of the downsampling process. What you don't see here is the alpha value of each voxel. The goblin in the center ends up bleeding out to fill very large voxels, because the rest of the volume around him is empty space, but the alpha value of those voxels will be adjusted to give them less influence in the GI calculation.

    For a 128x128x128 voxel grid, with voxel size of 0.125 meters, my numbers are now:
        Voxelization: 607 milliseconds
        Direct lighting (all six directions): 109
        First downsample (to 64x64): 39
        Second downsample (to 32x32): 7
        Third downsample (to 16x16): 1
        Total: 763
    Note that voxelization, by far the slowest step here, does not have to be performed completely on all geometry each update. The direct lighting time elapsed is within a tolerable range, so we are in the running to make GI calculations entirely on the CPU, relieving the GPU of extra work and compressing our data before it is sent over the PCI bridge.
    Also note that smaller voxel grids could be used, with more voxel grids spread across more CPU cores. If that were the case I would expect our processing time for each one to go down to 191 milliseconds total (39 milliseconds without the voxelization step), and the distance your GI covers would then be determined by your number of CPU cores.
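    As a quick check of the arithmetic, the snippet below sums the measured steps and shows that the 191 and 39 millisecond estimates match one quarter of the totals. The factor of four is inferred from the numbers and is not stated explicitly in the post; the struct and function names are hypothetical.

```cpp
#include <cassert>

// Measured times in milliseconds for a 128x128x128 grid, taken from the list above
struct GITimings
{
    int voxelization;
    int directLighting;
    int downsample1;
    int downsample2;
    int downsample3;
};

int TotalMs(const GITimings& t)
{
    return t.voxelization + t.directLighting +
           t.downsample1 + t.downsample2 + t.downsample3;
}

int WithoutVoxelizationMs(const GITimings& t)
{
    return TotalMs(t) - t.voxelization;
}

// Integer division rounded to the nearest whole number (positive inputs only)
int DivideRounded(int value, int divisor)
{
    return (value + divisor / 2) / divisor;
}
```

    With the measured values, the total is 763 ms and 156 ms without voxelization; dividing each by four and rounding gives 191 ms and 39 ms.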
    In fact there is a variety of ways this task could be divided between several CPU cores.