Blog Entries posted by Josh

  1. Josh
    Textures in Leadwerks don't actually store any pixel data in system memory. Instead, the data is sent straight from the hard drive to the GPU and then discarded, because there is no reason to have all that data sitting around in RAM. However, I needed to implement texture saving for our terrain system, so I implemented a simple "Pixmap" class for handling image data:
    class Pixmap : public SharedObject
    {
        VkFormat m_format;
        iVec2 m_size;
        shared_ptr<Buffer> m_pixels;
        int bpp;
    public:
        Pixmap();
        const VkFormat& format;
        const iVec2& size;
        const shared_ptr<Buffer>& pixels;
        virtual shared_ptr<Pixmap> Copy();
        virtual shared_ptr<Pixmap> Convert(const VkFormat format);
        virtual bool Save(const std::string& filename, const SaveFlags flags = SAVE_DEFAULT);
        virtual bool Save(shared_ptr<Stream>, const std::string& mimetype = "image/vnd-ms.dds", const SaveFlags flags = SAVE_DEFAULT);
        friend shared_ptr<Pixmap> CreatePixmap(const int, const int, const VkFormat, shared_ptr<Buffer> data);
        friend shared_ptr<Pixmap> LoadPixmap(const std::wstring&, const LoadFlags);
    };

    shared_ptr<Pixmap> CreatePixmap(const int width, const int height, const VkFormat format = VK_FORMAT_R8G8B8A8_UNORM, shared_ptr<Buffer> data = nullptr);
    shared_ptr<Pixmap> LoadPixmap(const std::wstring& path, const LoadFlags flags = LOAD_DEFAULT);

    You can convert a pixmap from one format to another in order to compress raw RGBA pixels into BCn compressed data. The supported conversion formats are very limited and are only being implemented as they are needed. Pixmaps can be saved as DDS files, and the same rules apply. Support for the most common formats is being added.
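    For example, loading an image, compressing it, and saving it as a DDS file might look like the sketch below. This is based on the declarations above; the file paths are placeholders, and whether this particular BC3 conversion is among the implemented formats is an assumption.

    //Load an image, compress it to BC3 (DXT5), and save it as DDS (illustrative paths)
    auto pixmap = LoadPixmap(L"Materials/brick.png");
    if (pixmap)
    {
        auto compressed = pixmap->Convert(VK_FORMAT_BC3_UNORM_BLOCK);
        if (compressed) compressed->Save("Materials/brick.dds");
    }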
    As a result, the terrain system can now save out all processed images as DDS files. The modern DDS format supports a lot of pixel formats, so even heightmaps can be saved. All of these files can be easily viewed in Visual Studio itself. It's by far the most reliable DDS viewer, as even the built-in Windows preview function is missing support for DX10 formats. Unfortunately there's really no modern DDS viewer application like the old Windows Texture Viewer.

    Storing terrain data in an easy-to-open standard texture format will make development easier for you. I intend to eliminate all "black box" file formats so all your game data is always easily viewable in a variety of tools, right up until the final publish step.
  2. Josh
    A new beta is available in the beta forum. This adds new texture and pixmap features, Basis texture support, and support for customized project workflows. Use of Basis textures brought the download size down to less than 300 megabytes. New Lua examples are included:
    • Build Texture
    • Build Cubemap
    • SetSubPixels
  3. Josh
    The terrain system in Leadwerks Game Engine 4 allows terrains up to 64 square kilometers in size. This is big enough for any game where you walk and most driving games, but is not sufficient for flight simulators or space simulations. For truly massive terrain, we need to be able to dynamically stream data in and out of memory, at multiple resolutions, so we can support terrains bigger than what would otherwise fit in memory all at once.
    The next update of Leadwerks Game Engine 5 beta supports streaming terrain, using the following command:
    shared_ptr<StreamingTerrain> CreateStreamingTerrain(shared_ptr<World> world, const int resolution, const int patchsize, const std::wstring& datapath, const int atlassize = 1024, void FetchPatchInfo(TerrainPatchInfo*) = NULL)

    Let's look at the parameters:
    • resolution: Number of terrain points along one edge of the terrain; should be a power-of-two number.
    • patchsize: The number of tiles along one edge of a terrain piece; should be a power-of-two number, probably 64 or 128.
    • datapath: By default this indicates a file path and name, but it can be customized.
    • atlassize: Width and height of the texture atlas that texture data is copied into. 1024 is usually fine.
    • FetchPatchInfo: Optional user-defined callback to override the default data handler.

    A new Lua sample is included that creates a streaming terrain:
    local terrain = CreateStreamingTerrain(world, 32768, 64, "Terrain/32768/32768")
    terrain:SetScale(1,1000,1)

    The default fetch patch function can be used as the basis for your own data handler. Here is the default function, which is probably more complex than what you need for streaming GIS data. The key parts to note are:
    • The TerrainPatchInfo structure contains the patch X and Y position and the level of detail.
    • The member patch->heightmap should be set to a pixmap with format TEXTURE_R16.
    • The member patch->normalmap should be set to a pixmap with format TEXTURE_RGBA (for now). You can generate this from the heightmap using MakeNormalMap().
    • The scale value input into MakeNormalMap() should be the terrain vertical scale you intend to use, times two, divided by the mipmap level. This ensures normals are calculated correctly at each LOD.
    • For height and normal data, which is all that is currently supported, you should use the dimensions patchsize + 1, because a 64x64 patch, for example, uses 65x65 vertices.
    • Don't forget to call patch->UpdateBounds() to calculate the AABB for this patch.
    • The function must be thread-safe, as it will be called from many different threads simultaneously.

    void StreamingTerrain::FetchPatchInfo(TerrainPatchInfo* patch)
    {
        //User-defined callback
        if (FP_FETCH_PATCH_INFO != nullptr)
        {
            FP_FETCH_PATCH_INFO(patch);
            return;
        }
        auto stream = this->stream[TEXTURE_DISPLACEMENT];
        if (stream == nullptr) return;
        int countmips = 1;
        int mw = this->resolution.x;
        while (mw > this->patchsize)
        {
            countmips++;
            mw /= 2;
        }
        int miplevel = countmips - 1 - patch->level;
        Assert(miplevel >= 0);
        uint64_t mipmapsize = Round(this->resolution.x * 1.0f / pow(2.0f, miplevel));
        auto pos = mipmappos[TEXTURE_DISPLACEMENT][miplevel];
        uint64_t rowpos;
        patch->heightmap = CreatePixmap(patchsize + 1, patchsize + 1, TEXTURE_R16);
        uint64_t px = patch->x * patchsize;
        uint64_t py = patch->y * patchsize;
        int rowwidth = patchsize + 1;
        for (int ty = 0; ty < patchsize + 1; ++ty)
        {
            if (py + ty >= mipmapsize)
            {
                patch->heightmap->CopyRect(0, patch->heightmap->size.y - 2, patch->heightmap->size.x, 1, patch->heightmap, 0, patch->heightmap->size.y - 1);
                continue;
            }
            if (px + rowwidth > mipmapsize) rowwidth = mipmapsize - px;
            rowpos = pos + ((py + ty) * mipmapsize + px) * 2;
            streammutex[TEXTURE_DISPLACEMENT]->Lock();
            stream->Seek(rowpos);
            stream->Read(patch->heightmap->pixels->data() + (ty * (patchsize + 1) * 2), rowwidth * 2);
            streammutex[TEXTURE_DISPLACEMENT]->Unlock();
            if (rowwidth < patchsize + 1)
            {
                patch->heightmap->WritePixel(patch->heightmap->size.x - 1, ty, patch->heightmap->ReadPixel(patch->heightmap->size.x - 2, ty));
            }
        }
        patch->UpdateBounds();
        stream = this->stream[TEXTURE_NORMAL];
        if (stream == nullptr)
        {
            patch->normalmap = patch->heightmap->MakeNormalMap(scale.y * 2.0f / float(1 + miplevel), TEXTURE_RGBA);
        }
        else
        {
            pos = mipmappos[TEXTURE_NORMAL][miplevel];
            Assert(pos < stream->GetSize());
            patch->normalmap = CreatePixmap(patchsize + 1, patchsize + 1, TEXTURE_RGBA);
            rowwidth = patchsize + 1;
            for (int ty = 0; ty < patchsize + 1; ++ty)
            {
                if (py + ty >= mipmapsize)
                {
                    patch->normalmap->CopyRect(0, patch->normalmap->size.y - 2, patch->normalmap->size.x, 1, patch->normalmap, 0, patch->normalmap->size.y - 1);
                    continue;
                }
                if (px + rowwidth > mipmapsize) rowwidth = mipmapsize - px;
                rowpos = pos + ((py + ty) * mipmapsize + px) * 4;
                Assert(rowpos < stream->GetSize());
                streammutex[TEXTURE_NORMAL]->Lock();
                stream->Seek(rowpos);
                stream->Read(patch->normalmap->pixels->data() + uint64_t(ty * (patchsize + 1) * 4), rowwidth * 4);
                streammutex[TEXTURE_NORMAL]->Unlock();
                if (rowwidth < patchsize + 1)
                {
                    patch->normalmap->WritePixel(patch->normalmap->size.x - 1, ty, patch->normalmap->ReadPixel(patch->normalmap->size.x - 2, ty));
                }
            }
        }
    }

    There are some really nice behaviors that came about naturally as a consequence of the design.
    • Because the culling algorithm works its way down the quad tree with only known patches of data, the lower-resolution sections of terrain will display first and then be replaced with higher-resolution patches as they are loaded in.
    • If the cache gets filled up, low-resolution patches will be displayed until the cache clears up and more detailed patches are loaded in.
    • If all the patches in one draw call have not yet been loaded, the previous draw call's contents will be rendered instead.

    As a result, the streaming terrain is turning out to be far more robust than I was initially expecting. I can fly close to the ground at 650 MPH (the speed of some fighter jets) with no problems at all.
    There are also some issues to note:
    • Terrain still has cracks in it.
    • Seams in the normal maps will appear along the edges, for now.
    • Streaming terrain does not display material layers at all, just height and normal.

    But there is enough to start working with it and piping data into the system.
     
  4. Josh
    An update is available for beta testers.
    What's new:
    • GLTF animations now work! New example included. Any models from Sketchfab should work.
    • Added Camera::SetGamma, GetGamma. Gamma is 1.0 by default; use 2.2 for dark scenes.
    • Fixed bug that was creating extra bones. This is why the animation example was running slow in previous versions.
    • Fixed bug where metalness was being read from the wrong channel in the metal-roughness map. Metal = R, roughness = G.
    • Texture definitions in JSON materials are changed, but the old scheme is left in for compatibility. Textures are now an array:
    { "material": { "color": [ 1, 1, 1, 1 ], "emission": [ 0, 0, 0 ], "metallic": 0.75, "roughness": 0.5, "textures": [ { "slot": "BASE", "file": "./wall_01_D.tex" }, { "slot": "NORMAL", "file": "./wall_01_N.tex" } ] } } The slot value can be an integer from 0-31 or one of these strings:
    • BASE
    • NORMAL
    • METALLIC_ROUGHNESS
    • DISPLACEMENT
    • EMISSION
    • BRDF

    Bugs:
    • FPS example menu freezes. Close the window to exit instead.
    • Looping animations are not randomized, so the animation example will show characters that appear identical even though they are separate skeletons animating independently.
    • Unskinned GLTF animation is not yet supported (requires the bone attachments feature).
  5. Josh
    Leadwerks Game Engine 5 Beta now supports debugging Lua in Visual Studio Code. To get started, install the Lua Debugger extension by DevCat.
    Open the project folder in VSCode and press F5. Choose the Lua debugger if you are prompted to select an environment.
    You can set breakpoints and step through Lua code, viewing variables and the callstack. All printed output from your game will be visible in the Debug Console within the VS Code interface.

    Having first-class support for Lua code in a professional IDE is a dream come true. This will make development with Lua in Leadwerks Game Engine 5 a great experience.
  6. Josh
    I have resumed work on voxel-based global illumination using voxel cone step tracing in Leadwerks Game Engine 5 beta with our Vulkan renderer. I previously put about three months of work into this with some promising results, but it is a very difficult system and I wanted to focus on Vulkan. Some of the features we have gained since then, like Pixmaps and DXT decompression, make the voxel GI system easier to finish.
    I previously considered implementing Nvidia's raytracing techniques for Vulkan, but the performance is terrible, even on the highest-end graphics cards. Voxel-based GI looks great and runs fast with basically no performance penalty.
    Below we have a section of the scene voxelized and lit with direct lighting. Loading the Sponza scene from GLTF format made it easy to display all materials and textures correctly.

    I found that the fastest way to manage voxel data was to store it in one big STL vector, along with an STL set of occupied cells. (An STL set is like a map with only keys.) I found the fastest way to perform voxel raycasting was actually just to walk through the voxel data with optimized C++ code. This was much faster than my previous attempts to use octrees, and much simpler too! The above scene took about 100 milliseconds to calculate direct lighting on a single CPU core, which is three times faster than my previous attempts. This definitely means that CPU-based GI lighting may be possible, which is my preferred approach. It's easier to implement, easy to parallelize, more flexible, more reliable, uses less video memory, transfers less data to the GPU, and doesn't draw any GPU power away from rendering the rest of the scene.
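    As a rough illustration, that storage layout might look like the sketch below. The names and the choice of float RGBA values per cell are my assumptions, not the engine's actual code.

    #include <set>
    #include <vector>

    struct iVec3
    {
        int x, y, z;
        bool operator<(const iVec3& o) const
        {
            if (x != o.x) return x < o.x;
            if (y != o.y) return y < o.y;
            return z < o.z;
        }
    };

    struct VoxelGrid
    {
        int size = 0;                 //resolution along one axis
        std::vector<float> data;      //one big flat array: size*size*size*4 floats (RGBA)
        std::set<iVec3> solidvoxels;  //occupied cells only, like a map with only keys
    };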
    The challenge will be in minimizing the delay between when an object moves, GI is recalculated, and the data is uploaded to the GPU and appears onscreen. I am guessing a delay somewhere around 200 milliseconds will be acceptable. It should also be considered that only an onscreen object will have a perceived delay if the reflection is slow to appear. An offscreen object will have no perceived delay, because you can only see the reflection. Using screen-space reflections on pixels that can use it is one way to mitigate that problem, but if possible I would prefer to use one uniform system instead of mixing two rendering techniques.
    If this does not work then I will upload a DXT compressed texture containing the voxel data to the GPU. There are several stages at which the data can be handed off, so the question is which one works best?

    My design has changed a bit, but this is a pretty graphic.
    Using the pixmap class I will be able to load low-resolution versions of textures into system memory, decompress them to a readable format, and use that data to colorize the voxels according to the textures and UV coordinates of the vertices that are fed into the voxelization process.
  7. Josh
    The polygon voxelization process for our voxel GI system now takes vertex, material, and base texture colors into account. The voxel algorithm does not yet support a second color channel for emission, but I am building the whole system with that in mind. When I visualize the results of the voxel building the images are pretty remarkable! Of course the goal is to use this data for fast global illumination calculations but maybe they could be used to make a whole new style of game graphics.

    Direct lighting calculations on the CPU are fast enough that I am going to stick with this approach until I have to use the GPU. If several cascading voxel grids were created around the camera, and each updated asynchronously on its own thread, that might give us the speed we need to relieve the GPU from doing any extra work. The final volume textures could be compressed to DXT1 (12.5% their original size) and sent to the GPU.
    After direct lighting has been calculated, the next step is to downsample the voxel grid. I found the fastest way to do this is to iterate through just the solid voxels. This is how my previous algorithm worked:
    for (x = 0; x < size / 2; ++x)
    {
        for (y = 0; y < size / 2; ++y)
        {
            for (z = 0; z < size / 2; ++z)
            {
                //Downsample this 2x2 block
            }
        }
    }

    A new faster approach works by "downsampling" the set of solid voxels by dividing each value by two. There are some duplicated values, but that's fine:
    for (const iVec3& i : solidvoxels)
    {
        downsampledgrid->solidvoxels.insert(iVec3(i.x / 2, i.y / 2, i.z / 2));
    }
    for (const iVec3& i : downsampledgrid->solidvoxels)
    {
        //Downsample this 2x2 block
    }

    We can then iterate through just the solid voxels when performing the downsampling. A single call to memset will set all the voxel data to black / empty before the downsampling begins. This turns out to be much, much faster than iterating through every voxel on all three axes.
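    Putting the two steps together, a complete pass might look like the following self-contained sketch. The flat-array RGBA layout and all the names here are illustrative assumptions, not the engine's actual code.

    #include <cstring>
    #include <set>
    #include <vector>

    struct iVec3
    {
        int x, y, z;
        bool operator<(const iVec3& o) const
        {
            if (x != o.x) return x < o.x;
            if (y != o.y) return y < o.y;
            return z < o.z;
        }
    };

    struct VoxelGrid
    {
        int size = 0;                 //resolution along one axis
        std::vector<float> data;      //size*size*size*4 floats (RGBA)
        std::set<iVec3> solidvoxels;  //occupied cells only
    };

    //dst is assumed to be allocated at half the resolution of src
    void Downsample(const VoxelGrid& src, VoxelGrid& dst)
    {
        //One memset clears all voxel data to black / empty
        memset(dst.data.data(), 0, dst.data.size() * sizeof(float));

        //"Downsample" the solid voxel set by halving each coordinate;
        //the set collapses duplicate entries automatically
        dst.solidvoxels.clear();
        for (const iVec3& i : src.solidvoxels)
        {
            dst.solidvoxels.insert(iVec3{ i.x / 2, i.y / 2, i.z / 2 });
        }

        //Only solid cells need to be visited for the 2x2x2 averaging
        auto index = [](const VoxelGrid& g, int x, int y, int z) { return ((size_t(z) * g.size + y) * g.size + x) * 4; };
        for (const iVec3& i : dst.solidvoxels)
        {
            for (int c = 0; c < 4; ++c)
            {
                float sum = 0.0f;
                for (int dz = 0; dz < 2; ++dz)
                {
                    for (int dy = 0; dy < 2; ++dy)
                    {
                        for (int dx = 0; dx < 2; ++dx)
                        {
                            sum += src.data[index(src, i.x * 2 + dx, i.y * 2 + dy, i.z * 2 + dz) + c];
                        }
                    }
                }
                dst.data[index(dst, i.x, i.y, i.z) + c] = sum / 8.0f;
            }
        }
    }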
    Here are the results of the downsampling process. What you don't see here is the alpha value of each voxel. The goblin in the center ends up bleeding out to fill very large voxels, because the rest of the volume around him is empty space, but the alpha value of those voxels will be adjusted to give them less influence in the GI calculation.




    For a 128x128x128 voxel grid, with voxel size of 0.125 meters, my numbers are now:
    • Voxelization: 607 milliseconds
    • Direct lighting (all six directions): 109
    • First downsample (to 64x64): 39
    • Second downsample (to 32x32): 7
    • Third downsample (to 16x16): 1
    • Total: 763

    Note that voxelization, by far the slowest step here, does not have to be performed completely on all geometry each update. The direct lighting time elapsed is within a tolerable range, so we are in the running to make GI calculations entirely on the CPU, relieving the GPU of extra work and compressing our data before it is sent over the PCI bridge.
    Also note that smaller voxel grids could be used, with more voxel grids spread across more CPU cores. If that were the case, I would expect our processing time for each one to go down to 191 milliseconds total (39 milliseconds without the voxelization step), and the distance your GI covers would then be determined by your number of CPU cores.
    In fact there is a variety of ways this task could be divided between several CPU cores.
  8. Josh
    The Leadwerks 5 beta has been updated.
    A new FileSystemWatcher class has been added. This can be used to monitor a directory and emit events when a file is created, deleted, renamed, or overwritten. See the documentation for details and an example. Texture reloading now works correctly. I have only tested reloading textures, but other assets might work as well.
    CopyFile() will now work with URLs as the source file path, turning it into a download command.
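    For example, this one call would download a file (the URL and destination path here are just placeholders):

    //The source is a URL, so this copy becomes a download
    CopyFile("https://www.leadwerks.com/files/somefile.zip", "Downloads/somefile.zip");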
    Undocumented class methods and members not meant for end users are now made private. The goal is for 100% of public methods and members to be documented so there is nothing that appears in intellisense that you aren't allowed to use.
    Tags, key bindings, and some other experimental features are removed. I want to develop a more cohesive design for this type of stuff, not just add random ways to do things differently.
    Other miscellaneous small fixes.
  9. Josh
    A new update is available that adds post-processing effects in Leadwerks 5 beta.

    To use a post-processing effect, you load it from a JSON file and apply it to a camera like so:
    auto fx = LoadPostEffect("Shaders/PostEffects/SSAO.json");
    camera->AddPostEffect(fx);

    You can add as many effects as you want, and they will be executed in sequence.
    The JSON structure looks like this for a simple effect:
    { "postEffect": { "subpasses": [ { "shader": { "vertex": "Shaders/PostEffects/PostEffect.vert.spv", "fragment": "Shaders/PostEffects/SSAO.frag.spv" } } ] } } Multiple subpasses are supported for custom blurring and chains of shaders. This Gaussian blur effect uses several intermediate buffers to blur and downsample the image:
    { "postEffect": { "buffers": [ { "size": [0.5, 0.5] }, { "size": [0.25, 0.25] }, { "size": [0.125, 0.125] } ], "subpasses": [ { "target": 0, "shader": { "vertex": "Shaders/PostEffects/PostEffect.vert.spv", "fragment": "Shaders/PostEffects/blurx.frag.spv" } }, { "target": 1, "shader": { "vertex": "Shaders/PostEffects/PostEffect.vert.spv", "fragment": "Shaders/PostEffects/blury.frag.spv" } }, { "target": 2, "shader": { "vertex": "Shaders/PostEffects/PostEffect.vert.spv", "fragment": "Shaders/PostEffects/blurx.frag.spv" } }, { "shader": { "vertex": "Shaders/PostEffects/PostEffect.vert.spv", "fragment": "Shaders/PostEffects/blury.frag.spv" } } ] } } A new file is located in "Config/settings.json". This file contains information for the engine when it initializes. You can specify a default set of post-processing effects that will automatically be loaded whenever a camera is created. If you don't want any post-processing effects you can either change this file, or call Camera::ClearPostEffects() after creating a camera.
    Customizable properties are not yet supported but I plan to add these so you can modify the look of an effect on-the-fly.
    Fixed physics bug reported by @wadaltmon
    Other changes:
    • EnablePhysics is renamed to SetPhysicsMode.
    • EnableGravity is renamed to SetGravityMode.
    • EnableSweptCollision is renamed to SetSweptCollision.
    • COLLISION_CHARACTER is renamed to COLLISION_PLAYER.
  10. Josh
    A new update is available that improves Lua integration in Visual Studio Code and fixes Vulkan validation errors.
    The SSAO effect has been improved with a denoise filter. Similar to Nvidia's RTX raytracing technology, this technique smooths the results of the SSAO pass, resulting in a better appearance.

    It also requires far fewer samples, and the SSAO pass can be run at a lower resolution. I lowered the number of SSAO samples from 64 to 8 and decreased the area of the image to 25%, and it looks better than the SSAO in Leadwerks 4, which could appear somewhat grainy. With default SSAO and bloom effects enabled, I see no difference in framerate compared to the performance when no post-processing effects are in use.
    I upgraded my install of the Vulkan SDK to 1.2 and a lot of validation errors were raised. They are all fixed now. The image layout transition stuff is ridiculously complicated, and I can see no reason why this is even a feature! This could easily be handled by the driver just storing the current state and switching whenever needed, which is exactly what I ended up doing with my own wrapper class. In theory, everything should work perfectly on all supported hardware now since the validation layers say it is correct.
    You can now explicitly state the validation layers you want loaded in settings.json, although there isn't really any reason to do this:
    "vkValidationLayers": { "debug": [ "VK_LAYER_LUNARG_standard_validation", "VK_LAYER_KHRONOS_validation" ] } Debugging Lua in Visual Studio Code is improved. The object type will now be shown so you can more easily navigate debug information.

    That's all for now!
  11. Josh
    In my work with NASA we visualize many detailed CAD models in VR. These models may consist of tens of millions of polygons and thousands of articulated sub-objects. This often results in rendering performance that is bottlenecked by the vertex rather than the fragment pipeline. I recently performed some research to determine how to maximize our rendering speed in these situations.
    Leadwerks 4 used separate vertex buffers, but in Leadwerks 5 I have been working exclusively with interleaved vertex buffers. Data is interleaved and packed tightly. I always knew this could make a small improvement in speed, but I underestimated how important this is. Each byte in the data makes a huge impact. Now vertex colors and the second texture coordinate set are two vertex attributes that are almost never used. I decided to eliminate these. If required, this data can be packed into a 1D texture, applied to a material, and then read in a custom vertex shader, but I don't think the cost of keeping this data in the default vertex structure is justified. By reducing the size of the vertex structure I was able to make rendering speed in vertex-heavy scenarios about four times faster.
    Our vertex structure has been cut down to a convenient 32 bytes:
    struct Vertex
    {
        Vec3 position;                //12 bytes
        short texcoords[2];           //4 bytes
        signed char normal[3];        //3 bytes
        signed char displacement;     //1 byte
        signed char tangent[4];       //4 bytes
        unsigned char boneweights[4]; //4 bytes
        unsigned char boneindices[4]; //4 bytes: 32 bytes total
    };

    I created a separate vertex buffer for rendering shadow maps, which only require position data. I decided to copy the position data into this and store it separately. This requires about 15% more vertex memory usage, but results in a much more compact vertex structure for faster shadow rendering. I may pack the vertex texture coordinates in there, since that would result in a 16-byte-aligned structure. I did not see any difference in performance on my Nvidia card, and I suspect this is the same cost as a 12-byte structure on most hardware.
    Using unsigned shorts instead of unsigned integers for mesh indices increases performance by 11%.
    A vertex-limited scene is one in which our default setting of using an early Z-pass can be a disadvantage, so I added an option to disable this on a per-camera basis.
    Finally, I found that vertex cache optimization tools can produce a significant performance increase. I implemented two different libraries. In order to do this, I added a new plugin function for filtering a mesh:
    int FilterMesh(char* filtername, char* params, GMFSDK::GMFVertex*& vertices, uint32_t& vertex_count, uint32_t*& indices, uint32_t& indice_count, int polygonpoints);

    This allows you to add new mesh processing routines such as flipping the indices of a mesh, calculating normals, or performing mesh modifications like bending, twisting, distorting, etc. Both libraries resulted in an additional 100% increase in framerate in vertex-limited scenes.
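    As an illustration, a plugin filter that flips triangle winding could be written against that signature roughly like this. The filter name, the return values, and the unused params argument are all assumptions, and the GMFSDK header is assumed to be available.

    #include <cstdint>
    #include <cstring>
    #include <utility>
    #include "GMFSDK.h" //assumed header providing GMFSDK::GMFVertex

    extern "C" int FilterMesh(char* filtername, char* params, GMFSDK::GMFVertex*& vertices, uint32_t& vertex_count, uint32_t*& indices, uint32_t& indice_count, int polygonpoints)
    {
        //Only handle the hypothetical "flipmesh" filter, and only for triangle meshes
        if (strcmp(filtername, "flipmesh") != 0) return 0;
        if (polygonpoints != 3) return 0;

        //Reversing the index order of each triangle flips its winding
        for (uint32_t i = 0; i + 2 < indice_count; i += 3)
        {
            std::swap(indices[i], indices[i + 2]);
        }
        return 1;
    }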
    What will this help with? These optimizations will make a real difference when rendering CAD models and point cloud data.
  12. Josh
    One of the downsides of deferred rendering is that it isn't very good at handling transparent surfaces. Since we have moved to a new forward renderer, one of my goals in Leadwerks 5 is to have easy, hassle-free transparency with lighting and refraction that just works.
    Pre-multiplied alpha provides a better blending equation than traditional alpha blending. I'm not going to go into the details here, but it makes it so the transparent surface can be brighter than the underlying surface, as you can see on the vehicle's windshield here:

    I've been working for a while to build an automatic post-processing step into the engine that occurs when a transparent object is onscreen. If no transparent objects are onscreen, then the post-processing step can be skipped.
    You can also call Camera::SetRefraction(false) and just use regular GPU-blended transparency with no fancy refraction of the background, but I plan to enable it by default.
    To use this effect, there is absolutely nothing you have to do except to create a material, make it transparent, and apply it to a mesh somewhere.
    auto mtl = CreateMaterial();
    mtl->SetTransparent(true);
    mtl->SetColor(1,1,1,0.5);

    The lower the alpha value of the material color, the more see-through it is. You can use an alpha value of zero to make a refractive predator-like effect.
     
  13. Josh
    An update is available that adds the new refraction effect. It's very easy to create a refractive transparent material:
    auto mtl = CreateMaterial();
    mtl->SetTransparent(true);
    mtl->SetRefraction(0.02);

    The default FPS example shows some nice refraction, with two overlapping layers of glass, with lighting on all layers. It looks great with some of @TWahl's PBR materials.

    If you want to control the strength of the refraction effect on a per-pixel basis add an alpha channel to your normal map.
    I've configured the launch.json for Visual Studio Code so that the currently selected file is passed to the program on the command line. By default, the game executable will run the "Scripts/Main.lua" file. If, however, the currently selected Lua file in the VSCode IDE is a file located in "Scripts/Examples", the executable will launch that one instead. This design allows you to quickly run a different script without overwriting Main.lua, but won't accidentally run a different script if you are working on something else.
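    The relevant part of launch.json looks something like the sketch below. "${file}" is the standard VS Code variable for the currently selected file; the other field values shown are illustrative.

    {
        "version": "0.2.0",
        "configurations":
        [
            {
                "name": "Debug game",
                "type": "lua",
                "request": "launch",
                "args": [ "${file}" ]
            }
        ]
    }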

    The whole integration with Visual Studio Code has gotten really nice.

    A new option "frameBufferColorFormat" is added to the Config/settings.json file to control the default color format for texture buffers. I have it set to 37 (VK_FORMAT_R8G8B8A8_UNORM), but you can set it to 91 (VK_FORMAT_R16G16B16A16_UNORM) for high-def color, though you probably won't see anything without an additional tone mapping post-processing effect.
    Slow performance in the example game has been fixed. There are a few things going on here. Physics weren't actually the problem; it was the Lua debugger. The biggest problem was an empty Update() function that all the barrels had in their script. Now, this should not really be a problem, but I suspect the routine in vscode-debugger.lua that finds the matching chunk name is slow and can be optimized quite a lot. I did not want to make any additional changes to it right now, but in the future I think this can be further improved. Anyway, the FPS example will be nice and snappy now and runs normally.
    Application shutdown will be much faster now, as I did some work on the way the engine cleans itself up upon termination.
  14. Josh

    Articles
    An update is available for Leadwerks 5 beta on Steam that adds a World::SetSkyColor() command. This allows you to set a gradient for PBR reflections when no skybox is in use.
    I learned with Leadwerks 4 that default settings are important. The vast majority of screenshots people show off are going to use whatever default rendering settings I program in. We need a good balance between quality and performance for the engine to use as defaults. Therefore, the engine will use SSAO and bloom effects by default, a gentle gradient will be applied to PBR reflections, and the metal / roughness values of new materials will each be 0.5. Here is the result when a simple box is created with a single directional light:

    And here is what a more complex model looks like, without any lights in the scene:

    You can use World::SetSkyColor() to change the intensity of the reflections:

    Or you can change the colors to get an entirely different look:

    A Lua example using this command is available in the "Scripts/Examples" folder.
    These features will help you get better graphics out of the new engine with minimal effort.
  15. Josh

    Articles
    The terrain streaming / planet rendering stuff was the last of the feature creep. That finishes out the features I have planned for the first release of the new engine. My approach for development has been to go very broad so I could get a handle on how all the features work together, solve the hard problems, and then fill in the details when convenient.
    The hard problems are all solved, so now it's just a matter of finishing things. Consequently, I don't think my blogs are going to make any more groundbreaking feature announcements, but rather are going to show steady improvement of each subsystem as we progress towards a finished product.
    The GUI is something I wanted to spend some more cycles on. The initial release of the new engine will be a pure programming SDK with GUI support, but the GUI I am implementing is also going to be the basis of the new editor, when that time comes. I decided that using Lua scripts to control widgets was a bad idea, because at scale I think this would cause some small slowdown in the UI. My goals for the new editor are for it to load fast and be very snappy and responsive, and that is my highest priority. It is nice to have overarching design goals, because then you know what you must do.
    I've started the process of converting our Lua widget scripts into C++ code. The API now has functions like CreatePanel(), CreateButton(), etc. and is much more formalized than the flexible-but-open-ended GUI system in Leadwerks 4. For customization, I am implementing a color system. We have a bunch of color constants like this:
    enum WidgetColor
    {
        WIDGET_COLOR_BACKGROUND,
        WIDGET_COLOR_BORDER,
        WIDGET_COLOR_FOREGROUND,
        WIDGET_COLOR_SELECTION,
        WIDGET_COLOR_HIGHLIGHT,
        WIDGET_COLOR_AUX0,
        WIDGET_COLOR_AUX1,
        WIDGET_COLOR_AUX2,
        WIDGET_COLOR_AUX3,
    };

    There is a Widget::SetColor() command that lets you set any of the above values. Now, this is not a complete set of colors. The GUI system uses a lot more colors than that, but the additional colors are generated by multiplying the defined color by some value to make it a little darker or a little lighter.
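    Usage would presumably look something like this one-liner (the exact SetColor() signature is an assumption on my part):

    //Hypothetical call: color constant followed by RGBA values
    widget->SetColor(WIDGET_COLOR_BACKGROUND, 0.15, 0.15, 0.15, 1.0);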
    This means I am making a decision to reduce the flexibility of the system in favor of more formalized feature support, better documentation, and better performance.
    I think we will be able to load a color scheme from a JSON file and that will allow enough customization that most things people want to do will be possible. For custom widget behavior, I think either an actor or a DLL plugin could be used. There are enough options for future extensibility that I feel like we will be okay deferring that decision for now, and I am not coding myself into a corner.
    Here's a shot of the current state of things:

    I probably have enough GUI code ahead of me I could just go silent for a month and stay busy with this. I don't really want to think about that for the rest of today. Goodnight.
  16. Josh

    Articles
    Our new editor is being designed to support user-created extensions written in Lua. I want Lua to work in our new editor the way MaxScript works in 3ds Max, to allow an endless assortment of new tools you can create and use.
    Now that the editor GUI system is well underway, I want to start thinking about how user-created extensions will work with our new editor. I'm going to lay out some theoretical code for how a road creation tool might integrate into the editor.
    First we declare a start function that is run when the extension is loaded. This will add a toolbar button and menu item so the tool can be selected, as well as set up event listeners:
    function extension:Start()

        --Load the tool icon
        local icon = LoadPixmap("Icons/RoadTool.svg")

        --Add a toolbar button
        self.toolbarbutton = application.mainwindow.toolbar:InsertButton(icon)

        --Add a menu button
        self.menuitem = application.mainwindow.menu["Tools"]:InsertItem("Road Tool")
        self.menuitem:SetPixmap(icon)

        --Listen for events. EVENT_NONE will process all events:
        ListenEvent(EVENT_NONE, application.viewportgrid.viewport[1], self.ProcessEvent, self)
        ListenEvent(EVENT_NONE, application.viewportgrid.viewport[2], self.ProcessEvent, self)
        ListenEvent(EVENT_NONE, application.viewportgrid.viewport[3], self.ProcessEvent, self)
        ListenEvent(EVENT_NONE, application.viewportgrid.viewport[4], self.ProcessEvent, self)
    end

    Now we need to declare a function to process events. If the function returns false, the event will not be further processed, so the default mouse tool will be overridden.
    function extension:ProcessEvent(event)

        --Return if the road tool is not active
        if self.toolbarbutton:GetState() == false then return true end

        --Evaluate widget events - keep the menu and toolbar button in sync
        if event.id == EVENT_WIDGETACTION then
            if event.source == self.menuitem then
                self.toolbarbutton:SetState(event.data)
            elseif event.source == self.toolbarbutton then
                self.menuitem:SetState(event.data)
            end

        --Evaluate mouse events
        elseif event.id == EVENT_MOUSEDOWN then
            local viewport = Viewport(event.source)
            if viewport ~= nil then
                local pickinfo = PickInfo()
                if viewport.camera:Pick(viewport.framebuffer, event.x, event.y, pickinfo, 0, true) then
                    self:AddNode(pickinfo.position)
                end
                return false
            end

        --Evaluate key hits
        elseif event.id == EVENT_KEYDOWN then
            if event.data == KEY_ENTER then
                if #self.splinepoints > 1 then

                    --Create our road
                    self:CreateRoad()

                    --Update the undo system
                    application:CreateUndoStep()

                    --Tell the editor the scene is modified
                    application:ModifyScene()

                    --Refresh the viewports
                    application.viewportgrid:Redraw()

                    return false
                end
            end
        end
        return true
    end
  17. Josh

    Articles
    An update is available for the Ultra App Kit beta on Steam.
    • Menu open / close behavior is finished and is now working bug-free.
    • Fixed problem where list boxes were only showing the first item.
    • A submenu item is demonstrated in the example program.
    • A progress bar widget is added in the example program.
    • A label widget is added in the example program.
    • A second radio button is added in the example program.

    Still to do:
    • Work out some scaling issues.
    • Light theme.
    • Some small details with some widget styles.
    • Finish documentation.
    • Project wizard / manager application.
  18. Josh
    The beta testers and I are discussing game programming in the new engine. I want to see C++, Lua, and C# all take a near-identical approach that steals the best aspects of modern game programming and ditches the worst, to create something new and unique. To that end, we are developing the same simple game several times, with several different methodologies, to determine what really works best. One thing I realized quickly was that we really need a way to load prefabs from files.
    I started implementing a JSON scene format using the wonderful nlohmann::json library, and found the whole engine can easily serialize all information into the schema. You can save a scene, or save a single entity as a prefab. They're really the same thing, except that prefabs contain a single top-level entity.
    { "scene": { "entities": [ { "angularVelocity": [ 0.0, 0.0, 0.0 ], "castShadows": true, "collisionType": 0, "color": [ 0.7529412508010864, 1.2352941036224365, 1.5, 1.0 ], "floatPrecision": 32, "guid": "53f8d368-24da-4fba-b343-22afb4237d1b", "hidden": false, "light": { "cacheShadows": true, "coneAngles": [ 45.0, 35.0 ], "range": [ 0.019999999552965164, 12.0 ], "type": 0 }, "mass": 0.0, "matrix": 52, "name": "Point Light 2", "physicsMode": 1, "pickMode": 0, "position": 0, "quaternion": 24, "rotation": 12, "scale": 40, "static": false, "velocity": [ 0.0, 0.0, 0.0 ] } } } } For fast-loading binary data I save an accompanying .bin file. The values you see for position, rotation, etc. are offsets in the binary file where the data is saved.
    From there, it wasn't that much of a stretch to implement Lua state serialization. Lua tables align pretty closely to JSON tables. It's not perfect, but it's close enough that I would rather deal with the niggles of that than implement an ASCII data structure that doesn't show syntax highlighting in Visual Studio Code.
    "luaState": { "camera": "ab65bd91-153d-47fb-a11b-ff40c19cd8f4", "cameraheight": 1.7, "camerarotation": "<Vec3>::-4.2,39.1,0", "carriedobject": "9daf54a7-c3b5-4b4b-979b-6b034d6b80fd", "carriedobject_damping": "<Vec2>::0.1,0.1", "carriedobject_gravitymode": true, "carryposition": "<Vec3>::-0.259477,-0.372455,1.95684", "carryrotation": "<Quat>::0.0635833,0.755824,-0.649477,-0.0535437", "interactionrange": 2.5, "listener": "45c93683-ca3b-493a-ad06-b16fe14e4175", "looksmoothing": 2.0, "lookspeed": 0.1, "maxcarrymass": 10.0, "modelfile": "Models/Weapons/Ronan Rifle/scene.gltf", "mouselost": false, "mousemovement": "<Vec2>::-5.00474e-19,3.1102e-22", "movespeed": 5.0, "throwforce": 1500.0, "weapon": "2f8e3d74-26be-4828-9053-a455a9fd05fd", "weaponposition": "<Vec3>::0.12,-0.4,0.42", "weaponrotation": "<Vec3>::-89.9802,-0,0", "weaponswayspeed": 0.1, "weaponswayticks": 2443.5362155621197 } As a result, we now have the ability to easily add quick save of any game, and loading of the game state, automatically, without any special code. The only exception is for entities that are created in code, since they do not have a GUID to trace back to the original loaded scene. This is easily handled with a LoadState() function that gets executed in Lua after a saved game is loaded. In my FPSPlayer script I create a kinematic joint to make the player carry an object when they select it by looking at it and pressing the E key. Since this joint is created in code, there is no way to trace it back to the original scene file. So what I do is first remove the existing joint, if an object is currently being picked up, and then create a new joint, if one has been loaded in the game save file.
    function entity:LoadState()
        if self.carryjoint ~= nil then
            self.carryjoint.child:SetGravityMode(self.carriedobject_gravitymode)
            self.carryjoint.child:SetDamping(self.carriedobject_damping.x, self.carriedobject_damping.y)
            self.carryjoint:Break()
            self.carryjoint = nil
        end
        if self.carriedobject ~= nil then
            local pos = TransformPoint(self.carryposition, self.camera, nil)
            self.carryjoint = CreateKinematicJoint(pos, self.carriedobject)
            self.carryjoint:SetFriction(1000,1000)
        end
    end

    Here is the result. The player rotation, camera angle, and other settings did not have to be manually programmed. I just saved the scene and reloaded the entity info, and it just works. You can see even the weapon sway timing gets restored exactly the way it was when the game is reloaded from the saved state.
    For most of your gameplay, it will just work automatically. This is a game-changing feature because it enables easy saving and loading of your game state at any time, something that even AAA games sometimes struggle to support.
  19. Josh

    Articles
    2020 was the most intellectually challenging year in my career. Many major advancements were invented, and 2021 will see those items refined, polished, and turned into a usable software product. Here is a partial list of things I created:
    • Streaming hierarchical planet-scale terrain system with user-defined deformation and texture projection.
    • Vulkan post-processing stack and transparency with refraction.
    • Vulkan render-to-texture.
    • Major progress on voxel ray tracing.
    • Porting/rewrite of Leadwerks GUI with implementation in 3D, rendered to texture, and using system drawing.
    • Plugin system for loading and saving textures, models, packages, and processing image and mesh data.
    • Lua debugger with integration in Visual Studio Code.
    • Pixmap class for loading, modifying, compressing, and saving texture data.
    • Vulkan particle system with physics.
    • Implemented new documentation system.
    • Lua and C++ state serialization system with JSON.
    • C++ entity component system and preprocessor. (You don't know anything about this yet.)

    Not only was this a year of massive technical innovation, but it was also the year when my efforts were put to the test to see if my idea of delivering a massive performance increase for VR was actually possible, or if I was in fact, as some people from the Linux community have called me, "unbelievably delusional". Fortunately, it turned out that I was right, and an as-of-yet-unreleased side-by-side benchmark showing our performance against another major engine proves we offer significantly better performance for VR and general 3D graphics. More on this later...
    Additionally, I authored a paper on VR graphics optimization for a major modeling and simulation conference (which was unfortunately canceled but the paper will be published at a later time). This was another major test because I had to put my beliefs, which I mostly gain from personal experience, into a more quantifiable defensible scientific format. Writing this paper was actually one of the hardest things I have ever done in my life. (I would like to thank Eric Lengyel of Terathon Software for providing feedback on the final paper, as well as my colleagues in other interesting industries.)
    I'm actually kind of floored looking at this list. That is a massive block of work, and there's a lot of really heavy-hitting items. I've never produced such a big volume of output before. I'm expecting 2021 to be less about groundbreaking research and more about turning these technologies into usable polished products, to bring you the benefits all these inventions offer.
  20. Josh

    Articles
    Before finalizing Ultra App Kit I want to make sure our 3D engine works correctly with the GUI system. This is going to be the basis of all our 3D tools in the future, so I want to get it right before releasing the GUI toolkit. This can prevent breaking changes from being made in the future after the software is released.
    Below you can see our new 3D engine being rendered in a viewport created on a GUI application. The GUI is being rendered using Windows GDI+, the same system that draws the real OS interface, while the 3D rendering is performed with Vulkan 1.1. The GUI is using an efficient event-driven program structure with retained mode drawing, while Vulkan rendering is performed asynchronously in real time, on another thread. (The rendering thread can also be set to render only when the viewport needs to be refreshed.)

    The viewport resizes nicely with the window:

    During this process I learned there are two types of child window behavior. If a window is parented to another window it will appear on top of the parent, and it won’t have a separate icon appear in the Windows task bar. Additionally, if the WS_CHILD window style is used, then the child window coordinates will be relative to the parent, and moving the parent will instantly move the child window with it. We need both types of behavior. A splash screen is an example of the first, and a 3D viewport is an example of the second. Therefore, I have added a WINDOW_CHILD window creation flag you can use to control this behavior.
    This design has been my plan going back several years, and at long last we have the result. This will be a strong foundation for creating game development tools like the new engine's editor, as well as other ideas I have.
    This is what "not cutting corners" looks like.
  21. Josh

    Articles
    Ultra App Kit 1.2 is now available on our site and on Steam. This is a bug fix update that resolves numerous small issues reported in the bug reports forum.
    To download the latest version, see My Purchases.
  22. Josh

    Articles
    In Leadwerks, required files were always a slightly awkward issue. The engine requires a BFN texture and a folder of shaders, in order to display anything. One of my goals is to make the Ultra Engine editor flexible enough to work with any game. It should be able to load the folder of an existing game, even if it doesn't use Ultra Engine, and display all the models and scenes with some accuracy. Of course the Quake game directory isn't going to include a bunch of Ultra Engine shaders, so what to do?
    One solution could be to load shaders and other files from the editor directory, but this introduces other issues. My solution is to build shaders, shader families, and the default BRDF texture into the engine itself. This is done with a utility that reads a list of files to include, then loads each one and turns it into an array in C++ code that gets compiled into the engine. The code looks like this:
    if (rpath == RealPath("Shaders/Sky.json"))
    {
        static const std::array<uint64_t, 62> data = {0x61687322090a0d7bULL,0x6c696d6146726564ULL,0xd7b090a0d3a2279ULL,0x746174732209090aULL,0x9090a0d3a226369ULL,0x66220909090a0d7bULL,0xa0d3a2274616f6cULL,0x9090a0d7b090909ULL,0x555141504f220909ULL,0x909090a0d3a2245ULL,0x90909090a0d7b09ULL,0x6c75616665642209ULL,0x909090a0d3a2274ULL,0x909090a0d7b0909ULL,0x6573616222090909ULL,0x90909090a0d3a22ULL,0x909090a0d7b0909ULL,0x7265762209090909ULL,0x5322203a22786574ULL,0x532f737265646168ULL,0x762e796b532f796bULL,0x227670732e747265ULL,0x9090909090a0d2cULL,0x6d67617266220909ULL,0x5322203a22746e65ULL,0x532f737265646168ULL,0x662e796b532f796bULL,0x227670732e676172ULL,0x909090909090a0dULL,0x9090909090a0d7dULL,0x7d090909090a0d7dULL,0xd2c7d0909090a0dULL,0x756f64220909090aULL,0x90a0d3a22656c62ULL,0x909090a0d7b0909ULL,0x45555141504f2209ULL,0x90909090a0d3a22ULL,0x9090909090a0d7bULL,0x746c756166656422ULL,0x90909090a0d3a22ULL,0x90909090a0d7b09ULL,0x2265736162220909ULL,0x9090909090a0d3aULL,0x90909090a0d7b09ULL,0x7472657622090909ULL,0x685322203a227865ULL,0x6b532f7372656461ULL,0x34365f796b532f79ULL,0x732e747265762e66ULL,0x9090a0d2c227670ULL,0x7266220909090909ULL,0x3a22746e656d6761ULL,0x7265646168532220ULL,0x6b532f796b532f73ULL,0x72662e6634365f79ULL,0xd227670732e6761ULL,0x7d0909090909090aULL,0x7d09090909090a0dULL,0xd7d090909090a0dULL,0x90a0d7d0909090aULL,0xa0d7d090a0d7d09ULL,0xcdcdcdcdcdcdcd7dULL};
        auto buffer = CreateBuffer(489);
        buffer->Poke(0, (const char*)data.data(), 489);
        return CreateBufferStream(buffer);
    }

    An unsigned 64-bit integer is used for the data type, as this results in the smallest generated code file size.
    Files are searched for in the following order:
    1. A file on the hard drive in the specified path.
    2. A file from a loaded package with the specified relative path.
    3. A file built into the engine.

    Therefore, if your game includes a modified version of a shader, the shader module will still be loaded from the file in your game directory. However, if you don't include any shaders at all, the engine will just fall back on its own set of shaders compiled into the core engine.
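    In code, the lookup works roughly like this sketch (the function names are illustrative, not the engine's actual internals):

    shared_ptr<Stream> OpenEngineFile(const std::wstring& path)
    {
        //1. Loose file on the hard drive
        if (FileType(path) == 1) return ReadFile(path);

        //2. File inside a loaded package
        auto stream = ReadPackageFile(path);
        if (stream) return stream;

        //3. File compiled into the engine itself
        return ReadEmbeddedFile(path);
    }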
    This gives Ultra Engine quite a lot more flexibility in loading scenes and models, and allows creation of 3D applications that can work without any required files at all, while still allowing for user control over the game shaders.
    The screenshot here shows the Ultra Engine editor loading a Leadwerks project folder and displaying 3D graphics using the Ultra Engine renderer, even though the Leadwerks project does not contain any of the shaders and other files Ultra Engine needs to run:

  23. Josh

    Articles
    A while back I wrote enthusiastically about Basis Universal super compression. KTX2 is a texture file format from Khronos, makers of the Vulkan and glTF specifications. Like DDS files, KTX2 can store multiple mipmaps, as well as memory-compressed texture formats like DXT5 and BC7. However, KTX2 now supports Basis compressed data as well, which makes it the all-in-one universal texture format. glTF has an official extension for KTX2 textures in glTF files, so it can be combined with Draco mesh compression to compress your overall game model sizes:

    Additionally, KTX2 includes information about clamp and filter settings. The reason I implemented the .tex texture format in Leadwerks was because DDS lacks these features and I wanted that information stored in the texture file.
    I've added built-in KTX2 texture loading and saving, so you can easily save and load these files. I plan to make KTX2 the recommended texture file format for Ultra Engine.

  24. Josh

    Articles
    I've now got basic specular reflections working with the sparse voxel octree system. This uses much less memory than a voxel grid or even a compressed volume texture. It also supports faster optimized ray tests, for higher quality reflections and higher resolution. Some of the images in this article were not possible to produce in my initial implementation that used volume textures.
    This shot shows the reflection of just the diffuse color. Notice the red column is visible in three reflections, but not in the reflected floor. It would be possible to add a secondary bounce to add reflections in reflections:

    With direct lighting added to the reflection, and the resolution turned up a bit, we can see the ray tracing is getting quite detailed. Of course, we prefer to use blurred downsampled data for voxel ray tracing, but the results so far indicate there is enough data to produce a good final result:

    In the shot below we are using a voxel size of about three centimeters, in a 1024x1024x1024 sparse voxel octree. A simple voxel grid would require 32 GB of video memory, but our structure fits into less than 240 MB.

    Turning the reflectivity up for all materials doesn't really look good and creates a confusing look, but it's still interesting to see. The amount of detail we see in the voxel reflections is quite good. The voxels are so high resolution we can even see the texture details of the original polygon mesh!


    The speed of the octree traversal routine is very important here, and I am in contact with some university faculty to see about implementing something special to give you the maximum possible performance.
    The next step is to downsample the octree data to display blurrier reflections. This will also be used for calculating GI.
  25. Josh
    Previously I wrote about introducing latency to the voxel cone step tracing realtime global illumination system. The idea here is to improve performance and quality, at the cost of a small delay when the GI calculation gets updated. The diffuse GI lighting gets cached so the final scene render is very fast.
    Here's what a gradual GI update does. Of course, this will be running unseen in the background for the final version, but this shows what is actually happening:

    Vulkan has a sophisticated system for supporting multiple device queues. I initially thought I could just run the GI update on a separate queue, like a low-priority CPU thread running in the background:

    Unfortunately, this is not really possible on today's hardware. Even with the lowest queue priority setting, the GI queue hogs the whole GPU and causes the rendering queue to stall out. So I had to stick with doing everything in the main render queue.
    The GI calculation only updates when the camera moves a certain distance. There is a latency setting that controls how many substeps the task is broken up into. High latency means many small steps, so the framerate will not dip much when GI is updating. Low latency means the GI will significantly decrease the framerate every time the camera triggers an update. It is possible to set up a combination of high resolution and low latency that will cause the render queue to stall out. If this happens the program will encounter a VK_ERROR_DEVICE_LOST error. I don't know how to prevent this for now, other than just don't use ridiculous settings.
    Here you can see the GI updating seamlessly as the camera moves around. I actually have to use three sets of volume textures, one for rasterization and direct lighting, and then the final GI data is stored in a set of two textures that alternate back and forth. This allows me to display the previous results while the next update is still processing, and the two textures get swapped when the update is finished.

    A couple of small issues remain.
    • The transition between textures needs to be smoothed, to handle changes to the environment.
    • I need a way to calculate the diffuse GI for dynamic objects that don't appear in the GI reflections. (I have an idea.)

    These items are not too terribly difficult and they will be done in short order. I'm very happy with how this has turned out. It provides quality real-time global illumination without compromising performance, and it will work very well with VR.