Blog Entries posted by Josh

  1. Josh
    I've had some more time to work with the Lua debugger in Leadwerks Game Engine 5 beta, and it's really amazing.  Adding the engine classes into the debug information has been pretty simple. All it takes is a class function that adds members into a table and returns it to Lua.
    sol::table Texture::debug(sol::this_state ts) const
    {
        auto t = Object::debug(ts);
        t["size"] = size;
        t["format"] = format;
        t["type"] = type;
        t["flags"] = flags;
        t["samples"] = samples;
        t["faces"] = faces;
        return t;
    }

    The base Object::debug function will add all the custom properties that you attach to the object:
    sol::table Object::debug(sol::this_state ts) const
    {
        sol::table t(ts, sol::create);
        for (auto& pair : entries)
        {
            if (pair.second.get_type() == sol::type::function) continue;
            if (Left(pair.first, 1) == "_") continue;
            t[pair.first] = pair.second;
        }
        return t;
    }

    This allows you to access both the built-in class members and your own values you attach to an object. You can view all these variables in the side panel while debugging, in alphabetical order:

    You can even hover over a variable to see its contents!

    The Lua debugger in Leadwerks 4 just sends a static stack of data to the IDE that is a few levels deep, but the new Lua debugger in VS Code will actually allow you to traverse the code and look all around your program. You can drill down as deep as you want, even viewing the positions of individual vertices in a model:

    This gives us a degree of power we've never had before with Lua game coding. Programming games with Lua will be easier than ever in our new game engine, and it's easy to add your own C++ classes to the environment.
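    For example, a user-defined class could expose its own members the same way. This is only a sketch; the Player class and its members here are hypothetical, not part of the engine:

    //Hypothetical example of exposing your own class members to the debugger
    class Player : public Object
    {
    public:
        float health = 100.0f;
        int score = 0;

        sol::table debug(sol::this_state ts) const
        {
            auto t = Object::debug(ts); //include any attached custom properties
            t["health"] = health;
            t["score"] = score;
            return t;
        }
    };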
  2. Josh
    In Leadwerks Game Engine 4, terrain was a static object that could only be modified in the editor. Developers requested access to the terrain API but it was so complex I felt it was not a good idea to expose it. The new terrain system is better thought out and more flexible, but still fairly complicated because you can do so much with it. This article is a deep dive into the inner workings of the new terrain system.
    Creating Terrain
    Terrain can be treated as an editable object, which involves storing more memory, or as a static object, which loads faster and consumes less memory. There aren't two different types of terrain; you can simply skip loading some information if you don't plan on making the terrain deform once it is loaded up, and the system will only allocate memory as it is needed, based on your usage. The code below will create a terrain consisting of 2048 x 2048 points, divided into patches of 32 x 32 points.
    local terrain = CreateTerrain(world, 2048, 32)

    This will scale the terrain so there is one point every meter, with a maximum height of 100 meters and a minimum height of -100 meters. The width and depth of the terrain will both be a little over two kilometers:

    terrain:SetScale(1,200,1)

    Loading Heightmaps
    Let's look at how to load a heightmap and apply it to a terrain. Because RAW files do not contain any information on size or formats, we are going to first load the pixel data into a memory buffer and then create a pixmap from that with the correct parameters:
    --We have to specify the width, height, and format then create the pixmap from the raw pixel data.
    local buffer = LoadBuffer("Terrain/2048/2048.r16")
    local heightmap = CreatePixmap(2048, 2048, TEXTURE_R16, buffer)

    --Apply the heightmap to the terrain
    terrain:SetHeightMap(heightmap)

    Because we can now export image data we have some options. If we wanted we could save the loaded heightmap in a different format. I like R16 DDS files for these because unlike RAW/R16 heightmap data these images can be viewed in a DDS viewer like the one included in Visual Studio:
    heightmap:Save("Terrain/2048/2048_H.dds")

    Here is what it looks like if I open the saved file with Visual Studio 2019:

    After we have saved that file, we can then just load it directly and skip the RAW/R16 file:
    --Don't need this anymore!
    --local buffer = LoadBuffer("Terrain/2048/2048.r16")
    --local heightmap = CreatePixmap(2048, 2048, TEXTURE_R16, buffer)

    --Instead we can do this:
    local heightmap = LoadPixmap("Terrain/2048/2048_H.dds")

    --Apply the heightmap to the terrain
    terrain:SetHeightMap(heightmap)

    This is what is going on under the hood when you set the terrain heightmap:
    bool Terrain::SetHeightMap(shared_ptr<Pixmap> heightmap)
    {
        if (heightmap->size != resolution)
        {
            Print("Error: Pixmap size is incorrect.");
            return false;
        }
        VkFormat fmt = VK_FORMAT_R16_UNORM;
        if (heightmap->format != fmt) heightmap = heightmap->Convert(fmt);
        if (heightmap == nullptr) return false;
        Assert(heightmap->pixels->GetSize() == sizeof(terraindata->heightfielddata[0]) * terraindata->heightfielddata.size());
        memcpy(&terraindata->heightfielddata[0], heightmap->pixels->buf, heightmap->pixels->GetSize());
        ModifyHeight(0, 0, resolution.x, resolution.y);
        return true;
    }

    There is something important to take note of here. There are two copies of the height data. One is stored in system memory and is used for physics, raycasting, pathfinding, and other functions. The other copy of the height data is stored in video memory and is used to adjust the vertex heights when the terrain is drawn. In this case, the data is stored in the same format, just a single unsigned 16-bit integer, but other types of terrain data may be stored in different formats in system memory (RAM) and video memory (VRAM).
    Building Normals
    Now let's give the terrain some normals for nice lighting. The simple way to do this is to just recalculate all normals across the terrain. The new normals will be copied into the terrain normal texture automatically:
    terrain:BuildNormals()

    However, updating all the normals across the terrain is a somewhat time-consuming process. How time-consuming is it? Let's find out:
    local tm = Millisecs()
    terrain:BuildNormals()
    Print(Millisecs() - tm)

    The printed output in the console says the process takes 1600 milliseconds (1.6 seconds) in debug mode and 141 milliseconds in release mode. That is quite good, but the task is distributed across 8 threads on this machine. What if someone with a slower machine was working with a bigger terrain? If I disable multithreading, the time it takes is 7872 milliseconds in debug mode and 640 milliseconds in release mode. A 4096 x 4096 terrain would take four times as long, creating a 30-second delay before the game started, every single time it was run in debug mode. (In release mode it is so fast it could be generated dynamically all the time.) Admittedly, a developer using a single-core processor to debug a game with a 4096 x 4096 terrain is a sort of extreme case, but the whole design approach for Leadwerks 5 has been to target the extreme cases, like the ones I see while working on virtual reality projects at NASA.
    What can we do to eliminate this delay? The answer is caching. We can retrieve a pixmap from the terrain after building the normals, save it, and then load the normals straight from that file next time the game is run.
    --Build normals for the entire terrain
    terrain:BuildNormals()

    --Retrieve a pixmap containing the normals in R8G8 format
    normalmap = terrain:GetNormalMap()

    --Save the pixmap as an uncompressed R8G8 DDS file, which will be loaded next time as a texture
    normalmap:Save("Terrain/2048/2048_N.dds")

    There is one catch. If you ran the code above there would be no DDS file saved. The reason for this is that internally, the terrain system stores each point's normal as two bytes representing two axes of the vector. Whenever the third axis is needed, it is calculated from the other two with this formula:
    normal.z = sqrt(max(0.0f, 1.0f - (normal.x * normal.x + normal.y * normal.y)));

    The pixmap returned from the GetNormalMap() method therefore uses the format TEXTURE_RG, but the DDS file format does not support two-channel uncompressed images. In order to save this pixmap into a DDS file we have to convert it to a supported format. We will use TEXTURE_RGBA. The empty blue and alpha channels double the file size but we won't worry about that right now.
    --Build normals for the entire terrain
    terrain:BuildNormals()

    --Retrieve a pixmap containing the normals in R8G8 format
    normalmap = terrain:GetNormalMap()

    --Convert to a format that can be saved as an image
    normalmap = normalmap:Convert(TEXTURE_RGBA)

    --Save the pixmap as an uncompressed RGBA DDS file, which will be loaded next time as a texture
    normalmap:Save("Terrain/2048/2048_N.dds")

    When we open the resulting file in Visual Studio 2019 we see a funny-looking normal map. This is just because the blue channel is pure black, for the reasons explained above.

    In my initial implementation I was storing the X and Z components of the normal, but I switched to X and Y. The reason for this is that I can use a lookup table with the Y component, since it is an unsigned byte, and use that to quickly retrieve the slope at any terrain point:
    float Terrain::GetSlope(const int x, const int y)
    {
        if (terraindata->normalfielddata.empty()) return 0.0f;
        return asintable[terraindata->normalfielddata[(y * resolution.x + x) * 2 + 1]];
    }

    This is much faster than performing the full calculation, as shown below:
    float Terrain::GetSlope(const int x, const int y)
    {
        int offset = (y * resolution.x + x) * 2;
        int nx = terraindata->normalfielddata[offset + 0];
        int ny = terraindata->normalfielddata[offset + 1];
        Vec3 normal;
        normal.x = (float(nx) / 255.0f - 0.5f) * 2.0f;
        normal.y = float(ny) / 255.0f;
        normal.z = sqrt(Max(0.0f, 1.0f - (normal.x * normal.x + normal.y * normal.y)));
        normal /= normal.Length();
        return 90.0f - ASin(normal.y);
    }

    Since the slope is used in expensive layering operations and may be called millions of times, it makes sense to optimize it.
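    For illustration, here is a minimal sketch of how such a lookup table might be initialized. The asintable name comes from the code above, but the initialization shown is an assumption, including the assumption that the slope is stored in degrees:

    #include <cmath>

    //Hypothetical initialization: one entry per possible byte value of the
    //normal's Y component, mapping directly to a slope angle in degrees.
    static float asintable[256];

    void BuildAsinTable()
    {
        for (int i = 0; i < 256; ++i)
        {
            float y = float(i) / 255.0f; //decode the unsigned byte to 0..1
            asintable[i] = 90.0f - asinf(y) * 180.0f / 3.14159265f;
        }
    }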
    Now we can structure our code so it first looks for the cached normals image and loads that before performing the time-consuming task of building normals from scratch:
    --Load the saved normal data as a pixmap
    local normalmap = LoadPixmap("Terrain/2048/2048_N.dds")
    if normalmap == nil then
        --Build normals for the entire terrain
        terrain:BuildNormals()

        --Retrieve a pixmap containing the normals in R8G8 format
        normalmap = terrain:GetNormalMap()

        --Convert to a format that can be saved as an image
        normalmap = normalmap:Convert(TEXTURE_RGBA)

        --Save the pixmap as an uncompressed RGBA DDS file, which will be loaded next time as a texture
        normalmap:Save("Terrain/2048/2048_N.dds")
    else
        --Apply the texture to the terrain. (The engine will automatically create a more optimal BC5 compressed texture.)
        terrain:SetNormalMap(normalmap)
    end

    The time it takes to load normals from a file is pretty much zero, so in the worst-case scenario described we just eliminated a huge delay when the game starts up. This is just one example of how the new game engine is being designed with extreme scalability in mind.
    Off on a Tangent...
    Tangents are calculated in the BuildNormals() routine at the same time as normals, because they both involve a lot of shared calculations. We could use the Terrain:GetTangentMap() method to retrieve another RG image, convert it to RGBA, and save it as a second DDS file, but instead let's just combine normals and tangents with the Terrain:GetNormalTangentMap() method you did not know existed until just now. Since that returns an RGBA image with all four channels filled with data, there is no need to convert the format. Our code above can be replaced with the following.
    --Load the saved normal and tangent data as a pixmap
    local normaltangentmap = LoadPixmap("Terrain/2048/2048_NT.dds")
    if normaltangentmap == nil then
        --Build normals for the entire terrain
        terrain:BuildNormals()

        --Retrieve a pixmap containing the normals and tangents in RGBA format
        normaltangentmap = terrain:GetNormalTangentMap()

        --Save the pixmap as an uncompressed RGBA DDS file, which will be loaded next time as a texture
        normaltangentmap:Save("Terrain/2048/2048_NT.dds")
    else
        --Apply the texture to the terrain. (The engine will automatically create a more optimal BC5 compressed texture.)
        terrain:SetNormalTangentMap(normaltangentmap)
    end

    This will save both normals and tangents into a single RGBA image that looks very strange:

    Why do we even have options for separate normal and tangent maps? This allows us to save both as optimized BC5 textures, which actually do use two channels of data. This is the same format the engine uses internally, so it will give us the fastest possible loading speed and lowest memory usage, but it's really only useful for static terrain because getting the data back into a format for system memory would require decompression of the texture data:
    --Retrieve pixmaps containing the normals and tangents in R8G8 format
    normalmap = terrain:GetNormalMap()
    tangentmap = terrain:GetTangentMap()

    --Convert to optimized BC5 format
    normalmap = normalmap:Convert(TEXTURE_BC5)
    tangentmap = tangentmap:Convert(TEXTURE_BC5)

    --Save the pixmaps as compressed BC5 DDS files, which will be loaded next time as textures
    normalmap:Save("Terrain/2048/2048_N.dds")
    tangentmap:Save("Terrain/2048/2048_T.dds")

    When saved, these two images combined will use 50% as much space as the uncompressed RGBA8 image, but again, don't worry about storage space for now. The saved normal map looks just the same as the uncompressed RGBA version, and the tangent map looks like this:

    Material Layers
    Terrain material layers, which make patches of terrain look like rocks, dirt, or snow, work in a similar manner, but they are still under development and will be discussed in detail later. For now I will just show how I am adding three layers to the terrain, setting some constraints for slope and height, and then painting the material across the entire terrain.
    --Add base layer
    local mtl = LoadMaterial("Materials/Dirt/dirt01.mat")
    local layerID = terrain:AddLayer(mtl)

    --Add rock layer
    mtl = LoadMaterial("Materials/Rough-rockface1.json")
    local rockLayerID = terrain:AddLayer(mtl)
    terrain:SetLayerSlopeConstraints(rockLayerID, 35, 90, 25)

    --Add snow layer
    mtl = LoadMaterial("Materials/Snow/snow01.mat")
    local snowLayerID = terrain:AddLayer(mtl)
    terrain:SetLayerHeightConstraints(snowLayerID, 50, 1000, 8)
    terrain:SetLayerSlopeConstraints(snowLayerID, 0, 35, 10)

    --Apply Layers
    terrain:SetLayer(rockLayerID, 1.0)
    terrain:SetLayer(snowLayerID, 1.0)

    Material layers can take a significant time to process, at least in debug mode, as we will see later. Fortunately all this data can be cached in a manner similar to what we saw with normals and tangents. This also produces some very cool images:

    Optimizing Load Time
    The way we approach terrain building depends on the needs of each game or application. Is the terrain static or dynamic? Do we want changes in the application to be saved back out to the hard drive to be retrieved later? We already have a good idea of how to manage dynamic terrain data, now let's look at static terrains, which will provide faster load times and a little bit lower memory usage.
    Terrain creation is no different than before:
    local terrain = CreateTerrain(world, 2048, 32)
    terrain:SetScale(1,200,1)

    Loading the heightmap works the same as before. I am using the R16 DDS file here but it makes absolutely no difference in terms of loading speed, performance, or memory usage.
    --Load heightmap
    local heightmap = LoadPixmap("Terrain/2048/2048_H.dds")

    --Apply the heightmap to the terrain
    terrain:SetHeightMap(heightmap)

    Now here is where things get interesting. Remember how I talked about the terrain data existing in both system and video memory? Well, I am going to let you in on a little secret: we don't actually need the normal and tangent data in system memory if we aren't editing the terrain. We can load the optimized BC5 textures and apply them directly to the terrain's material, and it won't even realize what happened:
    --Load the saved normal data as a texture
    local normaltexture = LoadTexture("Terrain/2048/2048_N.dds")

    --Apply the normal texture to the terrain material
    terrain.material:SetTexture(normaltexture, TEXTURE_NORMAL)

    --Load the saved tangent data as a texture
    local tangenttexture = LoadTexture("Terrain/2048/2048_T.dds")

    --Apply the tangent texture to the terrain material
    terrain.material:SetTexture(tangenttexture, TEXTURE_TANGENT)

    Because we never fed the terrain any normal or tangent data, that memory will never get initialized, saving us 16 megabytes of system memory on a 2048 x 2048 terrain (4,194,304 points, times two bytes for normals and another two for tangents). We also save the time of compressing two big images into BC5 format at runtime. In the material layer system, which will be discussed at a later time, this approach will save 32 megabytes of memory and some small processing time. Keep in mind all those numbers increase four times with the next biggest terrain.
    In debug mode the static and cached dynamic tests are not bad, but the first time the dynamic test is run there is a long delay of 60 40 25 15 seconds (explanation at the end of this section). We definitely don't want that happening every time you debug your game. Load times are in milliseconds.

    Dynamic Terrain (Debug, First Run)
    Loading time: 15497

    Dynamic Terrain (Debug, Cached Data)
    Loading time: 1606

    Static Terrain (Debug)
    Loading time: 1078

    When the application is run in release mode the load times are all very reasonable, although the static mode loads about five times faster than building all the data at runtime. Memory usage does not vary very significantly. Memory shown is in megabytes.

    Dynamic Terrain (Release, First Run)
    Loading time: 1834
    Memory usage: 396

    Dynamic Terrain (Release, Cached Data)
    Loading time: 386
    Memory usage: 317

    Static Terrain (Release)
    Loading time: 346
    Memory usage: 311

    The conclusion is that making use of cached textures, and only using dynamic terrains when you need them, can significantly improve your load times when running in debug mode, which is what you will be doing the majority of the time during development. If you don't care about any of these details, it will be handled automatically for you when you save your terrain in the new editor, but if you are creating terrains programmatically this is important to understand. If you are loading terrain data from the hard drive dynamically as the game runs (streaming terrain) then these optimizations could be very important.
    While writing this article I found that I could greatly decrease the loading time in debug mode by replacing STL with my own sorting routines in some high-performance code. STL usually runs very fast but can be onerous in debug mode. It's scary stuff, but I actually remember writing this same routine back when I was using Blitz3D, which if I remember correctly did not have any sorting functions. I found this ran slightly faster than STL in release mode and much faster in debug mode. I was able to bring one computationally expensive routine down from 20 seconds to 4 seconds (in debug mode only; it runs fine in release either way).
    //Scary homemade sorting
    firstitem = 0;
    lastitem = mtlcount - 1;
    sortcount = 0;
    while (true)
    {
        minalpha = 0;
        minindex = -1;
        for (n = firstitem; n <= lastitem; ++n)
        {
            if (listedmaterials[n].y == -1) continue;
            if (minindex == -1)
            {
                minalpha = listedmaterials[n].x;
                minindex = n;
            }
            else
            {
                if (listedmaterials[n].x < minalpha)
                {
                    minalpha = listedmaterials[n].x;
                    minindex = n;
                }
            }
        }
        if (minindex == -1) break;
        if (minindex == firstitem) ++firstitem;
        if (minindex == lastitem) --lastitem;
        sortedmaterials[sortcount] = listedmaterials[minindex];
        listedmaterials[minindex].y = -1;
        ++sortcount;
    }

    There may be some opportunities for further performance increase in some of the high-performance terrain code. It's just a matter of how much time I want to put into this particular aspect of the engine right now.
    Optimizing File Size
    Basis Universal is the successor to the Crunch library. The main improvement it makes is support for modern compression formats (BC5 for normals and BC7 to replace DXT5). BasisU is similar to OGG/MP3 compression in that it doesn't reduce the size of the data in memory, but it can significantly reduce the size when it is saved to a file. This can reduce the size of your game's data files. It's also good for data that is downloaded dynamically, like large GIS data sets. I have seen people claim this can improve load times but I have never seen any proof of this and I don't believe it is correct.
    Although we do not yet support BasisU files, I wanted to run the compressible files through it and see how much hard drive space we could save. I am including only the images needed for the static terrain method, since that is how large data sets would most likely be used.
    Uncompressed (R16 / RGBA): 105 megabytes
    Standard Texture Compression (DXT5 + BC5): 48 megabytes
    Standard Texture Compression + Zip Compression: 18.7 megabytes
    BasisU + Standard Texture Compression: 26 megabytes
    BasisU + Standard Texture Compression + Zip Compression: 10.1 megabytes

    If we just look at one single 4096x4096 BC3 (DXT5) DDS file, when compressed in a zip file it is 4.38 megabytes. When compressed in a BasisU file, it is only 1.24 megabytes.
    4096x4096 uncompressed RGBA: 64 megabytes
    4096x4096 DXT5 / BC3: 16 megabytes
    4096x4096 DXT5 / BC3 + zip compression: 4.38 megabytes
    4096x4096 BasisU: 1.24 megabytes

    It looks like we can save a fair amount of data by incorporating BasisU into our pipeline. However, the compression times are longer than we would want for terrain that is being frequently saved in the editor, so it should be performed in a separate step before final packaging of the game. With the open-source plugin SDK anyone could add a plugin to support this right now. There is also some texture data that should not be compressed, so our savings with BasisU are less than what we would see for normal usage. In general, it appears that BasisU can cut the size of your game files down to about a third of what they would be in a zip file.
    A new update with these changes will be available in the beta tester forum later today.
  3. Josh
    Previously I talked about the technical details of hardware tessellation and what it took to make it truly useful. In this article I will talk about some of the implications of this feature and the more advanced ramifications of baking tessellation into Turbo Game Engine as a first-class feature.
    Although hardware tessellation has been around for a few years, we don't see it used in games that often. There are two big problems that need to be overcome.
    1. We need a way to prevent cracks from appearing along edges.
    2. We need to display a consistent density of triangles on the screen. Too many polygons is a big problem.

    I think these issues are the reason you don't really see much use of tessellation in games, even today. However, I think my research this week has created new technology that will allow us to make use of tessellation as an every-day feature in our new Vulkan renderer.
    Per-Vertex Displacement Scale
    Because tessellation displaces vertices, any discrepancy in the distance or direction of the displacement, or any difference in the way neighboring polygons are subdivided, will result in cracks appearing in the mesh.

    To prevent unwanted cracks in mesh geometry I added a per-vertex displacement scale value. I packed this value into the w component of the vertex position, which was not being used. When the displacement strength is set to zero along the edges the cracks disappear:

    Segmented Primitives
    With the ability to control displacement on a per-vertex level, I set about implementing more advanced model primitives. The basic idea is to split up faces so that the edge vertices can have their displacement scale set to zero to eliminate cracks. I started with a segmented plane. This is a patch of triangles with a user-defined size and resolution. The outer-most vertices have a displacement value of 0 and the inner vertices have a displacement of 1. When tessellation is applied to the plane the effect fades out as it reaches the edges of the primitive:

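    To make the idea concrete, here is a minimal sketch of how such a patch might be generated, with the displacement scale packed into the w component of the position as described earlier. The vertex structure and helper function are assumptions for illustration, not the engine's actual API:

    //Sketch: build a segmented plane where the outer ring of vertices gets
    //displacement scale 0 and interior vertices get 1, stored in position w.
    #include <vector>

    struct Vertex { float x, y, z, w; };

    std::vector<Vertex> CreateSegmentedPlaneVertices(const float width, const float depth, const int xsegs, const int zsegs)
    {
        std::vector<Vertex> verts;
        for (int iz = 0; iz <= zsegs; ++iz)
        {
            for (int ix = 0; ix <= xsegs; ++ix)
            {
                Vertex v;
                v.x = (float(ix) / xsegs - 0.5f) * width;
                v.y = 0.0f;
                v.z = (float(iz) / zsegs - 0.5f) * depth;
                bool edge = (ix == 0 or iz == 0 or ix == xsegs or iz == zsegs);
                v.w = edge ? 0.0f : 1.0f; //fade out displacement at the edges
                verts.push_back(v);
            }
        }
        return verts;
    }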
    I then used this formula to create a more advanced box primitive. Along the seam where the edges of each face meet, the displacement smoothly fades out to prevent cracks from appearing.

    The same idea was applied to make segmented cylinders and cones, with displacement disabled along the seams.


    Finally, a new QuadSphere primitive was created using the box formula, and then normalizing each vertex position. This warps the vertices into a round shape, creating a sphere without the texture warping that spherical mapping creates.

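    Conceptually the warp is just a normalization of each box vertex. A minimal sketch, reusing the hypothetical Vertex struct from the earlier example:

    //Sketch: warp a box into a QuadSphere by normalizing each vertex position
    //and scaling by the radius. The w displacement scale set up for the box
    //seams is left untouched.
    #include <cmath>

    void WarpBoxToSphere(std::vector<Vertex>& verts, const float radius)
    {
        for (auto& v : verts)
        {
            float len = sqrtf(v.x * v.x + v.y * v.y + v.z * v.z);
            if (len == 0.0f) continue;
            v.x = v.x / len * radius;
            v.y = v.y / len * radius;
            v.z = v.z / len * radius;
        }
    }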
    It's amazing what tessellation and displacement can do for these simple shapes. Here is the full list of available commands:
    shared_ptr<Model> CreateBox(shared_ptr<World> world, const float width = 1.0);
    shared_ptr<Model> CreateBox(shared_ptr<World> world, const float width, const float height, const float depth, const int xsegs = 1, const int ysegs = 1);
    shared_ptr<Model> CreateSphere(shared_ptr<World> world, const float radius = 0.5, const int segments = 16);
    shared_ptr<Model> CreateCone(shared_ptr<World> world, const float radius = 0.5, const float height = 1.0, const int segments = 16, const int heightsegs = 1, const int capsegs = 1);
    shared_ptr<Model> CreateCylinder(shared_ptr<World> world, const float radius = 0.5, const float height = 1.0, const int sides = 16, const int heightsegs = 1, const int capsegs = 1);
    shared_ptr<Model> CreatePlane(shared_ptr<World> world, const float width = 1, const float height = 1, const int xsegs = 1, const int ysegs = 1);
    shared_ptr<Model> CreateQuadSphere(shared_ptr<World> world, const float radius = 0.5, const int segments = 8);

    Edge Normals
    I experimented a bit with edges and got some interesting results. If you round the corner by setting the vertex normal to point diagonally, a rounded edge appears.

    If you extend the displacement scale beyond 1.0 you can get a harder extended edge.

    This is something I will experiment with more. I think CSG brush smooth groups could be used to make some really nice level geometry.
    Screen-space Tessellation LOD
    I created an LOD calculation formula that attempts to segment polygons into a target size in screen space. This provides a more uniform distribution of tessellated polygons, regardless of the original geometry. Below are two cylinders created with different segmentation settings, with tessellation disabled:

    And now here are the same meshes with tessellation applied. Although the less-segmented cylinder has more stretched triangles, they both are made up of triangles about the same size.

    Because the calculation works with screen-space coordinates, objects will automatically adjust resolution with distance. Here are two identical cylinders at different distances.

    You can see they have roughly the same distribution of polygons, which is what we want. The same amount of detail will be used to show off displaced edges at any distance.

    We can even set a threshold for the minimum vertex displacement in screen space and use that to eliminate tessellation inside an object and only display extra triangles along the edges.

    This allows you to simply set a target polygon size in screen space without adjusting any per-mesh properties. This method could have prevented the problems Crysis 2 had with polygon density. This also solves the problem that prevented me from using tessellation for terrain. The per-mesh tessellation settings I worked on a couple of days ago will be removed, since they are no longer needed.
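    For illustration, here is a minimal sketch of the kind of screen-space calculation involved. The function and parameter names are assumptions, not the engine's actual code:

    //Sketch: choose a tessellation factor for an edge so the resulting
    //triangles approach a target size in screen space. The edge endpoints
    //are assumed to already be projected into pixel coordinates.
    #include <algorithm>
    #include <cmath>

    float EdgeTessFactor(const float ax, const float ay, const float bx, const float by, const float targetpixels)
    {
        float dx = bx - ax;
        float dy = by - ay;
        float pixels = sqrtf(dx * dx + dy * dy); //edge length on screen, in pixels
        return std::max(1.0f, pixels / targetpixels); //subdivisions for this edge
    }

    Because the edge length is measured after projection, distant objects naturally receive lower tessellation factors, which is exactly the distance falloff shown in the screenshots above.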
    Parallax Mapping Fallback
    Finally, I added a simple parallax mapping fallback that gets used when tessellation is disabled. This makes an inexpensive option for low-end machines that still conveys displacement.

    Next I am going to try processing some models that were not designed for tessellation and see if I can use tessellation to add geometric detail to low-poly models without any cracks or artifacts.
  4. Josh
    Heat haze is a difficult problem. A particle emitter is created with a transparent material, and each particle warps the background a bit. The combined effect of lots of particles gives the whole background a nice shimmering wavy appearance. The problem is that when two particles overlap one another they don't blend together, because the last particle drawn is using the background of the solid world for the refracted image. This can result in a "popping" effect when particles disappear, as well as apparent seams on the edges of polygons.

    In order to do transparency with refraction the right way, we are going to render all our transparent objects into a separate color texture and then draw that texture on top of the solid scene. We do this in order to accommodate multiple layers of transparency and refraction. Now, the correct way to handle multiple layers would be to render the solid world, render the first transparency object, then switch to another framebuffer and use the previous framebuffer color attachment for the source of your refraction image. This could be done per-object, although it could get very expensive, flipping back and forth between two framebuffers, but that still wouldn't be enough.
    If we render all the transparent surfaces into a single image, we can blend their normals, refractive index, and other properties, and come up with a single refraction vector that combines the underlying surfaces in the best way possible.
    To do this, the transparent surface color is rendered into the first color attachment. Unlike deferred lighting, the pixels at this point are fully lit.

    The screen normals are stored in an additional color attachment. I am using world normals in this shot, but further below I switch to screen normals:

    These images are drawn on top of the solid scene to render all transparent objects at once. Here we see the green box in the foreground is appearing in the refraction behind the glass dragon.

    To prevent this from happening, we need to add another color texture to the framebuffer and render the pixel Z position into it. I am using the R32_SFLOAT format. I use the separate blend mode feature in Vulkan, and set the blend mode to minimum so that the smallest value always gets saved in the texture. The Z position is divided by the camera far range in the fragment shader, so that the saved values are always between 0 and 1. The clear color for this attachment is set to 1,1,1,1, so any value written into the buffer will replace the background. Note this is the depth of the transparent pixels, not the whole scene, so the area in the center where the dragon is occluded by the box is pure white, since those pixels were not drawn.

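    For reference, here is a minimal sketch of what that per-attachment blend state might look like in Vulkan. The exact pipeline setup in the engine may differ, and per-attachment blend states require the independentBlend device feature:

    //Sketch: blend state for the Z-position attachment, using VK_BLEND_OP_MIN
    //so the nearest (smallest) transparent depth value always wins.
    VkPipelineColorBlendAttachmentState zblend = {};
    zblend.blendEnable = VK_TRUE;
    zblend.srcColorBlendFactor = VK_BLEND_FACTOR_ONE;
    zblend.dstColorBlendFactor = VK_BLEND_FACTOR_ONE;
    zblend.colorBlendOp = VK_BLEND_OP_MIN;
    zblend.srcAlphaBlendFactor = VK_BLEND_FACTOR_ONE;
    zblend.dstAlphaBlendFactor = VK_BLEND_FACTOR_ONE;
    zblend.alphaBlendOp = VK_BLEND_OP_MIN;
    zblend.colorWriteMask = VK_COLOR_COMPONENT_R_BIT; //only the red channel stores depth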
    In the transparency pass, the Z position of the transparent pixel is compared to the Z position at the refracted texcoords. If the refracted position is closer to the camera than the transparent surface, the refraction is disabled for that pixel and the background directly behind the pixel is shown instead. There is some very slight red visible in the refraction, but no green.

    Now let's see how well this handles heat haze / distortion. We want to prevent the problem when two particles overlap. Here is what a particle emitter looks like when rendered to the transparency framebuffer, this time using screen-space normals. The particles aren't rotating so there are visible repetitions in the pattern, but that's okay for now.

    And finally here is the result of the full render. As you can see, the seams and popping are gone, and we have a heavy but smooth distortion effect. Particles can safely overlap without causing any artifacts, as their normals are just blended together and combined to create a single refraction angle.

  5. Josh

    Articles
    Midjourney is an AI art generator you can interact with on Discord to make content for your game engine. To use it, first join the Discord channel and enter one of the "newbie" rooms. To generate a new image, just type "/imagine" followed by the keywords you want to use. The more descriptive you are, the better. After a few moments four different images will be shown. You can upsample or create new variations of any of the images the algorithm creates.

    And then the magic begins:

    Here are some of the images I "created" in a few minutes using the tool:

    I'm really surprised by the results. I didn't think it was possible for AI to demonstrate this level of spatial reasoning. You can clearly see that it has some kind of understanding of 3D perspective and lighting. Small errors like the misspelling of "Quake" as "Quke" only make it creepier, because it means the AI has a deep level of understanding and isn't just copying and pasting parts of images.
    What do you think about AI-generated artwork? Do you have any of your own images you would like to show off? Let me know in the comments below.
  6. Josh
    Not long ago, I wrote about my experiments with AI-generated textures for games. I think the general consensus was that the technology was interesting but not very useful in its form at the time. Recently, I had reason to look into the OpenAI development SDK, because I wanted to see if it was possible to automatically convert our C++ documentation into documentation for Lua. While looking at that, I started poking around with the image generation API, which is now using DALL-E 2. Step by step, I was able to implement AI texture generation in the new editor and game engine, using only a Lua script file and an external DLL. This is available in version 1.0.2 right now:

    Let's take a deep dive into how this works...
    Extension Script
    The extension is run by placing a file called "OpenAI.lua" in the "Scripts/Start/Extensions" folder. Everything in the Start folder gets run automatically when the editor starts up, in no particular order. At the top of the script we create a Lua table and load a DLL module that contains a few functions we need:
    local extension = {}
    extension.openai = require "openai"

    Next we declare a function that is used to process events. We can skip its inner workings for now:
    function extension.hook(event, extension)
    end

    We need a way for the user to activate our extension, so we will add a menu item to the "Scripting" submenu. The ListenEvent call will cause our hook function to get called whenever the user selects the menu item for this extension. Note that we are passing the extension table itself in the event listener's extra parameter. There is no need for us to use a global variable for the extension table, and it's better that we don't.
    local menu = program.menu:FindChild("Scripting", false)
    if menu ~= nil then
        local submenu = menu:FindChild("OpenAI", false)
        if submenu == nil then
            submenu = CreateMenu("", menu) --divider
            submenu = CreateMenu("OpenAI", menu)
        end
        extension.menuitem = CreateMenu("Text to Image", submenu)
    end
    ListenEvent(EVENT_WIDGETACTION, extension.menuitem, extension.hook, extension)

    This gives us a menu item we can use to bring up our extension's window.

    The next section creates a window, creates a user interface, and adds some widgets to it. I won't paste the whole thing here, but you can look at the script to see the rest:
    extension.window = CreateWindow("Text to Image", 0, 0, winx, winy, program.window, WINDOW_HIDDEN | WINDOW_CENTER | WINDOW_TITLEBAR)

    Note the window is using the WINDOW_HIDDEN style flag so it is not visible when the program starts. We're also going to add event listeners to detect when the window is closed, and when a button is pressed:
    ListenEvent(EVENT_WINDOWCLOSE, extension.window, extension.hook, extension)
    ListenEvent(EVENT_WIDGETACTION, extension.button, extension.hook, extension)

    The resulting tool window will look something like this:

    Now let's take a look at that hook function. We made three calls to ListenEvent, so that means we have three things the function needs to evaluate. Selecting the menu item for this extension will cause our hidden window to become visible and be activated:
    elseif event.id == EVENT_WIDGETACTION then
        if event.source == extension.menuitem then
            extension.window:SetHidden(false)
            extension.window:Activate()

    When the user clicks the close button on the tool window, the window gets hidden and the main program window is activated:
    if event.id == EVENT_WINDOWCLOSE then
        if event.source == extension.window then
            extension.window:SetHidden(true)
            program.window:Activate()
        end

    Finally, we get to the real point of this extension, and write the code that should be executed when the Generate button is pressed. First we get the API key from the text field, passing it to the Lua module DLL by calling openai.setapikey.
    elseif event.source == extension.button then
        local apikey = extension.apikeyfield:GetText()
        if apikey == "" then
            Notify("API key is missing", "Error", true)
            return false
        end
        extension.openai.setapikey(apikey)

    Next we get the user's description of the image they want, and figure out what size it should be generated at. Smaller images generate faster and, if you are using a paid OpenAI plan, cost a little bit less, so they can be good for testing ideas. The maximum size for images is currently 1024x1024.
    local prompt = extension.promptfield:GetText()
    local sz = 512
    local i = extension.sizefield:GetSelectedItem()
    if i == 1 then
        sz = 256
    elseif i == 3 then
        sz = 1024
    end

    The next step is to copy the user's settings into the program settings so they will get saved when the program closes. Since the main program is using a C++ table for the settings, both Lua and the main program can easily share the same information:
    --Save settings
    if type(program.settings.extensions) ~= "userdata" then program.settings.extensions = {} end
    if type(program.settings.extensions.openai) ~= "userdata" then program.settings.extensions.openai = {} end
    program.settings.extensions.openai.apikey = apikey
    program.settings.extensions.openai.prompt = prompt
    program.settings.extensions.openai.size = {}
    program.settings.extensions.openai.size[1] = sz
    program.settings.extensions.openai.size[2] = sz

    Extensions should save their settings in a sub-table in the "extensions" table, to keep data separate from the main program and other extensions. When these settings are saved in the settings.json file, they will look like this. Although generated images must be square, I opted to save both width and height in the settings, for possible future compatibility.
    "extensions": { "openai": { "apikey": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "prompt": "tiling seamless texture warehouse painted concrete wall abandoned dirty", "size": [ 1024, 1024 ] } }, Finally, we call the module function to generate the image, which may take a couple of minutes. If its successful we load the resulting image as a pixmap, create a texture from it, and then open that texture in a new asset editor window. This is done to eliminate the asset path, so the asset editor doesn't know where the file was loaded from. We also make a call to AssetEditor:Modify, which will cause the window to display a prompt if it is closed without saving. This prevents the user's project folder from filling up with a lot of garbage images they don't want to keep.
    if extension.openai.newimage(sz, sz, prompt, savepath) then
        local pixmap = LoadPixmap(savepath)
        if pixmap ~= nil then
            local tex = CreateTexture(TEXTURE_2D, pixmap.size.x, pixmap.size.y, pixmap.format, {pixmap})
            tex.name = "New texture"
            local asseteditor = program:OpenAsset(tex)
            if asseteditor ~= nil then
                asseteditor:Modify()
            else
                Print("Error: Failed to open texture in asset editor.")
            end
        end
    else
        Print("Error: Image generation failed.")
    end

    The resulting extension provides an interface we can use to generate a variety of interesting textures. I think you will agree, these are quite a lot better than what we had just a few months ago.





    Of course the Lua debugger in Visual Studio Code came in very handy while developing this.

    That's pretty much all there is to the Lua side of things. Now let's take a closer look at the module code.
    The Module
    Lua modules provide a mechanism whereby Lua can execute C++ code packed into a dynamic link library. The DLL needs to contain one exported function, named luaopen_ plus the name of the module file, without any extension. The module file is "openai.dll" so we will declare a function called luaopen_openai:
    extern "C" { __declspec(dllexport) int luaopen_openai(lua_State* L) { lua_newtable(L); int sz = lua_gettop(L); lua_pushcfunction(L, openai_newimage); lua_setfield(L, -2, "newimage"); lua_pushcfunction(L, openai_setapikey); lua_setfield(L, -2, "setapikey"); lua_pushcfunction(L, openai_getlog); lua_setfield(L, -2, "getlog"); lua_settop(L, sz); return 1; } } This function creates a new table and adds some function pointers to it, and returns the table. (This is the table we will store in extension.openai). The functions are setapikey(), getlog() and newimage().
    The first function is very simple, and just provides a way for the script to send the user's API key to the module:
    int openai_setapikey(lua_State* L)
    {
        APIKEY.clear();
        if (lua_isstring(L, -1)) APIKEY = lua_tostring(L, -1);
        return 0;
    }

    The getlog function just returns any printed text, for extra debugging:
    int openai_getlog(lua_State* L)
    {
        lua_pushstring(L, logtext.c_str());
        logtext.clear();
        return 1;
    }

    The newimage function is where the action is at, but there are actually two overloads of it. The first one is the "real" function, and the second one is a wrapper that extracts the right function arguments from Lua and then calls the real function. I'd say the hardest part of all this is interfacing with the Lua stack, but if you just go carefully you can follow the right pattern.
    bool openai_newimage(const int width, const int height, const std::string& prompt, const std::string& path)
    int openai_newimage(lua_State* L)

    This is done so the module can be easily compiled and tested as an executable.
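    Here is a minimal sketch of what the Lua-facing wrapper could look like, assuming the argument order (width, height, prompt, path) used on the Lua side. This is an illustration, not the module's actual code:

    //Sketch: extract the arguments from the Lua stack, call the real
    //function, and push the result back for the Lua caller.
    int openai_newimage(lua_State* L)
    {
        int width = (int)luaL_checkinteger(L, 1);
        int height = (int)luaL_checkinteger(L, 2);
        std::string prompt = luaL_checkstring(L, 3);
        std::string path = luaL_checkstring(L, 4);
        bool success = openai_newimage(width, height, prompt, path);
        lua_pushboolean(L, success);
        return 1; //one return value left on the Lua stack
    }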
    The real newimage function is where all the action is. It sets up a curl instance and communicates with a web server. There's quite a lot of error checking in the response, so don't let that confuse you. If the call is successful, then a second curl object is created in order to download the resulting image. This must be done before the curl connection is closed, as the server will not allow access after that happens:
    bool openai_newimage(const int width, const int height, const std::string& prompt, const std::string& path)
    {
        bool success = false;
        if (width != height or (width != 256 and width != 512 and width != 1024))
        {
            Print("Error: Image dimensions must be 256x256, 512x512, or 1024x1024.");
            return false;
        }
        std::string imageurl;
        if (APIKEY.empty()) return false;
        std::string url = "https://api.openai.com/v1/images/generations";
        std::string readBuffer;
        std::string bearerTokenHeader = "Authorization: Bearer " + APIKEY;
        std::string contentType = "Content-Type: application/json";
        auto curl = curl_easy_init();
        struct curl_slist* headers = NULL;
        headers = curl_slist_append(headers, bearerTokenHeader.c_str());
        headers = curl_slist_append(headers, contentType.c_str());
        nlohmann::json j3;
        j3["prompt"] = prompt;
        j3["n"] = 1;
        switch (width)
        {
        case 256:
            j3["size"] = "256x256";
            break;
        case 512:
            j3["size"] = "512x512";
            break;
        case 1024:
            j3["size"] = "1024x1024";
            break;
        }
        std::string postfields = j3.dump();
        curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
        curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
        curl_easy_setopt(curl, CURLOPT_POSTFIELDS, postfields.c_str());
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, WriteCallback);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, &readBuffer);
        auto errcode = curl_easy_perform(curl);
        if (errcode == CURLE_OK)
        {
            //OutputDebugStringA(readBuffer.c_str());
            trim(readBuffer);
            if (readBuffer.size() > 1 and readBuffer[0] == '{' and readBuffer[readBuffer.size() - 1] == '}')
            {
                j3 = nlohmann::json::parse(readBuffer);
                if (j3.is_object())
                {
                    if (j3["error"].is_object())
                    {
                        if (j3["error"]["message"].is_string())
                        {
                            std::string msg = j3["error"]["message"];
                            msg = "Error: " + msg;
                            Print(msg.c_str());
                        }
                        else
                        {
                            Print("Error: Unknown error.");
                        }
                    }
                    else
                    {
                        if (j3["data"].is_array() and j3["data"].size() > 0)
                        {
                            if (j3["data"][0]["url"].is_string())
                            {
                                std::string s = j3["data"][0]["url"];
                                imageurl = s; // I don't know why the extra string is needed here...
                                readBuffer.clear();

                                // Download the image file
                                auto curl2 = curl_easy_init();
                                curl_easy_setopt(curl2, CURLOPT_URL, imageurl.c_str());
                                curl_easy_setopt(curl2, CURLOPT_WRITEFUNCTION, WriteCallback);
                                curl_easy_setopt(curl2, CURLOPT_WRITEDATA, &readBuffer);
                                auto errcode2 = curl_easy_perform(curl2);
                                if (errcode2 == CURLE_OK)
                                {
                                    FILE* file = fopen(path.c_str(), "wb");
                                    if (file == NULL)
                                    {
                                        Print("Error: Failed to write file.");
                                    }
                                    else
                                    {
                                        auto w = fwrite(readBuffer.c_str(), 1, readBuffer.size(), file);
                                        if (w == readBuffer.size())
                                        {
                                            success = true;
                                        }
                                        else
                                        {
                                            Print("Error: Failed to write file data.");
                                        }
                                        fclose(file);
                                    }
                                }
                                else
                                {
                                    Print("Error: Failed to download image.");
                                }
                                curl_easy_cleanup(curl2);
                            }
                            else
                            {
                                Print("Error: Image URL missing.");
                            }
                        }
                        else
                        {
                            Print("Error: Data is not an array, or data is empty.");
                        }
                    }
                }
                else
                {
                    Print("Error: Response is not a valid JSON object.");
                    Print(readBuffer);
                }
            }
            else
            {
                Print("Error: Response is not a valid JSON object.");
                Print(readBuffer);
            }
        }
        else
        {
            Print("Error: Request failed.");
        }
        curl_easy_cleanup(curl);
        return success;
    }

    My first attempt at this used a third-party C++ library for OpenAI, but I actually found it was easier to just make the low-level CURL calls myself.
    Now here's the kicker: almost the exact same code will work with any web API in the world. You now know how to build extensions that communicate with any website that provides a web API, including SketchFab, CGTrader, and itch.io, and interact with them from the Ultra Engine editor. The full source code for this and other Ultra Engine modules is available here:
    https://github.com/UltraEngine/Lua
    This integration of text-to-image tends to do well with base textures that have a uniform distribution of detail. Rock, concrete, ground, plaster, and other materials look great with the DALL-E 2 generator. It doesn't do as well with geometry details and complex structures, since the AI has no real understanding of the purpose of things. The point of this integration was not to make the end-all-be-all AI texture generation. The technology is changing rapidly and will undoubtedly continue to advance. Rather, the point of this exercise was to demonstrate a complex feature added in a Lua extension, without being built into the editor's code. By releasing the source for this I hope to help people start developing their own extensions to add new peripheral features they want to see in the editor and game engine.
     
  7. Josh
    The Sacramento Hacker Lab is a new facility of offices and shared workspace for tech startups and developers. They threw a party last night, and Chris and I went.
     
    I saw a 3D printer for the first time:

     
    And I got to hold a printed replica of Admiral Ackbar's head. This led to me randomly yelling out "IT'S A TRAP!" for the remainder of the evening:

     
    The major players in the Sacramento game industry plotting to take over the world:

     
    I think these 4x4's turn into robots, or something:

     
    More people in the front!

     
    Some offices are still WIP. Beer makes the work go faster:

     
    A scene from one of the more hipsterish offices:

     
    No drinking until 5. (All the digits are fives):

  8. Josh
    The Leadwerks.com server started acting up earlier today, and I had no idea why. Database errors were occurring, and I had not made any changes to the site. I called our server host in Chicago. First they restored the full site from a backup from 2 A.M. yesterday morning. The matter was still not solved, though. A couple hours later, they determined that an automatic CPanel upgrade had caused PHP extensions to be loaded multiple times. This was fixed, and now we are back to normal. If you've been around a while, you know I have a big fear of data loss, but it appears our procedures were able to solve the problem on our dedicated server. You get what you pay for. B)
  9. Josh
    Leadwerks 3 is compiling for Android. There's presently a problem with the file system that is preventing any 3D rendering, but we'll get that worked out shortly. We're targeting Android 2.2.
     
    In order to compile C++ for Android on Windows, we had to install the following:
    -Java SDK
    -Eclipse IDE
    -Android SDK
    -Android NDK
    -CygWin
    -GDB
     
    We also learned that OpenGL ES 2.0 does not run in the Android emulator. For the time being, we have to run the engine on an actual Android device. That was rather surprising, but I think Google will have this functionality added fairly soon. I also learned there is an x86 version of Android, but no one uses it.
     
    Debugging C++ on Android is done with GDB, a command-line debugger. You definitely don't want to use this to do any heavy work. In this case, the cross-platform nature of coding with Leadwerks comes in handy, and you can debug on Windows or OSX and then just compile your finished code for Android, without much testing.
     
    The plan is to allow one-step publishing for Android when you use Lua. You write your program in script, test it on PC or Mac, then you can export a package for Android that's ready to install on your phone, without even having to install the Android SDK. You can also use C++, and it takes more work, but it's not too hard and we'll have instructions on how to get set up.
     
    Behold the mighty blue screen running on an HTC Evo, and tremble!:


  10. Josh
    This is just so cool. You can generate simple primitives for any entity. A convenient "Fit Shape" button will wrap the primitive around the entity, with an option to include the entity's children. However, the real power comes when you generate physics shapes from models. You can right-click on any model and get a menu to generate a physics shape. The polygon mesh and convex hull will look familiar if you've used the PhyGen tool in Leadwerks Engine 2. There's also a new option called "Convex Decomposition".

     
    Convex decomposition is an advanced algorithm that takes a polygonal mesh and outputs a series of convex shapes. This is helpful because real physics calculations require geometry with actual volume instead of just being a polygon soup. Convex decomposition can turn any polygonal mesh into a physics shape, such as the Stanford bunny shown below:

     
    This means you can generate accurate physics shapes for any model, without creating special physics models in 3ds Max or Blender.
     
    Take complex objects like this Utah teapot and generate a physics shape for it, then attach that shape to the model and save it as a prefab. Here you can see the physics shape that was calculated for the teapot using convex decomposition. Navigation is turned on to make the teapot into a dynamic obstacle AI will avoid going through:

     
    It's taken a long time to build a game development platform that accelerates development without restricting you, but I think the results are worth it. I hope you are looking forward to using these tools as much as I am.
  11. Josh
    After resolving a few odds and ends I was able to create a proof of concept of the deferred environment probe idea I talked about earlier. The environment probe is basically the same as a point light. They have a range and affect a finite area. The color property can be used to adjust the intensity of the ambient lighting / reflections. Basically, you want to cover your indoor environments with these, but it's okay if you miss some spots. The environment probes fade out gradually so you don't have to be too strict about coverage. Also, if two probes overlap, the brightest value will be used, but they will not add their lighting together.
     

     
    There are still some issues to figure out, like how to control the miplevel that is used in the cubemap lookup and how to automate cubemap rendering in the editor. I'm not going to try to make the cubemaps update in real-time in the game, because it would be too slow.
     
    The screenshots below show the same scene with flat ambient lighting:

     
    And with an environment probe placed in the scene:

  12. Josh
    Previously I talked about the idea of implementing a new "Ambient Point Light" into the engine. This would work by rendering the surrounding environment to a cubemap, just like a regular point light does. However, this type of light would render to a color buffer instead of just a depth buffer.
     
    The light could then be rendered to display ambient reflections and soft lighting on the surrounding environment. While it would not provide perfect real-time reflections, it would give an extra boost to the appearance of scenes that make heavy use of reflective surfaces.
     
    One of the problems in this type of system is how to handle overlapping lights. It would look weird to have an area where two lights are combining to make the ambient light or reflections brighter.
     
    I found I could create a new "lighten" blend mode with the following code:

    glBlendFunc(GL_ONE, GL_ONE);
    glBlendEquation(GL_MAX);
     
    This ensures that a light will only brighten a pixel up to its own value, and never beyond. If the pixel is already brighter than the light color it will have no effect. Below you can see two spotlights using this new blend mode. Notice that the area where the two lights both illuminate is never any brighter than either one.
     

     
    This also means soft ambient lighting will only appear in dark areas, and will have no effect on brightly lit surfaces, as it should.
  13. Josh
    The window class on OSX has been a bit neglected, so I spent some time getting it ready for the final release. Creating a full-screen window on Mac is a little funny. There's a fullscreen mode that's been around a while, but it locks the system so that CMD+TAB doesn't work. That means if your application crashes, or you don't program it to close with the escape key, you'll be stuck in fullscreen mode, with no way out but a computer restart!
     
    You can create a window on top of everything else, and CMD+TAB will still work, but the borderless style you use for this makes it so no keyboard input is recognized! If you press escape, the system just beeps and the window doesn't catch an event. :o
     
    There's also a new full-screen mode for apps in OSX Lion, but it's for GUI apps, and I wanted this to work on Snow Leopard. I knew it was possible because Left 4 Dead 2 on Mac does this. Using the Google, I was finally able to find the information I needed. I had to create a new subclassed NSWindow class to get keyboard input working.
     
    Another problem I had was that when a window was resized, the NSOpenGLContext wasn't sizing with it. Many, many search queries later I found what I needed, and added a couple of methods to my OpenGL view class.
     
    Objective-C may be the greatest thing in the world for writing GUI applications for Mac, but I found it very complex for simply creating a window. It takes about 700 lines of code to create a window that has OpenGL rendering and keyboard and mouse input. Out of all that code, you guys end up with one or two functions that do exactly what you need to create a window and get events. B)

     
    Let's talk events. Event queues are the most compatible with all languages, but a callback is really better. When a window is being resized, the application loses control and doesn't get it back until the user lets up the mouse. That means an event queue system doesn't allow the window to be re-rendered as it is sized, and you get ugly artifacts as it is sizing. Therefore, I think we need a hook system where you can add your own hooks like this:

    System::AddHook(int hookid, char* funcptr, Object* extra)
    System::AddHook(HOOK_EVENT, myfunction, NULL)
    I think we're also going to set up the App class so that it has a ProcessEvent method, which automatically gets added as a hook:

    App::ProcessEvent(int eventid, int data, iVec2 position)
    {
        switch (eventid)
        {
        case EVENT_WINDOWSIZE:
            world->Render();
            context->Sync();
        }
    }
    Your thoughts?
  14. Josh
    You may have noticed our deferred decals (presently in beta) tend to shift with distance. Something was wrong with the screen space to world space conversion, but it's a hard problem to describe. I wrote a function based on the work Igor did for his SSLR shader, like this:

    vec4 ScreenPositionToWorldPosition(in vec2 texCoord)
    {
        float x = (texCoord.s / buffersize.x - 0.5) * 2.0;
        float y = (texCoord.t / buffersize.y - 0.5) * 2.0;
        float z = texelFetch(texture5, ivec2(texCoord), gl_SampleID).r;
        vec4 posProj = vec4(x, y, z, 1.0);
        vec4 posView = inverse(projectioncameramatrix) * posProj;
        posView /= posView.w;
        posView += cameraposition;
        return posView;
    }
     
    OpenGL stores depth values in a non-linear manner, which yields more precision closer to the camera. This allows a 24-bit depth buffer to cover a much greater distance than a linear depth buffer would allow with acceptable visual fidelity. The problem with the function above is that the depth value, which is stored in the 0.0-1.0 range, isn't being converted back into the -1.0 to +1.0 range of normalized device coordinates before being multiplied by the inverse camera projection matrix.
     
    I found a PDF with a formula I had come across before and started experimenting with it. The trick is this:

    lineardepth = exponentialdepth / 0.5 - 1.0
     
    This equation can change if you have set glDepthRange to anything other than the defaults. (Coincidentally, calling glDepthRange() was what messed up my deferred renderer for iPad a couple years ago and forced me to use an extra floating-point buffer for storing screen depth at my GDC talk.)
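    For reference (this is my own derivation based on how glDepthRange remaps depth, not something from the PDF), if glDepthRange(n, f) has been called with custom values, the general form of the conversion is:

    ndcdepth = (2.0 * depth - n - f) / (f - n)

    With the default values n=0 and f=1 this reduces to the equation above.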
     
    Plugging this into my function and eliminating the camera position add results in the following code:

    vec4 ScreenPositionToWorldPosition(in vec2 texCoord)
    {
        float x = (texCoord.s / buffersize.x - 0.5) * 2.0;
        float y = (texCoord.t / buffersize.y - 0.5) * 2.0;
        float z = texelFetch(texture5, ivec2(texCoord), gl_SampleID).r;
        z = z / 0.5 - 1.0;
        vec4 posProj = vec4(x, y, z, 1.0);
        vec4 posView = inverse(projectioncameramatrix) * posProj;
        posView /= posView.w;
        return posView;
    }
     
    This also eliminated a couple of strange divide by twos I had in the texture mapping, which didn't really make sense to me at the time. Below you can see a decal rendered on a surface from a far distance, with no inaccuracies in the texture alignment.
     

     
    In reality, this was probably about five hours of work with "River Monsters" playing in the background.
  15. Josh

    Articles
    Google Draco is a library that aims to do for mesh data what MP3 and OGG did for music. It does not reduce memory usage once a mesh is loaded, but it could reduce file sizes and improve download times. Although mesh data does not tend to use much disk space, I am always interested in optimization. Furthermore, some of the NASA models I work with are very high-poly, and do take up significant disk space. Google offers a very compelling chart showing a compression ratio of about 95%:

    However, there is not much information given about the original mesh. Is it an ASCII .obj file? Of course that would be much bigger than binary data. I wanted to get a clear look at what kind of compression ratios I could expect, within the context of glTF files. I found a fairly high-poly model on SketchFab here to work with.

    This model has 2.1 million triangles and 1 million vertices. That should be plenty to test with.
    Now, glTF is actually three different file formats. Normal glTF files store JSON data and come with an extra .bin file for binary data. This stores things like vertex positions and animation data, stuff you probably won't want to edit by hand. The .glb version of the format combines JSON and binary data into a single file, which can be viewed but not edited in a text editing program. Finally, there is also base64 glTF, which stores JSON together with binary data with base64 encoding in a single file. The base64 data looks like gibberish, but the file can be opened in a text editor, modified, and resaved without destroying the binary data.
    I was very curious to see what advantage Google Draco mesh compression would offer. Would it make glTF files significantly smaller, so that your games take up less space and have faster download times?
    To answer this question, I imported the model into Blender and exported several versions. I only exported position, normal, and texture coordinates. I also loaded the uncompressed .glb file in Ultra Engine and resaved it with simple mesh quantization.

    As you can see, mesh quantization (using one byte for each normal component, plus one byte for padding, and two bytes for each texture coordinate component) combined with regular old ZIP compression comes in significantly smaller than Draco compression at the maximum compression level. It's not in the chart, but I also tried ZIP-compressing the smallest Draco file, and that was still bigger at 28.8 MB.
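    As a rough illustration of that quantization scheme (a sketch only; the struct layout and helper below are my own, not Ultra Engine's implementation), packing the normal and texture coordinate attributes might look like this:

    #include <cstdint>
    #include <cmath>

    //Hypothetical packed layout: 4 bytes for the normal (x, y, z plus one
    //byte of padding) and 4 bytes for the texture coordinates.
    struct PackedVertexAttribs
    {
        int8_t normal[4];    //signed normalized: -127..127 maps to -1..1
        uint16_t texcoord[2];//unsigned normalized: 0..65535 maps to 0..1
    };

    PackedVertexAttribs Quantize(const float normal[3], const float texcoord[2])
    {
        PackedVertexAttribs p = {};
        for (int i = 0; i < 3; ++i)
        {
            p.normal[i] = (int8_t)std::lround(normal[i] * 127.0f);
        }
        p.normal[3] = 0;//padding byte keeps the attribute 4-byte aligned
        for (int i = 0; i < 2; ++i)
        {
            p.texcoord[i] = (uint16_t)std::lround(texcoord[i] * 65535.0f);
        }
        return p;
    }

    That works out to eight bytes for these two attributes instead of the twenty bytes that three normal floats and two texture coordinate floats would occupy, before ZIP compression is even applied (assuming texture coordinates stay in the 0-1 range).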
    You can look at the models yourself here:
    dracotest.zip
    Based on this test, it appears that Google Draco is only marginally smaller than an uncompressed quantized mesh, and still slightly bigger when ZIP compression is applied to both. Unless someone can show me otherwise, it does not appear that Google Draco mesh compression offers the 95% reduction in file sizes they seem to promise.
    Correction:
    This model was made up of several sub-objects. I collapsed the model and resaved it, and Draco now produces compression more like I was expecting to see:
    collapsed-draco.zip
    Presumably this means whatever data structure they use takes up a certain amount of space (probably an n-dimensional tree), and having fewer of these structures is more optimal.
    Here is the corrected comparison chart. This is good. Draco shrank this model to 7% of the size of the uncompressed .glb export:

    This will be very useful for 3D scans and CAD models, as long as they don't contain a lot of articulated subobjects. Original model is on the left, Draco compressed model is on the right:

  16. Josh
    I'm hoping to add analytics to both the editor and the engine API. I need to know how many people are actively using the engine on a week to week basis, and how many of those users are opening the Workshop browser in the editor. There will be an option to opt out of stats collection (a real one, not like Windows 10), and even if you do allow stats to be collected no data will be traceable back to your SteamID or IP address.
     
    I've chosen the GameAnalytics.com API for this.
     
    This will also be extremely useful to integrate into your games for testing. You can set up an event to send to your GameAnalytics account when a player completes an achievement. We could even add a script that does this that the flowgraph can trigger. If you look at your account and see that although 100 people played your game, only 3 of them were able to find the blue key, that tells you your game has a problem! I think this kind of data will be incredibly useful for you. It basically ensures that you're always running a focus group, without having to sit people down in real life and watch them play. Instead of just getting one person's random opinion, you can objectively see how changes to your game affect playability for all users.
     
    This is still in development, but I believe the process will be:
     
    1. Create a GameAnalytics.com account.
    2. Set your API key in the main script.
    3. Start sending events for anything important you want information on such as starting the game, completing a level, or achieving a task.
     

    --Public values
    Script.eventname = "MyEvent"--string "Event Name"

    function Script:SendEvent()--in
        if type(self.eventname) == "string" and self.eventname ~= "" then
            Analytics:SendEvent(self.eventname)
        end
    end
     
    For standalone games, you should give your users an option to disable stats collection. The game launcher will have a global option built in that covers all games.
     
    Any thoughts on this?
  17. Josh
    The model editor animation bug was the second-worst bug to hit Leadwerks Game Engine in all its history. Reported multiple times, this would cause animated models to discard triangles only in the model editor, only on Linux.
    http://www.leadwerks.com/werkspace/topic/10856-model-editor-freaks-out/
    http://www.leadwerks.com/werkspace/topic/12678-model-animation-vs-flashing-bodyparts/

    Since our animation commands have worked solidly for years, I was at my wits' end trying to figure this out. I strongly suspected a driver bug having to do with sharing uniform buffers across multiple contexts, but the fact that it happened on both AMD and Nvidia cards did not support that theory, or else indicated the problem was more low-level within the Linux distro. An engineer from Nvidia wasn't able to find the cause. Had my suspicion been correct, it would not have been the first driver bug I have found and had confirmed by the Nvidia, AMD, and Intel driver teams.
    To make things even more difficult, the error only occurred in the release build. Debug builds could not be debugged because no error would occur!
    It never even occurred to me that the actual bone matrix data could be inputted wrong until Leadwerks user Roland reported that the bug was occurring in his game. This was the first time anyone had reported the error was occurring anywhere but the model editor.
    I finally determined that the actual bone matrices being sent to the animation shader contained many values of "-nan", meaning the negative form of "not a number". I was shocked. How could this possibly be when our animation commands have been completely reliable for years?
    I started printing values out and finally traced the problem back to the Quaternion spherical linear interpolation, or Slerp function. Slerp is a function that smoothly interpolates between two quaternion rotations without the problem of gimbal lock. This is the code for the function:
    void Quat::Slerp(const Quat& q, float a, Quat& result)
    {
        bool f = false;
        float b = 1.0f - a;
        float d = x*q.x + y*q.y + z*q.z + w*q.w;
        if (d < 0.0f)
        {
            d = -d;
            f = true;
        }
        if (d < 1.0f)
        {
            float om = Math::ACos(d);
            float si = Math::Sin(om);
            a = Math::Sin(a*om) / si;
            b = Math::Sin(b*om) / si;
        }
        if (f == true) a *= -1.0f;
        result.x = x*b + q.x*a;
        result.y = y*b + q.y*a;
        result.z = z*b + q.z*a;
        result.w = w*b + q.w*a;
    }
    In the function above, "a" is an interpolation value. I found that when a was equal to 0.0 the function would sometimes return the -nan values, but only when compiled in release mode with the GCC compiler! Adding a quick check at the beginning of the function fixed the problem:
    if (a == 0.0f)
    {
        result.x = x;
        result.y = y;
        result.z = z;
        result.w = w;
        return;
    }
    And with that, it appears this issue can finally be put to rest. I think the lesson I learned here is always go where the bug leads to, even if you are sure there isn't a problem there.
  18. Josh
    We have our design parameters. We know the editor should let the user control every aspect of the engine in a visual manner, without having to use any other tools or do any editing of text files. We have 3D World Studio to serve as inspiration for the design and feel of the program.
     
    The sky's the limit. You, the users, have told me that you want to invest in a system that does more for you and makes your life easier. I'm happy to provide the basis for your next projects. Thank you for letting me make this for you, because creating 3D tools is what I really love doing.
     
    Look at me, I see a CSG editor grid and I get all sentimental.
     

     
    I like having the scene tree and asset browser in tabs on the right. I tried a couple of variations with both visible at once, and I think it looks silly. The downside of using tabs is you can never drag anything from the asset browser to the scene tree directly. Therefore, it makes the most sense to me to have sounds, scripts, and anything else in the properties editor, so you can drag assets from the asset browser to that. This means scripts, sounds, etc. do not go in the scene tree, and it gets used only for entities and brushes. It seems like a good layout and will support all the drag-and-drop features we need.
  19. Josh
    I hired a local Android developer to get Leadwerks 3.0 running on Android devices. We don't know a lot yet, other than that we have an OpenGLES renderer, and everything should be cross-platform compilable. The Android version of LE3 is using a minimum requirement of Android 2.2, which is the lowest version that supports OpenGL ES 2.0. This will run on about 75% of Android devices:
     

     
    As you can see here, the proportion of 2.1 devices is steadily dropping. If a linear rate of decrease is maintained, they will be all but gone in six months:
     

     
    Interestingly, in the LE3 platform poll about 62% of respondents were more interested in Android than iOS support.
     
    I'll let you know when we have something more to show!
  20. Josh
    Today I am working out the file system for Android applications. In the past we just manually copied all our assets to a "/Leadwerks" folder on the device itself. This approach is okay for testing, but it won't work for distributing apps in the Google Play store.
     
    Android uses APK files to store all applications in. An APK file is really just a ZIP file, and all your assets go in a /res/raw directory in the ZIP package.
     
    To access the Android file system from C++, we had to take the following steps:
     
    1. Retrieve the file path to the application APK file (with Java).
    2. Pass this file path to C++.
    3. In C++, load the APK file as a standard zip package.
    4. Change the current directory to the APK folder path + "/res/raw".
     
    All this occurs before the App::Start() function is called, so as soon as you start, you're ready to access all the files your game needs.
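    A minimal sketch of steps 2 through 4 might look something like this (the JNI entry point name and the LoadZipPackage/ChangeDir helpers are hypothetical stand-ins, not the actual engine API):

    #include <jni.h>
    #include <string>

    //Hypothetical helpers standing in for the engine's package and file systems
    extern bool LoadZipPackage(const std::string& path);
    extern void ChangeDir(const std::string& path);

    //Called from Java with the APK path retrieved in step 1
    extern "C" JNIEXPORT void JNICALL
    Java_com_leadwerks_Engine_setAPKPath(JNIEnv* env, jclass cls, jstring jpath)
    {
        const char* path = env->GetStringUTFChars(jpath, NULL);

        //Step 3: an APK is just a ZIP file, so mount it as a package
        LoadZipPackage(path);

        //Step 4: make the asset directory inside the package current
        ChangeDir(std::string(path) + "/res/raw");

        env->ReleaseStringUTFChars(jpath, path);
    }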
     
    It's nice when things work out neatly like that.
  21. Josh
    I spent most of today getting the Android library more polished, especially when handling application switching. It's a little tricky because Android doesn't automatically manage your sound channels and OpenGL resources, so these need to be reloaded when an app regains focus. Here's a video, which I obviously had way too much fun with in iMovie:


  22. Josh
    I got touch input and text rendering working on Android. Happily, I have not encountered any issues with different behavior on any tested OpenGLES devices, including iOS. The framerate on my HTC Evo jumps around quite a bit, which appears to be an issue with the Java garbage collector. Framerate on iOS is a solid 60 FPS, so we should be able to get this sorted out.
     
    Here's a video:
    http://www.leadwerks.com/werkspace/page/videos/_/leadwerks-engine-3/android-progress-r100
     
    I also discovered that LuaJIT for ARM processors was recently released. This is fantastic because it means Lua script will run on Android and iOS at about the same speed as C# and Java. I knew this would eventually be supported, but I didn't know until yesterday it was released about a month ago.
     
    We've had a surprisingly strong positive response from developers over our support for mobile platforms, especially Android. This feedback has been coming from both the Leadwerks community, as well as other places like Google+. My estimation is there's probably ten times as many people interested in mobile game development as there are interested in PC development. Since Android especially is going to be an important platform for us to support, I've decided to implement an OpenGL 2.0 renderer, and make that a higher priority than the OpenGL 1 fallback I originally planned. Leadwerks Engine 2 used OpenGL 2.1, but this will be a much simpler renderer that just matches the functionality of the mobile renderer, so you can get the exact same pixel output across all supported platforms. Expect to see something about as capable as the Half-Life 2 engine.
     
    Of course, the OpenGL 3.2/4 renderer will still be available on Windows. At this time, Apple's OpenGL 3.2 drivers are not functional, but when they are it will be no problem to enable the OpenGL 3 renderer for Mac computers as well. (Tim Cook: Please see bug report #9896622.)
  23. Josh

    Articles
    One of my goals in Ultra Engine is to avoid "black box" file formats and leave all game assets in common file formats that can be easily read in a variety of programs. For this reason, I put a lot of effort into the Pixmap class and the DDS load and save capabilities.
    In Ultra Engine animated textures can be stored in a volume texture. To play the animation, the W component of the UVW texcoord is scrolled. The fragment shader will sample the volume texture between the nearest two slices on the Z axis of the texture, resulting in a smooth transition between frames using linear interpolation. There's no need to constantly swap a lot of textures in a material, as all animation frames are packed away in a single DDS file.
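    To make that concrete, here is a CPU-side sketch of the lookup the fragment shader performs, written against the Pixmap API for clarity (an illustration of the technique, not the actual shader code):

    #include <memory>
    #include <vector>

    //Sample the red channel of an animated volume texture at (x, y), with the
    //W coordinate scrolled by time (assumed >= 0). The two nearest slices are
    //blended linearly and wrap around so the animation loops seamlessly.
    float SampleScrolledW(std::vector<shared_ptr<Pixmap> >& frames, int x, int y, float time)
    {
        float w = time * (float)frames.size();
        int slice0 = (int)w % (int)frames.size();
        int slice1 = (slice0 + 1) % (int)frames.size();
        float blend = w - (float)(int)w;
        float r0 = Red(frames[slice0]->ReadPixel(x, y)) / 255.0f;
        float r1 = Red(frames[slice1]->ReadPixel(x, y)) / 255.0f;
        return r0 * (1.0f - blend) + r1 * blend;
    }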
    The code below shows how multiple animation frames can be loaded and saved into a 3D texture:
    int framecount = 128;
    std::vector<shared_ptr<Pixmap> > pixmaps(framecount);
    for (int n = 0; n < framecount; ++n)
    {
        pixmaps[n] = LoadPixmap("https://raw.githubusercontent.com/Leadwerks/Documentation/master/Assets/Materials/Animations/water1_" + String(n) + ".png");
    }
    SaveTexture("water1.dds", TEXTURE_3D, pixmaps, framecount);
    Here is the animation playing in the engine:

    My new video project1.mp4
    The resulting DDS file is 32 MB for a 256x256x128 RGBA texture:
    water1.zip
    You can open this DDS file in Visual Studio and view it. Note that the properties indicate this is the first slice of 128, verifying that our texture does contain the animation data:

    Adding Mipmaps
    The DDS format supports mipmaps in volume textures. A volume mipmap is just a lower-resolution image of the original, with all dimensions half the size of the previous frame, with a minimum dimension of 1. They are stored in the DDS file in descending order. The code below is a little complicated, but it will reliably compute mipmaps for any volume texture. Note the code is creating another STL vector called "mipchain" where all slices of all mipmaps are stored in order:
    auto plg = LoadPlugin("Plugins/FITextureLoader.dll");

    int framecount = 128;
    std::vector<shared_ptr<Pixmap> > pixmaps(framecount);
    for (int n = 0; n < framecount; ++n)
    {
        pixmaps[n] = LoadPixmap("https://raw.githubusercontent.com/Leadwerks/Documentation/master/Assets/Materials/Animations/water1_" + String(n) + ".png");
    }

    //Build mipmaps
    iVec3 size = iVec3(pixmaps[0]->size.x, pixmaps[0]->size.y, pixmaps.size());
    auto mipchain = pixmaps;
    while (true)
    {
        auto osize = size;
        size.x = Max(1, size.x / 2);
        size.y = Max(1, size.y / 2);
        size.z = Max(1, size.z / 2);
        for (int n = 0; n < size.z; ++n)
        {
            //Average each pair of adjacent slices into one
            auto a = pixmaps[n * 2 + 0];
            auto b = (n * 2 + 1 < osize.z) ? pixmaps[n * 2 + 1] : a;//handle a depth of one
            auto mipmap = CreatePixmap(osize.x, osize.y, pixmaps[0]->format);
            for (int x = 0; x < osize.x; ++x)
            {
                for (int y = 0; y < osize.y; ++y)
                {
                    int rgba0 = a->ReadPixel(x, y);
                    int rgba1 = b->ReadPixel(x, y);
                    int rgba = RGBA((Red(rgba0) + Red(rgba1)) / 2, (Green(rgba0) + Green(rgba1)) / 2, (Blue(rgba0) + Blue(rgba1)) / 2, (Alpha(rgba0) + Alpha(rgba1)) / 2);
                    mipmap->WritePixel(x, y, rgba);
                }
            }
            //Downsample the averaged slice to the new mipmap dimensions
            mipmap = mipmap->Resize(size.x, size.y);
            pixmaps[n] = mipmap;
            mipchain.push_back(mipmap);
        }
        if (size == iVec3(1, 1, 1)) break;
    }
    SaveTexture("water1.dds", TEXTURE_3D, mipchain, framecount);
    water1_mipmaps.zip
    We can open this DDS file in Visual Studio and verify that the mipmaps are present and look correct:

    Texture Compression
    Volume textures can be stored in compressed texture formats. This is particularly useful here, since volume textures are so big. Compressing all the mipmaps in a texture before saving can be easily done by replacing the last line of code in the previous example with the code below. We're going to use BC5 compression because this is a normal map.
    //Compress all images
    for (int n = 0; n < mipchain.size(); ++n)
    {
        mipchain[n] = mipchain[n]->Convert(TEXTURE_BC5);
    }
    SaveTexture("water1.dds", TEXTURE_3D, mipchain, framecount);
    The resulting DDS file is just 9.14 MB, about 25% the size of our uncompressed DDS file.
    water1_bc5.zip
    When we open this file in Visual Studio, we can verify the texture format is BC5 and the blue channel has been removed. (Only the red and green channels are required for normal maps, as the Z component can be reconstructed in the fragment shader.) Other types of textures may use a different compression format:

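    The reconstruction itself is just the unit-length constraint solved for Z. Here is the math as a small C++ function (a sketch of the technique; the real work happens in the fragment shader):

    #include <algorithm>
    #include <cmath>

    //Rebuild the Z component of a unit-length normal from X and Y, which are
    //assumed to already be unpacked into the -1..1 range.
    float ReconstructNormalZ(float x, float y)
    {
        return std::sqrt(std::max(0.0f, 1.0f - x * x - y * y));
    }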
    This method can be used to make animated water, fire, lava, explosions and other effects packed away into a single DDS file that can be easily read in a variety of programs.
  24. Josh
    The design of Leadwerks 4 was meant to be flexible and easy to use. In Leadwerks 5, our foremost design goals are speed and scalability. In practical terms that means that some options are going to go away in order to give you bigger games that run faster.
    I'm working out the new animation system. There are a few different ways to approach this. In situations like this I find it is best to start by deciding the desired outcome and then figuring out how to achieve that. So what do we want?
    1. Fast performance with as many animated characters as possible. Hopefully, tens of thousands.
    2. Feedback to the CPU on the orientations of bones for things like parenting a weapon to the character's hand, firing a shot, collision detection with limbs, etc.
    3. Low memory usage (so we can have lots and lots of characters).

    In Leadwerks 4, a bone is a type of entity. This is convenient because the same entity positioning commands work just fine with bones, it's easy to parent a weapon to a limb, and there is a lot of consistency. However, this comes at a cost of potential performance as well as memory consumption. A stripped-down Bone class without all the overhead of the entity system would be more efficient when we hit really large numbers of animated models.
    So here's what I am thinking: Bones are a simplified class that do not have all the features of the entity system. The Model class has a "skeleton" member, which is the top-most bone in a hierarchy of bones for that model. You can call animation commands on bones only, and you cannot parent an entity to a bone, since the bone is not an entity. Instead you can attach it by making a copy that is weighted 100% to the bone you specify, and it becomes part of the animated model:
    weapon->Attach(model->FindBone("r_hand")); If you have any hierarchy in that weapon model, like a pivot to indicate where the firing position is, it would be lost, so you will need to retrieve those values in your script and save them before attaching the weapon.
    This also means bones won't appear in the map as an editable entity, which I would argue is a good thing, since they clog up the hierarchy with thousands of extra entities.
    When you call an animation command, it will be sent to the animation thread the next time the game syncs in the World::Update() command. Animations are then performed on a copy of all the visible skeletons in the scene, and their 4x4 matrices are retrieved during the next call to World::Update(). Animation data is then passed to the rendering thread where it is fed into a float texture the animation shader reads to retrieve the bone orientations for each instance in the batch it is rendering.
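    Purely as a sketch of that last step (the Bone and Skeleton structs here are stand-ins, not the engine's classes), the data handed to the rendering thread might be flattened like this:

    #include <vector>

    struct Bone { float matrix[16]; };//stand-in for the simplified Bone class
    struct Skeleton { std::vector<Bone*> bones; };

    //Pack every bone's 4x4 matrix into one flat float array. Each matrix spans
    //four RGBA32F texels, so the vertex shader can rebuild a bone transform
    //with four texelFetch calls indexed by instance and bone number.
    std::vector<float> PackBoneMatrices(const std::vector<Skeleton*>& skeletons)
    {
        std::vector<float> texeldata;
        for (auto skeleton : skeletons)
        {
            for (auto bone : skeleton->bones)
            {
                texeldata.insert(texeldata.end(), bone->matrix, bone->matrix + 16);
            }
        }
        return texeldata;//uploaded to a floating-point texture each frame
    }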
    This means there is latency in the system and everything is always one frame behind, but your animations will all be performed on a separate thread and thus have basically no cost. In fact with the simplified bone class, it might not even be necessary to use a separate thread, since carrying out the animations is just a lot of quaternion Slerps and matrix multiplications. I'll have to try it and just see what the results are.
    The bottlenecks here are going to be the number of animations we can calculate, the passing of data from the game thread to the animation thread and back, and the passing of data from the rendering thread to the GPU. It's hard to predict what we will hit first, but those are the things I have in mind right now.
    It would be possible to carry out the animation transforms entirely on the GPU, but that would result in us getting no feedback whatsoever on the orientation of limbs. So that's not really useful for anything but a fake tech demo. I don't know, maybe it's possible to get the results asynchronously with a pixel buffer object.
    In addition to animation, having tons of characters also requires changes to the physics and navmesh system, which I am already planning. The end result will be a much more scalable system that always provides fast performance for VR. As we are seeing, the optimizations made for VR are having a beneficial effect on general performance across the board. As explained above, this may sometimes require a little more work on your part to accomplish specific things, but the benefits are well worth it, as we will easily be able to run games with more characters than the video below, in VR, perhaps even on Intel graphics.
     
  25. Josh
    I've been doing some work with animated models lately, and encountered some of the difficulties people have mentioned with model animations. It's not really hard to do, but there are things we can do to make the process faster when dealing with lots of animations. This video explains some of the changes I made to make it easier to get animated models into Leadwerks.
     
    These include:
    Animation name in extraction dialog.
    Animation tab with all sequences displayed at once.
    Import animation frames from a text file.
    Play animation in editor.

     
    These changes will be available on the beta branch on Steam shortly.
     

