
Blog Comments posted by Crazycarpet

  1. I get what you're saying, but the two physics libraries are very similar in how you implement them, so switching the system out would be a rather simple process.

    Plus, I would assume Josh uses some kind of higher-level wrapper for the physics APIs that is used in both Turbo and Leadwerks. Changing one would likely be nearly as simple as a copy and paste to the other with some minor changes. (That is a guess; I don't know how Turbo's physics were done.)

    If Newton does the job I'd say leave it... but it seems like it's causing headaches, and with all the robust, free physics APIs out there these days there is no reason to stick with an under-featured and under-documented one.

    • Like 1
  2. I've been using Bullet in a lot of projects lately, and it's really come a long way in the last couple of years. Very fast, more than accurate enough, fully featured (unlike Newton), and open source... The documentation is also decent, especially stacked up beside Newton's.

     

    If you are planning to switch, I'd imagine Bullet would be a pretty quick and painless switch. PhysX is good too, but honestly I prefer Bullet; it has more options.

    Performance-wise it leaves Newton in the dust. The only downside is that the rigid body simulations may not be quite as stable, but I'm sure they're more than good enough for Leadwerks' needs. Multi-threading physics simulations with Bullet is also very easy, and the source code comes with tons of examples. A minimal setup sketch follows below.
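    To give a rough idea of what integrating Bullet involves, here is a minimal sketch of creating a dynamics world with one static plane and one falling box, using Bullet's stock API. This is not how Leadwerks/Turbo would wire it in; it just shows the handful of objects involved.

```cpp
// Minimal Bullet world sketch (assumes the Bullet 2.8x headers are available).
// A real integration would manage shapes, bodies and lifetimes through the
// engine's own systems; error handling is omitted for brevity.
#include <btBulletDynamicsCommon.h>

int main()
{
    // Core plumbing: collision configuration, dispatcher, broadphase, solver.
    btDefaultCollisionConfiguration config;
    btCollisionDispatcher dispatcher(&config);
    btDbvtBroadphase broadphase;
    btSequentialImpulseConstraintSolver solver;
    btDiscreteDynamicsWorld world(&dispatcher, &broadphase, &solver, &config);
    world.setGravity(btVector3(0, -9.81f, 0));

    // A static ground plane (mass 0 = static) and a falling 1 kg box.
    btStaticPlaneShape ground(btVector3(0, 1, 0), 0);
    btRigidBody groundBody(0.0f, nullptr, &ground);
    world.addRigidBody(&groundBody);

    btBoxShape box(btVector3(0.5f, 0.5f, 0.5f));
    btVector3 inertia(0, 0, 0);
    box.calculateLocalInertia(1.0f, inertia);
    btDefaultMotionState motion(btTransform(btQuaternion::getIdentity(), btVector3(0, 5, 0)));
    btRigidBody boxBody(1.0f, &motion, &box, inertia);
    world.addRigidBody(&boxBody);

    // Fixed-timestep stepping; Bullet handles substepping internally.
    for (int i = 0; i < 60; ++i)
        world.stepSimulation(1.0f / 60.0f);

    world.removeRigidBody(&boxBody);
    world.removeRigidBody(&groundBody);
    return 0;
}
```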

  3. This is really impressive Josh, can't wait for the release. However, I still feel like the instanced rendering is carrying here :P I'd love to see how much faster LE5 handles animated meshes than LE4. Perhaps a demo of this in the future? :)

    Also, I thought you said LE4 has frustum culling when I was complaining about GPU occlusion culling?

    • Upvote 1
  4. The reason you'd want to multithread command generation is for situations where a big, new, powerful GPU sits bored because the CPU's single thread can't send it commands fast enough to utilize it fully. That's not a fair analogy :P As long as your GPU can handle it, why would you not want to throw more work at it? Modern GPUs (the 10 series, etc.) can certainly handle it.

    A great GPU can handle anything a single CPU core can throw at it with ease, so you want to throw more at it; this is the most common bottleneck in games these days with how powerful GPUs are getting. The better your GPU, the more these optimizations will help. It's also planning for the future: as time goes on you'll see more and more improvement from this type of multi-threading, which is why DX12 and Vulkan moved toward it.

    Anyway, like I said, it isn't usually necessary, but it would be optimal; just food for thought so you consider this design if you move toward a Vulkan renderer. It'd be a shame to use Vulkan and just move all the rendering to one thread instead of using all available threads for command buffer generation.

  5. Again, Doom doesn't do multi-threaded command generation... why would its Vulkan path be faster than its OpenGL renderer? Vendors have had years to optimize their OpenGL drivers, so of course OpenGL will be at least as fast in a single-threaded environment.

    It's not magic; it's simple arithmetic at that point... Vulkan can use multiple threads to generate command buffers, several at a time; OpenGL can only do one at a time. A threaded renderer would indisputably be faster; that's just the reality of it.

    As time goes on and GPUs get more powerful, a Vulkan renderer that generates command buffers on multiple threads will pull even further ahead: not only are you sending more work to the GPU thanks to threaded command buffer generation, but the GPU can also handle any work you throw at it. With high-end cards today you will see big performance gains; where you wouldn't is with integrated cards... but that shouldn't be a priority.

    Furthermore, in Vulkan you can actually submit draw work from multiple threads, and it is not funneled back to the main thread by the driver; this is one highlight of Vulkan that only DirectX 12 shares. Metal is planning this too; I have not read whether it's already the case in Metal or just a future plan.

    • Like 1
  6. Doom doesn't use a multi-threaded renderer. Of course Vulkan isn't going to magically make things faster on its own; it gives you the ability to do so... In OpenGL you don't write command buffers directly, so you can't split the work up between threads. Vulkan itself doesn't do any multi-threading; that's something you have to implement. Vulkan just gives you the tools to build fast multi-threaded designs that were not possible before.

    I'm not saying this is necessary; your design will be great because the game loop does not have to wait for the renderer. I'm just saying that with Vulkan you could get maximum performance. You could still keep the rendering separate from the game loop too, and then you would end up with rendering that is both faster and independent.

    Just spitballing ideas, because it sounds like you're trying to make LE as fast as possible, and this new API lets you do what only DX12 could do without worrying about being locked to Windows.

    This optimization would indisputably make LE's renderer way faster, which is perfect for VR. The only question is whether it is necessary: is LE fast enough without it in the situations it's designed for? There's no sense in writing a big, complex renderer if the engine is fast enough as is.

    Edit:

    Also keep in mind that Nvidia's OpenGL drivers are extremely fast and complex; AMD's are not. On AMD cards Vulkan does "magically" make things faster just by implementing it, because their driver team went above and beyond on their Vulkan drivers.

  7. On 4/14/2018 at 3:24 AM, Josh said:

    All the multithreaded graphics APIs actually just accumulate commands in a buffer and then execute them in a single thread

    The benefit of the multi-threaded APIs is that every thread has its own command pool, and each thread can write to its own command buffer, so you can use any available thread to record commands. They are submitted together in the end, yes, but getting to the point where all the command buffers are good to go is way faster; that's why they were designed this way. In the end, less time is spent waiting for one CPU thread to write all the command buffers.

    Nvidia has a great document about this: https://developer.nvidia.com/sites/default/files/akamai/gameworks/blog/munich/mschott_vulkan_multi_threading.pdf
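    As a rough illustration of the "one command pool per thread" pattern those slides describe, here is a minimal sketch. It assumes a VkDevice and a queue family index already exist from normal Vulkan initialization, and it allocates one secondary command buffer per worker thread so each thread can record without any locking.

```cpp
// Sketch: one VkCommandPool (and one command buffer) per worker thread.
// `device` and `queueFamilyIndex` are assumed to come from the usual
// Vulkan setup; error checking is omitted for brevity.
#include <vulkan/vulkan.h>
#include <vector>

struct ThreadRecorder {
    VkCommandPool   pool   = VK_NULL_HANDLE;
    VkCommandBuffer buffer = VK_NULL_HANDLE;
};

std::vector<ThreadRecorder> createRecorders(VkDevice device,
                                            uint32_t queueFamilyIndex,
                                            uint32_t threadCount)
{
    std::vector<ThreadRecorder> recorders(threadCount);
    for (auto& r : recorders) {
        VkCommandPoolCreateInfo poolInfo{};
        poolInfo.sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO;
        poolInfo.flags = VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT;
        poolInfo.queueFamilyIndex = queueFamilyIndex;
        vkCreateCommandPool(device, &poolInfo, nullptr, &r.pool);

        VkCommandBufferAllocateInfo allocInfo{};
        allocInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO;
        allocInfo.commandPool = r.pool;
        // Secondary buffers: recorded per thread, executed from one primary.
        allocInfo.level = VK_COMMAND_BUFFER_LEVEL_SECONDARY;
        allocInfo.commandBufferCount = 1;
        vkAllocateCommandBuffers(device, &allocInfo, &r.buffer);
    }
    return recorders;
}
```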

  8. Very cool, but this is still rendering on a separate thread rather than multi-threaded rendering. No matter how you cut it, in GL the heavy work can't be spread across multiple threads, so your GPU is always bored waiting for the under-used CPU to send it work; although in GL this is as good as it's going to get, which is good enough. I still like your MoltenVK idea the best.

    Either way, it is neat to be able to control the frame rate of physics and game logic separately from rendering.

  9. That sounds like a perfect solution. You shouldn't have to do much for smart pointers to work with ToLua++; they are simply a class. I'd be surprised if ToLua++ couldn't handle them out of the box. (Assuming you don't have the std:: prefix in the pkg files.)

    http://lua-l.lua.narkive.com/JEUvLxvs/tolua-question

    Looks like it'd be quite easy to come up with a solution.
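    Purely to illustrate the "they are simply a class" point, here is a hypothetical, stripped-down reference-counted handle. Leadwerks' actual smart pointer (or std::shared_ptr) would be used in practice; the point is just that a binder sees nothing more exotic than an ordinary class with a few methods.

```cpp
// Hypothetical example only: a minimal reference-counted handle. A binding
// generator such as ToLua++ can expose a type like this the same way it
// exposes any other C++ class, since it is just methods and a copy ctor.
template <typename T>
class Handle {
public:
    explicit Handle(T* ptr = nullptr) : m_ptr(ptr), m_refs(ptr ? new int(1) : nullptr) {}
    Handle(const Handle& other) : m_ptr(other.m_ptr), m_refs(other.m_refs) { if (m_refs) ++*m_refs; }
    ~Handle() { release(); }

    Handle& operator=(const Handle& other)
    {
        if (this != &other) {
            release();
            m_ptr = other.m_ptr;
            m_refs = other.m_refs;
            if (m_refs) ++*m_refs;
        }
        return *this;
    }

    T* get() const { return m_ptr; }      // plain accessors are all the binder sees
    T* operator->() const { return m_ptr; }

private:
    void release()
    {
        if (m_refs && --*m_refs == 0) { delete m_ptr; delete m_refs; }
        m_ptr = nullptr;
        m_refs = nullptr;
    }
    T*   m_ptr;
    int* m_refs;
};
```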

  10. 16 minutes ago, Rick said:

    From what I've seen with Squirrel you can basically use it like lua. It has the same idea of tables like Lua, but it also has the C++ like syntax.

    Yeah, but it doesn't have nil... so if you do something like access an out-of-range table element, instead of returning nil it will raise an exception... having nil is also what I rely on for my Lua callback system in my engine (although null might be distinguishable from false on the stack in Squirrel too).

    Squirrel looks like it's come a long way since I last saw it, so I take back what I said; I'd go with Squirrel over Python. Still, that's a big change for not a big difference. Not to mention I'm not seeing it being any easier for an auto-complete feature than Lua: you won't be creating your C++ classes in Squirrel, you'll be exposing them through the stack, so it's not like you can parse the code files for auto-completion. If a switch has to be done, though, Squirrel looks sweet.

  11. If you're going with Squirrel, you're better off with Python... why settle for Squirrel when it's a worse version of the same thing? (Granted, it is more lightweight.) I think Squirrel's syntax is ugly relative to Lua or Python.

    I just personally love Lua because it is really, really fast and communication with C is extremely easy.

    It seems unnecessary to change the engine's scripting language solely for the autocomplete feature. Plus, could you not do a hack where you just execute the Lua file, silently ignoring errors, and then generate auto-complete info from each object's metatable, which contains all of its methods and members? I feel like you could find a sloppy way to make this work using a separate Lua state... it might not be the easiest solution, but likely easier than changing languages entirely.
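    A rough sketch of that "separate Lua state" hack, assuming the Lua 5.x C API: run the script in a throwaway state, fetch a global object (the script path and global name below are placeholders), and walk its metatable to collect method names for the autocomplete list.

```cpp
// Rough sketch: load a script into a throwaway Lua state, ignore errors,
// and dump the string keys of one object's metatable as autocomplete hints.
#include <cstdio>
extern "C" {
#include <lua.h>
#include <lualib.h>
#include <lauxlib.h>
}

void dumpMethods(const char* scriptPath, const char* globalName)
{
    lua_State* L = luaL_newstate();
    luaL_openlibs(L);

    // Silently ignore script errors, as suggested; we only care what got defined.
    if (luaL_dofile(L, scriptPath) != 0)
        lua_pop(L, 1);                     // pop the error message

    lua_getglobal(L, globalName);          // the object we want to inspect
    if (lua_getmetatable(L, -1)) {         // pushes the metatable if there is one
        lua_pushnil(L);                    // first key for lua_next
        while (lua_next(L, -2) != 0) {     // iterate metatable entries
            if (lua_type(L, -2) == LUA_TSTRING)
                std::printf("%s.%s\n", globalName, lua_tostring(L, -2));
            lua_pop(L, 1);                 // pop the value, keep the key
        }
        lua_pop(L, 1);                     // pop the metatable
    }
    lua_pop(L, 1);                         // pop the global
    lua_close(L);
}
```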

  12. 20 hours ago, Josh said:

    Not sure yet.  If I use a gameengine object then it would contain the graphicsengine (graphicsdriver) as a member.  In multithreaded programming this makes life a lot easier, but I don't want to pedantically enforce something just because.

    I get why you'd want to do this, but would this not be the same thing in principle, since your "GameEngine" object would now be the bound state? It would have the same implications in a multithreaded environment as GetCurrent(), because like the latter, only one thread could write to this "GameEngine" state at a time. Reading is always thread safe.

    I guess it doesn't matter because users don't have to play with that stuff; I just figure, why not make life easier for yourself? :P

  13. So what about bound states like your GraphicsDriver and whatnot? Will you remove these? I feel like that'd be a slightly harder one to remove because it'd require lots of design changes, but on the flip side it may not be necessary to remove it, because you won't have the main thread writing to the GraphicsDriver at any point. (I don't think.)

    I'm more asking, do you plan to remove ALL bound states?

    Website refresh and Leadwerks 5

    Well, you are definitely right; they are certainly easier to write because Vulkan puts most of the responsibility for memory management on the developer. I didn't mean to overlook that.

    What I meant in regards to shaders is that if you look at the way AMD/Intel/Nvidia drivers handle GLSL shaders, it's hugely different... with Nvidia's implementation clearly handling shader optimization and whatnot the best... With Vulkan, their drivers will handle the SPIR-V bytecode more or less the same way.

    Long story short, Vulkan gives the developer a lot more opportunity to capitalize on reducing allocations/deallocations and on concurrent command buffer writing... That handy little pNext member in the structs also means things like Nvidia's bindless textures or AMD's rasterization order can be implemented with roughly six lines of code apiece (a small pNext sketch is at the end of this comment). I know it's easy in OpenGL as well, but not that easy!

    Can you link me to a driver implementation that multithreads anything in OpenGL? I don't see how they would accomplish this "behind the scenes", as the global states in OpenGL would prevent it... that's why you need a command pool per thread in Vulkan... OpenGL is an API that really doesn't look like it'd be able to accomplish multi-threading, and especially not lock-less multi-threading. (When I say lock-less, of course, I'm excluding the single wait for all the threads to report back.)

    I'm not disputing that you need to "synchronise": you do have to wait for all threads to complete their jobs, yes, but that by no means suggests this won't give a performance gain... you get immense performance gains from this method. In fact, Unity reported 60 percent FPS gains out of the box in their multithreaded Vulkan renderer release, which pretty much only optimizes through this method.
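    As an example of the pNext pattern mentioned above, here is a hedged sketch of chaining an extension's feature struct onto VkDeviceCreateInfo at device creation. It uses VK_EXT_descriptor_indexing ("bindless"-style texture indexing) as the stand-in extension; vendor-specific extensions are enabled the same way with their own feature structs.

```cpp
// Sketch of the pNext pattern: chain an extension feature struct onto
// VkDeviceCreateInfo. A full implementation would also enable any extensions
// this one depends on (e.g. VK_KHR_maintenance3) and query support first.
#include <vulkan/vulkan.h>

VkDevice createDeviceWithDescriptorIndexing(VkPhysicalDevice physicalDevice,
                                            const VkDeviceQueueCreateInfo* queueInfo)
{
    VkPhysicalDeviceDescriptorIndexingFeaturesEXT indexingFeatures{};
    indexingFeatures.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DESCRIPTOR_INDEXING_FEATURES_EXT;
    indexingFeatures.runtimeDescriptorArray = VK_TRUE;
    indexingFeatures.descriptorBindingPartiallyBound = VK_TRUE;

    const char* extensions[] = { VK_EXT_DESCRIPTOR_INDEXING_EXTENSION_NAME };

    VkDeviceCreateInfo deviceInfo{};
    deviceInfo.sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
    deviceInfo.pNext = &indexingFeatures;          // the extension rides along on pNext
    deviceInfo.queueCreateInfoCount = 1;
    deviceInfo.pQueueCreateInfos = queueInfo;
    deviceInfo.enabledExtensionCount = 1;
    deviceInfo.ppEnabledExtensionNames = extensions;

    VkDevice device = VK_NULL_HANDLE;
    vkCreateDevice(physicalDevice, &deviceInfo, nullptr, &device);
    return device;
}
```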

    Website refresh and Leadwerks 5

    11 hours ago, nick.ace said:

    Not sure you can do this for any benefit. Yes, you can have command pools per thread, but you need to synchronize the submission of command buffers or you can get undefined behavior. Building command buffers is what multithreading is intended for.

    On the contrary, this gives more performance benefit than anything else you can do in Vulkan. Yes, they have to be submitted all at once through a primary command buffer... but it's not about that; it's about how long it takes to generate the secondary command buffers before you can submit them. Instead of writing the drawing commands for one entity at a time, you can write as many at once as you have free threads, and the threads do roughly the same amount of work per frame. Instead of having one primary command buffer and writing the commands one at a time, you're writing commands several-threads-at-a-time, so you get to submit the end result much, much sooner... Instead of your GPU being bored while one CPU core works hard, all your free CPU cores work hard and your GPU has a lot of work to do, which is great!

    You MUST have a command pool per thread, and when you generate command buffers you allocate them from the pool of the thread you're working in; that's just the way the Vulkan API is designed. We have no say over that, and this design is what allows us to do this without locks. (A sketch of the recording pattern is at the end of this comment.)

    Don't take my word for it: Valve wrote a write-up on how they came up with this design, and Nvidia and many others follow it as well.

    As for the Vulkan drivers, they're only easier to write because SPIR-V is a binary format, whereas GLSL and some other shading languages may be interpreted differently across vendors. Khronos provides a version of glslang that ensures your GLSL is conformant and compiles it to that binary form.

    I'd like to stress that Doom is a horrible example of a Vulkan implementation... it doesn't take advantage of the custom-allocator options Vulkan provides, nor its easy multithreading options. It was a poor implementation that expected to drop in a young API and see improved performance. As for Nvidia's performance on Vulkan, this will come... AMD had horrible shader optimization, which was a big pitfall in their GL drivers; that isn't going to be a problem in Vulkan, which is why AMD cards perform well there... Nvidia always makes great drivers, just give them time.
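    Here is the recording pattern referenced above as a hedged sketch: each worker thread records its share of draws into its own secondary command buffer, then a single primary buffer executes them all and is submitted once. The renderPass, framebuffer and the recordDraws callback are assumed to exist elsewhere; a full renderer would also begin the render pass with VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS.

```cpp
// Sketch: parallel secondary command buffer recording, one submission per frame.
// Error checking, render pass begin/end, and synchronization objects are omitted.
#include <vulkan/vulkan.h>
#include <thread>
#include <vector>
#include <functional>

void recordFrame(VkCommandBuffer primary,
                 const std::vector<VkCommandBuffer>& secondaries,
                 VkRenderPass renderPass, VkFramebuffer framebuffer,
                 VkQueue queue,
                 const std::function<void(VkCommandBuffer, size_t)>& recordDraws)
{
    // 1. Record the secondary buffers in parallel, one thread per buffer.
    std::vector<std::thread> workers;
    for (size_t i = 0; i < secondaries.size(); ++i) {
        workers.emplace_back([&, i] {
            VkCommandBufferInheritanceInfo inherit{};
            inherit.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_INHERITANCE_INFO;
            inherit.renderPass = renderPass;
            inherit.framebuffer = framebuffer;

            VkCommandBufferBeginInfo begin{};
            begin.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
            begin.flags = VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT;
            begin.pInheritanceInfo = &inherit;

            vkBeginCommandBuffer(secondaries[i], &begin);
            recordDraws(secondaries[i], i);   // this thread's slice of the scene
            vkEndCommandBuffer(secondaries[i]);
        });
    }
    for (auto& w : workers) w.join();         // the one wait before submission

    // 2. The primary buffer just stitches the secondaries together.
    //    (vkCmdBeginRenderPass with VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS
    //     would wrap the execute call in a full renderer.)
    VkCommandBufferBeginInfo begin{};
    begin.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
    vkBeginCommandBuffer(primary, &begin);
    vkCmdExecuteCommands(primary, static_cast<uint32_t>(secondaries.size()), secondaries.data());
    vkEndCommandBuffer(primary);

    // 3. One submission for the whole frame.
    VkSubmitInfo submit{};
    submit.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
    submit.commandBufferCount = 1;
    submit.pCommandBuffers = &primary;
    vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);
}
```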

    Website refresh and Leadwerks 5

    Yeah, but Khronos scrapped OpenGL NG for Vulkan lol... and even if Vulkan's not the future, something's going to have to replace OpenGL soon. Most engines are adding or have added support for when it's ready; it's simply not ready yet. Apple's lack of support is kind of disappointing, but only weeks after they announced they aren't supporting it, MoltenVK came out to allow Vulkan to work on Apple products. The community support is huge. ANYWAY, enough thread-hijacking :P

    (P.S: Roblox, Dota2, Doom, and vkQuake use Vulkan already!)

    Does LE use direct state access right now? If not, do you plan on switching to it so you don't have to constantly bind different states? (A quick comparison sketch is below.)
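    For reference, a small sketch contrasting classic bind-to-edit with the OpenGL 4.5 direct state access calls. This is just an illustration of the pattern, not how Leadwerks handles buffers; it assumes a 4.5 context and a loader such as GLEW or glad.

```cpp
// Sketch: classic bind-to-edit vs. GL 4.5 direct state access (glCreate*/glNamed*).
// DSA lets you fill an object without disturbing the currently bound state.
#include <GL/glew.h>

GLuint createVertexBufferClassic(const float* data, GLsizeiptr bytes)
{
    GLuint vbo = 0;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);                    // clobbers the binding point
    glBufferData(GL_ARRAY_BUFFER, bytes, data, GL_STATIC_DRAW);
    return vbo;
}

GLuint createVertexBufferDSA(const float* data, GLsizeiptr bytes)
{
    GLuint vbo = 0;
    glCreateBuffers(1, &vbo);                              // created ready to use, no bind needed
    glNamedBufferData(vbo, bytes, data, GL_STATIC_DRAW);   // edits the object directly
    return vbo;
}
```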

    Website refresh and Leadwerks 5

    It is totally cross-platform... the Khronos Group doesn't make claims just to make them. They did an amazing job with OpenGL and an even better one with Vulkan. Its advantage over DX12 is its cross-platform nature. However, DX12 is probably faster, because its single-platform support lets it take advantage of Windows-only features to give games that little boost.

    Vulkan is still young and not ready to replace anything yet; I certainly would not build an engine with it right now, as the drivers are very young. That being said, support for Vulkan is growing faster than anyone thought: no one thought Intel would release drivers until 2018, and they came out in 2016.

    That being said, it is certainly a better idea to build the engine on OpenGL, because Vulkan isn't "mature" yet; the day it is ready you can always write a renderer for it then. All I'm saying is you can't dismiss the purpose of Vulkan: it solves every problem that limits OpenGL on modern hardware, and it was improved and added to by many of the biggest-name game companies in the world, including Valve, Blizzard, Microsoft, Unity, Oculus, HTC, and hundreds of others. These people know what problems we face today and how to solve them!

    Out of curiosity, do you plan on using OpenGL's newly supported direct state access? It seems like you could make things a lot faster by not having to bind and change global states all the time.

    Website refresh and Leadwerks 5

    All cards get a big boost in Vulkan when you implement a multi-threaded renderer... Vulkan has no global state and lets each thread fill command buffers from its own command pool. Because of this, the threads never have to wait behind a lock for access to shared data.

    That being said, OpenGL is much, much easier to implement, and it is certainly fast enough. Your idea of putting culling and rendering on separate threads is definitely a good one too. Still, it's no match for Vulkan and what you can do with it; keep in mind Khronos designed Vulkan with multithreading and VR as primary goals. A programmer can write a Vulkan renderer that never locks a mutex simply by having one (or two) command pools per thread.

    Sounds like some good ideas though, excited to see what you come up with.

     

    Edit: I don't think it's fair to say Vulkan is a niche... Khronos made Vulkan as a successor to OpenGL specifically for multi-threading and VR, because OpenGL simply isn't capable of letting so much of the work be handled by separate threads.

    However, the drivers are certainly still young... OpenGL has been around for so long that of course its drivers are better and more stable; that will come for Vulkan too. Vulkan is here and here to stay. Still, OpenGL is evolving every day and is more than capable of powering high-performance games.

    Website refresh and Leadwerks 5

    Be careful with that many threads! But is there any specific reason you decided against Vulkan? What parts of the rendering do you plan on letting threads handle with OpenGL? I never knew there were effective ways of threading in GL.

    For example, in a Vulkan implementation you'd do something like let all available threads generate secondary command buffers for the meshes in the scene, then execute a primary command buffer once all the secondary command buffers have been generated.

    You really should profile a lot before starting to multithread that heavily. If you look at the update and render times in Leadwerks, the update time is generally nothing; the render time is what's killing us.
