Development Blog

Josh · April 12, 2018

The increased isolation and simplification of the OpenGL code also means it is now much, much easier to write a custom renderer. It would be pretty simple to create an OpenGL 1 or DirectX renderer for the engine, and Vulkan support could be added without much trouble.

Einlander · April 12, 2018

This video might be of some inspiration to you. https://www.gdcvault.com/play/1022186/Parallelizing-the-Naughty-Dog-Engine

What you describe reminds me of this.

Rick · April 13, 2018

Would it be possible to simulate the world physics without OpenGL? Would be nice for multiplayer games that need to run on a server that doesn't have a gfx card.

Josh · April 13, 2018

Yes, I think so.

wayneg · April 13, 2018

Do you think TCP and UDP will be on different threads?

Crazycarpet · April 14, 2018

Very cool, but this is still more rendering separately on a thread than multi-threaded rendering. No matter how you cut it, in GL the heavy work can't be spread across multiple threads, so your GPU is always bored waiting for the under-used CPU to send it work. That said, in GL this is as good as it's going to get, which is good enough. Still like your MoltenVK idea the best.

Either way it is neat to be able to control the frame rate of physics and game logic separately from rendering.

Josh · April 14, 2018

All the multithreaded graphics APIs actually just accumulate commands in a buffer and then execute them in a single thread. I plan to separate the culling and rendering threads so that there is absolutely no overhead in the rendering thread. The rendering thread may draw the same list of visible objects several times before the culling thread feeds it a new visibility list. This will involve a fair amount of latency in the system, but the VR head and control orientations will be read each frame at the very last moment possible, so the things that you would notice latency with won't have any. I think it's going to be insanely fast.
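To make the culling/rendering split above a bit more concrete, here is a minimal sketch of one way the handoff could work. It is not the engine's actual code: `VisibilityList`, `VisibilityMailbox` and `renderFrame` are hypothetical names, and a mutex-guarded shared pointer is just one simple choice for passing the list between threads. The point it shows is that the rendering side always draws the most recent list it has, and may draw the same one several times before the culling side publishes a new one.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical type: IDs of the surfaces the culling pass found visible.
using VisibilityList = std::vector<int>;

// Single-slot mailbox shared by the culling thread (writer) and the
// rendering thread (reader). Assumption: one shared pointer guarded by a
// mutex is enough for this illustration.
class VisibilityMailbox {
public:
    // Called by the culling thread whenever a new list is ready.
    void publish(VisibilityList list) {
        auto fresh = std::make_shared<const VisibilityList>(std::move(list));
        std::lock_guard<std::mutex> lock(mutex_);
        latest_ = std::move(fresh);
    }

    // Called by the rendering thread once per frame. Returns the most
    // recent list, which may be the same one that was drawn last frame.
    std::shared_ptr<const VisibilityList> latest() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return latest_;
    }

private:
    mutable std::mutex mutex_;
    std::shared_ptr<const VisibilityList> latest_ =
        std::make_shared<const VisibilityList>();
};

// Stand-in for one rendered frame: "draws" whatever list is current and
// returns how many surfaces were drawn.
std::size_t renderFrame(const VisibilityMailbox& mailbox) {
    auto visible = mailbox.latest();
    std::size_t drawn = 0;
    for (int surfaceId : *visible) {
        (void)surfaceId;  // a real renderer would issue a draw call here
        ++drawn;
    }
    return drawn;
}
```

In use, the culling thread would call `publish` at its own rate while the rendering thread calls `renderFrame` every frame. Because the render side only holds the lock long enough to copy a pointer, it never waits for culling to finish.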

Crazycarpet · April 16, 2018

On 4/14/2018 at 3:24 AM, Josh said:

All the multithreaded graphics APIs actually just accumulate commands in a buffer and then execute them in a single thread

The benefit to the multi-threaded APIs is that every thread has its own command pool, and each thread can write to a command buffer, so you can use any available threads to write to the command buffers. They are in the end submitted together, yes, but getting to the point where all command buffers are good-to-go is way faster. That's why they designed them this way. In the end, less time is spent waiting for 1 CPU thread to write all the command buffers.
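As a rough illustration of that pattern, here is a small sketch that records into one buffer per thread and then submits everything in one place. It deliberately uses no real Vulkan calls: `DrawCommand`, `CommandBuffer`, `recordInParallel` and `submitAll` are hypothetical stand-ins, and a plain `std::vector` plays the role of a command buffer. It only shows the shape of the idea, where recording is spread across threads without locking and submission happens once at the end.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-ins for illustration only, not Vulkan types.
struct DrawCommand { int surfaceId; };
using CommandBuffer = std::vector<DrawCommand>;

// Records draw commands for `surfaceCount` surfaces using `workerCount`
// threads. Each worker writes only to its own buffer, so no locking is
// needed while recording.
std::vector<CommandBuffer> recordInParallel(int surfaceCount, int workerCount) {
    std::vector<CommandBuffer> buffers(static_cast<std::size_t>(workerCount));
    std::vector<std::thread> workers;
    for (int w = 0; w < workerCount; ++w) {
        workers.emplace_back([&buffers, w, surfaceCount, workerCount] {
            // Strided split: worker w takes surfaces w, w + N, w + 2N, ...
            for (int id = w; id < surfaceCount; id += workerCount) {
                buffers[static_cast<std::size_t>(w)].push_back(DrawCommand{id});
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();  // wait until every buffer is fully recorded
    }
    return buffers;
}

// Stand-in for the single submission step: walks all recorded buffers on
// one thread and returns how many commands were handed over.
std::size_t submitAll(const std::vector<CommandBuffer>& buffers) {
    std::size_t submitted = 0;
    for (const auto& buffer : buffers) {
        submitted += buffer.size();
    }
    return submitted;
}
```

Calling `recordInParallel(10, 4)` followed by `submitAll` on the result hands over all 10 commands, with the recording work divided among four threads. In a real Vulkan renderer each worker would own its own command pool and record secondary or per-thread primary command buffers instead of filling a vector.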

Nvidia has a great document about this: https://developer.nvidia.com/sites/default/files/akamai/gameworks/blog/munich/mschott_vulkan_multi_threading.pdf

Josh · April 16, 2018

We will see. Doom 2016 ran about the same or even slower on Vulkan in the benchmarks I saw, on Nvidia hardware.

My final render stage is just a list of visible surfaces and that part could easily be split into a bunch of different threads.

But we will see when I actually test it out.

Crazycarpet · April 16, 2018

Doom doesn't use a multi-threaded renderer. Of course Vulkan isn't going to magically make things faster on its own, it gives you the ability to do it... On OpenGL you don't directly write to command buffers, so you can't split the work up between threads. Vulkan in itself does not do any multi-threading, this is something you have to implement. Vulkan just gives you the tools to build fast multi-threaded designs that were not possible before.

I'm not saying this is necessary, your design will be great because the game loop does not have to wait for the renderer. I'm just saying with Vulkan you could get maximum performance. You could still keep the rendering separate from the game loop too, and then you would end up with both faster and independent rendering.

Just spit-balling ideas because it sounds like you're trying to make LE as fast as possible, and this new API allows you to do what only DX12 could do, without worrying about being locked to Windows.

This optimization would indisputably make LE's renderer way faster, which is perfect for VR. The only question is whether or not it is necessary: is LE fast enough without it in the situations it's designed for? No sense in writing a big complex renderer if the engine is fast enough as is.

Edit:

Also keep in mind that Nvidia's OpenGL drivers are extremely fast and complex, AMD's are not. On AMD cards Vulkan does "magically" make things faster just by implementing it because their driver team went above-and-beyond on their Vulkan drivers.

Josh · April 17, 2018

I’m all for trying it, but at this point we don’t know if it’s just a fix for inefficient AMD OpenGL drivers or if it really does increase performance when used correctly. Here's someone claiming that Vulkan is half the speed of OpenGL:
https://steamcommunity.com/app/379720/discussions/0/142261352653876827/

OpenGL has always had low draw call cost compared to DirectX, so it will be interesting to see what the real results are.

Crazycarpet · April 18, 2018

Again, Doom doesn't do multi-threading... Why would it be faster than its OpenGL renderer? They've had years to optimize OpenGL drivers, of course it'll be at least as fast in a single-threaded environment.

It's not magic, it's physics at that point.... Vulkan can use multiple threads to generate command buffers, more at a time; OpenGL can only do 1 at a time. It would indisputably be faster, that's just the reality of it.

As time goes on and GPUs get more powerful a renderer in Vulkan that generates cmd buffers on multiple threads would be even faster because not only are you sending more work to the GPU due to the threaded command buffer generation, but the GPU would also be able to handle any work you throw at it. With high end cards today you will see big performance gains, where you wouldn't is with integrated cards... but that shouldn't be a priority.

Furthermore, in Vulkan you can physically send draw calls from multiple threads and they are not sent to the main thread by the driver. This is one highlight of Vulkan that only DirectX 12 has. Metal is planning this too; I have not read whether or not this is already the case in Metal, or if it's just a future plan.

catch22 · April 18, 2018

Doom Vulkan results are probably better on AMD. It seems that Nvidia hardware is better on DX11 and OpenGL, the more traditional rendering methods.

You'll find AMD performing a bit better on Vulkan and DX12 in most cases, though.

Josh · April 19, 2018

I have my doubts about OpenGL commands being a significant bottleneck. That's kind of like saying you're going to make a sports car faster by removing the floor mats. Yes, it will be a small bit lighter but I don't think you will see any difference. The number one performance bottleneck we run into is pixel fillrate.

My guess is you will see a massive performance increase with my new architecture, and then Vulkan will produce a small improvement on Nvidia cards, and perhaps a 20-30% improvement on AMD cards. But let's see what the actual numbers turn out to be.

Crazycarpet · April 19, 2018

The reason you'd want to multithread the command process is for situations where big, new, powerful GPUs are bored because the CPU's one thread can't send it commands fast enough to utilize it to the fullest extent. That's not a fair analogy. So long as your GPU can handle it, why would you not want to throw more work at it? Modern GPUs (10 series, etc.) can certainly handle it.

A great GPU can handle anything a single core on your CPU can throw at it with ease, so you want to throw more at it. This is the most common bottleneck in games these days with how powerful GPUs are getting. The better your GPU, the better these optimizations will help, it's more planning for the future because as time goes on you'll see more and more improvements from this type of multi-threading, that's why DX12 and Vulkan moved towards it.

Anyways like I said, it isn't usually necessary but it would be optimum, just food for thought so you consider this design if you move towards a Vulkan renderer. It'd be a shame to use Vulkan and just move all the rendering to a thread, instead of using all available threads for command buffer generation.

Josh · May 22, 2020

Your general sentiment is correct, but the fact is, the idea of "commands" is kind of antiquated. It's more like "hi, I am the rendering thread, here is a block of bytes you will interpret a few times before the next block of bytes arrives, k thx bye".
