Thread Questions


SpiderPig
Go to solution Solved by SpiderPig,

I'm creating an excessive number of threads to test a few ideas of mine, and in the process a few questions came up:

  • Is Start() waiting for an available slot?  "Done" is not printed until all threads have printed "Hello", which leads me to believe Start() is either waiting or just taking a long time to start.  Some other examples aren't doing this, so I think it depends on what I'm asking the thread to do.  It's probably exiting a thread as fast as it's being started.
  • Is starting a thread relatively slow?
  • Shouldn't pressing the spacebar free the memory allocated by the threads?
  • Can a thread be repurposed?  Creating a lot of threads takes time and memory.  I'm wondering if it's best to create only as many as can run at once, and then once one is done send it a new function or user data and restart it.

 

#include "UltraEngine.h"
#include "ComponentSystem.h"

using namespace UltraEngine;

void RunThread() {
    auto r = 0.0f;
    for (int i = 0; i < 1000; i++) {
        r += sqrt(Random(2.0f, 1024.0f));
    }

    Print("Hello");
}

int main(int argc, const char* argv[])
{
    auto displays = GetDisplays();
    auto window = CreateWindow("Ultra Engine", 0, 0, 1280, 720, displays[0], WINDOW_CENTER | WINDOW_TITLEBAR);
    auto world = CreateWorld();
    auto framebuffer = CreateFramebuffer(window);

    auto camera = CreateCamera(world);
    camera->SetClearColor(0.125);
    camera->SetFov(70);
    camera->SetPosition(0, 0, -3);

    auto light = CreateDirectionalLight(world);
    light->SetRotation(35, 45, 0);
    light->SetRange(-10, 10);

    auto box = CreateBox(world);
    box->SetColor(0,0,1);

    auto actor = CreateActor(box);
    auto component = actor->AddComponent<Mover>();
    component->rotation.y = 45;

    //Quick to create
    vector<shared_ptr<Thread>> threads;
    threads.reserve(10000);
    for (int id = 0; id < 10000; id++) {
        threads.push_back(CreateThread(RunThread, false));
    }

    //Not so quick to start... is it waiting?
    for (auto t : threads) {
        t->Start();
    }
    Print("Done");//All threads seem to finish before this is called.

    while (window->Closed() == false and window->KeyDown(KEY_ESCAPE) == false)
    {
        if (window->KeyHit(KEY_SPACE)) { threads.clear(); }//<- memory usage in VS doesn't change

        world->Update();
        world->Render(framebuffer);
    }
    return 0;
}

 


The thread class doesn't actually create a "real" thread until the thread is first launched. In this case, that would be when the Start() method is called.

Thread creation is fast-ish. I mean for something like processing a pixmap across multiple threads it's fine to just create the threads and use them once...but if you are constantly creating threads it is better to have a set of threads waiting for work to do. A semaphore is really good for this, better than a mutex, if you can wrap your mind around how they work.
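The "set of threads waiting for work" idea can be sketched with the standard library alone. This is only a sketch under stated assumptions: it uses std::thread and a std::condition_variable in place of Ultra Engine's Thread and Semaphore classes, and WorkerPool, Submit and Run are hypothetical names, not engine API.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool: threads are created once and then sleep
// until a job is queued, instead of being created once per job.
class WorkerPool {
public:
    explicit WorkerPool(std::size_t count) {
        assert(count > 0);
        for (std::size_t i = 0; i < count; ++i)
            workers_.emplace_back([this] { Run(); });
    }

    // Lets the workers drain the queue, then joins them.
    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (auto& w : workers_) w.join();
    }

    void Submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        wake_.notify_one();  // wake one sleeping worker
    }

private:
    void Run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // only reached when stopping
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so other workers can dequeue
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> jobs_;
    std::vector<std::thread> workers_;
    bool stopping_ = false;
};
```

With this shape the cost of creating a thread is paid once per worker rather than once per job, which is the point being made about constantly creating threads.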

My job is to make tools you love, with the features you want, and performance you can't live without.


This is how I have done it with the ShaderWatcher:

void UltraEngine::Utilities::Shader::ShaderWatcher::RunPreprocess(vector<shared_ptr<ShaderFile>> files)
{
	int time = Millisecs();
	Print("Preprocessing... (" + WString(files.size()) + " Shaders)");

	vector<shared_ptr<Thread>> threads;

	for (auto f : files)
	{
		threads.push_back(CreateThread(bind(ThreadPreprocess,_compiler, f), true));
	}

	for (auto t : threads)
		t->Wait();

	Print("Preprocessing finished... (" + WString(Millisecs() - time) + "ms)");
}

A semaphore or mutex isn't needed here as no resources are shared between the threads.  A mutex is a good way to synchronize access to specific functions that are not thread-safe, e.g. Print.  Semaphores can be used for synchronization as well (a mutex behaves much like a semaphore with a count of one), but also to limit the maximum number of threads executing in parallel.
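As a rough illustration of guarding a function that is not thread-safe, here is a sketch using std::mutex and std::thread rather than the engine's classes. SafeLog, LogFromThreads and g_log are hypothetical names; the shared vector merely stands in for something like Print.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Stand-in for a resource that is not thread-safe: appending to a shared
// vector from several threads without a lock would be a data race.
std::vector<std::string> g_log;
std::mutex g_log_mutex;

void SafeLog(const std::string& line) {
    std::lock_guard<std::mutex> lock(g_log_mutex);  // released at scope exit
    g_log.push_back(line);
}

// Starts thread_count threads that each log lines_per_thread lines and
// returns the total number of lines stored so far.
std::size_t LogFromThreads(int thread_count, int lines_per_thread) {
    assert(thread_count >= 0 && lines_per_thread >= 0);
    std::vector<std::thread> threads;
    for (int t = 0; t < thread_count; ++t) {
        threads.emplace_back([t, lines_per_thread] {
            for (int i = 0; i < lines_per_thread; ++i)
                SafeLog("thread " + std::to_string(t) + " line " + std::to_string(i));
        });
    }
    for (auto& th : threads) th.join();

    std::lock_guard<std::mutex> lock(g_log_mutex);
    return g_log.size();
}
```

The lock is held only for the push_back itself, so the threads still build their strings in parallel.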

  • Intel® Core™ i7-8550U @ 1.80 Ghz 
  • 16GB RAM 
  • INTEL UHD Graphics 620
  • Windows 10 Pro 64-Bit-Version

Vectors normally don't reduce their memory when they are resized. You can check their capacity to verify this. But compared to 1.6 GB (most of which is the VS debugger) the amount of memory you are freeing here is tiny.

GetMemoryUsage will give you the exact number, but only in debug mode.


I tested the capacity for both clear() and erase().  Both times it remains at 10,000, and memory usage actually went up a little.  How then to properly destroy all memory within a vector?

if (window->KeyHit(KEY_SPACE)) {
	Print(GetMemoryUsage());
	threads.erase(threads.begin(), threads.end());
	Print(GetMemoryUsage());
	Print(threads.capacity());
}
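On actually releasing the vector's storage: clear() and erase() destroy the elements (so each shared_ptr drops its reference) but keep the buffer. A minimal sketch with plain std::vector follows; ReleaseStorage and the two Capacity... functions are hypothetical names for illustration. Swapping with an empty temporary hands the old buffer to the temporary, which frees it. shrink_to_fit() is the other option, but it is only a non-binding request.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical helper: destroys the elements AND gives the buffer back.
template <typename T>
void ReleaseStorage(std::vector<T>& v) {
    std::vector<T>().swap(v);  // the temporary takes the old buffer and frees it
    assert(v.empty());
}

// Capacity left after clear() alone, for comparison.
std::size_t CapacityAfterClear() {
    std::vector<std::shared_ptr<int>> v;
    v.reserve(10000);
    for (int i = 0; i < 10000; ++i) v.push_back(std::make_shared<int>(i));
    v.clear();            // the 10,000 ints are freed here...
    return v.capacity();  // ...but the array of pointers is kept
}

// Capacity left after swapping with an empty vector.
std::size_t CapacityAfterRelease() {
    std::vector<std::shared_ptr<int>> v;
    v.reserve(10000);
    for (int i = 0; i < 10000; ++i) v.push_back(std::make_shared<int>(i));
    ReleaseStorage(v);
    return v.capacity();  // 0 on the common standard library implementations
}
```

Even then, the process memory shown by a debugger may not drop, because the allocator can keep freed blocks for reuse.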

 


Vectors do this because if a vector is constantly being resized, it's faster to leave the capacity at its maximum value, since reallocating can be slow. If you are constantly pushing new objects into it, every push that exceeds the current capacity requires a new memory block to be allocated and the existing elements to be copied over, so it's usually best to just leave it as-is. In fact, this is what the reserve() method is for:

std::vector<int> v;
v.reserve(1000);
for (int n = 0; n < 1000; ++n)
{
    v.push_back(n);
}

 


Another thing I sometimes do to avoid constantly resizing:

if (v.capacity() == v.size()) v.reserve(v.size() * 1.3);
v.push_back(i);

This will make it so memory allocations only happen "once in a while" instead of constantly allocating and recopying the buffer.
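To see how often the buffer really moves, the capacity can be watched while pushing. This is a small sketch with a hypothetical helper, CountReallocations. Mainstream std::vector implementations already grow the buffer geometrically, so even without reserve() the count stays far below the number of pushes; reserving up front removes the remaining reallocations when the final size is known.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many times push_back changed the capacity, i.e. how many
// times the vector had to allocate a new buffer.
std::size_t CountReallocations(std::size_t items, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) v.reserve(items);

    std::size_t reallocations = 0;
    for (std::size_t n = 0; n < items; ++n) {
        const std::size_t before = v.capacity();
        v.push_back(static_cast<int>(n));
        if (v.capacity() != before) ++reallocations;
    }
    assert(v.size() == items);
    return reallocations;
}
```

Calling CountReallocations(1000, true) reports no reallocations at all, while the exact count without reserve() depends on the implementation's growth factor.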


Yeah since learning of the reserve function it has been my best friend to date.  I've been programming in C++ for nearly 20 years and I swear there is still so much to learn.

 

Contrary to what I expected, the code below was at first about 4 times slower than the first example I posted.  It seems the constant back and forth of waiting on and signalling the semaphore takes a toll.  It took about 40 seconds for the threads to finish (debug mode).

That wasn't a fair comparison, though: I was only using 4 threads when the other tests used 8!  Changing that made it the second-fastest approach, at 22 seconds to complete.

#include "UltraEngine.h"
#include "ComponentSystem.h"

using namespace UltraEngine;

void DoThis() {
    auto r = 0.0f;
    for (int i = 0; i < 1000; i++) {
        r += sqrt(Random(2.0f, 1024.0f));
    }
}

struct ThreadManager : public Object {
    int execution_count = 0;
    bool exit = false, waiting = false;
    function<void()> my_func;

    shared_ptr<Semaphore> semaphore;
};

shared_ptr<Object> RunThread(shared_ptr<Object> extra) {
    auto manager = extra->As<ThreadManager>();
    while (manager->exit == false) {
        manager->waiting = false;
        manager->execution_count++;
        if (manager->my_func != nullptr) { manager->my_func(); }

        manager->waiting = true;//a state for a semaphore maybe?
        manager->semaphore->Wait();
    }

    return nullptr;
}

int main(int argc, const char* argv[])
{
    auto displays = GetDisplays();
    auto window = CreateWindow("Ultra Engine", 0, 0, 1280, 720, displays[0], WINDOW_CENTER | WINDOW_TITLEBAR);
    auto world = CreateWorld();
    auto framebuffer = CreateFramebuffer(window);

    auto camera = CreateCamera(world);
    camera->SetClearColor(0.125);
    camera->SetFov(70);
    camera->SetPosition(0, 0, -3);

    auto light = CreateDirectionalLight(world);
    light->SetRotation(35, 45, 0);
    light->SetRange(-10, 10);

    auto box = CreateBox(world);
    box->SetColor(0, 0, 1);

    auto actor = CreateActor(box);
    auto component = actor->AddComponent<Mover>();
    component->rotation.y = 45;


    struct ThreadStruct {
        shared_ptr<Thread> thread;
        shared_ptr<ThreadManager> manager;
    };
  
    auto available_threads = MaxThreads();
    vector<shared_ptr<ThreadStruct>> threads;
    threads.reserve(available_threads);
    for (int id = 0; id < available_threads; id++) {
        auto s = make_shared<ThreadStruct>();
        s->manager = make_shared<ThreadManager>();
        s->manager->semaphore = CreateSemaphore();
        s->thread = CreateThread(RunThread, s->manager);
        
        threads.push_back(s);
    }

    
    bool done = false;
    while (window->Closed() == false and window->KeyDown(KEY_ESCAPE) == false)
    {
        auto tn = 0;
        for (auto t : threads) {
            if (t->manager->waiting == true && t->manager->exit == false) {
                if (t->manager->execution_count < 1250) {
                    t->manager->semaphore->Signal();
                }
                else {
                    t->manager->exit = true;
                    t->manager->semaphore->Signal();

                    Print("Done Thread #" + String(tn));
                    tn++;
                }
            }
        }

        world->Update();
        world->Render(framebuffer);
    }
    return 0;
}

Doing it like this took 25 seconds to finish.

vector<shared_ptr<Thread>> threads;
threads.reserve(10000);
for (int id = 0; id < 10000; id++) {
	threads.push_back(CreateThread(RunThread));
}

for (auto t : threads) {
	t->Start();
}

And the fastest is the way @klepto2 has been doing it and how I've been doing it to date.

Finishes in about 10 seconds if you allow the threads to start straight away.  Probably because the first few threads can start while the others are still being created.

vector<shared_ptr<Thread>> threads;
threads.reserve(10000);
for (int id = 0; id < 10000; id++) {
	threads.push_back(CreateThread(RunThread));
}

 

I wonder if a semaphore should have a state that can be checked?

 manager->waiting = true;//a state for a semaphore maybe?

...

semaphore->GetState()

 


You are reading from and writing to several different variables in different threads, so the values you read could be totally random.

Normally the way you would use these is the main thread has a semaphore that says "new work is ready" and the thread would have a semaphore that says "work is finished". This is based on the idea that the main thread has some point at which the work must be completed, and it will wait until the thread is finished.
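The two-semaphore hand-off described here might look like the sketch below. It is written against the standard library only: SimpleSemaphore is a hypothetical stand-in for the engine's Semaphore (only Signal and Wait are modelled), and Job, Worker and RunRounds are illustrative names.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Minimal counting semaphore built from a mutex and a condition variable.
class SimpleSemaphore {
public:
    void Signal() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++count_;
        }
        cv_.notify_one();
    }
    void Wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int count_ = 0;
};

struct Job {
    SimpleSemaphore work_ready;     // main -> worker: "new work is ready"
    SimpleSemaphore work_finished;  // worker -> main: "work is finished"
    int input = 0;
    int output = 0;
    bool quit = false;
};

void Worker(Job& job) {
    for (;;) {
        job.work_ready.Wait();
        if (job.quit) return;
        job.output = job.input * job.input;  // placeholder workload
        job.work_finished.Signal();
    }
}

// Runs `rounds` hand-offs and returns the sum of the results.
int RunRounds(int rounds) {
    assert(rounds >= 0);
    Job job;
    std::thread worker(Worker, std::ref(job));
    int sum = 0;
    for (int i = 1; i <= rounds; ++i) {
        job.input = i;             // safe: the worker is blocked in Wait()
        job.work_ready.Signal();
        job.work_finished.Wait();  // the point where the result is needed
        sum += job.output;
    }
    job.quit = true;
    job.work_ready.Signal();
    worker.join();
    return sum;
}
```

Because each side only touches input and output while the other side is blocked on a semaphore, no extra flags such as "waiting" need to be read across threads.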


If it's a thread that just runs until it is finished, while the main loop continues, you can use a mutex lock for that to change a variable that says "the results are ready".
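A sketch of that "results are ready" flag, assuming std::mutex and std::thread instead of the engine's Mutex and Thread classes. BackgroundResult, BuildValues and TryTake are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical result holder shared by a background thread and the main loop.
struct BackgroundResult {
    std::mutex mutex;
    bool ready = false;       // "the results are ready"
    std::vector<int> values;  // only handed over while the mutex is held
};

void BuildValues(BackgroundResult& out, int count) {
    assert(count >= 0);
    std::vector<int> local;   // work on private data, no lock needed
    for (int i = 0; i < count; ++i) local.push_back(i * 2);

    std::lock_guard<std::mutex> lock(out.mutex);  // lock only to publish
    out.values = std::move(local);
    out.ready = true;
}

// Non-blocking check the main loop could call once per frame.
bool TryTake(BackgroundResult& in, std::vector<int>& dest) {
    std::lock_guard<std::mutex> lock(in.mutex);
    if (!in.ready) return false;
    dest = std::move(in.values);
    in.ready = false;
    return true;
}
```

The main loop keeps rendering and simply calls TryTake each frame until it returns true, so it never blocks on the worker.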


I would do this by creating a custom class derived from Object that stores the vector in a member, then passing that object as the extra parameter in the CreateThread function.
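In standard C++ terms, the same idea of one job object per thread that owns its own vector might look like this. It is a sketch only: Batch and ProcessInBatches are hypothetical names, std::thread stands in for the engine's CreateThread, and a square root stands in for the real workload.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-thread job: owns its inputs and outputs, shares nothing.
struct Batch {
    std::vector<float> inputs;
    std::vector<float> results;
    void Run() {
        results.reserve(inputs.size());
        for (float x : inputs) results.push_back(std::sqrt(x));
    }
};

// Deals the items out round-robin to thread_count batches, runs one thread
// per batch, then gathers the results back in the original order.
std::vector<float> ProcessInBatches(const std::vector<float>& items,
                                    std::size_t thread_count) {
    assert(thread_count > 0);
    std::vector<Batch> batches(thread_count);
    for (std::size_t i = 0; i < items.size(); ++i)
        batches[i % thread_count].inputs.push_back(items[i]);

    std::vector<std::thread> threads;
    for (auto& b : batches) threads.emplace_back(&Batch::Run, &b);
    for (auto& t : threads) t.join();

    // Item i was the (i / thread_count)-th entry of batch (i % thread_count).
    std::vector<float> out(items.size());
    for (std::size_t i = 0; i < items.size(); ++i)
        out[i] = batches[i % thread_count].results[i / thread_count];
    return out;
}
```

Since every thread reads and writes only its own Batch, no mutex or semaphore is needed until the results are gathered after join().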


  • Solution

Thanks, that certainly is the way to get the most out of the threads!  With this it completed in less than one second.  It was so quick that I had to make sure it was actually still processing the data.  Pretty sure it's still doing the same workload.  I can now speed up my voxel terrain by a lot!

#include "UltraEngine.h"
#include "ComponentSystem.h"

using namespace UltraEngine;

float DoThis() {
    auto r = 0.0f;
    for (int i = 0; i < 1000; i++) {
        r += sqrt(Random(2.0f, 1024.0f));
    }

    return r;
}

struct ThreadManager : public Object {
    int current_count = 0, total_count = 1250;//10,000 / 8 (I.e. 10,000 / MaxThreads())
    float result = 0.0f;
    function<float()> my_func;
};

shared_ptr<Object> RunThread(shared_ptr<Object> extra) {
    auto manager = extra->As<ThreadManager>();
    while (manager->current_count < manager->total_count) {
        if (manager->my_func != nullptr) { manager->result = manager->my_func(); }
        manager->current_count++;
    }

    return nullptr;
}

int main(int argc, const char* argv[])
{
    auto displays = GetDisplays();
    auto window = CreateWindow("Ultra Engine", 0, 0, 1280, 720, displays[0], WINDOW_CENTER | WINDOW_TITLEBAR);
    auto world = CreateWorld();
    auto framebuffer = CreateFramebuffer(window);

    auto camera = CreateCamera(world);
    camera->SetClearColor(0.125);
    camera->SetFov(70);
    camera->SetPosition(0, 0, -3);

    auto light = CreateDirectionalLight(world);
    light->SetRotation(35, 45, 0);
    light->SetRange(-10, 10);

    auto box = CreateBox(world);
    box->SetColor(0, 0, 1);

    auto actor = CreateActor(box);
    auto component = actor->AddComponent<Mover>();
    component->rotation.y = 45;


    struct ThreadStruct {
        shared_ptr<Thread> thread;
        shared_ptr<ThreadManager> manager;
    };
  
    auto available_threads = MaxThreads();
    vector<shared_ptr<ThreadStruct>> threads;
    threads.reserve(available_threads);
    for (int id = 0; id < available_threads; id++) {
        auto s = make_shared<ThreadStruct>();
        s->manager = make_shared<ThreadManager>();
        s->manager->my_func = DoThis;
        s->thread = CreateThread(RunThread, s->manager);
        
        threads.push_back(s);
    }

    int thread_index = 0;
    while (window->Closed() == false and window->KeyDown(KEY_ESCAPE) == false)
    {
        for (int id = 0; id < threads.size(); id++) {
            if (threads[id]->thread->GetState() == THREAD_FINISHED) {
                Print("Done Thread #" + String(thread_index) + " - " + String(threads[id]->manager->result));
                thread_index++;

                threads.erase(threads.begin() + id);
                id--;
            }
        }

        world->Update();
        world->Render(framebuffer);
    }
    return 0;
}

 


Do you require a mutex if you're just reading memory created on a different thread, or is it only needed for writing to that memory?

E.g. I create an octree on the main thread and then pass one of its nodes (via smart pointer) to a thread, where the node and its children are read in order to create vertices and indices, which are passed back to the main thread once the thread is done.

Can I create a new child node in the thread without a mutex if the parent node I passed to the thread is not being used in any other thread?

Or is it simply a rule: do not access (read or write) memory that is shared between threads, and could be accessed at the same time, under any circumstances?


Normally, I would use a mutex for writing and reading. 

Sample:

With a mutex on reads only:

Thread A : Writes the value 1 to node x --> Just begins writing

Thread B : Locks the mutex, reads the value 0 and unlocks the mutex --> Thread A hasn't finished writing the 1 into memory

Thread A : Finishes writing

Thread C : Locks the mutex, reads the value 1 and unlocks the mutex --> Thread A has finished writing the 1 into memory

The read results might get out of sync.

 

With a mutex on reads and writes:

Thread A : Locks the mutex and writes the value 1 to node x --> Just begins writing

Thread B : Waits for the mutex to be unlocked

Thread A : Unlocks the mutex --> Finished writing

Thread B : Locks the mutex, reads the value 1 from memory and unlocks the mutex afterwards

Thread C : Locks the mutex, reads the value 1 from memory and unlocks the mutex afterwards

The results are always in sync.

The read-and-write approach is of course much slower than locking only the reads. You need to keep the locked sections as small as possible, and maybe optimize them to lock only when it is really necessary.
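The read-and-write variant can be sketched as follows, assuming std::mutex rather than the engine's Mutex. Node and CountTornReads are hypothetical names; the two fields a_ and b_ stand for any data that must change together.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical node with two fields that must always change together.
class Node {
public:
    void Set(int v) {
        std::lock_guard<std::mutex> lock(mutex_);
        a_ = v;
        b_ = v;  // a reader must never observe a_ != b_
    }
    bool Consistent() const {
        std::lock_guard<std::mutex> lock(mutex_);  // readers lock too
        return a_ == b_;
    }
private:
    mutable std::mutex mutex_;
    int a_ = 0;
    int b_ = 0;
};

// One writer and one reader run side by side; returns how many
// half-written states the reader observed.
int CountTornReads(int iterations) {
    assert(iterations >= 0);
    Node node;
    int torn = 0;
    std::thread writer([&node, iterations] {
        for (int i = 0; i < iterations; ++i) node.Set(i);
    });
    std::thread reader([&node, &torn, iterations] {
        for (int i = 0; i < iterations; ++i)
            if (!node.Consistent()) ++torn;
    });
    writer.join();
    reader.join();
    return torn;
}
```

Each lock covers only two assignments or one comparison, which keeps the locked sections as small as the post recommends.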

 

 

 

 


Small addition: the int values in this sample are only used for simplicity, and the problem might not show up with them. Aligned int reads and writes are typically atomic on common hardware, so reading them without a lock tends to work in practice, although standard C++ only guarantees this for std::atomic<int>. More complex objects can of course behave differently and may need a read and write mutex or other kinds of memory barriers.


Yeah I was thinking it might just be safer to mutex the lot.  I'll probably end up making a system that creates all it needs in the thread and then passes the whole thing back to place into the octree.

I was just reading something similar here;

https://www.quora.com/Do-I-have-to-use-a-mutex-to-protect-shared-variables-that-I-use-for-read-only-purposes-without-in-place-modification-in-C++-multithreading

Right now I'm implementing a bool check in the thread to see if it should quit or not.  I figured I might not need a mutex for that.
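For a quit flag like this, one mutex-free option in standard C++ is std::atomic<bool>; a plain bool written by one thread and read by another is formally a data race. The sketch below uses hypothetical names (Poller, RunAndStop) and std::thread rather than the engine's Thread class.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical worker that polls a quit flag between units of work.
struct Poller {
    std::atomic<bool> quit{false};
    std::atomic<long> iterations{0};

    void Run() {
        while (!quit.load()) {
            ++iterations;  // placeholder for one unit of work
            std::this_thread::yield();
        }
    }
};

// Starts the worker, waits until it has run at least once, then asks it
// to stop and returns how many iterations it completed.
long RunAndStop() {
    Poller p;
    std::thread t(&Poller::Run, &p);
    while (p.iterations.load() == 0) std::this_thread::yield();
    p.quit.store(true);  // no mutex needed for a single atomic flag
    t.join();
    assert(p.quit.load());
    return p.iterations.load();
}
```

This only covers the flag itself; any larger data the thread shares still needs a mutex or a hand-off as discussed earlier in the thread.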


You may be using a lot of threads but you probably aren't using them at max capacity. If you check CPU usage you'll probably be surprised how little they are being used.

Ultra uses one high-priority thread for rendering. The main logic thread where your code executes pauses in intervals of 16 milliseconds, so CPU usage should be pretty low. Culling is on another thread, but usage there will be low also. Animation, physics, and navmesh building are each on separate threads, and the animation system may use many threads, but in most cases CPU usage on each will be low unless you are pushing that system.

I'd say MaxThreads() - 1 is a good general rule.
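As a small sketch of that rule using the standard library, where std::thread::hardware_concurrency() is assumed to play the role of MaxThreads() and WorkerCountFor and DefaultWorkerCount are hypothetical helper names:

```cpp
#include <cassert>
#include <thread>

// Leaves one hardware thread free for the engine's own threads,
// but never returns fewer than one worker.
unsigned WorkerCountFor(unsigned hardware_threads) {
    return hardware_threads > 1 ? hardware_threads - 1 : 1;
}

unsigned DefaultWorkerCount() {
    // hardware_concurrency() may return 0 when the value is unknown.
    const unsigned n = WorkerCountFor(std::thread::hardware_concurrency());
    assert(n >= 1);
    return n;
}
```

The clamp matters on single-core or unknown hardware, where subtracting one would otherwise leave no worker at all.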
