CPU multithreading is working!
-
- Posts: 28
- Joined: Sun May 13, 2012 7:14 am
Re: CPU multithreading is working!
Exciting! Will the MT version work on iOS devices?
-
- Posts: 149
- Joined: Fri May 24, 2013 6:08 am
Re: CPU multithreading is working!
I am 99% sure it will, since it now passes muster in msvc/gcc as well as being tested on android/windows/linux. I'm still waiting for D&B to clear us so I can sign on to Apple under my company. Once that happens I'll test mac and ios ... or someone else can take it for a whirl on those platforms in the meantime.
PS - Go go lunkhound
-
- Posts: 99
- Joined: Thu Nov 21, 2013 8:57 pm
Re: CPU multithreading is working!
The threadsafe version of Bullet relies mainly on C++11 atomics for thread synchronization. This should be widely available on all modern platforms (including iOS devices). If it doesn't work out of the box on iOS it should just be a trivial change to make the detection of atomics support more robust.
I'm not calling it "multithreaded Bullet" because that sort of implies that Bullet would be launching or managing threads in some way. It is more accurate to say that it is "threadsafe" (for certain operations -- 5 currently). All of the actual thread management is left to the client -- one example of which is the MultiThreadedDemo, which uses Intel's Threaded Building Blocks for thread management and task scheduling.
I did it this way because I didn't want to tie Bullet to any particular threading library. Many projects that might be using Bullet will already be using a task scheduler of some kind and will not want Bullet to force a different one on them.
So you'll also need to get a version of TBB for your platform to get the MultiThreadedDemo running "as is". However, it shouldn't be too difficult to convert MultiThreadedDemo to another task-scheduler/threadpool library. All of the TBB specific code is surrounded by "#if USE_TBB", and there isn't that much of it, and it all uses the same idiom -- parallel_for.
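As a rough illustration of how small the scheduler-facing surface is, here is a minimal sketch of a parallel_for-style helper built only on std::thread. The name parallelForSketch and its signature are hypothetical and are not part of Bullet or the demo. It also spawns fresh threads on every call, which a real task scheduler with a persistent pool would avoid.

```cpp
#include <algorithm>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a parallel_for call: splits [begin, end) into
// contiguous chunks and runs body(chunkBegin, chunkEnd) once per chunk.
// The body must be safe to call concurrently on disjoint ranges.
inline void parallelForSketch(int begin, int end, int numThreads,
                              const std::function<void(int, int)>& body)
{
    const int count = end - begin;
    if (count <= 0) return;
    numThreads = std::max(1, std::min(numThreads, count));
    const int chunk = (count + numThreads - 1) / numThreads; // ceiling division

    // Worker threads take every chunk except the first one.
    std::vector<std::thread> workers;
    for (int start = begin + chunk; start < end; start += chunk) {
        const int stop = std::min(start + chunk, end);
        workers.emplace_back([&body, start, stop] { body(start, stop); });
    }

    // The calling thread processes the first chunk instead of idling.
    body(begin, std::min(begin + chunk, end));

    // Wait for all workers before returning, like a blocking parallel_for.
    for (std::thread& worker : workers) worker.join();
}
```

A caller would pass a lambda that loops over its sub-range, which is the same shape as a body that receives a begin/end pair from the scheduler.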
The MultiThreadedDemo uses the "standard" Bullet components -- btDbvtBroadphase, btDiscreteDynamicsWorld, and btSequentialImpulseConstraintSolver. All 5 threadsafe ops have been tested with those components. If you are using one of the alternative solvers and/or physics worlds, then some of those may not be threadsafe.
The narrowphase should work regardless; as far as I know there is only one dispatcher to choose from.
The parallel island solving won't work with the MultiBodyDynamicsWorld because of a shared array of MultiBodyConstraints. And I have no idea about the soft-body dynamics world.
The MLCP solver is apparently not threadsafe. I don't know about any of the other alternative solvers.
The other 3 areas, predictUnconstraintMotion, createPredictiveContacts, and integrateTransforms are all based on overriding methods on the discrete dynamics world. They might also work on other physics worlds that are also derived from that one, but I don't know.
If you do decide to give it a spin, please post about it in this thread. I'll be glad to help out with any issues you run into.
Much thanks to c6burns for his help getting it working on Android/Linux/GCC, and for the CMake love!
-
- Posts: 456
- Joined: Tue Dec 25, 2007 1:06 pm
Re: CPU multithreading is working!
I don't usually like to pop in and interrupt an ongoing discussion; however, I'd like to raise a few brief questions/considerations, right after thanking lunkhound for having restored/rewritten/reconsidered/reforked the Bullet Multithreaded branch.
Here are my questions/considerations (I still haven't tested the library yet, I'm just taking a look at its code):
1) In the old/legacy Bullet Multithreaded all the callbacks (e.g. the custom material callback) were forbidden (= no more usable): does this limitation apply to your version as well?
2) As far as I can understand in your source code the threading stuff is confined to:
a) LinearMath/btThreads.h/cpp: for the basic stuff (e.g. mutex, lock, thread). This is the only part of the code that uses C++11, and it falls back to MSVC++ intrinsics if C++11 is not supported (there is no fallback on other systems).
b) Demos/MultiThreadedDemo/MultiThreadedDemo.h/cpp: for the advanced stuff. This is the only part of the code that uses the TBB library (for tbb::blocked_range AFAICS).
Is this correct?
I was thinking that maybe by using http://tinythreadpp.bitsnbites.eu/ (a subset of C++11 in a single cpp file with a liberal license) we could add a fallback to point a (or, with a clever use of typedefs and using directives, we could even think of switching seamlessly between C++11/boost_thread and tinythreads++ using some precompiler definition, by forcing these three libraries to use the same API).
Maybe for point b we can think about providing an alternative demo using OpenMP as a replacement for TBB (however, I'm not sure if there's something compatible with tbb::blocked_range: I'll have to check the OpenMP API). Probably the best option here would be not to use any additional advanced library at all, and to build some infrastructure using btThreads.h/cpp to have fewer dependencies, but I guess it won't be as easy/efficient, so the current approach is OK for me.
IMPORTANT: These are just some optional ideas in case you find them useful.
I'm already satisfied with your work so far. Thank you!
-
- Posts: 149
- Joined: Fri May 24, 2013 6:08 am
Re: CPU multithreading is working!
Flix wrote: I don't usually like to pop-in and interrupt an ongoing discussion
I don't think you're interrupting at all. The more the merrier!
2a) yes, just the locks in btThreads use c++11 atomics
2b) yes, just the code in that demo uses TBB for threading
I think it's smart to have kept threading out of Bullet itself. I am ambivalent about threading libraries, but I realize many people are not, and this is a good strategy for wider adoption without a threading holy war. For now I am content with adding TBB as a dependency to my project and leveraging what lunkhound has done in that demo. Personally, I just wanted an easy win to push my simulations a bit farther with minimal effort, as I am in a fairly late stage of development in my current project.
-
- Posts: 99
- Joined: Thu Nov 21, 2013 8:57 pm
Re: CPU multithreading is working!
Flix wrote: I don't usually like to pop-in and interrupt an ongoing discussion: however I'd like to make a few brief questions/considerations, soon after having thanked lunkhound for having restored/rewritten/reconsidered/reforked the Bullet Multithreaded branch. Here are my questions/considerations (I still haven't tested the library yet, I'm just taking a look at its code): 1) In the old/legacy Bullet Multithreaded all the callbacks (e.g. the custom material callback) were forbidden (= no more usable): does this limitation apply to your version as well?
No, I can't see any reason why such callbacks would cause a problem. In my own code I'm using the gContactAddedCallback without issues so far. Obviously the callback will have to be threadsafe.
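To make "the callback will have to be threadsafe" a bit more concrete, here is a minimal sketch of a callback that touches shared state only through an atomic. The function onContactAddedSketch, its parameters, and the counter are hypothetical stand-ins; they do not reproduce the actual gContactAddedCallback signature.

```cpp
#include <atomic>
#include <thread>

// Shared state that several worker threads may update at the same time.
// (Hypothetical name, for illustration only.)
std::atomic<int> g_contactCountSketch{0};

// Hypothetical callback: it may run concurrently on several threads, so it
// modifies shared data only through an atomic read-modify-write operation.
inline void onContactAddedSketch(int /*bodyIndexA*/, int /*bodyIndexB*/)
{
    g_contactCountSketch.fetch_add(1, std::memory_order_relaxed);
}
```

Anything more elaborate than a counter (for example appending to a shared list) would need a lock or per-thread storage instead.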
Flix wrote: 2) As far as I can understand in your source code the threading stuff is confined to: a) LinearMath/btThreads.h/cpp: for the basic stuff (e.g. mutex,lock,thread). This is the only part of the code that uses C++11 and defaults to MSVC++ intrinsics if not supported (there is no fallback on other systems). b) Demos/MultiThreadedDemo/MultiThreadedDemo.h/cpp: for the advanced stuff. This is the only part of the code that uses the TBB library (for tbb::blocked_range AFAICS). Is this correct?
a - Yes. Pretty much all that is needed for (a) is enough atomic operations to make a basic mutex. I started out using OS provided mutexes (i.e. Windows critical sections), but they were slow compared to the lightweight mutex I ended up with. It should be quite straightforward to add more fallbacks as needed.
b - Yes, the MultiThreadedDemo is where all of the actual thread management takes place. It uses TBB to initialize/cleanup a threadpool, and it uses tbb::parallel_for to send tasks to the thread pool. The blocked_range is just a struct of ints to pass along the begin and end of the for-loop.
Flix wrote: I was thinking that maybe by using http://tinythreadpp.bitsnbites.eu/ (a subset of C++11 in a single cpp file with a liberal license) we could add a fallback to point a (or, with a clever use of typedefs and using directives, we could even think of switching seamlessly between C++11/boost_thread and tinythreads++ using some precompiler definition, by forcing these three libraries to use the same API).
I think since the requirements are so small for what btThreads actually needs (it's really just a compare-and-swap, and atomic load) that it doesn't really make sense to add any extra dependencies on external libraries for it. A few lines of asm or intrinsics for various fallback cases should cover it. The btMutex class is very similar to tinythreads::fast_mutex. Although I don't like the fact that the fast_mutex header includes other headers for the sake of inlining its member functions.
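For readers wondering what a mutex built from just a compare-and-swap and an atomic load might look like, here is a minimal sketch using C++11 std::atomic. The class SpinMutexSketch is hypothetical; it is not the actual btMutex code, only an illustration of the general technique.

```cpp
#include <atomic>
#include <thread>

// Hypothetical lightweight spin mutex; NOT the actual btMutex implementation.
class SpinMutexSketch
{
public:
    void lock()
    {
        for (;;) {
            int expected = 0;
            // Compare-and-swap: take the lock only if it is currently free.
            if (m_locked.compare_exchange_weak(expected, 1,
                                               std::memory_order_acquire,
                                               std::memory_order_relaxed)) {
                return;
            }
            // Wait on a plain atomic load until the lock looks free again,
            // yielding so the current holder gets a chance to run.
            while (m_locked.load(std::memory_order_relaxed) != 0) {
                std::this_thread::yield();
            }
        }
    }

    bool tryLock()
    {
        int expected = 0;
        return m_locked.compare_exchange_strong(expected, 1,
                                                std::memory_order_acquire,
                                                std::memory_order_relaxed);
    }

    // Release ordering publishes the protected writes to the next owner.
    void unlock() { m_locked.store(0, std::memory_order_release); }

private:
    std::atomic<int> m_locked{0};
};
```

A spin lock like this only pays off when critical sections are very short; under heavy contention an OS mutex can behave better.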
Flix wrote: Maybe for point b we can think about providing an alternative demo using OpenMP as a replacement of TBB (however. I'm not sure if there's something compatible to tbb::blocked_range: I'll have to check the openMP API). Probably here the best option would be not to use any additional advanced library at all, and build some infrastructure using btThreads.h/cpp to have less dependencies, but I guess it won't be as easy/efficient: so it's OK for me.
I was thinking of OpenMP as well (as an alternative to TBB). I think it does have parallel_for, but I haven't actually used it. It would be nice to have at least 2 choices for task schedulers to underscore the fact that this isn't tied to the architecture of any one particular task scheduler.
TBB works very well, but I don't like how bloated it is. The code for it is incomprehensible -- just layers upon layers of templates scattered across dozens and dozens of headers. But it seems to be the "standard" and is cross-platform, and doesn't require any special compiler support.
Another option here would be something like JobSwarm which is a simple task scheduler in just a few source files. It just needs a bit of work to make it more cross platform friendly.
-
- Posts: 99
- Joined: Thu Nov 21, 2013 8:57 pm
Re: CPU multithreading is working!
Alright, now you can choose between TBB and OpenMP.
TBB is the default. To switch to OpenMP, open up MultiThreadedDemo.cpp and set USE_TBB to 0 and USE_OPENMP to 1.
Then make sure your compiler options are set to OpenMP mode.
Performance between the two seems about the same on my machine.
Oh, I noticed that right now the MultiThreadedDemo is hardcoding the number of threads to 4. If you are doing performance testing, make sure to set numThreads to match your hardware!
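For anyone who has not used OpenMP, the loop-level idiom looks roughly like the sketch below. The function scaleAll is a hypothetical example and is not taken from MultiThreadedDemo. "OpenMP mode" typically means a flag such as -fopenmp (GCC/Clang) or /openmp (MSVC); without it the pragma is ignored and the loop simply runs serially.

```cpp
#include <vector>

// Hypothetical per-element update, standing in for independent per-body work.
inline void scaleAll(std::vector<float>& values, float factor)
{
    // A signed loop index keeps the loop valid for older OpenMP versions.
    const int count = static_cast<int>(values.size());

    // With OpenMP enabled, iterations are divided among the thread team.
    // Each iteration writes a different element, so no locking is needed.
    #pragma omp parallel for
    for (int i = 0; i < count; ++i) {
        values[i] *= factor;
    }
}
```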
-
- Posts: 463
- Joined: Fri Nov 30, 2012 4:50 am
Re: CPU multithreading is working!
lunkhound wrote: The MLCP solver is apparently not threadsafe. I don't know about any of the other alternative solvers.
I'll double check my code again (apparently thread-safe now, but I still want to make sure), and then I'll post a patch on that issue.
lunkhound wrote: Oh, I noticed that right now the MultiThreadedDemo is hardcoding the number of threads to 4. If you are doing performance testing, make sure to set numThreads to match your hardware!
Yup, on Windows it's actually easy enough; you just need to use some derivative of:
Code: Select all
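#include <windows.h> // SYSTEM_INFO and GetSystemInfo are declared via windows.h
SYSTEM_INFO sysinfo;
GetSystemInfo(&sysinfo);
// dwNumberOfProcessors counts logical processors in the current processor group
numThreads = sysinfo.dwNumberOfProcessors; // not recommended for multiprocessor though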
-
- Posts: 149
- Joined: Fri May 24, 2013 6:08 am
Re: CPU multithreading is working!
Since we've already gone down the C++11 road, you could use:
Code: Select all
std::thread::hardware_concurrency()
Otherwise it's a big pain to write a cross-platform method. There is one in Ogre 2.0, but I like the one-line C++11 method.
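One caveat with the one-liner: std::thread::hardware_concurrency() is only a hint and may return 0 when the value cannot be determined. A minimal sketch of a guarded version follows; the helper name chooseThreadCount is hypothetical, and the default fallback of 4 simply mirrors the demo's previously hardcoded value.

```cpp
#include <thread>

// Hypothetical helper: returns the detected hardware thread count, or a
// caller-supplied fallback when detection is unavailable.
inline unsigned int chooseThreadCount(unsigned int fallback = 4)
{
    const unsigned int detected = std::thread::hardware_concurrency();
    if (detected != 0) {
        return detected;
    }
    // hardware_concurrency() reported 0 (unknown); never return 0 threads.
    return fallback != 0 ? fallback : 1;
}
```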
-
- Posts: 99
- Joined: Thu Nov 21, 2013 8:57 pm
Re: CPU multithreading is working!
It turns out both TBB and OpenMP have similar functions to query the number of hardware threads. For OpenMP it's:
Code: Select all
omp_get_max_threads();
and for TBB it's:
Code: Select all
tbb::task_scheduler_init::default_num_threads();
So I removed the hardcoding to 4 threads. My machine is a 4-core with hyperthreading, so both of those report 8 threads on mine.
What's interesting is that with TBB, I see a noticeable performance improvement using 8 threads vs 4. I didn't really expect that... hyperthreading is actually good for something!
With OpenMP on the other hand, performance gets wildly inconsistent and really bad using 8 threads vs 4. There is something really strange going on there -- the profiling indicates that these performance spikes are coming ONLY from predictUnconstraintMotion. It's kinda baffling.
Oh and the demo now lets you dial the number of threads up or down using '+' and '-' keys.
-
- Posts: 456
- Joined: Tue Dec 25, 2007 1:06 pm
Re: CPU multithreading is working!
Just to say I'm very happy I can use callbacks and the OpenMP version!
Only one additional thing:
On Linux 64bit in LinearMath/btThreads.cpp, I had to change:
Code: Select all
bool btIsAligned( const void* ptr, unsigned int alignment )
{
    btAssert( ( alignment & ( alignment - 1 ) ) == 0 ); // alignment should be a power of 2
    return ( ( (unsigned int) ptr )&( alignment - 1 ) ) == 0;
}
to:
Code: Select all
bool btIsAligned( const void* ptr, unsigned int alignment )
{
    btAssert( ( alignment & ( alignment - 1 ) ) == 0 ); // alignment should be a power of 2
    return ( ( (size_t) ptr )&( alignment - 1 ) ) == 0;
}
otherwise the cast loses precision and the code won't compile without using the -fpermissive flag on gcc.
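As a side note, the integer type intended for holding a pointer value is std::uintptr_t from <cstdint>. The sketch below shows the same power-of-two alignment check written with it; the name isAlignedSketch is hypothetical and this is not the code that ended up in btThreads.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical variant of the alignment check using std::uintptr_t, an
// unsigned integer type wide enough to hold a pointer on 32- and 64-bit targets.
inline bool isAlignedSketch(const void* ptr, std::uintptr_t alignment)
{
    // alignment must be a non-zero power of 2 for the mask trick to work
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    // The low bits of an aligned address are all zero.
    return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}
```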
-
- Posts: 99
- Joined: Thu Nov 21, 2013 8:57 pm
Re: CPU multithreading is working!
Flix wrote: Just to say I'm very happy I can use callbacks and the OpenMP version! Only one additional thing: On Linux 64bit in LinearMath/btThreads.cpp, I had to change:
Code: Select all
bool btIsAligned( const void* ptr, unsigned int alignment )
{
    btAssert( ( alignment & ( alignment - 1 ) ) == 0 ); // alignment should be a power of 2
    return ( ( (unsigned int) ptr )&( alignment - 1 ) ) == 0;
}
to:
Code: Select all
bool btIsAligned( const void* ptr, unsigned int alignment )
{
    btAssert( ( alignment & ( alignment - 1 ) ) == 0 ); // alignment should be a power of 2
    return ( ( (size_t) ptr )&( alignment - 1 ) ) == 0;
}
otherwise the cast loses precision and the code won't compile without using the -fpermissive flag on gcc.
Good find, thanks! I'll fix it. Glad to know it's been tested on 64-bit!
[edit:] fixed.
-
- Posts: 77
- Joined: Tue Dec 27, 2011 11:51 am
Re: CPU multithreading is working!
EDIT: This is working great. Any chance this would get merged back into the main trunk?
-
- Posts: 99
- Joined: Thu Nov 21, 2013 8:57 pm
Re: CPU multithreading is working!
Granyte wrote: EDIT: This is working great any chance this would get merged back into the main trunk
I'm planning to make a pull request soon. I'm just waiting to see if any bugs crop up and hopefully get a few more reports from people using it.
I'm still very interested in hearing about how it works on various platforms/OSes/compilers. As far as I've heard thus far, it has only been tested with MSVC 2013/Windows, GCC/Android, and GCC/Linux.
And so far no reports about the performance aside from what I reported.
Glad to hear it's working great. Can you elaborate on that a bit?
[edit] I went ahead and made a pull request. I figure it may take a while, and may as well get it started.
-
- Posts: 77
- Joined: Tue Dec 27, 2011 11:51 am
Re: CPU multithreading is working!
Well, so far it's working well, but I have other issues with Bullet physics that, until resolved, prevent me from testing more in depth.