CPU multithreading is working!

lunkhound
Posts: 99
Joined: Thu Nov 21, 2013 8:57 pm

Re: CPU multithreading is working!

Post by lunkhound » Thu Nov 03, 2016 7:21 pm

Really glad to see the merge, thank you!

One thing that I'd like to mention for anyone trying to integrate multithreaded bullet into their own project:

There's a "gotcha" that happens if the pool allocator is called from a translation unit (.cpp file) outside of Bullet (like in the collision dispatcher derived class). If that external file is compiled without BT_THREADSAFE=1, then the pool allocator inline functions won't use the mutex, and you'll have a race condition.

There is a simple fix, just add this line:

Code:

#define BT_THREADSAFE 1
Put it at the top of any .cpp file in your project that calls the pool allocator. It must come BEFORE any Bullet includes.
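To illustrate why the define has to precede the includes, here is a minimal, self-contained sketch. `MockPoolAllocator` and `countLockedAllocations` are hypothetical stand-ins, not Bullet code; the only thing taken from the post above is the idea that inline functions in a header are compiled with whatever value of BT_THREADSAFE the including file has.

```cpp
#include <cassert>
#include <mutex>

// In a real project this line sits at the very top of the .cpp file,
// before any Bullet include.
#define BT_THREADSAFE 1

// ---- Stand-in for a library header (hypothetical, not Bullet code) ----
#ifndef BT_THREADSAFE
#define BT_THREADSAFE 0  // header default: single-threaded build
#endif

struct MockPoolAllocator {
    std::mutex mutex;
    int freeCount = 4;
    int lockedCalls = 0;  // counts allocations that took the mutex

    // Defined in the header, so the body is compiled in whichever
    // translation unit includes it, using that unit's BT_THREADSAFE value.
    bool allocate() {
#if BT_THREADSAFE
        std::lock_guard<std::mutex> lock(mutex);
        ++lockedCalls;
#endif
        if (freeCount == 0) return false;
        --freeCount;
        return true;
    }
};
// ---- End of stand-in header ----

// Returns how many of `n` allocations were protected by the mutex.
int countLockedAllocations(int n) {
    MockPoolAllocator pool;
    for (int i = 0; i < n; ++i) pool.allocate();
    return pool.lockedCalls;
}
```

If the `#define BT_THREADSAFE 1` line were missing or placed after the stand-in header, the same function would compile without the lock and `countLockedAllocations` would report zero locked calls.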

This could be fixed in bullet by moving those pool allocator methods into a .cpp file in bullet, but then the functions can't be inlined and that could have a potential performance hit for single threaded bullet. I tried very hard to avoid compromising single-threaded bullet with my changes. Erwin, do you have a sense of how performance sensitive those pool allocator functions are?

On the graph coloring constraint solver, I may take a crack at that. I'd really like to see that too.

kermado
Posts: 20
Joined: Tue Jan 12, 2016 11:20 am

Re: CPU multithreading is working!

Post by kermado » Fri Nov 04, 2016 12:56 pm

lunkhound wrote:On the graph coloring constraint solver, I may take a crack at that. I'd really like to see that too.
Yes please! I'd love to see this added.

aviator777
Posts: 1
Joined: Thu Apr 02, 2015 5:02 pm

Re: CPU multithreading is working!

Post by aviator777 » Sat Apr 15, 2017 2:15 pm

Without going into deep detail or reading through the whole topic, what are the minimum steps required to set up basic physics multithreading?
Is it just replacing btDiscreteDynamicsWorld with btDiscreteDynamicsWorldMt?

benelot
Posts: 350
Joined: Sat Jul 04, 2015 10:33 am
Location: Bern, Switzerland

Re: CPU multithreading is working!

Post by benelot » Sat Apr 15, 2017 6:53 pm

Check out the multithreaded demo in the example browser:
You should see easily what is necessary to set it up:
https://github.com/bulletphysics/bullet ... readedDemo

lunkhound
Posts: 99
Joined: Thu Nov 21, 2013 8:57 pm

Re: CPU multithreading is working!

Post by lunkhound » Sun Apr 16, 2017 1:19 pm

aviator777 wrote:Without going into deep detail or reading through the whole topic, what are the minimum steps required to set up basic physics multithreading?
Is it just replacing btDiscreteDynamicsWorld with btDiscreteDynamicsWorldMt?
No, it isn't that simple. Bullet 2.x is single-threaded by default and does not have its own threading system (task manager/job manager).

However, you can use CMake to configure the Bullet 2.x core libraries (LinearMath, BulletCollision, BulletDynamics) to have certain operations be threadsafe.
In addition, you can also use CMake to configure some of the Bullet examples to run multi-threaded using certain supported threading APIs (OpenMP, Intel's TBB (Threading Building Blocks), and Microsoft's PPL (Parallel Patterns Library)).

If you want to integrate Bullet with multi-threading into your own application, you'll want to build the multi-threaded demo (as detailed below) so you have a working example to look at. Then you'll need to decide which threading task manager you want to use -- OpenMP, TBB, or even a custom one if your application already has a threading system (as long as it can support a "parallel-for" similar to what OpenMP and TBB have).
Then have a look at the source file https://github.com/bulletphysics/bullet ... MTBase.cpp
This is where all of the multithreading is done. Look at every instance in that file where "parallelFor" is called. You'll want to copy much of the code from that file into your own application (minus the example framework stuff).
This gives you the most flexibility in how Bullet multi-threading works in your application. If the multi-threading was built into the core Bullet libraries, then it would be tied to a specific threading API which could conflict with how the application uses threads. I wish it was simpler.
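As a rough idea of what a "parallel-for" from a custom threading system looks like, here is a minimal sketch built on C++11 `std::thread`. The names `simpleParallelFor` and `squareAll` are hypothetical and are not part of Bullet or its demo code. It also spawns fresh threads on every call for brevity, which a production scheduler would normally avoid by keeping a persistent worker pool.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper, not part of Bullet: splits [begin, end) into
// contiguous chunks and runs body(chunkBegin, chunkEnd) on each chunk
// in its own std::thread.
void simpleParallelFor(int begin, int end, int threadCount,
                       const std::function<void(int, int)>& body) {
    const int total = end - begin;
    if (total <= 0) return;
    threadCount = std::max(1, std::min(threadCount, total));
    const int chunk = (total + threadCount - 1) / threadCount;

    std::vector<std::thread> workers;
    for (int start = begin; start < end; start += chunk) {
        const int stop = std::min(start + chunk, end);
        workers.emplace_back([&body, start, stop] { body(start, stop); });
    }
    for (std::thread& t : workers) t.join();  // wait for every chunk
}

// Example workload: square every index. Each element is written by
// exactly one thread, so no locking is needed.
std::vector<int> squareAll(int count, int threadCount) {
    std::vector<int> out(count, 0);
    simpleParallelFor(0, count, threadCount, [&out](int b, int e) {
        for (int i = b; i < e; ++i) out[i] = i * i;
    });
    return out;
}
```

The key property is that the body receives an index range and may run concurrently with other ranges, so any data shared between ranges has to be protected or partitioned.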

Here is how you use CMake to configure the various options:

For OpenMP there are no external dependencies, so enabling it is just a matter of turning on a flag in CMake. OpenMP works with Windows/Visual Studio and also Linux/GCC. When I tested it with Linux/Clang there was a problem with a missing "omp.h" system header, so it doesn't work with Clang at the moment.

Microsoft's Parallel Patterns Library (PPL) is also supported and has no dependencies, but it's Visual Studio only.

For Intel's TBB you'll need to install it separately, and then point CMake to the include and lib directories.
Download a precompiled distribution of TBB from https://www.threadingbuildingblocks.org/download (choose your OS), or grab it from https://github.com/wjakob/tbb and build it from source. The latest version of TBB should work (I just tested it on Windows); I've also tested versions 4.3 and 4.4 in the past with no issues. The version on GitHub is unofficial but adds a CMake build system, which could be nice for those who wish to build from source.

To configure Bullet for the multithreaded demo, use the CMake build system to enable the flags

- BULLET2_USE_THREAD_LOCKS
- BUILD_BULLET2_DEMOS

and at least one of the following flags:

- BULLET2_MULTITHREADED_OPEN_MP_DEMO (Windows/Visual Studio and linux/GCC; not Clang)
- BULLET2_MULTITHREADED_PPL_DEMO (Windows/Visual Studio 2010 or later only)
- BULLET2_MULTITHREADED_TBB_DEMO (requires external dependency -- see below)

Once TBB is installed somewhere, go back to the Bullet CMake configuration and set paths for:

- BULLET2_TBB_INCLUDE_DIR (point this to the top "include" directory in the TBB install)
- BULLET2_TBB_LIB_DIR (point this to the directory with the correct "tbb.lib" file for your OS and compiler)

aviator
Posts: 13
Joined: Thu Apr 02, 2015 5:15 pm

Re: CPU multithreading is working!

Post by aviator » Sun Apr 16, 2017 4:48 pm

lunkhound wrote: No, it isn't that simple. Bullet 2.x is single-threaded by default and does not have its own threading system (task manager/job manager).

However, you can use CMake to configure the Bullet 2.x core libraries (LinearMath, BulletCollision, BulletDynamics) to have certain operations be threadsafe.
In addition, you can also use CMake to configure the bullet examples ...
Thank you, lunkhound, for the detailed answer. This is what I was fearing; it seems that our project will need to stick with the single-threaded version for the time being.
The problem is that the framework provided with Bullet makes the whole thing more difficult to understand. One day we could create detailed documentation for Bullet Physics.

lunkhound
Posts: 99
Joined: Thu Nov 21, 2013 8:57 pm

Re: CPU multithreading is working!

Post by lunkhound » Sun Apr 16, 2017 8:53 pm

aviator wrote:
lunkhound wrote: No, it isn't that simple. Bullet 2.x is single-threaded by default and does not have its own threading system (task manager/job manager).

However, you can use CMake to configure the Bullet 2.x core libraries (LinearMath, BulletCollision, BulletDynamics) to have certain operations be threadsafe.
In addition, you can also use CMake to configure the bullet examples ...
Thank you, lunkhound, for the detailed answer. This is what I was fearing; it seems that our project will need to stick with the single-threaded version for the time being.
The problem is that the framework provided with Bullet makes the whole thing more difficult to understand. One day we could create detailed documentation for Bullet Physics.
I find it far more useful to have a working example than detailed documentation. Bullet has lots of great examples (including multithreading).
The problem with detailed documentation is that it tends to get out-of-date as the code changes, and then it is unreliable. Most people don't want to write documentation in their free time, so it tends to not happen.

Mako_energy02
Posts: 171
Joined: Sun Jan 17, 2010 4:47 am

Re: CPU multithreading is working!

Post by Mako_energy02 » Wed May 17, 2017 1:48 am

I have recently gotten around to finally updating Bullet to the latest trunk version, and I was looking forward to the multi-threading outlined in this thread. However, looking over the instructions (a few posts up) and assessing the work that would need to be done on my part to get it running is a bit disheartening. The amount of code that needs to be added to get what should be an out-of-the-box feature is staggering.

Is there anything that prohibits a deeper level of integration for multi-threading? I understand it would require some refactoring of fairly core classes, but is that a no-no?

Also, I have to bite on the documentation comment. You need both (demos and docs). You really, really do. Demos show how to assemble the demos; some things can be modified, but as soon as the demo breaks, all hell breaks loose. Documentation gives you the means of making sense of the break, as well as a stronger foundation of knowledge to make the necessary edits. Without both you can't generate proper experts in a timely manner, which Bullet suffers from greatly.

As for the issues of making documentation...treat it like you would code. Literally. The code is not done until it's done. The refactor is not done until the documentation is updated. If you really want to go crazy, do what I do, integrate doxygen into your IDE such that you run doxygen every compilation and any doxygen warning is treated like an error during compilation. That forces you into maintaining docs well. Good documentation isn't a hard thing to do. Just too many people are stubborn about it.

lunkhound
Posts: 99
Joined: Thu Nov 21, 2013 8:57 pm

Re: CPU multithreading is working!

Post by lunkhound » Sun May 21, 2017 12:40 am

Mako_energy02 wrote:I have recently gotten around to finally updating Bullet to the latest trunk version, and I was looking forward to the multi-threading outlined in this thread. However, looking over the instructions (a few posts up) and assessing the work that would need to be done on my part to get it running is a bit disheartening. The amount of code that needs to be added to get what should be an out-of-the-box feature is staggering.

Is there anything that prohibits a deeper level of integration for multi-threading? I understand it would require some refactoring of fairly core classes, but is that a no-no?
Well Erwin wants to focus on Bullet 3, so I think he is resistant to any substantial changes to Bullet 2.x. Certainly API breaking changes are a no-no. Also anything that degrades the performance of single-threaded Bullet is a no-no.

I agree that the way that multi-threading is integrated into Bullet leaves a lot to be desired.

The good news is that most of the required changes to core classes are already in. The only changes that live outside of the Bullet 2 libraries (in examples/MultiThreadedDemo/CommonRigidBodyMTBase.cpp) are the parts that need to call into the task-scheduler (the task scheduler interface is in examples/MultiThreadedDemo/ParallelFor.h).
If we were to move ParallelFor.h into the core lib then we could integrate the rest of it as well. It could be done in a way that doesn't touch the regular Bullet classes and only adds to the "Mt" classes that already exist.

The ParallelFor.h header would need some cleanup before it could go into the core lib though. I'll need to think about how that could be done.

lunkhound
Posts: 99
Joined: Thu Nov 21, 2013 8:57 pm

Re: CPU multithreading is working!

Post by lunkhound » Sun May 21, 2017 6:44 pm

By the way, if you were getting a crash in the example browser when turning on multi-threading, it should be fixed in the latest.

lunkhound
Posts: 99
Joined: Thu Nov 21, 2013 8:57 pm

Re: CPU multithreading is working!

Post by lunkhound » Sun Jun 04, 2017 3:08 am

This change

https://github.com/bulletphysics/bullet3/pull/1144

which has now been merged, makes it substantially easier to integrate multithreaded Bullet 2.x into a project.
It is no longer necessary to subclass any Bullet classes at all. Just do these things:
- Use btCollisionDispatcherMt and btDiscreteDynamicsWorldMt instead of the usual (non-Mt) versions
- Call btSetTaskScheduler() and pass in your preferred task scheduler
There are now 4 task schedulers provided:
- OpenMP
- Intel TBB
- PPL
- "Thread Support" - a basic task scheduler implemented on either Windows threads or Posix threads
The Thread Support task scheduler is not yet included in the core libs, so to use it in your own project you'll need to copy a handful of source files out of examples/MultiThreading.
But I expect it will be moved into the core libs before long.

The CMake flags for OpenMP/TBB/PPL have been renamed to reflect the fact that they are included in the core libs and not just part of some demo.
The new names are:
- BULLET2_USE_OPEN_MP_MULTITHREADING
- BULLET2_USE_TBB_MULTITHREADING
- BULLET2_USE_PPL_MULTITHREADING
Here is a code snippet showing how one would set up a physics world for multithreading using OpenMP:

Code:

m_collisionConfiguration = new btDefaultCollisionConfiguration();
// set up multithreading for narrowphase collision detection
m_dispatcher = new btCollisionDispatcherMt( m_collisionConfiguration, 40 );
m_broadphase = new btDbvtBroadphase();
btConstraintSolverPoolMt* solverPool = new btConstraintSolverPoolMt( BT_MAX_THREAD_COUNT );
m_dynamicsWorld = new btDiscreteDynamicsWorldMt( m_dispatcher, m_broadphase, solverPool, m_collisionConfiguration );
m_solver = solverPool;
// set OpenMP as task scheduler (requires CMake flag BULLET2_USE_OPEN_MP_MULTITHREADING set)
btSetTaskScheduler( btGetOpenMPTaskScheduler() );
Note the use of "Mt" suffixed classes.

Mako_energy02
Posts: 171
Joined: Sun Jan 17, 2010 4:47 am

Re: CPU multithreading is working!

Post by Mako_energy02 » Tue Jun 06, 2017 3:02 am

I read through the PR (I hadn't realized it existed until you posted! D:) and it looks like it contains a bunch of QoL improvements. Thanks.

However, a singular thought occurs to me since you opted to create a Multi-threaded (Mt) world class. What about soft bodies?

lunkhound
Posts: 99
Joined: Thu Nov 21, 2013 8:57 pm

Re: CPU multithreading is working!

Post by lunkhound » Wed Jun 07, 2017 4:33 pm

Mako_energy02 wrote:I read through the PR (I hadn't realized it existed until you posted! D:) and it looks like it contains a bunch of QoL improvements. Thanks.

However, a singular thought occurs to me since you opted to create a Multi-threaded (Mt) world class. What about soft bodies?
The Mt dynamics world doesn't support soft bodies. Since the soft body world class is derived from the regular (non-Mt) world class, it won't run the simulation islands in parallel. However, you can still use the Mt collision dispatcher -- that should work with any of the physics worlds.

The multi-body physics world also doesn't run islands in parallel and there are a few reasons for that (which may have been mentioned earlier in this thread, I can't remember). Anyway one of the big reasons for creating the Mt-world was to avoid breaking the multi-body world.

I'm not that familiar with the soft body code, but from a glance I can't see any reasons why it couldn't run islands in parallel. If you need soft bodies and want to try running the islands in parallel, try modifying the soft body world to derive from the Mt-world and see if it works.

If you try it please report back, I'd be curious to know if it works.

Mako_energy02
Posts: 171
Joined: Sun Jan 17, 2010 4:47 am

Re: CPU multithreading is working!

Post by Mako_energy02 » Sun Aug 20, 2017 12:00 am

A slight progress update on my experiences using the multithreaded stuff. First, I haven't gotten around to making a btSoftRigidDynamicsWorldMt class yet. I've only been treating this as a side project to other goals. But...other stuff I've found...

I didn't want to use any of the TaskSchedulers you provided as I didn't want to add any extra dependencies to the system. I briefly considered pulling the "Default" task scheduler provided in the demos that uses the older thread support classes. However I don't know why they weren't included in the first place so I am unsure how fit they are for production use. Additionally, it's quite a bit of code spread out across more than a few files. That's a bit more than I am comfortable with grabbing and dropping into my project. So I ended up writing a super basic TaskScheduler based on C++11 threading. Nothing fancy about it. Side note: I had to add declarations for "void btPushThreadsAreRunning()" and "void btPopThreadsAreRunning()" to btThreads.h to write it like the other provided schedulers.

Initially I was getting crashes as soon as stepSimulation ran with bodies in the world. This turned out to be fixed by calling:

Code:

btResetThreadIndexCounter();
I was just throwing stuff at the wall to see what stuck. Why does that make or break the system?

After "resolving" that, I am now faced with what appears to be an infinite loop that occurs when an object enters the AABB of/collides with another object using a "btBvhTriangleMeshShape". I haven't yet completely ruled out my own code being the culprit. Is this something you tested?

lunkhound
Posts: 99
Joined: Thu Nov 21, 2013 8:57 pm

Re: CPU multithreading is working!

Post by lunkhound » Tue Aug 22, 2017 6:35 am

Mako_energy02 wrote:A slight progress update on my experiences using the multithreaded stuff. First, I haven't gotten around to making a btSoftRigidDynamicsWorldMt class yet. I've only been treating this as a side project to other goals. But...other stuff I've found...

I didn't want to use any of the TaskSchedulers you provided as I didn't want to add any extra dependencies to the system. I briefly considered pulling the "Default" task scheduler provided in the demos that uses the older thread support classes. However I don't know why they weren't included in the first place so I am unsure how fit they are for production use. Additionally, it's quite a bit of code spread out across more than a few files. That's a bit more than I am comfortable with grabbing and dropping into my project. So I ended up writing a super basic TaskScheduler based on C++11 threading. Nothing fancy about it. Side note: I had to add declarations for "void btPushThreadsAreRunning()" and "void btPopThreadsAreRunning()" to btThreads.h to write it like the other provided schedulers.
When first trying to stand up a project based on multi-threaded Bullet, I highly recommend getting it stood up with one of the provided task schedulers as the first step.
Then, once that part is working, work towards writing your own task scheduler. And if any crashes or weird bugs show up, you can switch back to the provided task scheduler to see if the problem goes away.
Otherwise, it's hard to know where the problem is coming from.
If your compiler has OpenMP support (which I believe most do), you could try that task scheduler as it has no dependencies -- i.e. it is a compiler feature, not a library.
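For reference, a parallel-for in plain OpenMP looks roughly like the sketch below. This is generic OpenMP usage and not Bullet's OpenMP task scheduler; `doubleAll` is a hypothetical example function. It assumes OpenMP is switched on with the compiler's own flag (for example `-fopenmp` on GCC or `/openmp` on Visual Studio).

```cpp
#include <cassert>
#include <vector>

// Doubles every element. With OpenMP enabled the loop iterations are
// shared among threads; without it the pragma is ignored and the loop
// runs serially with the same result.
std::vector<int> doubleAll(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    const int n = static_cast<int>(in.size());
#pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        out[i] = 2 * in[i];  // each index is written by exactly one thread
    }
    return out;
}
```

Because the result is the same with or without the flag, this kind of loop is a convenient way to check that a build actually has OpenMP enabled before wiring it into a task scheduler.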

The "Default" task scheduler isn't included in the core libs yet because the old thread support classes have some API problems that I think should be cleaned up before it goes into core. I made a pull-request (https://github.com/bulletphysics/bullet3/pull/1194) a while back to start this clean up, but it has been languishing for some reason.
Mako_energy02 wrote:Initially I was getting crashes as soon as stepSimulation ran with bodies in the world. This turned out to be fixed by calling:

Code:

btResetThreadIndexCounter();
I was just throwing stuff at the wall to see what stuck. Why does that make or break the system?
Bullet multithreading makes certain assumptions about how the task scheduler deals with thread creation and destruction. Basically, if the task scheduler destroys any of its threads, it should destroy ALL threads and call btResetThreadIndexCounter(); otherwise the internal thread counter could wrap around and threads will start stomping on each other's data. Also note that if you call btResetThreadIndexCounter() without destroying all threads, that will lead to more than one thread with the same thread-id, and threads will again stomp on each other.
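The behaviour described above can be reproduced with a simplified model. It assumes that each thread is handed an index from a global counter the first time it asks and then caches it; that is an assumption made for illustration, and every name below (`model::currentThreadIndex`, `model::resetIndexCounter`, and so on) is hypothetical rather than Bullet's actual code.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

namespace model {  // simplified model; all names are hypothetical

std::atomic<unsigned> gNextIndex{0};
constexpr unsigned kUnassigned = ~0u;

// Each thread caches the index it received the first time it asked.
unsigned currentThreadIndex() {
    thread_local unsigned index = kUnassigned;
    if (index == kUnassigned) index = gNextIndex.fetch_add(1);
    return index;
}

// Only safe to call when no previously indexed thread is still alive.
void resetIndexCounter() { gNextIndex.store(0); }

// Spawns `count` workers, records each worker's index, then joins
// (destroys) all of them. Returns the highest index handed out.
unsigned highestIndexAfterRun(int count) {
    std::vector<unsigned> indices(count, 0);
    std::vector<std::thread> workers;
    for (int i = 0; i < count; ++i)
        workers.emplace_back([&indices, i] { indices[i] = currentThreadIndex(); });
    for (std::thread& t : workers) t.join();
    return *std::max_element(indices.begin(), indices.end());
}

}  // namespace model
```

Running two batches of three workers without a reset hands out indices 0 to 2 and then 3 to 5, so the indices eventually exceed any fixed number of per-thread slots. Resetting after all workers have been joined brings the next batch back to 0 to 2. Resetting while old workers are still alive would give a new thread an index that a live thread already holds.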
Mako_energy02 wrote:After "resolving" that, I am now faced with what appears to be an infinite loop that occurs when an object enters the AABB of/collides with another object using a "btBvhTriangleMeshShape". I haven't yet completely ruled out my own code being the culprit. Is this something you tested?
The example browser uses BvhTriangleMeshShape for background collision in one of the demos under "benchmark", and it runs fine with multithreading. I assume you probably know that BvhTriangleMeshShape can only collide with convex shapes (i.e. no mesh vs mesh collision).

Also, I'm working on more multithreading stuff -- a parallel constraint solver that can speed up the solving of a single large island -- something the current version doesn't do. As part of these changes, I'm changing the ITaskScheduler interface to support additional features. So just a heads-up, if you are writing your own task scheduler -- there are some changes coming.
