Hi there,
We posted a message earlier [1] about parallelization experiments we performed on a modified version of a Bullet benchmark. The benchmark is AppBenchmarks, specifically the 1000 stacks example, which we modified to increase the number of islands.
AppBenchmarks can be compiled to use two parallel algorithms from the Bullet Physics distribution. The first is a parallel constraint solver (btParallelConstraintSolver) and the second is a parallel implementation of the collision dispatcher (SpuGatheringCollisionDispatcher). In our tests we didn't get any benefit from the parallel constraint solver. In fact, we observed that the parallel constraint solver is often slower than the sequential one. For the parallel collision dispatcher we did observe speedups from using multiple cores.
Is it known whether btParallelConstraintSolver gives any speedup? Under what workloads does that happen?
Thank you!
Cheers,
Kristian Kolev and Alexey Rodriguez
[1] http://bulletphysics.org/Bullet/phpBB3/ ... f=6&t=8660
Does btParallelConstraintSolver give any speedup?
alexey.rodriguez
Erwin Coumans (Site Admin)
Re: Does btParallelConstraintSolver give any speedup?
"In our tests we didn't get any benefit from the parallel constraint solver. In fact, we have observed that the parallel constraint solver is often slower than the sequential one."
Indeed, the btParallelConstraintSolver usually doesn't give a speedup: it lacks several of the optimizations that the regular btSequentialImpulseConstraintSolver has.
Multithreading the btSequentialImpulseConstraintSolver using constraint splitting (see CustomSplitConstraints in btParallelConstraintSolver.cpp) will likely give a good speedup, both for single large islands and for multiple islands.
If you have a patch that runs separate simulation islands multithreaded, please share it and I'll see whether we can apply it.
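To illustrate the multiple-islands case: since separate islands share no bodies or constraints, each one can be handed to whichever thread is free. The following is only a rough sketch under that assumption. `Island`, `solveIsland` and `solveIslandsParallel` are hypothetical names made up for this example, not Bullet classes or functions, and the "solve" is a dummy loop standing in for a sequential per-island solve.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical island: a group of constraints that shares no state
// with any other island. Not a Bullet type.
struct Island {
    std::vector<float> impulses;  // placeholder for per-constraint data
    int timesSolved = 0;
};

// Placeholder for a sequential solve of a single island.
void solveIsland(Island& island, int iterations) {
    for (int it = 0; it < iterations; ++it) {
        for (float& x : island.impulses) {
            x = 0.5f * x + 1.0f;  // dummy relaxation step, not real physics
        }
    }
    ++island.timesSolved;
}

// Each worker repeatedly claims the next unsolved island through an
// atomic counter, so every island is solved by exactly one thread.
void solveIslandsParallel(std::vector<Island>& islands,
                          unsigned numThreads, int iterations) {
    std::atomic<std::size_t> next{0};
    auto worker = [&] {
        for (;;) {
            std::size_t i = next.fetch_add(1, std::memory_order_relaxed);
            if (i >= islands.size()) return;
            solveIsland(islands[i], iterations);
        }
    };
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < numThreads; ++t) threads.emplace_back(worker);
    for (auto& th : threads) th.join();
}
```

Claiming islands dynamically rather than splitting the array evenly helps when island sizes differ a lot, but a single large island would still be solved by one thread, which is where constraint splitting comes in.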
"For the parallel collision dispatcher we did observe speed ups from using multiple cores."
The SpuGatheringCollisionDispatcher can improve narrowphase collision detection performance, but it doesn't support all features and collision shape types.
The latest version of the regular btCollisionDispatcher should be much easier to parallelize, with full features. The only shared resource is the (de)allocation of contact manifolds in the methods btCollisionDispatcher::getNewManifold and btCollisionDispatcher::releaseManifold. Once those two functions are made thread-safe (for example using an atomic compare and swap), the narrowphase becomes embarrassingly parallel.
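As a rough sketch of the idea (not the actual Bullet code), the manifold pool could be guarded by a small spin lock built on an atomic compare and swap. Everything below is an assumption made for illustration: `Manifold`, `CasSpinLock`, `ManifoldPool` and `runNarrowphaseDemo` are hypothetical stand-ins, and only the method names getNewManifold and releaseManifold are borrowed from the dispatcher.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a contact manifold; not Bullet's own type.
struct Manifold {
    int bodyA = -1;
    int bodyB = -1;
};

// Minimal spin lock built on an atomic compare and swap.
class CasSpinLock {
public:
    void lock() {
        bool expected = false;
        // Try to flip false -> true; on failure reset 'expected' and retry.
        while (!locked_.compare_exchange_weak(expected, true,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed)) {
            expected = false;
            std::this_thread::yield();
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Hypothetical fixed-capacity pool; only the free list is shared state.
class ManifoldPool {
public:
    explicit ManifoldPool(std::size_t capacity) : storage_(capacity) {
        freeList_.reserve(capacity);
        for (std::size_t i = 0; i < capacity; ++i) freeList_.push_back(&storage_[i]);
    }

    // Returns nullptr when the pool is exhausted.
    Manifold* getNewManifold(int bodyA, int bodyB) {
        lock_.lock();
        Manifold* m = nullptr;
        if (!freeList_.empty()) {
            m = freeList_.back();
            freeList_.pop_back();
        }
        lock_.unlock();
        if (m) {  // the manifold is now owned by the caller, no lock needed
            m->bodyA = bodyA;
            m->bodyB = bodyB;
        }
        return m;
    }

    void releaseManifold(Manifold* m) {
        lock_.lock();
        freeList_.push_back(m);
        lock_.unlock();
    }

    std::size_t freeCount() {
        lock_.lock();
        std::size_t n = freeList_.size();
        lock_.unlock();
        return n;
    }

private:
    std::vector<Manifold> storage_;
    std::vector<Manifold*> freeList_;
    CasSpinLock lock_;
};

// Each thread plays the role of a narrowphase worker: it allocates
// manifolds for its own pairs and releases them again afterwards.
std::size_t runNarrowphaseDemo(int numThreads, int pairsPerThread) {
    ManifoldPool pool(static_cast<std::size_t>(numThreads) * pairsPerThread);
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&pool, t, pairsPerThread] {
            std::vector<Manifold*> mine;
            for (int i = 0; i < pairsPerThread; ++i) {
                Manifold* m = pool.getNewManifold(t, i);
                assert(m != nullptr);
                mine.push_back(m);
            }
            for (Manifold* m : mine) {
                assert(m->bodyA == t);  // nobody else touched our manifold
                pool.releaseManifold(m);
            }
        });
    }
    for (auto& w : workers) w.join();
    return pool.freeCount();  // equals the capacity once all threads are done
}
```

A spin lock is probably the simplest way to use compare and swap here, because the critical section is just a push or pop. A fully lock-free free list is also possible, but it needs extra care (for example against the ABA problem), so it is left out of this sketch.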
Thanks and please keep us updated!
Erwin
By the way, most effort in Bullet went into the regular (sequential) physics pipeline, the PlayStation 3 SPU version and nowadays the OpenCL parallel version.