CPU multithreading is working!
-
- Posts: 28
- Joined: Sun May 13, 2012 7:14 am
Re: CPU multithreading is working!
Awesome work! Will the multi-threading optimization work on iOS/Neon processors?
-
- Posts: 99
- Joined: Thu Nov 21, 2013 8:57 pm
Re: CPU multithreading is working!
ai-music wrote: Nice work! Thanks. Really good initiative!
But check CCD, it is working incorrectly. The trouble looks like this: https://code.google.com/p/bullet/issues/detail?id=356
video: http://www.youtube.com/watch?v=Q17MnAMujTI
And when I tested your code in my application (multithreading mode with kinematic characters, dynamic rigid bodies, static concave meshes) I noticed that the physics system does not run smoothly. Sometimes SimulationStep takes too long. It is not noticeable in a simple example with cubes on a static plane.
Regards.
It might help to know some additional details about your setup.
Which constraint solver are you using? Was it the MLCP one?
If it is the MLCP solver, can you try it with the sequential impulse solver and see if the problem persists?
Which task scheduler are you using?
Another thing to try is reducing the number of threads. Some CPUs will report 2 hardware threads per core (hyperthreading), which may lead to too many threads. In that case you may get better performance with just one thread per core.
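As a rough sketch of the "one thread per core" suggestion: the helper below divides the logical thread count by an assumed number of hardware threads per core. The names pickWorkerThreadCount and defaultWorkerThreadCount are made up for illustration and are not part of Bullet or the patch, and the value 2 is only a guess for hyperthreaded CPUs. std::thread::hardware_concurrency() is standard C++11 and may return 0 when the count is unknown.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical helper (not part of Bullet): derive a worker count from the
// number of logical threads reported by the OS.
unsigned pickWorkerThreadCount(unsigned logicalThreads, unsigned threadsPerCore)
{
    assert(threadsPerCore >= 1);
    // Integer division drops the extra hardware threads of each core;
    // always keep at least one worker.
    return std::max(logicalThreads / threadsPerCore, 1u);
}

// Assumption: 2 hardware threads per core (hyperthreading enabled).
unsigned defaultWorkerThreadCount()
{
    // hardware_concurrency() may return 0 if the value cannot be determined.
    return pickWorkerThreadCount(std::thread::hardware_concurrency(), 2);
}
```

On a CPU without hyperthreading, passing 1 as threadsPerCore keeps the reported count unchanged.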
-
- Posts: 99
- Joined: Thu Nov 21, 2013 8:57 pm
Re: CPU multithreading is working!
kingchurch wrote: Awesome work! Will the multi-threading optimization work on iOS/Neon processors?
In theory it should work on iOS (assuming TBB or OpenMP is available). However, I don't know if anyone has built it for iOS. Some CMake tweaks might be needed.
If you try it on iOS, please report the results back here.
-
- Posts: 8
- Joined: Wed Jun 10, 2015 2:41 pm
Re: CPU multithreading is working!
lunkhound wrote: Which constraint solver are you using? Was it the MLCP one?
If it is the MLCP solver, can you try it with the sequential impulse solver and see if the problem persists?
The setup is like your sample (MultiThreadingDemo): sequential impulse solver, etc. CCD works incorrectly without multithreading too (but a clean bullet-3-master (2.83) works correctly). You may be able to see this bug if you shoot a CCD rigid body (like a bullet) at the example boxes. Maybe createPredictiveContact() or the functions near it do not work properly.
lunkhound wrote: Which task scheduler are you using?
OpenMP for MSVC 2010.
lunkhound wrote: Another thing to try is reducing the number of threads. Some CPUs will report 2 hardware threads per core (hyperthreading) which may lead to too many threads. In that case you may get better performance with just one thread per core.
The test processors are an AMD (FX) quad-core and an AMD Athlon 64 dual-core. I tried reducing and increasing the number of threads, but I got the same result on a complex physics scene (concave static, convex kinematic and dynamic rigid bodies).
UPDATE: I will try to fix the OpenMP version for MSVC 2010 like this: http://stackoverflow.com/questions/4738 ... 8-and-2010
Regards.
-
- Posts: 99
- Joined: Thu Nov 21, 2013 8:57 pm
Re: CPU multithreading is working!
ai-music wrote: Setup like your sample (MultiThreadingDemo) - sequential impulse solver etc... CCD working incorrect without multithreading too (but clean bullet-3-master (2.83) working correct). Maybe you can see this bug when you try shooting (some CCD rigid body like a bullet) to example boxes. And maybe createPredictiveContact() or the nearest functions do not work properly.
Can you post a snippet of code showing how you create the CCD rigid body?
-
- Posts: 8
- Joined: Wed Jun 10, 2015 2:41 pm
Re: CPU multithreading is working!
lunkhound wrote: Can you post a snippet of code showing how you create the CCD rigid body?
Code: Select all
// CCD example
// CCD is enabled by default (world->getDispatchInfo().m_useContinuous == true).
// When creating the dynamic rigid body for the bullet (sphere shape with radius == 1.f and mass == 1.f), use this:
body->setCcdSweptSphereRadius(0.5f); // max 1.f
body->setCcdMotionThreshold(1.f);
// To shoot it, use this:
btVector3 dir(0.f, 0.f, 1.f); // any direction
dir *= 250.f; // any scale factor
body->applyCentralImpulse(dir);
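To see why a body launched like this needs CCD at all, it helps to compare how far it travels in one step with its CCD motion threshold. The sketch below does that arithmetic without Bullet. It assumes a fixed 1/60 s time step, and it assumes that CCD is only considered once the per-step motion exceeds the value passed to setCcdMotionThreshold() (with 0 meaning disabled), which is my reading of Bullet's behaviour rather than something stated in this thread. Both function names are made up for illustration.

```cpp
#include <cassert>

// Speed gained by a body at rest from a central impulse: v = impulse / mass.
float speedFromImpulse(float impulse, float mass)
{
    assert(mass > 0.f);
    return impulse / mass;
}

// True when the distance covered in one step exceeds the CCD motion threshold.
// Assumption: a threshold of 0 means CCD is disabled for the body.
bool exceedsCcdMotionThreshold(float speed, float timeStep, float threshold)
{
    float motionPerStep = speed * timeStep;
    return threshold > 0.f && motionPerStep > threshold;
}
```

With the values from the snippet (impulse 250, mass 1, threshold 1) and a 1/60 s step, the sphere moves roughly 4.2 units per step. That is well over the threshold and more than its own diameter of 2, so without CCD it could skip past thin geometry in a single step.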
-
- Posts: 99
- Joined: Thu Nov 21, 2013 8:57 pm
Re: CPU multithreading is working!
I tried to reproduce the CCD problem you mentioned. Here is what I did.
Go to the file bullet3/examples/Benchmarks/BenchmarkDemo.cpp, line 116 (right after the resetCamera() method of the BenchmarkDemo class). Add the following method:
Code: Select all
virtual bool keyboardCallback( int key, int state )
{
    bool handled = false;
    if ( state )
    {
        if ( key == 'n' )
        {
            // Ccd ball: a unit sphere created high above the scene
            btTransform sphereTrans;
            sphereTrans.setIdentity();
            sphereTrans.setOrigin( btVector3( -20.f, 200.f, -20.f ) );
            btSphereShape* ball = new btSphereShape( 1.f );
            m_collisionShapes.push_back( ball );
            btRigidBody* ballBody = createRigidBody( 1.f, sphereTrans, ball );
            // Enable CCD for the ball and launch it straight down
            ballBody->setCcdMotionThreshold( 1.f );
            ballBody->setCcdSweptSphereRadius( 0.5f );
            ballBody->setLinearVelocity( btVector3( 0.f, -250.f, 0.f ) );
            // Register graphics objects so the ball is rendered
            m_guiHelper->createCollisionShapeGraphicsObject( ball );
            m_guiHelper->createCollisionObjectGraphicsObject( ballBody, btVector3( 1.f, 1.f, 0.f ) );
            handled = true;
        }
    }
    return handled;
}
Compile in Release (benchmark demos won't appear otherwise). When you run any of the benchmark demos in the example browser, pressing the 'n' key will launch a sphere downwards at 250 meters per second. I tried it with "1000 stack" (adjusted the start location to impact the pyramid of boxes), as well as the "Convex stack", "prim vs mesh", and "convex vs mesh".
I applied this change on top of my patch as well as before my patch. I couldn't see any differences in behavior, and CCD appeared to be working fine.
I also pasted that code (with slight modifications) into the MultithreadedDemo, and it seemed to work there as well.
If you can reproduce the problem in the example browser, I'll take another look.
-
- Posts: 8
- Joined: Wed Jun 10, 2015 2:41 pm
Re: CPU multithreading is working!
lunkhound wrote: If you can reproduce the problem in the example browser, I'll take another look.
Try this code (press the 'n' key once and hold it for a long time) and you will see the same effect as in the video: https://youtu.be/A8SPOrGukcw
-
- Posts: 99
- Joined: Thu Nov 21, 2013 8:57 pm
Re: CPU multithreading is working!
lunkhound wrote: If you can reproduce the problem in the example browser, I'll take another look.
ai-music wrote: Try this code: (press 'n' key once for long time) and you can see effect as at video: https://youtu.be/A8SPOrGukcw
Pretty sure I found the bug. I updated my repo with the fix.
Thanks for your help in tracking that down!
-
- Posts: 8
- Joined: Wed Jun 10, 2015 2:41 pm
Re: CPU multithreading is working!
Thanks for the answers. I'll try to find out the cause of the OpenMP problems with MSVC 2010.
-
- Posts: 99
- Joined: Thu Nov 21, 2013 8:57 pm
Re: CPU multithreading is working!
ai-music wrote: Thanks for answers. I'll try to find out the cause of the brake openMP at MSVC 2010.
If the multithreadedDemo is crashing when you launch spheres into the scene, it may be running out of persistent manifolds.
In that case, remove this line:
Code: Select all
m_dispatcher->setDispatcherFlags( btCollisionDispatcher::CD_DISABLE_CONTACTPOOL_DYNAMIC_ALLOCATION );
Try this version (use 'y' key to launch spheres):
-
- Posts: 8
- Joined: Wed Jun 10, 2015 2:41 pm
Re: CPU multithreading is working!
The same result: https://youtu.be/faK1yXDL6fM
I think the bug is somewhere around the functions used in internalSimulationStep(). I need to compare with a clean 2.83.
UPDATE: for better OpenMP performance with all versions of MSVC, set this environment variable:
Code: Select all
#ifdef WIN32
_putenv_s("OMP_WAIT_POLICY", "PASSIVE");
#endif
ref.: http://stackoverflow.com/questions/2074 ... controlled
-
- Posts: 99
- Joined: Thu Nov 21, 2013 8:57 pm
Re: CPU multithreading is working!
ai-music wrote: The same result: https://youtu.be/faK1yXDL6fM
I think the bug is somewhere around the functions used in internalSimulationStep(). I need to compare with a clean 2.83.
Did you apply the bugfix? That video looks exactly like the bug that I fixed.
-
- Posts: 8
- Joined: Wed Jun 10, 2015 2:41 pm
Re: CPU multithreading is working!
lunkhound wrote: Did you apply the bugfix? That video looks exactly like the bug that I fixed.
Oh thanks. It works. I'll test multithreaded mode and let you know in case of failure.
-
- Posts: 8
- Joined: Wed Jun 10, 2015 2:41 pm
Re: CPU multithreading is working!
Finally, after testing, I changed the scheduler to PPL (with MSVC 2010, OpenMP does not run smoothly).
But for MSVC 2010 the PPL code in ParallelFor.h needs to change (because parallel_for does not support the partitioner argument there), like this:
Code: Select all
template <class TBody>
struct PplBodyAdapter
{
    int i_grain;
    int i_end;
    const TBody* mBody;
    // i is the start of a chunk; process [i, i + i_grain), clamped to i_end
    void operator()( int i ) const
    {
        mBody->forLoop( i, (std::min)( i + i_grain, i_end ));
    }
};
#endif // #if USE_PPL
Code: Select all
//ParallelFor function
#if USE_PPL
    if ( gTaskApi == apiPpl )
    {
        // PPL dispatch
        PplBodyAdapter<TBody> pplBody;
        pplBody.mBody = &body;
        pplBody.i_grain = grainSize;
        pplBody.i_end = iEnd;
        // step by grainSize so that each call handles one chunk
        Concurrency::parallel_for( iBegin,
                                   iEnd,
                                   grainSize,
                                   pplBody);
        return;
    }
#endif //#if USE_PPL
Also, when only one of the schedulers (TBB or PPL) is enabled, initTaskScheduler() sets api == apiNone. A small fix:
Code: Select all
static void initTaskScheduler()
{
#ifdef USE_PPL
    setTaskApi( apiPpl );
#endif
#ifdef USE_TBB
    setTaskApi( apiTbb );
#endif
#ifdef USE_OPENMP
    setTaskApi( apiOpenMP );
#endif
}
UPDATE: a fix for Debug mode (an exception is thrown when the scene is empty):
Code: Select all
virtual void dispatchAllCollisionPairs( btOverlappingPairCache* pairCache, const btDispatcherInfo& info, btDispatcher* dispatcher ) BT_OVERRIDE
{
    int grainSize = 40; // iterations per task
    int pairCount = pairCache->getNumOverlappingPairs();
    if (pairCount > 0) //ADDED: skip the dispatch when there are no pairs
    {
        Updater updater;
        updater.mCallback = getNearCallback();
        updater.mPairArray = pairCache->getOverlappingPairArrayPtr(); // the exception (null pointer access) happened here
        updater.mDispatcher = this;
        updater.mInfo = &info;
        btPushThreadsAreRunning();
        parallelFor( 0, pairCount, grainSize, updater );
        btPopThreadsAreRunning();
    }
}
And now my game engine works even faster. Thanks a lot for your work!
PS: Erwin should know about this and maybe add MT support officially.
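The strided dispatch used in the PPL fix can be checked without PPL. The sketch below replaces Concurrency::parallel_for with a plain serial loop that steps the same way (first, last, step), so it only illustrates how the adapter turns each stride start into a [begin, end) chunk. ChunkAdapter, RangeRecorder, and stridedFor are hypothetical names, not code from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Stand-in for a loop body: records every [begin, end) chunk it receives.
struct RangeRecorder
{
    mutable std::vector<std::pair<int, int> > chunks;
    void forLoop(int begin, int end) const
    {
        chunks.push_back(std::make_pair(begin, end));
    }
};

// Same shape as PplBodyAdapter: maps a stride start i to one clamped chunk.
template <class TBody>
struct ChunkAdapter
{
    int grain;
    int end;
    const TBody* body;
    void operator()(int i) const
    {
        body->forLoop(i, (std::min)(i + grain, end));
    }
};

// Serial stand-in for a strided parallel_for(first, last, step, func).
template <class TFunc>
void stridedFor(int first, int last, int step, const TFunc& func)
{
    assert(step > 0);
    for (int i = first; i < last; i += step)
        func(i);
}
```

With 100 iterations and a grain size of 40, the adapter is called for 0, 40, and 80, and the last chunk is clamped to end at 100. An empty range produces no calls at all.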