Re: CPU multithreading is working!
Posted: Wed Jun 10, 2015 10:34 pm
Awesome work! Will the multi-threading optimization work on iOS/Neon processors?
ai-music wrote: Nice work! Thanks. Really good initiative!
But check CCD; it is working incorrectly. The trouble looks like this: https://code.google.com/p/bullet/issues/detail?id=356
Video: http://www.youtube.com/watch?v=Q17MnAMujTI
When I tested your code in my application (multithreading mode with kinematic characters, dynamic rigid bodies, and static concave meshes), I noticed that the physics system does not run smoothly: sometimes SimulationStep takes too long. This is not noticeable in a simple example with cubes on a static plane.
Regards.
It might help to know some additional details about your setup.
kingchurch wrote: Awesome work! Will the multi-threading optimization work on iOS/Neon processors?
In theory it should work on iOS (assuming TBB or OpenMP is available). However, I don't know if anyone has built it for iOS. Some CMake tweaks might be needed.
lunkhound wrote: Which constraint solver are you using? Was it the MLCP one?
If it is the MLCP solver, can you try it with the sequential impulse solver and see if the problem persists?
The setup is like your sample (MultiThreadingDemo): sequential impulse solver, etc. CCD works incorrectly without multithreading too (but a clean bullet-3-master (2.83) works correctly). Maybe you can see this bug if you try shooting a CCD rigid body (like a bullet) at the example boxes. Maybe createPredictiveContact() or the nearby functions do not work properly.
lunkhound wrote: Which task scheduler are you using?
OpenMP for MSVC 2010.
lunkhound wrote: Another thing to try is reducing the number of threads. Some CPUs will report 2 hardware threads per core (hyperthreading) which may lead to too many threads. In that case you may get better performance with just one thread per core.
The test processors are an AMD (FX) quad-core and an AMD Athlon 64 dual-core. I tried reducing and increasing the number of threads, but I got the same result on a complex physics scene (concave static, convex kinematic, and dynamic rigid bodies)...
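A rough sketch of the one-thread-per-core suggestion, in standard C++. The helper name pickWorkerThreadCount and the assumeTwoThreadsPerCore flag are made up for illustration and are not part of Bullet. Standard C++ has no portable query for physical cores, so the caller has to supply the guess that the CPU reports two hardware threads per core.

```cpp
#include <algorithm>

// Hypothetical helper (not part of Bullet): choose a worker thread count.
// reportedThreads is whatever the platform reports, for example the value of
// std::thread::hardware_concurrency(), which may be 0 when it is unknown.
// assumeTwoThreadsPerCore is the caller's guess that hyperthreading is on.
inline unsigned pickWorkerThreadCount( unsigned reportedThreads, bool assumeTwoThreadsPerCore )
{
    unsigned threads = std::max( reportedThreads, 1u ); // fall back to one thread
    if ( assumeTwoThreadsPerCore )
    {
        threads = std::max( threads / 2, 1u ); // aim for one thread per physical core
    }
    return threads;
}
```

With OpenMP, the result could then be handed to omp_set_num_threads() before the first parallel region. Whether halving actually helps depends on the scene, so it is worth measuring both settings.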
ai-music wrote: The setup is like your sample (MultiThreadingDemo): sequential impulse solver, etc. CCD works incorrectly without multithreading too (but a clean bullet-3-master (2.83) works correctly). Maybe you can see this bug if you try shooting a CCD rigid body (like a bullet) at the example boxes. Maybe createPredictiveContact() or the nearby functions do not work properly.
Can you post a snippet of code showing how you create the CCD rigid body?
lunkhound wrote: Can you post a snippet of code showing how you create the CCD rigid body?
Code: Select all
// CCD example
// CCD is enabled by default (world->getDispatchInfo().m_useContinuous == true)
// When creating the dynamic rigid body (sphere shape with radius == 1.f and mass == 1.f), use this:
body->setCcdSweptSphereRadius(0.5f); // max 1.f
body->setCcdMotionThreshold(1.f);
// To shoot the body, use this:
btVector3 dir(0.f, 0.f, 1.f); // any direction
dir *= 250.f; // any scale factor
body->applyCentralImpulse(dir);
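For a sense of scale: Bullet's header comment for the CCD motion threshold says CCD is skipped when a body's motion in one step is smaller than the threshold. Below is a small sketch of that arithmetic for the numbers in the snippet. It assumes a fixed timestep of 1/60 s (the post does not state the timestep) and ignores gravity. exceedsCcdThreshold is a made-up helper, not a Bullet function. An impulse of 250 on a body of mass 1 gives a speed of 250 units per second.

```cpp
// Hypothetical helper (not a Bullet API): does a body moving at `speed`
// travel farther than `ccdMotionThreshold` within one step of length `dt`?
// A threshold of zero is treated as "CCD disabled", which is Bullet's default.
inline bool exceedsCcdThreshold( double speed, double dt, double ccdMotionThreshold )
{
    if ( ccdMotionThreshold <= 0.0 )
    {
        return false;
    }
    const double motionPerStep = speed * dt; // distance covered in one step
    return motionPerStep > ccdMotionThreshold;
}
```

Under these assumptions the body moves about 4.17 units per step, well above the threshold of 1, so the CCD path should be exercised on every step while the ball is in flight.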
Code: Select all
virtual bool keyboardCallback( int key, int state )
{
    bool handled = false;
    if ( state )
    {
        if ( key == 'n' )
        {
            // Ccd ball
            btTransform sphereTrans;
            sphereTrans.setIdentity();
            sphereTrans.setOrigin( btVector3( -20.f, 200.f, -20.f ) );
            btSphereShape* ball = new btSphereShape( 1.f );
            m_collisionShapes.push_back( ball );
            btRigidBody* ballBody = createRigidBody( 1.f, sphereTrans, ball );
            ballBody->setCcdMotionThreshold( 1.f );
            ballBody->setCcdSweptSphereRadius( 0.5f );
            ballBody->setLinearVelocity( btVector3( 0.f, -250.f, 0.f ) );
            m_guiHelper->createCollisionShapeGraphicsObject( ball );
            m_guiHelper->createCollisionObjectGraphicsObject( ballBody, btVector3( 1.f, 1.f, 0.f ) );
            handled = true;
        }
    }
    return handled;
}
lunkhound wrote: If you can reproduce the problem in the example browser, I'll take another look.
Try this code (press the 'n' key once, for a long time) and you can see the effect shown in the video: https://youtu.be/A8SPOrGukcw
ai-music wrote: Try this code (press the 'n' key once, for a long time) and you can see the effect shown in the video: https://youtu.be/A8SPOrGukcw
Pretty sure I found the bug. I updated my repo with the fix.
ai-music wrote: Thanks for the answers. I'll try to find out the cause of the OpenMP problem in MSVC 2010.
If the multithreadedDemo is crashing when you launch spheres into the scene, it may be running out of persistent manifolds.
Code: Select all
m_dispatcher->setDispatcherFlags( btCollisionDispatcher::CD_DISABLE_CONTACTPOOL_DYNAMIC_ALLOCATION );
Code: Select all
#ifdef WIN32
_putenv_s("OMP_WAIT_POLICY", "PASSIVE");
#endif
ai-music wrote: The same result: https://youtu.be/faK1yXDL6fM
I think the bug is somewhere around the functions used in internalSimulationStep()... I need to compare with a clean 2.83.
Did you apply the bugfix? That video looks exactly like the bug that I fixed.
lunkhound wrote: Did you apply the bugfix? That video looks exactly like the bug that I fixed.
Oh, thanks. It works. I'll test multithreaded mode and let you know in case of failure.
Code: Select all
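#if USE_PPL
// Adapter that lets Concurrency::parallel_for drive a body exposing forLoop( begin, end ).
template <class TBody>
struct PplBodyAdapter
{
    int i_grain;        // iterations per task
    int i_end;          // one past the last index of the whole loop
    const TBody* mBody; // loop body, not owned
    void operator()( int i ) const
    {
        // Process one chunk, clamped so the last chunk does not run past i_end.
        mBody->forLoop( i, (std::min)( i + i_grain, i_end ) );
    }
};
#endif // #if USE_PPL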
Code: Select all
//ParallelFor function
#if USE_PPL
    if ( gTaskApi == apiPpl )
    {
        // PPL dispatch
        PplBodyAdapter<TBody> pplBody;
        pplBody.mBody = &body;
        pplBody.i_grain = grainSize;
        pplBody.i_end = iEnd;
        Concurrency::parallel_for( iBegin, iEnd, grainSize, pplBody );
        return;
    }
#endif //#if USE_PPL
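To make the chunking in the PPL path easier to follow, here is a serial sketch of the same index arithmetic in plain C++, with no PPL dependency. RecordingBody and chunkedFor are made-up names for illustration. The sketch only shows which (begin, end) ranges forLoop would receive, not how PPL schedules them across threads.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical stand-in for a loop body exposing forLoop( begin, end );
// it records each chunk it is asked to process.
struct RecordingBody
{
    mutable std::vector<std::pair<int, int> > chunks;
    void forLoop( int begin, int end ) const
    {
        chunks.push_back( std::make_pair( begin, end ) );
    }
};

// Serial sketch of the chunking: visit iBegin, iBegin + grainSize, ... and
// clamp each chunk's end to iEnd, as the adapter's operator() does.
// grainSize must be positive, otherwise the loop would never advance.
template <class TBody>
void chunkedFor( int iBegin, int iEnd, int grainSize, const TBody& body )
{
    for ( int i = iBegin; i < iEnd; i += grainSize )
    {
        body.forLoop( i, (std::min)( i + grainSize, iEnd ) );
    }
}
```

For a range of 0 to 100 with a grain size of 40, the body receives (0, 40), (40, 80), and (80, 100). An empty range produces no calls at all.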
Code: Select all
static void initTaskScheduler()
{
#ifdef USE_PPL
    setTaskApi( apiPpl );
#endif
#ifdef USE_TBB
    setTaskApi( apiTbb );
#endif
#ifdef USE_OPENMP
    setTaskApi( apiOpenMP );
#endif
}
Code: Select all
virtual void dispatchAllCollisionPairs( btOverlappingPairCache* pairCache, const btDispatcherInfo& info, btDispatcher* dispatcher ) BT_OVERRIDE
{
    int grainSize = 40; // iterations per task
    int pairCount = pairCache->getNumOverlappingPairs();
    if ( pairCount > 0 ) // ADDED
    {
        Updater updater;
        updater.mCallback = getNearCallback();
        updater.mPairArray = pairCache->getOverlappingPairArrayPtr(); // exception here (null pointer access) without the pairCount check
        updater.mDispatcher = this;
        updater.mInfo = &info;
        btPushThreadsAreRunning();
        parallelFor( 0, pairCount, grainSize, updater );
        btPopThreadsAreRunning();
    }
}