
Bullet 2.73 has been updated to Bullet 2.73 SP1.

Posted: Wed Dec 03, 2008 6:58 am
by Erwin Coumans
It contains recent bug fixes and a new optimized x86 SIMD (SSE) inner loop for the constraint solver.

Please download the latest version from Google Code:


Re: Bullet 2.73 has been updated to Bullet 2.73 SP1.

Posted: Wed Dec 03, 2008 3:25 pm
by Dragonlord
Do you really have to make this optimization ( SSE ) on your own? Should GNU/GCC not do this already for you when possible?

Re: Bullet 2.73 has been updated to Bullet 2.73 SP1.

Posted: Wed Dec 03, 2008 6:05 pm
by Erwin Coumans
Dragonlord wrote: Do you really have to make this optimization ( SSE ) on your own? Should GNU/GCC not do this already for you when possible?
Can you try it out?

I'm curious to see the results, but I doubt GCC removes the branches automatically and performs auto-vectorization as well as manual work. Visual Studio 2008 doesn't perform auto-vectorization, and the code has several branches. Is there some MSVC setting that needs to be enabled for auto-vectorization?

Manual SSE vectorization and removal of the branches give a big performance improvement of around 40% in the constraint solver. Given that the constraint solver typically takes 45% of the total time, this can be almost 20% overall physics performance (0.40 * 0.45 = 0.18). Currently, only contact and friction constraints and the generic 6dof constraint use this SSE code, but my colleague Roman is helping to make sure all constraint types use it.
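The arithmetic behind that estimate can be sketched as below. This assumes "40% improvement" means the solver's own time shrinks by 40%; the function names are hypothetical and not part of Bullet.

```cpp
#include <cassert>

// fraction: share of the total physics time spent in the optimized part (0..1)
// saving:   relative time reduction achieved inside that part (0..1)
// Returns the share of the total time that is saved.
static inline double overallTimeSaved(double fraction, double saving)
{
	assert(fraction >= 0.0 && fraction <= 1.0);
	assert(saving >= 0.0 && saving <= 1.0);
	return fraction * saving;
}

// Amdahl-style overall speedup factor for the same inputs.
static inline double overallSpeedup(double fraction, double saving)
{
	const double saved = overallTimeSaved(fraction, saving);
	assert(saved < 1.0);	// otherwise the remaining time would be zero
	return 1.0 / (1.0 - saved);
}
```

With the numbers from this post, overallTimeSaved(0.45, 0.40) gives 0.18, so about 18% of the total time, and overallSpeedup(0.45, 0.40) gives roughly 1.22.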

The manually optimized SSE version is totally branchless and, apart from a single fld/fstp pair at the start, doesn't use the FPU. See the assembly code below (x87 FPU version on the left, SIMD/SSE on the right).

Code:

  00468005  fld         dword ptr [ecx+94h]             0041B4B8  fld         dword ptr [ecx+60h]         
  0046800B  fld         dword ptr [ecx+98h]             0041B4BB  fstp        dword ptr [esp+1Ch]         
  00468011  fmul        dword ptr [ecx+60h]             0041B4BF  movss       xmm0,dword ptr [esp+1Ch]    
  00468014  fsubp       st(1),st                        0041B4C5  movaps      xmm2,xmmword ptr [edx+10h]  
  00468016  fstp        dword ptr [esp+18h]             0041B4C9  movaps      xmm1,xmmword ptr [ecx]      
  0046801A  fld         dword ptr [esp+18h]             0041B4CC  movaps      xmm3,xmmword ptr [ecx+10h]  
  0046801E  fld         dword ptr [edx+14h]             0041B4D0  movaps      xmm4,xmmword ptr [edx]      
  00468021  fmul        dword ptr [ecx+4]               0041B4D3  movaps      xmm6,xmmword ptr [eax+10h]  
  00468024  fld         dword ptr [ecx]                 0041B4D7  mulps       xmm1,xmm2                   
  00468026  fmul        dword ptr [edx+10h]             0041B4DA  movaps      xmm2,xmm3                   
  00468029  faddp       st(1),st                        0041B4DD  mulps       xmm2,xmm4                   
  0046802B  fld         dword ptr [edx+18h]             0041B4E0  movaps      xmm4,xmmword ptr [eax]      
  0046802E  fmul        dword ptr [ecx+8]               0041B4E3  mulps       xmm3,xmm4                   
  00468031  faddp       st(1),st                        0041B4E6  movaps      xmm4,xmmword ptr [ecx+20h]  
  00468033  fstp        dword ptr [esp+18h]             0041B4EA  mulps       xmm4,xmm6                   
  00468037  fld         dword ptr [esp+18h]             0041B4ED  movss       xmm6,dword ptr [ecx+80h]    
  0046803B  fld         dword ptr [ecx+14h]             0041B4F5  movaps      xmm7,xmm2                   
  0046803E  fmul        dword ptr [edx+4]               0041B4F8  shufps      xmm7,xmm2,0AAh              
  00468041  fld         dword ptr [ecx+10h]             0041B4FC  shufps      xmm6,xmm6,0                 
  00468044  fmul        dword ptr [edx]                 0041B500  movaps      xmmword ptr [esp+180h],xmm6 
  00468046  faddp       st(1),st                        0041B508  movaps      xmm6,xmm2                   
  00468048  fld         dword ptr [ecx+18h]             0041B50B  shufps      xmm6,xmm2,55h               
  0046804B  fmul        dword ptr [edx+8]               0041B50F  shufps      xmm2,xmm2,0                 
  0046804E  faddp       st(1),st                        0041B513  addps       xmm7,xmm6                   
  00468050  fstp        dword ptr [esp+18h]             0041B516  addps       xmm7,xmm2                   
  00468054  fadd        dword ptr [esp+18h]             0041B519  movss       xmm5,dword ptr [ecx+0A0h]   
  00468058  fstp        dword ptr [esp+18h]             0041B521  movaps      xmm2,xmm1                   
  0046805C  fld         dword ptr [esp+18h]             0041B524  shufps      xmm2,xmm1,0AAh              
  00468060  fmul        dword ptr [ecx+80h]             0041B528  movaps      xmm6,xmm1                   
  00468066  fsubp       st(1),st                        0041B52B  shufps      xmm6,xmm1,55h               
  00468068  fstp        dword ptr [esp+18h]             0041B52F  addps       xmm2,xmm6                   
  0046806C  fld         dword ptr [esp+18h]             0041B532  shufps      xmm1,xmm1,0                 
  00468070  fld         dword ptr [edi+14h]             0041B536  addps       xmm2,xmm1                   
  00468073  fmul        dword ptr [ecx+24h]             0041B539  movss       xmm1,dword ptr [ecx+94h]    
  00468076  fld         dword ptr [edi+10h]             0041B541  addps       xmm7,xmm2                   
  00468079  fmul        dword ptr [ecx+20h]             0041B544  mulps       xmm7,xmmword ptr [esp+180h] 
  0046807C  faddp       st(1),st                        0041B54C  movss       xmm2,dword ptr [ecx+98h]    
  0046807E  fld         dword ptr [edi+18h]             0041B554  shufps      xmm2,xmm2,0                 
  00468081  fmul        dword ptr [ecx+28h]             0041B558  shufps      xmm0,xmm0,0                 
  00468084  faddp       st(1),st                        0041B55C  movaps      xmmword ptr [esp+30h],xmm0  
  00468086  fstp        dword ptr [esp+18h]             0041B561  movaps      xmm6,xmmword ptr [esp+30h]  
  0046808A  fld         dword ptr [esp+18h]             0041B566  movss       xmm0,dword ptr [ecx+9Ch]    
  0046808E  fld         dword ptr [ecx+14h]             0041B56E  mulps       xmm2,xmm6                   
  00468091  fmul        dword ptr [edi+4]               0041B571  shufps      xmm1,xmm1,0                 
  00468094  fld         dword ptr [edi]                 0041B575  subps       xmm1,xmm2                   
  00468096  fmul        dword ptr [ecx+10h]             0041B578  subps       xmm1,xmm7                   
  00468099  faddp       st(1),st                        0041B57B  movaps      xmm2,xmm4                   
  0046809B  fld         dword ptr [ecx+18h]             0041B57E  shufps      xmm2,xmm4,0AAh              
  0046809E  fmul        dword ptr [edi+8]               0041B582  movaps      xmm7,xmm4                   
  004680A1  faddp       st(1),st                        0041B585  shufps      xmm7,xmm4,55h               
  004680A3  fstp        dword ptr [esp+18h]             0041B589  addps       xmm2,xmm7                   
  004680A7  fsub        dword ptr [esp+18h]             0041B58C  shufps      xmm4,xmm4,0                 
  004680AB  fstp        dword ptr [esp+18h]             0041B590  addps       xmm2,xmm4                   
  004680AF  fld         dword ptr [esp+18h]             0041B593  movaps      xmm4,xmm3                   
  004680B3  fmul        dword ptr [ecx+80h]             0041B596  shufps      xmm4,xmm3,0AAh              
  004680B9  fsubp       st(1),st                        0041B59A  movaps      xmm7,xmm3                   
  004680BB  fstp        dword ptr [esp+20h]             0041B59D  shufps      xmm7,xmm3,55h               
  004680BF  fld         dword ptr [esp+20h]             0041B5A1  shufps      xmm3,xmm3,0                 
  004680C3  fld         st(0)                           0041B5A5  addps       xmm4,xmm7                   
  004680C5  fadd        dword ptr [ecx+60h]             0041B5A8  addps       xmm4,xmm3                   
  004680C8  fstp        dword ptr [esp+18h]             0041B5AB  subps       xmm2,xmm4                   
  004680CC  fld         dword ptr [esp+18h]             0041B5AE  mulps       xmm2,xmmword ptr [esp+180h] 
  004680D0  fld         dword ptr [ecx+9Ch]             0041B5B6  subps       xmm1,xmm2                   
  004680D6  fcomp       st(1)                           0041B5B9  movaps      xmm7,xmm1                   
  004680D8  fnstsw      ax                              0041B5BC  movaps      xmm3,xmm7                   
  004680DA  test        ah,41h                          0041B5BF  shufps      xmm0,xmm0,0                 
  004680DD  jne         004680FE                        0041B5C3  shufps      xmm5,xmm5,0                 
  004680DF  movss       xmm0,dword ptr [ecx+9Ch]        0041B5C7  addps       xmm3,xmm6                   
  004680E7  fstp        st(1)                           0041B5CA  movaps      xmm1,xmm3                   
  004680E9  fstp        st(0)                           0041B5CD  cmpltps     xmm1,xmm0                   
  004680EB  fld         dword ptr [ecx+9Ch]             0041B5D1  movaps      xmm4,xmm1                   
  004680F1  fsub        dword ptr [ecx+60h]             0041B5D4  andnps      xmm4,xmm3                   
  004680F4  fstp        dword ptr [esp+20h]             0041B5D7  movaps      xmm2,xmm3                   
  004680F8  fld         dword ptr [esp+20h]             0041B5DA  movaps      xmm3,xmm1                   
  004680FC  jmp         00468130                        0041B5DD  andps       xmm3,xmm0                   
  004680FE  fld         dword ptr [ecx+0A0h]            0041B5E0  orps        xmm4,xmm3                   
  00468104  fcompp                                      0041B5E3  movups      xmmword ptr [ecx+60h],xmm4  
  00468106  fnstsw      ax                              0041B5E7  movaps      xmm4,xmmword ptr [ecx+60h]  
  00468108  test        ah,5                            0041B5EB  cmpltps     xmm2,xmm5                   
  0046810B  jp          0046812A                        0041B5EF  movaps      xmm3,xmm2                   
  0046810D  movss       xmm0,dword ptr [ecx+0A0h]       0041B5F2  andps       xmm3,xmm4                   
  00468115  fstp        st(0)                           0041B5F5  movaps      xmm4,xmm2                   
  00468117  fld         dword ptr [ecx+0A0h]            0041B5F8  subps       xmm0,xmm6                   
  0046811D  fsub        dword ptr [ecx+60h]             0041B5FB  andps       xmm0,xmm1                   
  00468120  fstp        dword ptr [esp+20h]             0041B5FE  andnps      xmm1,xmm7                   
  00468124  fld         dword ptr [esp+20h]             0041B601  orps        xmm0,xmm1                   
  00468128  jmp         00468130                        0041B604  andps       xmm0,xmm2                   
  0046812A  movss       xmm0,dword ptr [esp+18h]        0041B607  andnps      xmm4,xmm5                   
  00468130  shufps      xmm0,xmm0,0                     0041B60A  orps        xmm3,xmm4                   
  00468134  fld         st(1)                           0041B60D  movaps      xmm4,xmmword ptr [ecx+10h]  
  00468136  movups      xmmword ptr [ecx+60h],xmm0      0041B611  movups      xmmword ptr [ecx+60h],xmm3  
  0046813A  fcomp       dword ptr [edx+24h]             0041B615  movss       xmm1,dword ptr [edx+24h]    
  0046813D  fnstsw      ax                              0041B61A  movss       xmm3,dword ptr [eax+24h]    
  0046813F  test        ah,44h                          0041B61F  shufps      xmm1,xmm1,0                 
  00468142  jnp         0046821A                        0041B623  mulps       xmm1,xmm4                   
  00468148  fld         dword ptr [edx+24h]             0041B626  subps       xmm5,xmm6                   
  0046814B  fmul        dword ptr [ecx+10h]             0041B629  andnps      xmm2,xmm5                   
  0046814E  fstp        dword ptr [esp+1F0h]            0041B62C  orps        xmm0,xmm2                   
  00468155  fld         dword ptr [ecx+14h]             0041B62F  movaps      xmm2,xmmword ptr [edx]      
  00468158  fmul        dword ptr [edx+24h]             0041B632  mulps       xmm1,xmm0                   
  0046815B  fstp        dword ptr [esp+1F4h]            0041B635  addps       xmm1,xmm2                   
  00468162  fld         dword ptr [ecx+18h]             0041B638  movaps      xmm2,xmmword ptr [edx+10h]  
  00468165  fmul        dword ptr [edx+24h]             0041B63C  movaps      xmmword ptr [edx],xmm1      
  00468168  fstp        dword ptr [esp+1F8h]            0041B63F  movaps      xmm1,xmmword ptr [ecx+30h]  
  0046816F  fld         dword ptr [esp+1F0h]            0041B643  mulps       xmm1,xmm0                   
  00468176  fmul        st,st(1)                        0041B646  addps       xmm1,xmm2                   
  00468178  fstp        dword ptr [esp+130h]            0041B649  movaps      xmmword ptr [edx+10h],xmm1  
  0046817F  fld         st(0)                           0041B64D  movaps      xmm2,xmmword ptr [eax]      
  00468181  fmul        dword ptr [esp+1F4h]            0041B650  shufps      xmm3,xmm3,0                 
  00468188  fstp        dword ptr [esp+134h]            0041B654  movaps      xmm1,xmm0                   
  0046818F  fld         st(0)                           0041B657  mulps       xmm3,xmm4                   
  00468191  fmul        dword ptr [esp+1F8h]            0041B65A  mulps       xmm1,xmm3                   
  00468198  fstp        dword ptr [esp+138h]            0041B65D  subps       xmm2,xmm1                   
  0046819F  fld         dword ptr [esp+130h]            0041B660  movaps      xmmword ptr [eax],xmm2      
  004681A6  fadd        dword ptr [edx]                 0041B663  movaps      xmm1,xmmword ptr [ecx+40h]  
  004681A8  fstp        dword ptr [edx]                 0041B667  mulps       xmm1,xmm0                   
  004681AA  fld         dword ptr [edx+4]               0041B66A  movaps      xmm0,xmmword ptr [eax+10h]  
  004681AD  fadd        dword ptr [esp+134h]            0041B66E  addps       xmm1,xmm0                   
  004681B4  fstp        dword ptr [edx+4]               0041B671  movaps      xmmword ptr [eax+10h],xmm1  
  004681B7  fld         dword ptr [esp+138h]            
  004681BE  fadd        dword ptr [edx+8]               
  004681C1  fstp        dword ptr [edx+8] 
  004681C4  fld         dword ptr [edx+20h] 
  004681C7  fmul        st,st(1) 
  004681C9  fstp        dword ptr [esp+18h] 
  004681CD  fld         dword ptr [esp+18h] 
  004681D1  fld         st(0) 
  004681D3  fmul        dword ptr [ecx+30h] 
  004681D6  fstp        dword ptr [esp+290h] 
  004681DD  fld         dword ptr [ecx+34h] 
  004681E0  fmul        st,st(1) 
  004681E2  fstp        dword ptr [esp+294h] 
  004681E9  fmul        dword ptr [ecx+38h] 
  004681EC  fstp        dword ptr [esp+298h] 
  004681F3  fld         dword ptr [esp+290h] 
  004681FA  fadd        dword ptr [edx+10h] 
  004681FD  fstp        dword ptr [edx+10h] 
  00468200  fld         dword ptr [edx+14h] 
  00468203  fadd        dword ptr [esp+294h] 
  0046820A  fstp        dword ptr [edx+14h] 
  0046820D  fld         dword ptr [esp+298h] 
  00468214  fadd        dword ptr [edx+18h] 
  00468217  fstp        dword ptr [edx+18h] 
  0046821A  fld         st(1) 
  0046821C  fcomp       dword ptr [edi+24h] 
  0046821F  fnstsw      ax   
  00468221  test        ah,44h 
  00468224  jnp         0046831A 
  0046822A  fld         dword ptr [ecx+10h] 
  0046822D  fchs             
  0046822F  fstp        dword ptr [esp+30h] 
  00468233  fld         dword ptr [ecx+14h] 
  00468236  fchs             
  00468238  fstp        dword ptr [esp+34h] 
  0046823C  fld         dword ptr [ecx+18h] 
  0046823F  fchs             
  00468241  fstp        dword ptr [esp+38h] 
  00468245  fld         dword ptr [esp+30h] 
  00468249  fmul        dword ptr [edi+24h] 
  0046824C  fstp        dword ptr [esp+210h] 
  00468253  fld         dword ptr [esp+34h] 
  00468257  fmul        dword ptr [edi+24h] 
  0046825A  fstp        dword ptr [esp+214h] 
  00468261  fld         dword ptr [esp+38h] 
  00468265  fmul        dword ptr [edi+24h] 
  00468268  fstp        dword ptr [esp+218h] 
  0046826F  fld         dword ptr [esp+210h] 
  00468276  fmul        st,st(1) 
  00468278  fstp        dword ptr [esp+170h] 
  0046827F  fld         st(0) 
  00468281  fmul        dword ptr [esp+214h] 
  00468288  fstp        dword ptr [esp+174h] 
  0046828F  fld         st(0) 
  00468291  fmul        dword ptr [esp+218h] 
  00468298  fstp        dword ptr [esp+178h] 
  0046829F  fld         dword ptr [edi] 
  004682A1  fadd        dword ptr [esp+170h] 
  004682A8  fstp        dword ptr [edi] 
  004682AA  fld         dword ptr [esp+174h] 
  004682B1  fadd        dword ptr [edi+4] 
  004682B4  fstp        dword ptr [edi+4] 
  004682B7  fld         dword ptr [edi+8] 
  004682BA  fadd        dword ptr [esp+178h] 
  004682C1  fstp        dword ptr [edi+8] 
  004682C4  fmul        dword ptr [edi+20h] 
  004682C7  fstp        dword ptr [esp+18h] 
  004682CB  fld         dword ptr [esp+18h] 
  004682CF  fld         st(0) 
  004682D1  fmul        dword ptr [ecx+40h] 
  004682D4  fstp        dword ptr [esp+270h] 
  004682DB  fld         dword ptr [ecx+44h] 
  004682DE  fmul        st,st(1) 
  004682E0  fstp        dword ptr [esp+274h] 
  004682E7  fmul        dword ptr [ecx+48h] 
  004682EA  fstp        dword ptr [esp+278h] 
  004682F1  fld         dword ptr [esp+270h] 
  004682F8  fadd        dword ptr [edi+10h] 
  004682FB  fstp        dword ptr [edi+10h] 
  004682FE  fld         dword ptr [edi+14h] 
  00468301  fadd        dword ptr [esp+274h] 
  00468308  fstp        dword ptr [edi+14h] 
  0046830B  fld         dword ptr [esp+278h] 
  00468312  fadd        dword ptr [edi+18h] 
  00468315  fstp        dword ptr [edi+18h] 
  00468318  jmp         0046831C 
  0046831A  fstp        st(0) 
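The repeated mulps / shufps (0AAh, 55h, 0) / addps groups in the SSE column appear to be three-component dot products whose result is replicated into all four lanes. A minimal sketch of that pattern with intrinsics follows; dot3_splat and dot3 are hypothetical names for illustration, not Bullet or vectormath functions.

```cpp
#include <xmmintrin.h>	// SSE intrinsics
#include <cassert>

// 3-component dot product, result replicated ("splatted") into all four lanes.
// One mulps, three shufps and two addps, like the groups in the listing.
static inline __m128 dot3_splat(__m128 a, __m128 b)
{
	const __m128 prod = _mm_mul_ps(a, b);	// (ax*bx, ay*by, az*bz, aw*bw)
	const __m128 x = _mm_shuffle_ps(prod, prod, _MM_SHUFFLE(0, 0, 0, 0));	// immediate 0
	const __m128 y = _mm_shuffle_ps(prod, prod, _MM_SHUFFLE(1, 1, 1, 1));	// immediate 55h
	const __m128 z = _mm_shuffle_ps(prod, prod, _MM_SHUFFLE(2, 2, 2, 2));	// immediate 0AAh
	return _mm_add_ps(_mm_add_ps(z, y), x);	// the w lane is ignored
}

// Convenience wrapper for plain float[4] inputs (the fourth element is ignored).
static inline float dot3(const float* a, const float* b)
{
	assert(a && b);
	return _mm_cvtss_f32(dot3_splat(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}
```

Keeping the result in every lane means it can be fed straight into further packed multiplies without another broadcast.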
The actual SIMD code and the C++ code are here:

Code:

// Project Gauss Seidel or the equivalent Sequential Impulse
SIMD_FORCE_INLINE void btSequentialImpulseConstraintSolver::resolveSingleConstraintRowGenericSIMD(btSolverBody& body1,btSolverBody& body2,const btSolverConstraint& c)
{
#ifdef USE_SIMD
	// Debugger breakpoints (MSVC x86 inline asm) that bracket this code in the disassembly.
	_asm int 3;
	_asm int 3;
	// Broadcast the scalar inputs into all four lanes.
	__m128 cpAppliedImp = _mm_set1_ps(c.m_appliedImpulse);
	__m128	lowerLimit1 = _mm_set1_ps(c.m_lowerLimit);
	__m128	upperLimit1 = _mm_set1_ps(c.m_upperLimit);
	__m128 deltaImpulse = _mm_sub_ps(_mm_set1_ps(c.m_rhs), _mm_mul_ps(_mm_set1_ps(c.m_appliedImpulse),_mm_set1_ps(c.m_cfm)));
	__m128 deltaVel1Dotn	=	_mm_add_ps(_vmathVfDot3(c.m_contactNormal.mVec128,body1.m_deltaLinearVelocity.mVec128), _vmathVfDot3(c.m_relpos1CrossNormal.mVec128,body1.m_deltaAngularVelocity.mVec128));
	__m128 deltaVel2Dotn	=	_mm_sub_ps(_vmathVfDot3(c.m_relpos2CrossNormal.mVec128,body2.m_deltaAngularVelocity.mVec128),_vmathVfDot3((c.m_contactNormal).mVec128,body2.m_deltaLinearVelocity.mVec128));
	deltaImpulse	=	_mm_sub_ps(deltaImpulse,_mm_mul_ps(deltaVel1Dotn,_mm_set1_ps(c.m_jacDiagABInv)));
	deltaImpulse	=	_mm_sub_ps(deltaImpulse,_mm_mul_ps(deltaVel2Dotn,_mm_set1_ps(c.m_jacDiagABInv)));
	// Branchless clamp of the accumulated impulse: compare masks select between candidates.
	btSimdScalar sum = _mm_add_ps(cpAppliedImp,deltaImpulse);
	btSimdScalar resultLowerLess,resultUpperLess;
	resultLowerLess = _mm_cmplt_ps(sum,lowerLimit1);
	resultUpperLess = _mm_cmplt_ps(sum,upperLimit1);
	__m128 lowMinApplied = _mm_sub_ps(lowerLimit1,cpAppliedImp);
	deltaImpulse = _mm_or_ps( _mm_and_ps(resultLowerLess, lowMinApplied), _mm_andnot_ps(resultLowerLess, deltaImpulse) );
	c.m_appliedImpulse = _mm_or_ps( _mm_and_ps(resultLowerLess, lowerLimit1), _mm_andnot_ps(resultLowerLess, sum) );
	__m128 upperMinApplied = _mm_sub_ps(upperLimit1,cpAppliedImp);
	deltaImpulse = _mm_or_ps( _mm_and_ps(resultUpperLess, deltaImpulse), _mm_andnot_ps(resultUpperLess, upperMinApplied) );
	c.m_appliedImpulse = _mm_or_ps( _mm_and_ps(resultUpperLess, c.m_appliedImpulse), _mm_andnot_ps(resultUpperLess, upperLimit1) );
	// Apply the clamped impulse to both bodies.
	__m128	linearComponentA = _mm_mul_ps(c.m_contactNormal.mVec128,_mm_set1_ps(body1.m_invMass));
	__m128	linearComponentB = _mm_mul_ps((c.m_contactNormal).mVec128,_mm_set1_ps(body2.m_invMass));
	__m128 impulseMagnitude = deltaImpulse;
	body1.m_deltaLinearVelocity.mVec128 = _mm_add_ps(body1.m_deltaLinearVelocity.mVec128,_mm_mul_ps(linearComponentA,impulseMagnitude));
	body1.m_deltaAngularVelocity.mVec128 = _mm_add_ps(body1.m_deltaAngularVelocity.mVec128 ,_mm_mul_ps(c.m_angularComponentA.mVec128,impulseMagnitude));
	body2.m_deltaLinearVelocity.mVec128 = _mm_sub_ps(body2.m_deltaLinearVelocity.mVec128,_mm_mul_ps(linearComponentB,impulseMagnitude));
	body2.m_deltaAngularVelocity.mVec128 = _mm_add_ps(body2.m_deltaAngularVelocity.mVec128 ,_mm_mul_ps(c.m_angularComponentB.mVec128,impulseMagnitude));
	_asm int 3;
	_asm int 3;
#else
	// Assumed fallback (this branch is missing from the post): use the scalar version when USE_SIMD is off.
	resolveSingleConstraintRowGeneric(body1,body2,c);
#endif //USE_SIMD
}

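// Project Gauss Seidel or the equivalent Sequential Impulse
SIMD_FORCE_INLINE void btSequentialImpulseConstraintSolver::resolveSingleConstraintRowGeneric(btSolverBody& body1,btSolverBody& body2,const btSolverConstraint& c)
{
	btScalar deltaImpulse = c.m_rhs-btScalar(c.m_appliedImpulse)*c.m_cfm;
	// The operands of the next two sums were lost in the post; reconstructed here from the SIMD version.
	const btScalar deltaVel1Dotn	=	c.m_contactNormal.dot(body1.m_deltaLinearVelocity) + c.m_relpos1CrossNormal.dot(body1.m_deltaAngularVelocity);
	const btScalar deltaVel2Dotn	=	-c.m_contactNormal.dot(body2.m_deltaLinearVelocity) + c.m_relpos2CrossNormal.dot(body2.m_deltaAngularVelocity);
	const btScalar delta_rel_vel	=	deltaVel1Dotn-deltaVel2Dotn;
	deltaImpulse	-=	deltaVel1Dotn*c.m_jacDiagABInv;
	deltaImpulse	-=	deltaVel2Dotn*c.m_jacDiagABInv;

	const btScalar sum = btScalar(c.m_appliedImpulse) + deltaImpulse;
	if (sum < c.m_lowerLimit)
	{
		deltaImpulse = c.m_lowerLimit-c.m_appliedImpulse;
		c.m_appliedImpulse = c.m_lowerLimit;
	}
	else if (sum > c.m_upperLimit)
	{
		deltaImpulse = c.m_upperLimit-c.m_appliedImpulse;
		c.m_appliedImpulse = c.m_upperLimit;
	}
	else
		c.m_appliedImpulse = sum;
	// The bodies of the next two ifs were lost in the post; these updates mirror the SIMD version.
	if (body1.m_invMass)
	{
		body1.m_deltaLinearVelocity += c.m_contactNormal*body1.m_invMass*deltaImpulse;
		body1.m_deltaAngularVelocity += c.m_angularComponentA*deltaImpulse;
	}
	if (body2.m_invMass)
	{
		body2.m_deltaLinearVelocity -= c.m_contactNormal*body2.m_invMass*deltaImpulse;
		body2.m_deltaAngularVelocity += c.m_angularComponentB*deltaImpulse;
	}
}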
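
The compare / and / andnot / or sequence in the SIMD version is the usual way to replace an if / else-if / else clamp with a branchless select. A reduced, self-contained sketch of just that step is below; the function names are hypothetical and only the standard SSE intrinsics from xmmintrin.h are used.

```cpp
#include <xmmintrin.h>	// SSE intrinsics
#include <cassert>

// Scalar reference with branches, same shape as the if / else-if / else clamp.
static inline float clampBranchy(float sum, float lo, float hi)
{
	if (sum < lo)
		return lo;
	else if (sum > hi)
		return hi;
	return sum;
}

// Per-lane select: where mask is all-ones take a, elsewhere take b.
static inline __m128 select_ps(__m128 mask, __m128 a, __m128 b)
{
	return _mm_or_ps(_mm_and_ps(mask, a), _mm_andnot_ps(mask, b));
}

// Same clamp without any branch: two compares produce masks, two selects pick the result.
static inline float clampBranchless(float sum, float lo, float hi)
{
	assert(lo <= hi);
	const __m128 s = _mm_set1_ps(sum);
	const __m128 l = _mm_set1_ps(lo);
	const __m128 h = _mm_set1_ps(hi);
	const __m128 lessThanLo = _mm_cmplt_ps(s, l);	// all-ones where sum < lo
	const __m128 lessThanHi = _mm_cmplt_ps(s, h);	// all-ones where sum < hi
	__m128 r = select_ps(lessThanLo, l, s);		// sum < lo ? lo : sum
	r = select_ps(lessThanHi, r, h);		// sum < hi ? r : hi
	return _mm_cvtss_f32(r);
}
```

For example, clampBranchless(7.0f, -1.0f, 1.0f) and clampBranchy(7.0f, -1.0f, 1.0f) both return 1.0f. The branchless form always evaluates both candidates, which is cheap here and avoids mispredicted jumps.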

Re: Bullet 2.73 has been updated to Bullet 2.73 SP1.

Posted: Wed Dec 03, 2008 9:45 pm
by Dragonlord
I have not tried this out yet. I just read about this in various places dealing with the question of how to do SSE with GCC, and the main answer was to not do it, since the optimizer is already aggressive. Getting some comparison would be interesting, though. I'm not sure whether the Bullet distribution currently has a suitable demo app that could be modified to directly compare the manually optimized code with code optimized by GCC alone. I can try to see how it compares, but most probably not in the next few days. Furthermore, I only have gcc-4.1.2 for testing here, although gcc-4.3.2 is the most recent one (all masked in Portage so far). The SSE abilities (-ftree-vectorize and company) already exist there, though. It would be interesting to see how the different compilers fare with the different versions of the code.
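
For such a comparison, a minimal sketch of the kind of loop an auto-vectorizer has a chance with is shown below. clampArray is a hypothetical test kernel, not Bullet code, and whether gcc-4.1.2 actually vectorizes it has not been checked here. It could be built with something like g++ -O2 -msse2 -ftree-vectorize -S and the generated assembly compared against a hand-written SSE version.

```cpp
#include <cstddef>
#include <cassert>

// Clamp every element of in[] to [lo, hi] and write the result to out[].
// Unit stride, no early exit, and __restrict (a compiler extension) to
// promise that the arrays do not overlap: properties vectorizers usually need.
void clampArray(float* __restrict out, const float* __restrict in,
                std::size_t n, float lo, float hi)
{
	assert(lo <= hi);
	for (std::size_t i = 0; i < n; ++i)
	{
		const float v = in[i];
		const float t = v < lo ? lo : v;	// lower bound
		out[i] = t > hi ? hi : t;		// upper bound
	}
}
```

Note that the solver code above works on one constraint row at a time rather than on a flat array, so a loop like this only shows what the compiler can do in the easy case.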

Re: Bullet 2.73 has been updated to Bullet 2.73 SP1.

Posted: Fri Dec 05, 2008 9:41 pm
by reltham
Erwin Coumans wrote: Is there some MSVC setting that needs to be enabled for auto-vectorization?
With VS 2005 (and I'm sure it's the same for 2008, and probably 2003), use this:
/arch:SSE or /arch:SSE2
/fp:fast (instead of /fp:precise)

If you don't set /fp:fast then it will block some of the SSE/SSE2 optimizations.

These settings cause the compiler to use SSE in a lot of cases; however, it's still not the greatest at vectorizing compared to hand-written code.
One bonus is that float/int conversions use SSE instructions which are faster.