Re: CPU multithreading is working!
Posted: Tue Dec 16, 2014 10:25 pm
Exciting! Will the MT version work on iOS devices?
Flix wrote:I don't usually like to pop-in and interrupt an ongoing discussion

I don't think you're interrupting at all. The more the merrier!

Flix wrote:I don't usually like to pop-in and interrupt an ongoing discussion: however I'd like to raise a few brief questions/considerations, soon after having thanked lunkhound for having restored/rewritten/reconsidered/reforked the Bullet Multithreaded branch.
Here are my questions/considerations (I still haven't tested the library yet, I'm just taking a look at its code):
1) In the old/legacy Bullet Multithreaded all the callbacks (e.g. the custom material callback) were forbidden (= no longer usable): does this limitation apply to your version as well?

No, I can't see any reason why such callbacks would cause a problem. In my own code I'm using the gContactAddedCallback without issues so far. Obviously the callback will have to be threadsafe.
Flix wrote:2) As far as I can understand in your source code the threading stuff is confined to:
a) LinearMath/btThreads.h/cpp: for the basic stuff (e.g. mutex,lock,thread). This is the only part of the code that uses C++11 and defaults to MSVC++ intrinsics if not supported (there is no fallback on other systems).
b) Demos/MultiThreadedDemo/MultiThreadedDemo.h/cpp: for the advanced stuff. This is the only part of the code that uses the TBB library (for tbb::blocked_range AFAICS).
Is this correct?

a - Yes. Pretty much all that is needed for (a) is enough atomic operations to make a basic mutex. I started out using OS-provided mutexes (i.e. Windows critical sections), but they were slow compared to the lightweight mutex I ended up with. It should be quite straightforward to add more fallbacks as needed.
Flix wrote: I was thinking that maybe by using http://tinythreadpp.bitsnbites.eu/ (a subset of C++11 in a single cpp file with a liberal license) we could add a fallback to point a (or, with a clever use of typedefs and using directives, we could even think of switching seamlessly between C++11/boost_thread and tinythreads++ using some preprocessor definition, by forcing these three libraries to use the same API).

I think that since the requirements for what btThreads actually needs are so small (it's really just a compare-and-swap and an atomic load), it doesn't really make sense to add any extra dependencies on external libraries for it. A few lines of asm or intrinsics for various fallback cases should cover it. The btMutex class is very similar to tinythreads::fast_mutex, although I don't like the fact that the fast_mutex header includes other headers for the sake of inlining its member functions.
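To illustrate how little is needed, here is a minimal sketch of a lightweight mutex built from only a compare-and-swap and an atomic load, written with C++11 std::atomic. The class name SpinMutexSketch and its method names are hypothetical; this is not the actual btMutex code from the branch, just one way such a mutex could look.

```cpp
#include <atomic>

// Hypothetical sketch, NOT Bullet's btMutex: a spin mutex that needs only
// a compare-and-swap and an atomic load/store.
class SpinMutexSketch
{
    std::atomic<int> mLocked; // 0 = unlocked, 1 = locked

public:
    SpinMutexSketch() : mLocked(0) {}

    // One compare-and-swap attempt: succeeds only if the mutex was unlocked.
    bool tryLock()
    {
        int expected = 0;
        return mLocked.compare_exchange_strong(expected, 1,
                                               std::memory_order_acquire,
                                               std::memory_order_relaxed);
    }

    void lock()
    {
        while (!tryLock())
        {
            // Spin on a plain atomic load until the mutex looks free,
            // then go back and retry the compare-and-swap.
            while (mLocked.load(std::memory_order_relaxed) != 0)
            {
            }
        }
    }

    void unlock()
    {
        // Release store publishes the writes made while holding the lock.
        mLocked.store(0, std::memory_order_release);
    }
};
```

A fallback for a compiler without C++11 would presumably swap the std::atomic calls for the matching intrinsics or a few lines of asm, leaving the lock/unlock logic unchanged.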
Flix wrote:Maybe for point b we can think about providing an alternative demo using OpenMP as a replacement for TBB (however, I'm not sure if there's something comparable to tbb::blocked_range: I'll have to check the OpenMP API). Probably the best option here would be not to use any additional advanced library at all, and to build some infrastructure using btThreads.h/cpp to have fewer dependencies, but I guess it won't be as easy/efficient: so it's OK for me.

I was thinking of OpenMP as well (as an alternative to TBB). I think it does have parallel_for, but I haven't actually used it. It would be nice to have at least 2 choices for task schedulers to underscore the fact that this isn't tied to the architecture of any one particular task scheduler.
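For reference, OpenMP has no range object like tbb::blocked_range, but its "parallel for" directive splits a loop's index range across threads, which covers the same ground. Below is a minimal sketch; parallelForSketch and squaresSketch are hypothetical names for illustration and are not part of the branch.

```cpp
#include <vector>

// Hypothetical helper: calls body(i) for every i in [begin, end).
// With OpenMP enabled (e.g. -fopenmp on gcc) the iterations are divided
// among threads; without it the pragma is ignored and the loop runs serially.
template <typename Body>
void parallelForSketch(int begin, int end, const Body& body)
{
    #pragma omp parallel for schedule(static)
    for (int i = begin; i < end; ++i)
    {
        body(i);
    }
}

// Example body: every iteration writes only its own slot, so no locking
// is needed between threads.
std::vector<int> squaresSketch(int n)
{
    std::vector<int> out(n);
    parallelForSketch(0, n, [&out](int i) { out[i] = i * i; });
    return out;
}
```

With schedule(static) each thread gets a contiguous block of indices, which is roughly what a blocked_range hands to each TBB task.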
lunkhound wrote: The MLCP solver is apparently not threadsafe. I don't know about any of the other alternative solvers.

I'll double check my code again (apparently thread-safe now, but still want to make sure), and then I'll post a patch on that issue.
lunkhound wrote:Oh, I noticed that right now the MultiThreadedDemo is hardcoding the number of threads to 4. If you are doing performance testing, make sure to set numThreads to match your hardware!

Yup, on Windows it's actually easy enough; you just need to use some derivative of:
Code: Select all
#include <windows.h> // declares SYSTEM_INFO and GetSystemInfo
SYSTEM_INFO sysinfo;
GetSystemInfo(&sysinfo);
numThreads = sysinfo.dwNumberOfProcessors; // not recommended for multiprocessor though
Code: Select all
std::thread::hardware_concurrency()
Code: Select all
omp_get_max_threads();
Code: Select all
tbb::task_scheduler_init::default_num_threads();
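As a portable alternative to the hardcoded value, here is a small sketch built on the C++11 call above. The helper name chooseNumThreadsSketch is hypothetical, and the default fallback of 4 is only an assumption that mirrors the value the demo currently hardcodes. std::thread::hardware_concurrency() is just a hint and may return 0 when the count cannot be determined, so a fallback is needed.

```cpp
#include <thread>

// Hypothetical helper: picks a thread count from the hardware hint.
// hardware_concurrency() may return 0 if the value is not computable,
// in which case we fall back to a caller-supplied default.
int chooseNumThreadsSketch(int fallback = 4)
{
    unsigned int n = std::thread::hardware_concurrency();
    return n > 0 ? static_cast<int>(n) : fallback;
}
```

Usage would be something like numThreads = chooseNumThreadsSketch(); before the task scheduler is set up.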
Flix wrote: Just to say I'm very happy I can use callbacks and the OpenMP version!
Only one additional thing:
On Linux 64bit in LinearMath/btThreads.cpp, I had to change:
Code: Select all
bool btIsAligned( const void* ptr, unsigned int alignment )
{
btAssert( ( alignment & ( alignment - 1 ) ) == 0 ); // alignment should be a power of 2
return ( ( (unsigned int) ptr )&( alignment - 1 ) ) == 0;
}
to:
Code: Select all
bool btIsAligned( const void* ptr, unsigned int alignment )
{
btAssert( ( alignment & ( alignment - 1 ) ) == 0 ); // alignment should be a power of 2
return ( ( (size_t) ptr )&( alignment - 1 ) ) == 0;
}
otherwise the cast loses precision and the code won't compile without using the -fpermissive flag on gcc.

Good find, thanks! I'll fix it. Glad to know it's been tested on 64-bit!
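For what it's worth, the integer type intended for holding a pointer value is std::uintptr_t (from <cstdint>, C++11), so the same check could also be written as in the sketch below. The name isAlignedSketch is hypothetical, and plain assert stands in for btAssert so that the snippet compiles without Bullet; this is not the fix that was actually committed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sketch only: same logic as btIsAligned, but the pointer is converted to
// std::uintptr_t, which is wide enough to hold a pointer on platforms
// that provide it.
bool isAlignedSketch(const void* ptr, std::size_t alignment)
{
    // alignment should be a non-zero power of 2
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}
```

The size_t cast works on the usual 32-bit and 64-bit targets too; uintptr_t just states the intent more directly.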
Granyte wrote:EDIT: This is working great. Any chance this would get merged back into the main trunk?

I'm planning to make a pull request soon. I'm just waiting to see if any bugs crop up and hopefully get a few more reports from people using it.