I'm doing some research into collision detection, and I wish to compare my research results against an appropriate GPU-based collision detection algorithm. For the comparison, I selected the Bullet 3 b3GpuSapBroadphase implementation from https://github.com/erwincoumans/bullet3/.
I have some questions on how to use the algorithm. Unlike all of the examples I've seen, I wish to send the updates to the object positions from the CPU to the GPU (and not compute them directly on the GPU). I realise this will reduce overall application performance, but I'm only interested in the broad phase performance for my experiments. How can I update the AABB locations from the software? (I realise I can call createProxy for each AABB in each iteration and then call reset at the end of the iteration, but this will artificially degrade broad phase performance as it will remove coherence.)
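To illustrate the kind of update I'm after, here is a minimal sketch that overwrites the bounds of existing boxes without changing the array size, order, or object ids. Vec3, Aabb, and updateAabbsInPlace are hypothetical stand-ins of my own and not Bullet types or functions. I haven't confirmed whether b3GpuSapBroadphase keeps any per-slot state between frames that such an update would preserve.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not Bullet types.
struct Vec3 { float x, y, z; };
struct Aabb { Vec3 min; Vec3 max; int objectId; };

// Overwrite the bounds of already-registered boxes. Array size, element order,
// and object ids are left untouched, so slot i refers to the same object as in
// the previous iteration.
// Returns false (and changes nothing) if the input sizes do not match.
bool updateAabbsInPlace(std::vector<Aabb>& boxes,
                        const std::vector<Vec3>& newMin,
                        const std::vector<Vec3>& newMax)
{
    if (newMin.size() != boxes.size() || newMax.size() != boxes.size())
        return false;
    for (std::size_t i = 0; i < boxes.size(); ++i) {
        boxes[i].min = newMin[i];
        boxes[i].max = newMax[i];
    }
    return true;
}
```

The contrast is with clearing the arrays and calling createProxy again for every box, which rebuilds the whole set each iteration.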
I also wish to time the performance of the algorithm. I don't wish to include the CPU-GPU transfer overhead (in either direction), but I'd like to measure the time spent executing the collision check as closely as possible. Could I confirm that the correct way to time the algorithm is to time the call to calculateOverlappingPairs?
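For reference, this is roughly the timing harness I have in mind, as a sketch rather than a definitive method. The helper timeSectionMs and its two callbacks are hypothetical names of my own. The assumption is that GPU work is queued asynchronously, so the queue has to be drained before starting and before stopping the clock, otherwise earlier transfers or unfinished kernels get attributed to the wrong section.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper, not part of Bullet. Times only `work`, calling `sync`
// before and after it so that asynchronously queued GPU commands are neither
// carried into the measurement nor left out of it.
double timeSectionMs(const std::function<void()>& sync,
                     const std::function<void()>& work)
{
    using Clock = std::chrono::steady_clock;
    sync();                                   // drain pending uploads/kernels
    const Clock::time_point start = Clock::now();
    work();                                   // e.g. the calculateOverlappingPairs call
    sync();                                   // wait until the GPU has finished
    const Clock::time_point stop = Clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

With OpenCL, sync could be a lambda that calls clFinish on the command queue used by the broadphase, and work a lambda that calls calculateOverlappingPairs. Host wall-clock time measured this way still includes kernel launch overhead.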
Thanks.
Using GPU SAP in Bullet 3
Re: Using GPU SAP in Bullet 3
I've now found a way to update object positions. I'm calling
Code:
m_allAabbsCPU.resize(0);
m_smallAabbsCPU.resize(0);
m_largeAabbsCPU.resize(0);
after each collision check in order to reset the AABB data. The data are then added back to the arrays using createProxy in the next iteration.
However, the performance of calculateOverlappingPairs seems poor. In general, the algorithm seems to be about 1/100th of the speed of Bullet 2's software DBVT across a large variety of object quantities (up to 32000). I would have hoped that the GPU would give better performance than the software. Is there some reason why I might be seeing such poor performance? Are the commands above wiping the coherence data?