Collision detection on GPU.

SteveBaker · Post by **SteveBaker** » Wed Sep 27, 2006 10:54 pm

(The GPU physics thread was getting kinda long...)

I have a good way to do collisions worked out - it's not easy to understand - but it's going to work and it'll be fairly efficient. The idea is to go N-times around the 'main loop' where 'N' is the maximum number of collisions that any object undergoes. The inner loop feeds one polygon to the hardware for every object on the first main loop iteration once more for every object that collides two or more times, again for every object that collides three times or more...and so on.

So in a brick wall case where every brick touches six others, every brick goes around the innermost loop six times. That's probably close to a worst case though.

The result is N textures - each location of which is either a collision pair or a zero. These textures are going to be pretty sparse - mostly zeroes.

My question is whether I can resolve each collision-pair in isolation - turning it into resultant forces then summing the forces from all of the collisions at the end?

Here is the collision algorithm:
(Every object has a unique ID which maps to a location in a texture map and is representable as a small integer - there is no object with an ID of zero).

------------------------------------------------------------------------------------------------------
Initialisation

1. Prepare one 'probe' polygon for every object in the scene in object ID order.
2. Mark all probe polygons as 'needed'.
3. Prepare a collision map - one location in the collision map for every object in the scene. Call this the 'source collision map' (SCM)
4. Fill the SCM entries with a number that's higher than the highest object ID.

Main loop

1. Set up an empty 'destination' collision map (DCM) to render into as an FBO.
2. Clear DCM to all zeroes using glClear.
3. Send the SCM to the frag shader as an input parameter.
4. Send all probe polygons that are marked as 'needed' down the pipe in low-to-high numbered order.
5. For each polygon - ideally, fetch the 'position', 'size', 'velocity' and any other data for this object it represents from vertex textures and pass them down to the fragment shader - but remembering that we can only have four per-vertex textures in nVidia-land and none at all elsewhere, some or all of these parameters may have to come from the CPU in vertex data or in VBO's.
6. At each fragment:
* If the probe polygon identifier is greater than or equal to the SCM at this pixel 'discard' this fragment.
* If the probe polygons identifier is equal to the identifier of the object at this pixel 'discard' this fragment.
* If the probe polygon's object doesn't collide with the object at this pixel 'discard' this fragment.
* If we get this far, write the polygons object identifier to the DCM.
7. Using the occlusion query feature, mark the probe polygons that caused DCM writes (which represent newly recorded collisions) as 'needed' and those that did not as not-needed.
8. If all are marked as not-needed then you have accumulated all of the forces and you can END.
9. After all probe polygons have been sent once, run the collision physics on the DCM, accumulating forces as we go.
10. Swap the DCM and SCM and go around again.

SteveBaker · Post by **SteveBaker** » Sun Oct 01, 2006 3:47 am

I have this algorithm working in the SVN trunk.

The results of a collision happening are not truly correct - but it looks OK at first sight!

The system is doing this is as follows (and of course, runs entirely in the GPU):

* For each object:
*** Submit it to the collision detector and count how many things it collided with (doing all of the tests in parallel of course).
*** Store only ONE of these collisions against a particular target in the 'collision map'.

* Once every object has been submitted once, we have a "collision map" containing at most one collision pair for every object and recorded who it was who collided.

* Pass this collision map on to something that calculates the forces resulting from that set of collisions and adds that to the net force exerted on each object.

* Because one object could potentially collide with SEVERAL others in a single iteration, it is necessary to repeat this process, eliminating objects that didn't collide at all and ignoring collision pairs that we already processed so that the number of passes each object makes through the algorithm is one plus the number of times it collided.

Right now, the 'force application' part just imparts a force away from the center of the other sphere with a magnitude equal to the depth of penetration.

I also turned off gravity and reduced the numbers of objects so we could see what's going on.

The present implementation is pretty slow (relative to what I want to do - but at least comparable to what we can do in the CPU) for several reasons:

1) I'm being very careful - so there is lots of unnecessary checks and stuff.

2) There is an inherent performance problem with the above algorithm: The first step requires that I find out how many collision pairs an object generated so that I know whether I need to pass it through a second, third, fourth time in order to extract multiple collisions. However, OpenGL requires that you request that count, draw some number of polygons, then extract the result. Because each collision query results in drawing just one polygon, I'm have to do one request/draw/check cycle for every polygon. This is horribly inefficient. What I really need to do is to turn on the check, test maybe a thousand objects by drawing a thousand polygons - THEN read back the number of collisions. But that means I don't know whether an individual object caused a collision or not, only that one out of that 'batch' of a thousand did. That will cause extra redundant testing of up to 999 objects in a second pass if just one out of the 'batch' causes a collision. However, if collisions are either fairly rare or if most objects are colliding about the same number of times (eg bricks in a brick wall) - then there is no wastage and the speedup will be enormous.

So I have more work to do to make this really blindingly fast - but it's already clear that it's pretty good.

...and this is with no 'broadphase' - this is just mindlessly checking every single object against every other.