It all depends on the setup.
Usually the broadphase takes out most of the pairs, so 1024 sphere will never be a n^2 problem.
You can modify the Bullet CcdPhysicsDemo, to increase the stack of objects to 1024, change this line (release mode)
const int gNumObjects = 120;
, and change the shape in btSphereShape, and enable profiling.
Profiling can be enabled by editing Bullet/src/LinearMath/btQuickprof.h,
uncomment the line at the top //#define USE_QUICKPROF 1
Then recompile the sources and press 'p'. It will show profiling on-screen, and ouput the data to a comma separated file. Perhaps I should enable profiling by default, it doesn't take much cpu cycles probably (and its output still needs to be enabled by the user anyway).
By default, that CcdPhysicsDemo (only that demo) registers a very basic SphereSphere collision algorithm (see REGISTER_CUSTOM_COLLISION_ALGORITHM):
http://svn.sourceforge.net/viewvc/bulle ... iew=markup
Thanks,
Erwin
My exotic approach to GPU collision detection
-
- Posts: 59
- Joined: Thu Aug 31, 2006 11:51 am
But broadphase is n^2 (exact (n^2 - n)/2). No?Erwin Coumans wrote:It all depends on the setup.
Usually the broadphase takes out most of the pairs, so 1024 sphere will never be a n^2 problem.
And it is not faster then sphere-sphere. Right?
So, it is only a broadphase will eat 16 ms for 1024 objects....
-
- Site Admin
- Posts: 4221
- Joined: Sun Jun 26, 2005 6:43 pm
- Location: California, USA
No,Jack wrote:But broadphase is n^2 (exact (n^2 - n)/2). No?Erwin Coumans wrote:It all depends on the setup.
Usually the broadphase takes out most of the pairs, so 1024 sphere will never be a n^2 problem.
And it is not faster then sphere-sphere. Right?
So, it is only a broadphase will eat 16 ms for 1024 objects....
The CPU sweep and prune broadphase that Bullet uses is much cheaper. With typical motion (framecoherence) the cost roughly depends on the number of moving objects. For n objects, the cost is around O(n), but with small amount of moving objects it can be cheaper like O(m).
So it makes sense to run the broadphase on the CPU, and then upload a CollisionMap texture to GPU.
Erwin
-
- Posts: 59
- Joined: Thu Aug 31, 2006 11:51 am
I took a look at your code. I am not sure, but I suspect that you send collision pairs to GPU one by one.
How many times above code is executed per frame?
Code: Select all
glBeginQueryARB ( GL_SAMPLES_PASSED_ARB, query ) ;
glMultiDrawArraysEXT ( GL_QUADS, (GLint*)& collstart, (GLint*)& colllength, 1 ) ;
numPasses++ ;
glEndQueryARB ( GL_SAMPLES_PASSED_ARB ) ;
-
- Posts: 127
- Joined: Sun Aug 13, 2006 4:41 pm
- Location: Cedar Hill, Texas
Typically once or twice only per frame - irrespective of the number of objects.Jack wrote:I took a look at your code. I am not sure, but I suspect that you send collision pairs to GPU one by one.
How many times above code is executed per frame?Code: Select all
glBeginQueryARB ( GL_SAMPLES_PASSED_ARB, query ) ; glMultiDrawArraysEXT ( GL_QUADS, (GLint*)& collstart, (GLint*)& colllength, 1 ) ; numPasses++ ; glEndQueryARB ( GL_SAMPLES_PASSED_ARB ) ;
The deal is that drawing one polygon compares one object against ALL of the other objects using the frag shader - so there is a lot of parallelism. The glMultiDrawArrays call sends one polygon for every object (in a VBO for speed) - so in this formulation (which is just for the demo) it's an N^2 calculation - but performed as N polygons each of N pixels. The amount of parallelism you have is determined by the number of fragment shaders in your GPU.
The snag is that there is potentially insufficient storage space in the shader's output buffer to store ALL of the collision pairs that one 'target' object might undergo - so I count the number of pixels actually drawn (which equates to the number of collision pairs generated) and loop around generating different ones each pass until no more are drawn. The number of iterations of that loop depends on the worst case number of collisions that one object might undergo divided by the number of collisions you can store in one pass (which is 1 right now - but can be more in future).
This isn't great - but it's a reasonable 'proof of concept' - which is about where we are with this little experiment right now.
I tried measuring the number of collisions produced by each object in turn - in order that second and subsequent passes would require me only to re-send those objects that actually collided with something on the previous iteration - but the overhead of counting the pixel hits for each polygon drawn using the 'query' mechanism actually took longer than just drawing all of the polygons each time and testing whether any of them generated a collision...so simple is better as is so often the case with GPU work.
Probably the best solution of all will be to batch them into groups of a few hundred polygons - but we have a pretty good idea of what this can do now and that level of final polish will only be needed if/when we ever make a GPU-based Bullet that uses GPU collision detection.
Right now, we are really just seeing what can be done and where the big speedups are.
-
- Posts: 59
- Joined: Thu Aug 31, 2006 11:51 am
1) I see now. You execute it N times. Each execution may be repeated several times (bet it is 1 in most cases). So you make N GPU calls. It is not good. You lose on this! But until Shader Model 4.0 you can do nothing, I think.SteveBaker wrote:Typically once or twice only per frame - irrespective of the number of objects.Jack wrote:I took a look at your code. I am not sure, but I suspect that you send collision pairs to GPU one by one.
How many times above code is executed per frame?Code: Select all
glBeginQueryARB ( GL_SAMPLES_PASSED_ARB, query ) ; glMultiDrawArraysEXT ( GL_QUADS, (GLint*)& collstart, (GLint*)& colllength, 1 ) ; numPasses++ ; glEndQueryARB ( GL_SAMPLES_PASSED_ARB ) ;
The deal is that drawing one polygon compares one object against ALL of the other objects using the frag shader - so there is a lot of parallelism. The glMultiDrawArrays call sends one polygon for every object (in a VBO for speed) - so in this formulation (which is just for the demo) it's an N^2 calculation - but performed as N polygons each of N pixels. The amount of parallelism you have is determined by the number of fragment shaders in your GPU.
The snag is that there is potentially insufficient storage space in the shader's output buffer to store ALL of the collision pairs that one 'target' object might undergo - so I count the number of pixels actually drawn (which equates to the number of collision pairs generated) and loop around generating different ones each pass until no more are drawn. The number of iterations of that loop depends on the worst case number of collisions that one object might undergo divided by the number of collisions you can store in one pass (which is 1 right now - but can be more in future).
This isn't great - but it's a reasonable 'proof of concept' - which is about where we are with this little experiment right now.
I tried measuring the number of collisions produced by each object in turn - in order that second and subsequent passes would require me only to re-send those objects that actually collided with something on the previous iteration - but the overhead of counting the pixel hits for each polygon drawn using the 'query' mechanism actually took longer than just drawing all of the polygons each time and testing whether any of them generated a collision...so simple is better as is so often the case with GPU work.
Probably the best solution of all will be to batch them into groups of a few hundred polygons - but we have a pretty good idea of what this can do now and that level of final polish will only be needed if/when we ever make a GPU-based Bullet that uses GPU collision detection.
Right now, we are really just seeing what can be done and where the big speedups are.
2) Why N^2. You have (N^2-N)/2 pairs. You need something like:
for(i=0; i<TOTAL; i++)
for(j=i+1; j<TOTAL; j++)
{
}
This hepls you to get rid of redundant "IFs" in the fragment shader.
3) "N polygons each of N pixels" probably is not good. Vertex shaders can not fully load fragment shaders in this case.
4) You "polygons" are pionts (D3DPT_POINTLIST). Right?
5) Do not use texture fetch in Vertex Shader. Try to embed all the data you needed in Vertex Declaration.
I think there is no acceptable solution Shader Model 4.0. We need ability to generete variable length output streams (It will be allowed in Geometry Shader). Then collision detection can be done in ONE DrawPrimitive call.
-
- Posts: 127
- Joined: Sun Aug 13, 2006 4:41 pm
- Location: Cedar Hill, Texas
I don't know what you mean by "GPU call". For 1024 objects, colliding against 1024 objects, each frame I do ONE call to glMultiDrawArrays which draws 1024 polygons (which are stored off on the graphics card as VBO's). At the end, I read back the number of collision pairs it generated (using the occlusion query). If there were collisions, I have to run that glMultiDrawArrays again.Jack wrote: 1) I see now. You execute it N times. Each execution may be repeated several times (bet it is 1 in most cases). So you make N GPU calls.
[/quote]
It is not good. You lose on this! But until Shader Model 4.0 you can do nothing, I think.
Well, it's better - but you still need to do more than one pass in pathalogical cases where too many objects can collide with one object. It's also possible that this multipass approach might be faster because I don't think we can selectively kill() writes to individual buffers in Shader Model 4. (I could be wrong about that - but it's an ikky problem).The problem is that with only one output FBO, I can't write out all of the collision pairs in one pass because each 'target' object might be hit multiple times. Hence the multiple calls to glMultiDrawArrays...but there won't usually be more than a couple of those per frame for non-pathalogical cases. Shader Model 4 permits me to write to more FBO's - but still there may be pathalogical cases where I can't store *ALL* of the collision pairs. So the multipass approach is still needed - it's just going to consume fewer passes in the future. (There are still some complicated issues associated with even that - but let's not go there now).
For subsequent processing, it'll be easier if I have A-collided-with-B as well as B-collided-with-A...so what I'm doing is still OK. We gotta see the big picture here - it's not just about collision detection!2) Why N^2. You have (N^2-N)/2 pairs. You need something like:
for(i=0; i<TOTAL; i++)
for(j=i+1; j<TOTAL; j++)
{
}
This hepls you to get rid of redundant "IFs" in the fragment shader.
Eh? The vertex shader is just about idle. Unless N is less than about 64, the vertex shader can keep up.3) "N polygons each of N pixels" probably is not good. Vertex shaders can not fully load fragment shaders in this case.I'm using OpenGL - but no, the primitive is GL_QUADS....oh - that's why you think I'm doing it all wrong. Sheesh! Give me more credit than that!4) You "polygons" are pionts (D3DPT_POINTLIST). Right?I don't understand what you are saying here...but don't sweat it - I have this right.5) Do not use texture fetch in Vertex Shader. Try to embed all the data you needed in Vertex Declaration.I think there is no acceptable solution Shader Model 4.0. We need ability to generete variable length output streams (It will be allowed in Geometry Shader). Then collision detection can be done in ONE DrawPrimitive call.
-
- Posts: 59
- Joined: Thu Aug 31, 2006 11:51 am
1) You Draw N polygons. Each polygon consists of N pixels. Right?
2) Your render target texture contains only N pixels. Or N*N?
3) If N*N then I do not understand why you need second pass...
For example, in my approach I draw only ONE triangle. Triangle cathetus are N-1 pixels. Area: (N^2-N)/2. This triangle contains all collisions. And I do not use vertex shader at all.
2) Your render target texture contains only N pixels. Or N*N?
3) If N*N then I do not understand why you need second pass...
For example, in my approach I draw only ONE triangle. Triangle cathetus are N-1 pixels. Area: (N^2-N)/2. This triangle contains all collisions. And I do not use vertex shader at all.
-
- Posts: 127
- Joined: Sun Aug 13, 2006 4:41 pm
- Location: Cedar Hill, Texas