My exotic approach to GPU collision detection

Jack · Post by **Jack** » Sat Sep 30, 2006 12:51 pm

Here is my thoughts and some practical results in GPU collision detection:
http://www.continuousphysics.com/ftp/pu ... ection.pdf

Probably it can be useful and for CPU collision detection. I like the idea to unify all kind of shapes in ONE entity (triangle independent)...

SteveBaker · Post by **SteveBaker** » Sat Sep 30, 2006 1:19 pm

Can you find an easier way for us to download this? That site you've got it hosted on is a pain in the ass to use! It wants me to sign up for who-knows-what just to see your file!

If you'd like to email it to me at <sjbaker1@airmail.net> I'll put it up on my web site for people to get at.

Jack · Post by **Jack** » Sat Sep 30, 2006 1:29 pm

There is "FREE" button at the bottom of that page

No registration reuired if you click "FREE". I'll also send it to you via email...

Erwin Coumans · Post by **Erwin Coumans** » Sat Sep 30, 2006 3:15 pm

Thanks a lot for sharing the GPU ideas!

I uploaded the file onto the Bullet website, next time just mail the file to me (erwin _at_ continuousphysics.com ) and I'll upload it (or mail Steve Baker, and he can put it on his website).

http://www.continuousphysics.com/ftp/pu ... ection.pdf

SteveBaker · Post by **SteveBaker** » Sat Sep 30, 2006 4:17 pm

Well, I guess the first issue is that you need a 3D texture with four components at whatever resolution you need your objects to be collided against. If you wanted (say) one part in 128 precision for your surface you'd need a 128x128x128x4x2 = 16Mbytes texture. You won't get many of those from a typical 256Mbyte graphics card (especially not if your graphics need lots of texture too).

But I guess if you stuck with one of these maps for all spheres, another for all cubes, another for all cylinders - you might only need maybe half a dozen of these maps...but that's still close to 40% of your texture map memory.

For more complex items (eg a chair or a table or something), a 128x128x128 map is nowhere near enough. Going to the next size up (256x256x256) eats 8 times as much texture!!! So you could definitely forget the idea that you could have a room with one map for a chair, another for the table, another for the room itself, another for...you get the problem!

So *maybe* you could use this for standardized shapes such as spheres and cubes and such - but for those simple mathematical shapes, why not just describe the shape mathematically inside the shader and save all of that memory? For spheres and such the math is easy. The hardest is the cube - but your algorithm requires that you find the closest point on the texture - which is the exact same calculation. So this is only really helpful for shapes that are hard to describe with math...and I think that in a practical setting, the memory requirements kill you.

I wonder though whether this idea can be salvaged.

Suppose we use a 'cubeMap' - which is (effectively) six maps packed into one - each one being (notionally) one face of a cube. So you'd have the distance from any point on the outside of the bounding box of your irregular-shaped object to the actual surface. I'm not sure whether that's enough - but it's orders of magnitude less texture map memory!

The other thing that's problematic is this. If your shapes are lots of arbitary things then the shader that does the comparisons has to have access to all of those textures. But there is a limit of only 16 maps per shader - so in one pass of the algorithm, you can't compare one object against more than 15 others in one pass. Again, that's not TOO much of a problem if you use standardized shapes - but chairs and tables are out.

So my opionion is that what you suggest isn't useful for encoding complicated shapes (because you'll run out of texture memory in any 'real' example) - and for simple shapes you are better off using math than a 3D texture.

Think of your 3D texture as a lookup table that implements a mathematical function that expresses the distance and direction from any point in the volume to the nearest point on the surface. When would you want to use a lookup table and when would you want to use an equation? For a sphere, it's OBVIOUS that an equation would be faster (and much, much less space) than a texture lookup. For a chair, it's equally obvious that math ain't going to cut it and a lookup table is the way to go.

So the argument hinges on whether you can store enough complex shapes to make doing this worth-while - or whether you are better off representing a chair as a bunch of spheres, cuboids and cylinders? This is a straight trade between speed and space. A single 3D texture lookup for a chair is going to be faster than (say) six cubes...but we don't have enough texture memory to do the entire game world in 3D textures.

Hence - I'll be concentrating on doing spheres, cylinders and cubes...with triangle meshes remaining a major problem.

Jack · Post by **Jack** » Sat Sep 30, 2006 5:03 pm

SteveBaker wrote:Well, I guess the first issue is that you need a 3D texture with four components at whatever resolution you need your objects to be collided against. If you wanted (say) one part in 128 precision for your surface you'd need a 128x128x128x4x2 = 16Mbytes texture. You won't get many of those from a typical 256Mbyte graphics card (especially not if your graphics need lots of texture too).

But I guess if you stuck with one of these maps for all spheres, another for all cubes, another for all cylinders - you might only need maybe half a dozen of these maps...but that's still close to 40% of your texture map memory.

For more complex items (eg a chair or a table or something), a 128x128x128 map is nowhere near enough. Going to the next size up (256x256x256) eats 8 times as much texture!!! So you could definitely forget the idea that you could have a room with one map for a chair, another for the table, another for the room itself, another for...you get the problem!

So *maybe* you could use this for standardized shapes such as spheres and cubes and such - but for those simple mathematical shapes, why not just describe the shape mathematically inside the shader and save all of that memory? For spheres and such the math is easy. The hardest is the cube - but your algorithm requires that you find the closest point on the texture - which is the exact same calculation. So this is only really helpful for shapes that are hard to describe with math...and I think that in a practical setting, the memory requirements kill you.

I wonder though whether this idea can be salvaged.

Suppose we use a 'cubeMap' - which is (effectively) six maps packed into one - each one being (notionally) one face of a cube. So you'd have the distance from any point on the outside of the bounding box of your irregular-shaped object to the actual surface. I'm not sure whether that's enough - but it's orders of magnitude less texture map memory!

The other thing that's problematic is this. If your shapes are lots of arbitary things then the shader that does the comparisons has to have access to all of those textures. But there is a limit of only 16 maps per shader - so in one pass of the algorithm, you can't compare one object against more than 15 others in one pass. Again, that's not TOO much of a problem if you use standardized shapes - but chairs and tables are out.

So my opionion is that what you suggest isn't useful for encoding complicated shapes (because you'll run out of texture memory in any 'real' example) - and for simple shapes you are better off using math than a 3D texture.

Think of your 3D texture as a lookup table that implements a mathematical function that expresses the distance and direction from any point in the volume to the nearest point on the surface. When would you want to use a lookup table and when would you want to use an equation? For a sphere, it's OBVIOUS that an equation would be faster (and much, much less space) than a texture lookup. For a chair, it's equally obvious that math ain't going to cut it and a lookup table is the way to go.

So the argument hinges on whether you can store enough complex shapes to make doing this worth-while - or whether you are better off representing a chair as a bunch of spheres, cuboids and cylinders? This is a straight trade between speed and space. A single 3D texture lookup for a chair is going to be faster than (say) six cubes...but we don't have enough texture memory to do the entire game world in 3D textures.

Hence - I'll be concentrating on doing spheres, cylinders and cubes...with triangle meshes remaining a major problem.

1) 128x128x128 it is of course too much

I played with 32x32x32 (64x64x64)
2) No. The idea is not to replace simple shapes (spheres, cubes,...). ANY SHAPE.
3) Let's not speak about chair at this time. Chair is concave object. Let's speak about convex ones....
4) I do not understand what you mean under six in one...
5) For DirectX 9 we need to put all shapes into one BIG volume texture. In DirectX 10 (as I understand) you will be able to access any number of texture buffers.
6) "For a sphere, it's OBVIOUS that an equation would be faster". Are you sure? If I show you the shader, you will see how EASY and FAST lookups are.
7) Memory. Yes. It is a problem. And I mentioned that we need variable dense textures. It is to think how it could be implemented in GPU (maybe hardware manufactures are to add something in their hardware). But there is no problem to do it in CPU. Variable dense textures will require as much memory as body surface size (not volume size). Variable dense volume textures will grow as square (not cube). And it becomes ACCEPTABLE.
8. And do not forget about cloth

SteveBaker · Post by **SteveBaker** » Sat Sep 30, 2006 6:36 pm

Jack wrote: 1) 128x128x128 it is of course too much :shock: I played with 32x32x32 (64x64x64)

At that resolution, pretty much everything is a 'blob'! Why bother?

2) No. The idea is not to replace simple shapes (spheres, cubes,...). ANY SHAPE.

Then I don't think you'll have enough texture map memory for anything useful.

3) Let's not speak about chair at this time. Chair is concave object. Let's speak about convex ones....

Yes - of course. Sorry - I latched onto one example of something moderately complex. Replace all instances of 'chair' with something complicated and convex.

4) I do not understand what you mean under six in one...

A cube-map is an arrangement in which six separate square textures are packed into one special map that is accessed as if the six pieces were one texture that's wrapped onto the faces of a cube. I believe it would be nearly as good as a full volume texture - but at vastly less storage.

5) For DirectX 9 we need to put all shapes into one BIG volume texture. In DirectX 10 (as I understand) you will be able to access any number of texture buffers.

OK - I guess so.

6) "For a sphere, it's OBVIOUS that an equation would be faster". Are you sure? If I show you the shader, you will see how EASY and FAST lookups are.

Well, first you have to find the nearest point on the texture to your last collision point - right? If you use a continuous function you don't have to do that. Secondly, texture accesses in the GPU are only reasonably fast if they have good cache coherency - yours most certainly won't. Randomly accessed texture performance is terrible. But with a sphere, you can calculate the distance from any point in space to the nearest point on the sphere by computing the distance from the center of the sphere (pythagoras) and subtracting the radius. The GPU can do that in a couple of instructions - which is liekly to be several times faster than a cache hit and maybe 20 times faster than a cache miss.
Yeah - it'll be a LOT faster.

7) Memory. Yes. It is a problem. And I mentioned that we need variable dense textures. It is to think how it could be implemented in GPU (maybe hardware manufactures are to add something in their hardware).

OK - I'm prepared to look into algorithms that work only on the very latest graphics hardware - but I'm certainly not going to work on anything where I have to hope the H/W manufacturers will change their designs to help! But what you are calling for is using an INSANE amount of memory - it's not even close to practicable for a real game level. Just stop and think about this for a while!

But there is no problem to do it in CPU. Variable dense textures will require as much memory as body surface size (not volume size). Variable dense volume textures will grow as square (not cube). And it becomes ACCEPTABLE.
8. And do not forget about cloth :wink:

I *AM* thinking about cloth. I'm thinking about how non-convex it is. I'm thinking about how incredibly thin it is - and therefore about what resolution of texture I'd need to represent it. I'm thinking about how much effort it'll take to recompute your 3D distance texture as the cloth animates...I'm thinking that it could take longer to do that than to compute the collisions. ....So I'm thinking that the one thing your algortihm will REALLY suck at is cloth!

If you think that's a good application for this then there must still be something I don't understand about what you are proposing.

Jack · Post by **Jack** » Sun Oct 01, 2006 6:36 am

SteveBaker wrote:
I *AM* thinking about cloth. I'm thinking about how non-convex it is. I'm thinking about how incredibly thin it is - and therefore about what resolution of texture I'd need to represent it. I'm thinking about how much effort it'll take to recompute your 3D distance texture as the cloth animates...I'm thinking that it could take longer to do that than to compute the collisions. ....So I'm thinking that the one thing your algortihm will REALLY suck at is cloth!

If you think that's a good application for this then there must still be something I don't understand about what you are proposing.

Under cloth I see set of points connected with springs. Cloth is NOT a volume texture. If you need to detect CONVEX-CLOTH collision, then it becomes very easy. Just a texture lookup for each cloth point. Is not it?

Here is shader (just to see what it is):

Code: Select all

void PSCollision(
	float4			vIndex			: TEXCOORD0,
	float2			vPointUV		: TEXCOORD1,
	out float4		oCP				: COLOR0,
	out float4		oCN				: COLOR1,
	uniform sampler	SamplerMatrix	: register(s0),
	uniform sampler	SamplerCollision: register(s1),
	uniform sampler	SamplerShape[2]	: register(s2))
{
	float3x4	mat1, mat2;
		
	const float4 Size1 = tex2D(SamplerMatrix, vIndex.xz);
	const float4 Size2 = tex2D(SamplerMatrix, vIndex.yz);
	
	{
		float4		vPoint = tex2D(SamplerCollision, vPointUV);	
		
		vIndex.z += vIndex.w;
		mat1[0] = tex2D(SamplerMatrix, /*vTex1*/vIndex.xz);
		mat2[0] = tex2D(SamplerMatrix, /*vTex2*/vIndex.yz);
		vIndex.z += vIndex.w;
		mat1[1] = tex2D(SamplerMatrix, /*vTex1*/vIndex.xz);
		mat2[1] = tex2D(SamplerMatrix, /*vTex2*/vIndex.yz);
		vIndex.z += vIndex.w;
		mat1[2] = tex2D(SamplerMatrix, /*vTex1*/vIndex.xz);
		mat2[2] = tex2D(SamplerMatrix, /*vTex2*/vIndex.yz);
		
		vPoint.w = 1;
		
		float3 vPoint1 = mul(mat1, vPoint);
		float3 vPoint2 = mul(mat2, vPoint);
		
		//-----------
		const float3 ss1 = clamp(vPoint1, 0, Size1);
		const float3 ss2 = clamp(vPoint2, 0, Size2);
		
		const float3 vP1 = vPoint1 - ss1;
		const float3 vP2 = vPoint2 - ss2;
		
		const float3 Coord1 = (ss1 / Size1);
		const float3 Coord2 = (ss2 / Size2); 
		
		float4 T1,T2;
		T1 = tex3D(SamplerShape[0], Coord1);
		T2 = tex3D(SamplerShape[0], Coord2);
						 
		vPoint1 = T1.xyz - vP1;
		vPoint2 = T2.xyz - vP2;
		//-----------
		
		vPoint1 = mul(vPoint1, (float3x3)mat1);
		vPoint2 = mul(vPoint2, (float3x3)mat2);
		
		vPoint.xyz += (vPoint1+vPoint2)/2;
		vPoint.w = T1.w + T2.w;
				
		oCP = vPoint;
		oCN.xyz = normalize(vPoint1-vPoint2);
		oCN.w = 1;
		
	}
		
}

Cube texture. It will not work correctly, but maybe there is something useful in that idea....I need to think...

SteveBaker · Post by **SteveBaker** » Sun Oct 01, 2006 1:05 pm

You should realise that the problem with using a cube-map is that there isn't a convenient way to pack lots of different shapes into a single cube-map. So my earlier concern about the number of textures you'd have to read in the shader would then be justified.

This code actually works? I'm a little surprised because the two 'clamp' calls don't do what I think you want them to do. If Size1 is (say) (1,1,1) and vPoint1 is (1,100,1), what that clamp will give you for ss1 is (1,1,1) - but what you want (if I understand your shader correctly - which I may not) is (0.01, 1, 0.01)...clamp doesn't intersect the vector with a box - it does a component-by-component clamp.

This is a fragment shader? Well, no - I don't think so because you are writing out two colours and I don't think you can do that from a frag shader. If it's a vertex shader then it has to run N^2 times - which means you have almost no parallelism and you'd be better off doing it in the CPU. My Cg is a little rusty though - maybe I don't understand something.

Jack · Post by **Jack** » Sun Oct 01, 2006 1:31 pm

SteveBaker wrote:You should realise that the problem with using a cube-map is that there isn't a convenient way to pack lots of different shapes into a single cube-map. So my earlier concern about the number of textures you'd have to read in the shader would then be justified.

This code actually works? I'm a little surprised because the two 'clamp' calls don't do what I think you want them to do. If Size1 is (say) (1,1,1) and vPoint1 is (1,100,1), what that clamp will give you for ss1 is (1,1,1) - but what you want (if I understand your shader correctly - which I may not) is (0.01, 1, 0.01)...clamp doesn't intersect the vector with a box - it does a component-by-component clamp.

This is a fragment shader? Well, no - I don't think so because you are writing out two colours and I don't think you can do that from a frag shader. If it's a vertex shader then it has to run N^2 times - which means you have almost no parallelism and you'd be better off doing it in the CPU. My Cg is a little rusty though - maybe I don't understand something.

1) Yes. But it is not the only problem. Cube texture is not enough. If collision point is inside the texture, then we need more data.
2) Yes. It works

It is real code. Before clamp it goes thru inverse transform. And some additional math is encoded in the mat1 and mat2 (some shifts as I remember).
I do not need ray intersection. Just projection on the texture plane. So clamp is OK.
3) Yes. It is fragment shader. Two colours -> it is MRT (Multiple Render Tergets). You can write up to 4 colors

By the way, idea with you cube texture might work OK for CUBES and SPHERES. And shader maybe even shorter. I think, for cube shape it is enough just 2x2 cube texture (very GPU friendly). For SPHERES probably something like 16x16 (32x32). So, CUBE-CUBE, CUBE-SPHERE, SPHERE-SPHERE can be done extremely fast. But it needs more investigations...

Jack · Post by **Jack** » Sun Oct 01, 2006 2:29 pm

More ideas.

If you are happy with CUBEs and SPHEREs. And do not want something else, then I think CUBE-CUBE, CUBE-SPHERE, SPHERE-SPHERE collisions can be done without proxy texture. And speed will be beyond any imagination (as fast as GPU fill rate

).

Do not try to calculate collision point directly (using some equation). CUBE-CUBE intersection is not a simple task. Mantain collision point for each pair and calculate only CP adjustment on each frame (this task is very simple -> leads to very short shader).

SteveBaker · Post by **SteveBaker** » Sun Oct 01, 2006 5:25 pm

So let's talk about simple shapes:

Sphere/Sphere:

My present code (in SVN - check it out!) is doing 'typical' sphere/sphere collision tests at the following speeds:

4 cubes: other = 0.25ms collisions = 0.75ms
16 cubes: other = 0.25ms collisions = 0.75ms
64 cubes: other = 0.25ms collisions = 0.84ms
256 cubes: other = 0.29ms collisions = 1.75ms
1024 cubes: other = 0.34ms collisions = 16.72ms
4096 cubes: other = 0.63ms collisions = 252.78ms
16384 cubes: other = 1.61ms collisions = 5517.420ms

So - for 1024 spheres against 1024 spheres in 16.72ms - that's a million tests - so we're looking at an average of around 17 nanoseconds per collision on a vanilla 6800 (not a 'GT' or an 'ULTRA' - those would be a lot faster). Thats about as fast as the results could possibly be written out to the 6800's RAM - there is no way to speed that up (except by buying a faster graphics card or doing fewer tests of course - but that's a 'broadphase' issue). Then, my results are as accurate as floating point - and it uses no texture memory for shape storage.

Cube/Cube:

Your approach requires that when CP is outside the texture volume, you figure out where on the outside of the texture map you hit. (I still don't think your 'clamp' is doing that right...but whatever) If you have to do that - then you could imagine that the cube we are testing against completely fills the texture - so all of the volume map represents the interior of the cube except for the outermost cells which are full of 0's. That being the case, you don't need the texture - you can just pretend it's all zeroes. So in the cube/cube case, there is no possible way that using a 3D texture can speed things up versus just using math....it's not possible.

Jack · Post by **Jack** » Sun Oct 01, 2006 7:01 pm

SteveBaker wrote:So let's talk about simple shapes:

Sphere/Sphere:

My present code (in SVN - check it out!) is doing 'typical' sphere/sphere collision tests at the following speeds:

4 cubes: other = 0.25ms collisions = 0.75ms
16 cubes: other = 0.25ms collisions = 0.75ms
64 cubes: other = 0.25ms collisions = 0.84ms
256 cubes: other = 0.29ms collisions = 1.75ms
1024 cubes: other = 0.34ms collisions = 16.72ms
4096 cubes: other = 0.63ms collisions = 252.78ms
16384 cubes: other = 1.61ms collisions = 5517.420ms

So - for 1024 spheres against 1024 spheres in 16.72ms - that's a million tests - so we're looking at an average of around 17 nanoseconds per collision on a vanilla 6800 (not a 'GT' or an 'ULTRA' - those would be a lot faster). Thats about as fast as the results could possibly be written out to the 6800's RAM - there is no way to speed that up (except by buying a faster graphics card or doing fewer tests of course - but that's a 'broadphase' issue). Then, my results are as accurate as floating point - and it uses no texture memory for shape storage.

Cube/Cube:

Your approach requires that when CP is outside the texture volume, you figure out where on the outside of the texture map you hit. (I still don't think your 'clamp' is doing that right...but whatever) If you have to do that - then you could imagine that the cube we are testing against completely fills the texture - so all of the volume map represents the interior of the cube except for the outermost cells which are full of 0's. That being the case, you don't need the texture - you can just pretend it's all zeroes. So in the cube/cube case, there is no possible way that using a 3D texture can speed things up versus just using math....it's not possible.

Simple shapes (Sphere, Cube) can be done without 3D texture. But using permanent CP can speed up it! I think.
16.72ms it is 60 fps. Believe me 6800 can fill much more then 60 million texels per second.
6800 Ultra:
Fill Rate 6.4 billion texels/sec
Memory Bandwidth 35.2 GB/sec

Fill Rate is 100 times greater then your 60 millions.
Your Bandwidth 1024*1024*16*60 = 960MB/sec

I realized that your card is not Ultra. But your card is not even 2 times worse, I think.
And your result is for the simplest case SPHERE-SPHERE. For CUBE-CUBE your FPS will be much below 60, I think.

Can you paste here fragment shader?....

Erwin Coumans · Post by **Erwin Coumans** » Sun Oct 01, 2006 7:11 pm

Jack wrote:
I realized that your card is not Ultra. But your card is not even 2 times worse, I think.
And your result is for the simplest case SPHERE-SPHERE. For CUBE-CUBE your FPS will be much below 60, I think.

Can you paste here fragment shader?....

Latest work-in-progress Bullet and GPUphysics is available in the sourceforge Subversion repository.

You can browse this online here:
http://svn.sourceforge.net/viewvc/bulle ... PUphysics/

You can also use a Subversion client (even graphical UI like http://tortoisesvn.org ) and get the sources. Point to
https://svn.sourceforge.net/svnroot/bullet/trunk
commandline this is:
svn co https://svn.sourceforge.net:/svnroot/bullet/trunk bullet

Hope this helps,
Erwin

Jack · Post by **Jack** » Sun Oct 01, 2006 7:17 pm

Did you try it on CPU? I wonder how many "ms" you will get for 1024 spheres against 1024 spheres on CPU? Do you have such statistics?

17 nanoseconds it is 34 clock cycles on 2GHz CPU. Not too much, but not too small. Probably SIMD optimized CPU program could beat your GPU result.....

What if we take Dual-Core CPU?

Real-Time Physics Simulation Forum

My exotic approach to GPU collision detection

My exotic approach to GPU collision detection

Re: My exotic approach to GPU collision detection