GPU Physics fun: rigid body, fluids and 1.000.000 particles

Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Post by Erwin Coumans »

[update February 2008: 1 million particles using CUDA on 2 NVidia SLI cards.]

More proof that the GPU has great potential for rigid body dynamics, fluids, and other kinds of simulation:

http://www.iii.u-tokyo.ac.jp/~takahiroharada/

Early numbers for the NVidia 8800:
Rigid Body : 130 fps (5,000 chess pieces)
SPH : 70 fps (15,000 particles)
Locally cached versions of the QuickTime 7 movies:

5,000 chess pieces, GPU rigid bodies (31 MB)
http://www.continuousphysics.com/ftp/pu ... idBody.mov
15,000 particles on a triangle mesh, GPU rigid bodies (21 MB)
http://www.continuousphysics.com/ftp/pu ... os/dem.mov
SPH fluids (16 MB)
http://www.continuousphysics.com/ftp/pu ... sphDam.mov

When I visited them at Tokyo University this summer, we discussed their work that was also on display at SIGGRAPH 2006:
Rigid body simulation using a Particle Method:
http://mps.q.t.u-tokyo.ac.jp/~tanaka/
Real-time Solid Voxelization using Graphics Hardware
http://mps.q.t.u-tokyo.ac.jp/~harada/

Erwin
Last edited by Erwin Coumans on Fri Jan 26, 2007 6:08 pm, edited 2 times in total.
Dragonlord
Posts: 198
Joined: Mon Sep 04, 2006 5:31 pm
Location: Switzerland

Post by Dragonlord »

That Particle Method is some seriously good stuff. Paired with a GPU to brute-force the collision calculations, this looks very interesting.

8)
coderchris
Posts: 49
Joined: Fri Aug 18, 2006 11:50 pm

Post by coderchris »

That is quite impressive. I can see how the GPU would be able to do all those calculations for the narrow-phase collision detection and dynamics. I just don't see the GPU being able to do the broadphase, nor do I see the CPU doing broadphase for the 780,000 particles in the DEM movie (using the techniques I'm aware of).

There's no way the broadphase is skipped altogether, because even a GPU wouldn't be able to check 780,000 * 780,000 pairs. I also don't believe that broadphase is at all parallel, so it wouldn't be practical on the GPU. However, even on a fast CPU, I don't see how current broadphase methods could find pairs for all 780,000 particles. Even after calculating all pairs, one would have to upload them (definitely more than 100,000 pairs) to the GPU, which would take quite a bit of time. Any thoughts as to how he might be doing this?
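
For a sense of scale, the brute-force pair count can be worked out in a few lines of Python. This is only back-of-the-envelope arithmetic; the function name `brute_force_pairs` is made up for illustration and has nothing to do with the demo's code.

```python
def brute_force_pairs(n: int) -> int:
    """Number of unordered pairs among n particles: n * (n - 1) / 2."""
    return n * (n - 1) // 2


n = 780_000  # particle count quoted for the DEM movie

print(f"n * n (ordered, incl. self): {n * n:,}")                 # 608,400,000,000
print(f"unordered distinct pairs:    {brute_force_pairs(n):,}")  # 304,199,610,000
```

Even counting each pair once, that is about 3 x 10^11 sphere-sphere tests per step, which is why some form of pruning is needed before the narrow phase.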
Dragonlord
Posts: 198
Joined: Mon Sep 04, 2006 5:31 pm
Location: Switzerland

Post by Dragonlord »

I would suppose the particle soup is divided into smaller clusters, each fitted with an AABB for example. The broadphase would single out those clusters in need of collision detection and send the possible pairs down to the GPU. With a 512x512 texture you can already do 262,144 sphere-sphere tests in one go. One other way I could imagine is making a texture for each object with all the spheres in it relative to the CMP. Then testing two objects means setting two textures and two matrices and running the shader. Just my thoughts on this one.
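
A rough CPU-side sketch of the cluster idea, in Python. Everything here is an assumption for illustration: spheres are `(x, y, z, radius)` tuples, the helper names are invented, and the cluster-versus-cluster loop is itself brute force. It is a guess at the general shape, not how the demo actually works.

```python
from itertools import combinations


def cluster_aabb(spheres):
    """Axis-aligned bounding box of a cluster of (x, y, z, radius) spheres."""
    lo = tuple(min(s[i] - s[3] for s in spheres) for i in range(3))
    hi = tuple(max(s[i] + s[3] for s in spheres) for i in range(3))
    return lo, hi


def aabbs_overlap(a, b):
    """True if two (lo, hi) boxes overlap on all three axes."""
    (alo, ahi), (blo, bhi) = a, b
    return all(alo[i] <= bhi[i] and blo[i] <= ahi[i] for i in range(3))


def candidate_cluster_pairs(clusters):
    """Indices of cluster pairs whose boxes overlap (still O(k^2) in clusters)."""
    boxes = [cluster_aabb(c) for c in clusters]
    return [(i, j) for i, j in combinations(range(len(clusters)), 2)
            if aabbs_overlap(boxes[i], boxes[j])]


clusters = [
    [(0.0, 0.0, 0.0, 1.0), (1.0, 0.0, 0.0, 1.0)],  # cluster 0
    [(2.5, 0.0, 0.0, 1.0)],                        # cluster 1, overlaps cluster 0
    [(10.0, 10.0, 10.0, 1.0)],                     # cluster 2, far away
]
print(candidate_cluster_pairs(clusters))  # [(0, 1)]
print(512 * 512)                          # 262144 texels, one test per texel
```

Only the sphere pairs inside the surviving cluster pairs would then be sent to the GPU for the actual sphere-sphere tests.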
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Post by Erwin Coumans »

The entire simulation and collision detection is performed on the GPU, including broadphase. I am in touch with the author, and I will let you know once more details about his approach become available. I guess he can't reveal all the details yet, because they might be published in a paper:
Harada-san wrote:
Erwin wrote: Could you tell me how you do the 'broadphase': how can you determine which spheres are colliding against which other spheres on the GPU?
For n spheres, this can easily become a huge n*n problem, unless you do something clever like sweep and prune or hash tables.
It is the key of my program. I do not solve a n*n problem.
I will tell you some day, not now.
coderchris
Posts: 49
Joined: Fri Aug 18, 2006 11:50 pm

Post by coderchris »

Ah, so the broadphase is done on the GPU... we will just have to wait until he reveals the details. It must be very clever; I've spent a lot of time thinking about how to do that and I couldn't figure it out.

I just got an idea of how he might be doing it. I forgot that he was using an 8800, which is Direct3D 10 compatible, which means he can use the 'geometry shader'. I expect that he is using the geometry shader in some way to register new pairs on the fly without having to send any of this information back to the CPU.

If this is the case, then it may not yet be practical, since most people don't have Direct3D 10 cards yet, but it's still very impressive.
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Post by Erwin Coumans »

coderchris wrote: Ah, so the broadphase is done on the GPU... we will just have to wait until he reveals the details. It must be very clever; I've spent a lot of time thinking about how to do that and I couldn't figure it out.
If this is the case, then it may not yet be practical, since most people don't have Direct3D 10 cards yet, but it's still very impressive.
Apparently it works on older graphics cards too:
Harada-san wrote:
Erwin wrote: Do your GPU programs only run on GeForce 8800, or also on older models like GeForce 6800?
It can work on GeForce7XXX (it must work on 6XXX too, although I've not tried yet).
We will have to patiently wait for more details.
coderchris
Posts: 49
Joined: Fri Aug 18, 2006 11:50 pm

Post by coderchris »

Apparently it works on older graphics cards too
That's good to know. A little speculation until he writes his paper:
I did a little research, and there are some very fast sorting algorithms available for the GPU. Since there is so much coherency in a physics simulation, a simple sweep and prune broadphase done on the GPU might actually be very effective.
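
To make the sort-based idea concrete, here is a minimal single-axis sweep and prune in plain Python. It is a CPU sketch under stated assumptions, not GPU code and not the demo's method: spheres are `(x, y, z, radius)` tuples, the function name is invented, and a fresh `sorted()` call stands in for the GPU sort. An implementation that exploits coherency would keep the previous frame's order and only repair it.

```python
def sweep_and_prune(spheres):
    """Return sorted index pairs whose x-intervals overlap (broadphase candidates)."""
    # Sort sphere indices by the lower end of their x-interval.
    order = sorted(range(len(spheres)), key=lambda i: spheres[i][0] - spheres[i][3])
    pairs = []
    active = []  # spheres whose x-interval may still reach the current one
    for i in order:
        lo = spheres[i][0] - spheres[i][3]
        # Drop spheres whose interval ended before this one starts.
        active = [j for j in active if spheres[j][0] + spheres[j][3] >= lo]
        pairs.extend((min(i, j), max(i, j)) for j in active)
        active.append(i)
    return sorted(pairs)


spheres = [
    (0.0, 0.0, 0.0, 1.0),
    (1.5, 0.0, 0.0, 1.0),
    (5.0, 0.0, 0.0, 1.0),
    (5.5, 3.0, 0.0, 1.0),  # overlaps sphere 2 on x only
]
print(sweep_and_prune(spheres))  # [(0, 1), (2, 3)]
```

Note that the pair (2, 3) is only a candidate: the spheres overlap along x but not in space, so the narrow phase would still reject it.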
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Post by Erwin Coumans »

Takahiro mailed me that he has uploaded an executable GPU rigid body demo that runs on NVidia 7XXX and 8XXX cards:

http://www.iii.u-tokyo.ac.jp/~takahiroh ... dBody.html

No more details about approach/implementation yet, but they will follow at some stage.

Enjoy,
Erwin
stbuzer
Posts: 23
Joined: Fri Dec 08, 2006 10:16 am

Post by stbuzer »

Amazing demo!
It runs fine on my GF6600 with about 30 fps on average.
12 fps with 2500 concave chess pieces.
(Pity that there is only a single scene.)
stbuzer
Posts: 23
Joined: Fri Dec 08, 2006 10:16 am

Post by stbuzer »

It looks like the collision detection uses a method similar to pmap, which PhysX uses.
coderchris
Posts: 49
Joined: Fri Aug 18, 2006 11:50 pm

Post by coderchris »

Very impressive, and yeah, it looks like all the objects are composed of a bunch of spheres/particles.

The big question is how he is doing the broadphase :?

Hopefully he releases some details of his approach soon :wink:
KenB
Posts: 49
Joined: Sun Dec 03, 2006 12:40 am

More info on methods?

Post by KenB »

Is there any more information on the methods used? What type of rigid body solver is it and does it ever stop jittering? Average number of contacts per body?

What is the average number of neighbours in the SPH simulation?
nctusdk
Posts: 1
Joined: Sat Mar 03, 2007 4:52 pm
Location: Taiwan

Post by nctusdk »

I am new to this thread. The particle method is just what I am looking for. Although there is some research dealing with massive numbers of particles, like UberFlow in 2004, it does not resolve multiple inter-collisions on a single particle and is not even capable of finding all colliding particles. So I am wondering whether this work can handle all paired collisions among all particles?

Also, I did some research on his code, and it looks like he first uses some sorting to put particles into so-called buckets and then resolves collisions per bucket. It is not hard to imagine that an efficient GPU spatial partitioning method is used to do the broadphase. But the interesting point is that he uses a vertex program to do this instead of a fragment program. The collision resolver is done in another fragment program (also, there are loops inside the collision resolver, so maybe it does handle multiple collisions on a single particle). So far I still cannot figure out all the details, since it's all Cg-compiled ARB shader code and I am not good at reverse engineering... :(

However, as all particles are the same size, I think it's practical to use a simple octree or quadtree on a modern GPU to prune the possible collision pairs (this is also what I am working on in my research).
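
As a rough illustration of the bucket idea for equal-sized particles, here is a uniform-grid sketch in Python. It is an assumption-laden CPU sketch, not the demo's shader code: it uses a hash map of cells instead of a GPU sort, the function name `grid_pairs` is made up, and the cell edge is set to one particle diameter so that touching particles can only sit in the same or adjacent cells.

```python
from collections import defaultdict
from itertools import product


def grid_pairs(positions, radius):
    """Return sorted index pairs of equal-radius spheres that touch or overlap."""
    cell = 2.0 * radius  # cell edge = diameter (assumed choice)
    buckets = defaultdict(list)
    for i, p in enumerate(positions):
        buckets[tuple(int(c // cell) for c in p)].append(i)

    limit2 = (2.0 * radius) ** 2  # squared contact distance
    pairs = set()
    for (cx, cy, cz), members in buckets.items():
        # Visit the cell itself and its 26 neighbours.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in buckets.get((cx + dx, cy + dy, cz + dz), ()):
                for i in members:
                    if i < j:  # count each pair once
                        d2 = sum((a - b) ** 2
                                 for a, b in zip(positions[i], positions[j]))
                        if d2 <= limit2:
                            pairs.add((i, j))
    return sorted(pairs)


positions = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.5, 0.0, 0.0), (10.0, 10.0, 10.0)]
print(grid_pairs(positions, radius=1.0))  # [(0, 1), (1, 2)]
```

Each particle is compared only with the occupants of 27 cells, so the cost grows with the local density rather than with the total particle count. The inner loop also finds every contact on a particle, not just the first one.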

Let's wait for his impressive paper.
KenB
Posts: 49
Joined: Sun Dec 03, 2006 12:40 am

Roughly as on CPU

Post by KenB »

15,000 particles at 70 Hz is, however, roughly the performance you can squeeze out of a CPU; in fact somewhat less.
He doesn't say anything about the smoothing radius or number of neighbours though, so it is hard to compare.
Cell can be taken to at least 3-4 times that performance.