That is quite impressive. I can see how the GPU could handle all the calculations for narrow-phase collision detection and dynamics; I just don't see the GPU doing the broadphase, nor do I see the CPU doing the broadphase for the 780,000 particles in the DEM movie (using techniques I'm aware of).
There's no way the broadphase is skipped altogether, because even a GPU couldn't check 780,000 * 780,000 pairs. I also don't believe the broadphase is parallel enough to be practical on the GPU. However, even on a fast CPU, I don't see how current broadphase methods could find the pairs for all 780,000 particles. Even after computing all the pairs, one would have to upload them (definitely more than 100,000 pairs) to the GPU, which would take quite a bit of time. Any thoughts on how he might be doing this?
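For scale, here is the number of distinct pairs an all-pairs (n*n) test would have to examine every frame for the particle count in the movie. It is a small arithmetic sketch; `all_pairs_count` is just a hypothetical helper name.

```python
def all_pairs_count(n: int) -> int:
    """Distinct unordered pairs an all-pairs (n*n) broadphase would test."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    n = 780_000  # particle count quoted for the DEM movie
    print(f"{all_pairs_count(n):,}")  # 304,199,610,000
```

That is roughly 3 x 10^11 tests per frame, which is why some kind of spatial pruning has to happen somewhere.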
I would suppose the particle soup is divided into smaller clusters, each fitted with an AABB, for example. The broadphase would single out the clusters that need collision detection and send the possible pairs down to the GPU. With a 512x512 texture you can already do 262,144 sphere-sphere tests in one go. Another way I could imagine is making a texture for each object containing all its spheres relative to the CMP. Testing two objects then means setting two textures and two matrices and running the shader. Just my thoughts on this one.
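The per-texel work in that idea is just a sphere-sphere overlap test. Below is a minimal CPU-side sketch of one such pass, assuming each texel holds one candidate pair and writes 1 for overlap, 0 otherwise; `spheres_overlap` and `run_pair_batch` are hypothetical names, and a real shader would read the pair data from input textures rather than a Python list.

```python
def spheres_overlap(c1, r1, c2, r2):
    """True if two spheres intersect (centre distance below sum of radii).

    Squared distances are compared to avoid a square root, as a shader
    typically would.
    """
    d2 = sum((a - b) ** 2 for a, b in zip(c1, c2))
    return d2 < (r1 + r2) ** 2


def run_pair_batch(pairs, width=512, height=512):
    """Emulate one shader pass over a width x height 'texture'.

    Each pair is ((centre, radius), (centre, radius)); pair k lands in
    texel (x, y) = (k % width, k // width).
    """
    if len(pairs) > width * height:
        raise ValueError("more pairs than texels")
    out = [[0] * width for _ in range(height)]
    for k, (s1, s2) in enumerate(pairs):
        y, x = divmod(k, width)
        out[y][x] = int(spheres_overlap(*s1, *s2))
    return out
```

With the default 512x512 size, one pass covers 262,144 pairs; the broadphase only has to decide which pairs go into the batch.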
The entire simulation and collision detection is performed on the GPU, including the broadphase. I am in touch with the author, and I will let you know once more details about his approach become available. I guess he can't expose all the details yet, because they might be published in a paper:
Harada-san wrote:
Erwin wrote:
Could you tell me how you do the 'broadphase': how can you determine which spheres are colliding with which other spheres on the GPU?
For n spheres, this can easily become a huge n*n problem, unless you do something clever like sweep and prune or hash tables.
It is the key to my program. I do not solve an n*n problem.
I will tell you some day, not now.
Ah, so the broadphase is done on the GPU... I'll just have to wait until he reveals the details. It must be very clever; I've spent a lot of time thinking about how to do that and I couldn't figure it out.
I just got an idea of how he might be doing it: I forgot that he is using an 8800, which is Direct3D 10 compatible, so he can use the 'geometry shader'. I expect he is using the geometry shader in some way to register new pairs on the fly without having to send any of this information back to the CPU.
If this is the case, it may not be practical yet, since most people don't have Direct3D 10 cards, but it's still very impressive.
coderchris wrote:
Ah, so the broadphase is done on the GPU... I'll just have to wait until he reveals the details. It must be very clever; I've spent a lot of time thinking about how to do that and I couldn't figure it out.
If this is the case, it may not be practical yet, since most people don't have Direct3D 10 cards, but it's still very impressive.
Apparently it works on older graphics cards too:
Harada-san wrote:
Erwin wrote:
Do your GPU programs only run on the GeForce 8800, or also on older models like the GeForce 6800?
It can work on GeForce7XXX (it should work on 6XXX too, although I haven't tried it yet).
That's good to know. A little speculation until he writes his paper:
I did a little research and there are some very fast sorting algorithms available for the GPU. Since there is so much coherence from frame to frame in a physics simulation, a simple sweep-and-prune broadphase done on the GPU might actually be very effective.
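A minimal sweep-and-prune sketch on one axis, for reference. It assumes axis-aligned boxes given as `(min_xyz, max_xyz)` and re-sorts from scratch each call; it only illustrates the technique and is not the author's method. A version exploiting coherence would keep the previous frame's order and fix it up with an insertion sort (or a parallel GPU sort), since the order barely changes between frames.

```python
def sweep_and_prune(boxes):
    """Return index pairs (i, j), i < j, whose AABBs overlap.

    boxes: list of (min_xyz, max_xyz) tuples.
    """
    # Sort box indices by their minimum x coordinate.
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0][0])
    active = []  # boxes whose x-interval may still overlap later boxes
    pairs = set()
    for i in order:
        lo_i, hi_i = boxes[i]
        # Drop boxes whose x-interval ended before this one starts.
        active = [j for j in active if boxes[j][1][0] >= lo_i[0]]
        # Every remaining active box overlaps box i on x; check y and z.
        for j in active:
            lo_j, hi_j = boxes[j]
            if all(lo_i[k] <= hi_j[k] and lo_j[k] <= hi_i[k] for k in (1, 2)):
                pairs.add((min(i, j), max(i, j)))
        active.append(i)
    return pairs
```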
Is there any more information on the methods used? What type of rigid body solver is it, and does it ever stop jittering? What is the average number of contacts per body?
What is the average number of neighbours in the SPH simulation?
I am new to this thread. The particle method is just what I am looking for. Although there is some research dealing with massive numbers of particles, like UberFlow in 2004, it does not resolve multiple simultaneous collisions on a single particle and is not even capable of finding all colliding particles. So I am wondering whether this work can handle all pairwise collisions among all particles.
Also, I did some research on his code, and it looks like he uses a sort first to put particles into so-called buckets and then resolves collisions bucket by bucket. It's not hard to imagine that an efficient GPU spatial partitioning method is used for the broadphase. But the interesting point is that he uses a vertex program to do this instead of a fragment program. The collision resolver is done in another fragment program (there are also loops inside the collision resolver, so maybe it does handle multiple collisions on a single particle). So far I still cannot figure out all the details, since it's all Cg-compiled ARB shader code and I am not good at reverse engineering...
However, as all particles are the same size, I think it's practical to use a simple octree or quadtree on a modern GPU to prune the possible collision pairs (this is also what I am working on in my research).
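To make the bucket idea concrete, here is a generic uniform-grid broadphase sketch on the CPU. It is not reconstructed from the shader code; it assumes equal-radius spheres and a cell edge equal to the sphere diameter (a hypothetical choice), so any overlapping pair lies in the same or an adjacent cell. On a GPU, step 1 would be a parallel sort and steps 2 and 3 would run one thread or fragment per particle.

```python
import math


def cell_of(p, cell_size):
    """Integer grid cell containing point p."""
    return tuple(math.floor(c / cell_size) for c in p)


def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def grid_broadphase(positions, radius):
    """Return all index pairs (i, j), i < j, of equal-radius spheres that overlap."""
    # Assumed cell edge = diameter, so overlapping spheres are at most one cell apart.
    cell_size = 2.0 * radius
    # 1. Bucket sort: tag each particle with its cell and sort by that key.
    keyed = sorted((cell_of(p, cell_size), i) for i, p in enumerate(positions))
    # 2. Record the [start, end) slice of each bucket in the sorted array.
    ranges = {}
    for idx, (cell, _) in enumerate(keyed):
        start = ranges.get(cell, (idx, idx))[0]
        ranges[cell] = (start, idx + 1)
    # 3. Each particle tests only its own bucket and the 26 neighbouring ones.
    limit = (2.0 * radius) ** 2
    pairs = set()
    for (cx, cy, cz), i in keyed:
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    bucket = ranges.get((cx + dx, cy + dy, cz + dz))
                    if bucket is None:
                        continue
                    for _, j in keyed[bucket[0]:bucket[1]]:
                        if j > i and dist2(positions[i], positions[j]) < limit:
                            pairs.add((i, j))
    return pairs
```

The work per particle depends only on how many particles share its neighbourhood, not on the total count, which is what removes the n*n cost.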
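One common way to build an octree on a GPU (a swapped-in technique, not necessarily what is meant here) is to give each particle a Morton (Z-order) key by interleaving the bits of its integer grid coordinates. Sorting particles by this key makes every octree node's particles contiguous, and dropping the lowest 3 bits gives the parent node. This sketch assumes at most 10 bits (1024 cells) per axis; `part1by2`, `morton3d`, and `octree_parent` are hypothetical names.

```python
def part1by2(n):
    """Spread the low 10 bits of n so two zero bits separate each bit."""
    n &= 0x3FF
    n = (n | (n << 16)) & 0x030000FF
    n = (n | (n << 8)) & 0x0300F00F
    n = (n | (n << 4)) & 0x030C30C3
    n = (n | (n << 2)) & 0x09249249
    return n


def morton3d(x, y, z):
    """Interleave bits of cell coordinates x, y, z (each < 1024) into one key."""
    return part1by2(x) | (part1by2(y) << 1) | (part1by2(z) << 2)


def octree_parent(key, levels_up=1):
    """Key of the enclosing octree node 'levels_up' levels higher."""
    return key >> (3 * levels_up)
```

In 2D (the quadtree case) the same idea applies with two-way bit interleaving and a shift of 2 bits per level.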
15,000 particles at 70 Hz is, however, roughly the performance you can squeeze out of a CPU; in fact, it is somewhat less.
He doesn't say anything about the smoothing radius or the number of neighbours, though, so it is hard to compare.
The Cell processor can be taken to at least 3-4 times that performance.