Broadphase with large number of objects

User avatar
tasora
Posts: 26
Joined: Mon Aug 15, 2005 8:57 am
Location: Italy

Broadphase with large number of objects

Post by tasora »

Hallo,

I am using the Bullet library for broadphase& narrowphase
collision detection in my physics engine.
I made a simulation with almost 10'000 falling spheres,
and the problem is that the sweep'n prune broadphase stage
requires almost 2Gb of RAM and 2-3 minuted of CPU time
when the simulation is started (because the many objects
start with randomized positions, so the AABB sorting requires
some time and temporary storage, for the first frame).
For the following frames, of course, the RAM requirement
drops to much lower values (few Mb) and CPU time for CD
is few seconds, of course.
- Is there any room for improvements on the sweep'n prune
broadphase? (I suspect that, for extremely large numbers
of objects, a 3D hashing may be preferable..)
- Has anyone tested scenarios with more than 10'000
objects?

By the way: following is the snapshot of the anim:
http://ied.unipr.it/~tasora/download/falling9000.avi
Note, the LCP solver is different from the one of Bullet, I wrote
my own dynamics engine.

regards,

Alessandro
User avatar
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Re: Broadphase with large number of objects

Post by Erwin Coumans »

Hi,

That movie looks great! I didn't have the chance to see it yet, it didn't display under my Mac OS X, but watching it under Windows was great!

I moved your topic to Bullet forum, because I think it's fairly Bullet specific, and it can definately be improved:

- It seems you are not setting the position of the rigidbody, before adding it to the world. That way, all objects initially start in the origin, overlapping. This means 100.000.000 overlapping pairs, explainging the 2Gb -> few Mb drop.
- we can add some pre-sorting, that would help a lot.
- add a memory allocator, and keep track of the memory usage
- I did some tests with 10000 spheres falling down a while back, and total collision+simulation time was faster then 1 second on my Macbook pro. Perhaps some issues have been introduced.
- 3d sweep and prune should be able to deal with so many object. Removing/adding objects might be slow, but that can be improved with 'marker objects'. But I don't think that is your issue here, you are not adding/removing objects once the simulation started, is it?
- with fairy regular motion, sweep and prune is probably much faster then hash space, even for 10.000 objects. We can also improve the performance by splitting the broadphase, and dealing with overlap (this is a todo)

Which version of Bullet are you using?
Shall we make some testbed that represents your case? I really want to try to improve Bullet's existing sweep and prune first, before reverting to any alternative.

Thanks,
Erwin
tasora wrote:Hallo,

I am using the Bullet library for broadphase& narrowphase
collision detection in my physics engine.
I made a simulation with almost 10'000 falling spheres,
and the problem is that the sweep'n prune broadphase stage
requires almost 2Gb of RAM and 2-3 minuted of CPU time
when the simulation is started (because the many objects
start with randomized positions, so the AABB sorting requires
some time and temporary storage, for the first frame).
For the following frames, of course, the RAM requirement
drops to much lower values (few Mb) and CPU time for CD
is few seconds, of course.
- Is there any room for improvements on the sweep'n prune
broadphase? (I suspect that, for extremely large numbers
of objects, a 3D hashing may be preferable..)
- Has anyone tested scenarios with more than 10'000
objects?

By the way: following is the snapshot of the anim:
http://ied.unipr.it/~tasora/download/falling9000.avi
Note, the LCP solver is different from the one of Bullet, I wrote
my own dynamics engine.

regards,

Alessandro
User avatar
tasora
Posts: 26
Joined: Mon Aug 15, 2005 8:57 am
Location: Italy

Post by tasora »

Hallo,
In fact, you are right:
- It seems you are not setting the position of the rigidbody, before adding it to the world. That way, all objects initially start in the origin, overlapping. [..]
Right, now that I update the position also before addding the
object to the collision world, the time for broadphase setup is
much lower, and the memory requirement is as low as I
expected. Perfect!
- I did some tests with 10000 spheres falling down a while back, and total collision+simulation time was faster then 1 second on my Macbook pro. Perhaps some issues have been introduced.
Note: with my simulation engine, a test with 10000 speres requires
up to 3-4 seconds of collision time per frame, or more, when lot of contacts are created. Well, I think it depends on the simulation
scene. Or maybe that your simulation takes just 1s per frame because
your Mac is faster than my laptop:) (Toshiba Satellite dual core 2.1GHz).

These timings may be reasonable, but I hope it could be
improved. My code (for LCP solution) takes about 0.3s each step,
so simulation time is almost negligible when compared to collision
time (from Bullet engine).
Removing/adding objects might be slow, but that can be improved with 'marker objects'. But I don't think that is your issue here, you are not adding/removing objects once the simulation started, is it?
Well, not in the case we was talking about, but in other cases, I
need the ability of adding/removing objects in linear time, because
I am also implementing a version of my engine as a plugin for a
3d modeling system, where I must deal with user interaction, GUI,
undo/redo, copy&paste, etc. For the moment, it works, but I suppose
that adding/removing isn't a linear-time process in bullet broadphase,
right now.
Which version of Bullet are you using?
Almost the latest (I mean, I still must install the latest release,
the one with floating point support).
Shall we make some testbed that represents your case?
The case in my demo animation can be considered a good
starting point: 10k objects falling into a box. I hope that it's
possible to cut collision time to the minimum (maybe reverting
to GPUs for the simple sphere-sphere narrowphase cases?)

Thank you for feedback!
Alessandro
coderchris
Posts: 49
Joined: Fri Aug 18, 2006 11:50 pm

Post by coderchris »

I think that the broadphase, while good, could definately be improved. If you look at havok's GPU physics, they have sometimes over 30,000 objects simulated on the gpu, however they are doing their broadphase on the cpu (im 99% sure at least). That means they are doing broadphase many times per second. Therfor they must either have very good sweep and prune or are using a different method.

I find that if you have many many objects (like 10,000) then it is good to use a spatial hashing type of broadphase with a little less accuracy than to use a very accurate broadphase (sweep and prune) that isnt as fast.

For smaller scenes however sweep and prune appears to be the best it gets
User avatar
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Post by Erwin Coumans »

As ex-Havok employee I can say they have a very fast and efficient sweep and prune implementation, and I added ray cast support in their sweep and prune. Although Bullet's sweep and prune is not bad either, there is some room for improvement indeed. It's just that in most scenes broadphase isn't the bottleneck, hence its popularity in general purpose physics engines like Havok, PhysX, Bullet and many 'in-house'/homebrew engines.

I expect in most cases, including scenes that include 10 to 30.000 objects sweep and prune to be superior to hash spaces due to coherency. But it might be worthwhile that someone makes a comparison. Tasora: insertion into sweep and prune can be done in linear time, and that can be improved with batch-addition, presorting/radix sort and marker objects. But you told that insertion is not an issue for your case at the moment, so we better focus on general performance. It would be best to just create a BulletDemo and use VTune.

Erwin
coderchris
Posts: 49
Joined: Fri Aug 18, 2006 11:50 pm

Post by coderchris »

Iv started on a spatial hashing addon for bullet just to compare. Ill let you know when im finished.

Theres a very neat little article here: http://www.stereopsis.com/radix.html
which describes how to use radix sort for floating point numbers. I tried it on my system and it sorts the 65,000 element array over 500 times per second.
Do you think this type of sort would benefit the sweep and prune any?