Reducing overhead of resting bodies

DanielJosephTracy · Post by **DanielJosephTracy** » Tue May 24, 2011 9:42 pm

Erwin and I chatted a bit in the tail end of the Issue below, and he suggested that we continue in the Bullet public forum. The best way to acquire context is reading the Issue:

http://code.google.com/p/bullet/issues/detail?id=128

"The Bullet broadphase have a addPair and removePair. When adding a pair that involves a 'sleeping' proxy, the sleeping object will be activated, the proxy type changes, and an actual pair will be generated. So there is no delay/cycle skipped, all constraints are fine."

Ah! That was what I was missing: Bullet activates bodies when a broad phase overlap occurs. Does it also form contact groups from broad phase overlaps? Our system only activates bodies after narrow phase confirms contact with an active body, and forms contact groups from narrow phase contacts. I consider that to be an advantage in many cases. Our environment has a lot of scattered objects with somewhat high density at times.

"Sleeping objects only consumed the memory of a world transform."

Exactly. And since Bullet is already state-aware, I think this should be internal to Bullet and not require developer modification if there's no performance compromise. Four floats per resting object is a beautiful thing (I assume you stored as quaternion since it's not being used continuously). Of course, you typically also need broad phase proxy, collision shape pointer, and such.

"I got rid of almost all non-active rigid body traversals, except for one, in the island generation. This can also be removed, by storing the 'active' flag in the overlapping pairs (at creation), and incrementally updating this flag when needed."

What we do is use internal object state as a partial indication of whether it's been added to the disjoint subsets structure yet. We start by adding all active bodies. Then we traverse narrow phase output: if a body is still resting, activate it and it to disjoint subset. If a body is still resting, it isn't there yet. So no tags.

"In the optimized case, you still need to traverse over all overlapping pairs. It seems best to separate 'active' overlapping pairs and 'non-active' overlapping pairs, so the narrowphase only processes the 'active' overlapping pairs (having at least one active body)."

We never traverse all overlapping pairs. Our event-based output broad phase is used to maintain an overlap graph (hash table). Then to feed the narrow phase, we traverse the set of active bodies and look up their overlaps in the graph. We do not need multiple graphs.

"Of course, in the worst case, all objects are connected and sleeping and a single object wakes up: in that case, you need to traverse all."

That's a pretty bad worst case. Activating it can be made fast, but in any case you're now dealing with all objects in the simulation being active and part of a single contact group. There's no cure for that but your new parallel-constraint-solving-within-an-island stuff I haven't looked at yet.

"By the way, the performance dropped from 25ms to 4ms when creating 40000 sleeping boxes (non-overlapping) in the air. I could cleanup the code and move it in trunk if there is enough interest."

That's very good. Some comments:

1. A better test would be to keep a few random objects alive and moving around in the system with mostly resting bodies.
2. What broad phase are you using in this test? MultiSAP?
3. Where is the overhead now? Contact group (island) generation?
4. I just did a quick test on our system with 240,000 objects (all resting on trimeshes, massive overlaps): 0.75 ms
But again, that's all "overhead", and could be eliminated with some early-outs on empty data. A more useful test would include some moving objects in various places.

Erwin Coumans · Post by **Erwin Coumans** » Tue May 24, 2011 10:53 pm

Just to clarify, I was talking about several different Bullet implementations:

1 implementation was used in a PS2 game long time ago, where sleeping objects were removed and deleted, with only a proxy object remaining in the broadphase (used to activate/re-instance the sleeping object).

The other implementation is a refactoring I did yesterday, to try to avoid traversing all (sleeping) objects.

Interesting, for island generation you use a hash map, instead of union find.

2. What broad phase are you using in this test? MultiSAP?

No, MultiSAP is experimental slow code. I'm using the btDbvtBroadphase, based on dynamic AABB trees.

3. Where is the overhead now? Contact group (island) generation?

My quick refactoring of yesterday still had some brute force traversal during island generation, so there is the bottleneck.

4. I just did a quick test on our system with 240,000 objects (all resting on trimeshes, massive overlaps): 0.75 ms

Absolute timings depend on the machine. I was more interested in the difference, so I should have mentioned percentage improvement instead.

Is your implementation available somewhere (in particular the hash-map)?
Thanks,
Erwin

DanielJosephTracy · Post by **DanielJosephTracy** » Tue May 24, 2011 11:12 pm

"Interesting, for island generation you use a hash map, instead of union find."

No, I use union-find just as you do. The hash map stores broad phase overlaps. The union-find algorithm is incrementally constructed as I said starting with active objects and increasing as objects are activated.

"I'm using the btDbvtBroadphase, based on dynamic AABB trees."

That is unexpected. And you don't have a lot of overhead there? Do not rebuild the tree every cycle? And that it can somehow perform processing changes without high overhead for resting bodies? This might a stage where having some moving bodies various places in the environment will change things. MultiSAP is the way to go for environments with most bodies at rest. Unless there's something here that I'm unaware of.

"Absolute timings depend on the machine. I was more interested in the difference, so I should have mentioned percentage improvement instead."

I apologize. Didn't mean to be grand-standing. Test was on a fast machine. In any case, I think that Bullet's conversion as an "incremental pipeline" is just beginning and I hope that it continues. There are some uses and environments for which this is critical, and again, I think it doesn't have to reduce performance in other environments.

"Is your implementation available somewhere?"

Not at the moment. I work for a public institution (university): I could ask my employer about open-sourcing it for reference. However, I also have a paper submitted to a journal (and evolving) that also explains these things in detail, which will be likely be posted in some fashion soon.

Daniel

Erwin Coumans · Post by **Erwin Coumans** » Wed May 25, 2011 2:38 am

Do you rebuild union find from scratch each frame?

Dynamic aabb tree based broadphase is better than multisap in general. Non-moving objects are free and it can handle moving objects better. I'll write a blog posting about it soon. Let's focus on island generation for now.

DanielJosephTracy · Post by **DanielJosephTracy** » Wed May 25, 2011 4:20 am

"Do you rebuild union find from scratch each frame?"

Yes, but only active bodies and bodies in contact with them are added. I'm assuming that we're talking about union-find to construct contact groups or "islands" in ODE terminology.

The union-find structure begins empty. Here is what we do from there. Some of it may be specific to our needs.

1. Add all active bodies to the union-find structure
2. For each object pair overlapping from the narrow phase:
----A. For each object in the pair
--------1. If object was in physics-resting state, put in physics-active state
--------2. If object was part of a resting island, activate all objects in island, adding them to union-find
--------3. If object is not in union-find, add it
----B. Perform join on object pair

The actual code we use for part A is more convoluted and should be refactored (switch statement on state with fall-throughs, redundant island activations), but I'm pretty sure this is what it comes down to.

So it might be unusual for the union-find class to continually add more independent nodes while processing joins, but it works fine. You just need a dynamic array (e.g. std::vector) to hold node info. No problem. And the running time is proportional to the number of active bodies, as it nicely avoids having to touch bodies at rest.

Apologies for the dashes. The forum software doesn't seem to indent for leading spaces or tabs!

Daniel

Erwin Coumans · Post by **Erwin Coumans** » Fri May 27, 2011 6:55 am

There seem to be a few problems with your recipe.
First of all, does Step 2 (For each object pair overlapping) traverse sleeping pairs? If it traverses sleeping pairs, it will touch all objects (assuming all objects are resting on the ground), which contradicts your statement that you don't traverse all sleeping objects.

Secondly, can you clarify what happens when applying your recipe to the following scenarios:

1) 2 islands, both sleeping. When you apply your recipe, step 2-A-1 will wake up both islands and you have to traverse all objects, even though all objects are sleeping. Why does step 2-A-1 need to wake up sleeping objects?

2_islands_sleeping.png

2) The user wakes up object C before the simulation starts, the rest is still sleeping. Which step of your recipe wakes up object E, part of pair (D,E)?

object_c_wakes_up.png

Thanks,
Erwin

DanielJosephTracy · Post by **DanielJosephTracy** » Fri May 27, 2011 11:19 pm

"Does Step 2 (For each object pair overlapping) traverse sleeping pairs?"

No. It traverses the object pairs that form the output of the narrow phase. Remember that the narrow phase is performed only between pairs of objects in which at least one are active. So that will contain some sleeping objects, but only as part of an active/resting pair. This allows us to wake up objects when necessary.

Quick re-explain before covering scenarios: When starting the narrow phase, to get the narrow phase stage's input, we traverse the set of active objects and look them up in the Broadphase Overlap Graph (the graph of all overlaps according to the broad phase). We take any overlaps with those active objects and give those pairs to the narrow phase.

Then the narrow phase outputs a subset of the same pairs (among other things), indicating which pairs are actually penetrating. This output contains pairs in which at least one body is non-resting.

"Why does step 2-A-1 need to wake up sleeping objects?"

Step 2-A-1 needs to wake up sleeping objects based on collisions with non-sleeping objects. This is the primary way that objects are activated. Right there during the union-find.

In Scenario 1, no work is done for these islands.

In Scenario 2, C was an active object whose overlap with D was processed in the narrow phase. Assuming that <C, D> interpenetrates, 2-A-2 will activate the island {D, E} and place them in the union-find. Just like in Bullet, we activate and de-activate objects at the island-level, so {D, E} were already stored as an island. Then 2-B will coalesce them into {C, D, E}.

The set of islands consists of two data sets. The active islands are regenerated fresh each cycle. The resting islands are stored away in a map and retained. So every cycle the active set is cleared and re-generated. In the example collision with object C, {D, E} was a resting island tucked away in a map. Then they were activated, so that island was removed from the resting islands and made part of the active islands. That means that the set {C, D, E} will be blown away at the end of the cycle and re-generated using union-find next cycle, and the resting set will remain without any of those objects. All objects should be a part of one set or the other.

This is somewhat different from the way that Bullet runs, so it's not immediately intuitive. I don't know how well some of this will apply to your system. You may find that you need to make more extensive changes than you planned due to interface and relationship differences between the stages.

I hope that's clear,
Daniel

Erwin Coumans · Post by **Erwin Coumans** » Sat May 28, 2011 12:43 am

The resting islands are stored away in a map and retained.

This is the crucial bit, and I'm planning to implement such a system for Bullet. Without storing sleeping pairs, you will need to traverse all sleeping pairs.

Thanks for sharing your thoughts,
Erwin

DanielJosephTracy · Post by **DanielJosephTracy** » Wed Jun 01, 2011 8:05 pm

"Thanks for sharing your thoughts,
Erwin"

Thank you for your interest. I did some broad phase testing based on your statement above about Bullet's DBVT approach vs MultiSAP, so I put up another topic:

http://bulletphysics.org/Bullet/phpBB3/ ... f=4&t=6863

Daniel

Real-Time Physics Simulation Forum

Reducing overhead of resting bodies

Reducing overhead of resting bodies

Re: Reducing overhead of resting bodies

Re: Reducing overhead of resting bodies

Re: Reducing overhead of resting bodies

Re: Reducing overhead of resting bodies

Re: Reducing overhead of resting bodies

Re: Reducing overhead of resting bodies

Re: Reducing overhead of resting bodies

Re: Reducing overhead of resting bodies