Broad Phase Performance

DanielJosephTracy · Post by **DanielJosephTracy** » Wed Jun 01, 2011 8:01 pm

Erwin: you suggested the following in another thread:

"Dynamic aabb tree based broadphase is better than multisap in general. Non-moving objects are free and it can handle moving objects better."

I did some work on sweep and prune a few years ago and have a nice implementation of MultiSAP. I decided to test it against the btDBVT broad phase and see how it fared. I used "scalability tests", where I increase the environment and object count until the broad phase reaches some set processing time (1/60th second in this case).

Now, I had to create an adapter for bullet broad phases to be used in my test framework, so there may be some issues there. Also, I'm making a reverse adapter that can plug my broad phases into Bullet, but there is some difficulty there so far.

Anyways, some test results on my 2.4 GHz Intel Core 2 Duo 15" MacBook Pro w/ 4 GB 667 MHz DDR2 SDRAM, Mac OS X 10.6.7, Xcode 3.2.6:

Bullet DBVT:
All moving: 7,000 objects in 1/60th second
1% moving (the rest static): 100,000 total objects in 1/60th second
100,000 objects, none moving, insertions & removals: 55 adds & 55 deletes in 1/60th second

Grid S&P (single thread):
All moving: 20,000 objects in 1/60th second
1% moving (the rest static): 450,000 total objects in 1/60th second
100,000 objects, none moving, insertions & removals: 530 adds & 530 deletes in 1/60th second

These tests utilize an "output all overlaps" interface for both broad phases, which is less efficient, but I'm still learning how Bullet's interface works. All other parameters are identical (density, velocity, etc). My implementation is a Grid with Sweep and Prune in each cell. The tests are all uniformly distributed objects so far. I should continue the work and make a more adaptive superstructure: the overhead of the mapping phase can be eliminated by use of a "triggered migrations" feature where I add just the bounding boxes of adjacent/parent spatial domains to each sweep and prune cell, allowing the sort to find objects that are migrating between spaces. There is also undone work on combining batch insertion/removal with segmented interval lists.

But the implementation is pretty efficient. It seems to scale fairly well in a multithreaded environment, as well. The broad phase is open source should be available at http://www.danieljosephtracy.com/. I haven't updated the site in some time, but I could clean up and post the testing framework as well if anyone's interested.

I'll have to look into the details of the DBVT algorithm, but my testing so far indicates that it's not as sensitive to objects transitioning between resting & moving states as I thought. It performs insertions & removals quickly, but what holds it up in that department is having to traverse the overlap set to find/remove pairs containing the object being removed. Ideally, the broad phase should determine algorithmically which overlaps exist and then find those specific pairs. That would be much faster, especially for environments with large numbers of objects.

Daniel

Erin Catto · Post by **Erin Catto** » Thu Jun 02, 2011 4:13 pm

My problem with SAP is that it doesn't handle varying object sizes well. Your demo seems to use boxes between 3 and 7 units (not sure what you used in your benchmarks). Imagine you have object sizes between 1 and 100 units. Then the SAP will have to check large intervals on insertion and deletion. The dynamic tree shouldn't have this problem.

For you object size range, I'm guessing a uniform grid would be the fastest solution.

In your tests you should also include volume query and ray cast timings. I would use long ray casts that span around half of the world.

I recommend tracking the tree height to see if it is balanced. Also a comparison of memory usage would be useful.

DanielJosephTracy · Post by **DanielJosephTracy** » Thu Jun 02, 2011 5:23 pm

"My problem with SAP is that it doesn't handle varying object sizes well. Your demo seems to use boxes between 3 and 7 units (not sure what you used in your benchmarks). Imagine you have object sizes between 1 and 100 units. Then the SAP will have to check large intervals on insertion and deletion. The dynamic tree shouldn't have this problem."

I did use (3-7 units randomly in each dimension) for AABB size in the benchmark. Keep in mind that we're talking about a hybrid Subdivision+SAP algorithm, not just SAP. We subdivide the environment to keep about 300 objects in each cell, and further use segmented interval lists for faster insertion/deletion. Our use case has thousands of similarly-sized objects, but then also some enormous AABBs that overlap large swaths of objects. But our use case doesn't require us to dynamically add and remove the large ones. So I did the test you suggested below.

"For your object size range, I'm guessing a uniform grid would be the fastest solution."

If all objects are in motion, that is true. A simple grid does very well much of the time. For environments with many objects at rest, the Grid won't scale nearly as well, even if all objects are the same size.

"In your tests you should also include volume query and ray cast timings. I would use long ray casts that span around half of the world.
I recommend tracking the tree height to see if it is balanced. Also a comparison of memory usage would be useful."

All good suggestions. This brings up the need for the tester to have many different kinds of performance tests, and it's hard to think of all possible conditions. But your object size test I can do right away:

The following tests were done on a different (but similar) machine (laptop bit the dust!): iMac 2.16 GHz Intel Core 2 Duo, 3 GB 667 MHz DDR2 SRAM, Mac OS X 10.6.7, Xcode 3.2.6. Because the testing framework keeps total density constant, when you increase the variance in volume, you actually achieve a speedup over results with less variance. But with this in mind, the relative speedup for Grid+SAP is more.

With object size 1-100 in each dimension randomly:

Bullet DBVT:

All moving: 8000 at 60 fps
1% moving (the rest static): 130,000 at 60 fps
100,000 objects, none moving, level of dynamics at 60 fps: 80 adds & 80 deletes

Grid S&P (single thread):

All moving: 40,000 at 60 fps
1% moving (the rest static): 900,000 at 60 fps
100,000 objects, none moving, level of dynamics at 60 fps: 600 adds & 600 deletes

I have a basic "ToBullet" adapter now (unverified on Bullet side), but doesn't yet support aabbTest (external queries) or ray-casting.

Daniel

Erwin Coumans · Post by **Erwin Coumans** » Sat Jun 11, 2011 4:47 am

These tests utilize an "output all overlaps" interface for both broad phases, which is less efficient, but I'm still learning how Bullet's interface works.

The btDbvtBroadphase is incremental, and all pairs are kept in the hashed paircache. Each time, only the new and removed pairs should be reported to this cache, not all pairs. Otherwise all performance is lost, and the test not representative.

Did you make it traverse the tree for each object, or use all default settings, with paircache?

Can you please share the benchmark? Bullet also has a broadphase benchmark.
Thanks,
Erwin

DanielJosephTracy · Post by **DanielJosephTracy** » Sun Jun 12, 2011 10:28 pm

I did only update objects that were moving, for both algorithms.

I didn't change any of the default settings of the algorithm, except I did turn off the feature that increases AABB sizes slightly, allowing me to verify the output. I believe that it uses the hash-based paircache by default, if I remember correctly.

I will try to clean up the benchmark today and post it, providing a URL here, providing I have enough time. Its GUI is Mac-only, but there will be a simple command-line interface that should work on *nix/Windows.

Daniel

DanielJosephTracy · Post by **DanielJosephTracy** » Wed Jun 15, 2011 2:12 am

Alright, the tester is available at the following link:

http://www.danieljosephtracy.com/Daniel ... haseCD.zip

I didn't have time to do a lot with it. It still has a lot of tweaks and knobs for various potential changes, etc. It's also fairly heavily templated, so final algorithms are typically "composed" using various policies. In any case, I included the adapters I made recently. Also, if you want to test it in the Bullet tester, just use the ToBulletBroadphaseAdapter class and see how that works with Grid+S&P. An example instantiation is:

new ToBulletGriddedBroadphaseAdapter<SweepAndPrune::Segmented>( minEnv, maxEnv, objectCount, numThreads )

as shown in BulletBroadPhases.hpp at the bottom, where I attach that back into the system using the FromBulletBroadphaseAdapter.

Let me know if you want me to make some requested modifications or if you have questions,
Daniel

shadiq · Post by **shadiq** » Mon Jun 27, 2011 2:52 pm

Thanks for information.

iaterziev · Post by **iaterziev** » Sun Feb 07, 2016 2:56 pm

While searching for comparison tests I found this interesting topic and played with the code. Dbvh works best when objects are not penetrating much, otherwise when many objects share the same volume it can't partition space efficiently and queries the same space over and over. One problem with testing dbvh with random data is that depending on density and speed it may happen that the objects are already penetrating and then with dbvh you don't gain much from space partitioning even if you have good temporal coherence.

Real-Time Physics Simulation Forum

Broad Phase Performance

Broad Phase Performance

Re: Broad Phase Performance

Re: Broad Phase Performance

Re: Broad Phase Performance

Re: Broad Phase Performance

Re: Broad Phase Performance

Re: Broad Phase Performance

Re: Broad Phase Performance