Multi SAP

Pierre · Post by **Pierre** » Tue Jul 31, 2007 2:31 pm

Hi,

I implemented the "multi SAP" optimization we talked about in this forum a while back. I also benchmarked the new code against available versions, and while doing this I found that Bullet doesn't report the same number of overlapping shapes as the others. Below are my preliminary notes for this stuff.

Any idea about this Bullet issue? Is this normal? If yes, is there a way to "fix" this?

Thanks,
- Pierre

-----------------------------------------

Multi-SAP notes:
================

The sweep-and-prune (SAP) algorithm doesn't scale well. As the number of objects
increases in the SAP structure, updating it for a single object takes longer and
longer. To solve this, a natural idea is to use multiple broad-phases instead of
a single one. For example, and that's what we tried here, one could use a 2D grid
of SAP structures.

The implementation is not trivial, but also not super complicated. So the details
are left out for another time. Basically we rasterize a 2D bounding box (discarding
the "up" axis) into a 2D grid covering the game world, and objects are inserted in
all SAPs of covered cells. A given object can be inserted in multiple SAPs.

Then we profiled 2 things:
- insertion of new objects (a new object is created in the game world)
- updates of objects (an existing object is moved around)

We profiled 4 different implementations:
- the original ICE sweep-and-prune (based on the one released in Opcode 1.3, but
more optimized)
- the multi SAP which has been built on the ICE version
- the PhysX SAP implementation
- the Bullet SAP implementation (from version 2.55)

In all honesty, only the ICE SAP and its multi-SAP version should be compared,
as they share the same algorithmic details. Other SAP implementations did not
make the same design choices (e.g. linked-lists or arrays, FPU or CPU compares,
etc) so any speed difference between them and the Multi-SAP might come from
those design choices, more than from the nature of the SAP (single or multi).

However it is useful to compare the PhysX and Bullet versions to the ICE SAP,
to see the impact of those design choices on runtime speed. For example it is
often heard that "using X is faster than using Y" when it comes to SAPs. We
found that very often, it actually depends on how X or Y has been implemented,
more than anything else. (For example a linked list of pool-allocated elements
is not as bad as a linked list of elements randomly located in memory.)

In any case, here are the features for the different SAP implementations:

ICE single SAP:
- linked lists (objects allocated from pools)
- FPU or CPU comparisons (CPU in this test)
- unlimited game world

ICE multi SAP:
- 2D array of "ICE single SAPs" (for this test, grid is 8*8 = 64 SAPs)
- limited game world (world bounds needed at creation time)

PhysX:
- arrays
- CPU comparisons
- unlimited game world

Bullet:
- arrays
- CPU comparisons
- quantized boxes
- limited game world (world bounds needed at creation time)
- limited to 32767 objects (*)

(*) using the BP_USE_FIXEDPOINT_INT_32 define to remove this constraint has a
significant performance impact, as we will show in one test.

================================================================================

I - Insertions:
---------------

The test is like this: we create N randomly located objects in a (1000 x 1000 x
1000) game world. Objects have a homogeneous size. This is somewhat artificial
but all implementations are tested against the same scenario so it should be fair.

Then we create object N+1 and profile how long it takes to update the structure.
Some implementations (e.g. PhysX) do not update the structure immediately, only
when the broadphase (BP) is later "updated", so we included the BP update in the
profile. Note however that an implementation (e.g. PhysX) can be optimized for
multiple inclusions at the same time, and this test doesn't reflect that feature.
But the focus should be on the MultiSAP here. At the time of writing the MultiSAP
version doesn't take advantage of multiple insertions, so we didn't test this case
(but we might come back to this later).

Anyway, current results are like this:

--------------------------------------------------------------------------------

N = 400
Insertion time PhysX: 142208
Insertion time Bullet: 66808
Insertion time ICE single: 169456
Insertion time ICE multi: 8056

N = 4000
Insertion time PhysX: 1528408
Insertion time Bullet: 1159072
Insertion time ICE single: 2725808
Insertion time ICE multi: 35712

N = 10000
Insertion time PhysX: 4417480
Insertion time Bullet: 4024632
Insertion time ICE single: 6730048
Insertion time ICE multi: 174712

--------------------------------------------------------------------------------

And the comments:

- insertion has never been optimized in the ICE single SAP. Object is just added
to the 3 linked lists in linear line. It does not try to use "stabbing numbers"
or "markers" to optimize this process. So it is not a surprise that this version
is the slowest.

- a much more interesting result comes from the Multi SAP, which really shines
here. It is based on the same ICE code, so it also doesn't use any special
optimization in individual SAPs. Nonetheless, it easily beats all the other
implementations for insertion times. This is good because that was the whole
point of the multi SAP in the first place: to optimize creation of new objects.

- the multi SAP is betwen 20 and 76 times faster than its "single" counterpart.
It is also between 8 and 32 times faster than Bullet, which is usually slightly
faster than PhysX for insertions.

================================================================================

II - Updates:
-------------

In this test we use the same setup as before, and objects are randomly moved within
the game world, following sine/cosine curves (Lissajoux). Again, this is very
artificial, but everybody's tested against the same scenario.

We have 3 variables to play with here:
- the total number of objects in the world (N)
- the number of objects moving at any given time (M)
- the speed of moving objects

We profiled two main scenarios:
- all objects are moving
- 1% of objects are moving

The first scenario is rather unlikely in a game, but it is a good stress test.

The speed is the same for all objects. We need to do more tests with varying
speeds.

Results so far:

The 8 columns are:

PhysX time | ICE single time | Bullet time | ICE Multi time ||
PhysX nb pairs | ICE single nb pairs | Bullet nb pairs | ICE Multi nb pairs

We recorded results for several frames. One line = one frame. We report results
for several frames, as the numbers might slightly evolve and diverge from one
frame to the next.

PX = PhysX
IS = ICE Single
IM = ICE Multi
BT = Bullet

--------------------------------------------------------------------------------

1) All objects are moving:
--------------------------

N = M = 400 objects
PX | IS | BT | IM ||PX |IS |BT |IM
646 | 526 | 773 | 548 || 2 | 2 | 4 | 2
647 | 543 | 754 | 537 || 2 | 2 | 3 | 2
646 | 523 | 753 | 561 || 2 | 2 | 3 | 2
645 | 529 | 766 | 539 || 2 | 2 | 3 | 2
642 | 537 | 761 | 557 || 2 | 2 | 3 | 2
644 | 526 | 743 | 522 || 2 | 2 | 2 | 2
650 | 537 | 759 | 582 || 2 | 2 | 2 | 2
653 | 541 | 749 | 535 || 2 | 2 | 2 | 2

N = M = 4000 objects
PX | IS | BT | IM || PX | IS | BT | IM
14504 | 20545 | 16579 | 16540 || 275 | 275 | 342 | 275
14987 | 21770 | 17464 | 17963 || 277 | 277 | 349 | 277
15938 | 21043 | 16890 | 16955 || 274 | 274 | 344 | 274
14454 | 21336 | 17075 | 17371 || 276 | 276 | 346 | 276
14410 | 20406 | 16347 | 16555 || 277 | 277 | 347 | 277
14468 | 20304 | 16497 | 15906 || 278 | 278 | 350 | 278
14638 | 22655 | 17377 | 18334 || 278 | 278 | 351 | 278
14982 | 21792 | 17242 | 17777 || 279 | 279 | 354 | 279
14810 | 22142 | 17450 | 17627 || 279 | 279 | 354 | 279
14843 | 22018 | 16640 | 16976 || 281 | 281 | 357 | 281
14982 | 22247 | 17565 | 17419 || 281 | 281 | 355 | 281

Comments:

- for some reason Bullet does not report the correct number of pairs! We don't
know if it's a bug in Bullet, but probably not. We suspect it is probably
because Bullet uses quantized bounds, or maybe some kind of loose bounds, i.e.
it only reports conservative results (to be confirmed). In any case we kept
the results, as the timings seem to indicate that Bullet does "the right job"
anyway (i.e. performance is consistent with other SAP implementations).

- when all objects are moving and the SAP contains "few" objects (400), the Multi
SAP doesn't show any benefit, and is actually a bit slower than the original code.
This is expected for two reasons: a) if the number of objects is limited, the
effects of cache misses during updates is not too big, and b) the Multi SAP has
some fixed overhead to rasterize objects bounds into the grid. This overhead
becomes significant when the update itself is very cheap.

- in the same scenario (400 objects moving) the ICE SAP happens to be faster than
both PhysX and Bullet. Linked lists seem faster than arrays here.

- however things change when the number of objects is bumped to 4000: here the ICE
single SAP becomes slower than the Bullet & PhysX implementations. At the same
time the Multi SAP becomes useful and always runs faster than the single version,
giving it roughly the same speed as Bullet (but still slower than PhysX). On the
other hand the number of pairs reported by Bullet seems really excessive here,
and if further box-box tests are needed to get back the real set of colliding
pairs, it may lose some performance.

- Bullet is always slower than PhysX here.

--------------------------------------------------------------------------------

- note that if all objects are moving, no SAP implementation will beat the "radix
based" pruning. The "box pruning" code in Opcode 1.3 needs ~4000 time units to
find the correct number of pairs in the second test, which is 3.5 times faster
than the fastest SAP implementation in this benchmark.

--------------------------------------------------------------------------------

In sake of completeness we repeat the last test without the Bullet limit of 32767
objects, i.e. with BP_USE_FIXEDPOINT_INT_32 defined:

N = M = 4000 objects, BP_USE_FIXEDPOINT_INT_32
PX | IS | BT | IM || PX | IS | BT | IM
14630 | 22643 | 20218 | 18519 || 276 | 276 | 346 | 276
14624 | 22348 | 19567 | 17763 || 277 | 277 | 347 | 277
14566 | 21406 | 19024 | 16587 || 278 | 278 | 349 | 278
14579 | 21655 | 19252 | 17724 || 278 | 278 | 351 | 278
15615 | 22640 | 20584 | 18405 || 279 | 279 | 353 | 279
14605 | 23848 | 20065 | 18554 || 279 | 279 | 351 | 279
14925 | 22386 | 20701 | 17334 || 281 | 281 | 356 | 281
15073 | 22578 | 20634 | 17603 || 281 | 281 | 356 | 281

We see that there is a significant performance loss here, from ~17000 time units
to ~20000. This makes Bullet comparable in speed to the ICE single SAP, which is
interesting because one of them uses arrays while the other uses linked lists.
We are in the case where the data structure used should have a significant impact
(all objects moving), yet both implementations roughly have the same speed.

Regardless, BP_USE_FIXEDPOINT_INT_32 is not used for other tests.

--------------------------------------------------------------------------------

2) 1% of objects are moving:
----------------------------

N = 4000, M = 40
PX | IS | BT | IM || PX | IS | BT | IM
186 | 187 | 187 | 85 || 89 | 89 | 89 | 89
168 | 190 | 193 | 79 || 89 | 89 | 89 | 89
169 | 194 | 192 | 156 || 89 | 89 | 89 | 89
166 | 206 | 190 | 114 || 89 | 89 | 89 | 89
159 | 179 | 187 | 119 || 89 | 89 | 89 | 89
159 | 179 | 187 | 92 || 89 | 89 | 89 | 89
150 | 167 | 186 | 87 || 89 | 89 | 89 | 89

N = 10000, M = 100
PX | IS | BT | IM || PX | IS | BT | IM
1067 | 1838 | 1227 | 701 || 521 | 521 | 525 | 521
1083 | 1880 | 1233 | 880 || 521 | 521 | 525 | 521
1143 | 2084 | 1284 | 876 || 521 | 521 | 525 | 521
1161 | 2141 | 1294 | 659 || 521 | 521 | 525 | 521
1165 | 2134 | 1321 | 802 || 521 | 521 | 525 | 521
1142 | 2055 | 1304 | 892 || 521 | 521 | 525 | 521
1453 | 1970 | 1256 | 731 || 521 | 521 | 525 | 521
1129 | 1917 | 1244 | 800 || 522 | 522 | 526 | 522
1236 | 2173 | 1347 | 686 || 522 | 522 | 526 | 522

Comments:

- in this more realistic test where a few objects only are moving, the multi SAP
becomes really useful. It is often twice faster than the single SAP counterpart.
It is also faster than all other SAP implementations.

- with a limited number of objects, performance for all single SAP implementations
is roughly the same. When that number is increased however, the array-based SAPs
(PhysX/Bullet) perform better.

- here again, the Bullet implementation is a bit slower than the PhysX one.

================================================================================

Conclusion so far:

- the MultiSAP is very efficient for insertions, and quite efficient for updates.
It's always a win except in trivial situations, where its overhead is just as
costly as the deeper SAP operations.

- more tests are needed to check the impact of:
- bigger SAP grids (16*16, etc)
- object speeds (slower speed, faster speed, etc)
- heterogeneous objects

- it would be interesting to apply the "multiple SAPs" strategy to both Bullet &
PhysX implementations...

Erwin Coumans · Post by **Erwin Coumans** » Tue Jul 31, 2007 4:59 pm

Hi Pierre,

Thanks a lot for sharing your research in broadphases. A multi-SAP should give good results indeed.

I found that Bullet doesn't report the same number of overlapping shapes as the others.

Did you realize Bullet needs an additional step to clean/sort the overlapping pairs? So you need to call btAxisSweep3::processAllOverlappingPairs each frame. This is due to 'delayed' removal of pairs: instead of during the sweep stage, removal of non-overlapping pairs happens during this sort/clean 'processAllOverlappingPairs.

And indeed, as you states, the bounds are quantized, hence conservative, which could cause the difference. Can you check/assert that Bullet at least gets the pairs that the other gets?

Another thing: Bullet is used in some huge-world streaming game for XBox 360/Playstation 3, and the insertion of a large batch became the bottleneck. So we optimized the inert-batch case by using a loose octree to rebuild the overlapping pairs, and then inserting the sorted begin/end points into the SAP (without doing the add/remove pairs during this merge). This completely got rid of the spikes of large insertions.

Another note: the marker objects can be more efficient then expected: Keeping a single marker object in the middle can speed up things dramatically. This marker object should be kept in the middle of each axis. This is similar to the first level of an octree: each axis gets subdivided in the middle. Just adding a 2 or 3 more nested levels will increase the performance more exponentially. It would be good to compare markers too.

Thanks,
Erwin

Julio Jerez · Post by **Julio Jerez** » Tue Jul 31, 2007 5:04 pm

Don't you think that would be a multiresoltution grid that you are reinventing there with a different name. I use the same method and I still call it Multiplication grid.
What units is the timing? microseconds or nanoseconds
I see you are comparing with the establishment engines.
Let me introduce my entry. In order of complexity
http://www.youtube.com/watch?v=FUjLx5DAb5k (around one thousands)
http://www.youtube.com/watch?v=PDYrcl_i3AA (around 3 thousand)
http://www.youtube.com/watch?v=VmQgKqEF-s8 (around 27 thousand)

It is the same Newton version, I just linearized the solver so that I can contest these dishonest comparison made be the engine appraisal where they use a technology that is using an algorithm with a quadratic time complexity to compared with one the is using a linear time complexity.

In case you think the video are a hoax, I will tell you that you can check the Collada viewer so that you can compare Newton run time performance with your new version, and while you are at it maybe you can compare with PhsyicsX, Bullet and ODE.
http://www.physicsengine.com/forum/viewtopic.php?t=3722

And please made the result public.
Maybe I get better result bu challenging you that I got with by challenging Mister KenB.

Erwin Coumans · Post by **Erwin Coumans** » Tue Jul 31, 2007 5:12 pm

Julio, I think Pierre's posting is exclusively about broadphases.

I haven't checked Newton API, but is it possible to test just its broadphase (add/remove/move/check overlapping broadphase pairs), without doing anything else (no contact generation, no physics etc) ?

You new Newton results look impressive, it might be better to add a separate posting, so it doesn't gets lost in this broadphase thread.

Thanks,
Erwin

Julio Jerez · Post by **Julio Jerez** » Tue Jul 31, 2007 5:50 pm

You new Newton results look impressive

you say looks, they do not just look, they are real.

New post? this is about Broadphase or do you think that Newton is the only engine that do not uses bradphase, are you going to send my respond to the back of the bus as usual, so that is is better to ignore it rather that face it?

About the Broadphase, don't you think that if 27 thousands clustered balls with correct Coulomb friction are moving at iterative time, I assume that even you will agree that at some point there must be some kind of runtime scene management.

I know you haven't used Newton, which make me wonder how did you arrived to the conclusion that Bullet is the four best physics engine in particular that bullet better that Newton.
Did you just read the appraisals by mr KenB students, for witch I never have an response and I never saw a tangible demonstration.

Erwin Coumans · Post by **Erwin Coumans** » Tue Jul 31, 2007 6:00 pm

Julio,

My assumption is that Newton has some broadphase, just like most engines. My question was if Newton API allows to benchmark just its broadphase. Then you can ask Pierre to consider to include Newton's broadphase next time.

Pierre's topic is about broadphases, so please stick with that topic.
Thanks,
Erwin

dog · Post by **dog** » Tue Jul 31, 2007 6:54 pm

Thank you Pierre!

You should also measure performance with large numbers (say 10%) of deletions. Naive array based implementations can exhibit spikes when this typical game behavior occurs.

Pierre · Post by **Pierre** » Wed Aug 01, 2007 11:09 am

Oh dear. What the hell happened here? Sigh.

So you need to call btAxisSweep3::processAllOverlappingPairs each frame.

I already do that. If I don't, the number of pairs is always 0.

And indeed, as you states, the bounds are quantized, hence conservative, which could cause the difference.

Ok, that should be the reason then.

Can you check/assert that Bullet at least gets the pairs that the other gets?

I will.

Another thing: Bullet is used in some huge-world streaming game for XBox 360/Playstation 3, and the insertion of
a large batch became the bottleneck. So we optimized the inert-batch case by using a loose octree to rebuild the
overlapping pairs, and then inserting the sorted begin/end points into the SAP (without doing the add/remove pairs
during this merge). This completely got rid of the spikes of large insertions.

Ok, I see. Interesting. In PhysX we pre-sort the batch not to restart walking the list for each element, but we still do the add/remove during the merge (I think. I didn't write the last BP code). I might try your approach if I get some time (although I'd use "box pruning" instead of a loose octree, since I'm "very big on that sort of stuff", right?

)

Another note: the marker objects can be more efficient then expected: Keeping a single marker object in the middle
can speed up things dramatically. This marker object should be kept in the middle of each axis. This is similar to
the first level of an octree: each axis gets subdivided in the middle. Just adding a 2 or 3 more nested levels will
increase the performance more exponentially. It would be good to compare markers too.

...another thing to test I guess. I'm not a big fan of the markers approach but I might try it later. The MultiSAP
seems to "solve" most insertion issues anyway so all the alternatives are down the priority list right now.

Don't you think that would be a multiresoltution grid that you are reinventing there with a different name.

No. A multiresolution grid is just a set of grids, similar to a loose octree. There is no SAP in there. There is also quite a bit of litterature about them (Christer Ericson's book, if nothing else). But we're talking about a grid of SAPs, which is quite a different thing. And I don't think I saw an article/paper about it either. It got mentioned here in this forum a while back, and I said I would go back to it "when I have time". Well, that's now, I have time, I get back to it.

BTW it's not a "re-invention", it's a re-implementation of an already existing, already discussed structure. We don't have one yet in our codebase, we need something like this to solve a very real problem in a very real game, and there is no available source code anywhere for such a thing. So, no choice, we have to re-implement it.

I use the same method and I still call it Multiplication grid.

(I guess you meant "multiresolution", not "multiplication"). I you use an array of SAPs, I wouldn't call that "Multiresolution grid", it's confusing. A multiresolution grid is something else. For starters, at least in my version, the grid doesn't have "multiple resolutions" at all.

What units is the timing? microseconds or nanoseconds

Units are either cycles or kilo-cycles.

I don't like microseconds in benchmarks since it doesn't give you a stable reference point for comparisons. "1 ms" is not the same on a 1 Ghz machine or on a 3 Ghz one, for example.

I just linearized the solver so that I can contest these dishonest comparison made be the engine appraisal
where they use a technology that is using an algorithm with a quadratic time complexity to compared with
one the is using a linear time complexity.

It doesn't sound "dishonest" to compare different algorithms with different big-O complexities. I mean, that's the whole point of benchmarks. One algorithm might be O(n), the other one O(log n), it's fair to compare
them if they do the same job. If you only compare O(n) to O(n), you're not comparing algorithms, you're pretty much comparing implementations, which is somewhat different.

Anyway this is off-topic. This thread is NOT about comparing engine X to engine Y, this thread is about:
1) finding out why Bullet doesn't report the correct number of pairs
2) comparing a multi SAP to a single SAP

The PhysX/Bullet benchmarks are a bonus, not the main focus of the thread.

I suggest you start a new thread to promote Newton instead of hijacking this one. Man, that's just rude. If they left you out in the past, maybe it's because of your attitude, not because of your technology?

I will tell you that you can check the Collada viewer so that
you can compare Newton run time performance with your new version, and while you are at it maybe you
can compare with PhsyicsX, Bullet and ODE.

I don't mind adding Newton to the test, but I briefly looked at it (v1.53) and it doesn't seem I can use the broadphase all alone. I don't see how I can include Newton, the API is not "open" enough.

For Bullet I just added those 3 functions:

udword CreateBulletVolume(const AABB& box, int id);
void UpdateBulletVolume(udword handle, const AABB& box);
udword GetBulletNbPairs();

I need to be able to do these operations in Newton, to compare it to others. I don't want to create a "world", or "shapes", or anything: this is just a broadphase test.

Also, I don't see what the Collada viewer has to do with this. I'm using a standalone test with no graphical output, and I make sure everybody's called the same way, on the same data, etc. It would be completely meaningless to get some figures (FPS? ms?) from some viewer doing some other things, probably suffering from different cache misses, etc, and compare that to others.

Maybe I get better result bu challenging you that I got with by challenging Mister KenB.

I'm afraid I didn't follow that story with "Mister KenB".

Julio, I think Pierre's posting is exclusively about broadphases.

Indeed.

You should also measure performance with large numbers (say 10%) of deletions. Naive array based
implementations can exhibit spikes when this typical game behavior occurs.

Ok, I'll keep that in mind and try to check this.

I am just responding to Pierre claims about his new rediscovery.

I didn't claim I "discovered" anything. I *implemented* something already discussed here a while back. Havok had this for years, I'm not discovering anything here.

Quite frankly I never considered sorting a very big challenge, but he seems to be very big on that sort of stuff.
For years he seems to claiming to knows sorting better than anybody else and no one challenge his claims.

Errrr, hello? Now what the hell is that about?

Are you talking about "sorting" or "broadphases"? None of them is a "very big challenge". Hell, game physics is not a big challenge, everybody and his dog seems to be doing it nowadays.

I didn't claim to "know sorting better than anybody else", that's plain ridiculous. I have a few favorite algorithms like radix-sorting and sweep-and-prunes, so I talk about them, share my results, and try to improve them as much as possible. That's about it. If a guy is good or knows better, I don't mind praising him. See the first page of Christer Ericson's book for example. If somebody knows collision stuff "better than anybody else", that's him.

Anyway, can we get back on topic now?

- Pierre

John McCutchan · Post by **John McCutchan** » Wed Aug 01, 2007 12:17 pm

Pierre wrote:Oh dear. What the hell happened here? Sigh.

Anyway, can we get back on topic now?

- Pierre

Yep, would you mind explaining how you handle the case when an object begins and ends to straddle two SAPs?

Thanks,
John

Dirk Gregorius · Post by **Dirk Gregorius** » Wed Aug 01, 2007 12:22 pm

If I understood the text correctly objects are inserted into several SAP spaces when they overlap more than one cell.

Basically we rasterize a 2D bounding box (discardingthe "up" axis) into a 2D grid covering the game world, and objects are inserted in all SAPs of covered cells. A given object can be inserted in multiple SAPs.

Pierre · Post by **Pierre** » Wed Aug 01, 2007 12:51 pm

Yep, would you mind explaining how you handle the case when an object begins and ends to straddle two SAPs?

It's completely automatic for me. There is no "boundary management" or "overlapping broadphases". The thing to realize is that in a normal SAP, duplicate pairs are already added several times, for the different axes. So another SAP is really just anoter set of axes, nothing more. The only thing to do is share the pair container between all SAPs, and the same mechanism that filtered out duplicates in a normal SAP will filter out duplicates coming from other SAPs...

- Pierre

Eternl Knight · Post by **Eternl Knight** » Wed Aug 01, 2007 1:21 pm

Pierre:
Thanks for your reply on the SAP details. It is, after all, the reason I was reading the thread

When you have finished the "same pairs" test (i.e. ensuring that the pairs in your implementation are all covered in the Bullet implementation), could you let us know here. It is something I would like verified

Also, you mentioned "in PhysX" in a context that makes me wonder if you are involved in its development. Not that it matters, but it helps me put things in context when I know the background behind questions/comments.

Julio:
RE: ODE & Bullet - "You are and expert on those I am an expert in Newton so let...". This kind of childish attitude and "conspiracy theory" stuff is what makes people "turn off" on your rants. I never claimed to be an expert on Bullet or ODE.

I have used them, I know some of their quirks & limitations. This knowledge gives me confidence as to when and how I can use them. That is all. I know that Newton works well in certain scenarios, but there is (as Pierre pointed out), some limitations on what I can do with it. Some of these limitations are nothing to worry about, some of them are.

I have to say, that even if it were proven that all things are equal between Newton & Bullet - I would have to choose the developer not out there attacking everyone as the one more approachable when I have questions.

--EK

John McCutchan · Post by **John McCutchan** » Wed Aug 01, 2007 2:42 pm

Pierre wrote: It's completely automatic for me. There is no "boundary management" or "overlapping broadphases". The thing to realize is that in a normal SAP, duplicate pairs are already added several times, for the different axes.

Sorry, I should have been more specific in my question. Yes, I had the same thought on treating each SAP independently and handling duplicate pairs in the pair manager (ref counting comes to mind). What I was getting at is, how do you handle adding/removing/transitioning objects between the independent SAPs efficiently.

Erwin Coumans · Post by **Erwin Coumans** » Wed Aug 01, 2007 5:41 pm

In Bullet, the adding/removing of pairs is already a separate stage, so once the paircache is separated from the broadphase, it can be shared among a grid of broadphases. I expect a multi-SAP pretty trivial under such circumstances, so your promising results encourage me to give it a try.

Pierre, do you forsee any tricky cases in above scenario?
Thanks,
Erwin

By the way: sorry for the harassment and disrespectful postings by Julio Jerez.
Julio continues to clutter this broadphase topic, even after I asked him to start a new topic. As he ignored it, I split some postings into a new topic.
I have explained him on the COLLADA forums, that the choice of popular physics engines during COLLADA design was indeed arbirary and not exhaustive. The COLLADA Physics open standard is meant to democratize physics engine development. We are discussing this in a panel at Siggraph next week, called Open Source toolchains.

Pierre · Post by **Pierre** » Wed Aug 01, 2007 5:59 pm

When you have finished the "same pairs" test (i.e. ensuring that the pairs in your implementation are all covered in the Bullet implementation), could you let us know here. It is something I would like verified

Ok, I just checked this. The Bullet pairs indeed contain all "my" pairs, plus some more. I have to say it was a bit painful to access though. I got the array of pairs with "getOverlappingPairsArrayPtr" or something, and found out that a LOT of entries there are invalid (null m_pProxy0 pointers). So one has to manually parse the whole thing to discard the invalid entries, and it can take some time (e.g. I had 337 valid pairs in an array of 7074 entries). I guess it's more efficient if using some new-pair / deleted-pair callbacks, but I can't help thinking it would be better to have a real array with no "holes" all over the place.

In any case, all the overlapping pairs are there nonetheless.

Also, you mentioned "in PhysX" in a context that makes me wonder if you are involved in its development.

Well, I was working for Novodex when we got acquired so yes, I'm involved in the development of PhysX (that's why I can profile the BP, otherwise it's not directly exposed to users).

What I was getting at is, how do you handle adding/removing/transitioning objects between the independent SAPs efficiently.

I'm not sure what you want to hear exactly? Since each SAP doesn't contain a lot of objects, adding or removing a shape from them is not very costly. Overlapping shapes are kept in a shared hash-based pair manager with O(1) add/remove/find (and no holes

)

First what you insert in each cell do not define the algorithm what define the algorithm is that you have a array of cells wrapped by a larger cell and in each cell you have a list of objects, to go around the maintained you add bodies across boundary to each of the cell they touch.
For updates what you have is an array of fat list of self contained objects.
The fact that you have a larger grid warping smaller grids is what makes it a multiresolution grid.
The insertions of link list in each so not make the Multy SAP. You tell me where did I go wrong?

I'm not sure to follow you. All I'm saying is that "multiresolution grid" is a well-known data-structure, e.g. described in Christer Ericson's book (he calls them "hierarchical grids"). And that is not what I am using. Multi grids do not use the SAP algorithm, for example.

About the Collada think...

I will skip this one if you don't mind. I don't see what Collada has to do in this thread.

You just have to give a description of what those functions need to do.
udword CreateBulletVolume(const AABB& box, int id);
void UpdateBulletVolume(udword handle, const AABB& box);
udword GetBulletNbPairs();

Well that's pretty simple:
- the first one creates an AABB in the SAP, associated to a user object (the id). It returns a handle to the internal SAP object that has been created.
- the second one updates the AABB during runtime. Parameters are the ney AABB and the handle previously returned by the first function.
- last function simply returns the current number of overlapping pairs.

To make that a useful API in Newton you would also need a function to retrieve overlapping pairs. You can return either all current pairs, or you can have callbacks to return the "new" and "deleted" pairs (both versions are useful).

- Pierre