Performance issue adding objects when using btDbvtBroadphase

Posted: Sat Aug 09, 2008 9:21 pm
by reltham
My project is a streaming-world setup that uses btDbvtBroadphase for collision detection only, with no dynamics.

Whenever I stream in new objects and insert them into the collision system I get a noticeable hitch (sometimes as long as a quarter to half a second). The objects come in groups of a few dozen at a time.

A profile shows a significant amount of time spent in the Proximity() function in btDbvt.h, which does some simple adds/subtracts and then three btFabs() calls that just go into fabs(). Shouldn't this be using SSE? In any case, fabs() is terribly slow, and on the x86 architecture you can take the absolute value of a float by clearing the sign bit (I temporarily did that in Proximity(), and it took it almost completely out of the profile).
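For illustration, the sign-bit trick described above can be sketched as follows. This is not the actual Bullet code: `Aabb`, `abs_by_sign_mask` and `proximity` are hypothetical stand-ins, and the sketch assumes 32-bit IEEE-754 single-precision floats. The `proximity` function only mirrors the shape described in the post (adds/subtracts followed by three absolute values).

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: absolute value of a float obtained by clearing the
// sign bit. Assumes float is a 32-bit IEEE-754 single-precision value.
inline float abs_by_sign_mask(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // well-defined way to read the bit pattern
    bits &= 0x7FFFFFFFu;                  // clear bit 31, the sign bit
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// Hypothetical stand-in for a min/max bounding box (not Bullet's own type).
struct Aabb {
    float mi[3];  // minimum corner
    float mx[3];  // maximum corner
};

// Sketch of a proximity measure: Manhattan distance between the two box
// centers, scaled by 2 (the sums mi + mx are left undivided).
inline float proximity(const Aabb& a, const Aabb& b) {
    float sum = 0.0f;
    for (int i = 0; i < 3; ++i) {
        const float d = (a.mi[i] + a.mx[i]) - (b.mi[i] + b.mx[i]);
        sum += abs_by_sign_mask(d);
    }
    return sum;
}
```

Whether this beats `fabs()` depends on the compiler and standard library. Many optimizing compilers already lower `std::fabs` to a single masking instruction, so it is worth measuring before and after.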

Anyway, it also spends a decent amount of time in various other parts of btDbvt during the insert.

Is there something I can do to make this faster? Is it scheduled to be optimized?

Roy

Re: Performance issue adding objects when using btDbvtBroadphase

Posted: Sun Aug 10, 2008 4:54 am
by Nathanael
reltham wrote: Whenever I stream in new objects and insert them into the collision system I get a noticeable hitch (sometimes as long as a quarter to half a second). The objects come in groups of a few dozen at a time.
Half a second!? That's an eternity. I know people who use dbvt to spawn particles in their world, and we are talking about insert/remove batches of ~1k, so there's definitely something wrong here.
reltham wrote: A profile shows a significant amount of time spent in the Proximity() function in btDbvt.h, which does some simple adds/subtracts and then three btFabs() calls that just go into fabs(). Shouldn't this be using SSE? In any case, fabs() is terribly slow, and on the x86 architecture you can take the absolute value of a float by clearing the sign bit (I temporarily did that in Proximity(), and it took it almost completely out of the profile).
fabs() may be slow depending on your standard library's implementation, but the fabs FPU instruction itself is not.
I attached a version of btDbvt with the following modifications:
- benchmark code enabled (can you call btDbvt::benchmark and post the console output?).
- new benchmark for insert/remove of batches.
- new benchmark for Proximity.
- force_inline on critical methods, so as not to rely too much on the compiler's inlining decisions.
- alternative implementations of 'Proximity': generic (current), fpu (default on win32) and sse, selectable via DBVT_PROXIMITY_IMPL.
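As a rough idea of what a Proximity micro-benchmark could look like, here is a minimal, self-contained sketch. It is not the benchmark from the attached file: `Box`, `abs_std`, `abs_mask`, `proximity_with`, `make_boxes` and `bench` are all hypothetical names, and the implementation is selected through a template parameter rather than through the DBVT_PROXIMITY_IMPL macro. It assumes 32-bit IEEE-754 floats.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct Box { float mi[3]; float mx[3]; };  // hypothetical min/max AABB

// Two interchangeable absolute-value implementations.
inline float abs_std(float x) { return std::fabs(x); }

inline float abs_mask(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits &= 0x7FFFFFFFu;  // clear the IEEE-754 sign bit
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// Proximity sketch parameterized on the absolute-value function.
template <float (*Abs)(float)>
inline float proximity_with(const Box& a, const Box& b) {
    float s = 0.0f;
    for (int i = 0; i < 3; ++i)
        s += Abs((a.mi[i] + a.mx[i]) - (b.mi[i] + b.mx[i]));
    return s;
}

// Deterministic pseudo-random unit boxes, so runs are repeatable.
inline std::vector<Box> make_boxes(std::size_t n) {
    std::vector<Box> boxes(n);
    std::uint32_t state = 12345u;
    for (std::size_t k = 0; k < n; ++k) {
        for (int i = 0; i < 3; ++i) {
            state = state * 1664525u + 1013904223u;  // simple LCG step
            const float c =
                static_cast<float>(state >> 8) * (1.0f / 16777216.0f) * 100.0f - 50.0f;
            boxes[k].mi[i] = c;
            boxes[k].mx[i] = c + 1.0f;
        }
    }
    return boxes;
}

struct BenchResult { double seconds; double checksum; };

// Times `rounds` passes over neighbouring box pairs. The checksum keeps the
// work observable so the optimizer cannot remove the loop.
template <float (*Abs)(float)>
BenchResult bench(const std::vector<Box>& boxes, int rounds) {
    const auto t0 = std::chrono::steady_clock::now();
    double sum = 0.0;
    for (int r = 0; r < rounds; ++r)
        for (std::size_t i = 1; i < boxes.size(); ++i)
            sum += proximity_with<Abs>(boxes[i - 1], boxes[i]);
    const auto t1 = std::chrono::steady_clock::now();
    return { std::chrono::duration<double>(t1 - t0).count(), sum };
}
```

A caller would build one box set with `make_boxes`, run `bench<abs_std>` and `bench<abs_mask>` on it, confirm that the two checksums match, and then compare the `seconds` fields. Timings from a loop this small are noisy, so several runs in an optimized build are needed before drawing conclusions.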

make sure you compile in single precision.

Nathanael.

Re: Performance issue adding objects when using btDbvtBroadphase

Posted: Mon Aug 11, 2008 12:55 am
by reltham
I will try out this file soon, but I wanted to let you know that my previous time estimate might have been too high. I had some other bugs in my code that were affecting everything.

Once I have things stable I will use this file to check the performance.

Thanks!

Roy