High level parallelism with state buffering

Kafu
Posts: 8
Joined: Sat Jun 21, 2008 1:51 pm

Post by Kafu »

Hi,

I need to implement double buffering of the physics world and I'm looking for some hints. I know that Bullet is multi-threading capable (and thus, I deduce, thread-safe), but I need it at a higher level.

This is what I want to achieve during the i-th frame:
  • A thread, using data from State(i), executes stepSimulation() to compute State(i+1)
  • Parallel threads can issue rayTest() queries, without any synchronization, using only data from State(i)
To avoid synchronization, the rayTest() implementation must be thread-safe. Is it?

Back to the problem: the conceptually simplest solution is to have two separate worlds, a btDynamicsWorld that computes new states and a btCollisionWorld used for queries. On each frame change I must update or recreate the btCollisionWorld with the new data.

I don't know the internals of Bullet, but if there is a way to do this buffering internally, it would remove the need to re-create a btCollisionWorld each frame. Is this possible, maybe by playing with class inheritance?
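
A minimal sketch of the double-buffering scheme described above, assuming a plain array of per-body states rather than a real btCollisionWorld. The names BodyState, DoubleBufferedState and stepInto are hypothetical and not part of Bullet; whether rayTest() itself can safely run concurrently is a separate question that this sketch does not answer.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for whatever per-body data the queries need.
struct BodyState {
    float x, y, z;
};

// Two copies of the state: readers see State(i), the writer fills State(i+1).
class DoubleBufferedState {
public:
    explicit DoubleBufferedState(std::size_t bodyCount) {
        buffers_[0].resize(bodyCount, BodyState{0.f, 0.f, 0.f});
        buffers_[1].resize(bodyCount, BodyState{0.f, 0.f, 0.f});
    }

    // State(i): read-only view for query threads during frame i.
    const std::vector<BodyState>& front() const {
        return buffers_[front_.load(std::memory_order_acquire)];
    }

    // State(i+1): written by the simulation thread only.
    std::vector<BodyState>& back() {
        return buffers_[1 - front_.load(std::memory_order_relaxed)];
    }

    // Call at the frame boundary, when no reader or writer is active.
    void swap() {
        front_.store(1 - front_.load(std::memory_order_relaxed),
                     std::memory_order_release);
    }

private:
    std::array<std::vector<BodyState>, 2> buffers_;
    std::atomic<int> front_{0};
};

// Toy "step": derive State(i+1) from State(i) without modifying State(i).
inline void stepInto(const std::vector<BodyState>& current,
                     std::vector<BodyState>& next, float dy) {
    assert(current.size() == next.size());
    for (std::size_t i = 0; i < current.size(); ++i) {
        next[i] = current[i];
        next[i].y += dy;
    }
}
```

During frame i, query threads would read only front() while the simulation thread writes only back(); swap() is meant to be called at the frame boundary, once all of them have finished.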


Regards,
Dani
beaugard
Posts: 8
Joined: Sat Feb 16, 2008 5:12 pm

Re: High level parallelism with state buffering

Post by beaugard »

I implemented a similar high-level parallelization in my project. Rather than duplicating the whole collision world, I just duplicate the position/orientation of active bodies (duplicating the whole world seems wasteful). All calls to the "physics thread" go through functors, so execution is deferred to once per physics cycle. I also have some very simple communication between threads through shared objects, using InterlockedExchange* for thread safety. This solution works very well and was quite simple to implement.
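
The functor approach can be sketched roughly as follows. This is an assumption about the general shape, not the code from my project: DeferredCalls, post and drain are hypothetical names, and std::mutex stands in for the interlocked primitives mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical queue: any thread posts work, the physics thread runs it
// once per cycle.
class DeferredCalls {
public:
    // Any thread: enqueue a call for the next physics cycle.
    void post(std::function<void()> call) {
        assert(call && "an empty functor cannot be executed");
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(call));
    }

    // Physics thread, once per cycle. Returns how many calls were run.
    std::size_t drain() {
        std::vector<std::function<void()>> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(pending_);  // hold the lock only for the swap
        }
        for (auto& call : batch) call();
        return batch.size();
    }

private:
    std::mutex mutex_;
    std::vector<std::function<void()>> pending_;
};
```

The lock is held only while the pending list is swapped out, so posting threads are never blocked for the duration of the deferred calls themselves.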

The question is whether you really need those raycasts immediately or if you can stand deferred execution plus the overhead of using functors. Maybe the parts of your program that perform the raycasts should be migrated to the physics thread? Or is the whole point of parallelizing to be able to perform millions of raycasts (like a raytracer)?
Kafu
Posts: 8
Joined: Sat Jun 21, 2008 1:51 pm

Re: High level parallelism with state buffering

Post by Kafu »

beaugard wrote: I implemented a similar high-level parallelization in my project. Rather than duplicating the whole collision world I just duplicate position/orientation of active bodies (duplicating the whole world seems wasteful).
I don't understand how you use that duplicated data. To call rayTest() you need a btCollisionWorld (or a derived class) filled with the duplicated data. Or am I wrong?
beaugard wrote: All calls to the "physics thread" are through functors, so execution is deferred to once per physics cycle. I also have some very simple communication between threads through shared objects with InterlockedExchange* for thread safety. This solution works very well and was quite simple to implement.

The question is whether you really need those raycasts immediately or if you can stand deferred execution plus the overhead of using functors. Maybe the parts of your program that perform the raycasts should be migrated to the physics thread? Or is the whole point of parallelizing to be able to perform millions of raycasts (like a raytracer)?
The execution model that I'm using is based on the fact that every object being updated needs to see the other objects coherently, i.e. in frame i only State(i) is accessible. If I allowed queries to the physics world while it is updating (even with synchronization), the above constraint would be violated. This has various benefits (such as being able to do immediate raycasting without synchronization) and of course some overhead (such as the double buffering).
beaugard
Posts: 8
Joined: Sat Feb 16, 2008 5:12 pm

Re: High level parallelism with state buffering

Post by beaugard »

I never use the duplicated data for raytests. Only to update my graphics.

With this method there is a lag of one frame between issuing a query and receiving the result (at least the way I implement it; if you empty the message queue more than once per frame and physics is running at a higher rate, the result might arrive sooner). There is also no guarantee that the result will exactly correspond to the frame that is currently drawn, but usually it will. If that is an absolute requirement, then the method doesn't suit you. Note, however, that Bullet usually gives you interpolated position/orientation (if you are using motion states), which probably means ray queries will never be exactly accurate except once per substep.
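
To make the one-cycle lag concrete, here is a rough sketch of a deferred ray query with a result slot. It is only an illustration under stated assumptions: RayResult, RayQueryQueue and the Solver callback are hypothetical, and the callback stands in for the real ray test that would run on the physics thread.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical result slot, shared between the caller and the physics thread.
struct RayResult {
    std::atomic<bool> ready{false};
    bool hit = false;
    float fraction = 1.0f;  // position of the hit along the ray, 0..1
};

class RayQueryQueue {
public:
    // Stand-in for the real ray test; it is run on the physics thread.
    using Solver = std::function<bool(float fromY, float toY, float& fraction)>;

    explicit RayQueryQueue(Solver solver) : solver_(std::move(solver)) {
        assert(solver_ && "a solver is required");
    }

    // Any thread: enqueue a query and get a slot that is filled later.
    std::shared_ptr<RayResult> request(float fromY, float toY) {
        auto slot = std::make_shared<RayResult>();
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(Query{fromY, toY, slot});
        return slot;
    }

    // Physics thread, once per cycle: answer everything queued so far.
    void process() {
        std::vector<Query> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(pending_);
        }
        for (auto& q : batch) {
            q.slot->hit = solver_(q.fromY, q.toY, q.slot->fraction);
            // Publish the result only after hit and fraction are written.
            q.slot->ready.store(true, std::memory_order_release);
        }
    }

private:
    struct Query {
        float fromY, toY;
        std::shared_ptr<RayResult> slot;
    };

    Solver solver_;
    std::mutex mutex_;
    std::vector<Query> pending_;
};
```

The caller gets its slot back immediately, but ready only becomes true after the next process() call, which is where the lag comes from.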
Kafu wrote: The execution model that I'm using is based on the fact that every object that is updating needs to see other objects coherently, i.e. in frame i only State(i) is accessible.
With such a requirement, are you absolutely sure that these things should be part of your graphics system and not your physics system?
Kafu
Posts: 8
Joined: Sat Jun 21, 2008 1:51 pm

Re: High level parallelism with state buffering

Post by Kafu »

beaugard wrote: I never use the duplicated data for raytests. Only to update my graphics.
Kafu wrote: The execution model that I'm using is based on the fact that every object that is updating needs to see other objects coherently, i.e. in frame i only State(i) is accessible.
beaugard wrote: With such a requirement, are you absolutely sure that these things should be part of your graphics system and not your physics system?
Maybe there has been some confusion: I never spoke about graphics. Everything I said is about physics: while physics updates, the physics system must be able to provide some services (such as raycasting, meaning ray-testing or ray-querying).
beaugard wrote: With this method there is a lag of one frame between issuing a query and receiving the result (at least the way I implement it - if you empty the message queue more than one time in a frame and physics is running at a higher Hz the result might be faster). There is also no guarantee that the result will exactly correspond to the frame that is currently drawn, but usually it will. If it is an absolute requirement then the method doesn't suit you. Note, however, that Bullet usually gives you interpolated position/orientation (if you are using motionstates) which probably means rayqueries will never be exactly accurate except for once per substep.
Keeping a separate world would solve these issues (modulo one frame of lag). What I would like to know is whether there is a more efficient way than keeping two different instances, maybe by hacking somewhere (btMotionState? Some method overriding?).
beaugard
Posts: 8
Joined: Sat Feb 16, 2008 5:12 pm

Re: High level parallelism with state buffering

Post by beaugard »

Ok, you were talking about frames so I assumed the graphics part. In the end it doesn't matter - my point was that whatever needs the whole collision world should maybe go into the physics system.

Anyway, I would guess that the data has to be duplicated, but you could certainly get away with updating only the non-sleeping parts. I do this by keeping my own list of active objects that I update after every step (I ensure thread-safety with a mutex that locks the whole list, but only for bulk-copying so it doesn't hurt performance).

IIRC I simply subclassed btRigidBody and implemented a custom setActivationState that adds or removes the pointer from my "active bodies" list. Maybe I had to add the rigid body to the list upon inserting it into the world, too.
You could also try implementing your own motion state that is responsible for synchronizing. I think they are all synchronized in one go at the end of a step (so you can control thread safety without locking for every individual motion state update).
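
A rough sketch of the "active bodies" mirror with a single lock around the bulk copy. It deliberately avoids Bullet types: Pose, ActivePoseMirror and the integer body ids are hypothetical, and how the wake/sleep notifications are obtained from Bullet is left open here.

```cpp
#include <cassert>
#include <mutex>
#include <unordered_map>
#include <utility>
#include <vector>

struct Pose {
    float px, py, pz;      // position
    float qx, qy, qz, qw;  // orientation (quaternion)
};

class ActivePoseMirror {
public:
    using Entry = std::pair<int, Pose>;  // (application body id, pose)

    // Physics thread only: a body woke up, or moved while awake.
    void setPose(int bodyId, const Pose& pose) {
        assert(bodyId >= 0);  // assumption: ids are non-negative indices
        awake_[bodyId] = pose;
    }

    // Physics thread only: a body fell asleep or left the world.
    void markAsleep(int bodyId) { awake_.erase(bodyId); }

    // Physics thread, once after each step: one lock for the bulk copy.
    void publish() {
        std::lock_guard<std::mutex> lock(mutex_);
        published_.assign(awake_.begin(), awake_.end());
    }

    // Any other thread: one lock for the bulk copy out.
    std::vector<Entry> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return published_;
    }

private:
    std::unordered_map<int, Pose> awake_;  // touched by the physics thread only
    mutable std::mutex mutex_;             // guards published_ only
    std::vector<Entry> published_;
};
```

Sleeping bodies simply drop out of the published list, so a consumer would keep its last known pose for them and overwrite only the entries present in each snapshot.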
Kafu
Posts: 8
Joined: Sat Jun 21, 2008 1:51 pm

Re: High level parallelism with state buffering

Post by Kafu »

beaugard wrote: Anyway, I would guess that the data has to be duplicated, but you could certainly get away with updating only the non-sleeping parts. I do this by keeping my own list of active objects that I update after every step (I ensure thread-safety with a mutex that locks the whole list, but only for bulk-copying so it doesn't hurt performance).

IIRC I simply subclassed btRigidBody and implemented a custom setActivationState that adds or removes the pointer from my "active bodies" list. Maybe I had to add the rigidbody to the list upon inserting into the world, too.
This is a useful suggestion: it avoids a full world recreation and limits the number of objects to update. Thanks.
beaugard wrote: You could try just implementing your own motionstate that is responsible for synchronizing, I think they are all synchronized in one go at the end of a step (so you can control thread safety without locking for every motionstate update individually).
From the call graph generated by Doxygen, setWorldTransform() seems to be called only from synchronizeMotionStates(). Unfortunately, synchronizeMotionStates() is called for each simulation sub-step, so updating the buffers there would violate my constraint. getWorldTransform(), on the other hand, is called only in saveKinematicState(), which is executed once before any sub-step. Does anyone know why synchronizeMotionStates() is called so often? Is it safe to remove it from the sub-step loop and leave a single call at the end of the loop? (P.S. There is also a small bug/inefficiency: if numSimulationSubSteps is not 0, the last synchronizeMotionStates() is called twice.)

Anyway, synchronizing on the last synchronizeMotionStates() call and using setWorldTransform() to write the new values would allow very efficient buffering; but I think that Bullet uses getInterpolationWorldTransform() to retrieve the shapes' transforms during ray-testing, which would violate my constraint.
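
One way to tolerate the per-sub-step calls without patching the loop might be to let the motion state absorb the writes and publish only once per step. The sketch below is a Bullet-free mock under stated assumptions: Transform, BufferedMotionState and toyStep are hypothetical, and toyStep only imitates the call pattern described above (one sync per sub-step plus one after the loop). It does not address which transform the ray test reads.

```cpp
#include <cassert>

struct Transform {
    float x, y, z;
};

// Hypothetical motion-state-like object. It is not derived from
// btMotionState here, to keep the sketch free of Bullet headers.
class BufferedMotionState {
public:
    // Would be called by the engine, possibly once per sub-step.
    void setWorldTransform(const Transform& t) {
        pending_ = t;
        dirty_ = true;
        ++writes_;
    }

    // Called by the application once, after the whole step has finished.
    void commit() {
        if (dirty_) {
            published_ = pending_;
            dirty_ = false;
        }
    }

    // What the rest of the application sees during the current frame.
    const Transform& published() const { return published_; }

    // Number of setWorldTransform() calls received so far.
    int writes() const { return writes_; }

private:
    Transform pending_{0.f, 0.f, 0.f};
    Transform published_{0.f, 0.f, 0.f};
    bool dirty_ = false;
    int writes_ = 0;
};

// Toy stand-in for a step with sub-steps: one sync per sub-step,
// then one more sync after the loop.
inline void toyStep(BufferedMotionState& ms, Transform& body,
                    int subSteps, float dy) {
    assert(subSteps >= 0);
    for (int i = 0; i < subSteps; ++i) {
        body.y += dy;
        ms.setWorldTransform(body);  // per-sub-step sync
    }
    ms.setWorldTransform(body);      // final sync after the loop
}
```

With three sub-steps the mock receives four writes, yet published() changes only when commit() is called, so the buffer swap can be tied to the frame boundary rather than to each sub-step.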
sparkprime
Posts: 508
Joined: Fri May 30, 2008 2:51 am
Location: Ossining, New York

Re: High level parallelism with state buffering

Post by sparkprime »

Kafu wrote: From the call graph generated by Doxygen, setWorldTransform() seems to be called only from synchronizeMotionStates(). Unfortunately, synchronizeMotionStates() is called for each simulation sub-step, so updating the buffers there would violate my constraint. getWorldTransform(), on the other hand, is called only in saveKinematicState(), which is executed once before any sub-step. Does anyone know why synchronizeMotionStates() is called so often? Is it safe to remove it from the sub-step loop and leave a single call at the end of the loop? (P.S. There is also a small bug/inefficiency: if numSimulationSubSteps is not 0, the last synchronizeMotionStates() is called twice.)
See second part of http://code.google.com/p/bullet/issues/detail?id=73

I disabled the inner loop update locally because it was reducing my framerate by a factor of 3 :)
Kafu
Posts: 8
Joined: Sat Jun 21, 2008 1:51 pm

Re: High level parallelism with state buffering

Post by Kafu »

sparkprime wrote: See second part of http://code.google.com/p/bullet/issues/detail?id=73
Thanks for the reference. So the only problem seems to be in btRaycastVehicle... Maybe I'm wrong, but why doesn't it use getInterpolationWorldTransform() directly?
sparkprime wrote: I disabled the inner loop update locally because it was reducing my framerate by a factor of 3 :)
Indeed, in my system a call to setWorldTransform() could trigger a lot of code.
sparkprime
Posts: 508
Joined: Fri May 30, 2008 2:51 am
Location: Ossining, New York

Re: High level parallelism with state buffering

Post by sparkprime »

My vehicles (implemented completely independently of Bullet's vehicles) use the actual positions of the bodies as the base from which to cast the rays, not the "interpolated" positions. They are extremely stable: I have beaten the crap out of these vehicles and they just bounce back again. So I suspect Bullet could be changed to do this as well, removing this additional dependency.