Bullet's Voronoi Simplex Solver
 John McCutchan
 Posts: 133
 Joined: Wed Jul 27, 2005 1:05 pm
 Location: Berkeley, CA
 Contact:
Bullet's Voronoi Simplex Solver
I read the paper on using Voronoi regions to replace Johnson's distance algorithm in GJK. The paper didn't mention anything about using barycentric coordinates. So two questions: what exactly are barycentric coordinates? Why are you using them, and what are you trying to solve? I see you use them when computing the closest point, but in Ericson's paper he says that an orthogonal projection of the origin onto the closest feature is all that is needed to compute the point.

 Physics Researcher
 Posts: 23
 Joined: Mon Jun 27, 2005 9:28 am
 Location: Helmond, Netherlands
 Contact:
Re: Bullet's Voronoi Simplex Solver
jmc wrote:
I read the paper on using Voronoi regions to replace Johnson's distance algorithm in GJK. The paper didn't mention anything about using barycentric coordinates. So two questions: what exactly are barycentric coordinates? Why are you using them, and what are you trying to solve? I see you use them when computing the closest point, but in Ericson's paper he says that an orthogonal projection of the origin onto the closest feature is all that is needed to compute the point.

Barycentric coordinates: each point in a simplex can be described as a weighted sum of the simplex's vertices. The barycentric coordinates are the weights. The weights sum to 1 and are nonnegative for points on the simplex.
Johnson's algorithm computes the barycentric coordinates of the point of a simplex closest to the origin. Ericson's suggested alternative to Johnson does not compute the barycentric coordinates, IIRC. Ericson claims that since his approach is more intuitive, it is easier to fix robustness problems. I agree on the intuitive part, but I'm a bit less convinced on the robustness part. In fact, since Ericson's approach separates the subsimplex selection from the closest-point computation, it is a source of a new set of robustness problems: the closest point is not necessarily contained in the selected subsimplex.
So my advice would be to just bite the bullet (no pun) and implement Johnson, since it is the fastest and most accurate solution around.
Gino
 John McCutchan
 Posts: 133
 Joined: Wed Jul 27, 2005 1:05 pm
 Location: Berkeley, CA
 Contact:
Re: Bullet's Voronoi Simplex Solver
gino wrote:
Barycentric coordinates: each point in a simplex can be described as a weighted sum of the simplex's vertices. The barycentric coordinates are the weights. The weights sum to 1 and are nonnegative for points on the simplex.

The paper makes no mention of barycentric coordinates. What Ericson suggests is to use Voronoi regions to determine which subsimplex the origin is in, then orthogonally project the origin onto this new subsimplex. There doesn't seem to be a separation between finding the subsimplex and calculating the closest point. But maybe I am missing something...

gino wrote:
So my advice would be to just bite the bullet (no pun) and implement Johnson, since it is the fastest and most accurate solution around.

I've decided to do just that. I've read the same 10 pages of your book too many times. I'm finally starting to understand how to implement it.
John McCutchan
 Erwin Coumans
 Site Admin
 Posts: 4232
 Joined: Sun Jun 26, 2005 6:43 pm
 Location: California, USA
 Contact:
Re: Bullet's Voronoi Simplex Solver
gino wrote:
Johnson's algorithm computes the barycentric coordinates of the point of a simplex closest to the origin. Ericson's suggested alternative to Johnson does not compute the barycentric coordinates, IIRC.

jmc wrote:
The paper makes no mention of barycentric coordinates. What Ericson suggests is to use Voronoi regions to determine which subsimplex the origin is in, then orthogonally project the origin onto this new subsimplex. There doesn't seem to be a separation between finding the subsimplex and calculating the closest point. But maybe I am missing something...

Christer Ericson's Voronoi method effectively calculates the barycentric coordinates: the results of the projection ARE the barycentric coordinates.
See line 365 in the Bullet VoronoiSimplexSolver code snippet here:
Code:
00356 // Check if P in edge region of AB, if so return projection of P onto AB
00357 float vc = d1*d4 - d3*d2;
00358 if (vc <= 0.0f && d1 >= 0.0f && d3 <= 0.0f) {
00359     float v = d1 / (d1 - d3);
00360     result.m_closestPointOnSimplex = a + v * ab;
00361     result.m_usedVertices.usedVertexA = true;
00362     result.m_usedVertices.usedVertexB = true;
00363     result.SetBarycentricCoordinates(1-v, v, 0);
00364     return true;
00365     //return a + v * ab; // barycentric coordinates (1-v, v, 0)
00366 }
Here is the Solid35 Johnson simplex solver, a combined simplex solver compatible with the Bullet framework, and a modified Bullet demo file. The combined simplex solver allows switching between the Bullet and Solid35 simplex solvers during the simulation (on the fly). 'ExtrasSolid35ForBullet' also has the Solid35 EPA penetration algorithm, with a robustness fix so it actually works with stacking. In other words, Bullet addresses all concerns of Pierre Terdiman in this post:
http://q12.org/pipermail/ode/2005-April/015738.html
All files are made compatible so they fit the Bullet framework. Notice that the license is different: Solid35 is not free for commercial use.
http://www.continuousphysics.com/ftp/pu ... Bullet.zip
http://www.continuousphysics.com/ftp/pu ... ceDemo.cpp
As a last note, I have seen no robustness problems in practice with the Voronoi solver. Performance-wise there is no real difference either. All the memory used for caching determinants is probably not a win on modern consoles, like Playstation 3 or XBox 2, because processor power (recalculating from scratch) outperforms fetching from memory... memory doesn't keep up with faster processors nowadays!
Erwin
Last edited by Erwin Coumans on Sun Aug 14, 2005 7:01 pm, edited 1 time in total.
 John McCutchan
 Posts: 133
 Joined: Wed Jul 27, 2005 1:05 pm
 Location: Berkeley, CA
 Contact:
Erwin Coumans wrote:
Christer Ericson's Voronoi method effectively calculates the barycentric coordinates: the results of the projection ARE the barycentric coordinates.

Alright, so I've been able to find information on barycentric coordinates for a triangle, but what about a tetrahedron and a line segment?
I'm guessing, but for a line segment it looks like this:
Find where the origin projects onto the line, and describe that as a linear combination of the start and end points of the line.
Then the barycentric coordinates are (1 - alpha, alpha), right?
Is that correct? And what about the tetrahedron case?
Erwin Coumans wrote:
'ExtrasSolid35ForBullet' also has the Solid35 EPA penetration algorithm, with a robustness fix so it actually works with stacking. In other words, Bullet addresses all concerns of Pierre Terdiman in this post:
http://q12.org/pipermail/ode/2005-April/015738.html

What is this robustness fix? I thought that Solid's HybridPenetrationDepth was robust?
Erwin Coumans wrote:
Also performance-wise there is no real difference. All the memory used for caching determinants is probably not a win on modern consoles, like Playstation 3 or XBox 2, because processor power (recalculating from scratch) outperforms fetching from memory... memory doesn't keep up with faster processors nowadays!

I can't believe this without proof. Where is your justification? The cache used by Solid should fit within a CPU cache page, so we are talking about a single fetch from memory into the CPU's cache. Fetching from the CPU cache should outperform recalculating all the time.
 Erwin Coumans
 Site Admin
 Posts: 4232
 Joined: Sun Jun 26, 2005 6:43 pm
 Location: California, USA
 Contact:
jmc wrote:
Then the barycentric coordinates are (1 - alpha, alpha), right?
Is that correct? And what about the tetrahedron case?

All correct. GJK needs separation in order to work. If the point is outside the tetrahedron, it is only closest to one triangle.
If the point (origin) is inside the tetrahedron, a 'full' simplex (4 points) is detected, and GJK will terminate (possibly continuing to the penetration depth computation if required). So you don't need to compute the coordinates for the tetrahedron case.
jmc wrote:
What is this robustness fix? I thought that Solid's HybridPenetrationDepth was robust?

Did you read the link http://q12.org/pipermail/ode/2005-April/015738.html ?
It's probably better to discuss the details offline. Pierre is on the forum too; you can send him a private message.
As for me, I experienced problems with invalid normals (even with hybrid penetration depth), and the EPA also failed when the determinant was close to zero. This was not detected, and it was a bad failing case. I fixed it by detecting the case and handling it in the outer loop.
Code:
bool Facet::computeClosest(const SimdVector3 *verts)
{
...
m_det = v1dv1 * v2dv2 - v1dv2 * v1dv2; // non-negative
//printf("m_det = %f\n",m_det);
//ASSERT(m_det >= 0.f);
if (m_det >= (SIMD_EPSILON*SIMD_EPSILON)) {
...
return true;
}
return false;
}
jmc wrote:
erwin wrote: ...processor power (recalculating from scratch) outperforms fetching from memory...
Where is your justification?

Well, I work at Sony on PS3, but I'm not allowed to give you any details.
An L2 cache miss, or main memory cache miss, on next-gen consoles is worse than on current-generation consoles or Intel x86. The SPU doesn't have a cache, but 256k of local storage. If you want to deal with lots of collision pairs, it adds up.
Even more important, it is very unlikely that you will need the cached determinants if you cache the separating axis: this will allow you to avoid performing the narrow phase altogether. If you have access to the separating vector (and distance), and project the relative motion on this axis, similar to the continuous collision detection in my paper, you can detect that the distance is still 'positive'. If so, there is no reason for running GJK at all.
Bullet is not optimized yet, but I will implement this separating-axis caching soon.

 Christer Ericson
 Posts: 25
 Joined: Mon Jun 27, 2005 4:30 pm
 Location: Los Angeles
 Contact:
Re: Bullet's Voronoi Simplex Solver
gino wrote:
Johnson's algorithm computes the barycentric coordinates of the point of a simplex closest to the origin. Ericson's suggested alternative to Johnson does not compute the barycentric coordinates, IIRC. Ericson claims that since his approach is more intuitive, it is easier to fix robustness problems. I agree on the intuitive part, but I'm a bit less convinced on the robustness part. In fact, since Ericson's approach separates the subsimplex selection from the closest-point computation, it is a source of a new set of robustness problems: the closest point is not necessarily contained in the selected subsimplex.
So my advice would be to just bite the bullet (no pun) and implement Johnson, since it is the fastest and most accurate solution around.

That sounds a bit weird, because my code is for all intents and purposes identical to Johnson's own implementation, which is actually very different from yours. Johnson himself didn't use the "recursive" formulation presented in the GJK paper that you are using; like myself, his implementation "unrolled" it into what I labelled the geometric approach.
Also, in both my and Johnson's implementations the barycentric coordinates are computed as part of determining if the query point projects orthogonally onto the feature.

 Christer Ericson
 Posts: 25
 Joined: Mon Jun 27, 2005 4:30 pm
 Location: Los Angeles
 Contact:
jmc wrote:
I can't believe this without proof. Where is your justification? The cache used by Solid should fit within a CPU cache page, so we are talking about a single fetch from memory into the CPU's cache. Fetching from the CPU cache should outperform recalculating all the time.

Just to corroborate Erwin's statement... On the next-generation consoles you can assume an L2 cache miss will cost on the order of 500 cycles (roughly; I'm being a bit deliberately vague here). You can do a lot of (re)computation in ~500 cycles! Caching computations is not the way to go, going forwards (and, arguably, not even today).

 Physics Researcher
 Posts: 23
 Joined: Mon Jun 27, 2005 9:28 am
 Location: Helmond, Netherlands
 Contact:
Re: Bullet's Voronoi Simplex Solver
Christer Ericson wrote:
That sounds a bit weird, because my code is for all intents and purposes identical to Johnson's own implementation, which is actually very different from yours. Johnson himself didn't use the "recursive" formulation presented in the GJK paper that you are using; like myself, his implementation "unrolled" it into what I labelled the geometric approach.
Also, in both my and Johnson's implementations the barycentric coordinates are computed as part of determining if the query point projects orthogonally onto the feature.

I do not have your book here (shame on me), but I recall that your version first selects the proper subsimplex (through Voronoi plane tests) and computes the closest points by projection. I understand from your post that this "projection" involves solving the well-known linear system for finding a vector orthogonal to the subsimplex's affine hull, and that the barycentric coordinates are used for the Voronoi test as well. So, as I make it, the only thing different in your approach with respect to the original GJK approach is the fact that you do not reuse computations from lower-dimensional subsimplices. Am I right?
The reason you and Erwin give why earlier computations are not reused is the fact that retrieving cached computations may incur memory cache misses. In the SOLID 3.5 implementation the cached determinants take up 256 bytes (using single precision), so it shouldn't be too hard to keep that in cache, if it weren't for the support computations that can mess up the data cache. With some effort the determinant cache size can be reduced to 96 bytes at a slightly higher retrieval cost, but anyhow the point stands that support computations can kick around wildly.
However, even if we forget about saving determinant values for future iterations, it still makes sense to save them within a single iteration to compute the barycentric coordinates for the higher-dimensional subsimplices, so I (and probably also Dan Johnson) would still go for a recursive determinant computation.

 Christer Ericson
 Posts: 25
 Joined: Mon Jun 27, 2005 4:30 pm
 Location: Los Angeles
 Contact:
Re: Bullet's Voronoi Simplex Solver
gino wrote:
I do not have your book here (shame on me), but I recall that your version first selects the proper subsimplex (through Voronoi plane tests) and computes the closest points by projection. I understand from your post that this "projection" involves solving the well-known linear system for finding a vector orthogonal to the subsimplex's affine hull, and that the barycentric coordinates are used for the Voronoi test as well.

Sort of. Conceptually, yes. Practically, however, the calculations for determining the proper subsimplex have already computed the terms needed for the determination of the barycentric coordinates on the feature. Also, note that it is a proper orthogonal projection (and not a "projection"), because once you have determined the Voronoi feature region, the query point will project orthogonally onto it.

gino wrote:
So, as I make it, the only thing different in your approach with respect to the original GJK approach is the fact that you do not reuse computations from lower-dimensional subsimplices. Am I right?

As I said before: my approach is the original GJK approach. I'm not sure why you're thinking otherwise. It's not the recursive formulation given in the paper, but the unrolled version as used in Daniel Johnson's own implementation. I recommend studying his implementation. I reference it in my book, but it seems the link I gave has gone stale. Erwin has a copy here:
http://www.erwincoumans.com/p/gilbert.c
(Technically this is a C translation of the original code, done by Diego Ruspini, which is nicer anyway in that it's more readable than the original FORTRAN code.)
The recursive and non-recursive approaches are for all intents and purposes identical, primarily differing in what order the subsimplices are searched, but even that can be made the same. Again, I recommend you read the code and make the comparison. Earlier computations are reused in both my code and Johnson's code. Johnson stores his to memory (in arrays del, d1, d2, etc.). I keep them in registers, to avoid huge penalties in accessing potentially uncached data (penalties large enough to cover the whole GJK computation cost for a single cache line miss).
gino wrote:
...so I (and probably also Dan Johnson) would still go for a recursive determinant computation.

You should probably familiarize yourself with his code before making definitive statements about it! :) His code is fully unrolled (non-recursive and non-iterative). He does reuse previous determinant expressions, but this is basically equivalent to common subexpression elimination. Also, that code was written in the "pre-cache" era where computation was expensive (multiplications in particular), so that was a natural way of writing things. These days, it's not.
Anyway, the two approaches are the same. Math is math after all. The key difference is really that the unrolled version makes it explicit and obvious which subsimplex is being tested. The recursive formulation doesn't really (although you could obviously document it to make it clear). That's why I like the "geometric" formulation: it is more readable, and therefore more easy to argue about and to debug. The geometric view you can describe to someone and they'll understand it first try. The recursive formulation you can describe 10 times and people still don't get it.

 Physics Researcher
 Posts: 23
 Joined: Mon Jun 27, 2005 9:28 am
 Location: Helmond, Netherlands
 Contact:
Re: Bullet's Voronoi Simplex Solver
Christer Ericson wrote:
As I said before: my approach is the original GJK approach. I'm not sure why you're thinking otherwise. It's not the recursive formulation given in the paper, but the unrolled version as used in Daniel Johnson's own implementation. I recommend studying his implementation.

I've seen this code. It does not cache subdeterminants over iterations, but it *does* use the recursive determinant formula from the GJK paper. This does not mean that the code itself is recursive (and neither is mine). Furthermore, I cannot find this code more "geometric" than code in which there is no case distinction for the number of vertices in Y. It boils down to solving the same linear system for different dimensions. The code has at least five times the number of lines of my (partly unrolled) implementation, so draw your own conclusions on readability and debuggability. And then there is also the risk of incurring code cache misses. (Better stop this before we end up boasting about the size of our "Johnson".)

Christer Ericson wrote:
Anyway, the two approaches are the same. Math is math after all. The key difference is really that the unrolled version makes it explicit and obvious which subsimplex is being tested. That's why I like the "geometric" formulation: it is more readable, and therefore more easy to argue about and to debug.

Math is math, but on a machine A dot B - A dot C is not the same as A dot (B - C). A "geometric" view does not help a lot in improving accuracy. In the given code it would be incredibly hard to fix numerical issues. Something to keep in mind as well.
 Etherton

Christer Ericson wrote:
Just to corroborate Erwin's statement... On the next-generation consoles you can assume an L2 cache miss will cost on the order of 500 cycles (roughly; I'm being a bit deliberately vague here). You can do a lot of (re)computation in ~500 cycles! Caching computations is not the way to go, going forwards (and, arguably, not even today).

I love these absolute statements.
Does this mean that reusing intermediate values is no longer an optimization compilers will be doing, because of the risk of cache misses?

 Christer Ericson
 Posts: 25
 Joined: Mon Jun 27, 2005 4:30 pm
 Location: Los Angeles
 Contact:
Etherton wrote:
I love these absolute statements.
Does this mean that reusing intermediate values is no longer an optimization compilers will be doing, because of the risk of cache misses?

My statement is a generalization obviously, but it's an important rule of thumb. In certain situations you will have full knowledge of what the caching situation is, and will be able to act accordingly (but in these multithreaded environments things are quickly becoming unpredictable). Note that I'm only referring to caching data/calculations to memory, btw (and situations where there's a risk of suffering L2 cache misses). Reusing computations already existing in registers is fine of course, which is what your typical compiler does. That is, your comment about compilers reusing intermediate values is really a different issue, one that is only related in the case where the compiler spills registers to memory. Anyway, the answer to the question is no, because AFAIK none of today's compilers take possible L2 cache misses into account when optimizing/scheduling code.
Christer Ericson wrote:
My statement is a generalization obviously, but it's an important rule of thumb. In certain situations you will have full knowledge of what the caching situation is, and will be able to act accordingly (but in these multithreaded environments things are quickly becoming unpredictable). Note that I'm only referring to caching data/calculations to memory, btw (and situations where there's a risk of suffering L2 cache misses)...

If in these multithreaded environments things are so unpredictable, what makes you think your solutions are superior?
My second sentence was made out of irony.
The point is that you two are discussing who rewrote the GJK or Cameron function better. The way I see it, all you did was unroll a function, and for some reason you believe that will make it better. You seem to jump around a lot: first you bring the robustness argument, and when your claim is challenged you quickly change it to the efficiency-of-implementation argument. Both arguments are weak, since they are not supported by a rigorous proof.
Cache is strictly a hardware architectural issue. Anybody can argue that what you did is slower, because what you save in memory misses you lose in instruction and branch prediction misses, which are equally if not more severe with these new deep pipelines.
 Erwin Coumans
 Site Admin
 Posts: 4232
 Joined: Sun Jun 26, 2005 6:43 pm
 Location: California, USA
 Contact:
Etherton wrote:
If in these multithreaded environments things are so unpredictable, what makes you think your solutions are superior?

On a Playstation 3 it is fairly well predictable: each SPU has its own local storage. I don't think filling this with cached data between frames helps. Between the GJK iterations it's better to keep the data in registers; 128 SIMD registers is more than enough.

Etherton wrote:
My second sentence was made out of irony.
The point is that you two are discussing who rewrote the GJK or Cameron function better.

Wait, there are more than two people. I can confirm it is more intuitive; just compare the Bullet Voronoi simplex solver with the Solid one. Bullet is not Christer Ericson's implementation; it has bits of it, but I added the rest. It is not the optimized target-platform code, obviously.
I did performance and robustness comparisons, and they are not much different. Just run the latest Bullet demo if you like. Earlier in this thread I posted the Solid 3.5 integration files so you can compare it yourself.
Code size between the Bullet simplex solver and the Solid simplex solver is similar too. I think the Bullet simplex solver code is actually smaller, but I have to verify.
So in the end, not much difference (Bullet versus Solid 3.5 simplex solver):
1) the geometric version is more intuitive
2) robustness, code size and performance are similar
3) caching can hurt; keeping data in registers is better
4) between frames: it's better to cache the separating vector and avoid GJK altogether, instead of caching the determinants
Erwin