## Bullet's Voronoi Simplex Solver

Please don't post Bullet support questions here, use the above forums instead.
John McCutchan
Posts: 133
Joined: Wed Jul 27, 2005 1:05 pm
Location: Berkeley, CA
Contact:

### Bullet's Voronoi Simplex Solver

I read the paper on using voronoi regions to replace johnsons distance algorithm in GJK. The paper didn't mention anything about using barycentric coordinates. So two questions, what exactly are barycentric coordinates? Why are you using them, what are you trying to solve? I see you use them when computing the closest point, but in Ericsons paper he says that an orghogonal projection of the origin onto the closest feature is all that is needed to compute the point.
gino
Physics Researcher
Posts: 23
Joined: Mon Jun 27, 2005 9:28 am
Location: Helmond, Netherlands
Contact:

### Re: Bullet's Voronoi Simplex Solver

jmc wrote:I read the paper on using voronoi regions to replace johnsons distance algorithm in GJK. The paper didn't mention anything about using barycentric coordinates. So two questions, what exactly are barycentric coordinates? Why are you using them, what are you trying to solve? I see you use them when computing the closest point, but in Ericsons paper he says that an orghogonal projection of the origin onto the closest feature is all that is needed to compute the point.
Barycentric coordinates: each point in a simplex can be described as the weighted sum of the simplex's vertices. The barycentric coordinates are the weights. The weights summate to 1 and are non-negative for points on the simplex.

Johnson's algorithm computes the barycentric coordinates of the point closest to the origin of a simplex. Ericsson's suggested alternative to Johnson does not compute the barycentric coordinates IIRC. Ericsson claims that since his approach is more intuitive it is easier to fix robustness problems. I agree on the intuitive part, but I'm a bit less convinced on the robustness part. In fact, since Ericsson's approach separates the subsimplex selection from the closest point computation it is a source of a new set of robustness problems: closest points not necessarily are contained by the selected subsimplex.

So my advice would be to just bite the bullet (no pun) and implement Johnson, since it is the fastests and most accurate solution around.

Gino
John McCutchan
Posts: 133
Joined: Wed Jul 27, 2005 1:05 pm
Location: Berkeley, CA
Contact:

### Re: Bullet's Voronoi Simplex Solver

gino wrote:
jmc wrote:I read the paper on using voronoi regions to replace johnsons distance algorithm in GJK. The paper didn't mention anything about using barycentric coordinates. So two questions, what exactly are barycentric coordinates? Why are you using them, what are you trying to solve? I see you use them when computing the closest point, but in Ericsons paper he says that an orghogonal projection of the origin onto the closest feature is all that is needed to compute the point.
Barycentric coordinates: each point in a simplex can be described as the weighted sum of the simplex's vertices. The barycentric coordinates are the weights. The weights summate to 1 and are non-negative for points on the simplex.

Johnson's algorithm computes the barycentric coordinates of the point closest to the origin of a simplex. Ericsson's suggested alternative to Johnson does not compute the barycentric coordinates IIRC. Ericsson claims that since his approach is more intuitive it is easier to fix robustness problems. I agree on the intuitive part, but I'm a bit less convinced on the robustness part. In fact, since Ericsson's approach separates the subsimplex selection from the closest point computation it is a source of a new set of robustness problems: closest points not necessarily are contained by the selected subsimplex.
The paper makes no mention of barycentric coordinates. What Ericson suggests, is to use voronoi regions to determine what subsimplex the origin is in. Then orthogonally project the origin on to this new sub-simplex. There doesn't seem to be a seperation between finding the subsimplex and calculating the closest point. But, maybe I am missing something...
gino wrote:So my advice would be to just bite the bullet (no pun) and implement Johnson, since it is the fastests and most accurate solution around.
I've decided to do just that. I've read the same 10 pages of your book, too many times. I'm finally starting to understand how to implement it.

John McCutchan
Erwin Coumans
Posts: 4232
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA
Contact:

### Re: Bullet's Voronoi Simplex Solver

gino wrote: Johnson's algorithm computes the barycentric coordinates of the point closest to the origin of a simplex. Ericsson's suggested alternative to Johnson does not compute the barycentric coordinates IIRC.
and
jmc wrote: The paper makes no mention of barycentric coordinates. What Ericson suggests, is to use voronoi regions to determine what subsimplex the origin is in. Then orthogonally project the origin on to this new sub-simplex. There doesn't seem to be a seperation between finding the subsimplex and calculating the closest point. But, maybe I am missing something...
Christer Ericson's voronoi method effectively calculates the barycentric coordinates: the result of the projection ARE the barycentric coordinates.

See line 365 in the Bullet VoronoiSimplexSolver code snippet here:

Code: Select all

``````00356     // Check if P in edge region of AB, if so return projection of P onto AB
00357     float vc = d1*d4 - d3*d2;
00358     if (vc <= 0.0f && d1 >= 0.0f && d3 <= 0.0f) {
00359         float v = d1 / (d1 - d3);
00360                 result.m_closestPointOnSimplex = a + v * ab;
00361                 result.m_usedVertices.usedVertexA = true;
00362                 result.m_usedVertices.usedVertexB = true;
00363                 result.SetBarycentricCoordinates(1-v,v,0);
00364                 return true;
00365         //return a + v * ab; // barycentric coordinates (1-v,v,0)
00366     }
``````
I still recommend you to compare the Johnson recursive determinant method to compute the same barycentric coordinate. That is what I did at least:

Here is the Solid35 Johnson Simplex Solver, and a Combined Simplex Solver, compatible with Bullet framework, and a modified Bullet demo file. The combined simplex solver allows switching between Bullet and Solid35 simplex solvers during the simulation (on the fly). 'ExtrasSolid35ForBullet' also has the Solid35 EPA penetration algorithm, with a robustness fix so it actually works with stacking. In other words, Bullet addresses all concerns of Pierre Terdiman in this post:
http://q12.org/pipermail/ode/2005-April/015738.html

All files are made compatible so it fits the bullet framework. Notice that the license is different, Solid35 is not free for commercial use.

http://www.continuousphysics.com/ftp/pu ... Bullet.zip
http://www.continuousphysics.com/ftp/pu ... ceDemo.cpp

As a last note, I have seen no robustness problems in practice with the voronoi solver. Also performance-wise there is no real difference. All the memory used for caching determinants is probably not a win on modern consoles, like Playstation 3 or XBox 2, because processor power (re-calculating from scratch) outperforms fetching from memory... memory doesn't keep up with faster processors nowadays !

Erwin
Last edited by Erwin Coumans on Sun Aug 14, 2005 7:01 pm, edited 1 time in total.
John McCutchan
Posts: 133
Joined: Wed Jul 27, 2005 1:05 pm
Location: Berkeley, CA
Contact:
Erwin Coumans wrote:Christer Ericson's voronoi method effectively calculates the barycentric coordinates: the result of the projection ARE the barycentric coordinates.
Alright, so I've been able to find information on barycentric coordinates for a triangle, but what about a tetrahedron & a line segment?

I'm guessing but for a line segment it looks like this:
Find where the origin projects onto the line, describe that as a linear combination of the start and end points of the line.

Then the barycentric coordinates are: (1-alpha, alpha) right?

Is that correct? And what about the tetrahedron case?
Erwin Coumans wrote:'ExtrasSolid35ForBullet' also has the Solid35 EPA penetration algorithm, with a robustness fix so it actually works with stacking. In other words, Bullet addresses all concerns of Pierre Terdiman in this post:
http://q12.org/pipermail/ode/2005-April/015738.html
What is this robustness fix? I thought that solid's HybridPenetrationDepth was robust?
Erwin Coumans wrote:Also performance-wise there is no real difference. All the memory used for caching determinants is probably not a win on modern consoles, like Playstation 3 or XBox 2, because processor power (re-calculating from scratch) outperforms fetching from memory... memory doesn't keep up with faster processors nowadays !
I can't believe this without proof. Where is your justification? The cache used by solid should fit within a cpu cache page, so we are talking about a single fetch from memory into the cpu's cache. Fetching from the cpu cache should out perform recalculating all the time.
Erwin Coumans
Posts: 4232
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA
Contact:
jmc wrote: Then the barycentric coordinates are: (1-alpha, alpha) right?
Is that correct? And what about the tetrahedron case?
All correct. GJK needs separation in order to work. If the point is outside the tetrahedron, it is only closest to 1 triangle.
If the point (origin) is inside the tetrahedron, a 'full' simplex (4 points) is detected, and GJK will terminate (possibly continue to the penetration depth if required). So you don't need to compute the coordinates for the tetrahedron case.
What is this robustness fix? I thought that solid's HybridPenetrationDepth was robust?
Did you read the link http://q12.org/pipermail/ode/2005-April/015738.html ?
It's probably better to discuss the details off-line. Pierre is on the forum too, you can send him a private-message ?

For me, I experienced problems with invalid normals (even with hybrid penetration depth, and also the epa failed when the determinant was close to zero. This was not detected, and this was a bad failing case. I fixed it by detecting the case, and handling it in the outside loop.

Code: Select all

``````bool Facet::computeClosest(const SimdVector3 *verts)
{
...
m_det = v1dv1 * v2dv2 - v1dv2 * v1dv2; // non-negative
//printf("m_det = %f\n",m_det);
//ASSERT(m_det >= 0.f);

if (m_det >= (SIMD_EPSILON*SIMD_EPSILON)) {
...
return true;
}

return false;
}
``````

erwin wrote: re-calculating from scratch) outperforms fetching from memory...
jmc wrote: Where is your justification?
Well, I work in Sony on PS3, but I'm not allowed to give you any details
L2 cache miss, or main memory cache miss on next-gen consoles is worse then on current generation consoles or Intel x86. SPU doesn't have a cache, but 256k local storage. If you want to deal with lots of collision pairs, it adds up.

Even more important, it is very unlikely that you will need the cached determinants if you cache the seperating axis: this will allow you to avoid performing the narrow phase all together. If you have access to the seperating vector (and distance), and project the relative motion on this axis, similar to the continuous collision detection in my paper, you can detect that the distance is still 'positive'. If so, there is no reason for running GJK at all.
Bullet is not optimized yet, but I will implement this seperating axis caching soon.
Christer Ericson
Posts: 25
Joined: Mon Jun 27, 2005 4:30 pm
Location: Los Angeles
Contact:

### Re: Bullet's Voronoi Simplex Solver

gino wrote:Johnson's algorithm computes the barycentric coordinates of the point closest to the origin of a simplex. Ericsson's suggested alternative to Johnson does not compute the barycentric coordinates IIRC. Ericsson claims that since his approach is more intuitive it is easier to fix robustness problems. I agree on the intuitive part, but I'm a bit less convinced on the robustness part. In fact, since Ericsson's approach separates the subsimplex selection from the closest point computation it is a source of a new set of robustness problems: closest points not necessarily are contained by the selected subsimplex.

So my advice would be to just bite the bullet (no pun) and implement Johnson, since it is the fastests and most accurate solution around.
That sounds a bit weird because my code is for all intents and purposes identical to Johnson's own implementation, which is actually very different from yours. Johnson himself didn't use the "recursive" formulation presented in the GJK paper and that you are using; like myself, his implementation "unrolled" it into what I labelled the geometric approach.

Also, in both my and Johnson's implementations the barycentric coordinates are computed as part of determining if the query point projects orthogonally onto the feature.
Christer Ericson
Posts: 25
Joined: Mon Jun 27, 2005 4:30 pm
Location: Los Angeles
Contact:
jmc wrote:
Erwin Coumans wrote:All the memory used for caching determinants is probably not a win on modern consoles, like Playstation 3 or XBox 2, because processor power (re-calculating from scratch) outperforms fetching from memory... memory doesn't keep up with faster processors nowadays !
I can't believe this without proof. Where is your justification? The cache used by solid should fit within a cpu cache page, so we are talking about a single fetch from memory into the cpu's cache. Fetching from the cpu cache should out perform recalculating all the time.
Just to corroborate Erwin's statement... On the next-generation consoles you can assume an L2 cache miss will cost on the order of 500 cycles (roughly -- I'm being a bit deliberately vague here). You can do a lot of (re-)computation in ~500 cycles! Caching computations is not the way to go, going forwards (and, arguably, not even today).
gino
Physics Researcher
Posts: 23
Joined: Mon Jun 27, 2005 9:28 am
Location: Helmond, Netherlands
Contact:

### Re: Bullet's Voronoi Simplex Solver

Christer Ericson wrote:
gino wrote:Johnson's algorithm computes the barycentric coordinates of the point closest to the origin of a simplex. Ericsson's suggested alternative to Johnson does not compute the barycentric coordinates IIRC. Ericsson claims that since his approach is more intuitive it is easier to fix robustness problems. I agree on the intuitive part, but I'm a bit less convinced on the robustness part. In fact, since Ericsson's approach separates the subsimplex selection from the closest point computation it is a source of a new set of robustness problems: closest points not necessarily are contained by the selected subsimplex.

So my advice would be to just bite the bullet (no pun) and implement Johnson, since it is the fastests and most accurate solution around.
That sounds a bit weird because my code is for all intents and purposes identical to Johnson's own implementation, which is actually very different from yours. Johnson himself didn't use the "recursive" formulation presented in the GJK paper and that you are using; like myself, his implementation "unrolled" it into what I labelled the geometric approach.

Also, in both my and Johnson's implementations the barycentric coordinates are computed as part of determining if the query point projects orthogonally onto the feature.
I do not have your book here (shame on me), but I recall that your version first selects the proper subsimplex (through Voronoi plane tests) and computes the closest points by projection. I understand from your post that this "projection" involves solving the well-known linear system for finding a vector orthogonal to the subsimplex's affine hull, and the barycentric coordinates are used for the Voronoi test as well. So, as I make it, the only thing different from your approach with respect to the original GJK approach is the fact that you do not reuse computations from lower-dimensional subsimplices. Am I right?

The reason you and Erwin give why earlier computations are not reused is the fact that retrieving cached computations may incur memory cache misses. In the SOLID 3.5 implementation the cached determinants take up 256 bytes (using single-precision), so it shouldn't be to hard to keep that in cache, if it weren't for the support computations that can mess up the data cache. With some effort the determinant cache size can be reduced to 96 bytes at a slightly higher retrieval cost, but anyhow the point stands that support computations can kick around wildly.

However, if we forget about saving determinant values for future iterations, it still makes sense to save them within a single iteration to compute the barycentric coordinates for the higher-dimensional subsimplices, so I (and probably also Dan Johnson) would still go for a recursive determinant computation.
Christer Ericson
Posts: 25
Joined: Mon Jun 27, 2005 4:30 pm
Location: Los Angeles
Contact:

### Re: Bullet's Voronoi Simplex Solver

gino wrote:I do not have your book here (shame on me), but I recall that your version first selects the proper subsimplex (through Voronoi plane tests) and computes the closest points by projection. I understand from your post that this "projection" involves solving the well-known linear system for finding a vector orthogonal to the subsimplex's affine hull, and the barycentric coordinates are used for the Voronoi test as well.
Sort of. Conceptually, yes. Practically, however, the calculations for determining the proper subsimplex has already computed the terms needed for the determination of barycentric coordinates on the feature. Also, note that it is a proper orthogonal projection (and not a "projection") because once you have determined the Voronoi feature region, the query point will project orthogonally onto it.
So, as I make it, the only thing different from your approach with respect to the original GJK approach is the fact that you do not reuse computations from lower-dimensional subsimplices. Am I right?
As I said before: my approach is the original GJK approach. I'm not sure why you're thinking otherwise. It's not the recursive formulation given in the paper, but it's the unrolled version as used in Daniel Johnson's own implementation. I recommend studying his implementation. I reference it in my book, but it seems the link I gave has gone stale. Erwin has a copy here:

http://www.erwincoumans.com/p/gilbert.c

(Technically this is a C translation of the original code, done by Diego Ruspini, which is nicer anyway in that it's more readable than the original FORTRAN code.)

The recursive and non-recursive approaches are for all intents and purposes identical, primarily differing on what order the subsimplicies are searched, but even that can be made the same. Again I recommend you read the code and make the comparison. Earlier computations are reused in both my code and Johnson's code. Johnson stores his to memory (in arrays del, d1, d2, etc). I keep them in registers, to avoid huge penalties in accessing potentially uncached data (penalties large enough to cover the whole GJK computation cost for a single cache line miss).
so I (and probably also Dan Johnson) would still go for a recursive determinant computation.
You should probably familiarize yourself with his code before making definitive statements about it! :) His code is fully unrolled (non-recursive and non-iterative). He does reuse previous determinant expressions, but this is basically equivalent to common subexpression elimination. Also, that code was written in the "pre-cache" era where computation was expensive (multiplications in particular) so that was a natural way of writing things. These days, it's not.

Anyway, the two approaches are the same. Math is math after all. The key difference is really that the unrolled version makes it explicit and obvious which subsimplex is being tested. The recursive formulation doesn't really (although you could obviously document it to make it clear). That's why I like the "geometric" formulation: it is more readable, and therefore more easy to argue about and to debug. The geometric view you can describe to someone and they'll understand it first try. The recursive formulation you can describe 10 times and people still don't get it.
gino
Physics Researcher
Posts: 23
Joined: Mon Jun 27, 2005 9:28 am
Location: Helmond, Netherlands
Contact:

### Re: Bullet's Voronoi Simplex Solver

Christer Ericson wrote:
gino wrote:I do not have your book here (shame on me), but I recall that your version first selects the proper subsimplex (through Voronoi plane tests) and computes the closest points by projection. I understand from your post that this "projection" involves solving the well-known linear system for finding a vector orthogonal to the subsimplex's affine hull, and the barycentric coordinates are used for the Voronoi test as well.
Sort of. Conceptually, yes. Practically, however, the calculations for determining the proper subsimplex has already computed the terms needed for the determination of barycentric coordinates on the feature. Also, note that it is a proper orthogonal projection (and not a "projection") because once you have determined the Voronoi feature region, the query point will project orthogonally onto it.
So, as I make it, the only thing different from your approach with respect to the original GJK approach is the fact that you do not reuse computations from lower-dimensional subsimplices. Am I right?
As I said before: my approach is the original GJK approach. I'm not sure why you're thinking otherwise. It's not the recursive formulation given in the paper, but it's the unrolled version as used in Daniel Johnson's own implementation. I recommend studying his implementation. I reference it in my book, but it seems the link I gave has gone stale. Erwin has a copy here:

http://www.erwincoumans.com/p/gilbert.c

(Technically this is a C translation of the original code, done by Diego Ruspini, which is nicer anyway in that it's more readable than the original FORTRAN code.)

The recursive and non-recursive approaches are for all intents and purposes identical, primarily differing on what order the subsimplicies are searched, but even that can be made the same. Again I recommend you read the code and make the comparison. Earlier computations are reused in both my code and Johnson's code. Johnson stores his to memory (in arrays del, d1, d2, etc). I keep them in registers, to avoid huge penalties in accessing potentially uncached data (penalties large enough to cover the whole GJK computation cost for a single cache line miss).
so I (and probably also Dan Johnson) would still go for a recursive determinant computation.
You should probably familiarize yourself with his code before making definitive statements about it! His code is fully unrolled (non-recursive and non-iterative). He does reuse previous determinant expressions, but this is basically equivalent to common subexpression elimination. Also, that code was written in the "pre-cache" era where computation was expensive (multiplications in particular) so that was a natural way of writing things. These days, it's not.

Anyway, the two approaches are the same. Math is math after all. The key difference is really that the unrolled version makes it explicit and obvious which subsimplex is being tested. The recursive formulation doesn't really (although you could obviously document it to make it clear). That's why I like the "geometric" formulation: it is more readable, and therefore more easy to argue about and to debug. The geometric view you can describe to someone and they'll understand it first try. The recursive formulation you can describe 10 times and people still don't get it.
I've seen this code. It does not cache subdeterminants over iterations but it *does* use the recursive determinant formula from the GJK paper. This does not mean that the code itself is recursive (and neither is mine). Furthermore, I cannot find this code more "geometric" than code in which there is no case distinction for the number of vertices in Y. It boils down to solving the same linear system for different dimensions. The code has at least five times the number of lines as my (partly unrolled) implementation so, draw your conclusions on readability and debug-ability. And then there is also the risk of incuring code cache misses. (Better stop this before we end up boasting the size of our "Johnson" )

Math is math, but on a machine A dot B - A dot C is not the same as A dot (B - C). A "geometric" view does not help a lot in improving accuracy. In the given code it would be incredibly hard to fix numerical issues. Something to keep in mind as well.
Etherton
Posts: 34
Joined: Thu Aug 18, 2005 6:27 pm
Christer Ericson wrote:Just to corroborate Erwin's statement... On the next-generation consoles you can assume an L2 cache miss will cost on the order of 500 cycles (roughly -- I'm being a bit deliberately vague here). You can do a lot of (re-)computation in ~500 cycles! Caching computations is not the way to go, going forwards (and, arguably, not even today).
I love these absolute statements.
Does this means that reusing intermediate values is not longer an optimization compiler will be doing anymore because risk of cach miss?
Christer Ericson
Posts: 25
Joined: Mon Jun 27, 2005 4:30 pm
Location: Los Angeles
Contact:
Etherton wrote:
Christer Ericson wrote:Just to corroborate Erwin's statement... On the next-generation consoles you can assume an L2 cache miss will cost on the order of 500 cycles (roughly -- I'm being a bit deliberately vague here). You can do a lot of (re-)computation in ~500 cycles! Caching computations is not the way to go, going forwards (and, arguably, not even today).
I love these absolute statements.
Does this means that reusing intermediate values is not longer an optimization compiler will be doing anymore because risk of cach miss?
My statement is a generalization obviously, but it's an important rule of thumb. In certain situations you will have full knowledge of what the caching situation is, and will be able to act accordingly (but in these multithreaded environments things are quickly becoming unpredictable). Note that I'm only refering to caching data/calculations to memory btw (and situations where there's a risk of suffering L2 cache misses). Reusing computations already existing in registers is fine of course, which is what your typical compiler does. That is, your comment about compilers reusing intermediate values is really a different issue that is only related in the case the compiler spills registers to memory. Anyway, the answer to the question is no, because AFAIK none of today's compilers are taking possible L2 cache misses into account when optimizing/scheduling code.
Etherton
Posts: 34
Joined: Thu Aug 18, 2005 6:27 pm
Christer Ericson wrote: My statement is a generalization obviously, but it's an important rule of thumb. In certain situations you will have full knowledge of what the caching situation is, and will be able to act accordingly (but in these multithreaded environments things are quickly becoming unpredictable). Note that I'm only refering to caching data/calculations to memory btw (and situations where there's a risk of suffering L2 cache misses)?..
If in these multithreaded environment things are so unpredictable what make you think your solutions are superior?

My second sentence was made out of irony.
The point is that you two are discussing about who rewrote JGK or Cameron function better. The way I see it, all you did was unrolling a function and for some reason you believe that will make it better. You seem to jump around a lot, first you bring the robustness argument and when your claim is challenged you quickly change it to the efficiency of implementation argument. Both arguments are weak since they are not supported by a rigorous proof.
Cache is strictly a hardware architectural issue. Any body can argue that what you did is slower because what you save in memory misses you loose in instruction and branch prediction misses which are equally if not more severe with these new deep pipe lines.
Erwin Coumans
Posts: 4232
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA
Contact:
If in these multithreaded environment things are so unpredictable what make you think your solutions are superior?
On a Playstation 3 it is fairly well predictable, each SPU has its own local storage. I don't think filling this with cached data between frames helps. Between the gjk iterations its better to keep the data in registers, 128 simd registers is more then enough.
My second sentence was made out of irony.
The point is that you two are discussing about who rewrote JGK or Cameron function better.
Wait, there is more then 2 people. I can confirm it is more intuitive, just compare bullet voronoi simplex solver with solid one. Bullet is not Christer Ericson implementation, it has bits, but I added the rest. It is not the optimized target platform code, obviously.

I did performance and robustness comparisons, and they are not much different. Just run the latest bullet demo if you like. Earlier in this threat I posted the Solid 3.5 integration files to compare it yourself
Code size between bullet simplex solver and solid simplex solver is similar too, I think that bullet simplex solver code is actually smaller, but I have to verify.
So in the end, not much difference:

(Bullet versus Solid 3.5 simplex solver)

0) the geometric version is more intuitive
1) robustness, code size and performance is similar
1) caching can hurt, keeping data in registers is better
2) between frames: its better to cache the separating vector, and avoid gjk all toghether, instead of caching the determinants

Erwin