Here's a picture that might explain a little bit of what's going on with the negative slope issue (just a gimp-job, not results from a simulation or anything...):
This is a position-space picture of a two-element chain attached to the ceiling, where the x-axis represents the distance from the ceiling to the first particle and the y-axis represents the distance from the ceiling to the second (I know we've been talking about velocity constraints, but this will serve to illustrate the general idea here). The blue line is the locus of points where the first link is satisfied, and the red line is the locus where the second link is satisfied. The global solution is where these intersect.
The way the normal iteration process works is:
1) Project the current position to the blue line along the line with slope 0.
2) Project the current position to the red line along the line with slope -m2/m1 (using Verlet, this enforces conservation of momentum).
3) Go to 1 until done.
The green sawtooth shows this process in action - as you can see, for -m2/m1 fairly close to 0, we make very little progress in each iteration. Furthermore, after the first step, everything is very regular and predictable.
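To make the sawtooth concrete, here is a minimal sketch of the two-step loop in Python. It is only my reading of the picture, not output from any engine: I'm assuming the blue line is x = L1, the red line is y - x = L2, and that the momentum-conserving direction is given by a slope parameter s (standing in for the -m2/m1 above). The function names and the numbers are made up for illustration.

```python
def project_blue(p, L1):
    """Step 1: slide horizontally (slope 0) onto the blue line x = L1."""
    _, y = p
    return (L1, y)


def project_red(p, L2, s):
    """Step 2: slide along a line of slope s onto the red line y - x = L2."""
    x, y = p
    # Moving by (t, s*t) changes y - x by t*(s - 1); solve for the t that lands on the line.
    t = (L2 - (y - x)) / (s - 1.0)
    return (x + t, y + s * t)


def iterate(p, L1, L2, s, n):
    """Run n blue-then-red passes and return the point reached after each pass."""
    history = []
    for _ in range(n):
        p = project_blue(p, L1)
        p = project_red(p, L2, s)
        history.append(p)
    return history


if __name__ == "__main__":
    L1, L2, s = 1.0, 1.0, -0.1  # hypothetical link lengths and a shallow slope
    for x, y in iterate((1.5, 3.0), L1, L2, s, 5):
        print(f"x={x:.4f}  y={y:.4f}  error in y={y - (L1 + L2):.4f}")
```

In this linear setup the error in y shrinks by a factor of 1/(1 - s) per pass, so with s close to 0 each pass removes only a small fraction of it, which is the slow, regular tail the sawtooth shows.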
As for the negative slope thing: Erin's approach is to wait until we've reached that regular region, which is detected by checking that the magnitude of the sawtooth is decreasing from iteration to iteration, and then use that information to follow the purple path instead of the green one and jump right to (or just about to) the correct global solution. Notice that the first couple of iterations would lead us to think that the magnitude of the correction was increasing, which means we can't make any good guess about where the convergent bit will end up. It seems like a physically plausible assumption that most of these processes will eventually end up in a predictable and slowly convergent tail period, and this is what the extrapolation would accelerate.
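Here is a rough sketch of the "wait for the regular tail, then extrapolate" idea. To be clear, this is not Erin's code: I've swapped in a plain geometric-series extrapolation (in the spirit of Aitken's delta-squared process) that jumps straight to the estimated limit rather than following the purple horizontal-then-diagonal path. The line equations x = L1 and y - x = L2, the slope parameter s, and all the names are my own assumptions.

```python
def full_pass(p, L1, L2, s):
    """One blue-then-red pass: snap x to L1, then slide along slope s onto y - x = L2."""
    x, y = L1, p[1]
    t = (L2 - (y - x)) / (s - 1.0)
    return (x + t, y + s * t)


def extrapolate(p0, p1, p2):
    """Guess the limit from three successive iterates, assuming a geometric tail.

    Returns None if the step size is not shrinking, since then we have
    no basis for guessing where the sequence will end up.
    """
    d1 = (p1[0] - p0[0], p1[1] - p0[1])
    d2 = (p2[0] - p1[0], p2[1] - p1[1])
    n1 = (d1[0] ** 2 + d1[1] ** 2) ** 0.5
    n2 = (d2[0] ** 2 + d2[1] ** 2) ** 0.5
    if n1 == 0.0 or n2 >= n1:
        return None
    r = n2 / n1          # estimated contraction ratio per pass
    k = r / (1.0 - r)    # r + r^2 + r^3 + ... summed in closed form
    return (p2[0] + k * d2[0], p2[1] + k * d2[1])


if __name__ == "__main__":
    L1, L2, s = 1.0, 1.0, -0.1  # hypothetical values
    p0 = full_pass((1.5, 3.0), L1, L2, s)
    p1 = full_pass(p0, L1, L2, s)
    p2 = full_pass(p1, L1, L2, s)
    print("third iterate:", p2)
    print("extrapolated :", extrapolate(p0, p1, p2))
```

With two straight lines the tail is exactly geometric, so the guess lands on the intersection (L1, L1 + L2) up to rounding. The None branch is the "magnitude is still increasing" case, where no sensible guess exists yet.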
Obviously the main potential pitfall is that the horizontal purple line jumps WAY out into a different region of phase space (off the image, even) in order to force the diagonal move to end up at the global solution (keep in mind that the slope of the diagonal line is constrained by conservation of momentum); the worse the mass ratio, the further off it goes. For this particular case this is fine, since the error function is very simple over this 2D slice of phase space, but with other constraints in effect there is no guarantee that this far-off point will have anywhere near the same properties as the current one. In particular, subsequent projections may be extremely poorly behaved under such a scheme. Also, if one of the constraints takes a very strange form, we may jump to the wrong global solution. The assumption of linearity may even lead us into a divergence if things go badly enough. I'm doing some testing at the moment to see about this, and to figure out what assumptions we need about the constraint functions so we can avoid such scenarios.
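To put a number on how far out the horizontal jump goes, here is a small sketch under the same assumptions as my reading of the picture (blue line x = L1, red line y - x = L2, diagonal slope s; all names and values are hypothetical). Starting at height y, the diagonal of slope s through the solution (L1, L1 + L2) meets that height at x = L1 + e/s, where e = y - (L1 + L2) is the current error, so the jump length grows like 1/|s|.

```python
def purple_jump_x(y, L1, L2, s):
    """x-coordinate the horizontal move must reach so that the following
    slope-s move lands exactly on the intersection of the two lines."""
    e = y - (L1 + L2)  # current error in y
    return L1 + e / s


def slide_to_red(p, L2, s):
    """Slide along a line of slope s onto the red line y - x = L2."""
    x, y = p
    t = (L2 - (y - x)) / (s - 1.0)
    return (x + t, y + s * t)


if __name__ == "__main__":
    L1, L2, y = 1.0, 1.0, 3.0  # hypothetical values
    for s in (-1.0, -0.1, -0.01):
        x_far = purple_jump_x(y, L1, L2, s)
        end = slide_to_red((x_far, y), L2, s)
        print(f"s={s}: jump to x={x_far:.1f}, diagonal move ends at {end}")
```

Each diagonal move ends at the intersection, but the intermediate x moves roughly ten times further away each time the slope gets ten times shallower, which is exactly the far-off point that other constraints would then have to cope with.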
To some extent this method is really just a slightly beefed up version of straightforward gradient descent: we start with a naive minimization along coordinate axes and see if we can use the information gained from that attempt to (effectively) pick better axes to minimize along. In particular, this method would get us along a thin valley very quickly; however, its potential failure is that it gets us along the valley by jumping very far out to the side of the valley and hoping that it will get pushed back in by the next constraint applications, which it probably will if they are linear enough. In contrast, a more usual (but not quite naive) gradient descent method would explicitly choose to minimize along the direction of the valley floor.
I am not familiar enough with Box2D or Bullet to put together a demo that would test this idea (I finally got both compiling, but I haven't hacked around in either one before, so I'd need too much time just learning what goes where to knock off something quick). However, I do have a 2D Verlet engine running, so I will eventually be able to test the idea in that slightly different context. I'll certainly report what happens.