Ok, I give it a try.  
 
 
We integrate the forces and velocities to find a new position 
p using some unstabilized method of choice. Given the new 
p we now want to find a correction 
dp such that 
C(p + dp) = 0. We can approximate this as shown above by 
C(p + dp) ~= C(p) + J * dp = 0
We also want to restrict the correction to be orthogonal to the constraint manifold and scale the corrections relative to the masses such that
dp = M^-1 * J^T * lambda
Combining the two equations yields
J * M^-1 * J^T * lambda = -C(p)
dp = M^-1 * J^T * lambda
This is a single Newton-Raphson step for an iterative solution to a non-linear system of constraints. So we essentially get 
for ( n Newton iterations )
{
    J * M^-1 * J^T * lambda = -C(p)
    dp = M^-1 * J^T * lambda
    p += dp
}
This builds the outer iterative loop. We could now also solve the inner linear system 
J * M^-1 * J^T lambda = -C(p) using an iterative Gauss-Seidel solver. This is what I referred to as the inner loop earlier. NGS simply merges the two loops solving a single constraint at a time constantly updating the mass, Jacobians and constraint error to account for the non-linearity of the problem. The key observation is simply that both the inertia and Jacobian are functions of the position and need to be updated when iterating on the position level.
Of course you can make the argument now that for small errors you can keep 
M and 
J constant and only iterate on the constraint error. You might even want to reuse the Jacobian and mass from the velocity phase. This is essentially what split impulse does. My experience is identical to Erin's in Box2d that this works relatively good for contacts, but not so much for joints.
I am not a mathematician and I have more of a practical engineering view on the problem, but hopefully this still makes sense!
Cheers,
-Dirk