Actually I think this can be explained if you look between equations (7) and (8) in the paper. The cardinality is simply the number of particles that the constraint affects. In equation (8), n is the number of particles (what I believe you are calling the cardinality), and the constraint is expressed as C(p_1, ..., p_n).
I meant more: why do we need a cardinality term at all -- how did they arrive at it? Why can't we just scale things with weights that sum to 1? Using (w_i/w_total) would be one example: why do we have to use n*(w_i/w_total)?
What's confusing to me is that a constraint function that takes two 3D vectors (cardinality 2) could also be considered a constraint function that takes 6 scalars. The gradients would be the same, and you could split the mass of each particle between its x, y, z coordinates so that the total mass remained constant, but in the second case you'd have a cardinality of 6, which would alter the results.
Shouldn't the results be identical in both cases, since it's simply a matter of changing C(p1,p2) to C(x1,y1,z1,x2,y2,z2) -- we've posed the same question in a different form, which shouldn't alter the solution. So it seems like scaling everything by the "cardinality" in addition to a weighting term is a bit arbitrary/ambiguous, or at least I don't understand where it came from.
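One possible reading of the factor n, offered as a guess rather than as the paper's stated motivation: it normalizes the weighting so that equal weights leave the unweighted step unchanged. Here s is assumed to be the unweighted scale factor and w_i the inverse-mass weights, as used elsewhere in this thread:

```latex
\Delta p_i = -s \,\frac{n\, w_i}{\sum_{j=1}^{n} w_j}\, \nabla_{p_i} C,
\qquad
w_1 = \dots = w_n = w
\;\Longrightarrow\;
\frac{n\, w_i}{\sum_{j=1}^{n} w_j} = \frac{n\, w}{n\, w} = 1 .
```

With plain w_i/w_total the same equal-weight case would scale every correction by 1/n instead of 1.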
About the bending constraint, I found that arccos didn't behave all that well; arctan was much better behaved:
C = (atan2(n1) - atan2(n2)) - theta
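A minimal sketch of how the expression above could be evaluated, assuming n1 and n2 are 2D normals given as (x, y) pairs and theta is the rest angle in radians. The function names and the angle wrapping are assumptions added for illustration, not part of the original formula:

```python
import math

def wrap_angle(a):
    """Map an angle in radians into [-pi, pi] (assumed helper)."""
    return math.atan2(math.sin(a), math.cos(a))

def bending_constraint(n1, n2, theta):
    """C = (angle of n1 - angle of n2) - theta, for 2D normals (x, y)."""
    a1 = math.atan2(n1[1], n1[0])  # math.atan2 takes (y, x)
    a2 = math.atan2(n2[1], n2[0])
    # Wrap the difference so it does not jump by 2*pi when the
    # normals straddle the +/-pi branch cut.
    return wrap_angle(a1 - a2) - theta
```

Without the wrap, two normals at +170 and -170 degrees would report a 340 degree difference rather than -20 degrees.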
Is no one else having fundamental problems with overshooting? For the collision/normal constraint (section 4.3), there was definite overshooting when using particles with varied masses. I'm almost positive it wasn't due to a programming error, as I derived the constraint in a few different ways, several times over a period of weeks. Plus my solver is generic, so the other constraints would have shown weird behaviour if there was a bug -- only the constraint function and the gradients vary between constraints.
When all masses were the same, the behaviour of the original paper's method and my method was the same: they both behaved well.
However, if you took one particle's mass and doubled it, the paper's method over-corrects. In contrast, even at extreme mass ratios such as 1:50, the method I happened upon behaves perfectly well. I just wish I understood why!
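For anyone who wants to reproduce this kind of over-correction, here is a small sketch under stated assumptions. It uses a made-up 1D linear constraint C(q, p1, p2) = q - (p1 + p2)/2 (a point against the midpoint of a segment along a fixed normal), so the gradients are 1, -0.5 and -0.5. It compares the n*(w_i/w_total) scaling discussed above with a mass-weighted form in which the weights also appear in the denominator. The second form is a formulation often seen in position-based dynamics write-ups; whether it matches the method described in the post above is not known. All function names are hypothetical.

```python
GRADS = [1.0, -0.5, -0.5]  # dC/dq, dC/dp1, dC/dp2 for the toy constraint

def constraint(x):
    """Toy linear constraint: q minus the midpoint of p1 and p2."""
    q, p1, p2 = x
    return q - 0.5 * (p1 + p2)

def project_cardinality_scaled(C, grads, w):
    """Unweighted scale factor s, each step scaled by n * w_i / sum(w)."""
    n = len(w)
    s = C / sum(g * g for g in grads)
    w_total = sum(w)
    return [-s * (n * wi / w_total) * g for g, wi in zip(grads, w)]

def project_mass_weighted(C, grads, w):
    """Weights inside the denominator: s = C / sum(w_j * grad_j^2)."""
    s = C / sum(wi * g * g for g, wi in zip(grads, w))
    return [-s * wi * g for g, wi in zip(grads, w)]

def residual_after(project, x, w):
    """Constraint value left over after a single projection step."""
    dx = project(constraint(x), GRADS, w)
    return constraint([xi + di for xi, di in zip(x, dx)])

if __name__ == "__main__":
    x = [1.0, 0.0, 0.0]  # initial violation C = 1
    for w in ([1.0, 1.0, 1.0], [1.0, 0.5, 1.0], [1.0, 1.0 / 50.0, 1.0]):
        print(w,
              residual_after(project_cardinality_scaled, x, w),
              residual_after(project_mass_weighted, x, w))
```

With equal weights both variants leave a residual of zero (up to rounding). With the weights 1, 0.5, 1 (the mass of p1 doubled), the cardinality-scaled step leaves a residual of about -0.1, i.e. it overshoots by roughly 10%, while the mass-weighted step still lands on zero. For a two-particle distance constraint, where both gradients have unit length, the two variants give the same step, which may be why the problem only shows up in constraints whose gradients differ in magnitude.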