I've been working for some time at a physics engine as a hobby. Recently I got more involved as I decided to make it my degree paper. Unfortunately I kind of got into a dead end, nothing I tried seemed to work so I finally decided to get a holiday. Now that I got back after two weeks, I want to get back to work as I have a presentation in two weeks, so I decided to ask for help.
I've read part of this forum, but it's quite large to view it all. My problem is stability too and then performance. I'll describe you briefly my engine and afterwards you can tell me what could be wrong or what could be improved.
I use a time-stepping scheme/velocity based formulation as described in Catto and Erleben: symplectic Euler integrator, a PGS solver with SOR and a Guendelman approach to contact and collision resolution. Although these should be some of the best approaches that I've read about, I get a lot of instability especially in my bilateral constraints. I'll give you an outline of the engine to be more precise:
1. relax joints
2. detect collision
3. update velocities
4. solve contact/constraints
5. update positions
1 means something similar to Jakobsen method in order to avoid large Baumgarte factors, then in 2 I detect pentration mostly between OBBs (althoug I have spheres and capped cylinders too) and get contact info in pairs of closest points. I add them to a list and then for each of them (one by one) I apply collision impulses like in the Baraff-Witkin tutorial (for pairs of objects) and do a lot of projection to decrease inter-penetration. After 3 I solve the mixed LCP that I've built using the list of contacts and the joints. I have only dynamic friction, as described in some Baraff or Anitescu papers, so I don't use any friction cone approximation or tangential directions or constraint forces. After the PGS I clear the contact list and go to 5 and back to 1 looping of course

I use a time step of 0.001, otherwise it blows up, but I do about 20 updates per frame, although I'm still far from 60hz. Gravity is -9, the restitution factor in the impulse-based part is 0.02, beta for joints and contact is 0.1, 15 iterations for PGS and w=1.2 for SOR, and I also relax the lambdas by 0.8. For now I only have ball-socket joints and I built an articulated rag-doll with OBBs and a sphere for the head (4 limbs each with 2 parts). It gets very unstable at violent impacts and when falling on other boxes. It gets a lot of extra energy and angular velocity (could be because I don't use hinges?) and the system gets into rest only in some particular situations.
Please help me! Hints, pieces of advice, tell me what do I wrong, possible mistakes, future improvements, anything. I can post more info i fyou need, and even demos or source code (although I'm nor very proud of them

Thank you.