Small Step XPBD FEM Demo

Please don't post Bullet support questions here, use the above forums instead.
Post Reply
Posts: 10
Joined: Mon Jun 07, 2021 5:12 am

Small Step XPBD FEM Demo

Post by jak »

Hi guys,

I've been working on a small step XPBD-based implementation of FEM off and on for the last several months, and I think I finally have something that someone might find interesting. Here's a WASM demo showing the current state of things: At some point I had delusions of grandeur of building an interactive walkthrough to show off the features, but unfortunately right now you're just presented with a wall of controls and an even bigger wall of text explaining them.

I think the two most interesting things about the demo are that it attempts to tackle incompressible FEM and it shows off higher-order elements (like quadratic quads/hexahedra). Until you peek at the demo this next sentence won't make too much sense, but the performance of the Sub energy function with Q4 and H8 element types seems promising. You can keep the number of steps/second pretty low and still maintain good stability and volume preservation, especially with 2-3 extra volume constraint passes.

Interestingly, I've found the WASM version running in Chrome to perform indistinguishably from native version. I have not attempted to add WASM SIMD support, but with an older version of the code I was getting a full 8x speedup with a trivial structure of arrays AVX2 implementation in a native build. The one issue I had to work around is that small step XPBD tends to require double precision floats and, in fact, single precision floats were insufficient for storing nodal position values. The constraints themselves, on the other hand, are calculated in normalized element space, which is not so sensitive, so the vast majority of calculations can be on regular floats.

The code is here:, though I abandoned all attempts at tutorializing anything, so it's probably only readable to someone with a pretty good handle on FEM to begin with. That said, someone implementing their own XPBD FEM may find it useful to look at the code for normalizing things like compliance and damping against element size.

In the not-too-distant future I intend to add support for higher-order triangle and tetrahedron element types. I may also investigate storing additional values than just positions per node. I briefly tried storing volume per node, and that fixed hourglassing in the fully integrated mixed energy function, but I didn't really dive into its performance characteristics, so I might do that later.

Finally, I'd like to give a shout out to Matthias Müller and his SIGGRAPH(?) 2008 course notes on linear, co-rotational FEM ( ... enotes.pdf). I've attempted to learn about FEM several times in the past, but always bounced off. Coming from a game development background, this is still the clearest intro to FEM I've seen, and it was enough to let me get a toe-hold into the wider subject.
Posts: 10
Joined: Mon Jun 07, 2021 5:12 am

Re: Small Step XPBD FEM Demo

Post by jak »

Found a sudden burst of inspiration to post an update to the demo ;p. The 3 major additions are a few new element types (second-order triangles and first- and second-order tetrahedrons), Stanford armadillo meshes for each of the element types, and a new set of energy functions that constrain per-vertex instead of per-element. The demo is still at If you've loaded the page before, you may need to press the "Reset Settings" button to clear your settings cache.

In terms of the new element types, there's not much to say, in isolation.

Adding the armadillo meshes was a learning experience, however. Now, it started gently enough with the tri and tet meshes. For the tri armadillo, I followed an old blog post by Miles Macklin. For tets, I was able to use TetGen without much fuss. The quad mesh was a bit more of a journey, but I eventually got gmsh to spit one out. The boundary could still use some refinement, but it seems like getting good-quality automatically-generated 2D quad meshes is tractable. The hex "armadillo," on the other hand, is where things really took a turn. Hex meshing is a much more difficult problem than tet meshing, and given that, the progress people have made on it is impressive. I tried a variety of automatic meshers, and I got farthest with the PolyCut toolchain from the University of British Columbia here: It was able to generate some nice meshes, but not at a low-enough resolution for a browser demo. In the end, I ended up just hand-modeling the armadillo-adjacent creature you see in the demo.

Now, up to this point, the H8 linear hex element seemed like by far the most promising direction, since it's locking-resistant and it's the most performant element on a per-vertex basis by a large margin. And I still think if you're in a situation where a hex mesh is available, it's the thing to use. For instance, adding a hex-based flesh layer to a skinned character seems possible. For arbitrary volume meshes, though, hex mesh generation is too much of an impediment, so I decided to look a bit more into getting tets up to snuff.

Like with triangles, if you use the basic technique of constraining the energy function per-element on linear tets, you get pretty bad volumetric locking. Quadratic tets don't lock too much, but they are vastly more costly. Eventually, inspired by this paper: ... /paper.pdf, I tried moving the constraint from the element to the one-ring around each vert. Each constraint is now quite a bit more costly, but you only have about 1/6th as many (in the case of tets, at least). Also, with the Gauss-Seidel solver, you need fewer Steps/Second to converge. With my current implementation, performance is a wash for tets between using element-centric constraints and one-ring-centric constraints, with the big tie-breaker being that one-ring constraints don't lock. All other element types are at least a bit slower. For the very simple mixed Neo-Hookean energy function (represented as "V Mix" in the demo), that's all there is to it: integrate each of the energies of the surrounding elements, divide by 4 (to represent the portion of the tet that belongs to the given vert), and you're set. For more complex energy functions, like Pixar or Yeoh Skin, you have to be a bit more careful. Integrating the energy function per-element still leads to locking. Instead it seems like you can just integrate the hyperelastic invariants I1 and J, and then use the weighted sum of those to calculate the energy function. I hacked in this technique for the T3 triangle version of "V Pix," though T3 "V Skin" and both T4 "V Pix" and "V Skin" still integrate the full energy function per-element, and thus lock.

Looking towards the future, I'm thinking of ditching the higher-order elements. It was educational to implement them, but I haven't been able to get them performance-competitive with the first-order elements on a per-vert basis. Even first-order tets are about 3x slower to achieve convergence compared to first-order hexes, but I think if I gave them a specialized pathway I could at least make up some of that difference. Along those lines, I might investigate WASM SIMD, since browsers are starting to roll out support for that, and it would be a good way to keep me honest about the complications of parallelizing one-ring constraints vs. simple element-constraints.
Posts: 10
Joined: Mon Jun 07, 2021 5:12 am

Re: Small Step XPBD FEM Demo

Post by jak »

The excellent paper Miles Macklin posted yesterday (A Constraint-based Formulation of Stable Neo-Hookean Materials) elucidated some things about the relationship between constraints and energy in XPBD that I had been fuzzy on, and I couldn't resist jamming the corrected mixed Neo-Hookean energy function into the demo. The updated version is still at, and I moved the previous version to The biggest issue with the previous version is that I was constraining the full Neo-Hookean energy, instead of the square root. The irreducible Pixar Neo-Hookean energy function was similarly squared compared to what it should be, and I went ahead and fixed that as well. With the fixed energy function, the material is more permissive of large extensions and Poisson's ratio is respected in most modes. The "Mixed" energy function (which is a fully integrated mixed energy function) also now works correctly with quadratic elements.
Posts: 10
Joined: Mon Jun 07, 2021 5:12 am

Re: Small Step XPBD FEM Demo

Post by jak »

Updated the demo. The current version is still at, and the previous version is now at Additionally, I uploaded the latest code to:


Rayleigh Damping
Took a deeper dive into the Rayleigh damping suggested in the original XPBD paper. It always looked quite nice with the single-equation irreducible Pixar Neo-Hookean, but I'd never been happy with the results for mixed energy functions. I eventually stumbled on the idea of switching over to working with gamma instead of beta, and feeding identical gamma values into both the deviatoric and volumetric constraints. This got the behavior matching between the irreducible and mixed energy functions pretty well. Still, there was one big problem, which is that the damping completely breaks down at high values, or even under high tension. Intuitively, it seems like by folding damping into the constraint you can get into an equilibrium where the velocity term cancels out the constraint term, netting zero force and causing your FEM object to melt slowly into formlessness, or at the very least massively violating incompressibility.

I have surfaced a couple mitigations via the "Rayleigh Type" tweakable. "Orig" lets you view damping as presented in the paper. The first mitigation, called "Lim," breaks the damping constraint out on its own and then limits how large it can get relative to the constraint function. This reduces the maximum strength of damping, but also greatly reduces the artifacts. Both constraints are still applied at the same time, so the cost is barely increased compared to the original. Another approach that allows for maximum damping and completely removes artifacts is to apply damping directly to velocity after all position constraints are calculated. This is selectable with "Vel." While this looks great, it is very expensive, as it's essentially doubling the number of constraints that are run each substep. To address the slowness, I added a final option, "Amor," which amortizes the cost of damping by only running it every 8 substeps. This reduces the maximum possible damping, but otherwise appears to generate very similar behavior in the demo (though I'd hesitate to recommend this as a generally viable solution, as this kind of thing feels like it has the potential to introduce weird resonances). In general, for all the mixed method energy functions, max damping is much stronger when the two mixed constraints are solved simultaneously, rather than serially.

Energy from Pre-Integrated Invariants
All vert-centric constraints on T3 and T4 elements now integrate invariants and use the results to calculate energy, rather than integrating energy directly. This means VPix, VMix and VSkin all have reduced locking.

Inspired by that idea, I added an element-based constraint version of Yeoh skin called "YFast" which integrates invariants instead of energies and the results are promising. Per-constraint, it's nearly the same speed as regular Neo-Hookean, but it behaves very similarly to full Yeoh (or in the case of Q4/H8 elements, which severely lock with full Yeoh, the selectively integrated "YSel"). At high Steps/Second and high Compliance YSel and YFast are indistinguishable. As you lower Steps/Second, YFast actually seems to stay converged longer. As you lower Compliance, YFast also avoids locking, which is an issue for YSel, though it does exhibit slightly more visible hourglassing. (And a little tip when playing with the Yeoh skin models: unlike the Neo-Hookean model, it's only designed for the nearly-incompressible range, so Poisson's Ratio needs to be 0.4999 or higher.)

Square/Cube approximations
Added two new Energy Functions, "CubeNeo" and "CubeSkin." These only work for Q4 and H8 elements, and only work correctly for perfect squares/cubes. The main draw is how simple they are. The most basic version of CubeNeo for Q4s looks like this:

Code: Select all

void ConstrainSquare(
  float dt, float compliance, float nodeMass, float area0,
  dvec2& X0, dvec2& X1, dvec2& X2, dvec2& X3
) {
  // Apply Neo-Hookean compression
  dvec2 centroid = (1.0 / 4.0) * (X0 + X1 + X2 + X3);
  float compressionScale = -2.0f * (dt * dt) / (compliance * nodeMass);
  X0 += compressionScale * (X0 - centroid);
  X1 += compressionScale * (X1 - centroid);
  X2 += compressionScale * (X2 - centroid);
  X3 += compressionScale * (X3 - centroid);
  // Apply volume conservation
  vec2 A = vec2(X0 - X2);
  vec2 B = vec2(X1 - X3);
  float area = 0.5f * (A.x * B.y - A.y * B.x);
  vec2 gA = 0.5f * vec2(B.y, -B.x); // Partial derivative of area with respect to A
  vec2 gB = 0.5f * vec2(-A.y, A.x); // Partial derivative of area with respect to B
  float volumeScale = -2.0f * (area - area0) / (dot(A, A) + dot(B, B));
  X0 += dvec2(volumeScale * gA);
  X1 += dvec2(volumeScale * gB);
  X2 -= dvec2(volumeScale * gA);
  X3 -= dvec2(volumeScale * gB);
The deviatoric constraint simply pulls all the nodes to the element's centroid. This is only an approximation of the true Neo-Hookean constraint, but for squares and cubes, it's shockingly close. Setting Rayleigh Damp to 0 and Poisson's Ratio to 0.5 runs the above version of CubeNeo, so you can see how fast it is. Setting the values to anything other than exactly 0 and 0.5 switches to a more generalized version of the algorithm which respects individual node weights and very closely mimics the serial version of the MSel energy function. CubeSkin is the same idea, but it mimics YFast. Looking at the terms of the prefactored integral of the Neo-Hookean energy suggests why this approximation works so well. For squares and cubes, the weights of the centripetal gradients account for two-thirds of the total and the rest of the terms multiply by dot products of generally perpendicular vectors.

Given the limitations of this approximation, I'm not sure it has any use in production, but it could be a good stepping stone for learners ("Wow, look! You're doing real FEM in 16 lines of code!"). I've been toying with a writeup along those lines, but we'll see ...

Smaller changes
Got rid of the concept of subregion constraints, since, with the correct treatment of energies, they didn't seem to add anything. This improved the speed of higher-order elements to the point where I'm not quite ready to write them off just yet.

The "Mixed XPBD" tweakable allows you to choose between Serial and Simultaneous solvers for the Mixed, MSel, Yeoh, YSel and YFast Energy Functions. Simultaneous means the deviatoric and volumetric constraints are solved simultaneously with the matrix form of XPBD, though the individual elements are still solved serially in a Gauss-Seidel fashion. The Simultaneous solver seems to be able to hold its shape better at low Steps/Second.

Internally, switched to a modified version of the XPBD equation which more directly uses potential energies (avoiding some square roots and divisions when solving constraints). I'll describe it very sketchily, here. Given the original small step XPBD, where:
Δ𝜆 = -C / (∇CM⁻¹∇Cᵀ + ᾶ) and
Δ𝑥 = M⁻¹∇CᵀΔ𝜆
Assuming a fully diagonal ᾶ, we can say U′ is the vector of all constraint energies with their compliances factored out. Then
U′ = ½C² and
Δ𝜆′ = -2U′ / (∇U′M⁻¹∇U′ᵀ + 2U′ᾶ) and
Δ𝑥 = M⁻¹∇U′ᵀΔ𝜆′ where
Δ𝜆 = Δ𝜆′√(2U′) (Though in this formulation raw Δ𝜆 is never needed.)
Posts: 1
Joined: Wed Oct 27, 2021 1:02 am

Re: Small Step XPBD FEM Demo

Post by zalo »

Hi Jak, I just wanted to say that I’m a big fan of your explorations here. The framerate and smoothness is hypnotic! Your implementation (and Miles’s most recent paper) inspired me to have a go porting an XPBD sim to run in WebGL2 Shaders:

This uses simple polar decomposition-based tet constraints with jacobi iterations (instead of the paper’s neohookean energy function with gauss-seidel iterations).

I think the example dragon model has too high of a max vertex valence (31) for graph coloring-based parallelization to work nicely, but uniform hexahedral meshes might work great (~8).

I haven’t worked out the physical correctness or damping yet, but I’m hoping to build out a more powerful framework for GPGPU calculations in WebGL2 before spending too much time on the details.
Posts: 10
Joined: Mon Jun 07, 2021 5:12 am

Re: Small Step XPBD FEM Demo

Post by jak »

Love it! That GPU sim is so smooth ~~

Great to see other people working in the area!
Posts: 10
Joined: Mon Jun 07, 2021 5:12 am

Re: Small Step XPBD FEM Demo

Post by jak »

Updated the demo. The current version is still at, and the previous version is now at

This update is a bit different in nature compared to the previous ones, as I added very few features to the main sim. Instead, I added two new modes which can be selected directly below the viewport. In addition to the original Sim mode, there's now also a Step mode and an Explore mode.

Step fairly straightforwardly lets you single-step through the simulation of a single (2D) element. This is something I would occasionally hack in for debugging purposes, but now it's permanent. One thing I've enjoyed doing is really cranking down the Steps/Second to see the solver at the limits of stability and beyond. The volume constraint tends to dominate, so I've expanded the Solve tweak var to let you choose deviatoric- or volumetric-only solves, which lets you build intuition about each type of constraint.

Explore is a bit of a deeper feature, but also not fully figured out, and so unfortunately probably not too useful. Like Step, it presents you with a single 2D element which you can fully manipulate, but instead of running the simulation on it, Explore mode surfaces many of the sub-components that go into calculating the simulation. The part I'm most happy with is being able to click on a specific component or invariant and then see the equation for calculating it with real values that update as you move the element nodes. On the other hand, trying to cram so many features onto a single page was probably a mistake, especially considering how many equations are still missing. I do think it could turn in to a really valuable learning resource with enough effort, say starting with the XPBD equations and filling in everything you need for full FEM simulation. Not sure that I will be the one putting in that effort ... we'll see ...

Even then, that may be putting the cart before the horse a bit, since another big unresolved issue is that, while you can represent all the simulation equations with simple 2D matrices (and indeed I have on the Explore page), it becomes pretty contorted from a linear algebra-perspective (what does the vector space of transposed shape function gradients represent, conceptually?). Many of the FEM equations seem to be best represented with higher order tensors, but those aren't objects in the average game dev's repertoire. The good version of the Explore page would maintain the use of intermediate game dev-friendly constructs, but increase sensibleness from a linear algebra-perspective. In my head I feel like I've got that 80% of the way figured out, but then again it's the second 80% that's always the hardest.

Well, okay, some things that I have found interesting even in the current incarnation:

It's nice to look at the T3 element and see that the deformation gradient is indeed constant throughout.

Being able to change the number of quadrature points used in integration is useful to see where the different invariants converge. For instance, it shows that full Yeoh requires more than 3x3 points to converge for quadratic elements (which is expected, but I never really considered it). On the topic of Yeoh, it's also interesting to observe how different the results are between real Yeoh and the fast Yeoh approximation, given how similarly they behave perceptually.

Finally, visualizing the deformation gradients led me to an optimization of the volumetric constraint. The volumetric constraint requires calculating |F| and its derivatives at various quadrature points. For mapped elements, you need the inverse map deformation gradient matrix at each quadrature point to calculate F at that quadrature point. Turns out though, if you only want to calculate |F|, which is the case when the I₁ term is being calculated with pre-factored integration, you only need to store and use the determinant of the inverse map deformation, saving both memory and computation time.

The only other general feature I added is the "serendipity" element type (Q8/H20). These are quadratic elements similar to Q9 and H27, just with fewer nodes, so they are faster to calculate. I had left them out originally because I had read that the mapped (non-square) versions resulted in inaccurate stress values in a linear stress-strain context. And, it's true, it does seem like when comparing nice square elements vs 0.5 Wonkiness elements serendipity elements result in faster desyncs vs Q9/H27, but it doesn't seem as catastrophic as it looked in the textbook. There may be some realtime context where these basic serendipity elements are a good trade-off.
Post Reply