bullet, bsp, performance

gogiii · Post by **gogiii** » Sat Oct 08, 2011 2:31 pm

Hi all.
I'm trying to make an Android game with nice physics inside closed environments, so I chose Quake 2 BSP as the base for my levels.
At first I used the old Tokamak physics engine, which is pretty nice. I used its terrain callback with a lookup of the closest BSP leaves to generate the triangle indices Tokamak should use for collision computations. This gave me nice, fast rigid box collisions.
But this method had a few limitations:
1. I didn't compute CW/CCW triangle winding, so sphere collisions had holes on some polygons (I think Tokamak uses the winding to compute normals and then uses those normals for the sphere test).
2. The terrain callback uses triangle tests without CCD, and the outside of the world isn't solid. This means that at low fps (which happened while I was optimizing the renderer for Android) boxes went through walls.
3. Tokamak has no character controller, cloth, etc.
That's why I decided to give Bullet a chance.
Compilation was pretty simple, and I got it working on Android and PC.
Then I found the bspdemo in the Bullet SDK, which made me happy, because hell yeah, that's everything I need and more: no triangle computations needed, just the original Quake brushes converted by Bullet into physical shapes!
But... the performance is SO BAD, even in an empty room.
I throw around 30 cubes and get 30 fps until they calm down and become static...
With Tokamak I could spam over 2k cubes on a pretty huge level with high performance, even while they hadn't calmed down, and even though the Tokamak-based code was doing the physics on a per-triangle basis! For each box and its AABB, the triangle list was regenerated from the triangles of the closest BSP leaves!
Any ideas how to do something like that with Bullet's convexes or anything else?
Could it be that I'm using the wrong initialization?
The convex generation from Quake brushes is based on bspdemo, and the init is as simple as it could be:

Code: Select all

m_collisionConfiguration = new btDefaultCollisionConfiguration();
m_dispatcher = new btCollisionDispatcher(m_collisionConfiguration);
m_broadphase = new btDbvtBroadphase();
m_solver = new btSequentialImpulseConstraintSolver;
m_dynamicsWorld = new btDiscreteDynamicsWorld(m_dispatcher, m_broadphase, m_solver, m_collisionConfiguration);
m_dynamicsWorld->setGravity(btVector3(0,-10,0));

I can't get anywhere near the Tokamak speed (my video).

With Bullet I get 20 fps at 50-100 cubes on the same scene... this is ridiculous on a Core 2 CPU... I don't even want to try that on Android...
Any ideas on optimizations (take bspdemo from the SDK as a base: all brushes are fed into Bullet and processed by Bullet internals)? How can I feed Bullet only the convex static objects in the closest leaves for each rigid body?
And yes, the game won't contain a huge number of rigid bodies, but this serves as a speed measurement.

PS: The performance loss is right in the m_dynamicsWorld->stepSimulation(...) call. On a high-geometry level I get low fps even if no cubes are spawned.
Cube spawning:

Code: Select all

btRigidBody *addCube(const btVector3 & origin, float size = cubeScale, float mass = 10.0f) {
	btCollisionShape* colShape = new btBoxShape(btVector3(size, size, size));
	m_collisionShapes.push_back(colShape);

	btTransform startTransform;
	startTransform.setIdentity();
	startTransform.setOrigin(origin);

	btVector3 localInertia(0,0,0);
	colShape->calculateLocalInertia(mass, localInertia);

	btDefaultMotionState* myMotionState = new btDefaultMotionState(startTransform);
	btRigidBody::btRigidBodyConstructionInfo rbInfo(mass, myMotionState, colShape, localInertia);

	btRigidBody* body = new btRigidBody(rbInfo);
	m_rigidBodies.push_back(body);
	m_dynamicsWorld->addRigidBody(body);

	return body;
}

Here is the part of the bspdemo code that converts BSP brushes. addConvexVerticesCollider is called for each brush in a leaf at load time:

Code: Select all

btRigidBody *localCreateRigidBody(float mass, const btTransform & startTransform, btCollisionShape *shape) {
	bool isDynamic = (mass != 0.f);

	btVector3 localInertia(0,0,0);
	if (isDynamic)
		shape->calculateLocalInertia(mass,localInertia);
#define USE_MOTIONSTATE 1
#ifdef USE_MOTIONSTATE
	btDefaultMotionState* myMotionState = new btDefaultMotionState(startTransform);
	btRigidBody::btRigidBodyConstructionInfo cInfo(mass,myMotionState,shape,localInertia);
	btRigidBody* body = new btRigidBody(cInfo);
//	body->setContactProcessingThreshold(m_defaultContactProcessingThreshold);
#else
	btRigidBody* body = new btRigidBody(mass,0,shape,localInertia);	
	body->setWorldTransform(startTransform);
#endif//
	m_dynamicsWorld->addRigidBody(body);
	return body;
}

void addConvexVerticesCollider(btAlignedObjectArray<btVector3> & vertices, bool isEntity, const btVector3 & entityTargetLocation)
{
	if (vertices.size() > 0)	{
		float mass = 0.f;
		btTransform startTransform;
		startTransform.setIdentity();
		btCollisionShape* shape = new btConvexHullShape(&(vertices[0].getX()),vertices.size());
		m_collisionShapes.push_back(shape);
		localCreateRigidBody(mass, startTransform, shape);
	}
}

vspyder · Post by **vspyder** » Tue Oct 11, 2011 1:38 am

Are you sure you are compiling in "release" mode? I have noticed this makes a big difference in the sample applications.

gogiii · Post by **gogiii** » Tue Oct 11, 2011 12:02 pm

vspyder, hey, thanks for your answer.
Yep, I've mostly been testing the debug version.
Then I tried release on PC as you suggested. Everything became much faster, but there is still a huge impact depending on how many statics are added to Bullet.

Here are some numbers (vsync off, C2D 2.9 GHz, 2 GB RAM, GF8800):
bullet:
'base1' map, 5k bsp leaves:
debug, 27 fps, no cubes
debug, 17 fps, 67 cubes
release, 300 fps, no cubes
release, 180 fps, 67 cubes

'simple' map, 200 bsp leaves:
debug, 1000 fps, no cubes
debug, 20 fps, 67 cubes
release, 3000 fps, no cubes
release, 300 fps, 67 cubes

Tokamak (vsync on; I can't test it as thoroughly as Bullet because I commented the code out and only have one old build left):
release, 'base1' map, 5k bsp leaves:
no cubes: 60fps
100-1000 cubes (in one room, not calmed down and keeping to throw): still 60 fps (no impact at all)

I think there is something I just don't know about Bullet that could optimize static environment processing,
because it's definitely doing a lot of computation even when there are no dynamic objects in the scene.
The numbers show a huge fps difference between big and small maps for Bullet, and a big slowdown when adding rigid bodies.
It feels like it does some computation for statics and, moreover, doesn't cull them per dynamic rigid body when I add bodies to the scene. I know there should be an AABB tree and the lookup must be fast... I probably missed something in the init...

As for Tokamak's speed, that is simply because I send it only the BSP leaves that are close to the current rigid body, so using the BSP I cull a LOT around the cubes.
Any idea how the same can be done in Bullet? Callbacks or what?
The reference manual is a bit complicated.

PS:
Some statistics from Android with Bullet (Snapdragon 768 MHz, Adreno 200; not exact results, but around these values, I can't really remember):
'simple' map, no cubes, 40 fps
'simple' map, 20 cubes, ~20-25 fps

'base1' map, no cubes, 18 fps
'base1' map, 10+ cubes, 10 fps and less
not too bad, but I'm sure we can get more.
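
A note on reading the figures above: fps is not linear in cost, so it can help to convert to milliseconds per frame before comparing maps. The following is only a minimal sketch; the helper names `frameMs` and `addedCostMs` are made up for illustration, and the frame times it produces include rendering, not just physics.

```cpp
#include <cassert>

// Milliseconds spent per frame at a given frame rate.
double frameMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Extra milliseconds per frame when the rate drops from fpsBefore to fpsAfter.
double addedCostMs(double fpsBefore, double fpsAfter) {
    return frameMs(fpsAfter) - frameMs(fpsBefore);
}
```

With the PC release numbers quoted in this post, `addedCostMs(300, 180)` comes to about 2.2 ms for 67 cubes on 'base1' and `addedCostMs(3000, 300)` to 3.0 ms on 'simple'. By this measure the cubes cost a similar amount on both maps, and the bigger gap is the no-cubes baseline (about 3.3 ms per frame on 'base1' against about 0.3 ms on 'simple').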

gogiii · Post by **gogiii** » Fri Oct 14, 2011 5:53 pm

I've also noticed a big slowdown when my application exits after using a big map in debug mode.

Code: Select all

delete m_dynamicsWorld; // takes a lot of time

Pausing in the debugger shows execution at:

Code: Select all

void btHashedOverlappingPairCache::processAllOverlappingPairs(btOverlapCallback* callback,btDispatcher* dispatcher)

...with some huge values inside m_overlappingPairArray. I'm not sure it means anything, but it looks really strange to me that there are so many overlapping pairs in a scene with 3 cubes and 1 capsule (the dynamic objects) and 5k static convex hulls.
Well, the statics probably do overlap each other on a BSP map, but I don't think Bullet should have to make pairs between statics.
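
On the static-versus-static question: as far as I know, Bullet 2.x decides whether a broadphase pair is kept with a group/mask test, and `addRigidBody` puts zero-mass bodies in a static group whose mask excludes other statics. The sketch below only mirrors that logic in standalone code. The `Proxy` struct and the function names are invented here, and the bit values are an assumption based on `btBroadphaseProxy`, so check them against your SDK version.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed to mirror btBroadphaseProxy::CollisionFilterGroups (only the bits used here).
enum FilterBits { DefaultFilter = 1, StaticFilter = 2, AllFilter = -1 };

// Hypothetical stand-in for the group/mask pair stored on a broadphase proxy.
struct Proxy {
    int group;
    int mask;
};

// Defaults assumed for a body added without explicit filters:
// dynamic bodies accept everything, statics accept everything except statics.
Proxy makeProxy(bool isDynamic) {
    if (isDynamic) return Proxy{DefaultFilter, AllFilter};
    return Proxy{StaticFilter, AllFilter ^ StaticFilter};
}

// A pair is kept only if each side's group is accepted by the other side's mask.
bool needsCollision(const Proxy& a, const Proxy& b) {
    assert(a.group != 0 && b.group != 0);  // every proxy belongs to some group
    return (a.group & b.mask) != 0 && (b.group & a.mask) != 0;
}

// Counts the pairs that survive the filter if every proxy overlapped every other one.
std::size_t countKeptPairs(const std::vector<Proxy>& proxies) {
    std::size_t kept = 0;
    for (std::size_t i = 0; i < proxies.size(); ++i)
        for (std::size_t j = i + 1; j < proxies.size(); ++j)
            if (needsCollision(proxies[i], proxies[j])) ++kept;
    return kept;
}
```

Under these assumed defaults a static/static pair is rejected by the filter, while dynamic/static and dynamic/dynamic pairs are kept. If the pair array is still huge, it may be worth checking how the bodies were added, for example whether explicit group and mask arguments were passed.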
Also, in debug mode, if I set an empty nearCallback:

Code: Select all

void nearCallback(btBroadphasePair& collisionPair, btCollisionDispatcher& dispatcher, const btDispatcherInfo& dispatchInfo) {
	return;
}

...it does not affect execution speed at all... I mean I get the same 17 fps even though no collisions happen and all objects drop through the floor and walls.
In release it gives a huge speedup.
This really has to be something to do with huge for-loops over arrays of overlapping pairs (if they're really processed somewhere inside Bullet), which makes everything laggy.

I'm still looking for a way to feed Bullet's rigid bodies only their closest BSP brushes or triangles each frame, as an alternative.
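
For reference, the "closest leaves" culling described for the Tokamak version can be sketched roughly as below. This is an illustration under assumptions, not code from the thread: the `Aabb` struct, `overlaps` and `leavesNear` are hypothetical names, and a linear scan stands in for a real BSP tree walk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Axis-aligned bounding box; min/max hold x, y, z.
struct Aabb {
    float min[3];
    float max[3];
};

// True if the two boxes overlap (touching counts) on all three axes.
bool overlaps(const Aabb& a, const Aabb& b) {
    for (int axis = 0; axis < 3; ++axis) {
        if (a.max[axis] < b.min[axis] || b.max[axis] < a.min[axis]) return false;
    }
    return true;
}

// Returns the indices of the leaves whose bounds overlap the body's box grown by `margin`.
// Linear scan for brevity; a real BSP would walk the tree to skip whole subtrees.
std::vector<std::size_t> leavesNear(const std::vector<Aabb>& leafBounds,
                                    const Aabb& body, float margin) {
    assert(margin >= 0.0f);
    Aabb grown = body;
    for (int axis = 0; axis < 3; ++axis) {
        grown.min[axis] -= margin;
        grown.max[axis] += margin;
    }
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < leafBounds.size(); ++i) {
        if (overlaps(leafBounds[i], grown)) result.push_back(i);
    }
    return result;
}
```

Only the geometry of the returned leaves would then be handed to the narrow phase for that body. The margin is there so that a fast-moving body still sees the leaf it is about to enter.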

gogiii · Post by **gogiii** » Fri Oct 14, 2011 10:08 pm

Okay, I was playing around and decided to run some experiments.
I've already read the wiki, which says btConvexHullShape is the fastest. Okay, I'll ignore that for now and try btBvhTriangleMeshShape for the level statics, just for fun.
So here is the pseudo-code (pseudo because it uses my own BSP structure, which is converted for better mobile rendering, e.g. faces are regrouped per texture so I have fewer texture state changes per leaf).
Here I use the geometry (indexed vertices) held in the leaves, which is used for rendering, NOT the Quake brushes (multiple planes generating a single convex hull, as used in bspdemo).

Code: Select all

void BSP_GenerateTRIMESHcolliders(BSP *bsp) {
	int numLeaves = bsp->leaves.size();
	for(int i = 0; i<numLeaves; i++) {
		BSPLeaf *leaf = &bsp->leaves[i];
		if(leaf->cluster == -1) // skip unreachable leaves
			continue;
		int numFaces = leaf->faces.size();
		if(numFaces < 1) // a few leaves really are empty; skip them too
			continue;
		// create triangle mesh for this leaf
		btTriangleMesh *trimesh = new btTriangleMesh();
		BSPFace *faces = &leaf->faces[0];
		for(int j = 0; j<numFaces; j++) {
			BSPFace *face = &faces[j];
			// convert internal face data into Bullet-compatible data
			// "BSPFace" is a bit of a misnomer here: it is just a list of indices into the vertices,
			// forming multiple triangles in one face object, i.e. a group of triangles sharing one texture
			for(int k = 0; k<face->indices.size(); k+=3) { 
				btVector3 v[3];
				for(int t = 0; t<3; t++) {
					point3f vert = bsp->vertices[face->indices[k + t]].position;
					v[t] = btVector3(vert.x, vert.y, vert.z);
				}
				// insert triangle
				trimesh->addTriangle(v[0], v[1], v[2]);
			}
		}
		// create shape and pass into bullet
		btBvhTriangleMeshShape *shape = new btBvhTriangleMeshShape(trimesh, true);
		btTransform startTransform = btTransform::getIdentity();
		localCreateRigidBody(0,startTransform, shape);	// you can find this in previous post or bspdemo in sdk
		// Do not "delete trimesh;" here: btBvhTriangleMeshShape keeps a pointer to the mesh data rather than copying it, so deleting it crashes later.
	}
}

Map: 'base1' from quake 2, 3 rigid cubes and 1 capsule as a player. Vsync: off.
Results:
trimesh release: 700 fps, less than second to generate meshes.
convexhull release: 300 fps, around 1 second to generate hulls.

trimesh debug: 63 fps, 1 second to build meshes
convexhull debug: 23 fps, around 7 seconds to build hulls

Same map after spawning 150 cubes:
trimesh debug 12fps
convexhull debug 9fps
trimesh release 150fps
convexhull release 100fps

How can trimesh be so much faster on every count, with faster generation and faster collision detection?
The trade-off could be objects penetrating triangles (I haven't noticed any so far, but I don't know if it's bulletproof).
I still can't reach 60 fps with around 2k cubes, but with 500 cubes it doesn't drop below 60 fps.

I'll try it on Android later; I'm currently busy porting some rendering features.
Also, this doesn't really add any optimization tricks and is probably pretty memory-intensive, so the question is still open and I'm looking for ideas to solve it.

PS *little update to the post*
Some leaves have fewer than 10 triangles, so I also tried putting every BSP face of the whole 'base1' map into one single huge triangle mesh.
This added 5-10 fps, so it looks like the internal AABB tree really works well in this case.
I have no idea why generating hulls has such an unbelievably huge impact on performance (on huge maps).

dphil · Post by **dphil** » Fri Oct 14, 2011 10:54 pm

I think btBvhTriangleMeshShape objects are optimized for static use, while convex hulls are for general-purpose dynamic/static/whatever use. Maybe that could be why, along with the possibility that the generated convex hulls have a lot more points/faces than you might hope or think. Just some guesses.

NaN · Post by **NaN** » Sat Oct 15, 2011 9:19 am

Hi gogiii.

I've also found Bullet's speed for static geometry suboptimal.

A few other things I've found out:
- Call setForceUpdateAllAabbs(false) on your world.
- Call setActivationState(DISABLE_SIMULATION) on static collision objects.
- Try to keep the number of static objects small.
- Batch them into a single btBvhTriangleMeshShape if possible; btTriangleIndexVertexArray supports multiple indexed meshes (this gave me the biggest speedup).

You can call CProfileManager::dumpAll(); after stepSimulation to get an idea of where the time goes. My old thread: http://bulletphysics.org/Bullet/phpBB3/ ... f=9&t=6397
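
As a rough illustration of the batching tip, the sketch below physically merges several indexed meshes into one vertex and index buffer by rebasing indices. It is standalone code under assumptions: `IndexedMesh` and `mergeMeshes` are hypothetical names, not Bullet types, and this is an alternative to registering each mesh separately with btTriangleIndexVertexArray, not a description of what that class does internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical indexed triangle mesh: xyz triples plus three indices per triangle.
struct IndexedMesh {
    std::vector<float> positions;  // x0, y0, z0, x1, y1, z1, ...
    std::vector<int> indices;      // three vertex indices per triangle
};

// Concatenates several meshes into one, rebasing each mesh's indices by the
// number of vertices already in the output.
IndexedMesh mergeMeshes(const std::vector<IndexedMesh>& parts) {
    IndexedMesh merged;
    for (const IndexedMesh& part : parts) {
        assert(part.positions.size() % 3 == 0);
        assert(part.indices.size() % 3 == 0);
        const int base = static_cast<int>(merged.positions.size() / 3);
        merged.positions.insert(merged.positions.end(),
                                part.positions.begin(), part.positions.end());
        for (int index : part.indices) {
            merged.indices.push_back(base + index);
        }
    }
    return merged;
}
```

The merged arrays could then back a single static triangle-mesh shape, so the physics world sees one static object instead of one per leaf.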

MaxDZ8 · Post by **MaxDZ8** » Mon Oct 17, 2011 10:04 am

NaN wrote:- Batch them into a single btBvhTriangleMeshShape if possible, btTriangleIndexVertexArray supports multiple indexed meshes(max speedup for me).

As I'm looking to squeeze out some more perf, would you mind estimating the difference?
I'm considering merging all the static meshes for graphics batching, and perhaps I will do the same for the physics.

Real-Time Physics Simulation Forum
