Bullet on GPU

SteveBaker
Posts: 127
Joined: Sun Aug 13, 2006 4:41 pm
Location: Cedar Hill, Texas

Post by SteveBaker »

Erwin Coumans wrote:Here is some good previous discussion on GPU physics:
http://www.gamedev.net/community/forums ... _id=383736
Local cached copy (pdf)
The PhysX hardware gizmo seems to be pretty lame. For the test setup we put together at work, if you have a fairly underloaded CPU, it's hard to tell whether the hardware is turned on or off. The nVidia physics stuff requires one graphics card for the physics and a separate one for the graphics - which is a little ugly IMHO. I don't know whether we can do better - but it's a very new field and there is scope for innovation.
Steve, will you make some progress this weekend on a simplified GPU rendering sample?
Yep - I'm on the case.
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Post by Erwin Coumans »

SteveBaker wrote:The nVidia physics stuff requires one graphics card for the physics and a separate one for the graphics
The Havok GPU demos run on single ATI and NVidia cards/GPUs. I have played with this at several exhibitions. It just works faster when it can have its own dedicated GPU (SLI).

Thanks for your work. I just updated the Bullet 2.0 user manual. Now I want to create the basic forklift demo. Then this will be released this weekend as the Bullet 2.0 version.

Thanks,
Erwin
SteveBaker
Posts: 127
Joined: Sun Aug 13, 2006 4:41 pm
Location: Cedar Hill, Texas

Post by SteveBaker »

Erwin Coumans wrote: The Havok GPU demos run on single ATI and NVidia cards/GPUs. I have played with this at several exhibitions. It just works faster when it can have its own dedicated GPU (SLI).
Oooohhhh! I didn't know that. I'll pass that on to my physics crew at work.
Thanks for your work. I just updated the Bullet 2.0 user manual. Now I want to create the basic forklift demo. Then this will be released this weekend as the Bullet 2.0 version.
Awesome - I'm off to read that now just as soon as I......
SteveBaker
Posts: 127
Joined: Sun Aug 13, 2006 4:41 pm
Location: Cedar Hill, Texas

Post by SteveBaker »

ANNOUNCE GPUphysics-0.2

I've added some debugging options into the GPU physics demo program. You can download it from:

http://www.sjbaker.org/tmp/GPUphysics-0.2.tgz

* Merged in your MacOSX support.
* Added a ton of command-line debug options (see DEBUGGING_README)
* Changed to ZLib license.

My hope is that by running it with the various command line options, we can narrow the problem down to just one of the 'exotic' features that the program uses.

1) Run without any shaders at all:

Code: Select all

 GPU_physics_demo -s
This should produce a grey screen with a neat grid of randomly coloured cubes that are sitting completely motionless. If this doesn't work then render-to-texture and shaders are not the problem and we have some very basic OpenGL problem to worry about.

2) Run with just one shader - but no render-to-texture:

Code: Select all

 GPU_physics_demo -p
Just like '-s', this should produce a grey screen with a neat grid of randomly coloured cubes that are sitting completely motionless. This time, the vertex shader is reading the positioning information from a texturemap. If this doesn't work then render-to-texture isn't the problem but something is amiss in shader-land.

There are several possibilities - the nastiest of which might be that either:

a) Your graphics card/driver doesn't support floating point textures. (This is pretty much 'Game Over' for you because without that, doing physics in the GPU is going to be virtually impossible).

b) Your graphics card/driver doesn't support vertex shader textures (or it supports them but sets the maximum number to zero - which is the same thing). This means that we can't move things around using GPU textures - but we can still use the GPU to accelerate physics calculations. In practical game scenarios, I think the CPU needs to know where all the objects are - so this may not be the serious issue it sounds like. What it mostly does is to clobber the idea of running physics on particle-system types of effects where a vast number of objects are involved but where individual objects have zero effect on game play. (There's a quick capability-check sketch just after this list.)
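
For reference, here's a minimal sketch (not part of the demo) of how you might test for those two capabilities at startup - it assumes GLEW has already been initialised and a GL context exists:

Code: Select all

#include <GL/glew.h>
#include <cstdio>

void report_GPU_physics_support ()
{
  /* (a) floating point textures - needed to hold positions/velocities */

  if ( ! GLEW_ARB_texture_float )
    printf ( "No ARB_texture_float - GPU physics is pretty much impossible.\n" ) ;

  /* (b) vertex shader texture samplers - needed to position the cubes
         directly from a texture inside the vertex shader              */

  GLint max_vertex_samplers = 0 ;
  glGetIntegerv ( GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS, & max_vertex_samplers ) ;

  if ( max_vertex_samplers == 0 )
    printf ( "No vertex shader textures - fall back to CPU readback.\n" ) ;
  else
    printf ( "Vertex shader can read %d texture(s).\n", (int) max_vertex_samplers ) ;
}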

3) Run without forces being applied:

Code: Select all

GPU_physics_demo -f
This sets the cubes off moving at constant speed (each cube going at a different speed and spinning).

4) Run without collision against the ground:

Code: Select all

GPU_physics_demo -c
The cubes move under gravity - but don't interact with the notional ground plane, so they eventually fall off the bottom of the screen.

If either (3) or (4) fails but (1) and (2) worked then the problem is probably something to do with render-to-texture.

This is the most likely scenario and it is a pain - but fixable.

5) Run with everything:

Code: Select all

GPU_physics_demo
...exactly as per my previous releases (ie everything is enabled).
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Post by Erwin Coumans »

Thanks a lot for the new version.

I just tried it under Mac; only the first one runs and displays cubes.
See the results with -p and -f below:

Code: Select all

erwin-coumans-computer:~/Downloads/GPUphysics apple$ ./GPU_physics_demo -p
Compiling:cubeShader.vert - 
ERROR: 0:39: 'assign' :  l-value required "gl_Vertex" (can't modify an attribute)

Failed to compile shader 'cubeShader.vert'.
GPU_physics_demo.cxx:389: failed assertion `cubeShader -> compiledOK ()'
Abort trap

Code: Select all

erwin-coumans-computer:~/Downloads/GPUphysics apple$ ./GPU_physics_demo -f
Compiling:CollisionGenerator Frag Shader - 
ERROR: 0:1: '<' :  wrong operand types  no operation '<' exists that takes a left-hand operand of type 'float' and a right operand of type 'const int' (or there is no acceptable conversion)

Failed to compile shader 'CollisionGenerator Frag Shader'.
GPU_physics_demo.cxx:294: failed assertion `collisionGenerator -> compiledOK ()'
Abort trap
I will try this test on a few other machines and let you know the results.
Thanks!
Erwin
SteveBaker
Posts: 127
Joined: Sun Aug 13, 2006 4:41 pm
Location: Cedar Hill, Texas

Post by SteveBaker »

Erwin Coumans wrote:Thanks a lot for the new version.

I just tried it under Mac; only the first one runs and displays cubes.
(Which sadly only shows that basic OpenGL features are working.)

Code: Select all

Compiling:cubeShader.vert - 
ERROR: 0:39: 'assign' :  l-value required "gl_Vertex" (can't modify an attribute)
DAMN! I knew about that - but forgot to fix it. Please grab:

http://www.sjbaker.org/tmp/GPUphysics-0.3.tgz

...try again! Sorry...my bad.
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Post by Erwin Coumans »

I tried 0.3 under Windows, and it fails at the linking stage (-p).
Will let you know about OS X in one reboot :)

Thanks for the quick turnaround.
Erwin

The -p version:

Code: Select all

Linking:CubeShader -
 Link failed.
Validate:CubeShader -
 Link failed. Validation failed - link has not been called or link has failed.
Failed to link shader.
Assertion failed: cubeShader -> compiledOK (), file .\GPU_physics_demo.cxx, line
 391
The -f version gives:

Code: Select all

Linking:PositionGenerator -
 Link successful. The GLSL vertex shader will run in hardware. The GLSL fragment
 shader will run in hardware.
Validate:PositionGenerator -
 Link successful. The GLSL vertex shader will run in hardware. The GLSL fragment
 shader will run in hardware. Validation successful.
Linking:CollisionGenerator -
 Link successful. The GLSL vertex shader will run in hardware. The GLSL fragment
 shader will run in hardware.
Validate:CollisionGenerator -
 Link successful. The GLSL vertex shader will run in hardware. The GLSL fragment
 shader will run in hardware. Validation successful.
Linking:CubeShader -
 Link failed.
Validate:CubeShader -
 Link failed. Validation failed - link has not been called or link has failed.
Failed to link shader.
Assertion failed: cubeShader -> compiledOK (), file .\GPU_physics_demo.cxx, line
 391
It seems that even using option -p under Mac OS X, the shader link fails due to the lack of vertex shader samplers. I will try an NVidia card later. It would have been nice if this ATI X1600 could work.

Code: Select all

erwin-coumans-computer:~/Downloads/GPUphysics apple$ ./GPU_physics_demo -p
Linking:CubeShader - 
ERROR: Implementation limit of 0 active vertex shader samplers (e.g., maximum number of supported image units) exceeded, vertex shader uses 2 samplers

Validate:CubeShader - 
ERROR: Implementation limit of 0 active vertex shader samplers (e.g., maximum number of supported image units) exceeded, vertex shader uses 2 samplers

Validation Failed: Program is not successfully linked.

Failed to link shader.
GPU_physics_demo.cxx:389: failed assertion `cubeShader -> compiledOK ()'
Abort trap
SteveBaker
Posts: 127
Joined: Sun Aug 13, 2006 4:41 pm
Location: Cedar Hill, Texas

Post by SteveBaker »

Code: Select all

Assertion failed: cubeShader -> compiledOK (), file .\GPU_physics_demo.cxx, line
 391
Hmm - so it says that the shader failed to link - but didn't tell us why?!
It seems that even using option -p under Mac OS X, the shader link fails due to the lack of vertex shader samplers. I will try an NVidia card later. It would have been nice if this ATI X1600 could work.
Well, the ability to read textures in the vertex shader is very useful - but we CAN do without it.

Code: Select all

ERROR: Implementation limit of 0 active vertex shader samplers (e.g., maximum number of supported image units) exceeded, vertex shader uses 2 samplers
Yes - so this is one of those "We support vertex shader samplers - and the number we support is zero."....thanks guys!
SteveBaker
Posts: 127
Joined: Sun Aug 13, 2006 4:41 pm
Location: Cedar Hill, Texas

Post by SteveBaker »

I'm getting a bit confused as to what is happening on what machines - so let me see if I have this right:

With '-c', both MacOSX and Windows produce a grey screen with a bunch of teeny-tiny cubes sitting stationary in the bottom half of the screen.

With '-p' (no physics - but with a vertex shader that reads textures):

1) Mac OSX doesn't mind compiling the 'cubeShader' shader code - but can't link/run it because its ATI card doesn't support more than zero textures ("samplers") being read into the vertex shader. That's not unexpected - ATI hardware often lags nVidia on features like this.

2) Under Windows (you talk about 'rebooting' - so this is the same hardware presumably?) - it also won't link - but it doesn't say why.

But if the underlying hardware is the same in both cases - and if your ATI graphics chip truly can't do vertex shader textures/samplers - then the difference between OSX and Windows is only that the OSX driver is a bit more helpful about the nature of the problem.

When you use '-f', other shaders are also required - they compile OK in both OS's (although OSX insists on telling you about it - where Windows is silent) - but the 'cubeShader' still kills us.

I think this is all due to the ATI card - I think that if you have an nVidia card, it will all "just work".

I'll try to find some time to eliminate the texture read in the vertex shader on hardware that doesn't support it - but it will be a less impressive demo on those kinds of hardware because the rotations and translations that were computed on the GPU will have to be read back into the CPU in order for them to affect the positions of the cube models. Also, in order for each cube to move differently, I need to send each one individually down the graphics pipe with 'glRotate/glTranslate' calls stuffed in between them. That's going to be a lot slower than shoving all 16,000 of them down into the GPU and issuing a single, simple call to say "Draw all of them - using positions coming from this texture and rotations coming from that texture".
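
Very roughly, the fallback path would look like this - just a sketch, and the texture handles, texture layout and the draw_one_cube() helper are made-up names rather than the demo's real code:

Code: Select all

#include <GL/glew.h>

#define NUM_CUBES 16000
#define TEX_SIZE  128                   /* 128 x 128 texels >= 16,000 cubes */

extern GLuint position_texture_handle, rotation_texture_handle ;  /* hypothetical */
extern void   draw_one_cube () ;                                  /* hypothetical */

static float positions [ TEX_SIZE * TEX_SIZE * 4 ] ;   /* RGBA = x, y, z, spare  */
static float rotations [ TEX_SIZE * TEX_SIZE * 4 ] ;   /* RGBA = rotation angles */

void draw_cubes_the_slow_way ()
{
  /* Two whole-texture readbacks - this is one expensive part... */

  glBindTexture ( GL_TEXTURE_2D, position_texture_handle ) ;
  glGetTexImage ( GL_TEXTURE_2D, 0, GL_RGBA, GL_FLOAT, positions ) ;
  glBindTexture ( GL_TEXTURE_2D, rotation_texture_handle ) ;
  glGetTexImage ( GL_TEXTURE_2D, 0, GL_RGBA, GL_FLOAT, rotations ) ;

  /* ...and 16,000 separate trips down the pipe is the other. */

  for ( int i = 0 ; i < NUM_CUBES ; i++ )
  {
    glPushMatrix () ;
    glTranslatef ( positions [4*i+0], positions [4*i+1], positions [4*i+2] ) ;
    glRotatef    ( rotations [4*i+0], 1, 0, 0 ) ;
    glRotatef    ( rotations [4*i+1], 0, 1, 0 ) ;
    glRotatef    ( rotations [4*i+2], 0, 0, 1 ) ;
    draw_one_cube () ;
    glPopMatrix () ;
  }
}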

But at least you'll be able to see something working.

In practical games/simulation applications, the CPU is likely to need to know where objects are for other reasons - so it's likely that this isn't a practical problem - just something that's annoying for the demo.
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Post by Erwin Coumans »

Hi Steve,

Indeed, the underlying hardware/driver on the ATI X1600 seems to not want to link, and OS X gives more feedback on why.
In practical games/simulation applications, the CPU is likely to need to know where objects are for other reasons - so it's likely that this isn't a practical problem - just something that's annoying for the demo.
Well, I was hoping to tell the GPU which objects require feedback/readback (in a texture). Most interactions are probably not interesting for the CPU, except the ones involving game play elements (player, car, etc).

I just did a test on a Nvidia under Windows. The first 2 tests work, and test 3/4 run with a grey screen. This is the glew info
So -s and -p work fine on that card.

Thanks,
Erwin
SteveBaker
Posts: 127
Joined: Sun Aug 13, 2006 4:41 pm
Location: Cedar Hill, Texas

Post by SteveBaker »

Erwin Coumans wrote:
In practical games/simulation applications, the CPU is likely to need to know where objects are for other reasons - so it's likely that this isn't a practical problem - just something that's annoying for the demo.
Well, I was hoping to tell the GPU which objects require feedback/readback (in a texture). Most interactions are probably not interesting for the CPU, except the ones involving game play elements (player, car, etc).
I'm surprised you say that - I'd have guessed the reverse. In the things that I've been playing with, eye candy effects don't seem like they'd generally need the full power of a physics engine - so this technique (for me) was mostly about accelerating the kinds of physics that the game engine *IS* interested in....but I guess a lot depends on what kind of thing you have in mind.
I just did a test on a Nvidia under Windows. The first 2 tests work, and test 3/4 run with a grey screen.
Well, that's a step forward at least. So we know that vertex textures work - but my render-to-texture stuff doesn't. OK - well I can add some more tests. It's painful - but at least we're making progress.

Once this core set of routines all work (or at least when we understand why they don't), progress should be a lot faster.

Meanwhile, I have a new version that avoids the vertex texture thing.

Unfortunately I can't upload it onto my machine at home because I'm at work now and our firewall is a bit over-enthusiastic. I don't seem to be able to post attachments via the Bullet forum - and I don't have your email address to email it to you - so I guess you'll have to wait until I get home in a few hours.

It works - but it seems to be running about six times slower than the regular version! :-(
(145Hz with vertex textures, 23Hz without - for 16,000 cubes).

Hopefully I can squeeze some time out of that - I did it kinda crudely just to get it working quickly.

Anyway - I'll put it up on my web site tonight and let you know when it's there.

I don't see any significant problems from the glewinfo report - I ran the same program on one of my Linux/nVidia boxes and got an almost identical report back. (Apart from all of the glX/wgl extensions of course). Your hardware claims to be a 6800 - mine is a 6800 Ultra, and you have a slightly newer version of GLEW than I do. Your OpenGL claims to be 2.0.3 and mine is 2.0.1 with driver version 81.78. Your computer says it's using the PCI bus to talk to the graphics card - mine is using AGP.

None of those things seem to be significant.

Do you get any messages out of my code?

The one thing that I *suspect* may be a bit 'iffy' about how I'm doing things is that when I do something like 'new_position = old_position + velocity * delta_T' - I bind the same texture as the input ('old_position') and the output ('new_position'). This is potentially dubious - and it's something I never tried to do before - but it works just fine in Linux/nVidia - so I assumed it was OK. The savings aren't all that significant - so maybe I'll read the fine print of the OpenGL extension to find out whether it's actually legal - and perhaps re-jigger the code so it doesn't do that.
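
For what it's worth, the usual way to dodge that question entirely is to 'ping-pong' between two textures, each attached to its own FBO - read from one, write into the other, then swap. A sketch (the handle names and the run_position_shader() helper are invented, not the demo's code):

Code: Select all

#include <GL/glew.h>

extern GLuint position_tex [ 2 ] ;     /* two identical float textures (hypothetical) */
extern GLuint position_fbo [ 2 ] ;     /* one FBO per texture          (hypothetical) */
extern void   run_position_shader ( float delta_t ) ;  /* draws the full-screen quad  */

static int src = 0 ;

void update_positions ( float delta_t )
{
  int dst = 1 - src ;

  /* write into texture 'dst'... */
  glBindFramebufferEXT ( GL_FRAMEBUFFER_EXT, position_fbo [ dst ] ) ;

  /* ...while reading the old positions from texture 'src' */
  glBindTexture ( GL_TEXTURE_2D, position_tex [ src ] ) ;

  /* runs 'new_position = old_position + velocity * delta_T' over every texel */
  run_position_shader ( delta_t ) ;

  glBindFramebufferEXT ( GL_FRAMEBUFFER_EXT, 0 ) ;

  src = dst ;   /* this frame's output becomes next frame's input */
}

It costs an extra texture's worth of memory, but it never relies on reading and writing the same texture in one pass.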
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Post by Erwin Coumans »

SteveBaker wrote:
Erwin Coumans wrote:
In practical games/simulation applications, the CPU is likely to need to know where objects are for other reasons - so it's likely that this isn't a practical problem - just something that's annoying for the demo.
Well, I was hoping to tell the GPU which objects require feedback/readback (in a texture). Most interactions are probably not interesting for the CPU, except the ones involving game play elements (player, car, etc).
I'm surprised you say that - I'd have guessed the reverse. In the things that I've been playing with, eye candy effects don't seem like they'd generally need the full power of a physics engine - so this technique (for me) was mostly about accelerating the kinds of physics that the game engine *IS* interested in....but I guess a lot depends on what kind of thing you have in mind.
I think GPU physics is most useful in destruction and debris effects.
And also in adding additional small street furniture - chairs, post boxes, little bins, street signs, etc. - that is hanging around. You don't need their transforms unless a car/player hits them.

Say a building partially collapses, and lots of debris is flying around and settles down (bricks etc). It is nice if the debris stacks, interacts and settles down nicely. A simple particle system is typically too simple for this task: GPU physics includes rotations and stacking, and can approximate shapes better.
For most of this debris you are not interested in the transforms, unless it hits the player.
I just did a test on a Nvidia under Windows. The first 2 tests work, and test 3/4 run with a grey screen.
Well, that's a step forward at least. So we know that vertex textures work - but my render-to-texture stuff doesn't. OK - well I can add some more tests. It's painful - but at least we're making progress.

Once this core set of routines all work (or at least when we understand why they don't), progress should be a lot faster.

Meanwhile, I have a new version that avoids the vertex texture thing.

Unfortunately I can't upload it onto my machine at home because I'm at work now and our firewall is a bit over-enthusiastic. I don't seem to be able to post attachments via the Bullet forum - and I don't have your email address to email it to you - so I guess you'll have to wait until I get home in a few hours.

It works - but it seems to be running about six times slower than the regular version! :-(
(145Hz with vertex textures, 23Hz without - for 16,000 cubes).

Hopefully I can squeeze some time out of that - I did it kinda crudely just to get it working quickly.

Anyway - I'll put it up on my web site tonight and let you know when it's there.

I don't see any significant problems from the glewinfo report - I ran the same program on one of my Linux/nVidia boxes and got an almost identical report back. (Apart from all of the glX/wgl extensions of course). Your hardware claims to be a 6800 - mine is a 6800 Ultra, and you have a slightly newer version of GLEW than I do. Your OpenGL claims to be 2.0.3 and mine is 2.0.1 with driver version 81.78. Your computer says it's using the PCI bus to talk to the graphics card - mine is using AGP.

None of those things seem to be significant.

Do you get any messages out of my code?

The one thing that I *suspect* may be a bit 'iffy' about how I'm doing things is that when I do something like 'new_position = old_position + velocity * delta_T' - I bind the same texture as the input ('old_position') and the output ('new_position'). This is potentially dubious - and it's something I never tried to do before - but it works just fine in Linux/nVidia - so I assumed it was OK. The savings aren't all that significant - so maybe I'll read the fine print of the OpenGL extension to find out whether it's actually legal - and perhaps re-jigger the code so it doesn't do that.
Thanks for the effort. No additional messages, just a grey screen on NVidia. I had to #ifdef a few things under Windows: the fopen requires "r" instead of "ra" as the mode, there is no random() (only rand()), and the includes for time.h and unistd.h aren't available. But that's less than a minute's work each time you give me a new drop ;-)

Would it help if I add GPUphysics code into Bullet Subversion, in the Bullet/Extra/GPUphysics?
Erwin

PS: Sorry for not finishing the ForkLift yet. I added documentation and reserved files for the demo, but didn't have the time to actually implement it.
SteveBaker
Posts: 127
Joined: Sun Aug 13, 2006 4:41 pm
Location: Cedar Hill, Texas

Post by SteveBaker »

Erwin Coumans wrote: Well, I was hoping to tell the GPU which objects require feedback/readback (in a texture). Most interactions are probably not interesting for the CPU, except the ones involving game play elements (player, car, etc).
OK - well with the nVidia boards we can certainly do that. But I've checked carefully and the word is that NO ATI GPU's support vertex textures. So I guess we're waiting for them to play catchup here. But then the nVidia '5xxx' series don't support this stuff either - so it was always going to be something we'd have to be able to turn off for any widespread audience.
I think GPU physics is most useful in destruction and debris effects.
And also in adding additional small street furniture - chairs, post boxes, little bins, street signs, etc. - that is hanging around. You don't need their transforms unless a car/player hits them.
Yeah - I guess. Destruction is certainly the thing that concerns me the most when I'm wearing my 'work hat'. Accurate simulation of building damage is a hot button item in Military simulation.
Say a building partially collapses, and lots of debris is flying around and settles down (bricks etc). It is nice if the debris stacks, interacts and settles down nicely. A simple particle system is typically too simple for this task: GPU physics includes rotations and stacking, and can approximate shapes better.
For most of this debris you are not interested in the transforms, unless it hits the player.

It's the "unless they hit the player" thing that bothers me - but I guess if the collision detection happens in the GPU too then we can arrange only to bring stuff back into the CPU if there actually was a collision. The other place it might be of concern would be in route-planning for AI entities. If a building collapse blocks their route - they need to know about it. If a hole appears in a building - providing an alternate route - they need to know that too.
Thanks for the effort. No additional messages, just a grey screen on NVidia. I had to #ifdef a few things under Windows: the fopen requires "r" instead of "ra" as the mode, there is no random() (only rand()), and the includes for time.h and unistd.h aren't available. But that's less than a minute's work each time you give me a new drop ;-)
Oh - yeah. I forgot. I know about most of those little gotchas - I'll try to nail them in 0.4 tonight.
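
Something along these lines near the top of the sources ought to cover most of them - just a sketch, not tested on a Windows box, and the open_shader_source() name is only for illustration:

Code: Select all

/* Rough portability shim for the things Erwin hit under Windows. */

#ifdef _WIN32
  #define random() rand ()   /* Windows runtimes have rand() but no random() */
#else
  #include <unistd.h>        /* only include this where it actually exists   */
#endif

#include <stdio.h>
#include <stdlib.h>

/* ...and open files with a plain "r" mode - the "ra" string isn't portable. */

FILE *open_shader_source ( const char *filename )
{
  return fopen ( filename, "r" ) ;
}
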
Would it help if I add GPUphysics code into Bullet Subversion, in the Bullet/Extra/GPUphysics?
That's a good idea. My sourceforge account name is 'sjbaker'.
PS: Sorry for not finishing the ForkLift yet. I added documentation and reserved files for the demo, but didn't have the time to actually implement it.
That's OK - I have enough other things to keep me busy!
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Post by Erwin Coumans »

It's the "unless they hit the player" thing that bothers me - but I guess if the collision detection happens in the GPU too then we can arrange only to bring stuff back into the CPU if there actually was a collision. The other place it might be of concern would be in route-planning for AI entities. If a building collapse blocks their route - they need to know about it. If a hole appears in a building - providing an alternate route - they need to know that too.
Probably I was not clear: I was assuming we do the broadphase on the CPU. Based on the axis-aligned bounding boxes, we can upload a 2D bitmap that tells whether a pair of objects can potentially collide or not. This same bitmap can also encode whether we are interested in collision feedback.

At the end of the frame, we just read back the AABBs to update the broadphase on the CPU, and read back the transforms ONLY for the objects that we expressed interest in. This is encoded in the broadphase bitmap.

Please also see the broadphase description on page 10 of the Bullet user manual.

At the moment, the Broadphase interfaces with the OverlappingPairCache. We could modify this with a GpuOverlappingPairCache, where we implement

Code: Select all

void	AddOverlappingPair(BroadphaseProxy* proxy0,BroadphaseProxy* proxy1);
void	RemoveOverlappingPair(BroadphaseProxy* proxy0,BroadphaseProxy* proxy1);
and this just sets or resets a bit in a 2D bitmap. This bitmap gets uploaded to the GPU each frame. In addition to this, the bitmap can encode whether we are interested in feedback. This can be done because the bitmap has room for additional information: the upper part above the main diagonal holds the broadphase info, and the lower part below the main diagonal holds the 'additional' collision feedback request info. So the player has a '1' in its row and column of that matrix.
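
To make that concrete, here's a minimal sketch of what such a GpuOverlappingPairCache could look like. The byte-per-pair layout and the RequestFeedback() name are just one possible way to do it, and it uses plain integer object ids instead of BroadphaseProxy* for brevity:

Code: Select all

#include <algorithm>   // std::min, std::max

/* One byte per pair here for clarity - a real version would pack bits.
   Upper triangle (row < col) : broadphase overlap between the two objects.
   Lower triangle (row > col) : the CPU wants collision feedback for that pair. */

class GpuOverlappingPairCache
{
  int            m_numObjects ;
  unsigned char *m_grid ;

  void set ( int row, int col, unsigned char v ) { m_grid [ row * m_numObjects + col ] = v ; }

public:

  GpuOverlappingPairCache ( int numObjects )
    : m_numObjects ( numObjects ),
      m_grid ( new unsigned char [ numObjects * numObjects ] () ) {}

  ~GpuOverlappingPairCache () { delete [] m_grid ; }

  void AddOverlappingPair    ( int id0, int id1 ) { set ( std::min (id0,id1), std::max (id0,id1), 1 ) ; }
  void RemoveOverlappingPair ( int id0, int id1 ) { set ( std::min (id0,id1), std::max (id0,id1), 0 ) ; }

  /* lower triangle: ask the GPU to write back collision results for this pair */
  void RequestFeedback       ( int id0, int id1 ) { set ( std::max (id0,id1), std::min (id0,id1), 1 ) ; }
} ;

Uploading the whole grid could then be a single glTexSubImage2D of numObjects x numObjects bytes per frame.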
SteveBaker
Posts: 127
Joined: Sun Aug 13, 2006 4:41 pm
Location: Cedar Hill, Texas

Post by SteveBaker »

Erwin Coumans wrote: Probably I was not clear: I was assuming we do the broadphase on the CPU. Based on the axis-aligned bounding boxes, we can upload a 2D bitmap that tells whether a pair of objects can potentially collide or not. This same bitmap can also encode whether we are interested in collision feedback.
But with massive parallelism (think 48 CPU's!) - we need to think a bit differently.

Since all of the GPU's processors run in lockstep (it's a SIMD machine), the test to decide whether or not to do something takes time - whereas doing it redundantly comes 'for free'. So rather than flagging which pairs of objects to compare, it'll probably be significantly FASTER to just mindlessly compare them all!

Also, the results come back as one big block - all of which arrives in parallel. Again - it's vastly faster to mindlessly read back all of the results than it is to read back a flag, test the flag, and then conditionally read back results.

We can't treat the GPU as a fast CPU - in fact, each individual processor isn't really all that fast. Where we get speed is from the insane parallelism (48 CPU's!).
At the end of the frame, we just read back the AABBs to update the broadphase on the CPU, and read back the transforms ONLY for the objects that we expressed interest in. This is encoded in the broadphase bitmap.
When reading things back into the CPU from the GPU, the setup time totally dwarfs the bandwidth considerations (especially on PCI-express). So rather than picking the half dozen objects we care about and incurring a half dozen setup costs - it'll probably be significantly faster to mindlessly read them all back - incurring just one setup cost.

But the tradeoff depends on the ratio of the number of objects out there to the number we care about, compared to the ratio of data transfer time to setup time.
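
In other words, something like this - one bulk transfer, then cherry-pick on the CPU side. (A sketch only; the position_fbo handle and the texture layout are invented, not real code from the demo.)

Code: Select all

#include <GL/glew.h>

#define TEX_SIZE 128                          /* 128 x 128 texels >= 16,000 objects */

extern GLuint position_fbo ;                  /* hypothetical FBO holding positions */

static float all_positions [ TEX_SIZE * TEX_SIZE * 4 ] ;

void readback_everything ()
{
  /* ONE setup cost, one big transfer...                                  */
  glBindFramebufferEXT ( GL_FRAMEBUFFER_EXT, position_fbo ) ;
  glReadPixels ( 0, 0, TEX_SIZE, TEX_SIZE, GL_RGBA, GL_FLOAT, all_positions ) ;
  glBindFramebufferEXT ( GL_FRAMEBUFFER_EXT, 0 ) ;

  /* ...then picking out the half dozen objects we actually care about
     from 'all_positions' on the CPU is essentially free.                 */
}
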
Please also see the broadphase description on page 10 of the Bullet user manual.
Comparison of a vast number of AABB's to one particular AABB is something the GPU can do insanely fast. So the cost to compare every AABB with every other AABB is going to be fairly small - even if we did it mindlessly and without any kind of finesse at all.

So Broadphase and midphase might end up being parallelizable for relatively low cost. (Of the order of the cost of one or two polygons per object - which has to be seen in the context of the time needed to render that object - which might be made of dozens of polygons.)
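
The per-pair test itself is about as cheap as tests get - on the GPU, each fragment of an NxN result texture would perform exactly one of these, all in parallel. Here's the 'mindless' version written out in CPU terms, just to show how little work each comparison is:

Code: Select all

struct AABB { float min [ 3 ], max [ 3 ] ; } ;

bool aabbs_overlap ( const AABB &a, const AABB &b )
{
  return a.min[0] <= b.max[0] && b.min[0] <= a.max[0] &&
         a.min[1] <= b.max[1] && b.min[1] <= a.max[1] &&
         a.min[2] <= b.max[2] && b.min[2] <= a.max[2] ;
}

/* Compare every AABB against every other - no finesse at all. */

void brute_force_broadphase ( const AABB *boxes, int n, unsigned char *overlaps )
{
  for ( int i = 0 ; i < n ; i++ )
    for ( int j = 0 ; j < n ; j++ )
      overlaps [ i * n + j ] = aabbs_overlap ( boxes [ i ], boxes [ j ] ) ;
}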

Narrowphase seems like something we need to do on the CPU...but I don't know enough about this stuff.