I did some side-by-side comparisons on HP xw9300 Linux (RHEL WS 4) systems: one running a new 8800GTX card, the other an identical system with a Quadro FX 4500. It was not a standard benchmark, but a real application, our visual simulation. One model I tested has the following stats:
 543975 vertices      1728 groups             31 materials
 551774 tex coords      70 smoothing groups  458189 triangles
1386128 normals      462333 faces              3193 quads
    951 polygons          0 lines              2225 geo prims
On the Quadro, with standard OpenGL hardware rendering (single light, single-sided lighting, anti-aliasing on/off, one texture, etc.) I get a nice 60 FPS (with vsync on). With the 8800GTX, I get only 15 FPS. Apples to apples.
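To put those numbers in perspective, here is the same comparison as per-frame time budgets. This is just arithmetic on the frame rates quoted above, not an additional measurement; note the Quadro figure is vsync-capped, so its true headroom could make the gap even wider:

```python
# Convert the measured frame rates to per-frame times to show the
# size of the gap. Plain arithmetic on the numbers quoted above.
def frame_time_ms(fps):
    """Milliseconds spent per frame at a given frame rate."""
    return 1000.0 / fps

quadro_ms = frame_time_ms(60)   # ~16.7 ms (and vsync-capped, so possibly less)
gtx_ms    = frame_time_ms(15)   # ~66.7 ms
print(f"Quadro FX 4500: {quadro_ms:.1f} ms/frame")
print(f"8800GTX:        {gtx_ms:.1f} ms/frame ({gtx_ms / quadro_ms:.0f}x slower)")
```

So the 8800GTX is spending at least four times as long per frame on the same fixed-function geometry load.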
We have another, larger model where the performance difference was as bad or worse.
Now, when I start loading down the simulation with shadow maps, cube maps, and GLSL shaders, the 8800GTX starts to shine, giving me roughly 100% faster frame rates (6 FPS vs. 3 FPS). I had previously identified the shader as the bottleneck for sure.
My question is: what is it about the "gamer" cards that gives such poor performance on large-polycount models? This was also the case when I tried a 7800GTX once before.
I know we will eventually buy the new Quadros when they come out (we need overlays, AA lines, and stereo), but I would just like to better understand the 8800GTX's (intentional?) limitations, or what I'm doing wrong in the GL set-up. A magic "Oh, you need to turn off <blank> for the 8800!" is exactly what I'm hoping for. We have about 10 SLI machines running the sim. It would be very helpful in the short term to equip some of them with 8800s for shader performance until the G80 Quadros arrive.
Incidentally, the 8800GTX did very well with large Framebuffer Object render-to-texture maps. Even the Quadro FX 5500 we have (1 GB) would not handle 2 simultaneous 4K x 4K shadow (depth24) maps plus a 4K x 4K (RGB8) dynamic cube map...but the 8800GTX did just fine.
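A rough back-of-the-envelope for why the 1 GB card struggles with those render targets. The byte-per-texel figures here are my assumptions (drivers commonly pad both depth24 and RGB8 to 4 bytes per texel), not something from the driver docs, and actual allocation and padding vary:

```python
# Rough VRAM footprint of the render targets described above.
# Assumption: the driver stores both depth24 and RGB8 texels padded
# to 4 bytes each; a cube map has six 4K x 4K faces.
BYTES_PER_TEXEL = 4
SIDE = 4096                                        # 4K x 4K

shadow_maps = 2 * SIDE * SIDE * BYTES_PER_TEXEL    # two depth maps
cube_map    = 6 * SIDE * SIDE * BYTES_PER_TEXEL    # six cube faces

print(f"shadow maps: {shadow_maps // 2**20} MiB")  # 128 MiB
print(f"cube map:    {cube_map // 2**20} MiB")     # 384 MiB
print(f"total:       {(shadow_maps + cube_map) // 2**20} MiB")  # 512 MiB
```

Under those assumptions the render targets alone eat roughly half of the 1 GB, before geometry, textures, and the framebuffer itself, so it's plausible the FX 5500 simply ran out of room where the 8800GTX's memory management coped.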
Hurray, nVidia, champion of Linux.
Long live OpenGL!