squeen 01-13-07 12:28 PM

I did some side-by-side comparisons of HP xw9300 Linux (RHEL WS 4) systems running a new 8800GTX card vs. an identical system with a Quadro FX 4500. It was not a standard benchmark, but a real application of our visual simulation. One model I tested in particular has the following stats:

543975 vertices 1728 groups 31 materials
551774 tex coords 70 smoothing groups 458189 triangles
1386128 normals 462333 faces 3193 quads
951 polygons 0 lines 2225 geo prims

On the Quadro, with standard OpenGL hardware rendering (single light, single-sided lighting, anti-aliasing on/off, one texture, etc.), I get a nice 60 FPS (with Vsync on). With the 8800GTX, I get only 15 FPS. Apples to apples.

We have another, larger model, in which the performance difference was as bad or worse.

Now when I start loading down the simulation with shadow maps, cube maps and GLSL shaders, the 8800GTX starts to shine, giving me roughly 100% faster frame rates (6 FPS vs. 3 FPS). I had previously identified the shader as the bottleneck for sure.

My question is: what is it about the "gamer" cards that gives poor performance on large polycount models? This was the case when I tried a 7800GTX once before as well.

I know we will eventually buy the new Quadros when they come out (we need overlays, AA lines and stereo), but I would just like to better understand the 8800GTX's (intentional?) limitations or what I'm doing wrong in the GL set-up. A magic "Oh, you need to turn off <blank> for the 8800!" is exactly what I'm hoping for. We have about 10 SLI machines running the sim. It would be very helpful in the short term to equip some of them with 8800s for shader performance until the G80 Quadros arrive.

Incidentally, the 8800GTX did very well with large Framebuffer Object render-to-texture maps. Even the Quadro FX 5500 which we have (1 GB) would not handle 2 simultaneous 4K x 4K shadow (depth24) maps and a 4K x 4K (RGB8) dynamic cube map... but the 8800GTX did just fine.
Hurray, nVidia, champion of Linux. :) Long live OpenGL!
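
For a sense of scale, here is a rough estimate of how much memory the render targets listed above would occupy. This is only a sketch under assumptions that are not in the post: depth24 texels are padded to 4 bytes, RGB8 texels take either 3 bytes or 4 bytes (if the driver pads them), and nothing is mipmapped. Actual driver allocations may differ.

```python
# Rough texture memory estimate for two 4K x 4K depth maps plus one
# 4K x 4K cube map. All byte-per-texel figures are assumptions.

MIB = 1024 * 1024

def texture_bytes(width, height, bytes_per_texel, layers=1):
    """Size in bytes of an unmipmapped texture with `layers` images."""
    return width * height * bytes_per_texel * layers

side = 4096
shadow_maps = 2 * texture_bytes(side, side, 4)       # two depth24 maps, padded
cube_rgb = texture_bytes(side, side, 3, layers=6)    # tightly packed RGB8
cube_rgbx = texture_bytes(side, side, 4, layers=6)   # RGB8 padded to 32 bits

print(shadow_maps // MIB)                 # 128
print((shadow_maps + cube_rgb) // MIB)    # 416
print((shadow_maps + cube_rgbx) // MIB)   # 512
```

Under these assumptions the three render targets alone take roughly 400 to 500 MiB, before counting the framebuffer, model geometry or ordinary textures.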

squeen 01-17-07 05:35 AM

Is this the expected behavior?

Lithorus 01-17-07 09:32 AM

I wouldn't be surprised if the Quadro models have better handling of large models; it's after all what they are aimed at. What would be interesting is whether a soft-modded GeForce would behave similarly. Also remember that the 8800 and the FX 4500 are from two different families: the 8800 is G80 and the FX 4500 is G70. It will be interesting to see the G80 variant as a Quadro (a few months from now, I've heard).

cohen 01-17-07 02:05 PM

A friend of mine had a similar problem. It turned out he was using a somewhat outdated rendering mode that was no longer the fastest path. What sort of rendering are you doing? Are you using VBOs?


squeen 01-17-07 04:13 PM

@Lithorus: When I (once or twice) tried using a 7800GTX before we got the Quadros (same models) I had exactly the same problem with very slow frame rates. I actually remember thinking it was busting the display list memory size limit (if there is such a thing).

@cohen: No, I'm not using VBOs right now, I'm using display lists (outdated?). If that's an issue, I'd love to know, because I can start looking at switching to (static) VBOs... non-trivial, but it would be worth it.

I have also heard that VBOs and triangle strips are mutually exclusive. Anyone know if that's true?

Thank you both for the replies.
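
One part of the display-list-to-VBO switch can be sketched without any GL calls. The model stats in the first post list different counts for vertices, tex coords and normals, which suggests (this is an assumption) that each attribute has its own index list. OpenGL vertex arrays use a single index per vertex for all attributes, so each distinct (position, texcoord, normal) combination has to become its own vertex. A minimal sketch, with the hypothetical helper name `unify_indices`:

```python
def unify_indices(corners):
    """Map (position, texcoord, normal) index triples to single indices.

    `corners` is a flat list of triples, three per triangle. Returns
    (unique_triples, index_buffer): each distinct triple once, in
    first-seen order, plus one index per corner referring into that list.
    """
    remap = {}
    unique_triples = []
    index_buffer = []
    for triple in corners:
        idx = remap.get(triple)
        if idx is None:
            idx = len(unique_triples)
            remap[triple] = idx
            unique_triples.append(triple)
        index_buffer.append(idx)
    return unique_triples, index_buffer

# Two triangles sharing an edge; corners (1, 1, 0) and (2, 2, 0) repeat.
corners = [(0, 0, 0), (1, 1, 0), (2, 2, 0),
           (2, 2, 0), (1, 1, 0), (3, 3, 0)]
unique, indices = unify_indices(corners)
print(len(unique))  # 4
print(indices)      # [0, 1, 2, 2, 1, 3]
```

The attribute data gathered for `unique` would go into the vertex buffer and `indices` into the element buffer.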

cohen 01-17-07 04:30 PM

I do think you will probably get your performance back using VBOs. I know that display lists are convenient (if a bit slow to initialize, and a bit of a memory hog). I'm not sure why there would be such a big discrepancy, though. BTW, you certainly can use triangle strips with VBOs. They can reduce the space/bandwidth required, but may not improve your performance compared to indexed vertex arrays that are well-ordered (to maximize vertex cache coherence).
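
To make "well-ordered" a bit more concrete, the usual yardstick is the average cache miss ratio (ACMR): vertex cache misses per triangle. The sketch below simulates a simple FIFO cache over an index list. The 16-entry cache size and the FIFO policy are assumptions for illustration only, and real hardware may behave differently.

```python
import random
from collections import deque

def acmr(indices, cache_size=16):
    """Vertex cache misses per triangle for an assumed FIFO cache model."""
    cache = deque(maxlen=cache_size)
    misses = 0
    for i in indices:
        if i not in cache:
            misses += 1
            cache.append(i)  # the oldest entry drops out automatically
    return misses / (len(indices) // 3)

def grid_triangles(n):
    """Index list for an n x n grid of quads, two triangles each, row-major."""
    stride = n + 1
    out = []
    for y in range(n):
        for x in range(n):
            a = y * stride + x
            b = a + 1
            c = a + stride
            d = c + 1
            out += [a, b, c, b, d, c]
    return out

ordered = grid_triangles(32)

# Same triangles in a random order, to mimic a poorly ordered mesh.
tris = [ordered[i:i + 3] for i in range(0, len(ordered), 3)]
random.Random(0).shuffle(tris)
shuffled = [i for t in tris for i in t]

print(acmr(ordered), acmr(shuffled))
```

The row-major ordering should score far lower than the shuffled one. A lower ACMR means fewer vertices get transformed more than once.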


squeen 01-18-07 09:12 AM

Great info. I'll give VBOs a try. Do you have a reference for proper vertex sorting?

Thanks again.

cohen 01-18-07 10:50 AM

A standard reference on vertex sorting is:

Hoppe, H. 1999. Optimization of mesh locality for transparent vertex caching. In Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '99). ACM Press/Addison-Wesley Publishing Co., New York, NY, 269-276. DOI=http://doi.acm.org/10.1145/311535.311565

There's an NVIDIA tool at:


One can do a bit better in terms of optimal cache hit ratio, but the tool works pretty well and has a variety of useful options.


cohen 01-18-07 11:00 AM

BTW, my GeForce 7900 GTX beats my Quadro FX 4500 by a little bit on big mesh rendering throughput (idealized benchmark, not real application), so I don't believe this stuff about Quadro being better for large data. Quadro has a few specialized features that are useful for certain applications. If you really need those particular features, the expensive Quadro is for you. Otherwise, go with GeForce.

squeen 01-19-07 05:27 AM

I sincerely hope switching from display lists to VBOs does the trick (and thanks for the reference). My gut feeling is that there should be little difference between the two product lines except for AA lines/pts, stereo, and overlays (which we need). That's why this problem threw me for a loop.

It would be great if someone from the nVidia team would comment about GeForce and displaylists.

Thanks cohen.

jolle 01-19-07 05:35 AM

Softmodding my 6800GT makes a nice difference in Maya 8.
The main difference I've noticed is that when selecting a model, it freezes for a second or so before it brings up the highlighted wireframe on the smooth-shaded surface. This doesn't happen with the softmod running the card as a Quadro.
I assume this is because the GeForce drivers aren't really supposed to do any wireframe rendering or viewport rendering, while the Quadros are targeting this sort of stuff.

I missed out on the chance to try a softmod on the 8800 series while I was reviewing them, since the Quadro versions aren't out yet.
Looking forward to some tests when the softmod shows up, if anyone is up for it.

