3dfx Challenges GeForce2 T&L

Steve Mosher Responds

By: Brian Gray - May 11, 2000


The History and The Challenge

Things are getting a little heated out there. Voodoo Extreme's recent interview with Pete Wicher was full of loaded questions aimed at the competition. After a response from Steve Mosher of Creative Labs, I received an email in the nV News complaint box from Brian Burke of 3dfx. Here is what BB offered up:


I am a little confused by Steve's response and was hoping that your site could clarify the issue for everyone.

Steve says that:

"When asked about the T&L improvements in the GF2, I think Peter really misrepresented the facts, quoting some nonsense about low res Q3 numbers. The easiest way to see the increase in T&L power is to take a test that STRESSES T&L, like tree, and run it on both systems. Bottom Line: the core clock boost alone ( 120 to 200 ) gives you a tremendous boost in T&L power."

Peter said this:

"It looks like a frequency fix, going from 120MHz to 200MHz. That's a 67% jump in frequency but the performance increase doesn't track with the frequency increase. You can look at the independent reviews from Firingsquad and Anand. Q3 scores at low rez do NOT improve, but this is where you're T&L limited!

Is that an untrue statement? Are you not T&L limited at low rez in Q3? If it is incorrect, we would like to be corrected. The problem with Tree is that it's nVidia's own demo. It's pretty obvious that they can cook it to give any result that they want. Q3 and SOF are real games; they are truly impartial tests. What's especially interesting to us is that Mosher did not attempt to address nVidia's track record of missing their marketing claims, the GeForce2 GTS's problems with D3D title compatibility, or the fact that with AA enabled the V5 crushes the GeForce2 GTS in nVidia's own level of Q3.

And this was pulled from the nVidia press release:

Major 3D features of the GeForce2 GTS include: Second generation transform and lighting engines

So that no one is misrepresenting the facts, which is it: a "second generation T&L engine," or the result of the same T&L engine and "the core clock boost alone"?

And why doesn't the performance increase scale with the frequency...?

Bottomline???

3dfx seconds his suggestion to test claims, and encourages all sites to test manufacturers' claims and make manufacturers live up to the specs that they release.

Thanks
BB
Senior PR Manager
3dfx, Inc.


The Response

I quickly forwarded this to Steve Mosher, as I was not going to put words in his mouth.

Here is the response I received from Steve Mosher. It goes fairly in-depth on system limitations vs. T&L limitations of the GeForce2. Using the same benchmarks Pete Wicher cited to claim that the T&L throughput of the GeForce2 was limited, Steve shows the error in Pete's understanding of system performance.


Thanks for asking me to explain the low resolution Q3 numbers. Generally speaking, I think most if not all of the web sites out there (like nV News and the Firing Squad) understand the concept of being CPU limited, but it appears that some of the guys interpreting the Firing Squad numbers don't. I will use the Firing Squad numbers Pete Wicher cited to explain the concept, and hopefully shed some light on benchmarking.

First we need to understand a little bit about how processing happens in a game. The most important thing to note is that the graphics portion happens in parallel. That is, while the CPU is working on the AI and game logic, the graphics card is drawing. Because they happen in parallel, we have three possibilities:

  A. The CPU takes longer to finish its job.
  B. The graphics card takes longer to finish its job.
  C. They both take the same time... (and I win the lottery)

If we are in case A, then the graphics card always completes its drawing before the CPU completes its tasks. We call that being CPU bound. Your frame rate, when CPU bound, will be a strict function of your CPU speed. Increase the speed of your CPU and you will see a direct 1-to-1 increase in frame rate.


[Diagram: Case A]

For example, let's suppose the CPU takes 33.3 milliseconds and the graphics takes 5 milliseconds. Our frame rate will be 1000/33.3, or 30 frames per second. If we double the speed of the CPU, we cut 33.3 milliseconds down to 16.6. Since 16.6 ms is still greater than 5 ms, we are still CPU bound, but we double our frame rate to 1000/16.6, or 60 FPS. This is one way you tell if a game is CPU bound: crank up the CPU frequency and see what happens to the frame rate. Let's look for example at the Firing Squad numbers, since they did this. (Thanks to the Squad for a complete testing methodology!)
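To make that arithmetic concrete, here is a minimal sketch of the parallel-pipeline math (the frame_rate helper is made up for illustration, and the timings are simply the ones from Steve's example above, not measurements):

```c
#include <stdio.h>

/* Frame rate when CPU and graphics work in parallel: the frame time is
   set by whichever stage takes longer. */
static double frame_rate(double cpu_ms, double gfx_ms)
{
    double frame_ms = (cpu_ms > gfx_ms) ? cpu_ms : gfx_ms;
    return 1000.0 / frame_ms;
}

int main(void)
{
    /* Steve's example: 33.3 ms of CPU work, 5 ms of graphics work. */
    printf("Baseline:     %.0f FPS\n", frame_rate(33.3, 5.0)); /* ~30 FPS */

    /* Double the CPU speed: 16.6 ms of CPU work, still CPU bound. */
    printf("2x CPU clock: %.0f FPS\n", frame_rate(16.6, 5.0)); /* ~60 FPS */
    return 0;
}
```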


[Firing Squad benchmark table: CPU Limitations]

At this resolution the time spent by the graphics card is totally hidden by the time used by the CPU. You can see that most plainly by looking at the scaling with CPU speed. The increase in frame rate is directly proportional to the increase in CPU clock! (The quickest way to check this is as follows: compare 119/77 to 867/566. Always check this ratio!) What this means is that the frame rate is entirely determined by the CPU; the graphics card is waiting on the CPU at this resolution. Now let's compare the GeForce numbers with the GeForce2 GTS numbers at 512*384:


[Firing Squad benchmark table: GeForce vs. GeForce2]

How is this possible? Isn't the GeForce2 much faster? Well, it's simple. At this resolution you are CPU bound. In the case of the GeForce2 the graphics happens twice as fast, but the frame rate is still determined by the CPU speed. In fact, the graphics could happen INFINITELY FAST, and at 512*384 your frame rate would still be 119.
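Both checks from the last two paragraphs can be run as a quick sketch (the FPS and clock figures are the ones quoted above from the Firing Squad review; the 8.4 ms of CPU time is just 1000/119, and the fps helper is made up):

```c
#include <stdio.h>

/* Frame time is the longer of the two parallel stages. */
static double fps(double cpu_ms, double gfx_ms)
{
    return 1000.0 / (cpu_ms > gfx_ms ? cpu_ms : gfx_ms);
}

int main(void)
{
    /* Check 1: when CPU bound, frame rate should scale with CPU clock. */
    printf("FPS ratio:   %.2f\n", 119.0 / 77.0);  /* ~1.55 */
    printf("Clock ratio: %.2f\n", 867.0 / 566.0); /* ~1.53, so it tracks */

    /* Check 2: even infinitely fast graphics leaves ~119 FPS at 512*384,
       because the ~8.4 ms of CPU work per frame sets the pace. */
    printf("2 ms graphics: %.0f FPS\n", fps(8.4, 2.0));
    printf("1 ms graphics: %.0f FPS\n", fps(8.4, 1.0));
    printf("0 ms graphics: %.0f FPS\n", fps(8.4, 0.0));
    return 0;
}
```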

When you are CPU bound, it means you can crank the resolution up without a performance penalty. Why? It all comes back to the diagram above. If the CPU is taking 8.5 milliseconds to do its job (117 FPS) and the graphics is only taking 2 ms, then you should be able to quadruple the graphics workload and not see a drop in frame rate. The easiest way to increase the workload is to increase the resolution. This also serves as a check on whether we are CPU bound or not. Again, to the Firing Squad numbers:


[Firing Squad benchmark table: GeForce2 - CPU Limits]

These are essentially equal frame rates (probably within the bounds of timing accuracy, but nobody quotes the statistical variation in Q3 testing). At 512*384 you have 200K pixels. At 800*600 you have 480K pixels. The graphics is doing more work but not taking any more time. How is that possible? It's simple. When you are CPU bound, you can increase the time spent by the graphics without impacting the frame rate.

Back to the Firing Squad numbers. We see at 1024*768 (about 750K pixels) the frame rate starts to drop off a little, moving down to 113 FPS. We see that the frame rates for the GeForce2 do not drop off appreciably until 1280*1024, where it drops to 98 FPS. At this resolution you are drawing 1.3 million pixels, or over 6 times as many pixels as at 512*384.
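For reference, the pixel counts behind those resolutions work out as follows (a small sketch; the resolutions are the ones discussed above):

```c
#include <stdio.h>

int main(void)
{
    /* Resolutions discussed above and their pixel counts. */
    int res[][2] = { {512, 384}, {800, 600}, {1024, 768}, {1280, 1024} };
    int base = res[0][0] * res[0][1];   /* 512*384 = 196,608 pixels */

    for (int i = 0; i < 4; i++) {
        int pixels = res[i][0] * res[i][1];
        printf("%4d*%-4d  %9d pixels  %.1fx the 512*384 workload\n",
               res[i][0], res[i][1], pixels, (double)pixels / base);
    }
    return 0;
}
```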

To accurately measure the improvement in geometry throughput, you have to test at resolutions where the performance is dominated by the time spent on graphics. The difficulty with this, however, is that the GeForce2 has more than just a boost in geometry power. The rasterizer has multitexture improvements, and the back-end bandwidth has been raised. So, to measure the improvement in the geometry section of the chip, you have to select a test that is not CPU bound and NOT fillrate bound or texelrate bound.

The bottom line is this: if you want to judge the claims about performance in a geometry engine, you have to measure it with a test that can logically show a difference. At 512*384, you CANNOT measure differences between a GeForce and a GeForce2 because the time of the test is largely a function of the CPU. At higher resolutions you can see the improvement from added geometry power, but it is mixed with the other benefits of the GF2, namely better texel rates and better fill rates.
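One way to picture why neither extreme isolates the geometry engine (a purely conceptual sketch, not a model of the real pipeline; all three stage times are invented for illustration): treat the frame time as the longest of the CPU, geometry, and pixel-fill stages, and notice which stage hides the geometry difference at each resolution.

```c
#include <stdio.h>

/* Simplified model: frame time is limited by the slowest of three
   overlapping stages.  All stage times below are invented. */
static double fps(double cpu_ms, double geom_ms, double fill_ms)
{
    double t = cpu_ms;
    if (geom_ms > t) t = geom_ms;
    if (fill_ms > t) t = fill_ms;
    return 1000.0 / t;
}

int main(void)
{
    /* Low res: fill time is tiny, so the CPU hides the geometry difference. */
    printf("Low res,  old geometry: %.0f FPS\n", fps(8.5, 4.0, 1.0));
    printf("Low res,  new geometry: %.0f FPS\n", fps(8.5, 2.0, 1.0));

    /* High res: fill time dominates, so it hides the geometry gain instead. */
    printf("High res, old geometry: %.0f FPS\n", fps(8.5, 4.0, 10.0));
    printf("High res, new geometry: %.0f FPS\n", fps(8.5, 2.0, 10.0));
    return 0;
}
```

In this deliberately exaggerated model, whichever stage is slowest hides the others completely, which is why a geometry comparison needs a test where the geometry stage is the slow one.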

Again, I think it would be constructive to select a test that is geared toward measuring geometry rates and compare the GeForce to the GeForce2. Maybe we could sponsor a code-writing contest? I think I have a card I could give away. Maybe nV News, Voodoo Extreme, and Firing Squad could help judge the winner?

Steve Mosher
V.P., Graphics Business - Product Group
Creative Labs


Conclusion

There you have it. Any of you want to code a T&L test for a video card? Let me know.
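For anyone tempted to take Steve up on it, here is a rough starting point for the kind of test he describes, assuming a standard OpenGL/GLUT setup (the triangle count, window size, and overall structure are my own choices, not anything from the article): a display list full of tiny lit triangles, drawn into a small window so that transform and lighting work dominates over fill rate and per-frame CPU overhead.

```c
/* Rough T&L stress sketch: a display list of many small lit triangles,
   drawn into a tiny window so geometry work dominates over fill rate
   and per-frame CPU overhead. */
#include <GL/glut.h>
#include <stdio.h>

#define TRI_GRID 300                 /* arbitrary: 300x300 = 90,000 triangles */

static GLuint mesh_list;
static int frames = 0;

static void build_mesh(void)
{
    mesh_list = glGenLists(1);
    glNewList(mesh_list, GL_COMPILE);
    glBegin(GL_TRIANGLES);
    for (int i = 0; i < TRI_GRID; i++) {
        for (int j = 0; j < TRI_GRID; j++) {
            /* Tiny triangles: many vertices to transform and light,
               very few pixels to fill. */
            float x = i / (float)TRI_GRID - 0.5f;
            float y = j / (float)TRI_GRID - 0.5f;
            float s = 0.5f / TRI_GRID;
            glNormal3f(0.0f, 0.0f, 1.0f);
            glVertex3f(x, y, 0.0f);
            glVertex3f(x + s, y, 0.0f);
            glVertex3f(x, y + s, 0.0f);
        }
    }
    glEnd();
    glEndList();
}

static void display(void)
{
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glLoadIdentity();
    glTranslatef(0.0f, 0.0f, -2.0f);
    glRotatef(frames * 0.2f, 0.0f, 1.0f, 0.0f);
    glCallList(mesh_list);           /* all geometry submitted in one call */
    glutSwapBuffers();

    if (++frames % 100 == 0)
        printf("%d frames drawn\n", frames);  /* time externally for tris/sec */
}

int main(int argc, char **argv)
{
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB | GLUT_DEPTH);
    glutInitWindowSize(320, 240);    /* low resolution on purpose */
    glutCreateWindow("T&L stress sketch");

    glEnable(GL_DEPTH_TEST);
    glEnable(GL_LIGHTING);           /* exercise the hardware lighting path */
    glEnable(GL_LIGHT0);

    glMatrixMode(GL_PROJECTION);
    gluPerspective(45.0, 320.0 / 240.0, 0.1, 100.0);
    glMatrixMode(GL_MODELVIEW);

    build_mesh();
    glutDisplayFunc(display);
    glutIdleFunc(display);           /* redraw as fast as possible */
    glutMainLoop();
    return 0;
}
```

Compiling the mesh into a display list keeps the CPU out of the per-frame loop, which is exactly the property Steve says a geometry test needs.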

Regardless, Steve's answer should certainly clear up some of the misconceptions about Pete Wicher's comments on T&L limitations.

My question is, how do I always end up in the middle of this stuff? I am beginning to think I bring it on myself.



Last Updated on May 11, 2000

All trademarks used are properties of their respective owners.