GeForce 6 Series Question & Answer with NVIDIA

INTRODUCTION

nV News was given the opportunity to conduct a question and answer session with NVIDIA's Tony Tomasi, Senior Director of Desktop Product Management, and Ujesh Desai, General Manager of Desktop Graphics Processing Units. The questions are naturally related to the GeForce 6 Series, which was announced by NVIDIA on April 14, 2004.

I would like to thank the group of nV News staff and selected visitors who assisted us in compiling a list of questions for NVIDIA. A total of 30 questions were submitted and I had the unfortunate duty of trimming the list down to 10 as requested by NVIDIA. Since most of the submitted questions were technically oriented, I chose to keep the Q&A along those lines.

nV News:

What are the primary factors that would lead to a difference in performance between FP16 and FP32 calculations with the GeForce 6800?

Tony Tomasi:

FP16 uses less storage than FP32, and GeForce 6800 supports FP16 texture filtering and frame buffer blending in hardware. In particular, support for FP16 texture filtering and frame buffer blending can be a very significant performance win for high dynamic range applications. There are also some operations that can be performed faster, or in some cases "for free" using FP16 in the shading hardware. For example, partial precision normalize (FP16 normalize) is essentially "free" on GeForce 6800 hardware.
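The storage half of this answer can be illustrated with a short Python sketch. It is not NVIDIA code: it only uses the IEEE half (`e`) and single (`f`) formats from Python's standard `struct` module to compare the size and round-trip error of one component at each precision. The helper names are assumptions made for illustration.

```python
import struct

# Little-endian IEEE 754 formats from Python's struct module:
# 'e' is half precision (FP16), 'f' is single precision (FP32).
FP16 = "<e"
FP32 = "<f"


def component_bytes(fmt: str) -> int:
    """Bytes needed to store one component in the given format."""
    return struct.calcsize(fmt)


def roundtrip(value: float, fmt: str) -> float:
    """Store a value at the given precision and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def roundtrip_error(value: float, fmt: str) -> float:
    """Absolute error introduced by storing the value in the format."""
    return abs(roundtrip(value, fmt) - value)


if __name__ == "__main__":
    for name, fmt in (("FP16", FP16), ("FP32", FP32)):
        print(name, component_bytes(fmt), "bytes per component,",
              "error storing 0.1:", roundtrip_error(0.1, fmt))
```

FP16 halves the storage per component at the cost of a larger rounding error, which is the trade-off the answer describes.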

nV News:

X-bit labs has shown benchmark results (chart at bottom of page) of the GeForce 6800 Ultra generating more than its theoretical limit of 32 z-pixels per clock cycle when color writes are disabled. Are these results correct?

Tony Tomasi:

Depending on the impact of occlusion culling, one could measure rates beyond 32 pixels per clock of effective fill. But the Z-ROP hardware in GeForce 6800 is capable of 32 pixels per clock of z/stencil-only rendering. Rates beyond that would be due to other factors. High z/stencil rendering rates are a nice win for applications that do 2-pass shadow algorithms, like Doom3, which is why GeForce 6800 is capable of rendering z/stencil-only at such high rates.
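As a rough illustration of how a benchmark could report more than 32 pixels per clock, here is a toy Python model. The 32 pixels per clock figure comes from the answer above. The core clock value and the assumption that occlusion-culled pixels cost nothing and never reach the Z-ROPs are illustrative simplifications, not statements from NVIDIA.

```python
Z_PIXELS_PER_CLOCK = 32          # z/stencil-only rate quoted in the answer
ASSUMED_CLOCK_HZ = 400_000_000   # assumed core clock, for illustration only


def peak_z_fill(clock_hz: int, pixels_per_clock: int = Z_PIXELS_PER_CLOCK) -> int:
    """Peak z/stencil-only pixels per second the Z-ROPs could write."""
    return clock_hz * pixels_per_clock


def effective_pixels_per_clock(culled_fraction: float,
                               pixels_per_clock: int = Z_PIXELS_PER_CLOCK) -> float:
    """Apparent rate if a fraction of submitted pixels is culled for free.

    Toy model: culled pixels never reach the Z-ROPs, so the hardware only
    processes the remaining (1 - culled_fraction) of the submitted pixels.
    """
    if not 0.0 <= culled_fraction < 1.0:
        raise ValueError("culled_fraction must be in [0, 1)")
    return pixels_per_clock / (1.0 - culled_fraction)


if __name__ == "__main__":
    print("peak z fill (pixels/s):", peak_z_fill(ASSUMED_CLOCK_HZ))
    print("apparent pixels/clock with half culled:",
          effective_pixels_per_clock(0.5))
```

Under this simplified model, any culling at all pushes the measured rate above the Z-ROP limit, which is consistent with "rates beyond that would be due to other factors."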

nV News:

We have seen partial precision benchmark results (compare "PS 2.0 - Simple" to "PS 2.0 PP - Simple" in first table) on the GeForce 6800 drop in performance over full precision. Is it possible that the GeForce 6800 could have lost performance in some applications due to having Pixel Shader 2.0 shaders optimized for the GeForce FX?

Tony Tomasi:

We have filed a bug against the results generated by Marko Dolenc's fill-rate tester and are looking into it. Something is definitely up there. We're not aware of any reason that partial precision floating point should be slower than full precision (FP32) floating point.

There are architectural differences between the GeForce FX architecture and the GeForce 6800 architecture that can lead the GeForce FX to have higher performance per Quad of shading horsepower in some limited number of partial precision cases, but since the GeForce 6800 has 4x the number of Quads for shading computation, GeForce 6800 should deliver better absolute performance.

While it's possible to build a pathologically bad register combiner program that could potentially run faster on GeForce FX than on GeForce 6800, in practice no application we've seen does or would do that.

nV News:

Reviewers have noticed image quality issues with Far Cry and are perplexed as to why the game performs better on the newly announced Radeon X800 graphics chipsets than the GeForce 6800. Comments? Is a future patch in the works that will support the new features of Shader Model 3.0?

Ujesh Desai:

We are aware of all the issues with Far Cry and we are working with Crytek to solve them as soon as possible. Some of the issues are driver bugs that we are working on, and some of the issues are application related and Crytek is working on a patch for this. We are also working with Crytek to get a Shader Model 3.0 patch added to the game.

nV News:

Performance of the GeForce FX sometimes dropped when a Pixel Shader 1.x shader was executed compared to executing a similar shader under Pixel Shader 2.0. Will the GeForce 6800 have a similar performance drop when executing a shader using Pixel Shader 3.0?

Ujesh Desai:

Actually, it will be the opposite. I think there is a bit of confusion about Shader Model 3.0 and Shader Model 2.0. While Shader Model 3.0 will enable some "new" effects, it is better characterized by ease of programming, more efficient use of the hardware, and higher scene complexity and/or frame rates. Shader Model 3.0 makes developers' lives easier due to the support for advanced programming features such as loops and branches.

This is a fundamental requirement and will improve the efficiency with which programmers can write their code. Without the support for loops and branches enabled by Shader Model 3.0, developers will be forced to break up longer Shader Model 3.0 shader programs into smaller segments that will run on Shader Model 2.0 hardware. This will absorb clock cycles, which will hamper performance in games that use the latest version of DirectX and have more sophisticated Shader Model 3.0 pixel shaders.

It is important to note that in some cases developers can create the same effect with Shader Model 2.0 and Shader Model 3.0; however, it may take longer to program using Shader Model 2.0 and may require more passes through the hardware to render.

Shader Model 3.0 does introduce some new functionality - particularly dynamic branching in the pixel shader, which must be used carefully for good performance. But in general, Shader Model 3.0 should actually make development easier, and can offer some nice performance benefits for complex shaders that can be executed in pixel shader 2.0, but can be executed more efficiently in Shader Model 3.0.

nV News:

Do you believe that mixed precision is still relevant, given that using 64-bit floating point precision instead of 128-bit still benefits the GeForce 6 series?

Ujesh Desai:

Yes. In general, as with any processor, you should always use the fewest bits of precision that perform the function with the degree of accuracy you are after. On a CPU, people don't always declare doubles, for good reasons: there are performance trade-offs. I expect much the same behavior with GPUs. While 128-bit pixels are certainly higher precision than 96-bit or 64-bit pixels, they have more storage requirements as well.

Additionally, most high dynamic range applications being developed work quite well with 16 bits of floating point per component, and that in combination with floating point texture filtering and frame buffer blending of 64-bit floating point data makes mixed precision a large benefit. In fact, 64-bit floating point pixels (16 bits per component, called half) offer enough precision for many high quality rendering and image processing systems. OpenEXR has some great examples of this as well.
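To put the storage argument in numbers, the following Python sketch computes the size of a single frame buffer at the 64-bit, 96-bit and 128-bit pixel sizes mentioned above. The 1600x1200 resolution and the function name are assumptions chosen for illustration.

```python
def framebuffer_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes needed for one uncompressed surface at the given pixel size."""
    if bits_per_pixel % 8 != 0:
        raise ValueError("bits_per_pixel must be a whole number of bytes")
    return width * height * (bits_per_pixel // 8)


if __name__ == "__main__":
    width, height = 1600, 1200  # assumed resolution, for illustration only
    for bits in (64, 96, 128):
        size = framebuffer_bytes(width, height, bits)
        print(f"{bits}-bit pixels: {size / 1_000_000:.2f} MB")
```

A 128-bit surface takes exactly twice the memory of a 64-bit one, and every read, write or blend of it moves twice the data.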

nV News:

When using static branching, are those branches "free"? That is, as long as the constant value that selects which branch is executed is not changed, can that shader run at exactly the same speed as an "unrolled" shader with no branches? If not, about how many clock cycles does a static branch cost?

Tony Tomasi:

On the pixel shader side, shaders are recompiled based on constant state, so the hardware should see an unrolled shader independent of the input. This is not true in the vertex shader, but static branches are pretty cheap, all things considered (~2 clocks / branch in vertex shader).

nV News:

When using static branching, does changing the constant value that changes which branch is executed result in a state change? That is, can "uber shaders" be used to avoid state changes, and thus increase performance?

Tony Tomasi:

Yes, this is a state change (and shader recompile in pixel shader). Uber shaders can be used to avoid state changes, but uber shaders use extremely coherent dynamic branching (branch condition supplied as a vertex attribute), rather than static branching.

nV News:

About how many cycles does a dynamic branch cost at a minimum? Under what situations would a developer want to use dynamic branching and why?

Tony Tomasi:

There is a 2-cycle latency per branch instruction in the pixel shader, so IF/ELSE/ENDIF adds 6 cycles to a program (IF/ENDIF adds 4). If the branches are coherent (such as uber shaders, or potentially skipping calculations if N.L <= 0), and the number of instructions that can be skipped is greater than the latency, a developer should try dynamic branching. The vertex shader has a 2-cycle latency, too, but branch coherence isn't important, since the vertex shaders are MIMD.
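The rule of thumb in this answer can be written down as a small cost model. The Python sketch below uses the 2-cycle latency per branch instruction quoted above. The rest is assumed for illustration: branches are perfectly coherent, the skippable block is measured in cycles, and the non-branching version always executes it.

```python
BRANCH_LATENCY = 2  # cycles per branch instruction, as quoted in the answer


def branch_overhead(has_else: bool) -> int:
    """Cycles added by the branch instructions themselves.

    IF/ENDIF is two branch instructions; IF/ELSE/ENDIF is three.
    """
    instructions = 3 if has_else else 2
    return instructions * BRANCH_LATENCY


def expected_cycles_with_branch(skippable_cycles: float,
                                skip_fraction: float) -> float:
    """Expected cost of an IF/ENDIF block that is skipped some of the time."""
    if not 0.0 <= skip_fraction <= 1.0:
        raise ValueError("skip_fraction must be in [0, 1]")
    return branch_overhead(has_else=False) + (1.0 - skip_fraction) * skippable_cycles


def branching_pays_off(skippable_cycles: float, skip_fraction: float) -> bool:
    """True if the branch is cheaper than always running the block."""
    return expected_cycles_with_branch(skippable_cycles, skip_fraction) < skippable_cycles


if __name__ == "__main__":
    print("IF/ENDIF overhead:", branch_overhead(False))
    print("IF/ELSE/ENDIF overhead:", branch_overhead(True))
    print("20-cycle block, skipped half the time:", branching_pays_off(20, 0.5))
    print("4-cycle block, skipped half the time:", branching_pays_off(4, 0.5))
```

In this model a branch only wins when the cycles saved by skipping exceed the branch overhead, which matches the advice to try dynamic branching when the number of skippable instructions is greater than the latency.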

nV News:

What programming feature of the GeForce 6 series do you find the most exciting?

Ujesh Desai:

I am really excited to see NVIDIA continuing to push technology forward and remain the technology leader in the industry by pioneering Shader Model 3.0. The bottom line is that developers want Shader Model 3.0. It is really disappointing to see some GPU manufacturers trying to hold the industry back by downplaying the significance of Shader Model 3.0.

Next time someone downplays Shader Model 3.0, ask them if a Shader Model 3.0 part is on their roadmap. Then ask Microsoft what they think about their next version of DirectX 9. If given the choice between supporting it or not, any GPU manufacturer would want Shader Model 3.0 support.

Tony Tomasi:

I would say both Shader Model 3.0 AND floating point texture filtering and frame buffer blending. FP16 texture filtering and frame buffer blending make developers' lives much better for high dynamic range type applications, and they have a significant positive impact on performance and quality for those classes of applications, as texture filtering can be done in higher precision without using shader passes (and/or without performing filtering in lower precision).

In the same way, frame buffer blenders can be used on this same FP16 data, again avoiding extra passes in the pixel shader. By making 64-bit floating point fully orthogonal (mipmapping, all texture filtering modes, etc.) developers no longer have to special case high dynamic range functionality.

nV News:

Is there any information on the low and mid-range GeForce 6 product lineup that you can share?

Ujesh Desai:

Sorry, but we cannot discuss unannounced products. Stay tuned. We're quite pleased with the per-clock performance of the NV4x architecture, and look forward to delivering the benefits to a top-to-bottom family of GeForce 6 GPUs.

We would like to thank Brian Burke, NVIDIA's Desktop Public Relations Manager, for suggesting a Q&A session with Tony and Ujesh. And of course thanks go out to Tony and Ujesh for taking time to provide us with their responses!

Last Updated on May 13, 2004


nV News - Copyright © 1998-2008. All rights reserved.
Reproduction in any form or medium without written permission of the site's owners is prohibited.