GeForce 6 Series Question & Answer with NVIDIA - Page 1 of 1

INTRODUCTION

nV News was given the opportunity to conduct a question and answer session with NVIDIA's Tony Tomasi, Senior Director of Desktop Product Management, and Ujesh Desai, General Manager of Desktop Graphics Processing Units. The questions are naturally related to the GeForce 6 Series, which was announced by NVIDIA on April 14.

[GeForce 6 Series logo]

I would like to thank the group of nV News staff and selected visitors who assisted us in compiling a list of questions for NVIDIA. A total of 30 questions were submitted and I had the unfortunate duty of trimming the list down to 10 as requested by NVIDIA. Since most of the submitted questions were technically oriented, I chose to keep the Q&A along those lines.

nV News:

What are the primary factors that would lead to a difference in performance between FP16 and FP32 calculations with the GeForce 6800?

Tony Tomasi:

FP16 uses less storage than FP32, and GeForce 6800 supports FP16 texture filtering and frame buffer blending in hardware. In particular, support for FP16 texture filtering and frame buffer blending can be a very significant performance win for high dynamic range applications. There are also some operations that can be performed faster, or in some cases "for free" using FP16 in the shading hardware. For example, partial precision normalize (FP16 normalize) is essentially "free" on GeForce 6800 hardware.
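The storage difference Tomasi mentions is easy to quantify. A rough illustration, using NumPy on the CPU purely to show the arithmetic (the resolution and RGBA layout are assumptions for the example, not figures from the interview):

```python
import numpy as np

# A hypothetical 1920x1080 RGBA render target at the two precisions
# discussed above: FP16 halves the storage (and bandwidth) of FP32.
h, w, channels = 1080, 1920, 4

fp16_target = np.zeros((h, w, channels), dtype=np.float16)
fp32_target = np.zeros((h, w, channels), dtype=np.float32)

print(fp16_target.nbytes)  # 16588800 bytes (~15.8 MB)
print(fp32_target.nbytes)  # 33177600 bytes (~31.6 MB)
```

Halving the bytes per pixel also halves the memory traffic for texture filtering and frame buffer blending, which is where the "significant performance win" for HDR comes from.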

nV News:

X-bit labs has shown benchmark results (chart at bottom of page) of the GeForce 6800 Ultra generating more than its theoretical limit of 32 z-pixels per clock cycle when color writes are disabled. Are these results correct?

Tony Tomasi:

Depending on the impact of occlusion culling, one could measure rates beyond 32 pixels per clock of effective fill. But the Z-ROP hardware in GeForce 6800 is capable of 32 pixels per clock of z/stencil-only rendering. Rates beyond that would be due to other factors. High z/stencil rendering rates are a nice win for applications that do 2-pass shadow algorithms, like Doom3, which is why GeForce 6800 is capable of rendering z/stencil-only at such high rates.
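Multiplying the 32 pixels/clock figure above by the GeForce 6800 Ultra's 400 MHz core clock gives the theoretical z/stencil-only fill rate; anything measured above it would come from occlusion culling, as Tomasi notes:

```python
# Theoretical z/stencil-only fill rate, using the 32 pixels/clock figure
# from the answer above and the 6800 Ultra's 400 MHz core clock.
z_pixels_per_clock = 32
core_clock_hz = 400e6

fill_rate_gpix = z_pixels_per_clock * core_clock_hz / 1e9
print(fill_rate_gpix)  # 12.8 Gpixels/s
```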

nV News:

We have seen partial precision benchmark results (compare "PS 2.0 - Simple" to "PS 2.0 PP - Simple" in first table) on the GeForce 6800 drop in performance over full precision. Is it possible that the GeForce 6800 could have lost performance in some applications due to having Pixel Shader 2.0 shaders optimized for the GeForce FX?

Tony Tomasi:

We have filed a bug against the results generated by Marko Dolenc's fill-rate tester and are looking into it. Something is definitely up there. We're not aware of any reason that partial precision floating point should be slower than full precision (FP32) floating point.

There are architectural differences between the GeForce FX architecture and the GeForce 6800 architecture that can lead the GeForce FX to have higher performance per Quad of shading horsepower in some limited number of partial precision cases, but since the GeForce 6800 has 4x the number of Quads for shading computation, GeForce 6800 should deliver better absolute performance.

While it's possible to build a pathologically bad register combiner program that could potentially run faster on GeForce FX than on GeForce 6800, in practice no application we've seen does or would do that.

nV News:

Reviewers have noticed image quality issues with Far Cry and are perplexed as to why the game performs better on the newly announced Radeon X800 graphics chipsets than the GeForce 6800. Comments? Is a future patch in the works that will support the new features of Shader Model 3.0?

Ujesh Desai:

We are aware of all the issues with Far Cry and we are working with Crytek to solve them as soon as possible. Some of the issues are driver bugs that we are working on, and some of the issues are application related and Crytek is working on a patch for this. We are also working with Crytek to get a Shader Model 3.0 patch added to the game.

nV News:

Performance of the GeForce FX sometimes dropped when a Pixel Shader 1.x shader was executed compared to executing a similar shader under Pixel Shader 2.0. Will the GeForce 6800 have a similar performance drop when executing a shader using Pixel Shader 3.0?

Ujesh Desai:

Actually it will be the opposite. I think there is a bit of confusion about Shader Model 3.0 and Shader Model 2.0. While Shader Model 3.0 will enable some "new" effects, it is better characterized by ease of programming, more efficient use of the hardware, and higher scene complexity and/or frame rates. Shader Model 3.0 makes developers' lives easier due to the support for advanced programming features such as loops and branches.

This is a fundamental requirement and will improve the efficiency with which programmers can write their code. Without the loops and branches enabled by Shader Model 3.0, developers will be forced to break up longer shader programs into smaller segments that can run on Shader Model 2.0 hardware. This absorbs clock cycles, which will hamper performance in games that use the latest version of DirectX and have more sophisticated Shader Model 3.0 pixel shaders.

It is important to note that in some cases, developers can create the same effect with Shader Model 2.0 and Shader Model 3.0, however it may take longer to program using Shader Model 2.0 and may require more passes through the hardware to render.

Shader Model 3.0 does introduce some new functionality - particularly dynamic branching in the pixel shader, which must be used carefully for good performance. But in general, Shader Model 3.0 should actually make development easier, and can offer some nice performance benefits for complex shaders that can be executed in pixel shader 2.0, but can be executed more efficiently in Shader Model 3.0.
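The multipass penalty Desai describes can be sketched with simple arithmetic. This is a hypothetical illustration, not a real compiler: the effect length is invented, and the cap is the ps_2_0 instruction-slot limit (64 arithmetic + 32 texture slots):

```python
import math

# SM2.0-class hardware caps shader length, so a long effect must be split
# across multiple rendering passes; SM3.0 loops let it run in one pass.
SM2_MAX_INSTRUCTIONS = 96  # ps_2_0 slot limit (64 ALU + 32 texture)
effect_length = 300        # hypothetical long shader

sm2_passes = math.ceil(effect_length / SM2_MAX_INSTRUCTIONS)
sm3_passes = 1             # a loop executes within a single pass

print(sm2_passes, sm3_passes)  # 4 1
```

Each extra pass re-reads and re-writes the frame buffer, which is the clock-cycle cost Desai refers to.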

nV News:

Do you believe that mixed precision is still relevant, given that using 64-bit floating-point precision instead of 128-bit still benefits the GeForce 6 series?

Ujesh Desai:

Yes. In general, as with any processor, you should always use the fewest bits of precision that perform the function with the degree of accuracy you are after. In a CPU, people don't always declare doubles for good reasons - there are performance trade-offs. I expect much the same behavior with GPUs. While 128-bit pixels are certainly higher precision than 96-bit or 64-bit pixels, they have greater storage requirements as well.

Additionally, most high dynamic range applications being developed work quite well with 16 bits of floating point per component, and that in combination with floating point texture filtering and frame buffer blending of 64-bit floating point data makes mixed precision a large benefit. In fact, 64-bit floating point (16 bits per component, called half) is enough precision for many high quality rendering and image processing systems. OpenEXR has some great examples of this as well.
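The adequacy of half precision for HDR work can be checked with NumPy, whose `float16` type matches the 16-bit half format Desai and OpenEXR refer to:

```python
import numpy as np

# FP16 ("half") covers the dynamic range most HDR pipelines need:
# finite values up to 65504, with roughly 3 decimal digits of precision.
print(np.finfo(np.float16).max)  # 65504.0
print(np.finfo(np.float32).max)  # ~3.4e38

# A hypothetical HDR luminance of 1000.0 survives a round trip
# through half exactly.
print(np.float16(1000.0))  # 1000.0
```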

nV News:

When using static branching, are those branches "free?" That is, as long as the constant value that changes which branch is executed is not changed, can that shader run at exactly the same speed as an "unrolled" shader with no branches? If not, about how many clock cycles does a static branch cost?

Tony Tomasi:

On the pixel shader side, shaders are recompiled based on constant state, so the hardware should see an unrolled shader independent of the input. This is not true in the vertex shader, but static branches are pretty cheap, all things considered (~2 clocks / branch in vertex shader).
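The recompile-on-constant-state behavior Tomasi describes is essentially constant specialization. A minimal sketch in Python (the fog example and scale factor are invented for illustration; real drivers do this at the shader instruction level):

```python
# When the branch selector is constant state, the "driver" can emit a
# specialized shader with the untaken side removed, so the hardware
# never sees a branch - it sees an unrolled program.
def compile_shader(use_fog: bool):
    if use_fog:
        def shader(color):  # fog variant: no branch remains inside
            return [c * 0.5 for c in color]
    else:
        def shader(color):  # branch-free pass-through variant
            return color
    return shader

fogged = compile_shader(True)
print(fogged([1.0, 0.8, 0.6]))  # [0.5, 0.4, 0.3]
```

Changing the constant means picking (or compiling) a different specialized variant, which is exactly the state change discussed in the next question.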

nV News:

When using static branching, does changing the constant value that changes which branch is executed result in a state change? That is, can "uber shaders" be used to avoid state changes, and thus increase performance?

Tony Tomasi:

Yes, this is a state change (and shader recompile in pixel shader). Uber shaders can be used to avoid state changes, but uber shaders use extremely coherent dynamic branching (branch condition supplied as a vertex attribute), rather than static branching.

nV News:

About how many cycles does a dynamic branch cost at a minimum? Under what situations would a developer want to use dynamic branching and why?

Tony Tomasi:

There is a 2 cycle latency per branch instruction in the pixel shader, so IF/ELSE/ENDIF adds 6 cycles to a program (IF/ENDIF adds 4). If the branches are coherent (such as uber shaders, or potentially skipping calculations if N.L <= 0), and the number of instructions that can be skipped is greater than the latency, a developer should try dynamic branching. The vertex shader has a 2 cycle latency, too, but branch coherence isn't important, since the vertex shaders are MIMD.
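The break-even rule in that answer reduces to a comparison: a skipped block pays off only when it costs more cycles than the branch overhead. A small sketch using the figures above (the 10-instruction lighting block is a hypothetical example):

```python
# IF/ENDIF adds 4 cycles in the pixel shader, per the answer above.
branch_overhead = 4
skippable_instructions = 10  # hypothetical block guarded by N.L <= 0

# If the branch is coherent and actually taken, we pay the overhead
# instead of executing the block; otherwise we pay both.
cycles_when_skipped = branch_overhead
cycles_without_branching = skippable_instructions

print(cycles_when_skipped < cycles_without_branching)  # True: worth branching
```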

nV News:

What programming feature of the GeForce 6 series do you find the most exciting?

Ujesh Desai:

I am really excited to see NVIDIA continuing to push technology forward and to remain the technology leader in the industry by pioneering Shader Model 3.0. The bottom line is that developers want Shader Model 3.0. It is really disappointing to see some GPU manufacturers trying to hold the industry back by downplaying the significance of Shader Model 3.0.

Next time someone downplays Shader Model 3.0, ask them if a Shader Model 3.0 part is on their roadmap. Then ask Microsoft what they think about their next version of DirectX 9. If given the choice between supporting it or not, any GPU manufacturer would want Shader Model 3.0 support.

Tony Tomasi:

I would say both Shader Model 3.0 AND floating point texture filtering and frame buffer blending. FP16 texture filtering and frame buffer blending makes developers' lives much better for high dynamic range type applications, as well as having a significant positive impact on performance and quality for those classes of applications, since texture filtering can be done in higher precision without using shader passes (and/or without performing filtering in lower precision).

In the same way, frame buffer blenders can be used on this same FP16 data, again avoiding extra passes in the pixel shader. By making 64-bit floating point fully orthogonal (mipmapping, all texture filtering modes, etc.) developers no longer have to special case high dynamic range functionality.

nV News:

Is there any information on the low and mid-range GeForce 6 product lineup that you can share?

Ujesh Desai:

Sorry, but we cannot discuss unannounced products. Stay tuned. We're quite pleased with the per-clock performance of the NV4x architecture, and look forward to delivering its benefits in a top-to-bottom family of GeForce 6 GPUs.

We would like to thank Brian Burke, NVIDIA's Desktop Public Relations Manager, for suggesting a Q&A session with Tony and Ujesh. And of course thanks go out to Tony and Ujesh for taking time to provide us with their responses!


Last Updated on May 13, 2004

nV News - Copyright © 1998-2014.