GeForce 6 Series Question & Answer with NVIDIA

INTRODUCTION

nV News was given the opportunity to conduct a question and answer session with NVIDIA's Tony Tomasi, Senior Director of Desktop Product Management, and Ujesh Desai, General Manager of Desktop Graphics Processing Units. The questions naturally relate to the GeForce 6 Series, which NVIDIA announced on April 14, 2004.


I would like to thank the nV News staff and selected visitors who helped us compile a list of questions for NVIDIA. A total of 30 questions were submitted, and I had the unfortunate duty of trimming the list down to 10, as requested by NVIDIA. Since most of the submitted questions were technical, I chose to keep the Q&A along those lines.

nV News:

What are the primary factors that would lead to a difference in performance between FP16 and FP32 calculations with the GeForce 6800?

Tony Tomasi:

FP16 uses less storage than FP32, and GeForce 6800 supports FP16 texture filtering and frame buffer blending in hardware. Support for FP16 texture filtering and frame buffer blending in particular can be a very significant performance win for high dynamic range applications. Some operations can also be performed faster, or in some cases "for free," using FP16 in the shading hardware. For example, partial precision normalize (FP16 normalize) is essentially "free" on GeForce 6800 hardware.
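The storage and precision side of that trade-off can be illustrated with a short Python sketch. It uses the standard library's struct module, whose "e" and "f" format codes pack IEEE 754 half-precision (FP16) and single-precision (FP32) values; the helper names are our own. It models only the number formats on a CPU and says nothing about how GeForce 6800 shader hardware schedules FP16 work.

```python
import struct

def round_to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 (FP16, "half") value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def round_to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary32 (FP32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Bytes needed to store one component in each format.
FP16_BYTES = struct.calcsize("<e")
FP32_BYTES = struct.calcsize("<f")

sample = 0.1
fp16_error = abs(round_to_fp16(sample) - sample)
fp32_error = abs(round_to_fp32(sample) - sample)

print(FP16_BYTES, FP32_BYTES)   # 2 4
print(round_to_fp16(sample))    # 0.0999755859375
print(fp16_error > fp32_error)  # True
```

Half the storage per component is what makes FP16 attractive for texture and frame buffer data; the price is a 10-bit stored mantissa, or roughly three significant decimal digits.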

nV News:

X-bit labs has shown benchmark results (chart at bottom of page) of the GeForce 6800 Ultra generating more than its theoretical limit of 32 z-pixels per clock cycle when color writes are disabled. Are these results correct?

Tony Tomasi:

Depending on the impact of occlusion culling, one could measure rates beyond 32 pixels per clock of effective fill. But the Z-ROP hardware in GeForce 6800 is capable of 32 pixels per clock of z/stencil-only rendering. Rates beyond that would be due to other factors. High z/stencil rendering rates are a nice win for applications that use 2-pass shadow algorithms, like Doom 3, which is why GeForce 6800 is capable of rendering z/stencil-only at such high rates.
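One way to see how a benchmark can report more than 32 z/stencil pixels per clock is to count pixels that occlusion culling rejected before they ever reached the Z-ROPs. The Python sketch below is a simplified, hypothetical model: it assumes culled pixels cost nothing and that surviving pixels are limited only by the 32-per-clock Z-ROP rate. It is not how X-bit labs or NVIDIA compute their figures.

```python
ZROP_PIXELS_PER_CLOCK = 32  # z/stencil-only rate quoted for GeForce 6800

def clocks_spent(pixels_submitted: int, pixels_culled: int,
                 zrop_rate: int = ZROP_PIXELS_PER_CLOCK) -> float:
    """Clocks the Z-ROPs need, assuming culled pixels never reach them."""
    pixels_written = pixels_submitted - pixels_culled
    if pixels_written <= 0:
        raise ValueError("model needs at least one pixel to reach the Z-ROPs")
    return pixels_written / zrop_rate

def apparent_rate(pixels_submitted: int, pixels_culled: int,
                  zrop_rate: int = ZROP_PIXELS_PER_CLOCK) -> float:
    """Pixels per clock a benchmark reports if it divides ALL submitted pixels by time."""
    return pixels_submitted / clocks_spent(pixels_submitted, pixels_culled, zrop_rate)

# Illustrative numbers: one quarter of the submitted pixels are culled early.
print(apparent_rate(1_000_000, 0))        # 32.0, the hardware limit
print(apparent_rate(1_000_000, 250_000))  # about 42.7 "effective" pixels per clock
```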

nV News:

We have seen partial precision benchmark results (compare "PS 2.0 - Simple" to "PS 2.0 PP - Simple" in the first table) on the GeForce 6800 that drop in performance relative to full precision. Is it possible that the GeForce 6800 could have lost performance in some applications because their Pixel Shader 2.0 shaders were optimized for the GeForce FX?

Tony Tomasi:

We have filed a bug against the results generated by Marko Dolenc's fill-rate tester and are checking into it. Something is definitely up there. We're not aware of any reason that partial precision floating point should be slower than full precision (FP32) floating point.

There are architectural differences between the GeForce FX architecture and the GeForce 6800 architecture that can lead the GeForce FX to have higher performance per Quad of shading horsepower in some limited number of partial precision cases, but since the GeForce 6800 has 4x the number of Quads for shading computation, GeForce 6800 should deliver better absolute performance.

While it's possible to build a pathologically bad register combiner program that could potentially run faster on GeForce FX than on GeForce 6800, in practice no application we've seen does or would do that.

nV News:

Reviewers have noticed image quality issues with Far Cry and are perplexed as to why the game performs better on the newly announced Radeon X800 graphics chipsets than on the GeForce 6800. Comments? Is a future patch in the works that will support the new features of Shader Model 3.0?

Ujesh Desai:

We are aware of all the issues with Far Cry and we are working with Crytek to solve them as soon as possible. Some of the issues are driver bugs that we are working on, and some of the issues are application related and Crytek is working on a patch for this. We are also working with Crytek to get a Shader Model 3.0 patch added to the game.

nV News:

Performance of the GeForce FX sometimes dropped when a Pixel Shader 1.x shader was executed compared to executing a similar shader under Pixel Shader 2.0. Will the GeForce 6800 have a similar performance drop when executing a shader using Pixel Shader 3.0?

Ujesh Desai:

Actually, it will be the opposite. I think there is a bit of confusion about Shader Model 3.0 and Shader Model 2.0. While Shader Model 3.0 will enable some "new" effects, it is better characterized by ease of programming, more efficient use of the hardware, and higher scene complexity and/or frame rates. Shader Model 3.0 makes developers' lives easier due to its support for advanced programming features such as loops and branches.

This is a fundamental requirement and will improve how efficiently programmers can write their code. Without the support for loops and branches enabled by Shader Model 3.0, developers will be forced to break longer Shader Model 3.0 shader programs into smaller segments that can run on Shader Model 2.0 hardware. This consumes clock cycles, which will hamper performance in games that use the latest version of DirectX and have more sophisticated Shader Model 3.0 pixel shaders.

It is important to note that in some cases developers can create the same effect with Shader Model 2.0 and Shader Model 3.0; however, it may take longer to program using Shader Model 2.0 and may require more passes through the hardware to render.
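The pass-count argument can be sketched with rough arithmetic. The Python below assumes a hypothetical per-light lighting loop and uses the DirectX 9 instruction-slot limits: 64 arithmetic slots for ps_2_0, which has no flow control, versus a minimum of 512 slots for ps_3_0, which can loop. It deliberately ignores the extra instructions and render-target traffic a real multipass split would add, so it understates the Shader Model 2.0 cost.

```python
import math

PS_2_0_ARITHMETIC_SLOTS = 64  # ps_2_0 arithmetic instruction slots
PS_3_0_MIN_SLOTS = 512        # minimum instruction slots for ps_3_0

def unrolled_length(setup: int, per_light: int, lights: int) -> int:
    """Instruction count when a per-light loop must be fully unrolled (no loops in ps_2_0)."""
    return setup + per_light * lights

def passes_needed(instructions: int, slots_per_pass: int) -> int:
    """Minimum number of passes if the program is cut into chunks that each fit."""
    return math.ceil(instructions / slots_per_pass)

# Hypothetical shader: 10 setup instructions, 12 instructions per light, 8 lights.
length = unrolled_length(setup=10, per_light=12, lights=8)
print(length)                                          # 106
print(passes_needed(length, PS_2_0_ARITHMETIC_SLOTS))  # 2
print(passes_needed(length, PS_3_0_MIN_SLOTS))         # 1
```

Under ps_3_0 the loop would not even need unrolling: the body occupies a handful of slots and the hardware iterates it, which is the ease-of-programming point made above.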

Shader Model 3.0 does introduce some new functionality, particularly dynamic branching in the pixel shader, which must be used carefully for good performance. But in general, Shader Model 3.0 should actually make development easier, and it can offer some nice performance benefits for complex shaders that can be executed in Pixel Shader 2.0 but can be executed more efficiently in Shader Model 3.0.

nV News:

Do you believe that mixed precision is still relevant, given that using 64-bit floating point precision instead of 128-bit still benefits the GeForce 6 series?

Ujesh Desai:

Yes. In general, as with any processor, you should always use the fewest bits of precision that perform the function with the degree of accuracy you are after. On a CPU, people don't always declare doubles, and for good reasons: there are performance trade-offs. I expect much the same behavior with GPUs. While 128-bit pixels are certainly higher precision than 96-bit or 64-bit pixels, they have greater storage requirements as well.

Additionally, most high dynamic range applications being developed work quite well with 16 bits of floating point per component, and that, in combination with floating point texture filtering and frame buffer blending of 64-bit floating point data, makes mixed precision a large benefit. In fact, 64-bit floating point (four components in the 16-bit format called half) is enough precision for many high quality rendering and image processing systems. OpenEXR has some great examples of this as well.
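The storage side of the 128-bit versus 64-bit choice is easy to quantify. The Python sketch below computes the size of a single color buffer at an example resolution of 1600x1200 (our choice, ignoring depth buffers, multisampling and any compression), and decodes the largest finite and smallest normal FP16 values, which bound the range a 16-bit half component can represent with full precision.

```python
import struct

# Bits per pixel for the pixel formats discussed above.
FORMATS = {
    "128-bit (4 x FP32)": 4 * 32,
    "96-bit (3 x FP32)": 3 * 32,
    "64-bit (4 x FP16)": 4 * 16,
}

def buffer_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Size in bytes of one uncompressed color buffer."""
    return width * height * bits_per_pixel // 8

def half_from_bits(bits: int) -> float:
    """Decode a 16-bit IEEE 754 half-precision bit pattern."""
    return struct.unpack("<e", bits.to_bytes(2, "little"))[0]

HALF_MAX = half_from_bits(0x7BFF)         # largest finite half: 65504.0
HALF_MIN_NORMAL = half_from_bits(0x0400)  # smallest normal half: 2**-14

for name, bpp in FORMATS.items():
    print(f"{name}: {buffer_bytes(1600, 1200, bpp) / 2**20:.1f} MiB")
```

The 64-bit format halves the memory per pixel of the 128-bit one, while a half component still covers roughly 6e-5 to 65504 as normal numbers, which is why it is often adequate for high dynamic range color.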

nV News:

When using static branching, are those branches "free?" That is, as long as the constant value that changes which branch is executed is not changed, can that shader run at exactly the same speed as an "unrolled" shader with no branches? If not, about how many clock cycles does a static branch cost?

Tony Tomasi:

On the pixel shader side, shaders are recompiled based on constant state, so the hardware should see an unrolled shader independent of the input. This is not true in the vertex shader, but static branches are pretty cheap, all things considered (~2 clocks / branch in vertex shader).
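The pixel shader behavior described here, recompiling on constant state so the hardware sees an unrolled shader, resembles a specialization cache. The Python sketch below is a hypothetical illustration only: the class, flag names and "instruction" strings are invented for the example, and the interview does not say how NVIDIA's driver caches compiled variants.

```python
from typing import Dict, Tuple

class StaticBranchCache:
    """Hypothetical driver-side cache: one branch-free program per constant state."""

    def __init__(self) -> None:
        self._variants: Dict[Tuple[bool, bool], Tuple[str, ...]] = {}
        self.compile_count = 0

    def _compile(self, specular: bool, fog: bool) -> Tuple[str, ...]:
        # The static branches are resolved here, once, so the program that
        # would reach the hardware is straight-line code with no tests on the flags.
        self.compile_count += 1
        program = ["sample base texture"]
        if specular:
            program.append("add specular term")
        if fog:
            program.append("blend toward fog color")
        program.append("write output color")
        return tuple(program)

    def program_for(self, specular: bool, fog: bool) -> Tuple[str, ...]:
        """Return the specialized program, compiling it only for a new constant state."""
        key = (specular, fog)
        if key not in self._variants:
            self._variants[key] = self._compile(specular, fog)
        return self._variants[key]

cache = StaticBranchCache()
cache.program_for(specular=True, fog=False)  # first use: compile
cache.program_for(specular=True, fog=False)  # same constants: reuse
cache.program_for(specular=False, fog=True)  # new constants: compile again
print(cache.compile_count)                   # 2
```

In this model, switching constants is a state change that may trigger a compile, while drawing repeatedly with the same constants runs branch-free code.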

nV News:

When using static branching, does changing the constant value that changes which branch is executed result in a state change? That is, can "uber shaders" be used to avoid state changes, and thus increase performance?

Tony Tomasi:

Yes, this is a state change (and shader recompile in pixel shader). Uber shaders can be used to avoid state changes, but uber shaders use extremely coherent dynamic branching (branch condition supplied as a vertex attribute), rather than static branching.
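Why coherence matters for uber shaders can be shown with a simplified model in which a batch of pixels shares one instruction stream, so the batch pays for every side of an IF/ELSE that any of its pixels takes. The batch contents and cycle counts below are hypothetical, apart from the 6-cycle IF/ELSE/ENDIF overhead quoted in the next answer; the interview does not state the hardware's batch size.

```python
from typing import Sequence

IF_ELSE_ENDIF_OVERHEAD = 6  # cycles, per the pixel shader figure quoted in this Q&A

def batch_cycles(takes_if: Sequence[bool], if_cycles: int, else_cycles: int,
                 overhead: int = IF_ELSE_ENDIF_OVERHEAD) -> int:
    """Cycles one SIMD batch spends on an IF/ELSE/ENDIF block (simplified model)."""
    cycles = overhead
    if any(takes_if):      # at least one pixel needs the IF side
        cycles += if_cycles
    if not all(takes_if):  # at least one pixel needs the ELSE side
        cycles += else_cycles
    return cycles

# Uber-shader case: the condition comes from a vertex attribute that is typically
# constant across an object, so neighboring pixels agree.
coherent = [True] * 16
# Worst case: pixels in the same batch disagree.
divergent = [True, False] * 8

print(batch_cycles(coherent, if_cycles=20, else_cycles=10))   # 26
print(batch_cycles(divergent, if_cycles=20, else_cycles=10))  # 36
```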

nV News:

About how many cycles does a dynamic branch cost at a minimum? Under what situations would a developer want to use dynamic branching and why?

Tony Tomasi:

There is a 2 cycle latency per branch instruction in the pixel shader, so IF/ELSE/ENDIF adds 6 cycles to a program (IF/ENDIF adds 4). If the branches are coherent (such as uber shaders, or potentially skipping calculations if N.L <= 0), and the number of instructions that can be skipped is greater than the latency, a developer should try dynamic branching. The vertex shader has a 2 cycle latency, too, but branch coherence isn't important, since the vertex shaders are MIMD.
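The rule of thumb above can be written as a small calculation. The Python sketch below uses the quoted 2-cycle latency per branch instruction; treating each skipped instruction as one cycle is our simplifying assumption, and the rule only applies when the branch is coherent enough that whole batches actually skip the work.

```python
BRANCH_INSTRUCTION_LATENCY = 2  # cycles per IF, ELSE or ENDIF in the pixel shader

def branch_overhead(has_else: bool) -> int:
    """IF/ENDIF is 2 branch instructions (4 cycles); IF/ELSE/ENDIF is 3 (6 cycles)."""
    branch_instructions = 3 if has_else else 2
    return branch_instructions * BRANCH_INSTRUCTION_LATENCY

def worth_branching(skipped_instructions: int, has_else: bool = False) -> bool:
    """True if the work skipped exceeds the branch latency.

    Assumption: one cycle per skipped instruction, and a coherent branch.
    """
    return skipped_instructions > branch_overhead(has_else)

# Example: skip a 12-instruction lighting block when N.L <= 0.
print(branch_overhead(has_else=False))  # 4
print(worth_branching(12))              # True
print(worth_branching(3))               # False
```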

nV News:

What programming feature of the GeForce 6 series do you find the most exciting?

Ujesh Desai:

I am really excited to see NVIDIA continuing to push technology forward and to remain the technology leader in the industry by pioneering Shader Model 3.0. The bottom line is that developers want Shader Model 3.0. It is really disappointing to see some GPU manufacturers trying to hold the industry back by downplaying the significance of Shader Model 3.0.

Next time someone downplays Shader Model 3.0, ask them if a Shader Model 3.0 part is on their roadmap. Then ask Microsoft what they think about their next version of DirectX 9. If given the choice between supporting it or not, any GPU manufacturer would want Shader Model 3.0 support.

Tony Tomasi:

I would say both Shader Model 3.0 AND floating point texture filtering and frame buffer blending. FP16 texture filtering and frame buffer blending make developers' lives much better for high dynamic range applications, and they have a significant positive impact on performance and quality for those classes of applications, because texture filtering can be done in higher precision without using shader passes (and/or without performing filtering in lower precision).

In the same way, frame buffer blenders can be used on this same FP16 data, again avoiding extra passes in the pixel shader. By making 64-bit floating point fully orthogonal (mipmapping, all texture filtering modes, etc.) developers no longer have to special case high dynamic range functionality.
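The benefit of blending directly on FP16 data shows up when several passes add light to the same pixel. The Python sketch below compares additive blending into an 8-bit fixed-point channel, which clamps to [0, 1], with blending into an FP16 channel, modeled by rounding through struct's "e" format. The three 0.6 contributions are illustrative values, and real blend hardware may round differently.

```python
import struct
from typing import Callable, Iterable

def store_fp16(x: float) -> float:
    """Round to the nearest FP16 value, as an FP16 render target would store it."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def store_unorm8(x: float) -> float:
    """Clamp to [0, 1] and quantize to 1/255 steps, like an 8-bit channel."""
    clamped = min(max(x, 0.0), 1.0)
    return round(clamped * 255) / 255

def additive_blend(contributions: Iterable[float],
                   store: Callable[[float], float]) -> float:
    """Accumulate dst = dst + src, storing dst in the target format after each pass."""
    dst = 0.0
    for src in contributions:
        dst = store(dst + src)
    return dst

passes = [0.6, 0.6, 0.6]
print(additive_blend(passes, store_unorm8))  # 1.0 (over-bright light is lost)
print(additive_blend(passes, store_fp16))    # about 1.8
```

Without FP16 blending, preserving that over-bright total requires workarounds such as extra shader passes, which is the cost the answer says FP16 blending removes.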

nV News:

Is there any information on the low and mid-range GeForce 6 product lineup that you can share?

Ujesh Desai:

Sorry, but we cannot discuss unannounced products. Stay tuned. We're quite pleased with the per-clock performance of the NV4x architecture, and look forward to delivering its benefits to a top-to-bottom family of GeForce 6 GPUs.

We would like to thank Brian Burke, NVIDIA's Desktop Public Relations Manager, for suggesting a Q&A session with Tony and Ujesh. And of course thanks go out to Tony and Ujesh for taking time to provide us with their responses!


Last Updated on May 13, 2004


nV News - Copyright © 1998-2010. All rights reserved.
Reproduction in any form or medium without written permission of the site's owners is prohibited.