Since ATI introduced the R300 VPU, NVIDIA has been struggling
to reclaim the performance and image quality crowns. (Some like
NVIDIA's output quality and some don't; in my opinion, R3x0 has
better overall IQ than NV30.) You may recall a conference in
Cannes where Jen-Hsun Huang, the CEO of NVIDIA, admitted that the
GeForce FX 5800 Ultra hasn't delivered the performance everyone
anticipated. But is NV30 really a flop? Not at all. The early jump
to the 0.13-micron process and the DDR-II memory implementation
contributed to the delay and scarce availability. NV30 was just
a start. Yes, a bad start, but no company can expect to lead all
the way down the line and hold the crown forever.
Can NVIDIA reclaim both crowns? Well, let's take a look.
NV35 Lineup:
GeForce FX 5900 Ultra: Highest performance production GPU,
sporting 256MB of RAM. $499 ESP (available in June)
GeForce FX 5900: Blazing fast 128MB solution. $399 ESP
GeForce FX 5800: Also 128MB based solution. $299 ESP
So what's the big deal with the new enthusiast GPUs from NVIDIA?
FlowFX has been dropped. The new cooling system looks similar
to the GeForce FX 5800's, surrounded by RAM heatsinks. What we
have here is a bracket that occupies two slots, though there is a
good chance the retail cards will sport a single-slot bracket.
Ultra Shadow Architecture (Shadow Volume Acceleration for
next generation games such as Doom III)
IntelliSample HCT; enhanced antialiasing
New Antialiasing and Anisotropic filtering Techniques
2nd Generation Compression and Caching (4 to 1)
True 128-bit floating-point color per pixel
Full DirectX 9 and OpenGL support
Advanced shader operations in OpenGL through extensions
                                    NV30    NV35
Memory Interface (bits)             128     256
Memory Bandwidth (GB/sec)           16      27.2
Color + Z PPC                       4       4
Z PPC                               8       8
Stencil PPC                         8       8
Textures per clock*                 8       8
UltraShadow                         NO      YES
IntelliSample HCT                   NO      YES
CineFX                              1.0     2.0
Floating Point Shader Operations    1x      2x
PPC = pixels per clock
*With the introduction of NV30, NVIDIA claimed that it could do
8 pixels per clock. Early investigations over at Beyond3D proved
otherwise. NV30 could do 8 pixels per clock only in the following
situations (supposedly only Color+Z is run at 4 pixels per
clock):
shader operations
texture operations
stencil operations
z-rendering
NVIDIA states that NV35 can do "up to 8 pixels per clock cycle,"
which strongly suggests that the architecture hasn't changed
(4 pipelines with two texture units per pixel pipeline). I just
don't have enough information to confirm that.
256-bit Memory Controller
Finally, a memory controller that deserves a few words. By moving
to a 256-bit memory controller, NVIDIA has raised raw bandwidth
from NV30's 16 GB/sec to 27.2 GB/sec on the NV35 GPU, a 70 percent
increase. The new controller should allow for increased efficiency
and overall throughput, especially at higher resolutions such as
1280x1024 and 1600x1200. Fill rate is everything™ some say :)
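To put the two bandwidth figures in perspective, here is a small back-of-the-envelope sketch. It assumes peak bandwidth is simply bus width in bytes times the effective data rate, and that "GB" means 10^9 bytes. The function names are made up for illustration, and the data rates it prints are only what the quoted figures imply, not clocks published by NVIDIA in this preview.

```python
def implied_data_rate_mhz(bandwidth_gb_s: float, bus_width_bits: int) -> float:
    """Effective data rate (millions of transfers per second) implied by a
    peak bandwidth figure and a bus width. Assumes 1 GB = 10^9 bytes."""
    bytes_per_transfer = bus_width_bits / 8
    return bandwidth_gb_s * 1000 / bytes_per_transfer


def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_mhz: float) -> float:
    """Peak bandwidth in GB/sec for a given bus width and effective data rate."""
    return bus_width_bits / 8 * data_rate_mhz / 1000


# Figures taken from the comparison table: (bandwidth in GB/sec, bus width in bits)
chips = {"NV30": (16.0, 128), "NV35": (27.2, 256)}

for name, (bandwidth, width) in chips.items():
    rate = implied_data_rate_mhz(bandwidth, width)
    print(f"{name}: {width}-bit bus, implied effective data rate {rate:.0f} MHz")

# Ratio of the two quoted bandwidth figures
print(f"NV35 / NV30 bandwidth: {27.2 / 16.0:.2f}x")
```

If the assumptions hold, the wider bus is doing all the work here: the implied data rate on NV35 is actually lower than on NV30, yet the total bandwidth still goes up by 1.7x.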
CineFX 2.0
The improved CineFX implementation will provide a 2X increase
in floating-point pixel shader performance (note that NV30's
pixel shader performance was poor compared to R300's).
NVIDIA's GeForce FX 5900 and FX 5900 Ultra GPUs will sport
UltraShadow technology (Shadow Volume Acceleration):
Figure 1. Programmers can define a subset of the scene (within
z-min and z-max) to limit lighting/shadow calculations to the
appropriate area for each light source.
Speeding Up Shadows
Accurate shadows are key for realistic and believable
scenes. The complex interactions between multiple light sources
and numerous objects and characters involve multiple-pass
programming. For every frame, every light source must be analyzed
relative to every object. The patent-pending UltraShadow technology
can be applied to today’s games to introduce stunning
visual effects that create distinctive looks and digital environments
that can set a game apart from the competition.
Software Advances
UltraShadow gives programmers the ability to calculate
shadows much more quickly by eliminating unnecessary areas
from consideration. With UltraShadow, programmers can define
a bounded portion of the scene (often called depth bounds)
that limits calculations of lighting source effects to objects
within a specified area. (See Figure 1.) By limiting calculations
to the area most affected by a light source, the overall shadow
generation process can be greatly accelerated. Programmers
can fine-tune shadows within critical regions, create incredible
visualizations that effectively mimic reality, and still achieve
awesome performance for fast-action games. The accelerated
shadow generation can also free up time that can be allocated
to other sophisticated but time-consuming effects.
Hardware Advances
Because stenciled shadow volumes require no texturing
or color updates, the hardware “doubles up” the
rendering horsepower to generate stenciled shadow volumes
at speeds of up to double the standard pixel-processing rate.
Other graphics solutions have to render stenciled shadow volumes
in two passes. UltraShadow accomplishes the shadow volume
rendering in a single pass, reducing CPU overhead and improving
GPU performance. The NVIDIA approach also interoperates with
NVIDIA Intellisample™ high-resolution compression technology
(HCT) to make sure that shadow edges are properly antialiased.
The GeForce FX 5900 GPUs maintain the stencil information
on a sub-pixel basis, ensuring that shadow edges are antialiased
rather than “blocky” or “jaggy.”
Applications
Anytime a game or application calculates shadows, UltraShadow
will enhance the application performance. The more passes
that are required for the lighting and shadow calculations—for
example, scenes that involve multiple light sources and many
physical objects in sight—the more significant the performance
improvement, with the most complex scenes achieving the most
noticeable results. Emerging next-generation games, such as
Doom III and Abducted, will see dramatic improvements in execution
speeds. The GeForce FX 5900 GPUs with UltraShadow technology
continue to enable a new generation of gaming effects.
In simple terms, how will we benefit from this
technology? More realistic shadows from multiple light sources.
A specific area of an object in a scene can be shadowed for more
realistic effects. In conclusion, UltraShadow™ appears to
be a very promising technology; it's just a matter of time until
we see it in action (Doom III).
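The depth bounds idea from Figure 1 can be mimicked on the CPU in a few lines. This is only a rough sketch of the concept, not NVIDIA's hardware logic or any driver API: the Pixel class and both function names are hypothetical. It also assumes the test compares the depth already stored for each pixel against z-min and z-max, which is how the OpenGL EXT_depth_bounds_test extension defines it; that UltraShadow maps onto that extension is my assumption, not something stated in the material above.

```python
from dataclasses import dataclass


@dataclass
class Pixel:
    x: int
    y: int
    stored_depth: float  # depth already in the depth buffer: 0.0 = near, 1.0 = far


def depth_bounds_pass(pixel: Pixel, z_min: float, z_max: float) -> bool:
    """True if the pixel's stored depth lies inside the light's depth bounds."""
    return z_min <= pixel.stored_depth <= z_max


def shadow_pixels_to_process(pixels, z_min: float, z_max: float):
    """Keep only the pixels a light can affect; the rest skip shadow work."""
    return [p for p in pixels if depth_bounds_pass(p, z_min, z_max)]


# Hypothetical scene: four pixels at increasing depth
scene = [
    Pixel(0, 0, 0.10),
    Pixel(1, 0, 0.35),
    Pixel(2, 0, 0.50),
    Pixel(3, 0, 0.90),
]

# Hypothetical light whose influence covers depths 0.30 through 0.60
kept = shadow_pixels_to_process(scene, z_min=0.30, z_max=0.60)
print(f"{len(kept)} of {len(scene)} pixels need shadow volume work")
```

In this toy scene half of the pixels fall outside the bounds and never reach the stencil stage, which is where the savings would come from when a light only touches a thin slice of the scene.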
IntelliSample HCT
The FX 5900 GPU will be able to deliver enhanced Z and color
compression as well as accelerated texture compression capabilities.
Although it's still a 4-to-1 technology, the new compression will
improve antialiasing performance and image quality. IntelliSample
HCT will allow for up to a 50 percent increase in compression
efficiency. The rest of the features carry over from NV30 and were
discussed by Mike in his GeForce FX preview.
Conclusion
Can NVIDIA deliver this time? There is only one way to find
out: read Mike's GeForce FX 5900 Ultra preview. Based on the
information given to me, the FX 5900 Ultra (256MB RAM) and FX 5900
(128MB RAM) should do a fine job competing against the Radeon 9800
Pro in all modes. Whether the chip design was greatly improved is
something we could argue about all day long. Let's just hope we
don't see any hacks in the drivers this time, because that's a bad
sign.
NVIDIA will announce its new NV35 GPU today during the press
conference at its headquarters. Make sure you go (if you can).
Will they show off four Dawns running around? I bet that's what
you wished for! Oh, and don't forget about Detonator FX! It's coming.