As early as May, articles about NVIDIA's next-generation graphics chip, code-named NV30, were already being published. CNET's News.com reported that President and CEO Jen-Hsun Huang commented at the Merrill Lynch Hardware Heaven Technology Conference that the NV30 "is the most important contribution we've made to the graphics industry since the founding of this company." We didn't have any concrete information as to why the NV30 would be NVIDIA's most important contribution, but there's been a tremendous amount of speculation on the subject since that report was published. With the NV30, NVIDIA is another step closer to its goal of achieving cinematic-quality rendering in real time.
From the same report, we also learned that the NV30 would be manufactured using Taiwan Semiconductor's (TSMC) latest 0.13-micron process technology and that it was a fundamentally new architecture compared to the recently announced GeForce4 Titanium (Ti). Even the statement "...a new graphics chip slated to arrive in August", which linked the NV30 to a date, became the subject of scrutiny as August passed by.
What we do know is that NVIDIA worked through design issues and, according to Huang, TSMC initially struggled with the new manufacturing process. However, the problem may not have stemmed from the "standard" 0.13-micron process itself, but rather from the integration of copper and low-k dielectric interconnect technology. By using low-k materials, the performance degradation of an integrated circuit that's associated with scaling can be minimized.
While the exact impact of the design and manufacturing issues on the NV30's production schedule is unknown, NVIDIA wasn't in a position to counter industry rival ATI's release of the R300, which took the performance crown from the GeForce4 Ti 4600. NVIDIA is now in the unfamiliar role of playing catch-up, as the R300 will have been on the market for 5-6 months before the NV30 is readily available. However, having their first 0.13-micron part completed gives NVIDIA time to further refine designs of the GeForce FX product line and future 0.13-micron based products.
NVIDIA will be officially announcing the GeForce FX today as the world's first graphics processing unit (GPU) manufactured using TSMC's 0.13-micron process technology. With an investment of over $100 million, the GeForce FX will also be their first product designed using the experience of former engineers from 3dfx. Other technologies that have made their way into the GeForce FX include Flip Chip BGA Packaging, FX Flow Thermal Management, and an advanced board design.
With 125 million transistors and copper interconnects, the die of the GeForce FX is 25% smaller than a comparable 0.15-micron part and offers performance improvements of 25-30%. Compared to the 0.15-micron GeForce4 Ti 4600's 300MHz clock speed, NVIDIA is looking for the high-end GeForce FX core to operate at 500MHz. But even on a 0.13-micron process, a 500MHz clock speed required special attention to keep the GPU's temperature in check. Pictured below, NVIDIA's FX Flow Thermal Management consists of a copper heat spreader, heat pipes, and an air flow system.
Outside air is pulled inward and passes over the heat pipe and circulates over the heat spreader; the heated air is eventually blown out of the case. The temperature of the GPU is monitored and the amount of air flow is adjusted as needed. The GeForce FX also contains an auxiliary four-pin Molex power connector on the right side of the graphics card, which is used to supply the additional power needed to operate at maximum processor clock speeds. The combination of the cooling and power features takes up enough space to occupy two PCI slots. It's not certain whether lower-clocked GeForce FX processors require either of these solutions, and it will be left to add-in manufacturers to determine the best cooling solution for their cards.
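Conceptually, what's described is a closed feedback loop: read the GPU temperature, adjust the airflow. The Python sketch below illustrates the idea only; the thresholds, duty-cycle range, and function name are invented for illustration, as NVIDIA hasn't published FX Flow's actual control parameters.

```python
# Illustrative sketch of a temperature-driven fan controller in the spirit of
# FX Flow: GPU temperature is monitored and air flow is adjusted in response.
# All thresholds below are made up; they are not NVIDIA's actual values.
def fan_duty_cycle(gpu_temp_c, idle_temp=40.0, max_temp=85.0):
    """Map a GPU temperature reading to a fan duty cycle between 20% and 100%."""
    if gpu_temp_c <= idle_temp:
        return 0.2                 # quiet floor while the GPU runs cool
    if gpu_temp_c >= max_temp:
        return 1.0                 # full blast at the thermal ceiling
    # linear ramp between the idle floor and the thermal ceiling
    return 0.2 + 0.8 * (gpu_temp_c - idle_temp) / (max_temp - idle_temp)
```

A real controller would also need hysteresis so the fan doesn't oscillate around a threshold, but the proportional ramp captures the basic monitor-and-adjust behavior.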
While the GeForce FX relies on an optimized 128-bit memory bus, the use of 500MHz DDR2 memory will provide a substantial increase in memory bandwidth by effectively doubling throughput to 1.0GHz. An added benefit of using DDR2 is that the GeForce FX and future graphics chipsets are capable of using faster rated DDR2 memory as it becomes available. DDR memory on the GeForce4 Ti 4600 was clocked at an effective speed of 650MHz.
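The bandwidth figures quoted here and in the feature list follow directly from the bus width and effective clock. A quick Python sketch of the arithmetic:

```python
def memory_bandwidth_gb_s(bus_width_bits, effective_clock_mhz):
    """Physical bandwidth = bytes moved per transfer x transfers per second."""
    bytes_per_transfer = bus_width_bits / 8        # 128-bit bus -> 16 bytes
    transfers_per_sec = effective_clock_mhz * 1e6  # DDR/DDR2 "effective" rate
    return bytes_per_transfer * transfers_per_sec / 1e9

geforce_fx = memory_bandwidth_gb_s(128, 1000)  # 500MHz DDR2, 1.0GHz effective
ti_4600 = memory_bandwidth_gb_s(128, 650)      # 650MHz effective DDR
# geforce_fx -> 16.0 GB/sec, ti_4600 -> 10.4 GB/sec
```

That 16GB/sec physical figure matches the spec sheet, and the same formula yields the GeForce4 Ti 4600's 10.4GB/sec for comparison.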
Memory optimization technologies have been redesigned on the GeForce FX to take advantage of DDR2's longer burst lengths, and improved features of the Lightspeed Memory Architecture 2 were incorporated. The addition of color compression and refinements to the occlusion system effectively increase available memory bandwidth, which NVIDIA estimates at close to 20GB/sec.
GEFORCE FX FEATURES
The primary features of the GeForce FX, some of which will be discussed, are as follows:
TSMC's 0.13 Micron Process
125 Million Transistors
500MHz Graphics Processing Unit
1GHz (Effective) DDR2 Memory
128-Bit Memory Bus
16GB/Sec Physical Memory Bandwidth
1GB Maximum Addressable Memory
8 Pixels Per Clock / 8 Textures Per Clock
128-Bit Floating Point Precision
AGP 8X Interface
Cg Shader Language
Unified Driver Architecture
Digital Vibrance Control 3.0
At its core, CineFX is the umbrella that covers both the hardware and software capabilities of the GeForce FX. The hardware was designed to mirror features available in Microsoft's DirectX 9 Application Programming Interface (API), which include 128-bit floating point color precision and vertex and pixel shader version 2.0. The GeForce FX contains a full 128-bit graphics pipeline and shader processing is handled by an array of thirty-two 128-bit floating-point processors that are dynamically allocated based on the resources needed by shader programs.
While the specific details in the chart below are beyond the scope of this article, you should be able to recognize that the GeForce FX and DirectX 9 provide developers with unprecedented levels of programmability. This includes longer programs, more variables, greater program flow using loops and branching, and increased precision. Other capabilities not shown include new instructions and per-component condition codes. The 2.0+ version notation indicates functionality that was added to DirectX 9 specifically for the GeForce FX.
GeForce4 Ti versus GeForce FX Programmability
The chart compares the two architectures across the following shader capabilities:
High Order Surfaces
Vertex Displacement Mapping
Geometry Displacement Mapping
Max Static Instructions
Call And Return
Static Control Flow
Dynamic Control Flow
Max Texture Instructions
Max Color Instructions
Max Temp Storage
As an example of the benefit of increased programmability, consider using conditional and dynamic branching to develop reusable program code.
Along with subroutines, one could envision general-purpose shaders that are capable of dynamically altering their properties at run time based on variable parameter data.
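To make the idea concrete, here is a hypothetical CPU-side sketch in Python of such a general-purpose shader. The function, its parameters, and the effects it branches between are all invented for illustration; they are not NVIDIA's API or shading language.

```python
# Hypothetical "general purpose" pixel shader: one routine whose behavior is
# selected at run time by dynamic branches on per-material parameter data.
def surface_shader(base_color, light_intensity, params):
    """Return an RGB tuple shaded according to run-time parameters."""
    color = [c * light_intensity for c in base_color]
    if params.get("specular", False):        # dynamic branch: add a highlight
        color = [min(1.0, c + params["specular_power"] * 0.1) for c in color]
    if params.get("fog", False):             # dynamic branch: blend toward fog
        f = params["fog_factor"]
        color = [c * (1.0 - f) + 0.5 * f for c in color]
    return color

# One shader, two materials -- the branches select the work actually done.
matte = surface_shader((0.8, 0.2, 0.2), 0.9, {})
shiny = surface_shader((0.8, 0.2, 0.2), 0.9,
                       {"specular": True, "specular_power": 4.0})
```

Without branching, each material variation would need its own dedicated shader program; with it, a single program adapts to its parameter data.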
With respect to OpenGL, Mark Kilgard of NVIDIA referred to programmability as the "Heart and Soul" of the GeForce FX, as it surpasses the capabilities of DirectX 9.
The software side of CineFX is exposed through traditional programming techniques, through the high-level graphics languages that will debut in DirectX 9 and OpenGL 2.0, or through NVIDIA's Cg (C for Graphics). Cg will be discussed towards the end of the article.
The GeForce FX's Intellisample technology consists of a series of hardware-based efficiencies aimed at increasing graphics performance as well as image quality. Increased performance is obtained through the use of color compression along with other hardware-assisted techniques. Image quality is enhanced by offering the user increased levels of antialiasing, adaptive texture filtering, and dynamic gamma correction.
The GeForce FX utilizes a proprietary method of compressing color data in real time. The compression method provides up to a 4:1 compression ratio and is lossless, meaning that there is no reduction in image quality or loss of precision. Color compression is a memory bandwidth saving technology used to improve antialiasing performance.
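NVIDIA hasn't disclosed the actual algorithm, but a toy run-length scheme in Python illustrates how lossless compression of multisample color data can work: with 4X multisampling, every sample in a fully covered pixel shares one color, which is exactly the 4:1 best case.

```python
def rle_compress(samples):
    """Toy lossless run-length coder for a pixel's multisample colors."""
    runs = []
    for color in samples:
        if runs and runs[-1][1] == color:
            runs[-1][0] += 1          # extend the current run
        else:
            runs.append([1, color])   # start a new run
    return runs

def rle_decompress(runs):
    """Exactly reverse rle_compress -- no information is lost."""
    return [color for count, color in runs for _ in range(count)]

# Fully covered pixel: all four 4X samples are identical -> one run, 4:1 ratio.
pixel = [(200, 80, 80)] * 4
packed = rle_compress(pixel)
assert rle_decompress(packed) == pixel    # lossless round trip
```

Pixels on triangle edges hold differing sample colors and compress poorly, which is why the ratio is quoted as "up to" 4:1.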
Fast Color Buffer Clear
The color buffer can be cleared in hardware with the GeForce FX resulting in improved performance.
Dynamic Gamma Correction
To provide an accurate depiction of the luminance of a rendered image, the GeForce FX's dynamic gamma correction can fix color inaccuracies on the fly.
The image on the right has been gamma corrected.
Adaptive Texture Filtering
The GeForce FX supports adaptive texture filtering, which will be exposed through driver settings and is aimed at improving texture quality while limiting the hit to performance. These methods are based on "intelligent adjustments" made by the GeForce FX, which continuously monitors the number and type of samples taken for texturing operations on a pixel-by-pixel basis. Adaptive texture filtering works in conjunction with trilinear and anisotropic filtering modes.
The traditional texture filtering algorithm, which was implemented on previous GeForce graphics processors and offered the highest level of texture filtering (up to 8X anisotropic), is supported by the GeForce FX as well. In either case, anisotropic filtering will be significantly faster on the GeForce FX than it was on the GeForce4.
Higher Antialiasing Modes
The GeForce FX offers higher modes of antialiasing compared to the GeForce4. Under Direct3D and OpenGL, the level of antialiasing has increased from 4X to 8X. The 4XS antialiasing mode under Direct3D, which was introduced with the GeForce4, has increased to 6XS on the GeForce FX. Both new antialiasing modes are based on a combination of multisampling and supersampling. As for sampling patterns, 6XS is based on a skewed grid, while 8X is based on an ordered grid.
As a quick refresher, the following are examples of antialiasing under Direct3D and OpenGL on a GeForce4. The images were taken from the cockpit in the flight simulation IL-2 Sturmovik and were enlarged 2 times their original size to better see the benefit of antialiasing (smoothing out jagged edges).
No Antialiasing - 100% Enlarged
Direct3D 4XS Antialiasing - 100% Enlarged
OpenGL 4X Antialiasing - 100% Enlarged
When comparing 4X and 4XS antialiasing, notice the slight difference between the edges on the top left, which are aligned horizontally, and the edges on the bottom right, which are aligned vertically. Flight and racing simulations are notorious for being CPU limited, which makes the use of high levels of antialiasing practical.
128-BIT FLOATING POINT COLOR PRECISION
The GeForce FX offers 64-bit and 128-bit color precision, which allows 16-bit and 32-bit floating point representation for each Red, Green, and Blue color component. The increased floating point accuracy benefits volumetric effects, per-pixel lighting, and bump mapping. Referred to as FP16 and FP32 modes, a developer is free to use both color formats, along with other floating point or integer based color formats, as needed. Areas that require greater attention to detail can make use of FP32, while the accuracy of FP16 may be sufficient for others.
Calculations involving floating point numbers, as opposed to integers, are less prone to rounding errors and result in greater precision. A lack of precision causes artifacts and lowers visual quality, as shown in the images above. Other features that gain accuracy with increased color precision are:
Gamma Correct Lighting
Dynamic Range of Image/Tone Mapping
Per Pixel Specular Components
Distance-Based Effects
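The gap between storage precisions is easy to demonstrate with Python's struct module, which can round-trip a value through IEEE half precision ('e', the same 16-bit format as FP16) and single precision ('f', as in FP32). The 0.1 color component below is just an example value with no exact binary representation.

```python
import struct

def quantize(fmt, x):
    """Round-trip a value through a storage format: 'e' = FP16, 'f' = FP32."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

value = 0.1   # a color component with no exact binary representation
err_fp16 = abs(quantize('e', value) - value)
err_fp32 = abs(quantize('f', value) - value)
# err_fp16 is roughly four orders of magnitude larger than err_fp32
```

When many such rounded values are combined, as in multi-pass lighting, the FP16 errors accumulate into the visible banding artifacts described above, while FP32 keeps them below the threshold of perception.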
"C" FOR GRAPHICS
Our first look at Cg coincided with NVIDIA's release of the beta Cg toolkit back in June. In layman's terms, Cg is a set of development tools that can be utilized by graphics programmers and digital content creators (artists) to accomplish more work in less time. A Cg program can be thought of as a series of higher-level instructions that reside on top of the actual application programming interface (API) and are used to program specific portions of the graphics pipeline. To garner support for Cg, the web site CgShaders.org made its debut.
The first graphics shading languages began to appear in the mid-1980s and are now targeting real-time rendering applications on consumer-class graphics hardware. Pixar's RenderMan is capable of rendering photorealistic-quality images and contains a proprietary shading language used to describe how lighting and shading are to be computed. At SIGGRAPH 2000, Peercy, Olano, Airey, and Ungar authored the paper Interactive Multi-Pass Programmable Shading, which demonstrates how OpenGL was used for compiling the RenderMan Shading Language. id Software's Quake 3 graphics engine contains a parameter-driven shading language that interacts with OpenGL to perform tasks that include multi-pass rendering.
A highly respected and on-going shading project has been Stanford's Real-Time Programmable Shading Project, which recently created a compiler back-end for the NV30. Bill Mark, who was a research associate with the project, worked at NVIDIA for a year as the lead designer of the Cg language.
In the paper Shader-Driven Compilation of Rendering Assets (983KB), Paul Lalonde and Eric Schenk of Electronic Arts developed a shader system that was used as a development tool that targeted a variety of hardware. The paper contains a performance analysis of their shader system, which consisted of rendering a model comprised of 3,998 polygons and then increasing its complexity by adding shading and skinning effects.
Cg generated quite a bit of discussion at the time of its release, as speculation arose that Cg was being marketed by NVIDIA as a proprietary language targeted exclusively at its products. While I'll leave that open as a subject for the experts to debate, Cg consists of two main components - an open-source front-end and a back-end compiler, which is proprietary to NVIDIA. The compiler component represents the "brains" behind Cg and is responsible for taking the high-level source code, along with the front-end "profile", or CgFX file, and translating it into a machine language that is understood by the GPU. Optimizations for a specific graphics chipset, such as the GeForce FX, are generated by the compiler. An example of an optimization under OpenGL would be for the compiler to make use of NVIDIA's NV_vertex_program2 or NV_fragment_program extensions. However, other companies that design graphics chipsets are at liberty to write their own back-end compiler, which makes sense since they will be most familiar with the optimizations their products offer.
Cg provides unique benefits to the graphics industry as a whole. First, Cg is based on high-level shader languages, which will be incorporated into future releases of the two prominent APIs used by game developers today - DirectX and OpenGL. While the shader language structure and syntax of DirectX and OpenGL differ, NVIDIA's Unified Driver Compiler (UDC) has the capability to generate instructions for both APIs as well as for NVIDIA and non-NVIDIA GPUs. The UDC also targets a number of computing platforms, including Windows, Linux, Mac OS, and Xbox. There may eventually be a point when certain components of Cg are incorporated under the realm of DirectX or OpenGL, but it's unlikely that we'll see a complete and independent solution since DirectX and OpenGL are competing APIs. Another roadblock is that the Linux and Macintosh operating systems use OpenGL as their 3D graphics API. And second, by releasing Cg early, NVIDIA gave developers the opportunity to become familiar with the capabilities of a high-level graphics language.
The second beta of Cg gave developers the ability to emulate key features of the NV30. NV30 emulation allowed Cg programs to imitate the functionality of the NV30 by executing on the central processing unit. Under this scenario, the same data and programs could be processed on either the CPU or GPU and, in theory, achieve the same results. An example of the work produced using NV30 emulation can be seen in the entries submitted to a recent contest at CgShaders.org.
NVIDIA's Brian Burke and Tony Tomasi were upbeat and confident about the capabilities of the GeForce FX in a conference call on Friday. I sensed a feeling of relief from them, as I'm certain that many NVIDIA employees have worked very hard in getting to this point. There are a couple of items that I didn't follow up on due to a lack of time, and it's possible that other previews will provide more information. The follow-up questions would have covered the ideas that former 3dfx and Gigapixel engineers implemented on the GeForce FX, as well as additional details about LMA III and its effect on theoretical memory bandwidth.
We didn't receive a reference card to test, and NVIDIA wasn't in a position to divulge specific performance results since the GeForce FX drivers are still under development. While some previews may report that the GeForce FX provides a 3X or 4X increase in performance over the GeForce4 Ti 4600, they may not provide detailed test conditions. As has become customary with the latest graphics hardware, applications that demonstrate or measure the performance of the newest technology will be limited. In most cases, the GeForce FX will be judged on the features that provide immediate benefits to the consumer.
At this point, you can bet that NVIDIA's top priority is getting the GeForce FX on the market. Those of you that have been loyal in using NVIDIA's products may continue to patiently wait for the initial GeForce FX reviews, which I'm sure NVIDIA appreciates. On the other hand, NVIDIA realizes that every day that passes without the GeForce FX on the market is another day they can lose potential customers to ATI. The GeForce FX is scheduled to be available in the January/February 2003 timeframe.
Although an official announcement hasn't been made, NVIDIA appears to have scrapped plans to further manufacture the GeForce FX 5800 Ultra, which leaves the lower clocked GeForce FX 5800 to fill the high-end until the NV35 arrives. However, customers that pre-ordered the GeForce FX 5800 Ultra are just now beginning to receive their cards. This leads me to believe that yields of the high-end GeForce FX weren't good enough to make it a profitable part. NVIDIA's President and CEO Jen-Hsun Huang recently commented that TSMC's 0.13-micron yields are in-line with expectations, but he would like them to be better, with lower cycle time, and lower wafer cost. Even ATI's next generation 0.13-micron designed R400 graphics chip, which was expected to appear in 2003, has been put off until 2004.
One last noteworthy event was NVIDIA's negative stance on FutureMark's 3DMark03 benchmark, which simulates DirectX 9 game performance. Having supported past 3DMark programs, NVIDIA parted ways with FutureMark a few months ago after deciding the benchmark was not indicative of how actual DirectX 9 games would be programmed. Ironically, the first 3DMark03 results for the GeForce FX 5800 Ultra were behind the Radeon 9700 Pro, although a set of "new" drivers supplied by NVIDIA gave the GeForce FX 5800 Ultra a lead. NVIDIA demonstrated that they were able to optimize drivers to improve the performance of a synthetic benchmark. However, arguments for 3DMark03 can also be made, as the benchmark is said to contain no graphics chipset specific optimizations since it's based on DirectX 9 functionality.
GEFORCE FX 5600 ULTRA AND 5200 ULTRA
On March 6, 2003 NVIDIA will announce new products based on the NV31 and NV34 code-name graphics cores. The GeForce FX 5600 Ultra (NV31) and the GeForce FX 5200 Ultra and GeForce FX 5200 (NV34) represent NVIDIA's upcoming performance and mainstream lineup, which are expected to be available in April. The following chart shows NVIDIA's graphics processors for the first half of 2003 compared to the second half of 2002.
NVIDIA Graphics Lineup - 2H 2002 vs 1H 2003
1H 2003               | 2H 2002
GeForce FX 5800 Ultra | GeForce4 Ti
GeForce FX 5800       | GeForce4 Ti
GeForce FX 5600 Ultra | GeForce4 Ti / MX
GeForce FX 5200 Ultra | GeForce4 MX
GeForce FX 5200       | GeForce4 MX
GeForce4 MX           | GeForce4 MX
By extending the GeForce FX, NVIDIA's product line will soon be covered from top to bottom with graphics chipsets that support DirectX 9 functionality. However, it appears that there will be a wider performance gap between the GeForce FX enthusiast and performance segments than there was with last year's GeForce4 Ti lineup. Compared to the three GeForce4 Ti models (4200, 4400, and 4600), which were basically higher-clocked parts, the differences between the GeForce FX 5800 and GeForce FX 5600 Ultra will have a greater impact on performance. The differences include core clock speed, memory type, and the number of pixel and shader pipelines.
The primary features of the GeForce FX 5600 Ultra are as follows:
TSMC's 0.13 Micron Process
80 Million Transistors
350MHz Graphics Processing Unit
700MHz (Effective) DDR Memory
128-Bit Memory Bus
11.2GB/sec Physical Memory Bandwidth
128-Bit Floating Point Precision
4x1 Pipeline Architecture
Lossless Color And Z Compression
Z Occlusion Culling
AGP 8X Interface
Integrated TV-Encoder, TMDS Transmitters
Dual Integrated 400MHz Ramdacs
Integrated Full Hardware MPEG-2 Decoder
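Back-of-the-envelope fill-rate math shows why this gap is wider than a clock-speed difference alone. Using the pixels-per-clock figures from the two spec lists (8 for the GeForce FX 5800 Ultra, 4 for the 5600 Ultra's 4x1 pipeline), the theoretical peaks work out as follows; real throughput, of course, also depends on memory bandwidth and shader load.

```python
def fill_rate_mpixels(pixels_per_clock, core_mhz):
    """Theoretical pixel fill rate in megapixels per second."""
    return pixels_per_clock * core_mhz

fx_5800_ultra = fill_rate_mpixels(8, 500)   # 8 pixels/clock at 500MHz -> 4000
fx_5600_ultra = fill_rate_mpixels(4, 350)   # 4x1 pipeline at 350MHz -> 1400
```

On paper, the 5800 Ultra has nearly three times the pixel throughput of the 5600 Ultra, compared to the roughly 30% spread that separated the GeForce4 Ti 4200 from the Ti 4600.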
According to NVIDIA, the GeForce FX 5600 Ultra is targeted to replace the popular GeForce4 Ti 4200, yet contains a processor and memory clock speed that exceeds that of the GeForce4 Ti 4600 (300MHz/650MHz). The GeForce FX 5600 Ultra will compete against ATI's Radeon 9500 (non pro model), but ATI's positioning in the performance segment may grow stronger with new products based on their R350 graphics chipset.
The GeForce FX 5200 Ultra (45 million transistors, 325MHz core/650MHz effective DDR, 10.4GB/sec memory bandwidth) and GeForce FX 5200 (250MHz core/400MHz effective DDR) are manufactured at 0.15-micron and contain support for DirectX 9, including vertex and pixel shader 2.0+. The GeForce FX 5200 Ultra and GeForce FX 5200 contain all the memory bandwidth maximizing features of Intellisample technology except hardware-assisted color and Z compression.
In a recent conference call, NVIDIA specifically mentioned that the image quality of all GeForce FX chipsets would be identical, which could indicate that the low-end of the product line may not provide full hardware acceleration for DirectX 9. While the image quality will be consistent across the GeForce FX product line, some chipsets will obviously run slower. The GeForce FX 5200 Ultra will replace the GeForce4 MX at the low-end and compete with the Radeon 9000.
SO WHAT'S NEXT?
During the past two days, most of my free time has been spent testing the GeForce FX 5600 Ultra. NVIDIA extended the date for publishing benchmark results to March 10th, which gives reviewers some additional breathing room for testing. The 42.72 drivers we will be using have a build date of February 24th. I'm planning on comparing the performance of the GeForce FX 5600 Ultra and GeForce FX 5200 Ultra to the GeForce 4 Ti 4600 and GeForce4 Ti 4200. But the test suite will be limited, which is a result of the increased number of image quality settings that need to be addressed.
The Detonator driver Performance and Quality settings allow the user to control the level of anisotropic texture filtering and antialiasing, which offer improved image quality at the expense of performance. Upon the debut of the GeForce4 Ti 4600, using anisotropic filtering caused a significant loss in performance. Values for the level of anisotropic texture filtering include none, 2X, 4X, and 8X and based on past experience with the GeForce4 Ti 4600, the greatest improvement in texture quality is normally achieved when moving from none to 4X. In most cases, going from 4X to 8X doesn't provide significant improvements, although performance will decrease.
As a result of the GeForce FX, NVIDIA has implemented three Performance modes that control the performance, and quality, of anisotropic filtering - Application, Balanced, and Aggressive. Although I have yet to test the performance associated with all three settings, I thought it would be beneficial to provide you with some sample images using the 42.72 Detonator drivers with antialiasing disabled.
Note that you'll need to be quite close to the monitor in order to see the subtle differences between the images. While image quality is subjective, what I've done in this exercise is match the best quality from the Application setting to a comparable quality offered by the Balanced and Aggressive settings. For each anisotropic filtering mode (Application, Balanced, and Aggressive) and each anisotropic filtering level (none, 2X, 4X, and 8X), I started a multiplayer game, which ensures that the position the in-game screen shot was taken from was the same. When I was done, a total of twelve screen shots had been taken. I concentrated on three different areas in the screen shot where anisotropic filtering was beneficial and cut out a section that provided the best image quality from the Application setting. I then went through the process of finding a comparable image from the Balanced and Aggressive modes.
If you have questions about the comparison, feel free to begin a discussion in the forum. A copy of the full size screen shots have been saved in high-quality PNG format on the server.