NVIDIA GeForce4 Ti 4600 Preview
By: Mike Chambers - February 6, 2002
Those of us familiar with NVIDIA's graphics chipset releases, which have been arriving twice a year, have come to expect new and exciting features with the spring refresh of the GeForce product line. Following the debut of the GeForce 256 back in the fall of 1999, the GeForce2 GTS appeared in the spring of 2000. In February of 2001, the GeForce3 was previewed, bringing a variety of new features including a programmable transform and lighting engine, an advanced memory controller, and the ability to use antialiasing at high resolutions.
Not only are graphics chipsets being refreshed at the high end, but NVIDIA continues to up the ante for consumers on a budget as well. The most recent example is the GeForce3 Titanium 200, which became a hot commodity as it generally performed within 25% of the GeForce3 Ti 500 while costing half as much. However, market conditions did affect NVIDIA's original pricing scheme for the Titanium lineup, as their primary competitor ATI was due to release the Radeon 7500 and 8500.
GeForce3 Ti 500 / GeForce4 Ti 4600 (Bottom)
Today marks the opening of another chapter for NVIDIA, as they are announcing the next iteration of the GeForce - the GeForce4. In what has become customary with recent graphics chipset rollouts, NVIDIA is offering the GeForce4 in several configurations. At the high-end is the GeForce4 Ti (NV25) while the GeForce4 MX (NV17) is targeted at the mainstream market segment.
The GeForce4 Ti
The GeForce4 Ti series of chipsets consists of two products - the GeForce4 Ti 4600 and the GeForce4 Ti 4400. The only difference between the two is their graphics processor and memory clock speeds. Both models can support up to 128MB of double data rate (DDR) memory, which happens to be the amount of memory necessary for using 4X antialiasing at a resolution of 1600x1200. Let's take a look at what makes the GeForce4 Ti tick.
Custom Thermal System
The GeForce4 Ti 4600 is NVIDIA's most advanced graphics processor with 63 million transistors and continues to be fabricated using a 0.15 micron process. With the increased number of transistors and processor clock speed, NVIDIA incorporated a custom thermal control system to counteract the heat generated by the GeForce4 Ti graphics processor. The patented system consists of a heat slug over the graphics processor combined with active cooling capable of generating a high pressure flow of air around the graphics processor and memory.
NVIDIA also updated the packaging of the graphics processor and memory by incorporating a Plastic Ball Grid Array (PBGA) in lieu of a Pin Grid Array (PGA). PBGA is becoming the semiconductor industry's packaging of choice for high clock frequency integrated circuits since it offers greater performance in a smaller form factor. Unlike a PGA, where memory is attached to the circuit board from the side, a PBGA is normally attached from the bottom, which results in shorter electrical paths.
Lightspeed Memory Architecture II
Prior to the release of the GeForce4, rumors were circulating that additional performance over the GeForce3 could be achieved with additional rendering pipelines and/or an increased number of texture processing units per pipeline. While this may have provided marginal increases in performance, the GeForce4 still contains four rendering pipelines with two texture processing units per pipeline. With a 300MHz graphics processor clock and 128-bit DDR memory running at an effective 650MHz, the GeForce4 Ti 4600 is capable of 10.4GB of memory bandwidth per second. The Lightspeed Memory Architecture first introduced with the GeForce3 was NVIDIA's first step in efficiently utilizing memory bandwidth. The enhanced Lightspeed Memory Architecture for the GeForce4 is the next step.
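The 10.4GB/s figure falls out of the memory interface rather than the core clock. A quick sketch of the arithmetic, assuming the Ti 4600's published 325MHz (650MHz effective) DDR memory clock and 128-bit bus:

```python
# Peak memory bandwidth arithmetic for the GeForce4 Ti 4600.
# A 128-bit bus moves 16 bytes per effective clock; DDR memory at
# 325MHz transfers on both clock edges for an effective 650MHz.

BUS_WIDTH_BITS = 128           # memory bus width
EFFECTIVE_CLOCK_HZ = 650e6     # 325MHz DDR -> 650MHz effective

bytes_per_clock = BUS_WIDTH_BITS / 8               # 16 bytes
bandwidth = EFFECTIVE_CLOCK_HZ * bytes_per_clock   # bytes per second

print(f"{bandwidth / 1e9:.1f} GB/s")  # -> 10.4 GB/s
```

By the same math, the Ti 4400's 550MHz effective memory yields 8.8GB/s.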
The Lightspeed Memory Architecture II (LMA II) continues to have at its core four independent crossbar memory controllers. However, it wasn't until the debut of the Detonator 4 drivers that the potential of this memory architecture was fully realized on the GeForce3. At that time we found that under heavy loads the crossbar memory architecture is extremely effective, delivering two to four times the memory bandwidth efficiency of previous memory architectures.
Loss-free Z-buffer compression continues to provide a 4:1 compression ratio. However, second generation occlusion culling is said to provide up to a 50% increase in efficiency. Occlusion culling is the process of removing hidden objects early on in the graphics pipeline.
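The payoff of occlusion culling is that a hidden fragment is rejected before any shading work is spent on it. A minimal sketch of the idea (the names and structure here are illustrative, not NVIDIA's actual hardware pipeline):

```python
# Occlusion culling via the Z-buffer: a fragment that is not closer
# than the depth already stored for its pixel is discarded early,
# skipping all downstream shading cost.

def submit_fragment(zbuffer, x, y, depth, shade):
    """Return True if the fragment was shaded, False if culled."""
    if depth >= zbuffer[y][x]:   # hidden behind existing geometry
        return False             # culled early: no shading performed
    zbuffer[y][x] = depth        # visible: update depth and shade
    shade(x, y)
    return True

# 2x2 Z-buffer initialized to the far plane (1.0).
zbuf = [[1.0, 1.0], [1.0, 1.0]]
shaded = []
submit_fragment(zbuf, 0, 0, 0.5, lambda x, y: shaded.append((x, y)))
submit_fragment(zbuf, 0, 0, 0.8, lambda x, y: shaded.append((x, y)))  # behind, culled
print(shaded)  # -> [(0, 0)]
```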
The most significant feature of LMA II is its Quadcache architecture. Employed by central processing units and hard disk drives, a cache is a specialized area of high-speed memory where frequently used instructions and/or data is stored. The Quadcache architecture contains dedicated memory caches for pixel, texture, primitive, and vertex data. A cache dedicated to holding texture data can save processing time by avoiding access to slower graphics memory and decreasing the memory bandwidth. Accessing the primitive and vertex data cache can decrease the amount of traffic sent across the AGP bus by having the needed geometry data already available.
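The bandwidth saving from a dedicated texture cache can be illustrated with a toy model (the class and counter below are an assumption for illustration, not the Quadcache design itself): texels that are read repeatedly, as happens constantly during texture filtering, are served from the cache instead of graphics memory.

```python
# Toy texture cache: repeated reads of the same texel hit the cache
# rather than "slow" graphics memory. The memory_reads counter stands
# in for the memory bandwidth actually consumed.

class TextureCache:
    def __init__(self, texture_memory):
        self.memory = texture_memory   # simulated graphics memory
        self.cache = {}
        self.memory_reads = 0          # bandwidth actually spent

    def fetch(self, texel):
        if texel not in self.cache:    # miss: pay the memory cost
            self.memory_reads += 1
            self.cache[texel] = self.memory[texel]
        return self.cache[texel]       # hit: no memory traffic

tex = {(0, 0): 0xFF0000, (0, 1): 0x00FF00}
cache = TextureCache(tex)
for _ in range(100):                   # filtering re-reads nearby texels
    cache.fetch((0, 0))
    cache.fetch((0, 1))
print(cache.memory_reads)  # -> 2 memory reads instead of 200
```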
Two additional features of LMA II are Auto Pre-Charge and Fast-Z Clear. The former is a speculative technique of pre-charging memory for use by the graphics processor. The latter is a hardware solution used to initialize the Z-buffer (depth values) quickly.
The purpose of antialiasing is to visually smooth out the jagged edges that appear on objects, which is typically done by blending the colors of multiple pixel samples. Without antialiasing, a common way to lessen the effects of aliasing is to increase the display resolution, which provides a greater number of pixels with which to display objects. However, the effects of aliasing can still be seen even at a resolution as high as 2048x1536, although the artifacts become smaller. With the limited resolutions provided by today's graphics cards and monitors, the only way to combat aliasing is by creating the effect of having more pixels on the screen.
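The blending of pixel samples can be sketched in a few lines. With four subpixel samples, a pixel that a white triangle half-covers over a black background resolves to mid-grey instead of a hard black or white edge (a simplified uniform average; real hardware patterns are covered below):

```python
# Resolving an antialiased pixel: average the color of all subpixel
# samples, per channel, to produce the final displayed color.

def resolve(samples):
    """Uniform per-channel average over all subpixel samples."""
    n = len(samples)
    return tuple(sum(s[i] for s in samples) // n for i in range(3))

white, black = (255, 255, 255), (0, 0, 0)
edge_pixel = [white, white, black, black]   # 2 of 4 samples covered
print(resolve(edge_pixel))  # -> (127, 127, 127), a smooth grey edge
```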
With the GeForce3, NVIDIA implemented multisampling antialiasing, which offered excellent performance when compared to the supersampling technique used by the GeForce and GeForce2. Multisampling antialiasing is a hardware solution, as the graphics processor itself generates the samples required to determine a final pixel color.
NVIDIA has coined the term Accuview for the subsystem that implements antialiasing on the GeForce4. Internal data paths have been further widened to provide increased antialiasing performance over the GeForce3, and a patent-pending pipeline was developed to increase overall performance.
The 2X, Quincunx, and 4X antialiasing modes that debuted with the GeForce3 are an integral part of Accuview, along with a new 4XS mode. At the forefront of Accuview is a new sampling pattern that aims to increase antialiasing image quality. The older sampling pattern, shown below, is more susceptible to color errors.
2X And Quincunx Sampling Pattern
With Accuview, subpixel samples are closer together, which results in a higher quality antialiased image.
Accuview Shifted AA Sampling Pattern
The new 4XS antialiasing mode, which operates only under Direct3D, provides improved subpixel coverage along with a higher level of texture quality. This method of antialiasing undergoes multiple iterations in determining a final pixel color, concluding with a color reconstruction algorithm based on weights assigned to contributing subpixels. Based on the images below, 4XS is of higher quality than 4X antialiasing.
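The weighted reconstruction step described above can be sketched as follows. NVIDIA has not published the actual 4XS weights, so the weights here are purely hypothetical; the point is that each subpixel contributes in proportion to its assigned weight rather than a flat average:

```python
# Weighted color reconstruction: blend subpixel samples using
# per-sample weights instead of a uniform average. The weights
# below are made up for illustration.

def weighted_resolve(samples, weights):
    """Per-channel weighted average of subpixel samples."""
    total = sum(weights)
    return tuple(
        round(sum(w * s[i] for s, w in zip(samples, weights)) / total)
        for i in range(3)
    )

samples = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 0, 255)]
weights = [3, 1, 1, 3]   # hypothetical per-sample weights
print(weighted_resolve(samples, weights))  # -> (128, 0, 128)
```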
Accuview can also be used in conjunction with anisotropic filtering which increases texture detail. Here's an example of the benefit of anisotropic filtering from a recent GeForce3 Ti 200 review.
Texture Filtering - Serious Sam
nFinite FX II Engine
The programmable transform and lighting engine on the GeForce4 has undergone a few changes from the GeForce3. The first change is the addition of a second vertex processor which will provide better performance during the execution of vertex programs. The second change is updated pixel shader support. The GeForce4 supports the DirectX 8.1 implementation of pixel shader versions 1.2 and 1.3.
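At its core, the per-vertex work a vertex program performs is multiplying each vertex position by a 4x4 transform matrix, and a second vertex processor lets two vertices be processed at once. A minimal sketch of that transform (the parallelism is only implied here, and the code is an illustration, not NVIDIA's microcode):

```python
# The essential vertex-program operation: transform a 4-component
# vertex position by a 4x4 row-major matrix.

def transform(matrix, vertex):
    """Multiply a 4x4 row-major matrix by a 4-component vertex."""
    return tuple(sum(matrix[r][c] * vertex[c] for c in range(4))
                 for r in range(4))

# Translation by (2, 3, 4) in homogeneous coordinates.
translate = [
    [1, 0, 0, 2],
    [0, 1, 0, 3],
    [0, 0, 1, 4],
    [0, 0, 0, 1],
]
print(transform(translate, (1, 1, 1, 1)))  # -> (3, 4, 5, 1)
```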
nView is a robust set of tools for desktop management and multiple monitor support, initially released by NVIDIA with the GeForce2 MX as TwinView.
The GeForce4 MX
The MX series of graphics chipsets was first introduced by NVIDIA with the GeForce2 and was aimed at the budget segment. The MX was not part of the GeForce3 family, but it has been resurrected with the GeForce4 and will be offered in a variety of flavors.
GeForce4 MX 460 - 64MB DDR
GeForce4 MX 440 - 64/128MB DDR
GeForce4 MX 420 - 64/128MB SDR
Although the name may be misleading, the GeForce4 MX does not contain the programmable transform and lighting engine (the nFinite FX Engine) that made its debut with the GeForce3. What this means is that specialized graphics instructions will be carried out by the central processing unit rather than the graphics processor.
GeForce4 MX 460
New features of the GeForce4 MX include Accuview, a scaled down version of LightSpeed Memory Architecture II with two crossbar memory controllers, and nView.
While the GeForce4 MX still has two rendering pipelines with two texture processing units, it has finally made the transition to using faster DDR memory.
The majority of this preview looks at the performance and image quality of the GeForce4 Ti 4600. Fortunately, I had enough time to include a few benchmarks for the GeForce4 MX 460.