Turning back the clock a few years, the original Quake sparked much of the early revolution in 3D graphics. GLQuake, the OpenGL-accelerated version of Quake, used a feature referred to as light maps to improve the quality of in-game lighting. However, along with this improvement came a corresponding increase in texture fill-rate requirements.
GLQuake was a fill-rate hungry application. Because NVIDIA's earliest graphics chipsets could not multitexture, many of the pixels in one of my favorite first person shooters ended up being rendered twice: once with a base texture and a second time with a light map.
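To see why drawing every pixel twice is so costly, here's a back-of-the-envelope sketch of the fill rate a second light-map pass demands. The resolution and frame rate are illustrative figures, and overdraw is ignored:

```python
# Back-of-the-envelope fill-rate arithmetic for two-pass light mapping.
# Numbers are illustrative; overdraw is ignored.
def pixel_fill_per_second(width, height, fps, passes):
    """Pixels written per second when every pixel is drawn `passes` times."""
    return width * height * fps * passes

one_pass = pixel_fill_per_second(1024, 768, 50, passes=1)  # base texture only
two_pass = pixel_fill_per_second(1024, 768, 50, passes=2)  # base + light map

print(f"{one_pass / 1e6:.1f} vs {two_pass / 1e6:.1f} Mpixels/s")
```

Without multitexturing hardware, the light map simply doubles the pixel fill required for the same scene at the same frame rate.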
In late 1998, a series of events allowed NVIDIA to become a legitimate player in the consumer 3D graphics chipset market. First, NVIDIA was inducted into the OpenGL Architecture Review Board (ARB), which was followed by the ARB's acceptance of multitexturing into version 1.2 of OpenGL. The final piece of the puzzle was the debut of the TNT and its TwinTexel architecture.
"NVIDIA is not only excited to be inducted into the OpenGL ARB, but we also applaud the decision to add multitexturing into OpenGL 1.2," said David Kirk, chief scientist at NVIDIA. "This new extension from the ARB will create a standard way of multitexturing with OpenGL which is supported with our RIVA TNT and all future products from NVIDIA."
The TNT became a force in the 3D gaming arena, as its two texturing units allowed a light map and base surface texture to be rendered in a single pass. Before the TNT came out, I had been chugging along playing Quake in software mode at a resolution of 320x200. After upgrading to a 450MHz Pentium II and a Diamond Viper V550 graphics card, I was able to run GLQuake at 1024x768 at 50 frames per second.
|Multitexturing Under Direct3D
Fast forward a couple of years to the debut of the GeForce2 and its fixed-function NVIDIA Shading Rasterizer (NSR). Using the NSR, per-pixel effects such as bump mapping became possible in hardware, as multiple textures could be combined to create realistic visual effects.
For example, the following images were taken from NVIDIA's NVEffectsBrowser, which demonstrates the varied capabilities of the GeForce line. In this example, dot3 bump mapping is shown, which requires the application of multiple texture layers.
The GeForce2 architecture consisted of two texture processing units for each of its four pixel pipelines and was capable of rendering four dual-textured pixels per clock, or one pass. However, if an application required a third texture layer, the GeForce2 needed an extra pass in order to blend in the third texture.
GeForce2 Texture Pipeline
To showcase the capabilities of the NSR, Computer Artworks completed Evolva shortly after the GeForce2 became available. Working closely with NVIDIA, they also followed up with a special version of the game that employed dot3 bump mapping.
||Bump Mapped Mode
As with the GeForce2, the GeForce3 contains four pixel pipelines with two texture processing units per pipeline. However, the GeForce3's texture pipeline received a boost and can process up to four textures in a single pass.
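The pass counts implied by the two pipeline diagrams reduce to simple ceiling arithmetic. A quick sketch (my own helper, not vendor code):

```python
import math

def passes_needed(texture_layers, textures_per_pass):
    """Rendering passes required when the hardware can apply
    `textures_per_pass` textures in a single pass."""
    return math.ceil(texture_layers / textures_per_pass)

print(passes_needed(3, 2))  # GeForce2: a third layer forces a second pass -> 2
print(passes_needed(3, 4))  # GeForce3: all three layers fit in one pass -> 1
print(passes_needed(5, 4))  # five layers still take two passes on a GeForce3 -> 2
```

The last line matches the situation Carmack describes below: five single- or dual-texture passes collapse into just two quad-texture passes on the GeForce3.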
GeForce3 Texture Pipeline
As a side note, John Carmack of id Software mentioned (Feb. 22, 2001) the benefits of the GeForce3 early on in his testing:
Adding more texture units and more register combiners is an obvious evolutionary step.
An interesting technical aside: when I first changed something I was doing with five single or dual texture passes on a GF to something that only took two quad texture passes on a GF3, I got a surprisingly modest speedup. It turned out that the texture filtering and bandwidth was the dominant factor, not the frame buffer traffic that was saved with more texture units. When I turned off anisotropic filtering and used compressed textures, the GF3 version became twice as fast.
Pretty impressive, but does it actually make a difference?
Let's put the GeForce3 to the test by first examining frame rates in the normal, non-bump mapped version of Evolva, with a GeForce2 Ultra included for comparison. The test system consists of a 700MHz Pentium III overclocked to 735MHz.
Evolva Normal Demo - 32-Bit Color
Next up is the bump mapped version of Evolva. Knowing that dot3 bump mapping uses multiple texture layers, the GeForce3 should provide a decisive edge in performance over the GeForce2 Ultra in fill-rate limited resolutions.
Evolva Bump Map Demo - 32-Bit Color
Applying multiple textures in a single pass normally provides better performance than using multiple passes. On the GeForce2 Ultra, multiple passes translate into additional geometry and z-buffer work, which slows down the overall rendering process.
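A toy model makes the multi-pass penalty concrete. The assumptions here are mine (32-bit color, 32-bit depth, one z read, one z write, and one color write per pixel per pass, no caching or overdraw), so treat the numbers as a rough sketch rather than measured traffic:

```python
# Toy model of per-frame z-buffer and color traffic as a function of
# pass count. Assumptions: 32-bit color, 32-bit depth, no caching,
# no overdraw -- illustrative only.
def framebuffer_traffic_bytes(width, height, passes, color_bytes=4, z_bytes=4):
    # Each pass performs one z read, one z write, and one color write per pixel.
    per_pixel_per_pass = 2 * z_bytes + color_bytes
    return width * height * passes * per_pixel_per_pass

single = framebuffer_traffic_bytes(1024, 768, passes=1)
double = framebuffer_traffic_bytes(1024, 768, passes=2)
print(f"{single / 2**20:.1f} MB vs {double / 2**20:.1f} MB per frame")
```

Every extra pass also re-submits and re-transforms the geometry, so the real cost is higher still, which is exactly why folding texture layers into one pass pays off.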
If you're familiar with the VillageMark benchmark, you'll probably know that it was developed by PowerVR Technologies to show off the benefits of the tile-based rendering used by its Kyro graphics chipset.
Ironically, I had planned to use VillageMark in an article on occlusion culling and the GeForce3. But after a few performance tests, I realized that the benefits of multitexturing on the GeForce3 play a significant role in VillageMark performance.
Looking at the performance of the GeForce3 and GeForce2 Ultra in 16-bit color, we see that both cards offer similar performance.
VillageMark Results - 16-Bit Color
What's interesting about these results is that the occlusion culling technique used by the GeForce3 is not effective in VillageMark. This is more than likely due to VillageMark's rendering order; NVIDIA recommends front-to-back rendering to achieve optimal performance on the GeForce3.
However, in 32-bit color we see a different story.
VillageMark Results - 32-Bit Color
What's being shown here is the benefit of single-pass quad-texturing on the GeForce3. As shown below, the results indicate that the GeForce3 can process the three texture layers used by VillageMark in a single pass.
With the GeForce2 Ultra, the same results screen shows "2 layers plus 1," indicating that multiple passes are needed for texture processing.
Similar results were achieved in MadOnion's 3DMark2001 single and multitexturing fill rate tests. However, performance was significantly better on the GeForce3, especially in the multitexturing test, where its fill rate was close to double that of the GeForce2 Ultra (1,308 vs. 681 MTexels per second).
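The "close to double" claim checks out against the quoted figures:

```python
# 3DMark2001 multitexturing fill-rate figures quoted above.
geforce3 = 1308    # MTexels/s
gf2_ultra = 681    # MTexels/s

ratio = geforce3 / gf2_ultra
print(f"{ratio:.2f}x")  # roughly 1.92x -- close to double
```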
|Multitexturing Under OpenGL
To test multitexturing performance under OpenGL, I used a variety of synthetic benchmarks along with the retail version of Serious Sam with patch 1.02 applied. The synthetic benchmarks consist of TexBench (version 1.0) and GL Excess (version 1.1).
The TexBench test measures fill rate at a resolution of 1024x768 in 32-bit color. Two textures were rendered simultaneously using the following texture sizes: 256x256, 512x512, and 1024x1024. Fill rate is reported in millions of pixels per second.
TexBench Fill Rate Tests
The GL Excess fill rate tests were selected using a resolution of 1024x768 in 32-bit color. Results are provided in frames per second.
Again, the GeForce3 outperforms the GeForce2 Ultra in this particular test.
GL Excess Fill Rate Tests
For real-world performance testing I chose Serious Sam, which was developed by Croteam. Serious Sam is great for a quick pick-me-up as the first person shooter is reminiscent of the plentiful enemy-filled levels of the original Doom.
The game sports a modern 3D graphics engine in which a single polygon can carry up to five textures (main texture, hyper map, detail texture, shadow map, and haze or fog).
The following results are based on the Quality setting at a resolution of 1024x768 in 32-bit color using the Dunes demo. EAX sound effects were enabled during the tests.
Serious Sam Results
At a resolution of 1600x1200, the GeForce3 manages a 17% performance gain over the GeForce2 Ultra. While this isn't exactly earth shattering, it shows that the GeForce3 has an ace up its sleeve when it comes to texture-intensive applications, antialiasing included.
Next: Occlusion Culling