As discussed in the section on Lightspeed Memory Architecture, occlusion culling is a feature of the GeForce3 that reduces the amount of memory bandwidth and processing required by a 3D graphics application. We learned that objects in any given 3D scene can be occluded from view and that removing them early in the graphics rendering pipeline improves performance.
In many 3D rendering applications, a database such as SurRender Umbra is used to track all of the objects that can appear in the 3D world. The application's task is to determine which entries in the database should be processed at any given time, a process otherwise referred to as scene management.
Double Eagle Tanker - Piping Model
During the course of rendering these objects, a determination can be made as to whether an object, or part of an object, is obstructed from view by other objects. Without such a determination, a pixel may be rendered multiple times per frame, resulting in excess reads from and writes to the color and depth buffers. The number of times a pixel is rendered per frame is referred to as the scene's depth complexity. As depth complexity increases, so do processing time and memory usage.
While the GeForce3 utilizes Z-occlusion culling technology, there are other visibility determination techniques:
- View Frustum Culling - removes objects that lie outside the camera's view frustum.
- Backface Culling - removes polygons that face away from the camera.
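As a rough illustration (in Python, and not how the hardware implements it), backface culling can be expressed as a test on the sign of a triangle's screen-space signed area, assuming front-facing triangles use counter-clockwise vertex order:

```python
# Illustrative sketch of backface culling, assuming counter-clockwise
# winding means front-facing. Not NVIDIA's implementation.

def signed_area(a, b, c):
    """Twice the signed area of triangle abc in screen space (x, y)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])

def is_backfacing(a, b, c):
    # A non-positive signed area means clockwise winding,
    # i.e. the triangle faces away from the camera.
    return signed_area(a, b, c) <= 0

# A counter-clockwise triangle is front-facing and kept...
print(is_backfacing((0, 0), (1, 0), (0, 1)))  # False
# ...while the same triangle with reversed winding is culled.
print(is_backfacing((0, 0), (0, 1), (1, 0)))  # True
```

Since roughly half the polygons of a closed object face away from the camera, this test alone can eliminate a large share of the rasterization work.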
The idea behind occlusion culling using the Z-buffer is to test whether an object is located closer to the camera (player viewport) than the current depth buffer value. If the test fails, the object is not visible at that pixel. Otherwise, the depth buffer value is updated, and the object is drawn into the frame buffer.
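That per-pixel test can be sketched in a few lines of Python. This is a conceptual model only, assuming the common convention where a smaller z value means closer to the camera:

```python
# Minimal sketch of the per-pixel depth test described above.
# Assumes smaller z = closer to the camera; z = 1.0 is the far plane.

def depth_test(z_buffer, x, y, z_new):
    """Return True (and update the buffer) if the incoming fragment
    is closer than the value currently stored in the depth buffer."""
    if z_new < z_buffer[y][x]:
        z_buffer[y][x] = z_new   # fragment wins: record its depth
        return True              # caller writes its color to the frame buffer
    return False                 # fragment is occluded; discard it

# A 1x1 depth buffer initialized to the far plane.
zbuf = [[1.0]]
print(depth_test(zbuf, 0, 0, 0.5))  # True  - closer, so it is drawn
print(depth_test(zbuf, 0, 0, 0.8))  # False - behind, so it is rejected
```

The earlier in the pipeline this rejection happens, the more color and texture work the hardware avoids for hidden pixels.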
Using the object database as an example, in order to determine whether other objects in a scene occlude an object, the database could be sorted and rendered from front to back. In addition, the database can be sorted hierarchically so that the outer objects are rendered first and the inner objects are rendered last. This type of occlusion detection algorithm relies on techniques found in Binary Space Partitioning.
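The front-to-back ordering itself is simply a sort on distance from the camera. The object names and distances below are hypothetical, purely to show the idea:

```python
# Hypothetical object "database" sorted front to back by distance from
# the camera, so nearer objects are submitted first and can occlude
# later ones early in the pipeline. Names and values are illustrative.

objects = [
    {"name": "wall",   "distance": 10.0},
    {"name": "person", "distance": 4.0},
    {"name": "dog",    "distance": 6.0},
]

front_to_back = sorted(objects, key=lambda obj: obj["distance"])
print([obj["name"] for obj in front_to_back])
# ['person', 'dog', 'wall']
```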
As previously discussed, traditional graphics architectures render every pixel of every triangle as it is received, accessing the frame buffer with each pixel to determine the appropriate color and z (or depth) values. This method produces correct results, but it requires all of the pixels to be rendered, regardless of whether they are visible.
For example, a rendered image of a wall has a depth complexity of one. An image of a person standing in front of a wall has a depth complexity of two. An image of a dog behind the person but in front of the wall has a depth complexity of three, and so on.
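The wall/person/dog example can be made concrete by counting how many times each pixel is touched. The screen size and layer extents below are made up for illustration:

```python
# Sketch: measuring per-pixel depth complexity by counting how many
# overlapping layers cover each pixel. Sizes and extents are made up.

WIDTH, HEIGHT = 4, 4
overdraw = [[0] * WIDTH for _ in range(HEIGHT)]

# Three layers from the example: the wall covers the whole screen,
# the person a 2x2 region in front of it, and the dog a 1x1 region.
layers = [
    (0, 0, 4, 4),  # wall:   full screen
    (1, 1, 3, 3),  # person: 2x2 block
    (2, 2, 3, 3),  # dog:    1x1 block
]

for x0, y0, x1, y1 in layers:
    for y in range(y0, y1):
        for x in range(x0, x1):
            overdraw[y][x] += 1

# The pixel covered by all three layers has a depth complexity of three.
print(max(max(row) for row in overdraw))  # 3
```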
In an interesting turn of events, I happened to run across a variety of 3D models of the Double Eagle Tanker at The Walkthru Project. The massive double-hulled tankers were designed and manufactured where I currently work as an IT Analyst. However, due to excessive cost overruns, only five of the ships were delivered before the project was abandoned.
Double Eagle Tanker
Some of the Double Eagle models used by the project consist of as many as 82 million triangles, and their sheer size taxes the limits of the project's computing resources in terms of memory, CPU, disk I/O, and graphics hardware.
Double Eagle Tanker - Whole Model
We will spare my system the expense of rendering this mammoth model and instead run a test that requires far fewer computing resources. A few months ago, a benchmark was being discussed over at the Beyond 3D forums. The author of the program, who posts under the name Humus, developed GL_EXT_reme, which includes a series of tests that measure the effectiveness of occlusion culling on a graphics card.
GL_EXT_reme Overdraw Test
This test pits the GeForce2 Ultra against the GeForce3 in a head-to-head comparison. The results are measured in frames per second, and the first test is based on an overdraw factor of three.
Overdraw Factor Of 3
Although NVIDIA has remained secretive about the occlusion culling technology used in the GeForce3, it's obvious that rendering scenes in front-to-back order increases performance. The early depth buffer check in the rendering pipeline rejects more pixels before they are drawn.
A front-to-back renderer works in the opposite way of a back-to-front renderer. Objects closer to the viewport are rendered first, and objects further away are tested against those already drawn, which results in a minimal amount of overdraw.
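The difference in frame buffer traffic between the two submission orders can be simulated with the depth-test idea from earlier. The depth values are arbitrary; smaller z again means closer:

```python
# Sketch comparing frame buffer writes for front-to-back versus
# back-to-front submission of the same fragments at one pixel.
# Assumes smaller z = closer; depth values are arbitrary examples.

def count_color_writes(fragment_depths):
    """Count color writes for fragments arriving in the given order."""
    z = 1.0        # depth buffer starts at the far plane
    writes = 0
    for depth in fragment_depths:
        if depth < z:     # depth test passes: update z, write color
            z = depth
            writes += 1
    return writes

depths = [0.2, 0.5, 0.8]  # three layers covering the same pixel

print(count_color_writes(sorted(depths)))                # front to back: 1
print(count_color_writes(sorted(depths, reverse=True)))  # back to front: 3
```

Front to back, only the nearest fragment is written; back to front, every layer is written and then overwritten, which is exactly the extra memory traffic the benchmark exposes.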
Overdraw Factor Of 8
By increasing the amount of overdraw, the GeForce3 widens its lead over the GeForce2 Ultra percentage-wise when rendering front to back, with gains of over 170% at all three resolutions.
Back-to-front rendering carries additional overhead, as objects that have already been rendered become obscured by closer ones. Given that the memory clock speed of both the GeForce2 Ultra and the GeForce3 is 460MHz, you can see that when rendering back to front, memory bandwidth-saving features such as the GeForce3's crossbar memory architecture come into play to provide better performance.
The implementation of occlusion culling in the GeForce3 is a step in the right direction toward maximizing frame rates. It also benefits us, as the performance penalty for increasing depth complexity is lessened. Of course, this assumes that developers plan on increasing the complexity of scenes in future game titles.
Next Page: DroneZ & The nFiniteFX Engine