CineFX is the engine that powers the GeForce series of GPUs. It consists of the graphics hardware and the ForceWare drivers, both of which are closely tied to specific versions of the DirectX and OpenGL Application Programming Interfaces (APIs). The third generation of CineFX debuts on the GeForce 6 Series and embraces the technology behind DirectX 9.0 and Shader Model 3.0.
[Figure: Nalu Technology Demo]
Under OpenGL, hardware graphics acceleration is exposed through vendor-specific extensions or through extensions that have been adopted by the OpenGL Architecture Review Board (ARB). For example, the ARB_vertex_shader extension, written against the OpenGL 1.4 specification, exposed programmable vertex shaders written in a high-level language; its functionality was later promoted to the core API in OpenGL 2.0.
[Figure: Direct3D vs. OpenGL Shader Compiler]
Programmability is an exciting and powerful feature of modern GPUs. Although shader programs can be written directly in assembly language, Microsoft's High Level Shader Language (HLSL) and the OpenGL Shading Language (GLSL) continue to gain acceptance in the developer community, and both are supported by CineFX 3.0. The shader compiler plays a key role: it translates high-level shader code into assembly instructions, and hardware-specific optimizations can be achieved by providing compiler hints.
[Figure: FX Composer]
NVIDIA continues to support developers by releasing tools such as NVShaderPerf, which reports DirectX and OpenGL shader performance on GeForce FX GPUs. Shader development tools like FX Composer aim to simplify shader development by incorporating real-time preview and optimization features. NVIDIA's Software Development Kit (SDK) is a valuable resource that contains sample code, demos, tools, technical papers, and tips for DirectX and OpenGL.
VERTEX SHADER 3.0
A vertex shader performs tasks that include transformation, texture coordinate generation, lighting, and vertex-level texture access. The following table summarizes the key differences between the vertex shader 2.0 and vertex shader 3.0 feature sets.
| Feature | Vertex Shader 2.0 | Vertex Shader 3.0 |
| --- | --- | --- |
| Instruction Slots | 256 | ≥ 512 |
| Max Instructions Executed | 65535 | ≥ 65535 |
| Dynamic Branching | No | Required |
| Texture Lookup | No | Up to 4 |
| Stream Divider | No | Yes |
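As a rough illustration of the per-vertex work mentioned at the start of this section (transformation, texture coordinates, and lighting), the following Python sketch models what a very simple vertex shader computes for one vertex. It is not shader code: the function name `vertex_shader`, the row-major matrix convention, and the single directional light are assumptions chosen for the example.

```python
import math

def mat_vec(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def vertex_shader(position, normal, texcoord, world_view_proj, light_dir):
    """Hypothetical per-vertex program: transform, light, pass texcoords."""
    # Transformation: object space -> clip space (w = 1 for a point).
    clip_pos = mat_vec(world_view_proj, [*position, 1.0])
    # Lighting: Lambertian diffuse term, clamped at zero.
    n = normalize(normal)
    l = normalize(light_dir)
    diffuse = max(0.0, sum(a * b for a, b in zip(n, l)))
    # Texture coordinates are passed through unchanged.
    return clip_pos, diffuse, texcoord

identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
pos, diffuse, uv = vertex_shader([1.0, 2.0, 3.0], [0.0, 0.0, 1.0], [0.5, 0.5],
                                 identity, [0.0, 0.0, 1.0])
```

On the GPU, this function body would run once per vertex, with the matrix and light direction supplied as shader constants.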
The GeForce 6800 Ultra contains six vertex processing units that are managed by an instruction scheduler. The vertex processing units are based on a Multiple Instruction, Multiple Data (MIMD) parallel architecture, which is characterized by each processor having its own copy of program instructions while performing different operations on different data streams.
[Figure: GeForce 6 Series Vertex Processing Unit]
Each vertex processor contains a scalar processing unit and a vector processing unit, which operate in parallel. A new feature of the GeForce 6 Series vertex shader is the texture fetch unit, which retrieves texture data from memory. This unit is the foundation for hardware-assisted displacement mapping performed in the vertex shader.
Infinite Length Vertex Programs
With previous versions of CineFX, NVIDIA based its vertex shader limits on those Microsoft established in the DirectX API. Under DirectX 9.0, vertex shader versions 2.0 and 2.a allowed a maximum of 256 instruction slots and 65,535 executed instructions per program, and were incorporated in the CineFX 2.0 architecture of the GeForce FX. Although a program can occupy at most 256 instruction slots, looping allows more instructions than that to be executed.
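The difference between instruction slots and executed instructions can be made concrete with a little arithmetic. The Python sketch below assumes a program laid out as a prologue, a loop body, and an epilogue; the function name and layout are hypothetical, and loop-control instructions are ignored for simplicity.

```python
VS_2_0_MAX_SLOTS = 256       # instruction slots, from the table above
VS_2_0_MAX_EXECUTED = 65535  # executed instructions, from the table above

def instruction_counts(prologue, loop_body, iterations, epilogue):
    """Return (slots_used, instructions_executed) for a simple program.

    Simplified model (an assumption): the loop body occupies its slots
    once but executes once per iteration; loop-control instructions
    are not counted.
    """
    slots = prologue + loop_body + epilogue
    executed = prologue + loop_body * iterations + epilogue
    return slots, executed

slots, executed = instruction_counts(prologue=20, loop_body=30,
                                     iterations=200, epilogue=10)
print(slots, executed)  # 60 slots, 6030 executed
fits = slots <= VS_2_0_MAX_SLOTS and executed <= VS_2_0_MAX_EXECUTED
```

A 60-slot program thus executes 6,030 instructions, far more than it occupies, while staying within both vertex shader 2.0 limits.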
[Figure: EverQuest 2]
Vertex shader 3.0 requires a minimum of 512 instruction slots, while the maximum number of instructions executed is reported by the MaxVShaderInstructionsExecuted member of D3DCAPS9. Although the vertex shader 3.0 documentation calls for this maximum to be at least 2^16, CineFX 3.0 and GeForce 6 allow vertex shaders to execute an unlimited number of instructions.
Dynamic Flow Control
Dynamic flow control is available to vertex shaders, giving developers greater control over program logic. The new flow control features include new instructions (ifc/breakc, if/break/callnz), an eight-deep stack for return addresses and address registers (branch, call, push, pop), and condition code selection.
Dynamic branching increases the flexibility of vertex operations because a shader can evaluate conditions that determine how each individual vertex is processed. This flexibility can also improve performance by skipping unnecessary shader operations. Dynamic branching is a welcome feature, but branch instructions are not free, so care should be taken to ensure that processing remains efficient.
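The following Python sketch illustrates how a per-vertex branch can avoid unnecessary work. It assumes a hypothetical lighting routine that skips the diffuse and specular math for vertices facing away from the light, and it counts a rough, made-up "operation" cost per vertex rather than real GPU cycles.

```python
def shade_vertex(n_dot_l, n_dot_h, shininess):
    """Hypothetical per-vertex lighting with a dynamic branch.

    Returns (intensity, ops), where ops is a rough count of the
    lighting operations actually performed for this vertex.
    """
    ops = 1  # the comparison that drives the branch
    if n_dot_l <= 0.0:
        # Vertex faces away from the light: skip diffuse and specular.
        return 0.0, ops
    diffuse = n_dot_l
    specular = max(0.0, n_dot_h) ** shininess
    ops += 2
    return diffuse + specular, ops

# (N.L, N.H, shininess) for four vertices; two face away from the light.
vertices = [(0.8, 0.9, 8), (-0.5, 0.2, 8), (0.0, 0.7, 8), (0.3, 1.0, 8)]
results = [shade_vertex(*v) for v in vertices]
total_ops = sum(ops for _, ops in results)
```

Here the two back-facing vertices take the short path, so the batch costs 8 operations instead of the 12 it would cost if every vertex ran the full lighting code. The branch itself still costs one operation per vertex, which is why branching pays off only when the skipped work outweighs it.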
Texture Lookup
Displacement mapping is a graphics technique used to increase the visual detail of surfaces by adding effects such as bumps, cracks, and dents. Traditional displacement mapping algorithms typically subdivide the underlying geometry to reach the desired geometric level of detail, which can be computationally intensive. For example, the images below were generated from a head model: the original mesh consists of 1,358 triangles, while the displaced mesh consists of 48,434 triangles.
[Figure: Geometry Based Displacement Mapping]
With CineFX 3.0, NVIDIA built the ability to fetch texture data from memory into the vertex shader unit. This allows a vertex shader to read texture values at each vertex and use them to offset vertex positions, producing an effect similar to geometry-based displacement mapping. Up to four textures can be accessed, and mipmaps are supported, although no texture filtering is performed.
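A minimal Python sketch of vertex texture displacement follows. It assumes a height map stored as a list of mip levels (each a 2D list of floats), nearest-neighbor (point) sampling because no filtering is performed, a mip level chosen explicitly by the caller, and displacement along the vertex normal scaled by a hypothetical `scale` parameter. The function names are illustrative, not an NVIDIA or Direct3D API.

```python
def sample_point(mip_chain, u, v, level):
    """Nearest-neighbor lookup: no bilinear or trilinear filtering."""
    texels = mip_chain[level]
    height = len(texels)
    width = len(texels[0])
    # Map [0, 1] texture coordinates to texel indices, clamping at edges.
    x = min(width - 1, max(0, int(u * width)))
    y = min(height - 1, max(0, int(v * height)))
    return texels[y][x]

def displace_vertex(position, normal, uv, mip_chain, level, scale):
    """Move the vertex along its normal by the sampled height."""
    h = sample_point(mip_chain, uv[0], uv[1], level) * scale
    return [p + n * h for p, n in zip(position, normal)]

# A 4x4 height map and its 2x2 mip level (2x2 box averages of level 0).
level0 = [[0.0, 0.0, 1.0, 1.0],
          [0.0, 0.0, 1.0, 1.0],
          [0.5, 0.5, 0.0, 0.0],
          [0.5, 0.5, 0.0, 0.0]]
level1 = [[0.0, 1.0],
          [0.5, 0.0]]
mips = [level0, level1]

displaced = displace_vertex([1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                            (0.75, 0.25), mips, 0, 2.0)
```

Because the lookup returns a single texel, neighboring vertices that fall into the same texel receive exactly the same displacement; a finer mesh or higher-resolution height map is needed for smooth results.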
Vertex Frequency Stream Divider
A programmable vertex shader consists of instructions that manipulate vertex element data such as color, position, and texture coordinates. During execution, this data is sent to an arithmetic logic unit (ALU), which performs the requested arithmetic and Boolean operations. The vertex pipeline reads its input from vertex component streams, which are made up of the vertex elements that the vertex shader's input registers are bound to.
Prior to vertex shader 3.0, the vertex shader was called once per vertex, and every time it was called, its input registers, which are bound to vertex element data, were initialized with the next set of vertex elements from every vertex stream.
[Figure: The Lord of the Rings, The Battle for Middle-earth]
Vertex shader 3.0 allows an application to set, for each vertex stream, the rate at which vertex shader input registers are initialized. The rate determines the number of vertices that are processed before new data is fetched from the vertex stream and loaded into the input registers.
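The effect of the divider can be modeled as an index calculation. In Direct3D 9 the divider is set per stream with `IDirect3DDevice9::SetStreamSourceFreq`; the Python sketch below does not call that API. It only models which element of each stream feeds the input registers for a given vertex, assuming element index = vertex index // divider. The stream contents are hypothetical, and a divider of 1 reproduces the pre-3.0 behavior in which every stream advances once per vertex.

```python
def fetch_inputs(streams, dividers, vertex_index):
    """Return the input-register values for one vertex.

    streams:  list of per-stream element lists (hypothetical layout).
    dividers: per-stream divider; the element used is
              vertex_index // divider (a simplified model).
    """
    return [stream[vertex_index // divider]
            for stream, divider in zip(streams, dividers)]

positions = ["p0", "p1", "p2", "p3", "p4", "p5"]  # advances every vertex
colors = ["red", "blue"]                          # advances every 3 vertices

for i in range(6):
    print(i, fetch_inputs([positions, colors], [1, 3], i))
```

With a divider of 3 on the color stream, vertices 0 to 2 share "red" and vertices 3 to 5 share "blue", so the color stream needs only one element per group of three vertices.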
The Vertex Frequency Stream Divider benefits games that frequently draw objects that are replicas of one another. In many cases, these objects are designed to perform similar actions and can therefore be controlled efficiently through the rate at which the vertex stream data is updated. The developer can also issue a "batch" update that affects all of the objects in a scene, or limit the update to a specific group of objects to give that group unique characteristics.
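For replicated objects, the divider is typically used for instancing: one stream holds the shared mesh, and a second stream holds per-instance data that advances once per copy. The Python sketch below models that arrangement together with a "batch" update that changes the per-instance data for all instances or only a chosen group. The per-instance fields (`offset`, `tint`), the function names, and the group selection are hypothetical. In Direct3D 9 the corresponding setup uses `SetStreamSourceFreq` with the `D3DSTREAMSOURCE_INDEXEDDATA` and `D3DSTREAMSOURCE_INSTANCEDATA` flags, which this sketch does not call.

```python
from dataclasses import dataclass

@dataclass
class InstanceData:
    offset: tuple  # per-instance translation (hypothetical field)
    tint: str      # per-instance color (hypothetical field)

def expand_instances(mesh, instances):
    """Emit one transformed vertex per (instance, mesh vertex) pair.

    The mesh stream restarts for every instance, while the instance
    stream advances once per copy of the mesh.
    """
    out = []
    for inst in instances:
        ox, oy, oz = inst.offset
        for x, y, z in mesh:
            out.append(((x + ox, y + oy, z + oz), inst.tint))
    return out

def batch_update(instances, tint, group=None):
    """Change the tint of every instance, or only the indices in group."""
    targets = range(len(instances)) if group is None else group
    for i in targets:
        instances[i].tint = tint

triangle = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
soldiers = [InstanceData((i * 10, 0, 0), "grey") for i in range(4)]
batch_update(soldiers, "red", group=[1, 3])
vertices = expand_instances(triangle, soldiers)
```

Only the small per-instance list changes during a batch update; the shared triangle data is stored once no matter how many copies are drawn.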
Note that information on pixel shader 3.0 and other new features of the GeForce 6 will be forthcoming. For more information on Shader Model 3.0, please visit Microsoft's WinHEC 2004 web site and read the article "Shader Model 3.0 - No Limits," written by D. Sim Dietrich Jr. of NVIDIA.