Question and Answer Session with NVIDIA's David Kirk
By: Mike Chambers - November 28, 2001
About a month ago, we had an opportunity to send in a list of questions for NVIDIA's Chief Scientist David Kirk to answer. David took time out of his schedule and furnished us with the following answers. But first, here is a brief biography:
David B. Kirk, Ph.D. – Chief Scientist at NVIDIA
David Kirk has been Chief Scientist for the Company since January 1997. From June 1996 to January 1997, Dr. Kirk was a software and technical management consultant. From 1993 to 1996, Dr. Kirk was Chief Scientist, Head of Technology for Crystal Dynamics, a video game manufacturing company. From 1989 to 1991, Dr. Kirk was an engineer for Apollo Systems Division of Hewlett-Packard Company. Dr. Kirk has authored seven patents relating to graphics design and has authored more than 50 articles on graphics technology. Dr. Kirk holds B.S. and M.S. degrees in Mechanical Engineering from the Massachusetts Institute of Technology and M.S. and Ph.D. degrees in Computer Science from the California Institute of Technology.
Question: In terms of performance and image quality, how do shadow buffers differ from today's shadow volume and stenciling techniques?
David: Shadow volumes, using the stencil buffer, produce accurate and efficient shadows. Because the stencil based shadows are created using either real or approximate scene geometry, the shadows are true to the actual characters and objects in the scene. Objects can easily and correctly shadow themselves and each other, with no artifacts or errors. There are a couple of drawbacks to stencil shadows, however. Very complex character models composed of extremely high polygon counts can stress the polygon setup part of the graphics engine, although we have not yet seen games that use models this complex. Using only the silhouettes of characters for producing the shadow representations can also reduce this load. All in all, stencil-based shadow volumes are a robust, high performance and high quality technique for rendering shadows, but have a few limits in realistically rendering lifelike shadows.
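As a rough illustration of the stencil counting that shadow volumes rely on, the sketch below follows a single eye ray through one pixel. It assumes the common depth-pass variant, in which the stencil value is incremented for each front-facing shadow-volume polygon in front of the visible surface and decremented for each back-facing one, and it assumes the eye itself is outside every shadow volume. The function name and data layout are hypothetical and are not taken from any NVIDIA API.

```python
def in_shadow_zpass(surface_depth, volume_crossings):
    """Depth-pass stencil counting for one pixel (illustrative only).

    volume_crossings is a list of (depth, facing) pairs, one for each
    shadow-volume polygon the eye ray crosses; facing is "front" or "back".
    """
    stencil = 0
    for depth, facing in volume_crossings:
        # Only polygons in front of the visible surface pass the depth test.
        if depth < surface_depth:
            stencil += 1 if facing == "front" else -1
    # A non-zero count means the surface lies inside at least one volume.
    return stencil != 0


# One shadow volume, entered at depth 2 and exited at depth 5.
crossings = [(2.0, "front"), (5.0, "back")]
print(in_shadow_zpass(3.0, crossings))  # surface inside the volume
print(in_shadow_zpass(6.0, crossings))  # surface beyond the volume
print(in_shadow_zpass(1.0, crossings))  # surface in front of the volume
```

Each extra shadow-casting polygon adds another crossing to process, which is why very high polygon counts load the setup stage and why extruding only silhouette edges helps.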
Stencil-based shadow edges are sharp and clear and, although this is often correct, it’s not always what is wanted for the “feel” of the scene. Sharp shadow edges are also not always correct, when the lighting is softer and more diffuse. In this case, softer edged shadows are desired. Shadow buffers can more easily render softer shadows. In the past, shadow buffers have been problematic for a few reasons, all of which are resolved by GeForce3’s shadow buffer techniques. First, the same graphics pipeline that is doing the normal drawing renders shadow buffers. Previous hardware did not have enough performance and bandwidth to render shadow buffers at high precision and high resolution, leading to jagged shadow edges (rather than soft), and visibility errors (the wrong things in the shadow). Also, limited rendering precision caused problems with objects casting shadows on themselves: the depth precision was not sufficient to decide “what’s in front” - the object or its shadow! GeForce3’s raw pixel rendering power and high precision calculations allow efficient and flawless shadow buffer rendering.
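The depth-precision problem can be sketched in a few lines. In this simplified model a shadow buffer stores, per texel, the quantized distance from the light to the nearest surface; a point is in shadow if its own quantized distance is greater than the stored one. The bit depths (8 and 16) and the function names are assumptions chosen for illustration and do not describe the GeForce3's actual buffer format.

```python
def quantize(depth, bits):
    """Store a depth in [0, 1) as an unsigned integer with the given bits."""
    levels = 1 << bits
    return min(int(depth * levels), levels - 1)


def is_shadowed(receiver_depth, occluder_depth, bits):
    """Shadow-buffer test for one texel (illustrative only)."""
    stored = quantize(occluder_depth, bits)  # value written from the light's view
    return quantize(receiver_depth, bits) > stored


# An occluder at 0.500 with a receiving surface just behind it at 0.502.
print(is_shadowed(0.502, 0.500, 8))   # False: both depths collapse to the same value
print(is_shadowed(0.502, 0.500, 16))  # True: the two surfaces are distinguishable
```

With too few bits the two surfaces become indistinguishable, so the test cannot decide which one is in front. The same ambiguity is what makes a surface appear to shadow itself when its stored and tested depths differ only by rounding.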
Question: How is the implementation of vertex shaders by ATI and NVIDIA so different that we see a significant discrepancy in performance between the GeForce3 and Radeon 8500 in the number of lit triangles test in 3DMark2001?
David: I’ve heard this mentioned before, and I haven’t really dug into it too much, since it’s not a very realistic real-world example. The lit triangles test in 3DMark2001 renders a lot of offscreen (invisible) triangles, which results in a lot of what I would call “meaningless computation”. So, I’m not sure exactly what capability is being measured in the test. It’s not visible. Often, benchmarks test software or hardware code paths that are not general and are not commonly used. We at NVIDIA don’t make it a practice to optimize our pipeline for specific benchmarks - we want to provide high quality and high performance on a wide variety of useful and entertaining applications, rather than just getting a good score. Ask yourself (or, better yet, ask ATI) why Radeon 8500 performs well on this one test, and poorly on many other 3DMark2001 tests.
Question: Are any attempts being made to standardize OpenGL based vertex and pixel shaders? It was interesting to see in some Radeon 8500 reviews that neither DroneZ nor GLMark could be used for benchmarking since the OpenGL extensions used in those applications are specific to NVIDIA. Will NVIDIA be providing shader extensions that other graphics chipset manufacturers can implement?
David: NVIDIA has offered to license our patented vertex programming technology, royalty free, to other IHVs, in order that they can implement our extension without risk. For some reason, ATI has not chosen to take advantage of this license for Radeon 8500, although other hardware and software vendors have eagerly adopted the technology. It is disappointing to me that they want to create turmoil instead of compatibility. NVIDIA is working very hard to encourage adoption of a single compatible extension. That is why we have generously offered this license for a single, industry-wide extension, and have declined to license our patented technology for a smattering of competing extensions.
As soon as we can build some industry consensus on vertex programming, we will move on to the more difficult problem of defining compatible extensions for pixel programming.
Question: It appears that graphics hardware is leading the development of DirectX and OpenGL. A recent example is that pixel shader 1.4 support was specifically added to DirectX 8.1 for the Radeon 8500. Some feel that the APIs should be leading the hardware and not the other way around. Do you think this will change NVIDIA's future plans in the 3D area and NVIDIA's relations with these APIs (read Microsoft and the ARB)?
David: In the past, NVIDIA has provided a lot of innovation in terms of graphics pipeline technology, and we continue to do so. We have licensed our innovative technology both to Microsoft for inclusion in their DirectX8 API, and to members of the OpenGL ARB. I think it’s wrong-headed to believe that APIs can realistically lead the hardware. Only hardware developers can know what’s going to be possible and doable in any given hardware generation. Once those capabilities are present, the API layer can expose them. The software API does not create features; it is simply a layer on top of the hardware that presents the features to developers in a clean and friendly way.
Question: How are the 3GIO (PCI 3.0) and HyperTransport protocols going to affect the graphics industry as a whole, and NVIDIA specifically? Will these changes affect the AGP standard, or will AGP cards be replaced by 32 and 64-channel PCI 3.0 specific cards?
David: I think that the next exciting milestone for graphics interconnects is AGP8X. Double the bandwidth, double the performance, in a compatible physical and electrical form factor! AGP8X is probably the interconnect of choice for all of next year, and well into the following year. After that, we as an industry will be ready for the next big thing. 3GIO looks like it will be an incompatible form factor, so it will be a difficult transition. We need to really want it. So, my guess is that until 3GIO can deliver well beyond the equivalent of “AGP16X”, adoption will be slow. We’ll see. The great thing about the future is that you never know what’s going to happen :)
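For a sense of scale, the peak figures behind "double the bandwidth" can be worked out from the AGP bus parameters: a 32-bit (4-byte) data path on a roughly 66.67 MHz base clock, with the 4X and 8X modes moving four and eight transfers per clock. The sketch below is only that arithmetic; the "16X" row is hypothetical, mirroring the “AGP16X” figure of speech in the answer, and is not a real AGP mode.

```python
AGP_BASE_CLOCK_HZ = 200_000_000 / 3  # about 66.67 MHz
AGP_BUS_BYTES = 4                    # 32-bit data path


def agp_peak_mb_per_s(transfers_per_clock):
    """Theoretical peak AGP bandwidth in MB/s (1 MB = 10**6 bytes)."""
    return AGP_BASE_CLOCK_HZ * AGP_BUS_BYTES * transfers_per_clock / 1e6


for mode in (4, 8, 16):  # 16 is a hypothetical mode, for comparison only
    print(f"AGP {mode}X: about {agp_peak_mb_per_s(mode):.0f} MB/s")
```

These are theoretical peaks; sustained transfer rates in a real system are lower.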
Question: Is there an extension or some other means that allows developers to tie into the GeForce3’s occlusion detection? Or is it simply a circuit that generically addresses the issue, without any direct code to take advantage of? For example, would a developer have to use a “traditional renderer-friendly” approach when sorting geometry in order to take advantage of it?
David: The GeForce3’s occlusion culling hardware operates automatically without the knowledge or intervention of developers or users. No special techniques are required. It simply doesn’t draw things that are going to be invisible in the final scene. Neat trick. I like to say that we can “not draw” graphics faster than anyone. However, there is additional benefit to be had if developers are aware of the technology. Since invisible geometry and pixels can be culled faster than they could be drawn, it is advantageous to draw foreground objects first, and background objects later. This allows more geometry to be culled. Interestingly, this technical tidbit was ALWAYS faster, even before occlusion culling. Even when doing ordinary Z-buffering, back-to-front rendering causes everything to be drawn, even if it will eventually be occluded. Rendering front-to-back allows invisible objects to not be drawn, although the objects still have their Z values checked against the Z buffer. Occlusion culling simply accelerates this case, and can often avoid that extra Z-buffer read.
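The effect of draw order on an ordinary Z-buffer can be seen with a minimal single-pixel model. The sketch below counts how many fragments pass the depth test, and therefore get drawn, when the same four overlapping surfaces are submitted in each order. It models only plain Z-buffering, not the GeForce3's occlusion culling hardware, and the function name is hypothetical.

```python
def count_pixel_writes(fragment_depths):
    """Z-buffer one pixel and count the fragments that actually get drawn."""
    z_buffer = float("inf")  # cleared to "infinitely far away"
    writes = 0
    for depth in fragment_depths:
        # Every fragment is depth-tested; only closer ones are drawn.
        if depth < z_buffer:
            z_buffer = depth
            writes += 1
    return writes


depths = [1.0, 2.0, 3.0, 4.0]  # four overlapping surfaces; smaller is closer

print(count_pixel_writes(depths))        # front-to-back: 1 write
print(count_pixel_writes(depths[::-1]))  # back-to-front: 4 writes
```

In both orders every fragment still costs a depth comparison, which is the Z-buffer read that occlusion culling can often avoid.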
Question: Is there a theoretical maximum frequency of the core and memory on the GeForce3? Assuming that heat was being taken away fast enough to prevent overheating, what is the actual physical limit of the card?
David: That’s a hard question. The memory used by the GeForce3 is just DDR SDRAM memory. It goes as fast as it goes, and no faster; we don’t control that. There is a limit inside the DRAM chips that determines how fast data can get in and out. That’s a design limitation, based on the power being used to drive the signals, and the capacitance and inductance of the circuits. In practice, GeForce3 can drive commercially available memory chips as fast as they are capable. We would probably not choose to ship a product amped-up that much - as you say, it would dissipate quite a lot of heat.
So, let’s assume that the DRAM is perfect, you’re taking away all of the heat, then what are the next limits? This heat that you’re taking away has to come from somewhere, and that somewhere is the power applied to the chip. Are you relaxing those limits, too? So, let’s see, no limits to heat, power… that must mean that we’re only limited by the speed of light.