StealthHawk
05-29-03, 02:43 AM
I thought this would be interesting. Please vote for the choice that best represents your opinion.
In case you are not well versed in the situation, here is some relevant information:
ATI's statementThe 1.9% performance gain comes from optimization of the two DX9 shaders (water and sky) in Game Test 4 . We render the scene exactly as intended by Futuremark, in full-precision floating point. Our shaders are mathematically and functionally identical to Futuremark's and there are no visual artifacts; we simply shuffle instructions to take advantage of our architecture. These are exactly the sort of optimizations that work in games to improve frame rates without reducing image quality and as such, are a realistic approach to a benchmark intended to measure in-game performance. However, we recognize that these can be used by some people to call into question the legitimacy of benchmark results, and so we are removing them from our driver as soon as is physically possible. We expect them to be gone by the next release of CATALYST.
Tim Sweeney's commentsLeaving aside the vulnerability of "game demos used for benchmarking" to straight-forward manipulations by IHV drivers, which can be quite different and sometimes impossible to do for an actual interactive gameplay scenario, I'm wondering what your thoughts are as a programmer when IHV drivers detect and substitute your codes with their own "optimized" codes for their hardware. Even if the resultant display quality varies slightly (for the worse, that is) from what you would expect and that the variation is usually not detectable upon any cursory look at image quality (i.e. it needs to be pointed out to a viewer before he sees the differences).
Pixel shaders are functions, taking textures, constants, and texture coordinates as inputs, and producing colors as outputs. Computer scientists have this notion of extensional equality that says, if you have two functions, and they return the same results for all combinations of parameters, then they represent the same function -- even if their implementaions differ, for example in instruction usage or performance.
Therefore, any code optimization performed on a function that does not change the resulting value of the function for any argument, is uncontroversially considered a valid optimization. Therefore, techniques such as instruction selection, instruction scheduling, dead code elimination, and load/store reordering are all acceptable. These techniques change the performance profile of the function, without affecting its extensional meaning.
Optimization techniques which change your function into a function that extensionally differs from what you specified are generally not considered valid optimizations. These sorts of optimizations have occasionally been exposed, for example, in C++ compilers as features that programmers can optionally enable when they want the extra performance and are willing to accept that the meaning of their function is being changed but hopefully to a reasonable numeric approximation. One example of this is Visual C++'s "improve float consistency" option. Such non-extensional optimizations, in all sane programming systems, default to off.
3D hardware is still at a point in its infancy that there are still lots of nondeterministic issues in the API's and the hardware itself, such as undefined amounts of precision, undefined exact order of filtering, etc. This gives IHV's some cover for performing additional optimizations that change the semantics of pixel shaders, though only because the semantics aren't well-defined in the base case anyway. In time, this will all go away, leaving us with a well-defined computing layer. We have to look back and realize that, if CPU's operated as unpredictably as 3D hardware, it would be impossible to write serious software.
John Carmack's commentsRewriting shaders behind an application's back in a way that changes the output under non-controlled circumstances is absolutely, positively wrong and indefensible.
Rewriting a shader so that it does exactly the same thing, but in a more efficient way, is generally acceptable compiler optimization, but there is a range of defensibility from completely generic instruction scheduling that helps almost everyone, to exact shader comparisons that only help one specific application. Full shader comparisons are morally grungy, but not deeply evil.
The significant issue that clouds current ATI / Nvidia comparisons is fragment shader precision. Nvidia can work at 12 bit integer, 16 bit float, and 32 bit float. ATI works only at 24 bit float. There isn't actually a mode where they can be exactly compared. DX9 and ARB_fragment_program assume 32 bit float operation, and ATI just converts everything to 24 bit. For just about any given set of operations, the Nvidia card operating at 16 bit float will be faster than the ATI, while the Nvidia operating at 32 bit float will be slower. When DOOM runs the NV30 specific fragment shader, it is faster than the ATI, while if they both run the ARB2 shader, the ATI is faster.
When the output goes to a normal 32 bit framebuffer, as all current tests do, it is possible for Nvidia to analyze data flow from textures, constants, and attributes, and change many 32 bit operations to 16 or even 12 bit operations with absolutely no loss of quality or functionality. This is completely acceptable, and will benefit all applications, but will almost certainly induce hard to find bugs in the shader compiler. You can really go overboard with this -- if you wanted every last possible precision savings, you would need to examine texture dimensions and track vertex buffer data ranges for each shader binding. That would be a really poor architectural decision, but benchmark pressure pushes vendors to such lengths if they avoid outright cheating. If really aggressive compiler optimizations are implemented, I hope they include a hint or pragma for "debug mode" that skips all the optimizations.
In case you are not well versed in the situation, here is some relevant information:
ATI's statementThe 1.9% performance gain comes from optimization of the two DX9 shaders (water and sky) in Game Test 4 . We render the scene exactly as intended by Futuremark, in full-precision floating point. Our shaders are mathematically and functionally identical to Futuremark's and there are no visual artifacts; we simply shuffle instructions to take advantage of our architecture. These are exactly the sort of optimizations that work in games to improve frame rates without reducing image quality and as such, are a realistic approach to a benchmark intended to measure in-game performance. However, we recognize that these can be used by some people to call into question the legitimacy of benchmark results, and so we are removing them from our driver as soon as is physically possible. We expect them to be gone by the next release of CATALYST.
Tim Sweeney's commentsLeaving aside the vulnerability of "game demos used for benchmarking" to straight-forward manipulations by IHV drivers, which can be quite different and sometimes impossible to do for an actual interactive gameplay scenario, I'm wondering what your thoughts are as a programmer when IHV drivers detect and substitute your codes with their own "optimized" codes for their hardware. Even if the resultant display quality varies slightly (for the worse, that is) from what you would expect and that the variation is usually not detectable upon any cursory look at image quality (i.e. it needs to be pointed out to a viewer before he sees the differences).
Pixel shaders are functions, taking textures, constants, and texture coordinates as inputs, and producing colors as outputs. Computer scientists have this notion of extensional equality that says, if you have two functions, and they return the same results for all combinations of parameters, then they represent the same function -- even if their implementaions differ, for example in instruction usage or performance.
Therefore, any code optimization performed on a function that does not change the resulting value of the function for any argument, is uncontroversially considered a valid optimization. Therefore, techniques such as instruction selection, instruction scheduling, dead code elimination, and load/store reordering are all acceptable. These techniques change the performance profile of the function, without affecting its extensional meaning.
Optimization techniques which change your function into a function that extensionally differs from what you specified are generally not considered valid optimizations. These sorts of optimizations have occasionally been exposed, for example, in C++ compilers as features that programmers can optionally enable when they want the extra performance and are willing to accept that the meaning of their function is being changed but hopefully to a reasonable numeric approximation. One example of this is Visual C++'s "improve float consistency" option. Such non-extensional optimizations, in all sane programming systems, default to off.
3D hardware is still at a point in its infancy that there are still lots of nondeterministic issues in the API's and the hardware itself, such as undefined amounts of precision, undefined exact order of filtering, etc. This gives IHV's some cover for performing additional optimizations that change the semantics of pixel shaders, though only because the semantics aren't well-defined in the base case anyway. In time, this will all go away, leaving us with a well-defined computing layer. We have to look back and realize that, if CPU's operated as unpredictably as 3D hardware, it would be impossible to write serious software.
John Carmack's commentsRewriting shaders behind an application's back in a way that changes the output under non-controlled circumstances is absolutely, positively wrong and indefensible.
Rewriting a shader so that it does exactly the same thing, but in a more efficient way, is generally acceptable compiler optimization, but there is a range of defensibility from completely generic instruction scheduling that helps almost everyone, to exact shader comparisons that only help one specific application. Full shader comparisons are morally grungy, but not deeply evil.
The significant issue that clouds current ATI / Nvidia comparisons is fragment shader precision. Nvidia can work at 12 bit integer, 16 bit float, and 32 bit float. ATI works only at 24 bit float. There isn't actually a mode where they can be exactly compared. DX9 and ARB_fragment_program assume 32 bit float operation, and ATI just converts everything to 24 bit. For just about any given set of operations, the Nvidia card operating at 16 bit float will be faster than the ATI, while the Nvidia operating at 32 bit float will be slower. When DOOM runs the NV30 specific fragment shader, it is faster than the ATI, while if they both run the ARB2 shader, the ATI is faster.
When the output goes to a normal 32 bit framebuffer, as all current tests do, it is possible for Nvidia to analyze data flow from textures, constants, and attributes, and change many 32 bit operations to 16 or even 12 bit operations with absolutely no loss of quality or functionality. This is completely acceptable, and will benefit all applications, but will almost certainly induce hard to find bugs in the shader compiler. You can really go overboard with this -- if you wanted every last possible precision savings, you would need to examine texture dimensions and track vertex buffer data ranges for each shader binding. That would be a really poor architectural decision, but benchmark pressure pushes vendors to such lengths if they avoid outright cheating. If really aggressive compiler optimizations are implemented, I hope they include a hint or pragma for "debug mode" that skips all the optimizations.