Quote:
|
Originally Posted by tertsi
I haven't seen any problems with the "FP16" options on the NV4x and G7x, but there could be problems on the NV3x because I didn't test my Q4 tweaks on my 5800u or 5900u.
|
It's those ubiquitous (in both Doom 3 and Quake 4) floor gratings; they look fairly fugly with fp16 on (and that's on my 6800GT, i.e. NV40). Using short temps, though, is fine.
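For a rough feel of how coarse fp16 is, here is a minimal Python sketch that rounds a few values to half precision. It assumes the hardware's fp16 behaves like IEEE 754 half precision (1 sign, 5 exponent, 10 mantissa bits). The helper name `to_fp16` and the sample values are made up for illustration. This only shows the rounding granularity; it doesn't claim to reproduce whatever actually goes wrong on the gratings.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest half-precision value (hypothetical helper).

    Assumes IEEE 754 binary16 layout; struct's "e" format does the rounding.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Illustrative values only: the absolute error grows with magnitude,
# because only 10 mantissa bits are available at any scale.
for value in (0.1, 1.001, 100.3, 2049.0):
    half = to_fp16(value)
    print(f"{value!r:>8} -> {half!r:<14} abs error {abs(half - value):.6f}")
```

The larger the value, the bigger the step between representable numbers, so anything that pushes large coordinates or accumulated results through an fp16 register loses fine detail first.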
Quote:
|
Originally Posted by tertsi
"SUB/MUL/MUL" should in most cases be 1-2fps faster in Doom 3 and Quake 4 than "MAD/MUL", though "MAD/MUL" was one cycle faster in my fragment program analysis.
|
Hmm? If SUB/MUL/MUL uses one more cycle than MAD/MUL (which seems fairly intuitive), why would it be 1-2fps faster? Especially since one of your other optimizations is to replace a MUL/ADD with a MAD? (Although I do see why it was left as MUL/ADD: it looks neater, since the ADD is the summation of the results of two lots of SUB/DP3/RSQ/MULs.)
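To make the MUL/ADD versus MAD point concrete, here is a small Python sketch. It assumes each SUB/DP3/RSQ/MUL group normalizes a difference vector, which is my reading of the shader and not something stated above. All function and variable names are hypothetical, and each helper stands in for one fragment program instruction.

```python
import math


def sub(a, b):          # SUB: component-wise a - b
    return tuple(x - y for x, y in zip(a, b))


def dp3(a, b):          # DP3: 3-component dot product
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def rsq(x):             # RSQ: reciprocal square root
    return 1.0 / math.sqrt(x)


def mul(a, s):          # MUL: vector times scalar
    return tuple(x * s for x in a)


def add(a, b):          # ADD: component-wise a + b
    return tuple(x + y for x, y in zip(a, b))


def mad(a, s, c):       # MAD: a * s + c in one instruction
    return tuple(x * s + z for x, z in zip(a, c))


def sum_mul_add(pos, p1, p2):
    """Two SUB/DP3/RSQ/MUL groups, then a separate ADD (9 instructions)."""
    v1 = sub(p1, pos)
    n1 = mul(v1, rsq(dp3(v1, v1)))
    v2 = sub(p2, pos)
    n2 = mul(v2, rsq(dp3(v2, v2)))
    return add(n1, n2)


def sum_mad(pos, p1, p2):
    """Same result, with the second MUL and the ADD folded into one MAD (8 instructions)."""
    v1 = sub(p1, pos)
    n1 = mul(v1, rsq(dp3(v1, v1)))
    v2 = sub(p2, pos)
    return mad(v2, rsq(dp3(v2, v2)), n1)


pos, p1, p2 = (0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 2.0)
print(sum_mul_add(pos, p1, p2))
print(sum_mad(pos, p1, p2))
```

The MAD version saves an instruction, but the two normalizations no longer look symmetric, which is presumably why the original kept the MUL/ADD form.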
Quote:
|
Originally Posted by tertsi
EDIT! Quake 4 standard interaction.vfp is optimized for ATI Vec3D + Vec1D hardware....
|
Heh, well, structurally the only noticeable differences from Doom 3's are the use of specular maths instead of a lookup, and all those masks, so either way I can't see this being an issue for Nvidia hardware. I did note that a comment mentioned that using .w was more optimal for ATI than .x, presumably because ATI's ability to do a 3-component vector op plus one scalar op in the same pass requires the scalar to be in the .w component (Nvidia hardware is more flexible in this regard).