View Full Version : A new shader precision thread :)


SurfMonkey
06-13-03, 06:18 AM
Digit-Life have written an article all about shader precision that isn't too complicated. You can read it here (http://www.digit-life.com/articles2/ps-precision/index.html).

Just for those who keep saying that all DX9 shader precision is *only* calculated in 24bit, here's the result from the DX9 reference device:


WARNING: Reference device was used!

Registers precision:
Rxx = s23e8 (temporary registers)
Cxx = s23e8 (constant registers)
Txx = s23e8 (texture coordinates)

Registers precision in partial precision mode:
Rxx = s23e8 (temporary registers)
Cxx = s23e8 (constant registers)
Txx = s23e8 (texture coordinates)



s23e8 is FP32. All shaders are calculated at FP32 unless you specify _pp. The R3xx series downgrades the FP32 data to FP24, therefore losing some precision. But it's no big deal, as most people wouldn't be able to tell the difference anyway. :)
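
To make the sMeE notation concrete, here's a quick C++ sketch (my own, not from the article) that derives each format's headline numbers straight from its bit layout. The s16e7 layout for the R3xx's FP24 is my assumption about ATI's format; s10e5 and s23e8 are the layouts that come up in this thread.

#include <cmath>
#include <cstdio>

// sMeE means: 1 sign bit, M mantissa bits (plus an implicit leading 1),
// and E exponent bits with an IEEE-style bias.
static void describe(const char* name, int mantissaBits, int exponentBits) {
    // Machine epsilon: the smallest step representable above 1.0.
    double eps = std::ldexp(1.0, -mantissaBits);
    // Largest finite exponent: 2^(E-1) - 1.
    int maxExp = (1 << (exponentBits - 1)) - 1;
    double maxVal = std::ldexp(2.0 - eps, maxExp);
    std::printf("%-13s eps = %g, max ~ %g\n", name, eps, maxVal);
}

int main() {
    describe("FP16 (s10e5)", 10, 5);   // NV3x partial precision
    describe("FP24 (s16e7)", 16, 7);   // R3xx pixel shaders (assumed layout)
    describe("FP32 (s23e8)", 23, 8);   // the reference device result above
    return 0;
}

That prints roughly eps = 0.001 / 1.5e-05 / 1.2e-07 and max ~ 65504 / 1.8e+19 / 3.4e+38, so the interesting difference between the formats is mantissa precision far more than range.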

gstanford
06-13-03, 07:09 AM
From the article:

John Carmack
Back in 2000, John Carmack mentioned the necessity of floating-point numbers in the pixel pipeline, in addition to those in the geometry pipeline of graphics cards. Let's quote his words:

4/29/00

-------

We need more bits per color component in our 3D accelerators.

I have been pushing for a couple more bits of range for several years now, but I now extend that to wanting full 16 bit floating point colors throughout the graphics pipeline. A sign bit, ten bits of mantissa, and five bits of exponent (possibly trading a bit or two between the mantissa and exponent). Even that isn't all you could want, but it is the rational step.

Note we are talking about color, and therefore pixel shaders, here, not vertex shaders, which are the main area where you would use FP32.

jAkUp
06-13-03, 09:49 AM
Hmmm, they also have a pixel shader test utility... has anyone tried this??

http://www.ixbt.com/video2/images/ps-precision/PSPrecision13.zip


/EDIT: hmmm...

====================================================================
PixelShader 2.0 precision test. Version 1.3
Copyright (c) 2003 by ReactorCritical / iXBT.com
Questions, bug reports send to: clootie@ixbt.com

Device: NVIDIA GeForce FX 5800 Ultra
Driver: nv4_disp.dll
Driver version: 6.14.10.4403

Registers precision:
Rxx = s0e3 (temporary registers)
Cxx = s0e3 (constant registers)
Txx = s0e3 (texture coordinates)

Registers precision in partial precision mode:
Rxx = s0e3 (temporary registers)
Cxx = s0e3 (constant registers)
Txx = s0e3 (texture coordinates)

Nv40
06-13-03, 12:37 PM
Originally posted by SurfMonkey

s23e8 is FP32. All shaders are calculated at FP32 unless you specify _pp. The R3xx series downgrades the FP32 data to FP24, therefore losing some precision. But it's no big deal, as most people wouldn't be able to tell the difference anyway. :)


Only game developers... :)

Tim Sweeney, Epic Games:

For games shipping in 2003-2004, the multiple precision modes make some sense. It was a stopgap to allow apps to scale from DirectX8 to DirectX9 dealing with precision limits. Long-term (looking out 12+ months), everything's got to be 32-bit IEEE floating point. With the third generation Unreal technology, we expect to require 32-bit IEEE everywhere, and any hardware that doesn't support that will either suffer major quality loss or won't work at all.

We can live with this in the 2003-2004 timeframe, but after that, if you don't do full 32-bit IEEE floating point everywhere, your hardware is toast.

SlyBoots
06-13-03, 02:38 PM
Originally posted by Nv40
Only game developers... :)

So what you're implying here is that the NV3x really sucks, since it's forced into FP16 because of hardware limitations :p

extreme_dB
06-13-03, 02:48 PM
Why do all the NV3x chips below NV35 (every FX chip on the market to date) force partial precision no matter what?

Does that mean development work done on the NV30 using FP32 is compromised as well?

Nv40
06-13-03, 02:53 PM
Originally posted by SlyBoots
So what you're implying here is that the NV3x really sucks, since it's forced into FP16 because of hardware limitations :p

Nope, I'm not implying anything... I'm simply posting what a top game developer thinks about the differences between FP24 and FP32 in their games. :)

But now that you ask me for opinions, I prefer to have the feature and see the effects, even if the performance is slow, than to not see the effect at all. :)

gokickrocks
06-13-03, 03:13 PM
Originally posted by SurfMonkey
Just for those who keep saying that all DX9 shader precision is *only* calculated in 24bit, here's the result from the DX9 reference device:

WARNING: Reference device was used!

*snip*

s23e8 is FP32. All shaders are calculated at FP32 unless you specify _pp.

I didn't see the reference device being used in the Digit-Life article...

unless you had the source code to change the D3DDEVTYPE to REF and recompiled the program...

or the D3DDEVTYPE had a fallback to REF if your card wasn't DX9 compliant...

goofer456
06-13-03, 03:22 PM
Originally posted by Nv40
Nope, I'm not implying anything... I'm simply posting what a top game developer thinks about the differences between FP24 and FP32 in their games. :)

But now that you ask me for opinions, I prefer to have the feature and see the effects, even if the performance is slow, than to not see the effect at all. :)

Please supply an example of an effect that shows up with FP32 but not with FP24, in a game or a non-NV/ATI demo.

Nv40
06-13-03, 04:16 PM
Originally posted by goofer456
Please supply an example of an effect that shows up with FP32 but not with FP24, in a game or a non-NV/ATI demo.

It's not that it will not "show"; it's more that it will not work in the way it was intended, with undesired results that will force the developer to not support the effect on that hardware, if the effect doesn't look good enough.

Tim Sweeney: we expect to require 32-bit IEEE everywhere, and any hardware that doesn't support that will either suffer major quality loss or won't work at all.


That's why I say I prefer to see the effects the way they were designed, even if it is slow, than not to see them at all. There are ways to play with performance, choosing a better CPU or tweaking resolutions or game settings, but you cannot play with something that is not supported in your hardware.

vandersl
06-13-03, 04:35 PM
It's not that it will not "show"; it's more that it will not work in the way it was intended, with undesired results that will force the developer to not support the effect on that hardware, if the effect doesn't look good enough.

Oh come on, stop making things up. While there may be some effects that require FP32, they are few and far between. Unless you feel like playing a Mandelbrot demo sometime soon.

I'm going to do something I hate - I'm going to give M$ the benefit of the doubt, and assume that they picked FP24 as a minimum for DX9 for a reason. That being that FP24 would provide enough precision for the useful life of DX9.

Are there some things that can be done with FP32 that can't be done properly with FP24? Probably, but who cares? Don't you think developers have enough new capability to play with over the next couple years to keep them busy? Heck, I'd be happy to have more than 3 games using PS2.0 by the end of the year. I'm not too worried that my games will suffer 'cuz developers just can't make something look good with FP24.

That said, it bothers me a bit that the NV3x series defaults to FP16 even without the _PP hint. Seems to violate the DX9 spec, and makes assuming even FP24 support impossible for developers.

It seems ATI is forcing developers to assume a FP24 precision limit, while NVidia is forcing developers to assume a FP16 precision limit. Which one sucks more?

-=DVS=-
06-13-03, 04:47 PM
Originally posted by vandersl
*snip*

It seems ATI is forcing developers to assume a FP24 precision limit, while NVidia is forcing developers to assume a FP16 precision limit. Which one sucks more?

Nvidia does, of course. Just like 3dfx did with 16-bit color back in the old days :rolleyes:

ChrisW
06-13-03, 04:57 PM
Need I remind you guys that the 9700 also has support for 128-bit precision, or did you forget? It has 128-bit precision available everywhere except the frame buffer. Anyone remember the ATI light globe demo where they show off 128-bit color precision? And let's not forget the 9800 has 128-bit precision available everywhere.

Nv40
06-13-03, 05:00 PM
Originally posted by vandersl
It seems ATI is forcing developers to assume a FP24 precision limit, while NVidia is forcing developers to assume a FP16 precision limit. Which one sucks more?

The use of FP16 is for *today's* games, where more precision is not needed; that is, incoming DirectX7/DirectX8 games with very little (if any) use of PS2.0. That covers the present and the very near future, meaning the second half of 2003 and maybe the first half of 2004, where no pure DirectX9 game will show up (some even think that will not happen until 2005).

By the time FP24/FP32 might be really useful, which could be by the end of 2004/2005, NVidia will have many generations of IEEE-32 pixel shader cards, with NV40/NV45 and other NV4x parts running at full speed. Remember that game development takes years, not days, and by the time games need FP24/FP32, NVidia will have very fast cards in FP32 performance. The good news is that the NV3x lineup can already use FP32 in pixel shaders partially, without too much performance hit, so the tools are in the hands of developers. What sucks more for developers, having the tools or not having them? Not having them, of course, because performance is something developers and gamers can tweak, but you cannot tweak something that is not supported in your hardware. ;)

vandersl
06-13-03, 05:26 PM
By the time FP24/FP32 might be really useful, which could be by the end of 2004/2005, NVidia will have many generations of IEEE-32 pixel shader cards, with NV40/NV45 and other NV4x parts running at full speed.

By the same logic, so will ATI.

The good news is that the NV3x lineup can already use FP32 in pixel shaders partially, without too much performance hit, so the tools are in the hands of developers.

Actually, I guess that's my real question - was the article mistaken in showing that the NV30, NV31, and NV34 currently do not support FP32 in the drivers, even when the _PP hint is not specified? It seemed like they were saying that on all currently available NV3x cards, the drivers limit support to FP16, at least in DX. This sounds strange, can anyone confirm?

Luminescent
06-13-03, 05:42 PM
At least it has been confirmed that the NV35 can execute 8 general FP shader ops per clock (with a peak of 12, with fmovs) at FP32 precision (of course, the programmer is limited to 1-4 registers per operation if full performance is expected).

Of course, the texture units cannot be used when the NV35 is operating with all its shading units. If texture lookups are required, pixel shader ops are limited to 4 per clock (a peak of 8, with fmovs).

Hellbinder
06-13-03, 06:01 PM
Nv40... I just can't believe that ONE developer used the IEEE-32 term and now you use it in nearly every single post. DX9, OpenGL, none of that exists anymore... All that matters is some arbitrary IEEE-32 standard that got tossed out by one developer in one Q&A session... :rolleyes:
I'm reposting this here... it's by sireric, an ATI hardware engineer.

About some misconceptions, and some comments:

1) IE^3 standards do not specify what should be returned for transcendental functions (sqrt, sin, etc...). They specify the format of data (including NaNs, infinites, denorms) and the internal roundings for results -- this rounding is not the f2i conversion, but how to compute the LSBs of the results. Different HW can return different results. People have learned to live with this. If you need 24b of mantissa precision, FP32 is not enough for you anyway.

2) IE^3 does not guarantee, in any way, that operations are order independent. For example, consider:
result = a + b + c
The above, generally, needs to be broken down into 2 operations. If we select a = 1.0, b = -1.0 and c = 2^(-30), then the result can be 0 or c. Depends on the implementation. IE^3 does not "specify" anything. The programmer needs to specify what he wants (i.e. (a+b)+c).

2.5) IE^3 support for NaNs, inf and denorms is just not needed in PS. In the final conversion to the [0,1.0] range, inf and 2.0 would give the same output. For that matter, it's not needed in VS either.

3) FP24 has less precision than FP32, but has no worse other characteristics (well, range is reduced by 2^63 as well). Order of operations would be no "worse" than FP32, beyond the precision limits.

4) What are the outputs of the shader? There are two: the 10b color or 11b texture address (actually, with subtexel precision for filters, you could expect the texture address to be up to 15b). With FP24, you get 17b of denormalized precision, which would allow you to have up to 8 (assuming 1/2 lsb of error per operation) ALU instructions at maximum error rate before even noticing any change in the texture address or texture filtering. Until texture sizes increase significantly (2kx2k right now, will give you a 16MB texture -- I don't know of many apps that even use that now), or texture filtering on FP textures exists, there really is no need for added precision. On the other hand, it's obvious that FP16 is not good enough, unless you have smallish textures (i.e. < 512x512).

5) The only exception to 4 above is for long procedural texture generation. In those cases you could expect that some FP24 limits would come into play. There's no real use of that out there right now, and we are probably years from seeing it in the mainstream. Nevertheless, our Ashli program can take procedural texture generation code from Renderman, and generate long pixel shaders. We've generated shaders that are thousands of instructions long. What we found is that it looks perfect, compared to the original image. No one would ever complain about that quality -- it's actually quite amazing. So, empirically, we found that even in these cases, FP24 is a nearly perfect solution.

6) FP32 would not only be 30% larger than FP24 from a storage standpoint; the multipliers and adders would be nearly twice as big as well. That added area would have increased cost and given no quality benefits. We could have implemented a multi-pass approach to give higher precision, but we felt that simplicity was a much higher benefit. Given the NV3x PP fiasco, we even more strongly believe now that a single precision format is much better from a programming standpoint (can anyone get better than FP16 on NV3x?).

At the end of it all, we believe we made the right engineering decision. We weighed all the factors and picked the best solution. DX9 did not come before the R300 and specify FP24. We felt that FP24 was the "right" thing, and DX9 agreed.
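
Point 2 above is easy to verify yourself. Here's a minimal C++ sketch of sireric's a + b + c example, assuming ordinary IEEE single-precision floats (an old x87 build with excess precision may hide the effect):

#include <cmath>
#include <cstdio>

int main() {
    float a = 1.0f, b = -1.0f;
    float c = std::ldexp(1.0f, -30);       // 2^(-30), far below one ulp of 1.0f

    float left  = (a + b) + c;             // 0.0f + c           -> c
    float right = a + (b + c);             // b + c rounds to b  -> 0.0f

    std::printf("(a+b)+c = %g\n", left);   // prints ~9.31323e-10
    std::printf("a+(b+c) = %g\n", right);  // prints 0
    return 0;
}

Same inputs, same IEEE arithmetic, two different answers, which is exactly his point: the programmer, not the standard, has to pin down the order.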

gokickrocks
06-13-03, 06:06 PM
Originally posted by vandersl

Actually, I guess that's my real question - was the article mistaken in showing that the NV30, NV31, and NV34 currently do not support FP32 in the drivers, even when the _PP hint is not specified? It seemed like they were saying that on all currently available NV3x cards, the drivers limit support to FP16, at least in DX. This sounds strange, can anyone confirm?

The Digit-Life article was saying that the NV30 line of cards was forcing partial precision everywhere... however, it was fixed with the NV35.

Nv40
06-13-03, 06:11 PM
The Digit-Life article was saying that the NV30 line of cards was forcing partial precision everywhere... however, it was fixed with the NV35.

The latest official Detonator drivers were forcing FP16 on NV30-NV34 hardware under DirectX9. There are older drivers that set FP32 on NV3x cards.

A bit off topic...

does anyone know when DirectX10 will be released?

gokickrocks
06-13-03, 06:12 PM
I'm guessing a few months after Longhorn, the next Windows... so it will be a good long while.

Nv40
06-13-03, 06:20 PM
Originally posted by gokickrocks
I'm guessing a few months after Longhorn, the next Windows... so it will be a good long while.

Mmmm... interesting. That means the end of 2004 or 2005...
so no DirectX10 card until the NV50 or R500 at the least.

Hellbinder
06-13-03, 06:23 PM
Yeah,

also, the NV34/NV31 seem to force FP16 in the drivers no matter what, not just the NV30. And they are not DX9 compliant, even though NVIDIA is claiming the Detonators are *WHQL* - well, only the specific code that supports the NV35 is. And it is STILL evident that they are doing application and shader detection even on the NV35 to force FP16 on a per-application basis.

None of those cards have any hope whatsoever of running all those IEEE-32 applications supposedly coming in 12 months... :rolleyes: Neither does the NV35, as it can't do full FP32 precision and use any textures in the game at the same time.

So how is that any different or worse than going FP24??? Especially after you read and *understand* the above quote from sireric.

SurfMonkey
06-13-03, 06:30 PM
Originally posted by gokickrocks
i didnt see the reference device being used in the digit-life article...

unless you had the source code to change the D3DDEVTYPE to REF and recompiled the program...

or the D3DDEVTYPE had a fallback to the REF if your card wasnt dx9 compliant...

Umm, with the DX SDK you get access to the reference model. It's easier to force reference with a non-DX9 card. What you are getting is the full standard as M$ encoded it into DX9. It means it will be slow as hell, but one hundred percent accurate.

The nasty thing about this is that M$ let the drivers through the WHQL process even though they were breaking the specification.

Who exactly is supposed to police this?

gokickrocks
06-13-03, 06:44 PM
Originally posted by SurfMonkey
Umm, with the DX SDK you get access to the reference model. It's easier to force reference with a non-DX9 card.

How do you access that?

And yes, I am new to the whole SDK usage.

SurfMonkey
06-13-03, 06:57 PM
Originally posted by gokickrocks
How do you access that?

And yes, I am new to the whole SDK usage.

You get the REF model automatically when you install the SDK. In most apps you have to request it, but if your card doesn't support DX9 and you have the SDK, it is automagically engaged. I think the same thing happens if you have the debug runtime too. But don't quote me on that ;)
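
And if anyone wants to request REF explicitly instead of relying on the fallback, this is roughly what it looks like; a minimal sketch against the D3D9 API (the CreateRefDevice helper is my own name for illustration, and error handling is omitted):

#include <d3d9.h>

IDirect3DDevice9* CreateRefDevice(HWND hWnd) {
    IDirect3D9* d3d = Direct3DCreate9(D3D_SDK_VERSION);

    D3DPRESENT_PARAMETERS pp = {};
    pp.Windowed   = TRUE;
    pp.SwapEffect = D3DSWAPEFFECT_DISCARD;

    // D3DDEVTYPE_REF instead of D3DDEVTYPE_HAL selects the software
    // reference rasterizer: slow as hell, but it implements the full DX9 spec.
    IDirect3DDevice9* device = NULL;
    d3d->CreateDevice(D3DADAPTER_DEFAULT, D3DDEVTYPE_REF, hWnd,
                      D3DCREATE_SOFTWARE_VERTEXPROCESSING, &pp, &device);
    return device;
}

Swap D3DDEVTYPE_REF back to D3DDEVTYPE_HAL and you're on the normal hardware driver again.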