FP16 Vs. FP24 VS FP32 a Complex issue made "simpler"...



Hellbinder
10-03-03, 02:10 AM
...by our friend OpenGL Guy, one of the well-known D3D driver gurus at ATI.

He posted this tonight in a response to something at Rage3D.

Let me clarify some of this. There are two main components to a floating point number, the mantissa (precision) and the exponent (range). In some cases, one is more important than the other (e.g. you want more precision when sampling textures, you want more exponent when demonstrating high dynamic range).

FP16 = s10e5, which means the mantissa is 10 bits plus 1 implied bit = 11 bits (the "s" represents the sign bit). The exponent is 5 bits, which means your maximum exponent is 15.

Since your maximum exponent is 15, that means the largest difference you can have between your dimmest areas and your brightest areas is only a factor of 2^15 = 32768. (I am neglecting non-normalized floating point values as those are uncommon.) Since your mantissa is 11 bits (note that this is lower than the precision of FX12), you can sample 2048 different positions from a texture map, but note that you'd want about 4-5 bits of subpixel precision, so you end up only being able to accurately sample a 256x256 texture.

FP24 = s16e7 = 17 mantissa bits, 7 exponent bits. This means your largest exponent is 63.

With such a large exponent, you can achieve differences in brightness of over 9.2 * 10^18! With 17 bits of mantissa, you can sample 2048x2048 textures and still have 4-5 bits of subpixel precision left over. Very well balanced.

FP32 = s23e8 = 24 mantissa bits, 8 exponent bits. This means your largest exponent is 127.

Now your range goes up to 1.7 * 10^38. Pretty much overkill compared to FP24. With 24 bits of mantissa, you could sample a 524288x524288 texture with 4-5 bits of subpixel precision. Again, overkill.

This doesn't mean that FP32 is bad. What it means is that FP32 is not likely to offer noticeable improvements over FP24 on today's shaders. The jump from FP16 to FP24 is very large, because you are getting a large jump in precision and range. The difference from FP24 to FP32 is mostly in the precision... which was already sufficient to begin with.
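
The arithmetic in the quoted post can be reproduced in a few lines of Python. This is only a sketch: the `FORMATS` table, the function names and the choice of 4 subpixel bits are assumptions made for illustration, and the mantissa widths and maximum exponents are copied from the post rather than from any spec.

```python
# Mantissa bits (including the implied leading 1) and maximum exponent,
# as quoted in the post above. All names here are illustrative only.
FORMATS = {
    "FP16": {"mantissa_bits": 11, "max_exponent": 15},
    "FP24": {"mantissa_bits": 17, "max_exponent": 63},
    "FP32": {"mantissa_bits": 24, "max_exponent": 127},
}


def brightness_range(max_exponent):
    """Largest power of two reachable with the given maximum exponent."""
    return 2 ** max_exponent


def max_texture_size(mantissa_bits, subpixel_bits):
    """Texels per axis addressable when subpixel_bits are kept for filtering."""
    return 2 ** (mantissa_bits - subpixel_bits)


for name, fmt in FORMATS.items():
    rng = brightness_range(fmt["max_exponent"])
    size = max_texture_size(fmt["mantissa_bits"], subpixel_bits=4)
    print(f"{name}: range 2^{fmt['max_exponent']} = {rng:.3g}, "
          f"{size}x{size} texels with 4 subpixel bits")
```

Under this simple model, the texture sizes quoted in the post (256, 2048 and 524288) correspond to 3, 6 and 5 subpixel bits respectively, so the "4-5 bits" figure fits FP24 and FP32 better than FP16.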

This is it fellas, it does not get any more straight up than this. Let's forget who offers what for a second and just look at...

1. The straight math of it all.

2. What games are actually using today and a couple of years into the future.

Looking at the facts of the math and the game engines out there... what is the best solution?

(Let's not forget RAM limitations)

synk
10-03-03, 02:38 AM
Almost all games today are using some fixed-point format. The newer ones coming out are starting to take advantage of the floating point formats. I think over the next 2 years you'll see FP16 and FP24 start to die out. Everything's moving towards FP32.

TheTaz
10-03-03, 02:39 AM
Makes sense to me.

Why the hell anyone would want to sample, let alone create, a 524288x524288 texture is beyond me! :D

Taz

Ninja Prime
10-03-03, 02:55 AM
I was trying to figure that out earlier, but I couldn't get the numbers to work correctly. The way it looks, there really doesn't seem to be any reason for higher than FP24 for a long time. Perhaps by late 2005 when we have some 1 gigabyte cards out with 80+GB/s bandwidth, but even then, I'm not sure the difference will be noticeable.

Hellbinder
10-03-03, 03:03 AM
One thing this does do: it explains pretty quickly why some of the mixed-mode stuff we see does not have quite the "sheen" or whatever seen on ATI cards.

It's simply a case where FP16 sounds good as an idea but just does not offer enough raw "bits" to be a desirable solution. No wonder some of the mixed-mode 3DMark03 and Tomb Raider shots look less defined and more "blurry". If you are forced to sample at 256x256 to get results within the limitations of the hardware, while the application itself may be using 512x256, 768x512 or larger textures... something simply has to give.

In contrast, we are not even using 2048x2048 textures in any games yet... are we? I'm thinking 1024x1024 is about the largest.
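
To get a feel for what an s10e5 texture coordinate does on a larger texture, here is a minimal sketch using Python's `struct` module, whose `"e"` format is IEEE half precision. Treating GPU FP16 as identical to IEEE half precision is an assumption, as are the 1024 texel width and the sample position; real hardware may round differently or not use shader precision for texture addressing at all.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision (s10e5) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def texel_position(u, width):
    """Convert a normalized coordinate in [0, 1) to a texel-space position."""
    return u * width


width = 1024     # hypothetical texture width, in texels
wanted = 700.3   # hypothetical sample position, in texels

u = wanted / width   # normalized coordinate at full (double) precision
u16 = to_fp16(u)     # the same coordinate after an FP16 round trip

print("requested texel position:", wanted)
print("after FP16 round trip:   ", texel_position(u16, width))
```

In the upper half of the [0, 1) range the spacing between adjacent half-precision values is 2^-11, so on a 1024-wide texture the coordinate can only land on half-texel steps there, which leaves a single bit of subpixel precision.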

Kruno
10-03-03, 03:08 AM
I assumed that everyone already knew that FP32 is overkill on today's shaders?

I'm quite sure this was discussed before at Beyond3D.

I remember something along the lines of Chalnoth throwing math around during an FP32 vs FP24 debate and then trying to prove that in future games things like shadow effects and transparency effects will look more "precise" with FP32 over FP24.

I really should go and re-read the main parts.

I also believe there was a thread here at nVNews (or maybe Beyond3D, I keep forgetting) showing that TR: AOD has more precise shadows when using FP32 over FP24; of course this was shown by zooming up 600%-800% on the image.

That was a good example of the difference between FP24/32. In future it will be imperative that FP32 becomes the standard.
Say around the time of DX10-DX11?

Hellbinder
10-03-03, 03:13 AM
yeah.. that did happen a while back at B3D.

There simply has been a lot of talk and ideas tossed around here on this issue the last few days. It seemed very timely that OpenGL Guy just made this nice concise post on the issue at Rage3D.

Seems very apropos to put it up here at this time for discussion.

Hellbinder
10-03-03, 03:15 AM
Originally posted by Kruno
I also believe there was a thread here at nVNews (or maybe Beyond3D, I keep forgetting) showing that TR: AOD has more precise shadows when using FP32 over FP24; of course this was shown by zooming up 600%-800% on the image.

That was a good example of the difference between FP24/32. In future it will be imperative that FP32 becomes the standard.
Say around the time of DX10-DX11?

I don't remember this event. I recall there being minimal differences, but they did not seem to be related to FP processing at all.

Got a link for review?

Kruno
10-03-03, 03:20 AM
Originally posted by Hellbinder
I don't remember this event. I recall there being minimal differences, but they did not seem to be related to FP processing at all.


Didn't it?
I believe Humus/Hanners posted it to show the difference.

I will try and find the thread.

Also the fact there was minimal difference proves that:

FP32 won't matter on today's shaders

and...

FP32 will make a larger impact in the future

http://www.nvnews.net/vbulletin/showthread.php?s=&threadid=16411&highlight=tomb+raider

StealthHawk
10-03-03, 05:38 AM
The reference rasterizer is going to produce different results than any hardware. I'd like to see a comparison between NVIDIA cards using FP32 and ATI cards using FP24 with the shadow in the same place.

I can see some color differences in the reference shot, but that may be due to the different position.

Kruno
10-03-03, 06:07 AM
Originally posted by StealthHawk
The reference rasterizer is going to produce different results than any hardware. I'd like to see a comparison between NVIDIA cards using FP32 and ATI cards using FP24 with the shadow in the same place.

I can see some color differences in the reference shot, but that may be due to the different position.

I guess I'm going to have to take back the comment I made about that being a good example. :)

I'm sure you guys know what I'm talking about when I say that FP32 will be needed in the future.

Greg
10-03-03, 07:53 AM
That floating point precision can represent all kinds of things, not just texture coordinates. It could be color, local or global 3D space coordinates, or some intermediate calculation.

I was hoping the people here would gain from this thread that 16bit float is adequate most of the time, and on the occasion it isn't, 32bit is available. These precisions are mixable within the one shader. nVidia never intended 32bit float to be used all the time, instead the 2x faster 16bit was intended for universal use.

Unfortunately, due to this very issue, people can benchmark competing brands next to each other, like apples and oranges. Yes, full 32bit will always run 2x slower than 16bit, that is by design, but why on earth would you want to do that? Because up until recently, DirectX9 didn't allow the types to be defined and it was up to video drivers to substitute specific shaders where appropriate. Then there is an argument that, by not using full precision, a video card is suddenly not running in 'full DirectX9 mode', as if a developer must use all the features of a new version in order to get appreciated.

Do you know what the next DirectX version will be? Either DirectX9c or DirectX9.1, because the overall API has matured. Future releases will coincide with future hardware, and exposing of existing features that may be available.... So in a few months, you'll hear the argument, 'Your video card is so DirectX9a, mine is FULL DirectX9c dude!' The next step is Vertex and Pixel shader V3 (sample textures from vertex shader), and after that, primitive programs (create new triangles to feed the other shaders).

synk
10-03-03, 08:44 AM
FP32 is already the standard in DX9 ps_3_0:

"ps_2_0
=====
16-bit floating point precision is minimally required for partial precision
(with _pp hint).
24-bit floating point precision is minimally required for full precision
(without _pp hint).

ps_3_0
=====
16-bit floating point precision is minimally required for partial precision
(with _pp hint).
32-bit floating point precision is minimally required for full precision
(without _pp hint).

Anything other than above is against the spec."

That's coming straight from MS' D3D team.

Originally posted by Kruno
That was a good example of the difference between FP24/32. In future it will be imperative that FP32 becomes the standard.
Say around the time of DX10-DX11?

NKVD2
10-03-03, 08:50 AM
Originally posted by Hellbinder
In contrast, we are not even using 2048x2048 textures in any games yet... are we? I'm thinking 1024x1024 is about the largest.

Operation Flashpoint uses up to 4096...

This is strange: the options on NVIDIA cards go up to 4096 (see pic below, taken from a GF3 Ti200), and ATI cards only go up to 2048....

http://www.nvnews.net/vbulletin/attachment.php?s=&postid=206768

NKVD2
10-03-03, 08:53 AM
and here is ATI....

http://www.nvnews.net/vbulletin/attachment.php?s=&postid=206769

Here is some info one guy sent me; he has an ATI card too... dunno what this means though

$0000000000 Description : RADEON 9800
$0000000001 Vendor ID : 1002 (ATI)
$0000000002 Device ID : 4148
$0000000003 Location : bus 2, device 0, function 0
$0000000004 Bus type : AGP revision 3.0
$0000000005 AGP status : enabled
$0000000006 AGP rate : 4x 8x supported, 8x selected

$0800000016 MaxTextureWidth : 2048
$0800000017 MaxTextureHeight : 2048
$0800000018 MaxVolumeExtent : 1024
$0800000019 MaxTextureRepeat : 2048
$080000001a MaxTextureAspectRatio : 2048


I had both an FX and an ATI card when playing this game and didn't notice a difference, except that it looks better on ATI cards, imho... And AA is the best on Radeon....

sxotty
10-03-03, 08:55 AM
Hellbinder why is he a d3d guy? I always assumed with his name he was an openGL guy?

Remi
10-03-03, 09:04 AM
Just a note: Please don't forget that registers aren't used only for texture coordinates, far from it! The math discussion posted focuses on only one aspect.

NKVD2
10-03-03, 10:25 AM
Ok, I got the answer from the developer of the game - http://www.flashpoint1985.com/cgi-bin/ikonboard311/ikonboard.cgi?;act=ST;f=3;t=34241

Looks like ATI's limit is 2048, but textures of that size are not even used in new or older games.....

synk
10-03-03, 10:40 AM
uncompressed fp 4k x 4k ~213MB
uncompressed fp 2k x 2k ~53MB
uncompressed fp 1k x 1k ~13MB
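
For anyone who wants to redo this kind of estimate, here is a rough sketch. The post does not say how many channels, how many bits per channel, or whether a mip chain is counted, so the defaults below (4 channels, 4 bytes per channel, no mipmaps) are assumptions and the output will not necessarily match the figures above. The function name and parameters are made up for illustration.

```python
def texture_bytes(width, height, channels=4, bytes_per_channel=4, mipmaps=False):
    """Uncompressed texture size in bytes.

    With mipmaps=True the full chain down to 1x1 is summed, which adds
    roughly one third on top of the base level.
    """
    texel_bytes = channels * bytes_per_channel
    if not mipmaps:
        return width * height * texel_bytes
    total = 0
    w, h = width, height
    while True:
        total += w * h * texel_bytes
        if w == 1 and h == 1:
            break
        w, h = max(1, w // 2), max(1, h // 2)
    return total


for size in (4096, 2048, 1024):
    for label, bpc in (("FP16", 2), ("FP32", 4)):
        mib = texture_bytes(size, size, bytes_per_channel=bpc) / 2 ** 20
        print(f"{size}x{size} RGBA {label}: {mib:.0f} MiB")
```

Whatever the exact assumptions, the point stands: each doubling of the texture dimensions quadruples the footprint.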

Greg
10-03-03, 11:07 AM
Originally posted by NKVD2
Operation Flashpoint uses up to 4096...

This is strange: the options on NVIDIA cards go up to 4096 (see pic below, taken from a GF3 Ti200), and ATI cards only go up to 2048....


There was some thread about this at the official OFP forum. I don't remember the reason. The actual game doesn't ship with > 512x512 textures, but some of the mods use up to 1024 or 2048, such as the new skies. Anyway, that figure has nothing to do with PS float precision. Most likely it's just part of the driver-reported capabilities.

TheTaz
10-03-03, 11:18 AM
Originally posted by Remi
Just a note: Please don't forget that registers aren't used only for texture coordinates, far from it! The math discussion posted focuses on only one aspect.

Agreed.

Nobody seemed to "get" my smart@ss comment... oh well.

However, the issue is shader performance and shader quality... not geometry processing and other calculations. For shaders, it's fairly obvious that FP32 is overkill, and too much of a performance hit.

For now, it seems FP24 is a better option / balance. And I still say that since FP24 is a STANDARD in DX9, FP32 is unneeded until PS3.0. Until the technology is out there to drive shaders in FP32 fast enough... FP32 will PROBABLY always be an unnecessary performance hit compared to FP24.

Taz

Hellbinder
10-03-03, 11:20 AM
Originally posted by StealthHawk
The reference rasterizer is going to produce different results than any hardware. I'd like to see a comparison between NVIDIA cards using FP32 and ATI cards using FP24 with the shadow in the same place.

I can see some color differences in the reference shot, but that may be due to the different position.
Exactly...

SlyBoots
10-03-03, 01:35 PM
Originally posted by synk
FP32 is already the standard in DX9 ps_3_0:

"ps_2_0
=====
16-bit floating point precision is minimally required for partial precision
(with _pp hint).
24-bit floating point precision is minimally required for full precision
(without _pp hint).

ps_3_0
=====
16-bit floating point precision is minimally required for partial precision
(with _pp hint).
32-bit floating point precision is minimally required for full precision
(without _pp hint).

Anything other than above is against the spec."

That's coming straight from MS' D3D team.


A quote from Amar Patal from MS; note his comment on the ps_3_0 spec

"Here's a cut&paste from our spec, with the one typo in it corrected
(noted with **).

[from ps_2_0 section]
---Begin Paste---
Internal Precision
- All hardware that support PS2.0 needs to set D3DPTEXTURECAPS_TEXREPEATNOTSCALEDBYSIZE.
- MaxTextureRepeat is required to be at least (-128, +128).
- Implementations vary precision automatically based on precision of inputs to a given op for optimal performance.
- For ps_2_0 compliance, the minimum level of internal precision for temporary registers (r#) is s16e7** (this was incorrectly s10e5 in spec)
- The minimum internal precision level for constants (c#) is s10e5.
- The minimum internal precision level for input texture coordinates (t#) is s16e7.
- Diffuse and specular (v#) are only required to support [0-1] range, and high-precision is not required.
---End Paste---

For ps_3_0 the requirements are the same, however interpolated input registers are now defined by semantic names. Inputs here behave like t# registers in ps_2_0: they default to s16e7 unless _pp is specified (s10e5).

Note that specifying _pp on an input register only affects how they are read into temp registers or what precision ALU math might run on an op reading an input as a parameter. However texld* instructions that take in unmodified texture coordinates will not be affected by the _pp modifier, as the texture coordinate iterators are of fixed precision.


amar
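
As a quick cross-check of the "sMeE" notation against the FP16/FP24/FP32 names used in this thread, here is a small sketch. The table only restates the minimums from the paste above; the helper name `total_bits` and the dictionary layout are made up for illustration.

```python
def total_bits(fmt):
    """Total width of an 'sMeE' format: 1 sign bit + M mantissa + E exponent."""
    mantissa, exponent = fmt[1:].split("e")
    return 1 + int(mantissa) + int(exponent)


# Minimum internal precision per register class, as listed in the paste above.
PS_2_0_MINIMUMS = {
    "r# (temporaries)": "s16e7",
    "c# (constants)": "s10e5",
    "t# (texture coordinates)": "s16e7",
}

for register, fmt in PS_2_0_MINIMUMS.items():
    print(f"{register}: {fmt} = {total_bits(fmt)} bits")
```

So s10e5 is the 16-bit format, s16e7 the 24-bit one, and s23e8 would be the 32-bit one.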

Hellbinder
10-03-03, 01:48 PM
Nice post sly...

As I have pointed out here before FP24 is still the minimum or "Common standard" for PS3.0.

DivotMaker
10-03-03, 02:06 PM
Originally posted by Hellbinder
It's simply a case where FP16 sounds good as an idea but just does not offer enough raw "bits" to be a desirable solution. No wonder some of the mixed-mode 3DMark03 and Tomb Raider shots look less defined and more "blurry".

Based on everything stated above, FX12 must be awful then....