
View Full Version : FX and DX9 benchmarking vs ATi



dohcmark8
05-28-03, 03:09 PM
ATi cards only support FP24 precision in their pixel shaders; nVIDIA supports FP16 and FP32. Tell me, if you were playing UT2003 at 100fps, would you really notice the difference between FP16 and FP32? You would have to pause and zoom in to notice it, so why bother using FP32 unless it's absolutely necessary?
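(For illustration only - a quick numpy sketch with made-up colour values, not anything pulled from a game: for a single operation, once the result is quantised down to the 8-bit framebuffer you actually see, FP16 and FP32 usually land on the same value. Differences only start to show once a long shader accumulates the error.)

import numpy as np

light = np.float32(0.8137)                      # made-up light intensity
albedo = np.float32(0.52)                       # made-up surface colour channel

c32 = light * albedo                            # the maths at FP32
c16 = np.float16(light) * np.float16(albedo)    # the same maths at FP16

to8bit = lambda c: int(round(float(c) * 255))   # quantise to the 8-bit output
print(to8bit(c32), to8bit(c16))                 # here both come out as 108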

I will wait until nVIDIA releases the final Det 50.XX (aka FX) drivers to make a final judgement; the current FX drivers are just patched-together 40.XX-series Detonators. I figure they would give a nice boost in PS/VS, as that's all they have left to optimize on the FX series now.

I think ATi made the R3XX support only FP24 as a competitive advantage: they will always have faster DX9 scores with better IQ than nVIDIA, unless nVIDIA uses FP32 to match the IQ and in turn runs much slower. (Ack... Canadians are smarter than we give them credit for?...)

Read my words:
You cannot fairly compare any DX9 score from a game or benchmark between an ATi card and an nVIDIA card, since ATi will always be running at FP24 and nVIDIA will be running at either FP16 or FP32.

Moose
05-28-03, 03:37 PM
The DX9 spec calls for 24-bit precision as the minimum for full precision.

Why either video chip maker chose what it did is anyone's guess, but ATI's choice seems to be the best one at this point.

digitalwanderer
05-28-03, 03:51 PM
Originally posted by dohcmark8
Read my words:
You cannot fairly compare any DX9 score from a game or benchmark between an ATi card and an nVIDIA card, since ATi will always be running at FP24 and nVIDIA will be running at either FP16 or FP32.

:lol:

Read your own words, "compare any DX9 score", and realize that DX9 specifies FP24, so any DX9 bench/game will be running it at FP24 or better. You sort of shoot straight to the heart of nVidia's current woes.

:lol:

Typedef Enum
05-28-03, 04:00 PM
Yeah, I like how everybody likes to somehow blame ATI for using 24-bit...The spec _is_ the spec.

Here's the funny thing...That 32-bit component feature that's marketed by nVidia is the biggest farce, like so many other things marketed by nVidia...

But it's a farce for the simple fact that it will _never_ be a useable feature. When John Carmack says that the performance basically tanks when you run in this mode...you can pretty much be sure that it ain't going to get any faster than what he's able to squeeze out of it.

Of course, nVidia conveniently left all of this out of their marketing literature when they launched the first FX...

I would be willing to bet that it will probably take the likes of NV50 before you wouldn't think twice about using anything but...

Uttar
05-28-03, 04:15 PM
Originally posted by Typedef Enum
I would be willing to bet that it will probably take the likes of NV50 before you wouldn't think twice about using anything but...

Not too sure about that. I've seen some people saying ( that is, in forums, nothing private ) that they think the NV40 will be fully FP32.

Actually, that makes sense: the VS has always been FP32. So if you want to do that, you obviously need very good FP32 performance. Thus, having a fully FP32 architecture seems required.

Although, yes, in the NV30, FP32 is pretty much a joke. You could use it a little, and only a little, and then it might give acceptable performance. But the cases where it's required are very rare in today's games anyway, and developers will rarely take the time to check exactly which instruction needs FP32.

It's much more of an NV3xGL feature IMO. I'm sure workstation users like it ( to what extent, I've got no idea ) - but besides that, obviously...


Uttar

Disclaimer: The NV40 information in this post is mostly speculation, and it is based on old facts. I personally wouldn't suggest anyone take it as very reliable.

StealthHawk
05-28-03, 05:57 PM
Originally posted by dohcmark8
I will wait until nVIDIA releases the final Det 50.XX (aka FX) drivers to make a final judgement; the current FX drivers are just patched-together 40.XX-series Detonators. I figure they would give a nice boost in PS/VS, as that's all they have left to optimize on the FX series now.

The magic 8 ball points to hardware issues regarding shader deficiencies and not driver issues.

Come on guys, we all heard this bull**** argument back with NV30. "Give them some time to get drivers optimized." nvidia already launched a refresh of NV30, and shader performance is still in the gutter (once nvidia's cheats are disabled). How much more time should we give them? They've had working silicon of NV30 for at least 9 months now, right? More than enough time to get driver issues worked out.

Sphinx
05-28-03, 06:21 PM
They've had working silicon of NV30 for at least 9 months now, right? More than enough time to get driver issues worked out.

Completely correct...

Hellbinder
05-28-03, 06:35 PM
The thing with NV3x cards and FP32:

The R350 has 8 FP24 units.

The NV3x has 4 FP32 units.

The 4 FP32 units can each execute 2 FP16 instructions per clock. Thus in FP16 they are the logical equivalent of ATi's 8 FP units.

{simplified}
At the exact same core speed, the R350 and NV35 break down like this:

In FP32, the NV35 is exactly 1/2 the speed of the R350.

In FP16, the NV35 is exactly the same speed as the R350.

{/simplified}

Thus the major differences in execution have nothing to do with one being 16/24/32 at all, but with the sheer number of execution units coupled with the core speed.

That is why:

NV35 FP16 = logical 8 FP units = 450 MHz core

R350 FP24 = 8 actual FP units = 380 MHz core

Thus in FP16, Nvidia will always be faster than ATi's R350. That's the bottom line. In FP32 they have half the execution units but a slightly higher clock speed, which is why they are only about 2/3 the speed of the R350.
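(Back-of-the-envelope numbers for the model above - my own quick Python sketch using only the unit counts and clock speeds from this post, nothing measured:)

r350_fp24 = 8 * 380        # 8 FP24 units at 380 MHz -> 3040 M ops/s
nv35_fp16 = (4 * 2) * 450  # 4 FP32 units doing 2 FP16 ops each at 450 MHz -> 3600
nv35_fp32 = 4 * 450        # 4 FP32 units at 450 MHz -> 1800

print(nv35_fp16 / r350_fp24)  # ~1.18: FP16 a bit ahead of the R350
print(nv35_fp32 / r350_fp24)  # ~0.59: FP32 roughly 60%, in the ballpark of "about 2/3"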

The moral of the story is that Nvidia could have, and should have, used 8 FP32 units but didn't. At least that's my personal take on it. This is why it's pretty obvious that the NV40 WILL have full FP32 support; all they have to do is add 4 more FP32 processing units. In the same way, ATi's next big one will likely also have 8 FP32 units. Thus they will in all likelihood be identical, and this whole issue a moot point.

Nutty
05-28-03, 07:07 PM
I think they should use 8 FP32 units that can do 16 FP16 instructions. That would yield awesome speed, and it is sufficiently precise for most stuff.

Hellbinder
05-28-03, 07:44 PM
I think they should use 8 FP32 units that can do 16 FP16 instructions. That would yield awesome speed, and it is sufficiently precise for most stuff.

Undoubtedly this is exactly what you will see in the NV40.

However, it does not change the fact that the DX9 minimum requirement is FP24. Thus any driver fudging to force the apps to use FP16 will be just as, um.. *cough*improper*cough* as what Nvidia is currently doing.

Thus what you will likely see is just some real smooth 32-bit stuff from both camps.

Hellbinder
05-28-03, 07:54 PM
Here are some more interesting findings:

http://216.239.37.104/translate_c?hl=en&langpair=fr%7Cen&u=http://www.hardware.fr/articles/468/page4.html&prev=/language_tools

Hardware.fr decided to use 2 other DX9 benchmarks on the 5600 because of Nvidia's dislike of 3DMark. As you can see, Nvidia's 44.03 drivers are plainly subverting all DX9 calls into either FP16 or even FX12 (fixed point).

This is also the same thing that is happening in Doom III with the NV30 path, and it's why the Nvidia cards are faster there. When they run the ARB2 path they use FP32 and are 1/2 the speed of the Radeon cards. When they run the NV30 path they use FP16 and are 10-15 FPS faster than the Radeon cards. As you can see, the same pattern of execution units and core speed holds true for both DX9 and OpenGL.

The fallacy pushed by the Nvidia camp is that it's not fair for Nvidia to have to use FP32 because it's *harder* than FP24, which I have shown you is completely ridiculous. The truth is Nvidia made the design choice to go with 4 FP32 units instead of 8. Just like they are not a true 8-pipeline card, just like they went with a 128-bit bus originally. See the pattern?

Had Nvidia put 8 FP execution units in the NV30/35, they would not be in the predicament they are in now. And it has nothing to do with FP32 being more hardware-intensive or slower than FP24.

digitalwanderer
05-28-03, 08:32 PM
Originally posted by Hellbinder
Here are some more interesting findings:

http://216.239.37.104/translate_c?hl=en&langpair=fr%7Cen&u=http://www.hardware.fr/articles/468/page4.html&prev=/language_tools


Dude, that is SO worthy of a thread all its own! :lol:

Great find, Hellbinder! It looks like nVidia's cheesiness just knows no bounds, eh?

Ady
05-28-03, 08:33 PM
Thanks Hellbinder.. very interesting stuff.

jAkUp
05-28-03, 08:35 PM
I wouldn't call that article entirely accurate... such as this:
http://www.hardware.fr/medias/photos_news/00/06/IMG0006347_1.jpg

They show that rendering bug... that is the exact same bug I used to have with the first FX drivers, and it is fixed now.
Of course the drivers cheat and all... but that part is inaccurate.

Ady
05-28-03, 08:42 PM
This is an obvious example of low precision.
http://www.hardware.fr/medias/photos_news/00/06/IMG0006345.jpg

micron
05-28-03, 08:59 PM
That was a cool article HB put up. I think that with the excellent reviewers armed with excellent tools we have these days, Nvidia is hatin' life......

g__day
05-28-03, 11:36 PM
I am unsure whether the NV40 and R400 will both equally support FP32 - it seems a big ask and way before its time unless you use it incredibly sparingly.

I do certainly expect NVidia to either add serious FP24 capability or else at least double or triple their FP32 execution units.

I would love someone to make more transparent how well NVidia's and ATi's drivers achieve maximum parallelism in the execution units for modern shaders.

I think benchmarks should be more clearly divided between today's games / synthetics and those that test future positioning for PS 2.0 / DX9 / OpenGL 2.0 etc.

Deathlike2
05-29-03, 02:58 AM
Last I checked... UT2003 is a DX8.1 game.. and it wouldn't have anything to do with shader floating point... there has been shader support since DX8; DX9 is when floating-point shaders became a requirement... for a DX9-compliant card.... but I guess that's not important.. heh

I think it's hard enough to realistically compare shader performance.. because it is dependent on the architecture.. of course we can try to judge quality in DX9/OpenGL (FP shader) games.... when they come..

If NVidia can make FX12 and FP16 close (or exact) to FP32 rendering.. that would be something.. we shouldn't exactly complain about that... it would be a good thing... as long as NVidia doesn't say they are rendering in FP32 natively....

NVidia should spend their time on optimizing their shader code... spend less time on 3DMark.. but whatever.. :P

The Radeon 9800 supports FP32 (I guess lower quality as well)... The Radeon 9700 will always render in FP24 though... this bit of info wasn't mentioned anywhere.. oh well

You have to remember.. DX9 is not the same as OpenGL... even if DX9 code must support FP24 or greater... OpenGL doesn't hold the same standard...

Carmack's statements clearly fit exactly what this whole thing is about...

When sites like Anandtech got a chance to benchmark the game (as it was at the time)... it was clear NVidia was faster BECAUSE they were running via the proprietary paths. Nothing regarding quality was mentioned, though (or I may have missed that point).

I still won't trust "company-prepared benchmarks" like the one NVidia tried to pull (as was mentioned in Anandtech's benchmark session of Doom 3).

My two cents.

Chalnoth
05-29-03, 04:30 AM
Originally posted by Typedef Enum
Yeah, I like how everybody likes to somehow blame ATI for using 24-bit...The spec _is_ the spec.
There's also an option for partial precision (16-bit fp), which will work great for the FX 5900. I still think they should also offer integer precision, though (for the rest of the FX line).

Here's the funny thing...That 32-bit component feature that's marketed by nVidia is the biggest farce, like so many other things marketed by nVidia...
Um, no. 32-bit precision can quite easily be used on the FX. It just has to be used sparingly. Approximately 1/3 of the instructions can use 32-bit precision for free. It may provide for more accurate dependent texture operations, in particular.

From a programming perspective, I really do not see why so many companies seem against offering different precision computing. It's been done on CPU's for ages! The potential increase in performance from offering reduced precision formats is just too great to ignore.

And it especially makes sense to offer lower-precision computing when the output is lower precision than any of the computing modes!

Chalnoth
05-29-03, 04:31 AM
Originally posted by Deathlike2
If NVidia can make FX12 and FP16 close (or exact) to FP32 rendering.. that would be something.. we shouldn't exactly complain about that... it would be a good thing... as long as NVidia doesn't say they are rendering in FP32 natively....
It's not about nVidia making them look the same. Some instructions will need to be run at 32-bit for optimal quality. Some will only need 16-bit fp. Others will work great at just 12-bit int. It should be up to the developer to decide which instructions in his/her shader will need what precision, and the FX architecture will work best when a mix of different precisions is used in each shader.
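(A toy sketch of what mixed precision means in practice - plain Python/numpy rather than real shader code, and every value here is invented; in a real DX9 shader the developer would mark the cheap instructions with the half type / partial-precision hint instead:)

import numpy as np

uv     = np.float32([0.731243, 0.518872])   # invented texture coordinate
offset = np.float32([0.0021, -0.0017])      # invented bump/offset term

# dependent texture fetch address: keep this at FP32 - a low-precision error
# here lands you on the wrong texel of a large texture
fetch_uv = uv + offset

# ordinary colour maths: FP16 is plenty, the framebuffer is only 8 bits per channel
base   = np.float16([0.42, 0.40, 0.37])
light  = np.float16(0.85)
colour = base * light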

But Microsoft doesn't seem to like any PS 2.0 code using integer precision.

AnteP
05-29-03, 04:45 AM
Originally posted by Chalnoth
It's not about nVidia making them look the same. Some instructions will need to be run at 32-bit for optimal quality. Some will only need 16-bit fp. Others will work great at just 12-bit int. It should be up to the developer to decide which instructions in his/her shader will need what precision, and the FX architecture will work best when a mix of different precisions is used in each shader.

But Microsoft doesn't seem to like any PS 2.0 code using integer precision.

What's the point of fooling around using different precisions everywhere when you could just run it all with FP24 at better performance in any case? hehehehe

g__day
05-29-03, 04:49 AM
Hi Chalnoth - FP32 can be used - just very sparingly - exactly. It's great, I understand, for reflective surfaces, oily films on water, waves, heat refraction in air - materials that have very complex light reflection and refraction models (for the four types of mathematically modelled light).

On Beyond3D it was mentioned that 1) the Dawn demo uses FP32 mainly for Dawn's eyes, and 2) FP32 on the NV35 can operate at between half and 2/3 the performance of FP24 on the R350.

In summary, the NV3x is a brilliant DX8.1-and-earlier architecture, whilst the R3x0 and NV3x are both entry cards for DX9. By the time DX9 games are common, so too will be the NV45 and R450, so don't get too hung up on NVidia's and ATi's shortfalls in their leading cards today - they'll be < $100 cards by then.

Uttar
05-29-03, 04:50 AM
Hellbinder: No.
Your understanding of the NV3x architecture is completely incorrect.

The NV30 has 4 FP32 units which can also do texturing work ( it's speculated that it uses the ddx/ddy functionality for that ).

The difference between FP32 and FP16 is *ONLY* a register usage one.
That means using FP32 instructions with FP16 registers ( yes, that's useless... ) is completely free, and using FP16 instructions in FP32 registers is as slow as FP32 in FP32 registers.

The NV35, when using four "half" registers, is capable, in Vec4 situations without texturing, of being 50% faster per-clock than the R350. Of course, when there are scalars, the R350 gets a big boost, and when there are textures and no scalars, they're theoretically equal per-clock.

However, if you use more than four "half" registers or two "float" registers, the NV3x gets slower with every additional register.

It's all very icky stuff which, in summary, means that except for FP16 shaders that take great care to reuse registers as much as possible, the NV35 is slower per-clock than the R350.
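(If it helps, here's a very rough toy model of that register effect in Python - the 15% penalty per extra register is a number I made up purely to show the shape of the curve, not anything measured:)

def nv3x_relative_speed(half_regs, float_regs, penalty_per_extra=0.15):
    # per the post above: a "float" register costs as much as two "half" registers,
    # and the free budget is roughly four halves / two floats
    used = half_regs + 2 * float_regs
    extra = max(0, used - 4)
    return 1.0 / (1.0 + penalty_per_extra * extra)

print(nv3x_relative_speed(4, 0))  # 1.0   - inside the free budget
print(nv3x_relative_speed(0, 4))  # 0.625 - four full-precision registers already hurt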


Uttar

DSC
05-29-03, 05:03 AM
Man, why did Nvidia choose such a ****ed up, overly obfuscated and complicated-as-hell architecture in the first place..... :eek: :confused:

Seems to me that ATI made all the right choices - FP24, a 256-bit memory bus and a full 8 pipelines - while Nvidia made all the bad ones: FX12/FP16/FP32, a 128-bit memory bus (NV30) with a 256-bit bus much later (NV35), and 4 pipelines.... :eek:

Very disappointing, Nvidia. It seems that US$400 million just went down the drain for nothing. No sales at all for the NV30, and the NV35 doesn't stack up well against the R350.....

edit by StealthHawk: don't circumvent the swear filter.

Hanners
05-29-03, 07:48 AM
Originally posted by Deathlike2
The Radeon 9800 supports FP32 (I guess lower quality as well)... The Radeon 9700 will always render in FP24 though... this bit of info wasn't mentioned anywhere.. oh well

That's not correct - R350 supports exactly the same precisions as R300, no more and no less.