
View Full Version : Current generation NVIDIA DX 9 performance



jimmyjames123
10-15-03, 09:46 PM
So, what exactly do you guys think is the biggest bottleneck in DirectX 9.0 games with the current generation NVIDIA FX cards?

Lack of support for the DirectX 9.0 minimum standard FP24 precision?
Lack of Pipelines and/or Vertex Shader Engines?

Also, what is preventing them from adding Pipelines to a card like the FX 5900 Ultra?

Finally, what would be the advantage of going with 8x2 vs 12x1 for the future generation, and vice versa?

Sazar
10-15-03, 09:57 PM
Originally posted by jimmyjames123
So, what exactly do you guys think is the biggest bottleneck in DirectX 9.0 games with the current generation NVIDIA FX cards?

Lack of support for the DirectX 9.0 minimum standard FP24 precision?
Lack of Pipelines and/or Vertex Shader Engines?

Also, what is preventing them from adding Pipelines to a card like the FX 5900 Ultra?

Finally, what would be the advantage of going with 8x2 vs 12x1 for the future generation, and vice versa?

I think it's the architecture more so than the drivers per se... though I am positive that shader replacement and wider usage of partial precision (_pp) in rendering will make it easier to produce decent IQ with decent speed...

concerning adding pipelines... that is not so easy to do... you have to understand... nvidia spent a crap load of money on the current lineup and they share features throughout... changing the architecture from 4x2 to 8x1 may not be as easy as it sounds...

8x2/12x1 == more pixels processed in one pass therefore better efficiency/performance...

as for which is better.. more vertex processors as opposed to pipelines... I'll let the bigwigs answer that.. I don't have the savvy to :)
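The 8x2 vs 12x1 trade-off above comes down to simple arithmetic: pixels per clock follow the pipe count, while texels per clock are pipes times TMUs per pipe. A rough illustrative sketch (layout names taken from the thread; clock speeds and everything else ignored):

```python
# Per-clock fill-rate comparison for the pipeline layouts discussed above,
# written as pipes x TMUs-per-pipe. Illustrative arithmetic only.

def per_clock_rates(pipes, tmus_per_pipe):
    """Return (pixels/clock, texels/clock) for a pipes x tmus layout."""
    return pipes, pipes * tmus_per_pipe

layouts = {"4x2": (4, 2), "8x1": (8, 1), "8x2": (8, 2), "12x1": (12, 1)}

for name, (pipes, tmus) in layouts.items():
    pixels, texels = per_clock_rates(pipes, tmus)
    print(f"{name}: {pixels} pixels/clock, {texels} texels/clock")
```

So 12x1 wins on single-textured pixel throughput, while 8x2 wins on multitexturing, which is exactly why the better choice depends on the workload.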

GlowStick
10-15-03, 11:05 PM
Well, it seems that shaders are the big holdup, and they seem to be a popular thing to use with DX9, even though DX9 has other stuff too.

anyhoo, it seems that while doing shader work the card can't do certain other operations at the same time, as someone mentioned, if I recall.

ragejg
10-15-03, 11:07 PM
y'know, I'd kinda like to know what else besides shaders DX9 has to offer vs its predecessors... Can anyone give a rundown?

StealthHawk
10-16-03, 12:30 AM
Originally posted by jimmyjames123
So, what exactly do you guys think is the biggest bottleneck in DirectX 9.0 games with the current generation NVIDIA FX cards?

Lack of support for the DirectX 9.0 minimum standard FP24 precision?
Lack of Pipelines and/or Vertex Shader Engines?

How about none of the above? I vote for the register use penalty.

Also, what is preventing them from adding Pipelines to a card like the FX 5900 Ultra?

Design costs, R&D, time, efforts better spent elsewhere, etc.

Finally, what would be the advantage of going with 8x2 vs 12x1 for the future generation, and vice versa?

Different memory bandwidth requirements, different single texturing speed, different multitexturing speed, different transistor counts, et al.

Yes. I know this post is vague, but....I already made a huge huge post. See Detonator Drivers forum, you'll know it when you see it :)

Hanners
10-16-03, 02:50 AM
Originally posted by StealthHawk
How about none of the above? I vote for the register use penalty.

Agreed. Register usage, together with the simple fact that current ATi cards have more shader units, pretty much accounts for the performance differences.

It looks like the Detonator 50s should go quite some way towards helping with the register usage, but they'll still be trailing in raw shader power.

-=DVS=-
10-16-03, 03:37 AM
I'd go with doubling the core of the successor. The R300/350 core is .15, 8x1, with a 256-bit bus (also including internal junk :p ) and performs very well in DX9 games; double it at .11 or .13 to 16x1 with a 512-bit bus and we would get an uber card :D

Uttar
10-16-03, 10:16 AM
Currently exposed functionality; this may change slightly in future drivers for BOTH ATI and NVIDIA.

R300/R350: 8xVec3 + 8xScalar + 8xTextures, all FP24
---
NV30: ( 4xVec4(FP32) OR 4xCOS/SIN(FP32,lookup tables shared between VS & PS) OR 8xTEX ) + ( 8xMAD(FX12) OR 16xMUL(FX12) when independent )
---
NV35: ( 4xVec4(FP32) OR 4xCOS/SIN(FP32,lookup tables shared between VS & PS) OR 8xTEX ) + ( 8xADD/MUL(FP32) OR 4xMAD(FP32) OR 4xMAD(FP16) )

NVIDIA however has register usage penalties, which in extreme cases can result in 10x worse performance. The effect of FP16 on register usage is halving the number of used registers (or I should rather say the opposite: it's really two FP16 registers uniting to become one FP32 register).
Halving the number of used registers does NOT result in doubled performance. It can result in much less, or much more.

The more complex the PS program, the more IQ will be degraded by using FP16 instead of FP32, and the more registers will be required. This means the more pixel shader intensive a game is, the more the gap will widen.


Hopefully that'll help you all understand the two architectures a bit better :)


Uttar
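Uttar's point that longer pixel shader programs magnify FP16's precision loss can be sketched with Python's stdlib half-precision support (the struct format code 'e'). The multiply-add chain below is purely hypothetical, a stand-in for a chain of dependent shader instructions, not any real shader:

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision (FP16) value."""
    return struct.unpack('e', struct.pack('e', x))[0]

def run_chain(n, precision):
    """Run a dependent multiply-add chain, rounding each step if FP16."""
    acc = 0.1
    for _ in range(n):
        acc = acc * 1.001 + 0.001   # one MAD per "instruction"
        if precision == 16:
            acc = to_fp16(acc)      # rounding error compounds step by step
    return acc

for n in (50, 500):
    ref = run_chain(n, 64)   # full double precision as a stand-in reference
    half = run_chain(n, 16)
    print(f"{n:>4} instructions: relative error {abs(half - ref) / ref:.2e}")
```

Each FP16 rounding contributes up to about 2^-11 of relative error, so the longer the dependent chain, the further the FP16 result tends to drift from the reference.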

Hellbinder
10-16-03, 10:48 AM
Originally posted by Uttar
Currently exposed functionality; this may change slightly in future drivers for BOTH ATI and NVIDIA.

R300/R350: 8xVec3 + 8xScalar + 8xTextures, all FP24
---
NV30: ( 4xVec4(FP32) OR 4xCOS/SIN(FP32,lookup tables shared between VS & PS) OR 8xTEX ) + ( 8xMAD(FX12) OR 16xMUL(FX12) when independent )
---
NV35: ( 4xVec4(FP32) OR 4xCOS/SIN(FP32,lookup tables shared between VS & PS) OR 8xTEX ) + ( 8xADD/MUL(FP32) OR 4xMAD(FP32) OR 4xMAD(FP16) )

NVIDIA however has register usage penalties, which in extreme cases can result in 10x worse performance. The effect of FP16 on register usage is halving the number of used registers (or I should rather say the opposite: it's really two FP16 registers uniting to become one FP32 register).
Halving the number of used registers does NOT result in doubled performance. It can result in much less, or much more.

The more complex the PS program, the more IQ will be degraded by using FP16 instead of FP32, and the more registers will be required. This means the more pixel shader intensive a game is, the more the gap will widen.


Hopefully that'll help you all understand the two architectures a bit better :)


Uttar
Actually, that is true for the NV35 only... (well, partially true for all the FX)

On all FX hardware other than the NV35, NVIDIA is replacing or reducing shaders/instructions all the way down to FX12. Not even floating point. Completely out of DX9, back into something just past DX8. The NV40 is apparently going to be similar, with FX16 support.

Remi
10-16-03, 02:46 PM
Originally posted by Uttar
The more complex the PS program, the more IQ will be degraded by using FP16 instead of FP32, and the more registers will be required.
Just one quick note to say that I have a 500+ instruction shader (done for interactive framerates, not realtime) that is 88% FP16/FX12 and 12% FP32. The resulting image, when compared to one done with full FP32, differs (difference between pixel values) by less than 0.5%, with a variance of less than 0.5% too. In other words: even with the two pictures side by side and all the time you want, you'll have a hard time trying to spot any difference...

FP16 holds water quite well. :)
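The comparison Remi describes (mean difference and variance between pixel values of a mostly-FP16 render and a full-FP32 one) can be sketched like this. The "images" are synthetic stand-ins, and the fixed-step quantization is only a rough proxy for FP16 rounding near 1.0, not real shader output:

```python
# Sketch of the image comparison described above: measure the mean absolute
# difference and the variance of differences between two renders.

def diff_stats(img_a, img_b):
    """Mean absolute difference and variance of differences between images."""
    diffs = [abs(a - b) for a, b in zip(img_a, img_b)]
    mean = sum(diffs) / len(diffs)
    var = sum((d - mean) ** 2 for d in diffs) / len(diffs)
    return mean, var

# Stand-in "FP32" ramp image, and an "FP16" version made with a fixed
# 1/2048 quantization step as a crude proxy for half-precision rounding.
fp32_img = [i / 255.0 for i in range(256)]
fp16_img = [round(v * 2048) / 2048 for v in fp32_img]

mean, var = diff_stats(fp32_img, fp16_img)
print(f"mean diff: {mean:.6f}, variance: {var:.6f}")  # both well under 0.5%
```

With a quantization step of 1/2048, every per-pixel difference is at most 1/4096 ≈ 0.024%, consistent with the "under 0.5%" figures in the post.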

gmontem
10-16-03, 03:09 PM
Originally posted by Remi
Just one quick note to say that I have a 500+ instruction shader (done for interactive framerates, not realtime) that is 88% FP16/FX12 and 12% FP32. The resulting image, when compared to one done with full FP32, differs (difference between pixel values) by less than 0.5%, with a variance of less than 0.5% too. In other words: even with the two pictures side by side and all the time you want, you'll have a hard time trying to spot any difference...

FP16 holds water quite well. :)
And what about a 500+ instruction shader where half of 'em is FP32, and another where over 75% of 'em is FP32? Most researchers will try experimenting with different numbers before making a conclusion. :p

Remi
10-16-03, 03:33 PM
>And what about a 500+ instruction shader where half of 'em is FP32, and another where over 75% of 'em is FP32? Most researchers will try experimenting with different numbers before making a conclusion. :p

1. What makes you think I haven't?
2. What does that have to do with the fact that when the shader is programmed correctly, the error propagation with FP16 is small enough to be perfectly acceptable?

Hellbinder
10-16-03, 04:32 PM
Originally posted by Remi
Just one quick note to say that I have a 500+ instruction shader (done for interactive framerates, not realtime) that is 88% FP16/FX12 and 12% FP32. The resulting image, when compared to one done with full FP32, differs (difference between pixel values) by less than 0.5%, with a variance of less than 0.5% too. In other words: even with the two pictures side by side and all the time you want, you'll have a hard time trying to spot any difference...

FP16 holds water quite well. :)
Not if you actually use High Dynamic Range Color values.

Remi
10-16-03, 05:21 PM
>Not if you actually use High Dynamic Range Color values.

First, it's all a matter of range. Everything depends on the range.

Second, it also depends on how "simple-minded" your implementation is. Did you know there are actual HDRI viewers implemented on 8-bit hardware?

Third, and more importantly... I don't see a lot of people using it today, at least in games?

Thanks, however, for having avoided the classical "but you can't do a nice Mandelbrot set!"... Natural shading isn't precisely well known for its chaotic nature, so that's of little relevance.

But for the current mainstream of game productions... FP16 seems very relevant to me.

gmontem
10-16-03, 05:51 PM
Originally posted by Remi
>And what about a 500+ instruction shader where half of 'em is FP32, and another where over 75% of 'em is FP32? Most researchers will try experimenting with different numbers before making a conclusion. :p

1. What makes you think I haven't?
Because you didn't mention doing anything of that nature? And if you did, why didn't you bother mentioning it? Or perhaps it didn't support the statement you wanted to make?

hithere
10-16-03, 06:18 PM
Originally posted by Remi
>Not if you actually use High Dynamic Range Color values.

First, it's all a matter of range. Everything depends on the range.

Second, it also depends on how "simple-minded" your implementation is. Did you know there are actual HDRI viewers implemented on 8-bit hardware?

Third, and more importantly... I don't see a lot of people using it today, at least in games?

Thanks, however, for having avoided the classical "but you can't do a nice Mandelbrot set!"... Natural shading isn't precisely well known for its chaotic nature, so that's of little relevance.

But for the current mainstream of game productions... FP16 seems very relevant to me.

Well, then why aren't developers illustrating this advantage? Seems to me that most games and benchmarks involving shaders show ATI with a clear lead...barring a massive conspiracy, what, then, is the problem?

theultimo
10-16-03, 06:26 PM
HDR rendering needs FP24 or higher precision for the color values used. This is why the DX9 spec is FP24 and up, with FP16 bolted on as a partial-precision hack.

Remi
10-16-03, 06:58 PM
Originally posted by gmontem
Because you failed to have mentioned doing something of that nature? And if you did, why did you not bother mentioning it, or perhaps it did not support the statement you wanted to make?

Well, if you take a look at my registration date on this forum and at the number of posts I've made, you'll see that I don't post often. First, because I rarely have the time to, and second, because spending hours arguing on the net isn't exactly on my priority list, far from it. That's why my post started with...

"Just one quick note".

Sure, I could take my notes, write a nice detailed analysis, make a nice report with cool Excel charts, and post it here... I don't really see that happening soon, however. For a very simple reason.

Here, each year with the arrival of spring, there's one night when everyone making music is invited to do so freely in the streets. There are a lot of little bands, concerts, etc. I once asked a friend of mine, a professional bass player, if he intended to do something. He looked at me with incredulity, then said slowly, "Nope. Not the slightest thing. I'm a pro, which means I have to live from what I do. All these good people in the street, will they help me to live? Not at all; all they want is free concerts, they don't care at all about me. Why should I do my job for free for them? Do you work for free for people you don't even know?"

Would you work for free for people you don't know, frankly? Probably not.

I'm no different. Sorry if that disappoints you!

Have a good day! :)

Remi
10-16-03, 07:08 PM
Originally posted by hithere
Well, then why aren't developers illustrating this advantage? Seems to me that most games and benchmarks involving shaders show ATI with a clear lead...barring a massive conspiracy, what, then, is the problem?
My bad, I should have anticipated that and added something to my post.

I'm not saying that nvidia's cards are faster than ATI's.

All I'm saying is that the FP24 vs FP32 debate isn't that relevant, because today's DX9 shaders can be done with little trouble at about 90% FP16, with a truly insignificant and unnoticeable IQ drop.

NickSpolec
10-16-03, 07:10 PM
Nope. Not the slightest thing. I'm a pro, which means I have to live from what I do. All these good people in the street, will they help me to live? Not at all; all they want is free concerts, they don't care at all about me. Why should I do my job for free for them? Do you work for free for people you don't even know?"

Would you work for free for people you don't know, frankly? Probably not.


Hope I don't become that jaded.

I have been working on an RPG for the PC for about 5 years (2 of those being real, focused work). I've been doing everything myself: design, art, music, story, interface, logic...

What happens when I finally finish it (maybe sometime in the next year and a half :P)? I'll release it for free on the net...

Remi
10-16-03, 07:26 PM
Originally posted by theultimo
HDR Rendering needs FP24 bit or higher for the colors used. This is why the DX9 Spec is FP24-up, with hacked FP16 built in.

Well, as I said already, you can render HDRI on 8-bit hardware. But yes, of course, that takes more work than on FP24 or FP32 hardware...

I haven't worked much on HDRI (not yet), but again, to face this we have three good allies.

The first one (I'll repeat myself, sorry) is that it's all a matter of range. You don't need everything to be full precision just because your colors are high precision...

The second is scaling. It can be delicate because of the rounding, but it should be able to help lower the number of full-precision registers needed.

The third one is that... it's graphics, and a small (I emphasize small) loss of precision is perfectly acceptable.

With all three combined, the situation looks much less bad than just thinking "my full-precision number won't fit in an FP16 register..."

But yes, in the case of HDRI, it's of course more delicate... The ease of coding without thinking about it that full precision allows is rather comfortable.

But comfort and good work are two distinct things...
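The "scaling" ally can be sketched for HDR values: FP16 tops out at 65504, so radiance beyond that is divided by a per-image, exposure-style factor before storage and rescaled on read. The sample values below are hypothetical, and the stdlib struct 'e' format is used to emulate FP16 rounding:

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE half-precision value

def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack('e', struct.pack('e', x))[0]

# Hypothetical HDR radiance samples; the largest ones exceed FP16's range
# (packing 4.8e5 into format 'e' directly would overflow).
hdr = [0.25, 1.0, 90.0, 1.2e5, 4.8e5]

scale = max(hdr) / FP16_MAX                   # exposure-style scale factor
stored = [to_fp16(v / scale) for v in hdr]    # every sample now fits in FP16
restored = [s * scale for s in stored]        # rescale on read

for orig, back in zip(hdr, restored):
    rel_err = abs(back - orig) / orig
    print(f"{orig:>10g} -> {back:>10g}  rel err {rel_err:.2e}")
```

Since FP16 keeps about 11 bits of relative precision regardless of magnitude, the scaled round trip stays within roughly 0.05% of the originals, which supports the point that range, not absolute precision, is the real constraint.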

StealthHawk
10-16-03, 07:37 PM
What does the "I" stand for in HDRI?

Remi
10-16-03, 07:43 PM
Originally posted by NickSpolec
Hope I don't become that jaded.

I have been working on a RPG for the PC for about 5 years (2 years of that being real, focused work). I've been doing everything myself --- Design, art, music, story, interface, logic...

What happens when I finally finish it (mayeb sometime in the next year and half :P)? I'll release it for free on the net...

Cool!

That must be a very good experience.

You just did what you wanted to do, and frankly I'm happy for you that you can do that. And it's entirely possible that you reached a professional level while doing it.

But... doing something for a reason other than living from it doesn't qualify as a professional job... Which, again, doesn't mean you don't have the qualities for it! It's just not the same goal.

I also work for free from time to time, but for that I give priority to my relatives and a few old friends... and of course that has the consequence that nothing remains for anyone else. Ah well... Nothing (and nobody) is perfect, I'm afraid! :)

Have a good day! And happy design / coding / painting / composition / writing / etc... :)

Remi
10-16-03, 07:49 PM
Originally posted by StealthHawk
What does the "I" stand for in HDRI?
HDRI stands for High Dynamic Range Images.

In 3D synthesis, it's used mainly with IBL, image based lighting. Paul Debevec has done a lot of good things with them... But you probably know all that already. :)

It's already close to 3am here, so if nobody complains about it, I think I'm going to get some sleep now...

See you later!

Remi
10-17-03, 10:09 AM
To go back to the topic... Originally posted by jimmyjames123
So, what exactly do you guys think is the biggest bottleneck in DirectX 9.0 games with the current generation NVIDIA FX cards?

From what we know, it looks like the timing of accesses to the register file is to blame.

Also, what is preventing them from adding Pipelines to a card like the FX 5900 Ultra?

I'm not that well qualified to answer that, being more of a software guy than a hardware one. But just adding pipes would require more transistors. You can choose to use more silicon area, but then the chip's price is probably going to increase too much; or to use a smaller process (like .11 instead of .13), but then you need to master that process, and I believe there would still be significant rework needed on the chip to do it. In addition, it would probably have impacts on other parts of the chip, such as the anti-aliasing, the memory system, etc., so work would be needed there too to rebalance the chip. Therefore it's probably worth using the opportunity to redesign a few things... ...and finally, you end up doing a new chip rather than just adding pipes (even if your new chip is based on a previous one).

Finally, what would be the advantage of going with 8x2 vs 12x1 for the future generation, and vice versa?

I'll leave that one, as I prefer dynamic pipelines (i.e. mixed precision) rather than fixed ones... But maybe that's just me! :)