NV30 fragment inner workings speculation


Uttar
04-10-03, 07:42 AM
EDIT: After some thinking, it just doesn't make sense.

Hey everyone,

Just thought I'd post my latest theory on the NV30 fragment inner workings system...

The following is my guess:
The NV30 doesn't have pipes like other GPUs anymore. Just "units".

- 32 Z Check units
- 32 Fragment Shading units

The Fragment Shading units have the following:
- 1 FP op OR 1 dependent texture fetch OR 2 normal texture fetches OR 1 FX12 op (using FP16 and simulating FX12) in 8 cycles
- 1 FX12 ADD op OR 2 FX12 MUL ops in 4 cycles
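
To put rough numbers on the guess above, here is a minimal Python sketch of the chip-wide throughput it would imply. It assumes (this is not stated in the post) that the listed rates are per unit and that all 32 fragment shading units run fully in parallel; the names `UNITS`, `RATES`, `ops_per_clock` and `peak_table` are made up for illustration.

```python
UNITS = 32  # fragment shading units in the guess above

# (operation, ops completed, cycles taken), copied from the list above
RATES = [
    ("FP op", 1, 8),
    ("dependent texture fetch", 1, 8),
    ("normal texture fetch", 2, 8),
    ("FX12 op on the FP path", 1, 8),
    ("FX12 ADD", 1, 4),
    ("FX12 MUL", 2, 4),
]

def ops_per_clock(ops, cycles, units=UNITS):
    """Chip-wide ops per clock, ASSUMING every unit works in parallel."""
    return units * ops / cycles

def peak_table():
    """Map each operation name to its implied peak rate per clock."""
    return {name: ops_per_clock(ops, cycles) for name, ops, cycles in RATES}

if __name__ == "__main__":
    for name, rate in peak_table().items():
        print(f"{name:26s} {rate:5.1f} per clock")
```

Under that reading the guess works out to 4 FP ops and 8 normal texture fetches per clock across the whole chip. It is only arithmetic on the speculated figures, not a statement about the real hardware.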

Then there's also a pool of at least 32 FP32 registers ( probably more like 64 or 128 ) and 1024 instructions.
If there aren't enough registers, then you got associativity at work. I've got no idea how that associativity functions, but looking at performance numbers with a high number of registers, I guess it ain't very optimized.

Feedback/comments/flames on that insane, strange speculation?


Uttar

SnakeEyes
04-10-03, 10:39 AM
Originally posted by Uttar
If there aren't enough registers, then you got associativity at work. I've got no idea how that associativity functions, but looking at performance numbers with a high number of registers, I guess it ain't very optimized.

I look forward to hearing input on this from the other, more knowledgeable people here (definitely not including myself in this group). But I'd guess this is an area that could have been improved in the NV35 core, accounting for some of the estimated performance increase over the NV30 at a given clock rate (not discounting the memory bandwidth effect, though).

jolle
04-20-03, 08:36 AM
Remember the NV35 benches posted on NVnews a while back?
Was there any truth in them?

"While running a quick benchmark in Quake3, 1600x1200, 4XAA and 8xAF it got 111 FPS, compared to a GeForce FX Ti5800 Ultra that got 48 FPS. However both chips were clocked at 250MHz because it was still a prototype, and they wanted to make an exact comparision to the NV30. So basically both were set to 250 in order to make a fair benchmark."
http://www.chip.de/news/c_news_10209939.html

With both clocked at 250MHz core, the NV35
is more than twice as fast (111 vs. 48 FPS) at 1600x1200 4xAA 8xAF.
If this is true then they have at least improved
AA and AF significantly..
Can this increase be credited to the higher bandwidth
alone?
Curious how the numbers would look without AA and AF
though, might be a lot less difference..
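
For reference, the arithmetic on the two quoted frame rates can be checked with a short Python sketch. The only inputs are the 111 and 48 FPS figures from the quote; the helper name `speedup` is hypothetical.

```python
def speedup(new_fps, old_fps):
    """Return (ratio, percent faster) of new_fps relative to old_fps."""
    ratio = new_fps / old_fps
    return ratio, (ratio - 1.0) * 100.0

# Figures from the quoted benchmark: NV35 at 111 FPS, NV30 at 48 FPS.
ratio, pct = speedup(111, 48)
print(f"{ratio:.2f}x, {pct:.0f}% faster")  # 2.31x, 131% faster
```

This says nothing about where the gain comes from; it only restates the quoted numbers as a ratio.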

jolle
04-20-03, 08:36 AM
crap, did a quote instead of edit hehe

ragejg
04-20-03, 01:46 PM
The speed improvement speculation?

I didn't believe it with the nv30 :( but I believe it with this card... for some reason:)...

cool... all the extra enhanced features the GFFX is supposed to offer PLUS the bandwidth/AF/AA improvements we've all been waiting for?? MUHAHAHAHA:p:p....

*me droolz for the midrange variant...*

StealthHawk
04-20-03, 10:53 PM
sure it might be true, but that hardly means that the two chips will compare the same way with retail clocks.

Uttar
04-21-03, 06:01 AM
Hmm, after rereading this, it does seem kinda strange. 8 cycles? I don't think there's any way to loopback and do something like that in 8 cycles... Don't know for sure, though, I'm obviously no engineer.

I've actually got another idea now - some type of deeply pipelined system. Haven't thought much about it yet, though...


Uttar

jolle
04-21-03, 06:42 AM
Uttar, I'm not as schooled on this as you guys seem to be,
but what do you mean by "deeply pipelined system"?
Not like the P4, I hope? Longer pipelines, higher clock and less
work done per clock?
And what do you mean by 8 cycles?

StealthHawk, you would guess that if those benches
are true the difference might be smaller at retail clock frequencies?
I assume NV35 will end up at around 500MHz core too?
How would that work? Is it the memory that would even out
the NV35 lead?
I was skimming a thread here where it's said DDR1 has lower
latency, so wouldn't that give NV35 an edge over NV30 if
they push the RAM to 1GHz DDR?
And adding the 256-bit bus to that.

Those benches show improved FSAA and AF, but they really
don't say how fast it would be overall.
Might be a very small difference without FSAA & AF.

StealthHawk
04-21-03, 07:01 PM
Originally posted by jolle
StealthHawk, you would guess that if those benches
are true the difference might be smaller at retail clock frequencies?
I assume NV35 will end up at around 500MHz core too?
How would that work? Is it the memory that would even out
the NV35 lead?

yes, the core will be somewhere around 500MHz. but the memory clock is lower than the clock found on NV30. at least that is what current rumors indicate. I'm not sure exactly how the comparison was made, the core was mentioned, but not the memory frequency.

this can lead to several situations:
1) they clocked the memory the same, so the NV35 has twice the raw bandwidth of the NV30
2) they used retail memory clocks, so that the NV35 has ~60% more.
3) since the core was only clocked at 250MHz, the memory on the NV35 may also have been clocked lower. In this case we don't know how the mem clk of NV30 compares unless they underclocked it.

of course the memory controller of NV35 should be optimized compared to NV30 too, adding more effective bandwidth.
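
The first two cases can be sketched as raw peak-bandwidth arithmetic in Python. The figures are assumptions, not taken from this thread's benchmark: a 128-bit bus at 500MHz for NV30, a 256-bit bus for NV35, and a 400MHz NV35 memory clock picked only because it reproduces the "~60% more" rumour. The helper name `bandwidth_gb_s` is hypothetical.

```python
def bandwidth_gb_s(bus_bits, mem_clock_mhz, transfers_per_clock=2):
    """Raw peak bandwidth in GB/s for a DDR-style bus (2 transfers per clock)."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * mem_clock_mhz * 1e6 * transfers_per_clock / 1e9

# ASSUMED figures (see the note above), not measured values.
nv30 = bandwidth_gb_s(128, 500)             # NV30 baseline
nv35_same_clock = bandwidth_gb_s(256, 500)  # case 1: same memory clock
nv35_retail = bandwidth_gb_s(256, 400)      # case 2: rumoured lower clock

print(f"NV30:           {nv30:.1f} GB/s")
print(f"NV35 (case 1):  {nv35_same_clock:.1f} GB/s, {nv35_same_clock / nv30:.2f}x")
print(f"NV35 (case 2):  {nv35_retail:.1f} GB/s, {nv35_retail / nv30:.2f}x")
```

Case 3 can't be computed without knowing the memory clocks actually used in the test. Any gain from a better memory controller would also sit on top of these raw numbers.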

Those benches show improved FSAA and AF, but they really
don't say how fast it would be overall.
Might be a very small difference without FSAA & AF.

indeed that is the case. scores will barely improve, if they do at all, in situations that aren't bandwidth-limited.

unless the AF algorithms are fundamentally changed I'm not sure how the NV35 would compare with AF to the NV30. AF uses primarily fillrate, so I don't really see why it would drastically speed up unless used in conjunction with FSAA.

2x FSAA was already very fast, with only a 10% hit, so it probably won't improve much.

4x FSAA should be substantially faster due to the increased bandwidth of NV35.

marcocom
05-01-03, 04:32 PM
Originally posted by jolle
I'm not as schooled on this as you guys seem to be,
but what do you mean by "deeply pipelined system"?


You are in their school.

digitalwanderer
05-01-03, 04:44 PM
Are you guys speaking english?


It LOOKS like english, and I understand most of the words and such....but the way you put them together makes absolutely NO sense to me. :confused:


You guys should put a warning on these kinds of threads:

"WARNING: Bigbrain thread, enter at your own risk!"

;)


(Then again, with a title like "NV30 fragment inner workings speculation" mebbe I should have expected to come out of this thread feeling really stupid....) :bleh: