View Full Version : ATI has problem with R600


SpiffMistroII
11-16-06, 07:36 AM
Someone got owned badly didn't he?

http://www.rage3d.com/board/showthread.php?t=33873447&page=2



Someone got owned??? What is this now? The World Of WarCraft forum??

Sorry boy no one got "OWNED".


Originally Posted by DemoCoder
First of all, the X1950XTX does not perform a branch in 500 microseconds (0.5 ms). Are you insane? This chip runs at 650 MHz, and 500 microseconds would mean 650 x 10^6 * 0.5 * 10^-3 = 325 * 10^3 = 325,000 cycles latency! What you've done is look at GPUBench scores and fail to understand what was being plotted.
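
The cycle count in that post is easy to verify. A minimal sketch, assuming nothing beyond the two figures quoted (650 MHz and 500 microseconds); the function name is made up for illustration:

```python
def cycles_for_latency(clock_mhz, latency_us):
    """Clock cycles that elapse during latency_us microseconds at clock_mhz MHz.

    MHz is 10^6 cycles per second and a microsecond is 10^-6 seconds,
    so the two scale factors cancel and the product is already in cycles.
    """
    return clock_mhz * latency_us


# Figures taken from the post: 650 MHz core clock, 500 microseconds (0.5 ms).
cycles = cycles_for_latency(650, 500)
print(f"{cycles:,} cycles")  # prints "325,000 cycles"
```

Hundreds of thousands of cycles for one branch is the reason the poster argues the plotted number cannot be a per-branch latency on a single shader unit.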

What is being plotted is the average time for a single branch across all pixels. This includes the latency of waiting for free shader units so the code can actually be run. It is not a measurement of a single branch on a single shader unit. That would complete much more quickly. This benchmark is a fairly accurate measurement of performance on branch-intensive shaders.

Quote:Secondly, the comparisons you link to are not "dynamic branching" tests, they are tests of z-cull functionality. Even the GeForce3 has had this, well before DirectX9. It is no more a test of shader branching functionality than early stencil reject or alpha-kill.

This is exactly the kind of technique formerly used to do branching. As it requires more overhead, it should not perform better than a true PS 3.0 branch. However, early z branching on ATI completes 20 times more quickly than either early z branching or PS 3.0 branching on the G80.

Quote:Third, traversing a BSP with dynamic branching is not so much testing DB performance, but gather operations as well. There are a gazillion variables to consider in any BSP traversal technique, so unless you are prepared to post sample code that reproduces the problem, or at least explain in pseudo-code detail the algorithm, data layout, et al. you are using, the claims are kinda meaningless.

Needless to say, we do not have a "gazillion" variables in our efficient implementation. The representation of our tree in the texture is very cache coherent for our purposes. I am not going to release information on our algorithm, but suffice it to say, this is definitely a test of dynamic branching.

Quote:Fourth, anyone doing pointer-chasing algorithms would do well to sign up to the CUDA program, as CUDA claims to expose a linear on-chip local storage model with a C programming model that allows gather/scatter "pointer chasing" style code to run a lot faster, as well as offering inter-thread communication and synchronization.

Inter-thread communication and synchronization is not needed and not desired. We do have a registration sent in to nVidia to get further information on CUDA. But performance increases are not really expected.

Quote:Maybe if Mike Houston claimed that G80 DB performance was 20x worse than an R580, people might take it more seriously, but you've made a post where you misinterpreted GPU bench figures, and then claimed you have some private benchmark test, without providing any details.

I misinterpreted nothing. Perhaps you misinterpreted what I was saying about the GPUBench figures. I have already given far too much information out regarding our engine. I cannot really give out much more. The GPUBench figures stand on their own.

-Raystonn

NoWayDude
11-16-06, 07:57 AM
Are you actually reading that thread, and looking at the posts that are there, or just being a WUM?


To make a long story short, the GPUBench branching test is broken on ATI cards. Yes, I mean the Z variant too. I wonder that no one has seen the straight horizontal line and come to the right conclusion. This is a simple case of the driver being too smart, like we have seen with the RCP instruction issue all the time. The big math block that should only run for some pixels can be mathematically reduced to simple instructions. ATI does this, nVidia does not. Therefore nVidia is much slower here.
http://www.rage3d.com/board/showthread.php?t=33873447&page=3

Now tell me something, who do you give more weight to? DemoCoder and Demirug, or the person making these bold claims without even disclosing what the code is doing, and by the looks of it, knowing what he is doing? He seems to be very assured of himself, and at the same time very arrogant to other sites doing branching tests.

I ask you to look once again at the link I posted, here http://www.gpgpu.org/sc2006/slides/10.houston-understanding.pdf , and tell me why they did not mention this problem from the outset?

natan
11-16-06, 11:57 AM
ATI wouldn't have locked R600 at 500 MHz while MSAA is used, but it had to be disabled in A0 silicon. It looks like ATI had a similar problem with R520.

Here is some more info about R600: it is sporting MSAA, or ATI will probably disable it and use the same FSAA feature as R580 if they can't fix it in the A2 revision. It has 16 ROPs and 64 shaders, 4-way SIMD.

http://www.theinq.com/default.aspx?article=35707

ATI is aiming at a 700 to 800 MHz clock speed for R600. I don't think they will achieve it because it is a complex chip with a bug. It will probably reach 600 MHz, but that will mean less fillrate than the GeForce 8800GTX at 575 MHz with 24 ROPs.
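
The fillrate comparison in that post can be put into numbers. This is a rough sketch that assumes peak pixel fillrate is simply core clock times ROP count (one pixel per ROP per clock), and it uses the rumored R600 figures from the post, not confirmed specs; the function name is hypothetical.

```python
def peak_fillrate_mpixels(clock_mhz, rops):
    """Theoretical peak pixel fillrate in Mpixels/s.

    Assumption: each ROP writes one pixel per clock, so the peak is
    clock (MHz) times ROP count. Real throughput depends on much more.
    """
    return clock_mhz * rops


# Rumored R600 figures from the post: 600 MHz, 16 ROPs.
r600_rumored = peak_fillrate_mpixels(600, 16)
# GeForce 8800GTX figures from the post: 575 MHz, 24 ROPs.
g80_gtx = peak_fillrate_mpixels(575, 24)

print(r600_rumored)  # 9600
print(g80_gtx)       # 13800

# Clock a 16-ROP part would need to match the 8800GTX under this model.
clock_to_match_mhz = g80_gtx / 16
print(clock_to_match_mhz)  # 862.5
```

Under this simple model a 16-ROP chip would trail the 8800GTX on raw pixel fillrate even at the 700 to 800 MHz target, though raw fillrate alone says little about overall performance.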


you don't think they will achieve that frequency?
but, let me ask you this ...

are you a TSMC engineer or something like that? are you an expert, or just an nvboard geek? :D
we can all speculate this way. i can also say a 4-way SIMD ALU unit is more powerful than a 1-way scalar unit, but it doesn't matter because i'm just a geek, you know.

now, i'm happy to know you don't think they can, simply because it makes me laugh :D

NoWayDude
11-16-06, 01:14 PM
SpiffMistroII

Someone got owned??? What is this now? The World Of WarCraft forum??

Sorry boy no one got "OWNED".

Ok boy, just to end the debate, so you can see what misinformation from someone can do:

He's misinterpreting some of the GPUBench results. The graphs only show square branch regions, but the branching test can handle rectangular. We measure G80 with 100 series drivers at 16x4. (Personally, I find long thin rectangular branch regions a little insane...). That puts g80 at a 64 pixel branch via our measurements and test setup. Using similar tests under DX (not currently working, seems to be a DX spec change as both vendors no longer work...), we see ~96 for R580, but I can't remember the branch pattern off the top of my head, but it was reasonably square.
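
The point about region shape can be illustrated with a toy model. The sketch below assumes (a simplification, not something GPUBench itself does) that the hardware evaluates a branch per fixed-size tile of pixels and that a tile whose pixels disagree has to run both sides of the branch. Both tile shapes tested cover 64 pixels, like the 16x4 figure quoted above; the function name and the 32x8 test image are invented for the example.

```python
def count_divergent_tiles(mask, tile_w, tile_h):
    """Count tiles in which pixels disagree about taking the branch.

    mask is a list of rows of booleans (True = pixel takes the branch).
    Its width and height are assumed to be multiples of the tile size.
    """
    height = len(mask)
    width = len(mask[0])
    divergent = 0
    for ty in range(0, height, tile_h):
        for tx in range(0, width, tile_w):
            # Collect the distinct branch decisions inside this tile.
            decisions = {
                mask[y][x]
                for y in range(ty, ty + tile_h)
                for x in range(tx, tx + tile_w)
            }
            if len(decisions) > 1:
                divergent += 1
    return divergent


# 32x8 image where only the leftmost 8 columns take the branch.
mask = [[x < 8 for x in range(32)] for _ in range(8)]

print(count_divergent_tiles(mask, 16, 4))  # 2: the 16x4 tiles straddle the edge
print(count_divergent_tiles(mask, 8, 8))   # 0: the 8x8 tiles line up with it
```

Same pixel count per tile, different result, which is why a single "64 pixel branch" number only tells part of the story.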

Nvidia does have a branch penalty. It's NOT in milliseconds, that is the total measured overhead of many iterations. G80 is *much* better than G7X for branch overhead, but there is still a cost to a branch that you don't see in the early-Z graph. Once we can get a patched DX test, we can take a look at ATI performance and overheads again.

Another interesting side note is that Nvidia G80 now has better latency hiding than R580. You only need 4 float4 ops to cover a fetch from cache (you couldn't cover all the latency on G7X), whereas R580 needs 12 (3 ALUs per pipe).

That is a quote from the person that developed the GPGPU bench

http://www.beyond3d.com/forum/showthread.php?p=874511#post874511

Now, can we go back to your opinions about R600?

TG01
11-16-06, 03:59 PM
Not too much different from yours in regards to G80....

Peace (nana2)

Are you sure on that 3Dmark06 score of yours with that 8800GTS...??

NoWayDude
11-16-06, 04:34 PM
Are you sure on that 3Dmark06 score of yours with that 8800GTS...??
That's with my 7900GT. It does not let me publish new scores at the moment :(
New one is 8754

http://service.futuremark.com/compare?3dm06=615324

retsam
11-16-06, 05:36 PM
Ok boy, just to end the debate, so you can see what misinformation from someone can do:



That is a quote from the person that developed the GPGPU bench

http://www.beyond3d.com/forum/showthread.php?p=874511#post874511

Now, can we go back to your opinions about R600?

Razor1, there is no point in discussing in NVnews anymore. We can talk here.


see, he wants to believe what he wants to believe, i can smell a fanboy a mile away.... he's been like this since the day he joined...

retsam
11-16-06, 05:41 PM
after reading through all the threads, it seems like some sort of viral marketing on ati's part or some other party's to put doubt in people's minds about the g80. i mean, the guy that originally came up with this has been called out at b3d...i think this really ends the debate.

NoWayDude
11-16-06, 05:43 PM
see, he wants to believe what he wants to believe, i can smell a fanboy a mile away.... he's been like this since the day he joined...

Oh I know that. Does not change the fact that people are interested in what R600 will be.
What does bother me, however, is the attitude of "I know better than...."

FFS, the guy that posted this is being eaten alive by the GPGPU writer, DemoCoder, Demirug, et al. He is being slanted at B3D, and still, there are a few fan boys that are swallowing all the garbage just because it slanders their pet hate...Talk about letting insinuations get in front of facts.... I really don't get it...:(

retsam
11-16-06, 07:47 PM
Oh I know that. Does not change the fact that people are interested in what R600 will be.
oh, i can't wait for r600, i think it's going to be a very good card, but the crap some of these guys are shoveling is just insane. i like nvidia just because of what Jen-Hsun said about "engineering their way out of any trouble they get into". i really like that attitude in a company, but i'm not going to lie and steal for these guys like some of the kids are doing /cough spiffmister.

rwolf
11-17-06, 12:01 AM
Have you checked die sizes out lately? 96 Vec4 units like those of the r580 with mmad instructions would be huge even at .80; it probably wouldn't be clocked more than 600, let alone 750, with a power usage of 250 watts. ATi doesn't use domain clocks like nV, and I don't see them using them for the r600 either; so far there are no hints to that effect at all. And the 96 ALUs explain the rumors of the higher power usage even at 600, not to mention its performance will be on par with the g80, so the performance deltas from the extra bandwidth won't show much of anything other than a 10% difference at most, if even that.

R580 has a ring bus that runs through the chip at the same speed as the memory bus. ATI even has a patent on synchronizing circuits with different clock speeds from 2003. Another hint would be a technology development agreement that they signed with another company which makes high clocked circuits on standard fab processes. Although I doubt this will make a showing until 65nm.

rwolf
11-17-06, 12:03 AM
oh, i can't wait for r600, i think it's going to be a very good card, but the crap some of these guys are shoveling is just insane. i like nvidia just because of what Jen-Hsun said about "engineering their way out of any trouble they get into". i really like that attitude in a company, but i'm not going to lie and steal for these guys like some of the kids are doing /cough spiffmister.

Was he talking about NV30 and the 16/32 precision issue when he made those statements?

Come on, put away your own shovel and admit you're biased like the rest of us because you're brand loyal. :)

Razor1
11-17-06, 06:49 AM
R580 has a ring bus that runs through the chip at the same speed as the memory bus. ATI even has a patent on synchronizing circuits with different clock speeds from 2003. Another hint would be a technology development agreement that they signed with another company which makes high clocked circuits on standard fab processes. Although I doubt this will make a showing until 65nm.


Yeah was talking more about the core components too :). I'm not sure about ATi's contract (the other one not fast14, I think we can forget about fast14 now though) with the other company you mentioned, seems a bit too early for them to use it on the r600, I agree something for the future to look at though.

Actually I didn't know about this patent, could ya give me a link?

Well, my comments were based on a 1024 bit ring bus with 512 external and 96 shader arrays. That's exactly double the r580, not to mention no dx10 stuff or unified shaders. The chip would be more than 2 times the transistor count; even on .80 that would make it bigger than the g80.

retsam
11-17-06, 07:57 AM
Was he talking about NV30 and the 16/32 precision issue when he made those statements?

Come on, put away your own shovel and admit you're biased like the rest of us because you're brand loyal. :)
i guess reading is fundamental for ya.. why don't you go back and reread what i said.