ATI has problem with R600
AthlonXP1800
11-15-06, 09:30 AM
ATI wouldn't have locked R600 at 500MHz while MSAA is in use, but MSAA had to be disabled in A0 silicon. It looks like ATI had a similar problem with R520.
Here is some more info about R600: it is either sporting MSAA, or ATI has probably disabled it and will use the same FSAA feature as R580 if they can't fix it in the A2 revision. It has 16 ROPs and 64 4-way SIMD shaders.
http://www.theinq.com/default.aspx?article=35707
ATI is aiming at a 700 to 800MHz clock speed for R600. I don't think they will achieve it because it is a complex chip with a bug. It will probably achieve 600MHz, but that will mean less fillrate than the GeForce 8800GTX at 575MHz with 24 ROPs.
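The fillrate comparison above is simple arithmetic: peak pixel fillrate is roughly ROP count times core clock. Here is a quick sketch in Python using only the rumored figures from this post (16 ROPs at 600MHz for R600, 24 ROPs at 575MHz for the 8800GTX). The function names are made up for illustration, and this is a theoretical peak only, not a prediction of real performance.

```python
def peak_pixel_fillrate_mpix(rops: int, core_mhz: float) -> float:
    """Theoretical peak pixel fillrate in megapixels per second (ROPs x MHz)."""
    return rops * core_mhz


def clock_to_match_mhz(target_mpix: float, rops: int) -> float:
    """Core clock (MHz) a chip with `rops` ROPs needs to reach `target_mpix`."""
    return target_mpix / rops


# Rumored figures quoted in the post, not confirmed specs.
r600_fill = peak_pixel_fillrate_mpix(16, 600)
g80_fill = peak_pixel_fillrate_mpix(24, 575)
breakeven_mhz = clock_to_match_mhz(g80_fill, 16)

print(f"R600 (rumored): {r600_fill} Mpix/s")
print(f"8800GTX:        {g80_fill} Mpix/s")
print(f"16 ROPs need {breakeven_mhz} MHz to match")
```

Under these assumptions a 16-ROP part would have to run well above the rumored 700 to 800MHz target just to tie on peak pixel fillrate.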
Then again....
http://www.theinquirer.net/default.aspx?article=35708
Which article do you believe? If that's true, that card sounds like it's smoking crack.
AthlonXP1800
11-15-06, 02:32 PM
Then again....
http://www.theinquirer.net/default.aspx?article=35708
Which article do you believe? If that's true, that card sounds like it's smoking crack.
It seems both articles are mostly true. According to VR-Zone, the original R600 board was 12 inches long, roughly 2 inches longer than the GeForce 8800GTX card. ATI is currently redesigning the board, trying to reduce the card length by 1 inch and probably use a lighter cooler. Retail boards will have 3 power connectors, I think two 4-pin and one 6-pin; it will use more power than a GeForce 8800GTX with its two 6-pin power connectors.
http://www.vr-zone.com/index.php?i=4293
SpiffMistroII
11-15-06, 04:03 PM
Your assumptions are so far off base from the actual context it's not even funny.
ok Spiff, I'll give ya a chance here: is the Inq stuff correct, then? Why don't you tell us what parts are right and what parts are wrong?
NoWayDude
11-15-06, 05:11 PM
Your assumptions are so far off base from the actual context it's not even funny.
Not too much different from yours in regards to G80....
SpiffMistroII
huh? what's this Qpx? and where did someone say R600 got trumped by it?
First off, I'm getting really confused. I thought Nvidia was with Havok on physics. If they have a PPU on board, that would explain the extra 128-bit bus and 128MB of memory. Those add-ons are not for the GPU, but the PPU. So in reality, G80 actually has 512MB of memory and the same old 256-bit bus. I hope it does not have a PPU, that would suck. I thought Nvidia was making three PCIe mobos?
http://www.nvnews.net/vbulletin/showthread.php?t=77990&page=10
Peace (nana2)
SpiffMistroII
11-15-06, 05:48 PM
ok Spiff, I'll give ya a chance here: is the Inq stuff correct, then? Why don't you tell us what parts are right and what parts are wrong?
I don't think you understand me.
I'll start off by saying that the 500MHz lock was with the A0 silicon, AKA engineering samples. It is still unknown whether MSAA was resolved with the A1 silicon, but most likely it was. If not, it will be for sure in the A2 final silicon. The 500MHz lock was resolved with the A1, and I can bet you some good money that ATI will be hitting 750MHz if not 800MHz to get their pixel fill rate to G80's and beyond.
I have no clue where the Inquirer is getting 16 ROPs. It does not matter much, but I can say that sounds about wrong. ROPs are not something Nvidia and ATI are going to be increasing much, but to say that R600 will have the same 16 pixels a clock as R580 is a NO NO (granted that the clock on R600 will be 125MHz to 150MHz faster). So in other words, AthlonXP, to think that ATI will release a chip with 16 ROPs and a lower clock than R580 would be very foolish indeed. That would give R600 a lower pixel fill rate than R580 and R520, and ATI does not take steps backwards.
64 4-way SIMD ALUs sounds about right. If implemented correctly, R600 will fare better than G80 in shader-intensive ops.
It seems both articles are mostly true. According to VR-Zone, the original R600 board was 12 inches long, roughly 2 inches longer than the GeForce 8800GTX card. ATI is currently redesigning the board, trying to reduce the card length by 1 inch and probably use a lighter cooler. Retail boards will have 3 power connectors, I think two 4-pin and one 6-pin; it will use more power than a GeForce 8800GTX with its two 6-pin power connectors.
Indeed the board was long, but for good reason. ATI has already managed to shrink it down shorter than G80's PCB, and is planning on doing it yet again.
The R600 board is HUGE but funnily enough, not in length. Even though the very first revision of the board was as long as the 7900GX2, back in late August/early September engineers pulled a miracle and significantly reduced the size of the board. Right now, they are working on even further optimisations of components, but, from what we saw, this is the most packed product in history of 3D graphics.
I can't comment on the two 4 pin connectors and one 6 pin as the power draw is still roughly the same as G80. Don't know what ATI plans on doing with that.
I can say the cooler for R600 is going to be big, just like everyone says. But it's going to be a very good thing as it's still a 2-slot cooler. If you think that the X1950XT's and 8800GTX's coolers look sick, brace yourself for R600.
I don't think you understand me.
I understand you very well
I'll start off by saying that the 500MHz lock was with the A0 silicon, AKA engineering samples. It is still unknown whether MSAA was resolved with the A1 silicon, but most likely it was. If not, it will be for sure in the A2 final silicon. The 500MHz lock was resolved with the A1, and I can bet you some good money that ATI will be hitting 750MHz if not 800MHz to get their pixel fill rate to G80's and beyond.
Starting off good here, then ya just lose it
I have no clue where the Inquirer is getting 16 ROPs. It does not matter much, but I can say that sounds about wrong. ROPs are not something Nvidia and ATI are going to be increasing much, but to say that R600 will have the same 16 pixels a clock as R580 is a NO NO (granted that the clock on R600 will be 125MHz to 150MHz faster). So in other words, AthlonXP, to think that ATI will release a chip with 16 ROPs and a lower clock than R580 would be very foolish indeed. That would give R600 a lower pixel fill rate than R580 and R520, and ATI does not take steps backwards.
Oh it's going to hit 750 for sure, but the only reason is because it has to hit 750 with 64 vec4 ALUs, and at 750 it will have a fillrate close to the G80, but not the shader power to go ahead of the G80. It will be close, but no cigar.
64 4-way SIMD ALUs sounds about right. If implemented correctly, R600 will fare better than G80 in shader-intensive ops.
The only way a 64 4-way SIMD can match the G80's 128 scalar processors is if it's clocked at 750 or above. Of course it will also need to be able to do 1 mul and add instruction on top of the vec4 unit.
Indeed the board was long, but for good reason. ATI has already managed to shrink it down shorter than G80's PCB, and is planning on doing it yet again.
I can't comment on the two 4 pin connectors and one 6 pin as the power draw is still roughly the same as G80. Don't know what ATI plans on doing with that.
I can say the cooler for R600 is going to be big, just like everyone says. But it's going to be a very good thing as it's still a 2-slot cooler. If you think that the X1950XT's and 8800GTX's coolers look sick, brace yourself for R600.
The board better be able to brace it better than us waiting on bracing it :D
Again, you can't make statements like you did with the information you supposedly have since there is no factual basis behind it......
Edit: and don't forget the G80 will have much more utilization of its units, so either ATI isn't going to be able to outperform the G80 according to the Inq's numbers, or the Inq's numbers are completely wrong. Which one will you pick? Now, you stated pretty much that the 64 ALUs seem reasonable to you, but you have no information on that front, and in previous posts you stated the R600 will outperform the G80 by god knows how much. And in this thread you supported the Inq's statements. Your posts are contradictory.
SpiffMistroII
11-15-06, 06:42 PM
Oh it's going to hit 750 for sure, but the only reason is because it has to hit 750 with 64 vec4 ALUs, and at 750 it will have a fillrate close to the G80, but not the shader power to go ahead of the G80. It will be close, but no cigar.
I know darn well, as you do, that R600 needs to hit 750. Did I say otherwise??
The only way a 64 4-way SIMD can match the G80's 128 scalar processors is if it's clocked at 750 or above.
Which it should and will.
Of course it will also need to be able to do 1 mul and add instruction on top of the vec4 unit.
I don't see a problem with that.
and don't forget the G80 will have much more utilization of its units, so either ATI isn't going to be able to outperform the G80 according to the Inq's numbers, or the Inq's numbers are completely wrong. Which one will you pick? Now, you stated pretty much that the 64 ALUs seem reasonable to you, but you have no information on that front, and in previous posts you stated the R600 will outperform the G80 by god knows how much. And in this thread you supported the Inq's statements. Your posts are contradictory.
Indeed I said 64 sounds reasonable. I never said it was fact. Plenty of evidence also points to 96, which also sounds reasonable.
I never said R600 was going to outperform G80 by a godly amount. If I came across that way, then I'm sorry. That's not what I'm trying to get at here.
It might also be important that the rumor I posted from the Inquirer was not made by the Inquirer. The Inquirer was playing copycat from a Taiwanese website that first posted the information. Don't act like I preach from the Inquirer, because I don't.
god, i think this guy jerks himself off to the ati logo every night.
SpiffMistroII
11-15-06, 06:51 PM
god, i think this guy jerks himself off to the ati logo every night.
If you don't want me posting here then just come out and say it. I don't jerk off to the logo or anything like that. I don't see what I'm doing wrong, but if it pains you, I will leave.:(
I know darn well, as you do, that R600 needs to hit 750. Did I say otherwise??
Which it should and will.
I don't see a problem with that.
Indeed I said 64 sounds reasonable. I never said it was fact. Plenty of evidence also points to 96, which also sounds reasonable.
64 4-way SIMD ALUs sounds about right. If implemented correctly, R600 will fare better than G80 in shader-intensive ops.
The R600 with 64 vec4 units with MADD ability (mul + add) will need to be clocked close to 900ish to get past the G80 theoretically (this includes the G80's 20% higher utilization). Again, that isn't a possibility; that's why I said if it's 64 it will be close, but no cigar.
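The clock estimate above can be roughed out by counting MADD lanes per second. The sketch below is only an illustration under stated assumptions: the 1350MHz shader clock for the 8800GTX is not given in this thread (it is assumed here), the 20% utilization edge is the poster's own figure, and the helper names are hypothetical. The poster's exact method isn't shown, so this will not necessarily land on the same "900ish" number.

```python
def madd_rate(units: int, lanes_per_unit: int, clock_mhz: float,
              utilization: float = 1.0) -> float:
    """Millions of scalar MADDs per second, scaled by a utilization factor."""
    return units * lanes_per_unit * clock_mhz * utilization


def clock_to_match_mhz(target_rate: float, units: int, lanes_per_unit: int) -> float:
    """Clock (MHz) needed for `units` x `lanes_per_unit` lanes to hit `target_rate`."""
    return target_rate / (units * lanes_per_unit)


# ASSUMPTION: 8800GTX shader clock of 1350 MHz (not stated in the thread).
G80_SHADER_MHZ = 1350
# Poster's claim: G80 gets roughly 20% better utilization of its units.
G80_UTILIZATION_EDGE = 1.2

g80_rate = madd_rate(128, 1, G80_SHADER_MHZ, G80_UTILIZATION_EDGE)

# Rumored R600: 64 vec4 units, with and without an extra scalar MADD lane.
vec4_only_mhz = clock_to_match_mhz(g80_rate, 64, 4)
vec4_plus_scalar_mhz = clock_to_match_mhz(g80_rate, 64, 5)

print(f"vec4 only:          {vec4_only_mhz:.0f} MHz to match")
print(f"vec4 + scalar lane: {vec4_plus_scalar_mhz:.0f} MHz to match")
```

The point of splitting out the lane count and the utilization factor is that the answer swings a lot with either one, which is pretty much what this whole argument is about.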
I never said R600 was going to outperform G80 by a godly amount. If I came across that way, then I'm sorry. That's not what I'm trying to get at here.
It might also be important that the rumor I posted from the Inquirer was not made by the Inquirer. The Inquirer was playing copycat from a Taiwanese website that first posted the information. Don't act like I preach from the Inquirer, because I don't.
And are you going by Taiwanese websites also? I don't see how you can make a statement with the backup of rumors off a website. I never said you preach the Inq, I just said you are saying two different things. The Inq knows jack sh*t about anything technical, let alone their crappy write-ups. If they copied the stuff off another website, the other website knows jack sh*t too. And you seem to like these write-ups when they're pro-AMD/ATI, even when they are completely wrong within their own information.
SpiffMistroII
11-15-06, 07:27 PM
And are you going by Taiwanese websites also? I don't see how you can make a statement with the backup of rumors off a website. I never said you preach the Inq, I just said you are saying two different things. The Inq knows jack sh*t about anything technical, let alone their crappy write-ups. If they copied the stuff off another website, the other website knows jack sh*t too. And you seem to like these write-ups when they're pro-AMD/ATI, even when they are completely wrong within their own information.
No. I'm not going by these asian sites at all either. I was simply playing messenger boy and stating my opinion on the given information. I'm not excited about Info just because it's about AMD/ATI..:eek: I think anybody that sees possible info about a 512bit bus and 1gb of memory will get excited?
A 512-bit bus and 1GB of memory don't mean jack when the chip is going to be shader bottlenecked compared to the competition, because the G80 is very rarely bandwidth bottlenecked.
SpiffMistroII
11-15-06, 08:00 PM
A 512-bit bus and 1GB of memory don't mean jack when the chip is going to be shader bottlenecked compared to the competition, because the G80 is very rarely bandwidth bottlenecked.
ah but that's only if you believe it has 64 ALU's. Again 64 sounds reasonable but not as much as 96 decoupled with 24 ROP's IMO!
NoWayDude
11-15-06, 08:08 PM
ah but that's only if you believe it has 64 ALU's. Again 64 sounds reasonable but not as much as 96 decoupled with 24 ROP's IMO!
Ok, so what is it? 64 or 96? You have been through all the numbers possible.
I'm all up for info and discussion on unreleased HW, but you are contradicting yourself more often than not.
And yes, we read B3D also. That info has been discussed and the consensus is that it is probably wrong. Not the numbers, but what they may be doing with the units.
ah but that's only if you believe it has 64 ALU's. Again 64 sounds reasonable but not as much as 96 decoupled with 24 ROP's IMO!
Have you checked die sizes out lately? 96 vec4 units like those of the R580, with MADD instructions, would be huge even at 80nm. It probably wouldn't be clocked more than 600, let alone 750, with a power usage of 250 watts. ATI doesn't use domain clocks like nV, and I don't see them using them for the R600 either; so far there are no hints to that effect at all. And the 96 ALUs would explain the rumors of the higher power usage even at 600, not to mention its performance would be on par with the G80, so the performance deltas from the extra bandwidth won't show much of anything other than a 10% difference at most, if even that.
NoWayDude
11-15-06, 08:33 PM
And we are still to find out what MUL is doing on G80 also....
Rys
Well, I've not been too busy that I didn't run a few tests with 97.02. MUL rate** is ~92% of peak now, not using the thing for anything other than basic shading, up about 5% from 96.94. Others are up a wee bit too, in terms of theoretical throughputs, but nothing massive (and mostly near peak as before).
Generally, I think the guts of their upcoming compiler/assembler work will be to optimise certain mixes in terms of throughput, remove some bottlenecks, stop bubbling, etc, since the basics seem to work fine. Profiling some shipping game shaders is next on my list anyway, even if NVIDIA aren't bothered
But yes, Uttar is hinting that I'm off on holiday next week. Hope you all enjoy the madness while I lie on a beach in Portugal
** this is with a short shader and it going up over a driver revision, the hardware gets more efficient generally as shader program lengths go up, too.
http://www.beyond3d.com/forum/showthread.php?t=34676&page=20
SpiffMistroII
11-15-06, 08:34 PM
Ok, so what is it? 64 or 96? You have been through all the numbers possible.
I'm all up for info and discussion on unreleased HW, but you are contradicting yourself more often than not.
And yes, we read B3D also. That info has been discussed and the consensus is that it is probably wrong. Not the numbers, but what they may be doing with the units.
I never said 64 for sure!!! I said it's a possibility!!! And a sour one when compared to 96.
So it's not a contradiction at all.
SpiffMistroII
11-15-06, 09:01 PM
I am extremely disappointed in the dynamic branching on the G80 (GeForce 8800 GTX). The marketing had hyped "improved dynamic branching". Yes, it is a bit improved. But it's nowhere near the performance offered on any of ATI's recent cards. The X1950 XTX performs a dynamic branch in about 0.5 ms. The X1800XL performs a dynamic branch in about 0.95 ms. The GeForce 8800 GTX (a.k.a. G80) takes a whopping 11 ms to do a dynamic branch! It's getting beat by an order of magnitude by a card released 1 year ago!
This has grave consequences for nVidia's line of cards for any complex shaders, including the entire field of GPGPU. nVidia's best card can do 90 branches per second for each pixel. ATI's best card can do 2000 branches per second for each pixel. The G80 isn't in the same class here. It's not even in the same county.
graphics.stanford.edu/projects/gpubench/results/X1900XTX-5534/earlyz.pdf
graphics.stanford.edu/projects/gpubench/results/8800GTX-0003/earlyz.pdf
graphics.stanford.edu/projects/gpubench/results/8800GTX-0003/ps30.pdf
graphics.stanford.edu/projects/gpubench/results/X1800XT-5340/earlyz.pdf
-Raystonn
:o
:o
Sorry dude, but that is just incorrect, probably a bug. Check ShaderMark's shaders with branching performance; it's quite a bit higher. Plus, I've tested it out with shaders that I have written also. Actually the G80's branching performance is just a bit faster than the R580's right now, but it should get faster as more of the functions are unlocked or fixed through drivers.
Also where was this posted?
http://www.digit-life.com/articles2/video/g80-part2.html
That's another proof of the evident fact - G80 architecture is the architecture of the future. The harder a task, the more flexible shaders, the better this chip performs, breaking further away from competitors of the previous generation. As usual, branching is a weak spot of ATI's vertex unit. Let's hope that R600 will be a truly unified chip in this respect and the situation will change. As in case of G7X, G80 prefers dynamic branches to static ones.
Conclusions on geometry tests: G80 is an evident leader. Burdened with no SLI overheads and capable of directing all its 128 ALUs (operating at doubled frequency) to solve geometric tasks, this chip demonstrates excellent flexibility of the unified architecture and excellent capacities for working with complex dynamic code of vertex shaders. More that two-fold advantage - bravo! Let's see what awaits us in real applications. And we are looking forward to the release of DX10 that will help reveal full potential of this chip.
Aha, here is food for thought. Firstly, the unified architecture of G80 DOES NOT depend on precision of calculations and storage of intermediate results. At last you don't have to save on quality - you can always use 32-bit floating point calculations that guarantee excellent results without any rounding artifacts. Like in case with ATI, the results are absolutely identical for any precision. Besides, GX2 slightly outperforms G80 in texturing-dependent Water test (48 versus 32 units and the total of 512-bit buses in SLI mode versus 384), we can even speak of parity. But G80 takes up the lead in a more computation-intensive lighting test. Excellent computational capacity, ALU, ALU, and again ALU :-).
As we can see, G80 is always victorious (it's especially good at Frozen-Glass). GX2 noticeably lags behind due to SLI overheads and less flexible architecture. G80 performance does not depend on precision again. Now the same tests modified for texture sampling:
G80's advantage is less pronounced here, including absolute results. This chip certainly likes computations more than texturing. 32 texture units are necessary here. If the bus could be wider, there might have been 48 of them. But now GX2 leads in some tests. Too much depends on context and developers' preferences here. In order to reveal full potential of the G80, they will evidently have to choose (create) computation-intensive variants of their algorithms - in this case G80 will be able to gain 50% of performance.
And now the most flexible test - PS3. The test contains intensive dynamic branches in pixel shaders:
We have no doubts as to what architecture is the most advanced now and the best at working with dynamic branches in pixel shaders. It's G80. The second place is taken by unified RADEON, followed by GX2 (even SLI is of little help here for the old non-unified architecture).
Conclusions: Out of doubt, G80 is a new powerful computational architecture, well suited for executing the most complex pixel shaders. The more complex a task, the more computations it has, the larger is the gap between G80 and its competitors. In some cases programmers can get noticeable performance gains by optimizing their algorithms for computations instead of texture sampling. We can predict that there are some games, where the chip can gain advantage due to its 48 texture units and 512-bit memory bus. But the company makes a compromise here - it chooses flexibility and computational capacities for future applications.
G80 is the model platform for shaders with dynamic branching. We'll see what DX10 will bring us. We'll also see how it will change the layout of forces in real applications, especially in modern and outdated ones.
The last quote was for a shader with dynamic branching and it shows where the g70's fail compared to the x1950xtx and the g80 outperforms all the other cards.
Moolicious
11-15-06, 09:44 PM
Also where was this posted?
http://rage3d.com/board/showthread.php?t=33873447
NoWayDude
11-16-06, 07:56 AM
Am I missing something in here?
http://www.gpgpu.org/sc2006/slides/10.houston-understanding.pdf
Why was this discussed at B3D and no one found any problems with branching at those 1st results?
NoWayDude
11-16-06, 07:59 AM
:o
There are so many things wrong in your post, basic things, that I have trouble believing you've written any code at all, and that this appears to be a troll attempt.
First of all, the X1950XTX does not perform a branch in 500 microseconds (0.5 ms). Are you insane? This chip runs at 650Mhz, and 500 microseconds would mean 650 x 10^6 * 0.5 * 10^-3 = 325 * 10^3 = 325,000 cycles latency! What you've done is look at GPUBench scores and fail to understand what was being plotted.
Secondly, the comparisons you link to are not "dynamic branching" tests, they are tests of z-cull functionality. Even the GeForce3 has had this, well before DirectX9. It is no more a test of shader branching functionality than early stencil reject, or alpha-kill.
Third, traversing a BSP with dynamic branching is testing not just DB performance, but gather operations as well. There are a gazillion variables to consider in any BSP traversal technique, so unless you are prepared to post sample code that reproduces the problem, or at least explain in pseudo-code detail the algorithm, data layout, et al you are using, the claims are kinda meaningless.
Fourth, anyone doing pointer-chasing algorithms would do well to sign up to the CUDA program, as CUDA claims to expose a linear on-chip local storage model with a C programming model that allows gather/scatter "pointer chasing" style code to run a lot faster, as well as offering inter-thread communication and synchronization.
Maybe if Mike Houston claimed that G80 DB performance was 20x worse than an R580, people might take it more seriously, but you've made a post where you misinterpreted GPU bench figures, and then claimed you have some private benchmark test, without providing any details.
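The cycle-count sanity check from the first point of the quoted post is easy to reproduce. A small Python sketch, using only the 650MHz clock and 0.5 ms figure from the posts above; the function name is made up for illustration. MHz times microseconds gives cycles directly, which keeps the arithmetic in exact integers.

```python
def latency_in_cycles(clock_mhz: int, latency_us: int) -> int:
    """Clock cycles elapsed: (1e6 cycles/s per MHz) x (1e-6 s per us) cancels out."""
    return clock_mhz * latency_us


# X1950XTX at 650 MHz, claimed "branch time" of 0.5 ms = 500 us.
cycles = latency_in_cycles(650, 500)
print(f"{cycles:,} cycles")
```

A per-branch latency of hundreds of thousands of cycles is not plausible, which is why the quoted post reads the 0.5 ms figure as a misinterpreted benchmark total rather than the cost of one branch.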
Someone got owned badly didn't he?
http://www.rage3d.com/board/showthread.php?t=33873447&page=2