ATI Redwood 300 mm²?


josiahsuarez
08-05-09, 12:42 AM
http://itbbs.pconline.com.cn/diy/10432745.html
----- AMD RV8XX gossip on some of the actual situation

The full Evergreen range will be formally launched on September 17 in San Francisco.
AMD's DX11 graphics chips are split into four tiers: high-end, performance, mainstream, and entry. The high-end part is code-named Cypress, the performance part is
code-named Redwood, the two mainstream parts are code-named Juniper and Cedar, and the entry part is code-named Hemlock.

And flagship product Cypress uses the MCM design, is composed of 2 Redwood, Redwood chip size 300 square millimeters,
Juniper chip size of 181 mm2, Cedar die size 120 mm2. Currently, AMD's DX11 performance and mainstream products are proceeding smoothly, but the flagship Cypress has run into a small problem with chip packaging, so Cypress
may be delayed until after the DX11 performance and mainstream parts reach the market.

=O

Toss3
08-05-09, 12:03 PM
Hmm, that can't be right, as the Cypress card (5870x2) is supposed to be built out of two Junipers (rv870). :|

josiahsuarez
08-05-09, 01:21 PM
yeah, the code names are mixed up but the die sizes are apparently correct. at least, the guys at B3D seem to think so. this is what they're saying in this thread (http://forum.beyond3d.com/showthread.php?t=49120&page=57):

Hemlock(R800, this is the x2 card?), Cypress(RV870/300mm²), Cedar(RV840/~225mm²), Juniper(RV830/180mm²) and Redwood(RV810/120mm²)

Toss3
08-05-09, 05:39 PM
Guru3d must have gotten it wrong then.. (http://www.guru3d.com/news/ati-directx-11-family-codenames/):o

pakotlar
08-06-09, 03:01 AM
If it's 300mm^2, assuming the 40nm process is giving us ~50% density improvements for major features compared with 55nm (obviously some things scale much better than others), and adding DX11 doesn't require a significant investment over DX10.1, then I'm expecting something special from this mid-range part. That's bigger than rv770 by ~44mm^2 on a significantly smaller process, although with an increased feature set. I wonder what the extra logic cost is for adding DX11 support. ATI seems to have some of the major features already implemented to a degree (the tessellator, for example), so those won't require a complete addition, and nothing crazy like mandatory double-precision ALUs is going on...
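
As a rough sanity check on that reasoning, here is a minimal Python sketch of the area arithmetic. The function names are made up for illustration, the 256mm^2 figure for rv770 is simply 300 minus the ~44mm^2 quoted above, and the ideal-shrink case assumes area scales with the square of the node size, which real chips only approximate.

```python
def ideal_density_gain(old_nm, new_nm):
    """Density gain if every feature shrank perfectly with the node (an idealization)."""
    return (old_nm / new_nm) ** 2


def equivalent_area_55nm(area_40nm_mm2, density_gain):
    """Area the same logic would occupy on 55nm, given a 55nm -> 40nm density gain."""
    return area_40nm_mm2 * density_gain


rumored_area_mm2 = 300.0                    # rumored die size from the thread
rv770_area_mm2 = rumored_area_mm2 - 44.0    # implied by "bigger than rv770 by ~44mm^2"

scenarios = [
    ("~50% density improvement (the post's assumption)", 1.5),
    ("ideal 55nm -> 40nm shrink (assumption)", ideal_density_gain(55, 40)),
]

for label, gain in scenarios:
    equivalent = equivalent_area_55nm(rumored_area_mm2, gain)
    ratio = equivalent / rv770_area_mm2
    print(f"{label}: ~{equivalent:.0f}mm^2 at 55nm, {ratio:.2f}x rv770")
```

Under the ~50% assumption the rumored die would be worth about 450mm^2 of 55nm silicon, and the answer moves a lot depending on how well the design really scales.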

I've heard over on B3D that the hardware arbiter is significantly more complex compared to the rv770. What could be causing the increase in complexity? Is this indicative that ATI is moving away from VLIW? I mean, what could it need a hugely beefed-up arbiter for unless it's moving towards finer-grained ALU handling?

Heinz68
08-07-09, 12:11 PM
@josiahsuarez
I checked your source link (http://itbbs.pconline.com.cn/diy/10432745.html) and even with my very limited Chinese language :) I noticed there was something missing there, mainly this bold text in your post.
And flagship product Cypress uses the MCM design, is composed of 2 Redwood, Redwood chip size 300 square millimeters,
Looks like the source was edited, Google translate (http://translate.google.ca/translate?u=http%3A%2F%2Fitbbs.pconline.com.cn%2Fdiy%2F10432745.html&sl=zh-CN&tl=en&hl=en&ie=UTF-8).

There are a few rumors about this MCM design but nothing confirmed. If it is an MCM design that would be great, plus I'd also like to see shared memory; both of those should make production cheaper.

EDIT
Never mind, it looks like it just didn't show up in the translation.

pakotlar
08-07-09, 06:12 PM
Just to elaborate, based on the Siggraph 2009 papers, ATI's "cores" have 16 ALUs, and each "ALU" has 5 computing engines (each of which can handle, I'm not sure, but say 1 MAD op per cycle). I say ALU in quotes because nvidia and ATI call all of their processing bits different things. But in the GT200, you have 30 cores, 8 "ALUs" each, for a total of 240 "ALUs". Each ALU contains 2 computing engines, but they are really different from ATI's, in that they do not contain the same functionality. One is MAD+MUL, the other is just MUL. However, utilization of the MUL unit should be much better than on G80, which, I believe, used the MUL unit to texture as well (or had it so that if the associated texture unit was working you couldn't dual issue).
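
The counts above can be tallied with a small Python sketch. It uses only the figures given in this post; the CoreLayout class and its field names are hypothetical, and the ATI entry describes a single core because the number of cores isn't stated here.

```python
from dataclasses import dataclass


@dataclass
class CoreLayout:
    """Hypothetical container for the per-core figures quoted in the post."""
    name: str
    cores: int
    alus_per_core: int
    engines_per_alu: int

    def total_alus(self):
        return self.cores * self.alus_per_core

    def engines_per_core(self):
        return self.alus_per_core * self.engines_per_alu


# Figures as stated in the post; the ATI core count is not given, so use one core.
gt200 = CoreLayout("GT200", cores=30, alus_per_core=8, engines_per_alu=2)
ati_core = CoreLayout("ATI (one core)", cores=1, alus_per_core=16, engines_per_alu=5)

for chip in (gt200, ati_core):
    print(chip.name, "ALUs:", chip.total_alus(),
          "engines per core:", chip.engines_per_core())
```

The per-core figures come out to 16 engines for GT200 against 80 for one ATI core, which is why so much of the scheduling burden lands on ATI's compiler.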

The details about dual issuing with regards to G80 are probably not completely accurate.

The point is that the compiler on nVidia GPUs doesn't have much to worry about. The instructions get sent to the cores, and hardware decides how to schedule the instructions between the ALUs (which each have the ability to cache 16 words).

ATI's design, on the other hand, is VLIW (very long instruction word): it caches many more words for each "ALU", and the compiler needs to spend a lot of time deciding how the instructions should be organized so you can have penta-issuing with each "ALU" (since each "ALU" has 5 processing units, each capable of supporting, say, 1 MAD instruction). So in effect nvidia processes commands with a "depth" of 2, while ATI uses one of 5.
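
To make the compiler's job concrete, here is a toy Python sketch of packing instructions into fixed-width bundles. It is a simplified greedy packer, not ATI's actual compiler; the op names, the dependency tables, and the functions pack_bundles and utilization are all hypothetical.

```python
def pack_bundles(deps, width):
    """Greedily pack ops into bundles of at most `width` independent ops.

    deps maps each op to the set of ops it depends on. An op may only be
    issued once everything it depends on finished in an earlier bundle.
    """
    done = set()
    remaining = list(deps)  # dict order stands in for program order
    bundles = []
    while remaining:
        ready = [op for op in remaining if deps[op] <= done]
        bundle = ready[:width]
        if not bundle:
            raise ValueError("dependency cycle in input")
        bundles.append(bundle)
        done.update(bundle)
        remaining = [op for op in remaining if op not in bundle]
    return bundles


def utilization(bundles, width):
    """Fraction of issue slots that actually hold an op."""
    return sum(len(b) for b in bundles) / (len(bundles) * width)


# Four independent multiplies followed by a small tree of adds.
dot4 = {
    "m0": set(), "m1": set(), "m2": set(), "m3": set(),
    "a0": {"m0", "m1"}, "a1": {"m2", "m3"}, "a2": {"a0", "a1"},
}
# A fully serial chain where each op needs the previous result.
chain = {"c0": set(), "c1": {"c0"}, "c2": {"c1"}, "c3": {"c2"}}

for name, deps in (("dot4", dot4), ("chain", chain)):
    for width in (5, 1):
        bundles = pack_bundles(deps, width)
        print(name, "width", width, "bundles", len(bundles),
              "utilization", round(utilization(bundles, width), 2))
```

In this toy model the 5-wide machine finishes the dot4 example in 3 bundles but leaves most slots empty, and on the serial chain it fills only one slot in five, while the 1-wide case always keeps its single slot busy.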

Maybe based on the workload they've seen, they've decided that a more scalar architecture, with less "depth" is better.

josiahsuarez
08-08-09, 05:44 AM
yeah, that confused me for the longest time. I'm guessing it's just a weird quirk of google translate? if you put in part of the source "Cypress采用MCM设计,由2颗Redwood组成,Redwood芯片尺寸300平方毫米," you get "MCM designs using Cypress, Redwood by two components, Redwood chip size of 300 mm2," but if you put in the whole line "其中旗舰产品Cypress采用MCM设计,由2颗Redwood组成,Redwood芯片尺寸300平方 毫米," it doesn't return any result at all! very strange ?_?

pakotlar
08-08-09, 07:05 PM
Yeah, Redwood looks to replace the 4890, with a 300mm2 die, which is huge considering the 4890 is like 250 on 55nm and 40nm is a 50% size reduction of features, not including memory cells. Although dx11 is an unknown quantity, I still believe the changes needed for dx11 aren't significant enough to account for a large amount of the difference. It's not like they need to implement a transition like dx8 to dx9, or pixel/vertex to unified for dx10. It should be the smallest iteration yet from a hw standpoint. This is all opined by an uninformed man. But I expect great things from ATI.

My guess is we won't be seeing a VLIW, 5-instruction-deep ALU design. I think ATI is moving on; a more complex arbiter, above and beyond what's needed for the on-paper increase in ALU count, is where this is going. I bet ATI has slimmed down its ALUs further by removing the VLIW bits, making them thin but with tons of them, and beefed up its arbiter to near nvidia proportions, but with some extra spice to make it more efficient.

Oh yeah, and shared memory and super efficient SFR approach, but seamless to the software. Great reduction in microstuttering to practically single-gpu standards. You'll fart rainbows upon booting it up.

pakotlar
08-08-09, 07:13 PM
The asians have long been a mysterious people to wide eyes.