|
|
#109 | |
|
Registered User
Join Date: Jul 2003
Posts: 1,526
|
Quote:
My main point, though, is that AMD achieves this double-precision maximum (1 teraflop) with a GPU using "only" 4.3 billion transistors, though it does run at fairly high clocks and can overclock another 200 MHz. I'm focusing on the double-precision figure since it's the one used in mission-critical and scientific computing environments where the results have to be as precise as possible, and I do keep in mind these are theoretical maximum figures that are not likely to be achievable in practical terms.

Now on Nvidia's side, the current Fermi does a little over half as much in double-precision floating point and already uses 3+ billion transistors. So for AMD to be pretty much twice as fast in that department with "only" an extra billion transistors, and still be 50% faster than their previous high-end HD 6970 in gaming, is quite an achievement to say the least, and Cayman and Tahiti are only running with a 75 MHz difference in clock speeds (850 vs 925 MHz). Like it or not, it's one efficient chip that makes every transistor spent on its design count for something, while keeping the overall die size as small as possible.
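One way to put the transistor-efficiency argument into numbers is peak double-precision throughput per billion transistors. The sketch below uses figures quoted in this thread (about 1 teraflop and 4.3 billion transistors for Tahiti, and the roughly 650 gigaflops quoted later for Fermi). Treating Fermi's "3+ billion" as exactly 3.0 billion is an assumption, and the function name is purely illustrative. These are theoretical peaks, not measured results.

```python
def gflops_per_billion_transistors(peak_gflops, transistors_billion):
    """Peak throughput divided by transistor count (in billions)."""
    if transistors_billion <= 0:
        raise ValueError("transistor count must be positive")
    return peak_gflops / transistors_billion


# Figures quoted in the thread; 3.0 is an assumed floor for Fermi's "3+ billion".
tahiti = gflops_per_billion_transistors(1000.0, 4.3)
fermi = gflops_per_billion_transistors(650.0, 3.0)

print(f"Tahiti: {tahiti:.1f} GFLOPS per billion transistors")
print(f"Fermi:  {fermi:.1f} GFLOPS per billion transistors")
```

Changing the assumed Fermi transistor count or the quoted peak figures shifts the comparison, so the inputs matter more than the formula.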
|
|
|
|
|
|
#110 | |
|
Registered User
Join Date: Jul 2003
Posts: 1,526
|
Here's another interesting one, this time regarding the back end of the GPU, which affects gaming: namely its fillrate and texturing speed.

[Image: measured pixel fillrate chart]

What's wrong with this picture? Fermi has 50% more ROPs (48) than the HD 6970 and HD 7970 (both have only 32), and the GTX 580 and HD 7970 both have a 384-bit memory bus, yet in measured effective fillrate the HD 7970 beats it by 3.5 billion pixels per second, and by even more relative to the HD 6970, which also has 32 ROPs and runs 75 MHz slower than the HD 7970.

If the ROPs had the same capabilities on both Fermi and Tahiti, Fermi having 50% more of them would more than offset the clock speed difference relative to Tahiti. Both cards have a 384-bit memory bus, so that isn't it either, and the memory speed difference between the two cards alone isn't enough to explain it. It's as if Nvidia stuffed the Fermi GPU full of ROPs that aren't very efficient and don't get used much in practical, real-world terms. They seriously need to be reworked and enhanced in Kepler, independently of how many are used; it isn't just about shading power.

Now there's texturing, where they use FP16 textures, which aren't widely used in games, but the results are surprising:

[Image: FP16 texturing chart]

Being much faster than even a GTX 590 says everything, really: there are 128 texture units in a single Tahiti versus 128 texture units across both GPUs on the GTX 590. The same goes for tessellation performance, where it was a really strong point, and the HD 7970 is about 30% faster there.

Basically, whatever Kepler ends up being, it has to be improved or completely new in every area over the Tahiti chip, for both the gaming and the professional GPGPU markets, while still complying with the 300 watt PCIe power limit for a single GPU. Dual-GPU cards from both companies will blow through that limit like it wasn't even there, and that's before they're even overclocked...
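For reference, the on-paper comparison behind this argument can be sketched as units times clock. This is a rough model that assumes one pixel per ROP (or one texel per texture unit) per clock. The ROP counts, texture unit count, and AMD clocks are the ones quoted in this thread. The 772 MHz core clock for the GTX 580 is not stated in the thread and is an assumption, as is the helper name.

```python
def peak_rate_giga(units, core_mhz):
    """Theoretical peak in billions per second: one result per unit per clock."""
    return units * core_mhz / 1000.0


hd7970_pixels = peak_rate_giga(32, 925)   # 32 ROPs at 925 MHz (from the thread)
hd6970_pixels = peak_rate_giga(32, 850)   # 32 ROPs at 850 MHz (from the thread)
gtx580_pixels = peak_rate_giga(48, 772)   # 48 ROPs; 772 MHz is an assumed clock
hd7970_texels = peak_rate_giga(128, 925)  # 128 texture units (from the thread)

for name, value in [
    ("HD 7970 pixel fill", hd7970_pixels),
    ("HD 6970 pixel fill", hd6970_pixels),
    ("GTX 580 pixel fill", gtx580_pixels),
    ("HD 7970 texel fill", hd7970_texels),
]:
    print(f"{name}: {value:.1f} G/s")
```

Under these assumptions the card with more ROPs comes out ahead on paper, which is why a measured result that goes the other way points at something other than raw unit count, such as memory bandwidth or how the units are fed.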
|
|
|
|
|
|
#111 | |
|
Registered User
Join Date: Jan 2003
Posts: 132
|
Quote:
|
|
|
|
|
|
|
#112 | |
|
007
Join Date: Apr 2007
Location: You were sayin'
Posts: 290
|
^
Are you talking about FP16? Because they doubled that in GF110 compared to GF100.
http://www.anandtech.com/show/4008/n...orce-gtx-580/2
Edit: whoops, I see you were talking about pixel fillrate. But I still agree with shadow001: Nvidia needs more pixel fillrate and texture fillrate.
__________________
intel Q9450 @ 3.656Ghz [1.3875v, LLC off]| GA-X48-DS5 [Memory Enhance: Turbo]|MSi N570GTX TwinFrozrIII OC/PowerEdition|Kingston HyperX 4x2GB PC 8500 @ 1097Mhz [5-5-5-18, 2.25v]| Creative X-FI Pro [SB046A]| Tagan PipeRock 600w [48A] |
|
|
|
|
|
|
#113 | |
|
Registered User
Join Date: Jul 2003
Posts: 1,526
|
Quote:
The main point, though, is that the ROPs and texture units still need to be enhanced, not just to match the HD 7970 but to beat it, in order to have a faster card. Adding more of the same type found in Fermi takes up precious die space, where they need a fair chunk of room to also enhance its GPGPU ability in single- and double-precision floating-point math to be better than the HD 7970 in that area too.

Memory bandwidth-wise, given that both companies are limited to GDDR5 and it's close to its limit in terms of maximum clock speeds, the only way to do that is to add a 512-bit memory bus. That means adding 2 more memory controllers to the GPU die (8 in total, since each is usually 64 bits wide), which also takes up die space and needs more pins on the GPU package and a more complex PCB. Even then it gives only about a 33% improvement in memory bandwidth compared to the 384-bit bus on the HD 7970, with the GDDR5 memory running at the same clock speed on both cards...
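The bus-width arithmetic here can be sketched as follows. Peak bandwidth is bus width in bytes times the per-pin data rate, so at equal memory speed the relative gain depends only on the ratio of bus widths. The 64-bit controller width comes from the post. The 5.5 Gbps effective GDDR5 data rate is an assumed example value, and the function names are illustrative.

```python
def memory_controllers(bus_width_bits, controller_width_bits=64):
    """Number of memory controllers needed for a given bus width."""
    return bus_width_bits // controller_width_bits


def peak_bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak bandwidth in GB/s: bytes per transfer times per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps


DATA_RATE_GBPS = 5.5  # assumed effective GDDR5 data rate, same on both buses

bw_384 = peak_bandwidth_gb_s(384, DATA_RATE_GBPS)
bw_512 = peak_bandwidth_gb_s(512, DATA_RATE_GBPS)
gain = bw_512 / bw_384 - 1  # relative gain from the wider bus

print(f"384-bit: {memory_controllers(384)} controllers, {bw_384:.0f} GB/s")
print(f"512-bit: {memory_controllers(512)} controllers, {bw_512:.0f} GB/s")
print(f"Relative gain: {gain:.1%}")
```

Because the data rate cancels out of the ratio, the assumed 5.5 Gbps only affects the absolute numbers, not the relative gain.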
|
|
|
|
|
|
#114 | |
|
Registered User
Join Date: Jan 2003
Posts: 132
|
No offence, but it is fairly obvious you don't have a very clear technical understanding of the matter. Please answer the following questions:
Exactly how much die space do the ROPs take on Fermi? What percentage of Fermi's total area is that?
How much die space do the ROPs take on Tahiti, and what percentage of the total area?
Exactly how should the ROPs and texture units be "enhanced"?
Obviously, more pixel and texture fill would be great, but there is no evidence I can see that what Nvidia has now is inefficient.
|
|
|
|
|
|
#115 | |
|
Registered User
Join Date: Jul 2003
Posts: 1,526
|
Quote:
Inefficient is a relative term, depending on what the competition has and what it can do within a given die size. Keep in mind that Tahiti is only a 365 mm² die at 28 nm, and it uses only 32 ROPs, like the previous-generation Cayman in the HD 6970. The two GPUs aren't that far apart in clock speed (75 MHz), yet Tahiti beats it by a mile on fillrate and texturing in those charts, so it's obvious that AMD made a lot of improvements to the back end of Tahiti and it wasn't just the shaders.

Fermi is a 530 mm² die at 40 nm, as we all know, and simply shrinking that core down to 28 nm still yields a core of about 371 mm² without adding anything new to it, making it roughly the same size as Tahiti on the HD 7970. Would a straight Fermi shrink to 28 nm, clocked at the same speed as the Tahiti core in the HD 7970 and using the same type of memory at the same speed on the same 384-bit bus, be enough to beat it in raw fillrate and texturing speed? The short answer is no, going by what those charts suggest, especially the texturing one (ouch).

The single-precision throughput of a Tahiti core is 3.7 teraflops and double precision is just about 1 teraflop even, while Fermi does 1.56 teraflops single precision and 650 gigaflops double precision. So simply shrinking the core to 28 nm and clocking it another 200 MHz higher isn't enough to match Tahiti, never mind offer even more performance, which it has to.

Enhancing in this case simply comes down to doing more work per clock cycle, and Fermi needs that in every major area that affects both gaming performance and professional application performance. So whatever Kepler ends up being, it basically has to be something new from the ground up, not just an enhanced Fermi...
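The two back-of-envelope estimates in this post can be sketched as below. The 371 mm² figure corresponds to scaling the die by the ratio of node sizes (28/40). Ideal area scaling would square that ratio and give a smaller number, though real designs generally fall short of the ideal. Peak FLOPS is assumed to scale linearly with clock. The 772 MHz base clock for Fermi is an assumption not stated in the thread, and the helper names are illustrative.

```python
def scaled_die_area_mm2(area_mm2, old_node_nm, new_node_nm, exponent):
    """Scale die area by (new/old)**exponent.

    exponent=1 is the linear estimate; exponent=2 is ideal area scaling.
    """
    return area_mm2 * (new_node_nm / old_node_nm) ** exponent


def scaled_peak_tflops(tflops, base_mhz, extra_mhz):
    """Peak throughput after a clock bump, assuming linear scaling with clock."""
    return tflops * (base_mhz + extra_mhz) / base_mhz


linear_area = scaled_die_area_mm2(530, 40, 28, exponent=1)
ideal_area = scaled_die_area_mm2(530, 40, 28, exponent=2)

BASE_MHZ = 772  # assumed Fermi core clock, not given in the thread
fermi_sp = scaled_peak_tflops(1.56, BASE_MHZ, 200)  # single precision
fermi_dp = scaled_peak_tflops(0.65, BASE_MHZ, 200)  # double precision

print(f"Linear shrink estimate: {linear_area:.0f} mm^2")
print(f"Ideal area estimate:    {ideal_area:.0f} mm^2")
print(f"Fermi +200 MHz: {fermi_sp:.2f} TFLOPS SP, {fermi_dp:.2f} TFLOPS DP")
```

Under these assumptions the clock bump alone leaves both Fermi figures below the 3.7 and 1 teraflop numbers quoted for Tahiti, which is the point the post is making.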
|
|
|
|
|
|
#116 | |
|
Registered User
Join Date: Jan 2003
Posts: 132
|
And yet... you didn't answer my questions. Quite frankly, it is because you can't.
The problem with Fermi relative to GCN is not texturing efficiency or ROP efficiency; it is compute unit (shader) efficiency, but I digress...
Quote:
|
|
|
|
|
|
|
#117 | |
|
Registered User
Join Date: Jul 2003
Posts: 1,526
|
Quote:
We've only seen it in gaming, not computing performance, though. And of course I can't answer how much die space each unit takes up, since only the engineers who designed it would know such details and their capabilities. Here's a picture of Fermi and just one shader block:

[Image: Fermi shader block diagram]

Look at the texture units in blue: as Nvidia adds more shader blocks, they also add more texture units since they're built in, and the same goes for the tessellation hardware while I'm at it. I wouldn't say it's the compute units exclusively, as the ROPs and texture units are decoupled from the shader block, which is a practice AMD has followed for a while, unlike Nvidia, where each shader block you add automatically adds more texture units. Here's one compute unit on Tahiti:

[Image: Tahiti compute unit diagram]

Here's the entire thing; the texture units are separate from the shader blocks, and so are major components of the graphics portion, such as tessellation:

[Image: Tahiti full-chip block diagram]
|
|
|
|
|
|
#118 | |
|
Registered User
Join Date: Jan 2003
Posts: 132
|
I'm not the one who needs pictures. Either answer the questions or admit you have no actual evidence for your claims.
Exactly how much die space do the ROPs take on Fermi? What percentage of Fermi's total area is that?
How much die space do the ROPs take on Tahiti, and what percentage of the total area?
Exactly how should the ROPs and texture units be "enhanced"?
Quote:
|
|
|
|
|
|
|
#119 |
|
Join Date: Apr 2009
Location: EU
Posts: 1,041
|
|
|
|
|
|
|
#120 | |
|
Join Date: Jul 2003
Posts: 1,719
|
Quote:
If it performs like a GTX 580 or slightly better, has 2GB of VRAM, and an MSRP of $299, NVIDIA would sell TONS of them and totally devalue 7970s and 7950s. It would be a pretty amazing turn of events if true, and great news for everyone except ATi. But I can't remember the last time Charlie was right, so I'll take it with a 40# bag of softener salt.
__________________
Rig 1: intel 990X + 2 X EVGA 3GB GTX580 + 3 X Acer GD235Hz 3D Vision Surround
Rig 2: intel 2500K + NVIDIA GTX590 + Dell 3007 WFPHC
NVIDIA Focus Group Member
NVIDIA Focus Group Members receive free software and/or hardware from NVIDIA from time to time to facilitate the evaluation of NVIDIA products. However, the opinions expressed are solely those of the Members.
|
|
|
|
|
|