01-26-12, 02:29 PM   #163
shadow001
Registered User
 
Join Date: Jul 2003
Posts: 1,526
Re: next gen kepler to support dx 11.1, also take a year to rollout all cards

Quote:
Originally Posted by ninelven
My calculations are correct (or my degree in mathematics is failing me). Buffering does not affect ROP throughput but it does add additional bandwidth and memory space overhead.

Even then, there is more than enough fillrate for 100 fps+.


The 3 most likely culprits in performance are 1) Bandwidth, 2) being shader bound, 3) being tessellation bound.


It is very likely that the 4XAA is causing you to exceed the available local memory of your card, which then spills over to system ram causing a nosedive in performance.

Sweet, pop one in, downclock to 776/4008 and run 3dmark vantage.

Just in case someone reading this thread might be genuinely curious I looked at the numbers a little more closely:

HD6970: 8780 MP/sec with 176 GB/sec bandwidth
GTX580: 9750 MP/sec with 192.4 GB/sec bandwidth
HD7970: 13300 MP/sec with 264 GB/sec bandwidth

While it is impossible to tell how efficient the ROPs themselves are from this data, we may investigate how efficiently each chip uses its available bandwidth in this bandwidth limited test.

HD6970: 8780/176 = 49.89 MP per GB
GTX580: 9750/192.4 = 50.68 MP per GB
HD7970: 13300/264 = 50.38 MP per GB

Thus, we see that the GTX580 is actually the most efficient with the bandwidth available to it, while the 7970 and 6970 are not very far behind. In fact, I would say the numbers are close enough together that for all practical purposes the chips are equally efficient.
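For anyone who wants to re-run the division, here is a minimal Python sketch using only the three measured results and bandwidth figures listed above. The dictionary and function names are made up for illustration, and the "spread" figure is just one rough way to put a number on "close enough together".

```python
# Bandwidth-limited fill test results quoted above: (MP/sec, GB/sec).
results = {
    "HD6970": (8780, 176.0),
    "GTX580": (9750, 192.4),
    "HD7970": (13300, 264.0),
}


def mp_per_gb(fill_mp_s, bandwidth_gb_s):
    """Pixels written per GB of memory traffic (the per-second terms cancel)."""
    return fill_mp_s / bandwidth_gb_s


efficiency = {card: mp_per_gb(*vals) for card, vals in results.items()}

for card, eff in efficiency.items():
    print(f"{card}: {eff:.2f} MP per GB")

# Relative gap between the best and worst chip, as a fraction of the worst.
spread = (max(efficiency.values()) - min(efficiency.values())) / min(efficiency.values())
print(f"best-to-worst spread: {spread:.1%}")
```

The spread comes out under 2%, which is consistent with treating the three chips as equally efficient in this test.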

Now, we might ask ourselves, how much bandwidth do these chips actually need to take full advantage of their ROPs (so that bandwidth is no longer the bottleneck and the ROPs are)? Given the above data, this is not too difficult to calculate.

HD6970: 28,160 MP/sec of fillrate / 49.89 MP/GB = 564.44 GB/sec of bandwidth required
GTX580: 24,832 MP/sec of fillrate / 50.68 MP/GB = 489.98 GB/sec of bandwidth required
HD7970: 29,600 MP/sec of fillrate / 50.38 MP/GB = 587.53 GB/sec of bandwidth required

That is how much bandwidth each chip would need to score the max its ROPs are capable of.
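The same estimate can be sketched in a few lines of Python. This assumes, as the calculation above does, that pixels per GB stays constant as bandwidth scales up. The names are illustrative only. Because the sketch carries the unrounded ratio through, its results differ from the figures above by a few hundredths of a GB/sec.

```python
# (theoretical peak fill MP/sec, measured fill MP/sec, bandwidth GB/sec)
cards = {
    "HD6970": (28160, 8780, 176.0),
    "GTX580": (24832, 9750, 192.4),
    "HD7970": (29600, 13300, 264.0),
}


def required_bandwidth(peak_fill, measured_fill, bandwidth):
    """Bandwidth (GB/sec) needed before the ROPs become the limit.

    Assumes the measured pixels-per-GB ratio holds at higher bandwidth.
    """
    pixels_per_gb = measured_fill / bandwidth
    return peak_fill / pixels_per_gb


required = {card: required_bandwidth(*vals) for card, vals in cards.items()}

for card, need in required.items():
    have = cards[card][2]
    print(f"{card}: needs {need:.1f} GB/sec, has {have:.1f} ({need / have:.1f}x short)")
```

By this estimate each chip would need roughly 2x to 3x its actual bandwidth to saturate its ROPs.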

Here, we may ask, "if bandwidth is so great, then why didn't they design the above chips with all the bandwidth they needed to never be bottlenecked?" The answer is that additional memory channels are expensive both in terms of die space and board design. Additionally, the closer you get to the "ideal" bandwidth for the chip, the less additional bandwidth pays off, because the chip is bandwidth limited less and less often. As an example, the HD7970 would still "only" have 528 GB/sec with a 768-bit memory interface (assuming 5.5 Gbps memory).
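The 528 GB/sec figure follows from the usual bus-width arithmetic: bytes per transfer times the per-pin data rate. A small Python sketch, where the function name is hypothetical and the 384-bit width of the actual HD7970 is the card's published spec rather than something stated in this post:

```python
def memory_bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/sec: (bus width in bytes) x (Gbps per pin)."""
    return bus_width_bits / 8 * data_rate_gbps


# HD7970 as shipped: 384-bit bus (published spec, not from the post), 5.5 Gbps.
actual = memory_bandwidth_gb_s(384, 5.5)

# The hypothetical doubled interface from the example above.
doubled = memory_bandwidth_gb_s(768, 5.5)

print(f"384-bit @ 5.5 Gbps: {actual:.0f} GB/sec")
print(f"768-bit @ 5.5 Gbps: {doubled:.0f} GB/sec")
```

Even the doubled bus falls short of the roughly 588 GB/sec estimated above, which is the point of the example.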

By this point you may be thinking, "well that sucks!" Yeah, pretty much. To see just how big of a potential issue bandwidth is, you may want to read the following article: http://research.nvidia.com/sites/def...Micro_2011.pdf

*If you don't trust Nvidia engineers, let's ask some AMD ones by comparing the 6970 and 7970.

HD6970 vs 7970
Fillrate: +5%
Bandwidth: +50%
Texture Fill: +40%
FLOPs: +40%

So bandwidth got the single largest increase of anything in 7970 from 6970, and a 10x larger increase than pixel fill. I'm going to wager the AMD engineers had pretty good reasons for their design choices in this regard.
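The fillrate and bandwidth percentages can be checked against the raw figures quoted earlier in this post. A quick sketch (variable names are illustrative; texture fill and FLOPs are left out because their raw numbers are not given here):

```python
# Raw figures from earlier in the post: (HD6970, HD7970).
pixel_fill_mp_s = (28160, 29600)
bandwidth_gb_s = (176.0, 264.0)


def gain(old, new):
    """Fractional increase going from old to new."""
    return new / old - 1


fill_gain = gain(*pixel_fill_mp_s)
bw_gain = gain(*bandwidth_gb_s)

print(f"Fillrate:  +{fill_gain:.1%}")
print(f"Bandwidth: +{bw_gain:.1%}")
print(f"Bandwidth gain is about {bw_gain / fill_gain:.1f}x the fillrate gain")
```

With unrounded inputs the ratio lands just under 10x, which matches the rounded "+5% vs +50%" comparison.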


I don't doubt your calculations at all, and your explanation for the case I mentioned in the Heaven demo, where it spills over to system memory at 4X AA and performance drops off a cliff, is very likely correct too. But it was also only running between 20~30 FPS at 2X AA, which is demanding enough as it is, so it's fair to assume that even if there were enough memory on the card, it would likely be slower still at 4X AA.


Lo and behold, here comes AMD with a card packing twice as much memory and more memory bandwidth than a base GTX580, and selling for $50 less than the 3GB GTX580s, which weren't available when I bought my GTX580s anyhow, not to mention more performance overall in every aspect and being released much earlier than its intended competition....SOLD!!!....