My calculations are correct (or my degree in mathematics is failing me). Buffering does not affect ROP throughput,
but it does add bandwidth and memory space overhead.
Even then, there is more than enough fillrate for 100 fps+.
The 3 most likely performance culprits are 1) being bandwidth bound, 2) being shader bound, 3) being tessellation bound.
It is very likely that the 4xAA is causing you to exceed the available local memory of your card, which then spills over into system RAM and causes a nosedive in performance.
Sweet, pop one in, downclock to 776/4008 and run 3DMark Vantage.
Just in case someone reading this thread is genuinely curious, I looked at the numbers a little more closely:
HD6970: 8780 MP/sec with 176 GB/sec bandwidth
GTX580: 9750 MP/sec with 192.4 GB/sec bandwidth
HD7970: 13300 MP/sec with 264 GB/sec bandwidth
While it is impossible to tell how efficient the ROPs themselves are from this data, we may investigate how efficiently each chip uses its available bandwidth in this bandwidth limited test.
HD6970: 8780/176 = 49.89 MP per GB
GTX580: 9750/192.4 = 50.68 MP per GB
HD7970: 13300/264 = 50.38 MP per GB
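For anyone who wants to redo the division, here is a quick Python sketch. The figures are the ones quoted above; the dictionary and function names are just my own labels for illustration.

```python
# Measured pixel fill (MP/sec) and memory bandwidth (GB/sec), as quoted above.
cards = {
    "HD6970": (8780, 176.0),
    "GTX580": (9750, 192.4),
    "HD7970": (13300, 264.0),
}

def mp_per_gb(measured_fill, bandwidth):
    """Pixels written per unit of bandwidth: (MP/sec) / (GB/sec) = MP per GB."""
    return measured_fill / bandwidth

efficiency = {name: mp_per_gb(fill, bw) for name, (fill, bw) in cards.items()}

for name, eff in efficiency.items():
    print(f"{name}: {eff:.2f} MP per GB")
```

Note that the seconds cancel out in the division, so the ratio is really megapixels per gigabyte of memory traffic.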
Thus, we see that the GTX580 is actually the most efficient with the bandwidth available to it, while the 7970 and 6970 are not very far behind. In fact, I would say the numbers are close enough together that for all practical purposes the chips are equally efficient.
Now, we might ask ourselves, how much bandwidth do these chips actually need to take full advantage of their ROPs (so that the ROPs, rather than bandwidth, become the bottleneck)? Given the above data, this is not too difficult to calculate.
HD6970: 28,160 MP/sec of fillrate / 49.89 MP/GB = 564.44 GB/sec of bandwidth required
GTX580: 24,832 MP/sec of fillrate / 50.68 MP/GB = 489.98 GB/sec of bandwidth required
HD7970: 29,600 MP/sec of fillrate / 50.38 MP/GB = 587.53 GB/sec of bandwidth required
That is how much bandwidth each chip would need to score the max its ROPs are capable of.
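The same extrapolation as a rough Python sketch. It assumes, as the hand calculation does, that measured fill scales linearly with bandwidth until the ROPs max out, which is a simplification. The function name is my own, and because it does not round the intermediate ratio, the last digits come out slightly different from the hand figures.

```python
# Theoretical ROP fillrate (MP/sec), measured fill (MP/sec), bandwidth (GB/sec),
# all taken from the figures quoted in this post.
cards = {
    "HD6970": (28160, 8780, 176.0),
    "GTX580": (24832, 9750, 192.4),
    "HD7970": (29600, 13300, 264.0),
}

def required_bandwidth(peak_fill, measured_fill, bandwidth):
    """Bandwidth (GB/sec) needed to reach peak ROP fill,
    assuming fill scales linearly with bandwidth (an assumption)."""
    mp_per_gb = measured_fill / bandwidth
    return peak_fill / mp_per_gb

required = {name: required_bandwidth(*vals) for name, vals in cards.items()}

for name, gb_s in required.items():
    print(f"{name}: about {gb_s:.0f} GB/sec required")
```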
Here, we may ask, "if bandwidth is so great, then why didn't they design the above chips with all the bandwidth they needed to never be bottlenecked?" The answer is that additional memory channels are expensive both in terms of die space and board design. Additionally, the closer you get to the "ideal" bandwidth for the chip, the less additional bandwidth pays off, because the chip is bandwidth limited less and less often. As an example, the HD7970 would still "only" have 528 GB/sec with a 768-bit memory interface (assuming 5.5 Gbps memory).
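The 528 GB/sec figure follows from bus width times per-pin data rate. A minimal sketch (the function name is mine; the 384-bit case is the stock HD7970 bus, which gives the 264 GB/sec quoted earlier):

```python
def memory_bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/sec: bus width in bytes times per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps

stock = memory_bandwidth_gb_s(384, 5.5)    # stock HD7970 interface
doubled = memory_bandwidth_gb_s(768, 5.5)  # hypothetical 768-bit interface

print(stock, doubled)
```

Even doubling the bus width leaves the hypothetical card short of the roughly 587 GB/sec estimated above.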
By this point you may be thinking, "well that sucks!" Yeah, pretty much. To see just how big of a potential issue bandwidth is, you may want to read the following article: http://research.nvidia.com/sites/def...Micro_2011.pdf
*If you don't trust Nvidia engineers, let's ask some AMD ones by comparing the 6970 and 7970.
HD6970 vs 7970
Texture Fill: +40%
So bandwidth got the single largest increase of anything going from the 6970 to the 7970, and a roughly 10x larger increase than pixel fill (about +50% versus about +5%, going by the bandwidth and fillrate figures above). I'm going to wager the AMD engineers had pretty good reasons for their design choices in this regard.
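To check the "10x" claim against the numbers already in this post, here is a small Python sketch. It only uses the bandwidth and theoretical pixel fill figures quoted earlier; the variable and function names are mine.

```python
# Figures quoted earlier in this post.
hd6970 = {"pixel_fill": 28160, "bandwidth": 176.0}  # MP/sec, GB/sec
hd7970 = {"pixel_fill": 29600, "bandwidth": 264.0}

def pct_increase(old, new):
    """Percentage increase going from old to new."""
    return (new - old) / old * 100

bandwidth_gain = pct_increase(hd6970["bandwidth"], hd7970["bandwidth"])
pixel_fill_gain = pct_increase(hd6970["pixel_fill"], hd7970["pixel_fill"])
ratio = bandwidth_gain / pixel_fill_gain

print(f"Bandwidth: +{bandwidth_gain:.0f}%")
print(f"Pixel fill: +{pixel_fill_gain:.1f}%")
print(f"Bandwidth gain is {ratio:.1f}x the pixel fill gain")
```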