Old 01-26-12, 10:27 AM   #161
ninelven
Registered User
 
Join Date: Jan 2003
Posts: 132
Default Re: next gen kepler to support dx 11.1, also take a year to rollout all cards

Quote:
Originally Posted by shadow001
I know that it's still a truckload of fillrate no matter what, but it isn't enough for the theoretical maximums that the chips are rated for in terms of available memory bandwidth, and the FPS calculations you made aren't quite right, since there's this little thing called triple buffering, so there are 2 extra frames already stored in memory ahead of the one that's being displayed on the LCD that very moment, so cut that by 3.
My calculations are correct (or my degree in mathematics is failing me). Buffering does not affect ROP throughput, but it does add bandwidth and memory-space overhead.

Quote:
Originally Posted by shadow001
Then add antialiasing when each frame has 12 megapixels across 3 screens to begin with, and let's go with SSAA( super sampling AA), to force the GPU's to render at a higher internal resolution than the display resolution, and really push them to their limits fillrate wise.
Even then, there is more than enough fillrate for 100 fps+.


Quote:
Originally Posted by shadow001
Yes, I like to torture hardware, and I enjoy finding the breaking point... For instance, let's try the Heaven benchmark at 7880*1440 and at up to 2X antialiasing: it can still play back normally, though the FPS figures are pretty low through the entire benchmark (20~30 FPS).
The three most likely performance culprits are 1) bandwidth, 2) being shader bound, and 3) being tessellation bound.


Quote:
Originally Posted by shadow001
Jack it up to 4X AA(and it's the MSAA variety to boot, not SSAA), and it displays a new frame every 30 secs....Nope, 3 water cooled GTX580's can no longer handle it anymore.
It is very likely that 4xAA is causing you to exceed your card's local memory, which then spills over into system RAM, causing a nosedive in performance.

Quote:
Originally Posted by shadow001
My 4 cards just arrived earlier today
Sweet, pop one in, downclock to 776/4008 and run 3dmark vantage.

Just in case someone reading this thread is genuinely curious, I looked at the numbers a little more closely:

HD6970: 8780 MP/sec with 176 GB/sec bandwidth
GTX580: 9750 MP/sec with 192.4 GB/sec bandwidth
HD7970: 13300 MP/sec with 264 GB/sec bandwidth

While it is impossible to tell from this data how efficient the ROPs themselves are, we can investigate how efficiently each chip uses its available bandwidth in this bandwidth-limited test.

HD6970: 8780/176 = 49.89 MP per GB per sec
GTX580: 9750/192.4 = 50.68 MP per GB per sec
HD7970: 13300/264 = 50.38 MP per GB per sec

Thus, the GTX580 is actually the most efficient with the bandwidth available to it, while the 7970 and 6970 are not far behind. In fact, the numbers are close enough that, for all practical purposes, the chips are equally efficient.
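If anyone wants to check the arithmetic, here's a throwaway Python sketch of the efficiency calculation, using exactly the measured numbers quoted above:

```python
# Measured pixel fillrate (MP/sec) and memory bandwidth (GB/sec) from the post.
cards = {
    "HD6970": (8780, 176.0),
    "GTX580": (9750, 192.4),
    "HD7970": (13300, 264.0),
}

def efficiency(fill_mp, bw_gb):
    """Megapixels delivered per GB/sec of bandwidth in this bandwidth-limited test."""
    return fill_mp / bw_gb

for name, (fill, bw) in cards.items():
    print(f"{name}: {efficiency(fill, bw):.2f} MP per GB per sec")
# HD6970 -> 49.89, GTX580 -> 50.68, HD7970 -> 50.38
```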

Now we might ask ourselves: how much bandwidth do these chips actually need to take full advantage of their ROPs (so that the ROPs, not bandwidth, are the bottleneck)? Given the above data, this is not too difficult to calculate.

HD6970: 28,160 MP/sec of fillrate / 49.89 MP per GB per sec = 564.44 GB/sec of bandwidth required
GTX580: 24,832 MP/sec of fillrate / 50.68 MP per GB per sec = 489.98 GB/sec of bandwidth required
HD7970: 29,600 MP/sec of fillrate / 50.38 MP per GB per sec = 587.53 GB/sec of bandwidth required

That is how much bandwidth each chip would need to score the max its ROPs are capable of.
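Same calculation in code; the tiny differences from the figures above are just from rounding the efficiency to two decimals before dividing:

```python
# (card: (theoretical max fillrate MP/sec, measured MP/sec, actual bandwidth GB/sec))
cards = {
    "HD6970": (28160, 8780, 176.0),
    "GTX580": (24832, 9750, 192.4),
    "HD7970": (29600, 13300, 264.0),
}

def required_bandwidth(max_fill, measured_fill, bandwidth):
    """GB/sec needed so that the ROPs, not bandwidth, become the bottleneck."""
    mp_per_gb = measured_fill / bandwidth   # efficiency in MP per GB/sec
    return max_fill / mp_per_gb             # GB/sec to feed the full fillrate

for name, (max_fill, fill, bw) in cards.items():
    print(f"{name}: {required_bandwidth(max_fill, fill, bw):.1f} GB/sec required")
```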

Here, we may ask, "if bandwidth is so great, then why didn't they design the above chips with all the bandwidth they needed to never be bottlenecked?" The answer is that additional memory channels are expensive, both in die space and in board design. Additionally, the closer you get to the "ideal" bandwidth for the chip, the less additional bandwidth pays off, because the chip is bandwidth limited less and less often. As an example, the HD7970 would still "only" have 528 GB/sec with a 768-bit memory interface (assuming 5.5 Gbps memory).
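The math behind that 528 GB/sec figure is just bus width times per-pin data rate, divided by 8 bits per byte (the 5.5 Gbps rate is the assumption stated above):

```python
def memory_bandwidth(bus_width_bits, data_rate_gbps):
    # GB/sec = (bus width in bits / 8 bits per byte) * per-pin data rate in Gbps
    return bus_width_bits / 8 * data_rate_gbps

print(memory_bandwidth(384, 5.5))  # 264.0 -- the HD7970's actual 384-bit bus
print(memory_bandwidth(768, 5.5))  # 528.0 -- the hypothetical doubled bus
```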

By this point you may be thinking, "well, that sucks!" Yeah, pretty much. To see just how big a potential issue bandwidth is, you may want to read the following article: http://research.nvidia.com/sites/def...Micro_2011.pdf

*If you don't trust Nvidia engineers, let's ask some AMD ones by comparing the 6970 and 7970.

HD6970 vs 7970
Fillrate: +5%
Bandwidth: +50%
Texture Fill: +40%
FLOPs: +40%

So bandwidth got the single largest increase of anything from the 6970 to the 7970, ten times the increase in pixel fill. I'm going to wager the AMD engineers had pretty good reasons for that design choice.
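The fillrate and bandwidth deltas can be verified from the numbers earlier in this post (the texture fill and FLOPs deltas would need spec-sheet figures I haven't reproduced here):

```python
def pct_increase(old, new):
    """Percentage increase going from the old value to the new one."""
    return (new - old) / old * 100

# Theoretical fillrate (MP/sec) and bandwidth (GB/sec) from earlier in the post.
print(f"Fillrate:  +{pct_increase(28160, 29600):.0f}%")  # +5%
print(f"Bandwidth: +{pct_increase(176, 264):.0f}%")      # +50%
```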