Originally Posted by ninelven
Although Fermi has 48 ROPs, it can only output 2 pixels per SM per clock, which equals 32 pixels per clock (16 SMs x 2 = 32). The 7970's 32 ROPs also output 32 pixels per clock, but its stock core clock is 925 MHz versus 772 MHz for the GTX 580, which gives the 7970 about a 20% fillrate advantage. The 7970 also has a 37% memory bandwidth advantage over the GTX 580. Looking at the numbers from your first graph, 13.33 / 9.75 = 1.37, a 37% advantage for the 7970. That indicates bandwidth is the limiting factor, which is exactly what one would expect.
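A quick back-of-the-envelope check of the quoted numbers, as a sketch only. Two inputs are assumptions not stated in the quote: the HD 7970 outputting 32 pixels per clock (from its 32 ROPs), and the stock effective GDDR5 data rates of 4008 MT/s (GTX 580) and 5500 MT/s (HD 7970) on their 384-bit buses. The helper names `fillrate_gpix` and `bandwidth_gbs` are hypothetical.

```python
# Rough peak fillrate and memory bandwidth comparison, GTX 580 vs HD 7970.
# Assumptions (not from the quoted post): HD 7970 = 32 pixels/clock,
# effective data rates of 4008 MT/s (GTX 580) and 5500 MT/s (HD 7970).

def fillrate_gpix(pixels_per_clock: int, core_mhz: float) -> float:
    """Peak pixel fillrate in gigapixels per second."""
    return pixels_per_clock * core_mhz / 1000.0

def bandwidth_gbs(bus_bits: int, data_rate_mtps: float) -> float:
    """Peak memory bandwidth in GB/s: bytes per transfer x transfers per second."""
    return bus_bits / 8 * data_rate_mtps / 1000.0

gtx580 = fillrate_gpix(16 * 2, 772)  # 16 SMs x 2 pixels per clock
hd7970 = fillrate_gpix(32, 925)      # assumed 32 pixels per clock
clock_advantage = hd7970 / gtx580 - 1

gtx580_bw = bandwidth_gbs(384, 4008)  # assumed stock data rate
hd7970_bw = bandwidth_gbs(384, 5500)  # assumed stock data rate
bw_ratio = hd7970_bw / gtx580_bw

graph_ratio = 13.33 / 9.75  # figures read off the first graph

print(f"Fillrate: {gtx580:.2f} vs {hd7970:.2f} Gpix/s (+{clock_advantage:.1%})")
print(f"Bandwidth: {gtx580_bw:.1f} vs {hd7970_bw:.1f} GB/s (ratio {bw_ratio:.2f})")
print(f"Graph ratio: {graph_ratio:.2f}")
```

The graph ratio lining up with the bandwidth ratio rather than the roughly 20% fillrate gap is what points to bandwidth as the bottleneck.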
The main point, though, is that the ROPs and texture units still need to be improved, not just to match the HD 7970 but to beat it, in order to have a faster card. Adding more units of the same type found in Fermi takes up precious die space, and NVIDIA also needs a fair chunk of room to improve its GPGPU ability in single- and double-precision floating-point math so that it beats the HD 7970 in that area too.
Memory-bandwidth-wise, given that both companies are limited to GDDR5 and it's close to its limit in terms of maximum clock speed, the only way to get more is to add a 512-bit memory bus. That means adding 2 more memory controllers to the GPU die (8 in total, since each is usually 64 bits wide), which also takes up die space, needs more pins on the GPU package and requires a more complex PCB. Even then, with the GDDR5 memory running at the same clock speed on both cards, it only gives about a 33% improvement in memory bandwidth over the 384-bit bus on the HD 7970 (512 / 384 ≈ 1.33)...
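A minimal sketch of the bus-width arithmetic, assuming 64-bit memory controllers and an illustrative data rate of 5500 MT/s on both cards (the data rate is an assumption; the ratio is the same for any rate). The function names are hypothetical.

```python
# Hypothetical 512-bit card vs the HD 7970's 384-bit bus at the same data rate.

CONTROLLER_WIDTH_BITS = 64  # assumed width of one memory controller

def controllers_needed(bus_bits: int) -> int:
    """Number of 64-bit controllers needed for a given bus width."""
    return bus_bits // CONTROLLER_WIDTH_BITS

def bandwidth_gbs(bus_bits: int, data_rate_mtps: float) -> float:
    """Peak memory bandwidth in GB/s."""
    return bus_bits / 8 * data_rate_mtps / 1000.0

rate = 5500  # assumed effective GDDR5 data rate, MT/s
wide = bandwidth_gbs(512, rate)
narrow = bandwidth_gbs(384, rate)

print(controllers_needed(384), controllers_needed(512))  # 6 8
print(f"{wide:.0f} vs {narrow:.0f} GB/s -> +{wide / narrow - 1:.0%}")  # 352 vs 264 GB/s -> +33%
```

Because the data rate cancels out, the gain depends only on the width ratio: 512 / 384 = 4/3.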