Originally Posted by ChrisRay
Except memory bandwidth doesn't scale well on the GTX 280. You can add loads of bandwidth and get nearly nothing. If the card were extremely bandwidth limited, scaling the core/ROP domains would not increase performance the way it does. Yet it does. Pixel and texture fillrate bottlenecks simply aren't constrained by bandwidth on the GTX 280 (or the 260, for that matter).
Do some experiments yourself with anti-aliasing (the most heavily z-fill/bandwidth-limited tests there are) and watch as you gain more per clock than you do per unit of bandwidth. You do reach a certain point where bandwidth becomes a limiting factor on the texel and pixel fillrates, but currently those bandwidth limits are not that reachable, since core clocks continue to be the best scaler of performance.
Next, experiment with a low-end card with little bandwidth but high core clocks. You will get the exact opposite performance gains from increasing the bandwidth versus the core clocks. Bandwidth is only extremely useful if you have the fillrate to make use of it. As I said, most of the GTX 280's bandwidth is wasted. The improvements come from increased pixel/z-fill rate rather than from actual bandwidth. This is why it's so easy for GTX 260 cards to hang with GTX 280 cards with some core clock adjustments, despite the bandwidth disadvantage.
Good reply, Chris.
If you raise the core/ROP clock, the ROPs will be able to blend more fragments per second, and the texture units will request more texels from memory. If your memory isn't fast enough to keep up, there will be stalls, just like in your example of the video card with slow memory.
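To make the stall argument concrete, here is a rough sketch of a "whichever limit is lower wins" model. The two cards and the bytes-per-pixel figure are made-up illustrative values, not real specs or measurements, and the function names are just for this example.

```python
def pixel_throughput(core_mhz, rops, mem_gb_s, bytes_per_pixel):
    """Pixels per second under a min(fill rate, bandwidth) bottleneck model."""
    fill_limit = core_mhz * 1e6 * rops              # what the ROPs can emit
    bw_limit = mem_gb_s * 1e9 / bytes_per_pixel     # what memory can absorb
    return min(fill_limit, bw_limit)


def gain(base, changed):
    """Relative speedup of `changed` over `base`, as a percentage."""
    return 100.0 * (changed / base - 1.0)


# Hypothetical cards (illustrative numbers, not real specs):
# name -> (core MHz, ROPs, memory GB/s)
CARDS = {
    "high-end": (600, 32, 140.0),
    "low-end": (600, 8, 12.8),
}
BYTES_PER_PIXEL = 4  # assumed memory traffic per pixel; real workloads vary a lot

if __name__ == "__main__":
    for name, (core, rops, bw) in CARDS.items():
        base = pixel_throughput(core, rops, bw, BYTES_PER_PIXEL)
        core_up = pixel_throughput(core * 1.1, rops, bw, BYTES_PER_PIXEL)
        bw_up = pixel_throughput(core, rops, bw * 1.1, BYTES_PER_PIXEL)
        print(f"{name}: +10% core -> {gain(base, core_up):.1f}%, "
              f"+10% memory -> {gain(base, bw_up):.1f}%")
```

In this toy model a card only gains from whichever resource is currently the lower limit, so which clock "scales" depends entirely on the workload's bytes per pixel. Raise that number enough and even the high-end card flips to being bandwidth bound.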
You said that the GTX 280 isn't bandwidth starved, but that might be true in some cases and false in others. I think you are only considering new games like Crysis, where shading power is the limiting factor. My example: take an old game with simple shader programs, use SSAA and a high AF level, and check the scaling at different resolutions. There comes a point where the SPs generate so many fragments that the ROPs and the memory bandwidth can't process them without delays. The ROPs might be able to blend the fragments, but if your memory isn't fast enough, they won't be written to the buffers without delays.
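For a feel of how fast the framebuffer traffic grows with SSAA, here is a back-of-the-envelope sketch. The overdraw factor, the bytes per fragment, and the frame rate are assumptions chosen for illustration. It also ignores texture fetches, blending reads, and any hardware color/Z compression, so treat the result as a rough lower-level estimate and not a measurement.

```python
def framebuffer_traffic_gb_s(width, height, ssaa_x, ssaa_y, fps,
                             overdraw=3.0, bytes_color=4, bytes_depth=4):
    """Estimated color + depth traffic in GB/s for supersampled rendering.

    Assumes every rendered sample costs one depth read, one depth write
    and one color write, repeated `overdraw` times per frame.
    """
    samples = width * height * ssaa_x * ssaa_y
    bytes_per_fragment = bytes_color + 2 * bytes_depth
    return samples * overdraw * bytes_per_fragment * fps / 1e9


if __name__ == "__main__":
    for ssaa in (1, 2):
        traffic = framebuffer_traffic_gb_s(1920, 1200, ssaa, ssaa, fps=60)
        print(f"1920x1200, SSAA {ssaa}x{ssaa}: {traffic:.1f} GB/s")
```

SSAA 2x2 multiplies the sample count by four, so this part of the traffic quadruples before a single texel has been fetched. With a high AF level the texture traffic comes on top of that.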
During the last 2 years, I have been doing custom experiments on the coding side with different cards, resolutions, AA modes, and so on. Some nice guys at B3D have been helping me analyze the performance of my projects, and this was the result of the analysis:
When using normal textures, an old game like BR2 depends 42% on memory bandwidth, 44% on the core, and 12% on the shaders. But when I use high-resolution textures (1024x1024 and 2048x2048), memory bandwidth accounts for 70% of the framerate, the core for 30%, and the shading power becomes useless (because it cannot be fed with the data it needs).
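One way to read those percentages, and this is only an assumption about what the analysis means, is as the share of frame time bound by each resource. Under that reading, an Amdahl-style estimate shows how much a memory overclock would be worth in each case. The function name and the 10% overclock are hypothetical.

```python
def speedup(bound_fraction, scale):
    """Amdahl-style overall speedup when the part of the frame time given by
    `bound_fraction` gets `scale` times faster and the rest is unchanged."""
    return 1.0 / ((1.0 - bound_fraction) + bound_fraction / scale)


if __name__ == "__main__":
    # Shares taken from the post, interpreted (as an assumption) as
    # fractions of frame time bound by memory bandwidth.
    for label, share in (("normal textures", 0.42), ("high-res textures", 0.70)):
        pct = 100.0 * (speedup(share, 1.10) - 1.0)
        print(f"{label}: +10% memory bandwidth -> about {pct:.1f}% more fps")
```

If the shares mean something else (for example regression weights instead of time fractions), the numbers change, but the direction holds: the larger the bandwidth-bound share, the more a memory overclock pays off.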
I had to use texture compression to save the day, because the frame rate was horrible in the beginning. Compressed textures need only 1/8 or 1/4 of the bandwidth of uncompressed ones. The change is huge.
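The 1/8 and 1/4 ratios match what you get from the S3TC formats when compared against 32-bit RGBA. The post doesn't say which formats were used, so DXT1 and DXT5 here are an assumption. A quick sketch of the arithmetic for the two texture sizes mentioned above (top mip level only):

```python
# Bytes per 4x4 texel block for the S3TC formats (assumed formats;
# the post does not name the ones actually used).
BLOCK_BYTES = {"DXT1": 8, "DXT5": 16}


def uncompressed_bytes(width, height, bytes_per_texel=4):
    """Size of the top mip level stored as 32-bit RGBA."""
    return width * height * bytes_per_texel


def compressed_bytes(width, height, fmt):
    """Size of the top mip level stored in 4x4 compressed blocks."""
    blocks = ((width + 3) // 4) * ((height + 3) // 4)
    return blocks * BLOCK_BYTES[fmt]


if __name__ == "__main__":
    for size in (1024, 2048):
        raw = uncompressed_bytes(size, size)
        for fmt in ("DXT1", "DXT5"):
            packed = compressed_bytes(size, size, fmt)
            print(f"{size}x{size} {fmt}: {packed / 2**20:.1f} MiB "
                  f"vs {raw / 2**20:.1f} MiB raw (1/{raw // packed})")
```

Since the texture units fetch the compressed blocks and decode them on chip, the savings apply to memory traffic as well as to storage.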
Check it yourself, Chris. I just wrote a little howto:
Try to run my work at 1920x1200 with SSAA 2x2, and then we'll talk.