View Full Version : Nvidia's GTX 295 pics.
ChrisRay
12-16-08, 03:14 PM
Strange... 240 stream processors and 896MB of memory. More SPs is good, but it seems odd that it'd have the same amount of memory as the GTX 260 with the same SPs as the 280.
Looks like another card I'll definitely be skipping though. GX2 for the lose. :p
*This is speculation; I'm not confirming or denying anything*
This has more to do with GPU PCBs than actual chip yields. My guess is the GTX 295 cards have fully functional core units, but the ROP/memory partitions are disabled for PCB cost reasons, since the 260 PCB costs a lot less to make. I don't honestly believe the GTX 260 or GTX 280 is all that bandwidth limited. Most of the 280's extra performance over the 260 comes from higher clocks and extra ROPs.
Chris
walterman
12-16-08, 04:15 PM
... I don't honestly believe the GTX 260 or GTX 280 is all that bandwidth limited...
If you run your fav games in low res & with low level of AA (or no AA), yes.
ChrisRay
12-16-08, 05:17 PM
If you run your fav games in low res & with low level of AA (or no AA), yes.
Which I don't. Most of the disparities seen between the GTX 280 and 260 can be accounted for by the additional pixel pipes and clocks. Bandwidth does help a little, but I think you'd be surprised how much FPS bang you get per unit of bandwidth versus how much you get from higher pixel fillrate. I do to this day believe the GTX 280 has more bandwidth than it needs. This is why a high core overclock lets the GTX 260 catch up with the GTX 280 so easily.
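For anyone who wants to put rough numbers on that, here is a minimal sketch of the theoretical figures. The specs in `CARDS` are the reference-clock values as commonly listed for these cards; they are an assumption on my part, not something stated in this thread, and the function names are made up for illustration. It only computes paper numbers (one pixel per ROP per clock), not real-world performance.

```python
# Reference-clock specs as commonly listed for these cards (an assumption,
# not taken from this thread): ROP count, core MHz, bus width, memory MT/s.
CARDS = {
    "GTX 260": {"rops": 28, "core_mhz": 576, "bus_bits": 448, "mem_mtps": 1998},
    "GTX 280": {"rops": 32, "core_mhz": 602, "bus_bits": 512, "mem_mtps": 2214},
}


def pixel_fillrate_gpix(rops, core_mhz):
    """Theoretical pixel fillrate in Gpixels/s: one pixel per ROP per clock."""
    return rops * core_mhz / 1000.0


def bandwidth_gbs(bus_bits, mem_mtps):
    """Theoretical memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return bus_bits / 8 * mem_mtps / 1000.0


def core_mhz_to_match(rops, target_gpix):
    """Core clock a card with `rops` ROPs needs to reach `target_gpix`."""
    return target_gpix * 1000.0 / rops


for name, c in CARDS.items():
    fill = pixel_fillrate_gpix(c["rops"], c["core_mhz"])
    bw = bandwidth_gbs(c["bus_bits"], c["mem_mtps"])
    print(f"{name}: {fill:.2f} Gpix/s, {bw:.1f} GB/s, "
          f"{bw / fill:.2f} bytes of bandwidth per pixel")

# Core clock at which a 28-ROP card reaches the 32-ROP card's paper fillrate.
target = pixel_fillrate_gpix(32, 602)
print(f"GTX 260 core clock to match: {core_mhz_to_match(28, target):.0f} MHz")
```

The last line is the interesting one for the overclocking argument: it shows how far the 260's core has to go before its paper fillrate equals the 280's, while its bandwidth stays lower.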
CaptNKILL
12-16-08, 05:52 PM
I agree. As I've said several times before, pixel fill rate (and even memory bandwidth to an extent) hasn't gotten much attention since we started seeing unified shaders. Everyone talks about the SPs and the shader clock, but a ton of fillrate is still going to make for a faster graphics card, whereas 10 billion shader processors are only going to do as much shader processing as is required.
We haven't yet reached the point where everything can be done with shaders. 1920x1200 with 8xAA in a game with extremely high res textures, foliage and a huge draw distance is still going to bring a huge fill rate load on the GPU, no matter how many SPs it has.
Just wondering, but how so? That would be very interesting and enticing, as multi monitor is one of SLI's biggest drawbacks.
But didn't 180.xx enable multi monitor?
Bman212121
12-16-08, 11:06 PM
Out of curiosity do you know how many layers the PCB is on the GTX 260 / 280?
ChrisRay
12-17-08, 01:31 AM
Out of curiosity do you know how many layers the PCB is on the GTX 260 / 280?
It depends on which version you get. Honestly I don't know right offhand, but some of the newer 260-216 55nm PCBs are simpler, and I'd have to check on the 260/280. Can't comment on the possibly newer stuff yet.
walterman
12-17-08, 05:44 AM
The latest games are bound by arithmetic power (SPs), but old games, which need fewer arithmetic clock cycles per pixel, become bound by bandwidth.
Once you have enough arithmetic power, bandwidth is the constraint.
High resolution textures need a lot of bandwidth (ex: my BR2 patch uses texture compression to save bandwidth), high AF needs to read more texels from the textures, SSAA does not use buffer compression, pixel blending uses a lot of bandwidth (bloom effects), ...
High bandwidth guarantees that your arithmetic power (your SPs) will be fed with all the data it needs.
This is already a problem for CPUs with more than 8 cores, so try to imagine the memory wall in a GPU with thousands of SPs:
http://arstechnica.com/news.ars/post/20081207-analysis-more-than-16-cores-may-well-be-pointless.html
Again, the memory bandwidth is like the gold ingots of a gfx card.
ChrisRay
12-17-08, 10:23 PM
Except memory bandwidth doesn't scale well on the GTX 280. You can add loads of bandwidth and get nearly nothing. If the card were extremely bandwidth limited, scaling the core/ROP domains would not increase performance the way it does. Yet it does. Pixel and texture fillrate simply aren't constrained by bandwidth on the GTX 280 (or the 260, for that matter).
Do some experiments yourself with anti-aliasing (the most heavily zfill/bandwidth limited tests there are), and watch as you gain more from core clock than you do from bandwidth. You do reach a certain point where bandwidth becomes a limiting factor on the texel and pixel fillrates, but currently that point isn't easy to reach, since core clocks continue to be the best scaler of performance.
Next, experiment with a low end card with little bandwidth but high core clocks. You will get the exact opposite: bigger gains from increasing the bandwidth than from increasing the core clocks. Bandwidth is only extremely useful if you have the fillrate to make use of it. As I said, most of the GTX 280's bandwidth is wasted. The improvements come from increased pixel/zfill fillrate rather than actual bandwidth. This is why it's so easy for GTX 260 cards to hang with GTX 280 cards with some core clock adjustments despite the bandwidth disadvantage.
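The two experiments above can be sketched with a very crude two-limit model, where the frame rate is set by whichever of fillrate or bandwidth runs out first. Everything here is hypothetical: the workload numbers, the two example cards, and the function names are made up for illustration, and a real GPU overlaps and stalls in ways a plain `min()` does not capture.

```python
def frame_rate(fill_gpix, bw_gbs, gpix_per_frame, gb_per_frame):
    """Frames/s under a two-limit model: the slower of fill and memory wins."""
    fill_limited = fill_gpix / gpix_per_frame
    bw_limited = bw_gbs / gb_per_frame
    return min(fill_limited, bw_limited)


# Hypothetical per-frame workload: 0.2 Gpixels filled, 1.0 GB of memory traffic.
WORK = {"gpix_per_frame": 0.2, "gb_per_frame": 1.0}


def gain(card, key, factor):
    """Relative FPS gain from scaling one resource (`key`) by `factor`."""
    scaled = dict(card)
    scaled[key] *= factor
    return frame_rate(**scaled, **WORK) / frame_rate(**card, **WORK) - 1.0


# Two made-up cards with the same fillrate but very different bandwidth.
high_end = {"fill_gpix": 16.0, "bw_gbs": 110.0}  # bandwidth to spare
low_end = {"fill_gpix": 16.0, "bw_gbs": 40.0}    # starved for bandwidth

for name, card in (("high end", high_end), ("low end", low_end)):
    core = gain(card, "fill_gpix", 1.10)
    mem = gain(card, "bw_gbs", 1.10)
    print(f"{name}: +10% core gives {core:+.1%}, +10% memory gives {mem:+.1%}")
```

In this toy model the card with spare bandwidth only responds to core clock and the starved card only responds to memory clock, which is the "exact opposite" behaviour described above. Where a real card sits between those extremes depends on the workload.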
Chris
Retrolock
12-18-08, 01:00 AM
I don't know if this was already posted, but damn
http://www.guru3d.com/news/tomorrow-18-december-/
Guru3d is probably gonna do a review/test on GTX 295
I don't know if this was already posted, but damn
http://www.guru3d.com/news/tomorrow-18-december-/
Guru3d is probably gonna do a review/test on GTX 295
Hey it's the 18th and no review yet... :bleh:
walterman
12-18-08, 09:54 AM
Good reply Chris.
If you raise the core/ROP clock, the ROPs will be able to blend more fragments per second, and the texture units will request more texels from memory. If your memory isn't fast enough, there will be stalls, like you said with the example of the video card with slow memory.
You said that the GTX 280 isn't bandwidth starved. That might be true in some cases and false in others. I think you are only considering new games like Crysis, where shading power is the limiting factor. My example: take an old game with simple shader programs, use SSAA and a high AF level, and check the scaling at different resolutions. There comes a point where the SPs generate so many fragments that the ROPs and memory can't process them without delays. The ROPs might be able to blend the fragments, but if your memory isn't fast enough, they won't be written to the buffers without delays.
During the last 2 years, I have been doing custom experiments with different cards, resolutions, AA modes, ... on the coding side. Some nice guys at B3D have been helping me analyze the performance of my projects, and this was the result of the analysis:
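To give a feel for how the fragment load grows in that kind of test, here is a rough sketch of framebuffer traffic under SSAA. It is a back-of-the-envelope estimate under stated assumptions: 4 bytes of colour and 4 bytes of depth per sample, depth counted as a single access, no buffer compression, and texture reads left out entirely (so the real total is higher). The overdraw factor and the function name are hypothetical.

```python
def framebuffer_traffic_gbs(width, height, ssaa_x, ssaa_y, fps,
                            overdraw=1.0, bytes_color=4, bytes_depth=4,
                            blended=False):
    """Estimated colour + depth traffic in GB/s (1 GB = 1e9 bytes)."""
    # SSAA renders every sample as a full pixel.
    samples = (width * ssaa_x) * (height * ssaa_y)
    # A blended write reads the destination colour before writing it back.
    color = bytes_color * (2 if blended else 1)
    per_frame = samples * overdraw * (color + bytes_depth)
    return per_frame * fps / 1e9


# 1920x1200 with 2x2 SSAA at 60 fps, first with no overdraw or blending,
# then with a hypothetical 4x overdraw where every layer is blended.
print(framebuffer_traffic_gbs(1920, 1200, 2, 2, 60))
print(framebuffer_traffic_gbs(1920, 1200, 2, 2, 60, overdraw=4.0, blended=True))
```

The estimate scales linearly with sample count, overdraw and frame rate, so changing the resolution or SSAA grid in the call shows how quickly the buffer traffic alone grows before any texture fetches are added.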
http://img255.imageshack.us/img255/821/bench4ih7.png
When using normal textures, an old game like BR2 depends 42% on memory bandwidth, 44% on the core, and 12% on the shaders. But when I use high resolution textures (1024x1024 & 2048x2048), memory bandwidth accounts for 70% of your framerate, the core for 30%, and the shading power becomes useless (because it cannot be fed with the data it needs).
I had to use texture compression to save the day, because the frame rate was horrible in the beginning. Texture compression uses 1/8 or 1/4 of the real bandwidth. The change is huge.
Check it yourself Chris, I just wrote a little howto:
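The 1/8 and 1/4 figures line up with the standard S3TC block formats, so here is a small sketch of the arithmetic. That the patch uses DXT1 and DXT5 specifically is my assumption, not something stated above; the block sizes themselves (8 or 16 bytes per 4x4 texel block) are the standard ones. Only the top mip level is counted, and the function name is made up for illustration.

```python
# Bytes per 4x4 texel block for the S3TC formats (assumed to be the
# compression meant above; the post does not name the format).
BYTES_PER_BLOCK = {"DXT1": 8, "DXT5": 16}
RGBA8_BYTES_PER_TEXEL = 4


def texture_bytes(width, height, fmt=None):
    """Size of the top mip level; fmt=None means uncompressed 32-bit RGBA."""
    if fmt is None:
        return width * height * RGBA8_BYTES_PER_TEXEL
    # Compressed textures are stored as whole 4x4 blocks.
    blocks = ((width + 3) // 4) * ((height + 3) // 4)
    return blocks * BYTES_PER_BLOCK[fmt]


for size in (1024, 2048):
    raw = texture_bytes(size, size)
    for fmt in ("DXT1", "DXT5"):
        packed = texture_bytes(size, size, fmt)
        print(f"{size}x{size} {fmt}: {packed / 2**20:.1f} MiB "
              f"vs {raw / 2**20:.1f} MiB raw, ratio {packed / raw}")
```

Since the texture units fetch the compressed blocks, the same ratio applies to the texel traffic for a given access pattern, which is why the change in frame rate can be so large on a bandwidth-bound scene.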
http://www.nvnews.net/vbulletin/showpost.php?p=1878768&postcount=159
Try to run my work at 1920x1200 with SSAA 2x2, and then we talk :)