Uttar
02-15-03, 03:42 PM
Hey everyone,
After seeing a lot of NV35 rumors and stuff, I was wondering what most people think it'll deliver. I've also wondered what I really think it'll deliver myself. So I'd like some feedback on what you expect out of it, too :)
nVidia seems to be sleeping. They seem to be focusing on their mainstream parts ( NV31 and NV34 ) - but are they, really? Well, if you seriously think that, you're dead wrong.
nVidia has three design teams for its GPUs. They work in parallel, and everything is scheduled with the goal of having a product out every 6 months. If one team is delayed, the others aren't. That's the beauty of the system.
The first team worked on the NV30, and now works on the NV40
The second team works on the NV31&NV34
The third team works on the NV35&NV36 ( although they won't be released at the same time )
Efficient, eh? That's the power of nVidia: parallelism. They put it in their GPUs, in their motherboards, and even in their work methodology! If one of their engineers said he was against parallelism, I'd bet big time he'd be fired on the spot :)
But back to the topic... The NV35.
There are several things we know about the NV35. Those things have been confirmed by numerous sources on forums. They are:
- Low-k process
- 256-bit memory bus
- GDDR-2 ( -> 30GB/s+ bandwidth )
That's already tasty stuff. But is there more? That's a good question, and to figure it out we've got to think about what nVidia's priorities are. So what follows is speculation. I'll give a pessimistic, a conservative and an optimistic version, because it's really hard to be objective with speculation.
What were the main problems with the NV30? Well, reviews certainly weren't kind. The first was bad IQ with Aggressive AF, forcing you to use Balanced AF. The second was lower AA quality, which still gave lower overall quality even with Balanced AF.
To fix the AF problem, the obvious solution is to increase both the quality and the performance hit of the Aggressive algorithm ( or possibly keep the algorithm and add another one under a different name ) - fundamentally different algorithms are unlikely to be seen. 16x is also possible, but unlikely.
Thus, for AF, the following is conservative: revised algorithm, with better quality ( comparable to ATI's Performance ) but significantly lower performance.
The following is optimistic: revised algorithm, with better quality ( comparable to ATI's Performance ) but only slightly lower performance. Support for 16x Aniso.
And the following is pessimistic: nothing new.
The AA problem, however, is slightly more complex. There are really two problems. The first is the worse sampling patterns. The second is that the maximum true MSAA mode is 4x.
To fully fix that, you'd thus need to use rotated patterns for every mode and have a higher native full MSAA mode. But that also means a higher maximum color compression ratio...
Thus, for AA, the following is conservative: Native 6x AA mode, rotated sampling patterns which are roughly comparable ( sometimes worse ) to ATI patterns.
The following is optimistic: Native 8x AA mode, rotated sampling patterns which are roughly comparable ( sometimes better ) to ATI patterns.
The following is pessimistic: Either native 6x AA mode, or rotated sampling patterns which are roughly comparable to ATI patterns.
And what other problems were there? Oh, yes, fillrate much lower than the theoretical figure. I really don't think it's possible to predict anything about that. It could come from drivers, from minor hardware bugs, or from major design problems. It doesn't make sense to speculate on that without ANY info at all...
On the features front, little is to be expected. CineFX is already very flexible, and more would kinda be overkill for now.
Now, as to what influences performance... The first thing is obviously the number of pipelines. This will remain identical to the NV30, maybe with some slight optimizations to things like scheduling. Something else is vertex shading per-clock efficiency. Both the number of units ( which is still unknown for the NV30 ) and optimizations have to be counted for that. Here's the speculation:
Conservative: 5% higher per-clock than NV30
Optimist: 20% higher per-clock than NV30
Pessimist: same per-clock as NV30
What's left? Clock speeds for the high-end part... Well that's hard to guess. Really hard. The only likely thing is that they aren't trying to get a 1:1 ratio. So... let's get right to the numbers!
Conservative: 550/500
Optimist: 625/600
Pessimist: 500/450
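For what it's worth, here is a rough sketch of what those guesses would mean for memory bandwidth on a 256-bit bus. It assumes the second number in each pair is the memory clock in MHz and that the memory transfers data twice per clock ( DDR-style ); both are my reading, not confirmed figures. The function and scenario names are just made up for illustration.

```python
# Rough peak memory bandwidth for each clock guess.
# Assumptions (not confirmed): second number of each pair is the memory
# clock in MHz, the bus is 256 bits wide, and data moves twice per clock.

BUS_WIDTH_BITS = 256
TRANSFERS_PER_CLOCK = 2  # double data rate (assumed)


def bandwidth_gb_per_s(mem_clock_mhz, bus_width_bits=BUS_WIDTH_BITS):
    """Peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    bytes_per_second = mem_clock_mhz * 1e6 * TRANSFERS_PER_CLOCK * bytes_per_transfer
    return bytes_per_second / 1e9


# Hypothetical scenario table built from the memory clocks guessed above.
scenarios = {"conservative": 500, "optimist": 600, "pessimist": 450}

for name, mem_clock in scenarios.items():
    print(f"{name}: {bandwidth_gb_per_s(mem_clock):.1f} GB/s")
```

Under those assumptions, anything from a 500MHz memory clock upward clears the 30GB/s mark, while the 450MHz guess lands a little below it.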
That's great. But it's all a bunch of theoretical figures and stuff. What about practical performance figures?
The NV35's true power is with 6x AA and 8x Aniso, with FP32 being used in places needing that kind of precision. Its quality is probably comparable with the R350 using 6x AA, 8x Performance Aniso and FP24 everywhere.
Here is what I expect, in terms of overall performance, for the NV35:
Conservative: 30% faster than the R350
Optimist: 50% faster than the R350
Pessimist: On-par with the R350
My overall guess, right now, is 35% faster. Obviously, when compared to the R400, it should be quite a different matter...
Uttar