NV30 disadvantages
Hello everyone,
Since many people are saying:
"Hey, you find a billion problems with the R300, but none with the NV30" , I guess I'll fix that right now :)
Now, let's begin...
From that good ole NV30 vs R300 article at Beyond3D:
- The NV30 only has 16 temps for the VS, while both VS3.0 and the R300 have 32 temps ( all higher than the 12 temps of VS2.0 )
Note: Useful for HLSL optimizations.
- The following R300 instructions are not available on the NV30 ( Everything can be done without them, but it requires more instructions -> slower ) : MADDX2 ( in the VS, reflection vector calculation helper ) , EXPE ( In the VS, fog computations ), CMP/CND ( In the PS, to compare things. Introduced in the R200. )
Note: the NV30 supports a lot of other instructions which can optimize other things. However, if a program uses things the R300 can optimize and not the NV30, the NV30 is obviously slower. Only time will tell which company's instructions will be the most useful.
- No MRT support it seems. Recent nVidia documents do not point to MRT, and I thus suppose it is indeed not supported. It could be emulated using pack/unpack, but I'd guess it wouldn't be as efficient.
Now, all of that information pretty much seems to be confirmed by previews & nVidia launch documents. Should I have made a mistake here, please feel free to correct me.
There's also that whole Displacement Mapping debate. While nothing has been confirmed, it would seem that the R300 and NV30 both decided not to support it ( why? ) - But we could all be mistaken by saying that, and I'm not even sure I'm up to date on that debate.
Now, I've seen a post by Bigus Dickus claiming that the NV30 also doesn't support programmable grid AA and FP cubemaps.
I'd guess that's correct, but I'm interested in knowing where he got that info. Link, please?
Finally, let's finish by saying that, in theory, the NV30's VS is less efficient than the R300's VS ( 0.7/clock instead of 1.0/clock for the R300 ) - but since the NV30 has a higher clock, it seems okay.
Now, if that single super-strong-VS architecture is only useful in cases such as branching, and it doesn't help performance in other cases, then the R350 ( higher clocked R300? ) would obviously beat it in cases where the VS is the bottleneck.
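The clock-for-clock comparison above comes down to simple arithmetic: relative vertex throughput is roughly per-clock efficiency times core clock. A minimal Python sketch, taking the 0.7 and 1.0 per-clock figures from the post as given and assuming core clocks of 500 MHz for the NV30 and 325 MHz for the R300 (the clocks and the function name are assumptions for illustration, not figures from this thread):

```python
def relative_vs_throughput(per_clock: float, clock_mhz: float) -> float:
    """Relative vertex shader throughput: per-clock efficiency times core clock."""
    return per_clock * clock_mhz


# Per-clock figures are the ones quoted in the post; clock speeds are assumed.
nv30 = relative_vs_throughput(0.7, 500.0)
r300 = relative_vs_throughput(1.0, 325.0)
print(f"NV30: {nv30:.0f}  R300: {r300:.0f}  ratio: {nv30 / r300:.2f}")

# Core clock (MHz) a 1.0/clock part would need to match the NV30 figure.
break_even_mhz = nv30 / 1.0
print(f"Break-even clock for a 1.0/clock design: {break_even_mhz:.0f} MHz")
```

Under these assumptions the higher clock more than offsets the lower per-clock figure, and the break-even value shows how far a higher-clocked R300 derivative would have to go before the VS comparison flips.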
Want to complete my list? Say something in my list is incorrect? Please do so, but bring a link with you :)
Uttar
I wrote out a nice long reply. But the nvnews server crapped out on me just as I hit submit.. and I lost it all.. I can't be arsed to type it all out again.. :(
I actually don't care that much about pixel shaders; just give me high quality fsaa and aniso with great performance and you have my money.:cool:
I think nvidia made a nice move with adaptive aniso but having the same blurview tech is bad for my eyes!!!
:D
Originally posted by Fotis
I actually don't care that much about pixel shaders; just give me high quality fsaa and aniso with great performance and you have my money.:cool:
I think nvidia made a nice move with adaptive aniso but having the same blurview tech is bad for my eyes!!!
What are you talking about, Fotis? :) I was talking about *Vertex* Shaders being potentially less efficient. Not about the Pixel Shaders :)
As for adaptive Aniso...
*enters the NV30 positive mode, fanatics do not complain I'm doing this on a NV30 disadvantage thread please*
Since the NV30 is certainly a lot less bottlenecked by its Pixel Shaders than the R300 is ( because the R300 has a faster clock-for-clock VS & more raw memory bandwidth, but the same clock-for-clock PS ), you should expect the NV30's Aniso performance hit, in %, to be lower than the R300's.
*back into NV30 negative mode*
As for FSAA, it's a shame nVidia still uses multisampling; it would have been great if it used the Wu Antialiasing-derived method I'm currently thinking of.
If you need some help figuring out how to implement it, feel free to call me, nVidia j/k ( Note that I do *not* work for nVidia. No such jokes, Kiler, please )
And no "But the performance hit is significantly lower than the R300's!" excuse - I want a better algorithm, not only a faster one!
Uttar
jbirney
11-23-02, 04:29 PM
Can the GF FX do Gamma Correct AA? I know it applies Gamma Correction but I have seen conflicting info on this. Also, did you note that the NV30 still relies on OG sampling in some cases of AA? Not really worth arguing about, but it's a small diff.
Bigus Dickus
11-23-02, 04:30 PM
The programmable grid AA and FP cubemaps (as well as multiple render targets which you didn't mention) have come up in discussions at B3D. I'm not going to go digging through all the threads searching for the relevant discussions, but I will say that I don't think I've seen anything linked to "official" NV statements that they didn't support such things other than the AA methods.
An nVidia PR representative made an unambiguous statement that the AA modes on the NV30 were the same as the GF4, with the addition of the two new mixed SS/MS modes 6Xs and 8X (the first using a skewed grid for the SS portion - most likely - and the latter using ordered grids for both).
It also seems like the GF4's 4X MS is going to be ordered grid as well, but that isn't confirmed. The statement that the NV30 uses the same modes would indicate that, but then there was a comment made that modes other than 8x used a skewed grid... though it wasn't clear if that was referring only to the other mixed modes, or to all modes.
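Why the ordered-versus-skewed question matters can be seen by counting how many distinct coverage steps a near-vertical edge gets as it sweeps across a pixel. A Python sketch with illustrative 4-sample positions (textbook-style patterns chosen for the example, not the actual GF4, NV30 or R300 sample positions, which are not given in this thread):

```python
# Sample offsets inside one pixel, in [0, 1) on each axis. Illustrative only.
ORDERED_GRID_4X = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
ROTATED_GRID_4X = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]


def distinct_offsets(samples):
    """Count the distinct x offsets and distinct y offsets in a pattern."""
    xs = {x for x, _ in samples}
    ys = {y for _, y in samples}
    return len(xs), len(ys)


def coverage_levels(samples):
    """Coverage values a vertical edge can produce while sweeping left to right.

    Everything left of the edge counts as covered.
    """
    xs = sorted(x for x, _ in samples)
    bounds = [0.0] + xs + [1.0]
    levels = set()
    for lo, hi in zip(bounds, bounds[1:]):
        edge_x = (lo + hi) / 2  # place the edge between neighbouring offsets
        covered = sum(1 for x, _ in samples if x < edge_x)
        levels.add(covered / len(samples))
    return sorted(levels)


for name, pattern in (("ordered", ORDERED_GRID_4X), ("rotated", ROTATED_GRID_4X)):
    print(name, distinct_offsets(pattern), coverage_levels(pattern))
```

With these patterns the ordered grid only has two distinct offsets per axis, so a near-vertical edge steps through three coverage levels, while the rotated grid has four offsets per axis and steps through five. That is the usual argument for why 4X rotated grid looks better than 4X ordered grid at the same sample count.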
The interview B3D conducted contained many of these references Uttar, which you can find on their front page I believe.
Now, despite all the minor differences, advantages, and disadvantages between the NV30 and R300's pixel shading and vertex shading capabilities, there are really only two things about the NV30 that I am disappointed about (well, three, but displacement mapping is in the same state on the R300 as well, so just two relative to it):
The first is that it only has 4 Z-sample units per pipe, meaning that it can only do a max of 4x multisampling. I don't understand why they didn't increase that. I was really expecting 8 Z-sample units, which would have exceeded the R300's 6. In combination with this is the possibility that 4X MS is ordered grid. Even if it is rotated grid, it doesn't seem possible that any of the NV30's AA modes (theoretically) can match the quality of the R300's 6X RGMS. I suppose it was a choice driven by the bandwidth available, since the R300 bogs down too much in many games with 6X AA enabled from what I understand. Perhaps 6X and 8X MS on the NV30 would just have been unusable. I don't see how 6Xs and 8X SS/MS will be any better in that regard though... :confused:
The second is the maximum 8X AF. In all likelihood, the NV30 will match or exceed the R300's aniso performance (on a percentage drop comparison, and likely on an absolute comparison as well) with the new "adaptive" mode (though all previous nV AF algorithms were adaptive as well :confused: ). The NV30 almost certainly has the speed to run 16X AF, so I don't know why they didn't remove this limitation.
Those two things are what really disappoint me, especially after the "better pixels, not faster" PR hype. As it turns out, it will likely be faster pixels, not better. For all the pixel shading power the NV30 has, it will probably make no difference to the vast majority of games over the next two years. On the other hand, old school AA and AF would make a visible difference in all games, and why nVidia chose lower quality implementations of both of these will remain a mystery.
Of course, I look forward to screenshot comparisons, because theoretical comparisons don't always tell the true story. Perhaps the NV30 will still impress me.
Uttar, my post was not a remark to what you said about vertex shaders.
I read the interview at beyond3d and I can say I was very surprised at how nvidia completely ignored fsaa!!!:mad: What were they thinking? Small nimfs with 128bit color maybe!!:rolleyes:
gokickrocks
11-23-02, 05:18 PM
there are some instances that seem to suggest that ati uses a sparse grid...
http://www.beyond3d.com/forum/viewtopic.php?t=3233&postdays=0&postorder=asc&start=60
Uttar, GFFX has 32 temp registers for VS.
http://www.beyond3d.com/previews/nvidia/nv30gfx/index.php?p=3
unbiasedfool
11-23-02, 06:15 PM
Nope, it has 16 temp reg.
If you guys read the nv_float_buffer spec, you'll see it only works with nv_texture_rectangle type textures, although it does work with 3D textures. So you might be able to arrange the 6 faces of the cubemap as slices of a 3D texture, and with some texture coord jiggery pokery access it.
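The "texture coord jiggery pokery" could look roughly like this: pick the cube face from the major axis of the direction vector, compute the 2D coordinate on that face, and use the face index as the slice coordinate. A Python sketch of the math only, following the standard OpenGL cube map face selection rules. The function name, the face ordering and the slice-centre convention are assumptions for illustration, and whether NV30 float textures can really be addressed this way is not verified here:

```python
def cube_dir_to_3d_coord(rx: float, ry: float, rz: float):
    """Map a direction vector to (s, t, r) for a 6-slice 3D texture.

    Slices are assumed to hold the cube faces in the order
    +X, -X, +Y, -Y, +Z, -Z (a hypothetical layout).
    """
    ax, ay, az = abs(rx), abs(ry), abs(rz)
    if ax == ay == az == 0.0:
        raise ValueError("direction vector must be non-zero")

    if ax >= ay and ax >= az:      # X is the major axis
        ma = ax
        face, sc, tc = (0, -rz, -ry) if rx > 0 else (1, rz, -ry)
    elif ay >= az:                 # Y is the major axis
        ma = ay
        face, sc, tc = (2, rx, rz) if ry > 0 else (3, rx, -rz)
    else:                          # Z is the major axis
        ma = az
        face, sc, tc = (4, rx, -ry) if rz > 0 else (5, -rx, -ry)

    s = 0.5 * (sc / ma + 1.0)      # remap [-1, 1] to [0, 1]
    t = 0.5 * (tc / ma + 1.0)
    r = (face + 0.5) / 6.0         # centre of the chosen slice
    return s, t, r


print(cube_dir_to_3d_coord(1.0, 0.5, 0.0))
print(cube_dir_to_3d_coord(0.0, 0.0, -1.0))
```

The r coordinate targets the centre of a slice so that neighbouring faces are never mixed. In a real fragment program the branching would have to be replaced by compare/select style arithmetic, which is where the extra instruction cost would come from.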
The point about no floating point cubemaps is moot though. Floating point textures don't have bilinear or trilinear filtering, and no mipmaps. So they're not much use for artwork. Floating point buffers are mainly for intermediate stages of lighting calculations.
Cubemaps are used for normalizing vectors on current hardware. On NV30 and R300, you don't need them, because you can normalize in the fragment processor, using the exact same method as normalizing in the vertex processor.
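That "exact same method" is the usual dot product, reciprocal square root, multiply sequence (DP3, RSQ, MUL in shader assembly terms). A plain Python sketch of the arithmetic, not actual shader code:

```python
import math


def normalize(v):
    """Normalize a 3-vector the way a shader does: DP3, RSQ, MUL."""
    x, y, z = v
    len_sq = x * x + y * y + z * z       # DP3: dot(v, v)
    if len_sq == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    inv_len = 1.0 / math.sqrt(len_sq)    # RSQ: reciprocal square root
    return (x * inv_len, y * inv_len, z * inv_len)  # MUL: scale each component


print(normalize((3.0, 0.0, 4.0)))
```

Three instructions per normalization is cheap when the fragment processor has instruction slots to spare, which is why the normalization cubemap lookup stops being necessary.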
And yes, it has 16 temp registers. But having more temp regs won't offer any performance increase _if_ the shader can be implemented optimally using 16 or fewer. Remember that even the P4 still only has 6 32-bit registers as the base integer register set (not including SSE or MMX crap). Glad to see Athlon 64 increasing the register set substantially, but you get the picture. Lots of general purpose registers aren't really necessary if you make your shaders modular.
Originally posted by Bigus Dickus
The programmable grid AA and FP cubemaps (as well as multiple render targets which you didn't mention) have come up in discussions at B3D. I'm not going to go digging through all the threads searching for the relevant discussions, but I will say that I don't think I've seen anything linked to "official" NV statements that they didn't support such things other than the AA methods.
An nVidia PR representative made an unambiguous statement that the AA modes on the NV30 were the same as the GF4, with the addition of the two new mixed SS/MS modes 6Xs and 8X (the first using a skewed grid for the SS portion - most likely - and the latter using ordered grids for both).
It also seems like the GF4's 4X MS is going to be ordered grid as well, but that isn't confirmed. The statement that the NV30 uses the same modes would indicate that, but then there was a comment made that modes other than 8x used a skewed grid... though it wasn't clear if that was referring only to the other mixed modes, or to all modes.
The interview B3D conducted contained many of these references Uttar, which you can find on their front page I believe.
Now, despite all the minor differences, advantages, and disadvantages between the NV30 and R300's pixel shading and vertex shading capabilities, there are really only two things about the NV30 that I am disappointed about (well, three, but displacement mapping is in the same state on the R300 as well, so just two relative to it):
The first is that it only has 4 Z-sample units per pipe, meaning that it can only do a max of 4x multisampling. I don't understand why they didn't increase that. I was really expecting 8 Z-sample units, which would have exceeded the R300's 6. In combination with this is the possibility that 4X MS is ordered grid. Even if it is rotated grid, it doesn't seem possible that any of the NV30's AA modes (theoretically) can match the quality of the R300's 6X RGMS. I suppose it was a choice driven by the bandwidth available, since the R300 bogs down too much in many games with 6X AA enabled from what I understand. Perhaps 6X and 8X MS on the NV30 would just have been unusable. I don't see how 6Xs and 8X SS/MS will be any better in that regard though... :confused:
The second is the maximum 8X AF. In all likelihood, the NV30 will match or exceed the R300's aniso performance (on a percentage drop comparison, and likely on an absolute comparison as well) with the new "adaptive" mode (though all previous nV AF algorithms were adaptive as well :confused: ). The NV30 almost certainly has the speed to run 16X AF, so I don't know why they didn't remove this limitation.
Those two things are what really disappoint me, especially after the "better pixels, not faster" PR hype. As it turns out, it will likely be faster pixels, not better. For all the pixel shading power the NV30 has, it will probably make no difference to the vast majority of games over the next two years. On the other hand, old school AA and AF would make a visible difference in all games, and why nVidia chose lower quality implementations of both of these will remain a mystery.
Of course, I look forward to screenshot comparisons, because theoretical comparisons don't always tell the true story. Perhaps the NV30 will still impress me.
About 6xS & 8xS: Well, in theory, they're indeed slower than 6x and 8x.
However... With MSAA, the bottleneck rapidly becomes memory bandwidth. With SSAA, the bottleneck rapidly becomes fillrate, but memory bandwidth is affected just as much as with MSAA.
So, if nVidia found a way to use less memory bandwidth with SSAA than with MSAA, the fillrate cost would still be higher, but that wouldn't degrade performance too much because memory bandwidth remains the bottleneck.
I really don't think such a tech is even possible. But if nVidia did find such a solution, it would be impressive.
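The argument above can be put into a toy cost model: frame cost is set by whichever of fill cost or bandwidth cost is larger, MSAA multiplies only the bandwidth cost by the sample count, and SSAA multiplies both. A Python sketch where every number is a made-up unit for illustration, not an NV30 or R300 measurement, and compression is ignored:

```python
def frame_cost(fill_cost, bandwidth_cost, samples, mode):
    """Return (cost, bottleneck) for a toy max(fill, bandwidth) model."""
    if mode == "msaa":
        fill = fill_cost                  # one shaded fragment per pixel
    elif mode == "ssaa":
        fill = fill_cost * samples        # every sample is shaded
    else:
        raise ValueError("mode must be 'msaa' or 'ssaa'")
    bandwidth = bandwidth_cost * samples  # colour/Z traffic scales with samples
    if bandwidth >= fill:
        return bandwidth, "bandwidth"
    return fill, "fillrate"


# Hypothetical per-pixel costs without AA: fill 1.0, bandwidth 0.8.
print(frame_cost(1.0, 0.8, 4, "msaa"))
print(frame_cost(1.0, 0.8, 4, "ssaa"))
# Same SSAA case with a hypothetical halving of its bandwidth cost.
print(frame_cost(1.0, 0.4, 4, "ssaa"))
```

In this model SSAA can never be cheaper than MSAA at the same sample count unless its bandwidth cost is somehow reduced, and even then the extra shading work takes over as the bottleneck, which matches the doubt expressed above.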
About Aniso: Yep, it's certainly sad there's no 16X Aniso. An explanation would be that their "amazing" adaptive algorithm gets a lot of bugs once you get to 16X Aniso, and they didn't have the time to fix it.
But that's just speculation.
PR "Better, not faster" hype: Well, I just think that PR is still kinda logical.
First of all, with very cheap Aniso, you should be able to activate Aniso on ALL games - even new ones.
Secondly, the VS power of the NV30 is lower, clock for clock, than the R300's, while the NV30's PS power, clock for clock, is higher than the R300's ( IF you consider all the new PS instructions in the NV30, which should be useful in specific situations )
Also, I really don't think we can talk about AA before seeing the true performance hit of 8xS and 6xS - If they did find a way to do what I described above, we might all be very surprised. But that's unlikely :(
Temp registers: There are only 16 VS temps for the NV30. That document you pointed to says that too, Fotis. As Nutty explained, more temps don't give much of a performance benefit. However, if the VS program is 256 instructions long, then it might still be beneficial to use 32 temps.
Uttar
StealthHawk
11-24-02, 07:51 AM
Originally posted by Uttar
About Aniso: Yep, it's certainly sad there's no 16X Aniso. An explanation would be that their "amazing" adaptive algorithm gets a lot of bugs once you get to 16X Aniso, and they didn't have the time to fix it.
But that's just speculation.
it's actually interesting, because i read on 3dgpu that even the gf3 was capable of 16x AF, but nvidia never enabled it in drivers.
It's funny, this talk about aniso filtering reminds me of when the GF3 came out, and people were asking, can you really see such a big difference between 4x and 8x? Well, for me, I can't, and all I wish to be able to get from the GF-FX is to play all my games fast, @1280*1024 + 4xFSAA + 4xaniso. If I get that on all games, with an average higher than 50FPS, I don't need anything else; anything else is a bonus! Isn't there anyone else here who thinks 8x aniso and above is just a waste?
You guys are forgetting an important disadvantage:
THE PRICE
Originally posted by Smokey
It's funny, this talk about aniso filtering reminds me of when the GF3 came out, and people were asking, can you really see such a big difference between 4x and 8x? Well, for me, I can't, and all I wish to be able to get from the GF-FX is to play all my games fast, @1280*1024 + 4xFSAA + 4xaniso. If I get that on all games, with an average higher than 50FPS, I don't need anything else; anything else is a bonus! Isn't there anyone else here who thinks 8x aniso and above is just a waste?
I see very little difference between 4X and 8X - certainly not sufficient to justify the performance hit!
However, with the NV30 adaptive Aniso, the quality of 8X Aniso will almost certainly be lower than that of conservative 8X Aniso.
Personally, I don't like 1280x1024 - I generally use 1280x960
So, I'm going to consider 6X AA if the performance hit isn't too horrible - But we'll see that soon. If it is, I'll settle for 4XAA or consider switching to 1600x1200...
BTW, I sure hope you aren't hoping for an average FPS of 50 with Morrowind, hehe :)
Uttar
Bigus Dickus
11-24-02, 06:14 PM
In many screenshots I've seen, I can't see any difference between nVidia's 4X and 8X. However, there is an obvious difference between ATi's 4X and 8X, and between 8X and 16X. Perhaps you've never noticed simply because you've always had an nVidia card?
Do some searching on the forums and you'll see what I'm talking about. Some Detonator releases kept pushing 8X closer and closer to 4X until often there was no visible difference.
Personally, I think AF improves the image quality nearly as much as AA does.
Originally posted by Bigus Dickus
In many screenshots I've seen, I can't see any difference between nVidia's 4X and 8X. However, there is an obvious difference between ATi's 4X and 8X, and between 8X and 16X. Perhaps you've never noticed simply because you've always had an nVidia card?
Do some searching on the forums and you'll see what I'm talking about. Some Detonator releases kept pushing 8X closer and closer to 4X until often there was no visible difference.
Personally, I think AF improves the image quality nearly as much as AA does.
Well, with adaptive Aniso, it's quite normal that the differences between modes will be more visible.
Personally, I think 4X AA increases quality slightly more than 8X AF. It's an interesting claim you got there about nVidia reducing 8X quality.
My bet is that they've just progressively made their adaptive AF algorithm more aggressive.
But then again, AF importance depends on the game. With Morrowind, jaggies in the terrain are very visible, so there, FSAA is *very* nice, and AF isn't as nice IMO.
Uttar
StealthHawk
11-24-02, 09:24 PM
Originally posted by Bigus Dickus
In many screenshots I've seen, I can't see any difference between nVidia's 4X and 8X. However, there is an obvious difference between ATi's 4X and 8X, and between 8X and 16X. Perhaps you've never noticed simply because you've always had an nVidia card?
Do some searching on the forums and you'll see what I'm talking about. Some Detonator releases kept pushing 8X closer and closer to 4X until often there was no visible difference.
Personally, I think AF improves the image quality nearly as much as AA does.
i've seen a visible difference between 4x and 8x on every driver revision i've used, up till 30.82. 30.82 being the latest driver i've tried.
of course i don't own a gf4, so nvidia may just be screwing with those. but they didn't touch anything for gf3.
Bigus Dickus
11-24-02, 11:49 PM
Originally posted by StealthHawk
i've seen a visible difference between 4x and 8x on every driver revision i've used, up till 30.82. 30.82 being the latest driver i've tried.
of course i don't own a gf4, so nvidia may just be screwing with those. but they didn't touch anything for gf3.
I thought the drivers affected the GF3 and GF4 in the same way. I'm almost sure they do, as I've seen the screenshots before (but can't seem to find them now). Since about a month after the GF4's release (when people screamed about the huge AF hit) 8X and 4X have looked essentially identical. I think that started with the 29.xx drivers, but could be mistaken.
StealthHawk
11-25-02, 02:34 AM
Originally posted by Bigus Dickus
I thought the drivers affected the GF3 and GF4 in the same way. I'm almost sure they do, as I've seen the screenshots before (but can't seem to find them now). Since about a month after the GF4's release (when people screamed about the huge AF hit) 8X and 4X have looked essentially identical. I think that started with the 29.xx drivers, but could be mistaken.
yes, i've seen such alleged screenshots. i do not suffer from the same "problems" that they do with my gf3, and that is the bottom line.
i believe it was Sharkfood that first brought this up, and quite frankly, i couldn't tell the difference between his screenshots either. i'm not sure there is a difference, if you get my drift.
which means these are the possible scenarios:
1) he's a liar
2) he installed the drivers wrong
3) his hardware is busted
4) the gf4 users got screwed by nvidia //this made under the assumption that Shark used a gf4, my memory tells me this is so.
tazdevl
11-25-02, 12:43 PM
Interesting info guys... the one thing I don't see mentioned as a disadvantage is the estimated price ~$500.
I don't care about .13 (that's nVIDIA's cost to bear not mine), DDRII etc... Given how quickly technology changes, current economic conditions and the (at this point) slightly increased performance differential with the 9700, I think a lot of folks are going to have problems justifying a 5800 Ultra.
tamattack
11-27-02, 05:26 PM
A couple of other, minor, disadvantages:
1. lose a PCI slot.
2. not as groundbreaking vs. R300 as R300 was vs. GF4Ti.
3. still no reviews.
4. still can't get one.
tamattack
11-27-02, 05:27 PM
Originally posted by Bigus Dickus
I thought the drivers affected the GF3 and GF4 in the same way. I'm almost sure they do, as I've seen the screenshots before (but can't seem to find them now). Since about a month after the GF4's release (when people screamed about the huge AF hit) 8X and 4X have looked essentially identical. I think that started with the 29.xx drivers, but could be mistaken.
I thought it started with the 40.xx drivers?
StealthHawk
11-27-02, 06:41 PM
Originally posted by tamattack
I thought it started with the 40.xx drivers?
no, we heard about this "issue", if there is one (some people seem to have problems with their gf4, some don't. i don't recall anyone having problems with a gf3), some months ago, with the high 20.xx drivers.