NV3x: VSA influences & strategical advantages

Uttar
12-24-02, 02:10 PM
Hey,

Finally finished my latest and longest speculation. Would have posted it here, but it's more than 15000 letters! And I don't really like the idea of making 2 posts for a single message...
So, I posted it at http://www.notforidiots.com/NV3x.html
Okay, so there's nothing on the site yet besides that. But it'll come. Eventually. And then I'll integrate that into the site. Eventually. :p

Anyway, any feedback on that whole crazy idea of mine?


Uttar

netape
12-24-02, 02:59 PM
Wow, that's like a bible! :eek:

Mod
12-24-02, 04:23 PM
"There are two advantages for gamers, and one disadvantage. First, the disadvantage: You're only going to get a radically new ..., while the small increases might sometimes not even exist."

How come there will be bigger performance improvements from now on than before, if increasing the number of transistors is more difficult?

Well, there can be some performance improvement from architecture redesign, but optimum results are achieved fast. So I think it will be slower.

There will be big improvements every 18-24 months, with several small but steady improvements in between. Just what happens to companies that follow Moore's law, like AMD and Intel.

I agree with this reasoning "Funny way to do a merger! 3dfx had to defend nVidia, and nVidia ...so frequently."

But this competition won't make things go faster because physics simply doesn't give its secrets so easily to TSMC :D .

NVIDIA is going to diversify so much mainly because it's a much bigger company than ATI, so it's easier for them to hire new people and do research. ATI doesn't have the resources to do so many things.

But both companies are lucky: since their products must follow Moore's law from now on, they have more time to mature their technologies and more time to adopt more flexible policies. With this, they will have larger profit margins.

Uttar
12-24-02, 07:42 PM
Originally posted by Mod
"There are two advantages for gamers, and one disadvantage. First, the disadvantage: You're only going to get a radically new ..., while the small increases might sometimes not even exist."

How come there will be bigger performance improvements from now on than before, if increasing the number of transistors is more difficult?

Well, there can be some performance improvement from architecture redesign, but optimum results are achieved fast. So I think it will be slower.

There will be big improvements every 18-24 months, with several small but steady improvements in between. Just what happens to companies that follow Moore's law, like AMD and Intel.

Hmm, I guess I wasn't sufficiently clear. Let me try to explain that again...

In the past, you got faster refreshes after 6 months. In the future, I think we won't be getting that as much. Why?
Because the focus after 6 months will be on mainstream. In the past, mainstream after 6 months seemed way too fast.
However, after 12 months, we'll get good performance increases. First of all, because the process will have matured, giving higher possible frequencies and yields. Secondly, because there'll obviously be small optimizations to the design and faster memory with it. Sometimes even a new memory standard, something which can't be done using refreshes.

After 18 months, you probably aren't getting any faster parts either. Partly because there may be a bigger focus on mainstream, but also because there's no more real headroom. The process has matured, so the only thing left is faster memory. And only increasing memory speed wouldn't increase overall performance.
So, what we might see then is a *very* low end solution. Something like a $69 GPU for OEMs. With a good brand, but horrible performance.
And maybe a refresh with 5% faster clocks and 15% faster memory, if the design was memory limited. Else, a 5% increase really isn't worth it.

After 24 months, we'll get a new process. That means we'll then get a huge boost to performance. Depending on how significant the modifications are, performance of the GPU might eventually more than double compared to the NVx5. Or even triple compared to the NVx0.

I don't think we'll see that every time, but that's certainly what we'll see with 0.13->0.09.

But this competition won't make things go faster because physics simply doesn't give its secrets so easily to TSMC.

NVIDIA is going to diversify so much mainly because it's a much bigger company than ATI, so it's easier for them to hire new people and do research. ATI doesn't have the resources to do so many things.

But both companies are lucky: since their products must follow Moore's law from now on, they have more time to mature their technologies and more time to adopt more flexible policies. With this, they will have larger profit margins.

Well, it certainly isn't going to evolve as fast as in the past. But let's face it: Moore's law is not sufficient anymore. David Kirk explained it very well in an ExtremeTech article.
Yes, die size continues to become 2 times smaller about every 18 months. The only thing we've got to remember is that other things (such as copper, Black Diamond, ...) are there too. And that means, independently from the die size, frequencies double every 18 months too.
And that means performance quadruples less than every 18 months. So, X2 every 6 months is maybe too optimistic. But X1.44 performance every 6 months is still perfectly possible. And that's significantly more than Moore's law.
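
For anyone who wants to check that arithmetic, here is a rough sketch of the compounding. It is only an illustration and does not come from the ExtremeTech article; the function name `per_period_factor` is made up. Under these assumptions, X2 every 18 months (transistor count alone) works out to about X1.26 every 6 months, X1.44 every 6 months corresponds to roughly tripling every 18 months, and a full X4 every 18 months would be about X1.59 every 6 months.

```python
def per_period_factor(total_factor: float, total_months: float, period_months: float) -> float:
    """Growth factor per period that compounds to total_factor over total_months."""
    periods = total_months / period_months
    return total_factor ** (1.0 / periods)


# 18-month growth factors to compare (labels are illustrative, not measured data)
scenarios = [
    ("transistor count only (x2)", 2.0),
    ("roughly tripling (x3)", 3.0),
    ("transistor count and frequency (x4)", 4.0),
]

for label, factor in scenarios:
    six_month = per_period_factor(factor, total_months=18, period_months=6)
    print(f"{label}: x{six_month:.2f} every 6 months")
```

Going the other way, X2 every 6 months would compound to X8 every 18 months, which is why it looks too optimistic.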

The problem with CPUs is that while they use the increasing frequencies, there are no real uses for the extra die size. So performance simply can't increase as fast as with GPUs.

But yes, GPUs have gone too fast those past years and it'll slow down a little.

Anyway, nVidia isn't supposed to hire as much in 2003 as before. Jen Hsun Huang said he's going to keep R&D in 2003, but he does not expect to increase it too much ( meaning they're still hiring, but they only hire real geniuses ) - So, my theory actually justifies that decision, too, IMO.

Of course, both are going to get higher profit margins. Or rather, nVidia certainly will. And maybe not ATI. Because, right now, it sounds like they aren't adopting the same strategy. It sounds like they want to do a new architecture every 12 months. But after seeing what nVidia has prepared, maybe they'll decide to slow down a little and enjoy the margins :)


Uttar

Mod
12-25-02, 06:50 AM
"Well, it certainly isn't going to evolve as fast as in the past. But let's face it: Moore's law is not sufficient anymore. David Kirk explained it very well in an ExtremeTech article."

What article is this?

"Yes, die size continues to become 2 times smaller about every 18 months. The only thing we've got to remember is that other things (such as copper, Black Diamond, ...) are there too. And that means, independently from the die size, frequencies double every 18 months too."

The other factors you mention, which also affect performance, show that Moore's law never sufficed. So I don't think it is right to say "anymore". But these other things have always had some relation to the size of the transistors. For a given processor there is a limit: beyond some frequency it takes a performance hit, for a number of theoretical and practical reasons.

Given this, I know that increasing the number of transistors does not correspond to a linear increase in performance, but there is a relation. For example, one generation runs from 60 MHz to 260 MHz with 1 million transistors, while another can run from 900 MHz to 4 GHz with 50 million transistors.

So, either way, performance will not increase as fast as before, as we agree.

"The problem with CPUs is that while they use the increasing frequencies, there are no real uses for the extra die size. So performance simply can't increase as fast as with GPUs."

I didn't understand this. Do you mean that there are trash (no real use) parts in a CPU, and that their proportion of the chip increases with time?


"but he does not expect to increase it too much ( meaning they're still hiring, but they only hire real genuises ) - So, my theory actually justifies that decision, too, IMO."

Why would that mean hiring geniuses? Couldn't it mean that they don't need to hire much because there isn't the same need for fast development of the NVxx, and that instead the few people they hire have the know-how to develop new lines of products?

With this, many people dedicated to the development of NVxx would now be responsible for developing these new lines of products, learning from the people nVidia hired. The most expert people on the NVXX would be left with the development of the NVXXX.

nutball
12-25-02, 09:23 AM
Originally posted by Mod

"The problem with CPUs is that while they use the increasing frequencies, there are no real uses for the extra die size. So performance simply can't increase as fast as with GPUs."

I didn't understand this. Do you mean that there are trash (no real use) parts in a CPU, and that their proportion of the chip increases with time?


The thing with CPUs is that the complexity which can usefully be achieved is fundamentally limited by the programs they execute.

In a single thread of program code there's only so much instruction-level parallelism, so adding additional functional units to the CPU core doesn't gain you any extra performance. (If you move to a simultaneous multi-threading model, you can usefully add more units, but not many more). Basically pumping up the clock-speed is the most effective way of increasing performance, not adding extra transistors.

So the computational cores of typical CPUs now are about as large and complex as they need to be.

Much of the additional space being made available by smaller manufacturing processes is now being dedicated to on-die L2 caches. The extra space is not dedicated to computation.

With GPUs it's different. There's a high degree of parallelism in rendering, so you can keep adding functional units (ie. pipelines, in the old way of thinking), and you will get a linear increase in performance (modulo having the memory bandwidth to keep the thing happy). I think this is partly why clock-speeds alone in GPUs aren't the be-all and end-all of the performance race.
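
A toy model of that difference, purely as a sketch: speedup grows linearly with the number of functional units until it hits a ceiling. For a single CPU thread the ceiling is the available instruction-level parallelism; for a GPU it is whatever the memory bandwidth can feed. The function name and both ceiling values are made-up assumptions for illustration, not measurements of any real chip.

```python
def speedup(units: int, ceiling: float) -> float:
    """Speedup over a single unit: linear in unit count until the ceiling is hit."""
    return min(float(units), ceiling)


ILP_CEILING = 3.0         # assumed: parallelism available in one CPU thread
BANDWIDTH_CEILING = 16.0  # assumed: pipelines the memory bandwidth can keep fed

for units in (1, 2, 4, 8, 16, 32):
    cpu = speedup(units, ILP_CEILING)
    gpu = speedup(units, BANDWIDTH_CEILING)
    print(f"{units:>2} units: CPU thread x{cpu:.0f}, GPU x{gpu:.0f}")
```

With these numbers the CPU thread stops gaining after 3 units, while the GPU keeps scaling up to 16 and only then becomes bandwidth limited.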

Uttar
12-25-02, 04:02 PM
Well, if someone got a lot of know-how, he's a sort of genius anyway.
But my point was that their hiring criteria is going to rise, so we shouldn't hope to find a job there ;)

Here's the link for that ExtremeTech article:
http://www.extremetech.com/article2/0,3973,713591,00.asp


Uttar

Mod
12-25-02, 05:46 PM
Damn, you didn't notice a flaw in my arguments (if I take yours as the basis). GPUs weren't increasing their core frequencies. I mean the TNT2 is 175 MHz, the NV30 is now 500 MHz or 1 GHz. That's not a significant frequency increase, although the number of transistors increased enormously.

But from what you said, they will now increase that frequency at a good pace, so from this alone you can expect performance to increase almost as fast as before, although architecture evolution will be limited by the number of transistors.

I think David Kirk is guilty of an abuse of language, giving different things similar names. He's referring to Moore's law as the performance increase, whereas I know it as the increase in the number of transistors in a processor.

PS.: UTTAR, I think you are underestimating people's capabilities and creativity (including your own !). :)

borntosoul
12-25-02, 09:24 PM
i think that gpu will still increase in performance just as they have in the past, there will be many more new tech coming up that we don't know about yet. "so adding additional functional units to the CPU core doesn't gain you any extra performance. (If you move to a simultaneous multi-threading model, you can usefully add more units, but not many more). Basically pumping up the clock-speed is the most effective way of increasing performance, not adding extra transistors." read up on what intel is doing with the prescott, cause it seems to me that they are going about things a little differently :)

StealthHawk
12-25-02, 09:51 PM
Originally posted by borntosoul
read up on what intel is doing with the prescott, cause it seems to me that they are going about things a little differently :)

what things are they doing differently with prescott? i only know that it will be a die shrink and the FSB will be increased.

borntosoul
12-26-02, 01:19 AM
hmmmm @ stealth, i mean they won't just be going for super high clock speeds, and they don't add transistors just for the hell of it.

StealthHawk
12-26-02, 03:53 AM
Originally posted by borntosoul
hmmmm @ stealth, i mean they won't just be going for super high clock speeds, and they don't add transistors just for the hell of it.

the FSB always has to be increased at some point. otherwise the gain from increased MHz becomes smaller and smaller. bus speeds have been increased as a way to increase performance without super high clockspeeds for a while now.

and of course it gives Intel more headroom in case they need more clockspeed. just ramp up the multiplier :)
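
to put numbers on that: core clock is just bus clock times multiplier, so a faster bus at a lower multiplier gives the same core clock with more bus bandwidth, and leaves the multiplier free to climb later. a quick sketch, where the function name and every figure are made up for illustration and are not real Prescott specs:

```python
def core_clock_mhz(bus_mhz: int, multiplier: int) -> int:
    """Core clock in MHz as bus clock times multiplier (whole multipliers only)."""
    return bus_mhz * multiplier


# Hypothetical configurations, not actual Intel parts
slow_bus = core_clock_mhz(bus_mhz=133, multiplier=18)  # older, slower bus
fast_bus = core_clock_mhz(bus_mhz=200, multiplier=12)  # about the same core clock, faster bus
ramped = core_clock_mhz(bus_mhz=200, multiplier=16)    # same bus, multiplier ramped up

print(slow_bus, fast_bus, ramped)
```

the second config runs at about the same core speed as the first but talks to memory over a faster bus, and the third shows the headroom from ramping the multiplier.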

Kruno
12-26-02, 04:47 AM
Nice article Uther the Lightbringer.
You said it would be long? To my standards that was very short. :p :)

Check out the R300 with a P4 4GHz, still cpu limited. :)

Uttar
12-26-02, 05:19 AM
Originally posted by Mod
I think David Kirk is guilty of an abuse of language, giving different things similar names. He's referring to Moore's law as the performance increase, whereas I know it as the increase in the number of transistors in a processor.

PS.: UTTAR, I think you are underestimating people's capabilities and creativity (including your own !). :)

Actually, I think that "abuse of language" has become an industry standard.
As soon as Intel realized they couldn't keep up with 2 times the number of transistors every 18 months, they had to replace transistors with performance.
Which means that nVidia won't beat the old Moore's law, but they'll beat the new, Intel Inside, one.
And after that, they say 3D Graphics is the art of cheating without getting caught. Eh, well, it sounds like it's also the art of Intel's Marketing!

Hmm, I did underestimate people's capabilities and creativity? Where?


Uttar

borntosoul
12-26-02, 05:40 AM
"Check out the R300 with a P4 4GHz, still cpu limited." what r u saying? that we don't need faster gpu's cause we need more than a p4 4 ghz to run the cards at full potential?

Kruno
12-26-02, 06:25 AM
Who said that? Who implied that? Who are you? :confused:

Originally posted by borntosoul
what r u saying ? that we dont need faster gpu's cause we need more than a p4 4 ghz to run the cards at full potential?

StealthHawk
12-26-02, 07:19 AM
Originally posted by borntosoul
what r u saying ? that we dont need faster gpu's cause we need more than a p4 4 ghz to run the cards at full potential?

no, he's probably just saying that we need faster CPUs to run the card at its full potential, with no strings attached. no hidden implications. and hell, it's true. i want to run my games at 640 and have them be GPU limited damnit!

how fast can you gooooooooooooo :p

tamattack
01-02-03, 02:32 PM
Very interesting speculation, Uttar.

Following your line of thought, though, it occurs to me that it may not be as simple as just adding more dynamically allocated units to the pool.

While I agree that it may be much easier to do so than to redesign the entire pipeline, I think there might be tradeoffs that you haven't considered.

Consider that adding additional units means increasing transistor count. And I could be wrong here, but when you increase transistor count doesn't it become more difficult to maintain clock speed (all else equal) unless you specifically consider clockspeed and redesign accordingly?

Also, there is still the issue of balancing size/speed of caches/registers to match the increased number of units. As well as potentially cache coherency issues.

Oh, and your example of diversification is pretty off-base. ATI currently has the more diversified product range of the two. NV only has discrete graphics, chipsets and console (X-Box), whereas ATI has discrete graphics, chipsets, console (GameCube), set-top box and PDA graphics.

All-in-all, an interesting read.

Uttar
01-02-03, 04:17 PM
Originally posted by tamattack
Following your line of thought, though, it occurs to me that it may not be as simple as just adding more dynamically allocated units to the pool.

While I agree that it may be much easier to do so than to redesign the entire pipeline, I think there might be tradeoffs that you haven't considered.

Consider that adding additional units means increasing transistor count. And I could be wrong here, but when you increase transistor count doesn't it become more difficult to maintain clock speed (all else equal) unless you specifically consider clockspeed and redesign accordingly?

Also, there is still the issue of balancing size/speed of caches/registers to match the increased number of units. As well as potentially cache coherency issues.


Yep, those issues certainly exist. That's why adding another VS to the NV20 to create the NV25 wasn't that easy.
However, while those issues still exist with the NV3x architecture, I think the cache problem is a lot less significant.
There is cache which gets info from the VS and keeps it until Triangle Setup is ready to retrieve it. The more VS units there are, the more such cache there is.
However, in the NV3x case, more units simply means it's going faster. Not that there are more Vertex Shading units. So that means such cache doesn't have to be added if you add execution units.

I couldn't comment too much on the other things, because so little has been said about how the GFFX VS really works.

Oh, and your example of diversification is pretty off-base. ATI currently has the more diversified product range of the two. NV only has discrete graphics, chipsets and console (X-Box), whereas ATI has discrete graphics, chipsets, console (GameCube), set-top box and PDA graphics.

All-in-all, an interesting read.

Damn, forgot ATI got set-top box and PDA graphics. Okay, so, err...
What about saying nVidia will have an advantage in future similar products, since they can put more engineers on it, and thus will be able to get an edge over ATI fairly rapidly? And anyway, I don't think ATI market penetration is too good in those areas ( not sure about that, however, I'd like to have some info about it )

BTW, thanks for reading it all and saying it was interesting :)


Uttar

tamattack
01-02-03, 05:05 PM
Originally posted by Uttar
However, in the NV3x case, more units simply means it's going faster. Not that there are more Vertex Shading units. So that means such cache doesn't have to be added if you add execution units.

Hrmmm... but then wouldn't each of the 'sub-units' need to be operating on something (ie: data)? That would still imply increases in temporary registers/caches to be able to supply such data to each 'sub-unit' for processing.

Originally posted by Uttar
Damn, forgot ATI got set-top box and PDA graphics. Okay, so, err...
What about saying nVidia will have an advantage in future similar products, since they can put more engineers on it, and thus will be able to get an edge over ATI fairly rapidly? And anyway, I don't think ATI market penetration is too good in those areas ( not sure about that, however, I'd like to have some info about it )

I don't have any specific figures on market penetration. ATI actually have a good foothold in set-top box (although that is more through acquisition than internal development), and their PDA chip (Imageon) is currently shipping in a Toshiba PocketPC.

Originally posted by Uttar
BTW, thanks for reading it all and saying it was interesting :)

You're welcome! :D

Uttar
01-02-03, 06:13 PM
Originally posted by tamattack
Hrmmm... but then wouldn't each of the 'sub-units' need to be operating on something (ie: data)? That would still imply increases in temporary registers/caches to be able to supply such data to each 'sub-unit' for processing.

Probably. But so little is known about how it truly works, so it's hard to say.
But even if it's the case, doubling the number of execution units probably wouldn't increase cache use any more than doubling the number of pipelines would. So, you've got one area with fewer additional transistors, and one with probably as many additional transistors.


Oh, and I just found an interesting quote which might mean that Fusion was single-chip:
http://www.beyond3d.com/interviews/tarolli/index.php?page=page3.inc
"If multiple chips are cheaper or even slightly more expensive than a single chip, but offer higher performance, I see no reason why they should be less profitable. If you can reach higher levels of performance with multiple chips, you can usually extract higher margins in these products. You will see scalable solutions from 3dfx in the future. I'm not saying that every product will be scalable, just that we will continue to move forward with product scalability in mind."

Notice how he suddenly says "scalable" instead of "multi-chip"?
That could mean Fusion was using the same idea as the NV3x: achieving scalability *without* multi-chip, by making the design more flexible. Sounds interesting! :D


Uttar

tamattack
01-03-03, 12:35 PM
Yes, better and faster graphics is always interesting, regardless of the methods!

kingsna
01-05-03, 06:29 AM
if you know a thing or two about chip design/programming in general, you should know that as the complexity of the chip grows with each new architecture, you just can't sit down and play with different transistor configurations anymore, because if you do, then by the time you finish the chip it's obsolete!

so what's the answer?
how can you add an extra 100 million or so transistors to a new chip in 2 years (not considering on-chip memory) when adding 50-60 million of them took 2 years before?
how about adding an extra 250 million in the next 2 years?

the answer is to add MANY parallel general purpose functional units to the chip (more cpu like): you make a few hundred fundamental processors (totalling about 50-60 million transistors) and once they're done, you multiply them up to the number of transistors each chip needs (more for high end, less for mainstream), which significantly reduces design times compared to future designs using conventional design methods.

but as we know this strategy requires MAJOR architectural changes from the past.

as we already know, the keys to photorealistic rendering are PS, VS, TCL, and FSAA, so in the future we'll see fewer FEATURES and more SPEED improvements, and from now on we can focus on more speed for making more complex/detailed digital worlds rather than more realistic looking ones, because frankly at some point you can't get more realistic!!

so now that we have the basic program right, we can break it into millions of parts and make 2,4,8,... copies of every part and, by using our PARALLEL (easily scalable) architecture, render the final image 2,4,8,... times faster!

now with this i say that NVIDIA is on the right track and if they can make a good parallel architecture (INTEL haven't or couldn't do it so far!) they're gonna rule the 3D world for Many years to come!

nutball
01-06-03, 04:26 AM
Originally posted by kingsna
now with this i say that NVIDIA is on the right track and if they can make a good parallel architecture (INTEL haven't or couldn't do it so far!) they're gonna rule the 3D world for Many years to come!

Two things to say about this.

Firstly, NVIDIA have a much, much easier job making a relevant parallel processor than Intel. If you've ever done any parallel programming this should be obvious.

Secondly, from what NVIDIA have said about the GFFX architecture, it's clear that the basic model you outline (a chip composed of N similar functional units) is already what they're doing. I would assume ATi are doing the same, though it's not so apparent.

In my view this approach doesn't represent a major architectural change from the past in the graphics market.

The architectural challenges facing NVIDIA are very different from the architectural challenges facing Intel.

rwolf
01-16-03, 12:59 AM
Originally posted by Uttar
Hey,

Finally finished my latest and longest speculation. Would have posted it here, but it's more than 15000 letters! And I don't really like the idea of making 2 posts for a single message...
So, I posted it at http://www.notforidiots.com/NV3x.html
Okay, so there's nothing on the site yet besides that. But it'll come. Eventually. And then I'll integrate that into the site. Eventually. :p

Anyway, any feedback on that whole crazy idea of mine?


Uttar

I am not sure what point you are trying to make here.

-The radeon 9000 was redesigned to save cost. The core is half the size of the 8500 with slightly less performance. I think this core is the same as the mobility chip as well (not 100% sure).

-I think you are right when you say that nvidia will probably keep this design and tweak it for the next two years. The Geforce 1 through 4 are pretty much the same chip except for the addition of pixel shaders, enhanced AA, and process tweaks. This allowed them to use the same drivers for years and gave them a good reputation for driver quality.

-ATI is a bigger company than nvidia.

ATI has more than 1,900 employees in the Americas, Europe and Asia. -- ATI's web site.

More than 1300 employees worldwide -- nvidia's web site

-Textures are still going to be critical and pixel shaders aren't going to be as beneficial as you might think.