Old 09-26-02, 10:18 AM   #37
Registered User
Uttar's Avatar
Join Date: Aug 2002
Posts: 1,354

Okay, since several people have replied since I asked for info about R300 branching, here's a disclaimer:

As of the time of this writing, reviews and papers seem to indicate that R300 branching will be STATIC and NV30 branching will be DYNAMIC.
However, no reliable source is certain of that. So if this is not the case, the following information is NOT of any use and should be ignored. Thank you.

In my original post, I assumed that both NV30 and R300 branching were dynamic, but that the NV30 had more loops than the R300.
According to several people around here, this is not the case, and the NV30 doesn't have a significant increase in instruction and loop counts compared to the R300.

So, since I don't want the "NV30 will be better than R300, but you're exaggerating it" crowd to be able to annoy me, I'm going to assume the following is true:
R300 (static) instructions: 255
R300 loop number: 255
NV30 (static) instructions: 256
NV30 loop number: 256

Okay, so let's begin...

In DirectX (and OpenGL too, I assume, but I am not familiar with that API), you use DrawIndexedPrimitive (DIP) to draw polygons.
According to nVidia, a good target is at most 500 DIP calls, and fewer is better.

As I said before, several things force you to use more than one DIP call in a program:
texture changes, VS/PS changes, render state changes, and so on.

nVidia gives an EXCELLENT example of what dynamic branching offers that makes static branching look truly inferior in many cases.

And that's Matrix Palette Skinning. I won't explain exactly what it's used for, but basically, instead of doing a lot of work on the CPU and sending new Vertex/Index Buffer data every frame, you keep a static VB/IB (faster) and do all the work on the GPU (which is faster if the CPU is the bottleneck, slower if it isn't - according to nV, 80% of games are CPU limited).
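For readers who haven't seen it, here is a rough sketch of the math behind matrix palette skinning, written as plain Python on the CPU just to show the idea (on real hardware this would run in the vertex shader). The function names, the 3x4 matrix layout and the two-bone palette are all made up for illustration; they are not taken from any nVidia or ATI material.

```python
# Illustrative only: blend one vertex through a "palette" of bone matrices.
# Each matrix is 3x4 (3 rows of [rotation/scale x3, translation]).

def transform(m, v):
    """Apply a 3x4 matrix m to a 3D point v."""
    x, y, z = v
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in m)

def skin_vertex(position, bone_indices, bone_weights, palette):
    """Weighted sum of the position transformed by each referenced bone."""
    out = [0.0, 0.0, 0.0]
    for idx, w in zip(bone_indices, bone_weights):
        p = transform(palette[idx], position)
        for i in range(3):
            out[i] += w * p[i]
    return tuple(out)

# Hypothetical palette: bone 0 does nothing, bone 1 moves +2 along X.
IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
SHIFT_X2 = [[1, 0, 0, 2], [0, 1, 0, 0], [0, 0, 1, 0]]
palette = [IDENTITY, SHIFT_X2]

# A vertex influenced half by each bone ends up halfway between the two results.
skinned = skin_vertex((1.0, 0.0, 0.0), [0, 1], [0.5, 0.5], palette)
```

The point for this discussion is that the number of bones per vertex can differ across a mesh, so the loop length differs too, and that is where branching comes in.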

Note that X is the same in all of the examples below.

With DX8 ( NV20/R200 ), the following happens:

1. Select Vertex Buffer/Index Buffer with X Bytes/Vertex
2. Select Vertex Shader 1
3. DIP
4. Select Vertex Shader 2
5. DIP
6. Select Vertex Shader 3
7. DIP
8. Select Vertex Shader 4
9. DIP

With the R300, the following happens:

1. Select Vertex Buffer/Index Buffer with X Bytes/Vertex
2. Select Vertex Shader
3. Select Branching 1
4. DIP
5. Select Branching 2
6. DIP
7. Select Branching 3
8. DIP
9. Select Branching 4
10. DIP

However, note that selecting a VS is slower than selecting a branch, so this is still more efficient than the NV20 style even though it looks like more work.

With the NV30, the following happens:

1. Select Vertex Buffer/Index Buffer with *more* ( not a lot more, however ) than X Bytes/Vertex
2. Select Vertex Shader
3. DIP
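To make the comparison concrete, here is a small sketch that just rebuilds the three step lists above and counts them. It is a toy model, not real Direct3D code: the operation names are hypothetical labels, and it assumes 4 groups of vertices as in the lists.

```python
# Toy model: each function returns the list of API-level steps described above.

def dx8_style(groups):
    """NV20/R200: one vertex shader per group, one DIP per group."""
    ops = ["set_buffers"]
    for g in range(1, groups + 1):
        ops += [f"set_vertex_shader_{g}", "DIP"]
    return ops

def static_branching(groups):
    """R300 (if static): one shader, but a branch selection + DIP per group."""
    ops = ["set_buffers", "set_vertex_shader"]
    for g in range(1, groups + 1):
        ops += [f"set_branch_{g}", "DIP"]
    return ops

def dynamic_branching(groups):
    """NV30 (if dynamic): the shader branches per vertex, so one DIP is enough."""
    return ["set_buffers", "set_vertex_shader", "DIP"]

GROUPS = 4
summary = {
    name: (len(ops), ops.count("DIP"))
    for name, ops in [
        ("dx8", dx8_style(GROUPS)),
        ("static", static_branching(GROUPS)),
        ("dynamic", dynamic_branching(GROUPS)),
    ]
}
# summary maps each approach to (total steps, DIP calls)
```

With more groups, the first two grow linearly while the dynamic case stays at one DIP call, which is the whole argument in a nutshell.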

Need I say more? Well, actually, yes. I have to explain why it takes MORE than X Bytes/Vertex and why that isn't a problem at all.

You see, with the NV20/R200/R300, branching is done per-object.
But with the NV30, branching can be done per-vertex.

So, every vertex has to carry more information than before, and therefore takes more bytes.
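As a rough idea of the scale involved, here is a sketch of the size difference using Python's struct module. The vertex layout is purely an assumption for illustration (position, normal, one texture coordinate set, plus 4 extra bytes of per-vertex branch data); real layouts will differ.

```python
import struct

# Hypothetical layouts, packed with no padding ("<"):
# 3 floats position + 3 floats normal + 2 floats texcoord
BASE_FORMAT = "<3f3f2f"
# Same, plus 4 bytes of per-vertex control data for the branch (assumed size)
EXTENDED_FORMAT = "<3f3f2f4B"

base_bytes = struct.calcsize(BASE_FORMAT)          # this plays the role of X
extended_bytes = struct.calcsize(EXTENDED_FORMAT)  # "more than X"
extra_percent = 100.0 * (extended_bytes - base_bytes) / base_bytes
```

Under these assumed layouts the increase is a few bytes per vertex, which fits the "more, but not a lot more" description above.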

But then, why doesn't that matter? Like it or not... AGP 8X
Yep. An increase in AGP bandwidth that actually has a reason to exist.

You see, information is sent to the GPU over AGP (or PCI, but certainly not in the NV30's case). The NV28 using AGP 8X is completely insane - it'll never use that much bandwidth. It'll hardly even be able to process all that information!

The NV30, on the other hand, with per-vertex branching, could really begin to have some use for AGP 8X! (Wow... After all those years. *grins*)

If you want any more information, or have proof that the R300 has dynamic branching too (I *am* certain the NV30 has it, but I'm not sure about the R300), just say so - it's always nice to see people actually reading what I write.
