View Full Version : Amazing nv30 info from Nvidia


nv30fan
11-08-02, 07:24 AM
Check the following URL:
http://developer.nvidia.com/view.asp?IO=ArtFutura_Conf

You can grab two PPT files describing some amazing features of the NV30 in depth!

You can also look at:
http://developer.nvidia.com/view.asp?IO=IndieGamesConf

This information confirms the power of Cg + NV30.

I wonder if we are going to have a paper launch or not?

As far as I know, no information concerning an HLSL for the R300
in OpenGL has been released by ATI!

Uttar
11-08-02, 08:55 AM
There's nothing amazing there.
Yes, I've read it all. 99% of it was already known and comes from other documents. However, one piece of information is new. Can you find it? :)


Uttar

egdusp
11-08-02, 10:00 AM
@ Uttar (or with just 1 t ?):
Do u wanna tease? :)

Uttar
11-08-02, 10:24 AM
Originally posted by egdusp
@ Uttar (or with just 1 t ?):
Do u wanna tease? :)

It's just that I wonder if you people will be able to find it.
And really, I just don't feel like saying what I found in those documents (actually, it's in the Spanish one).
And besides that, I think you'll be surprised once you've figured out what it means if you combine it with other documents.


Uttar

egdusp
11-08-02, 10:46 AM
Well, yeah I guess u wanna tease :)

By the way, I don't intend to search, because right now I don't have the time. I'll wait until somebody tells me (even if it is NV at Comdex).

Philibob
11-08-02, 11:17 AM
Originally posted by Uttar
And really, I just don't feel like saying what I found in those documents (actually, it's in the Spanish one).
Do I have to read the Spanish to find this out?
And besides that, I think you'll be surprised once you've figured out what it means if you combine it with other documents.

Good surprise or bad? :)

Uttar
11-08-02, 02:08 PM
It would have been a bad surprise.

However, I guess I spoke too fast. Good thing I didn't say what I found initially: I would have made myself look stupid once the official clock rates were announced.

In the Spanish document, you find the following quote:
"Hasta 200 M vértices/s"
And that means:
"Up to 200 M vertices/s"

And in an NV30 specification document, you find that the NV30 is, clock for clock, 'over' 1.5x faster than the NV25 and 'over' 3x faster than the NV20.

Now, if you take the Ti4600's vertices/s (which is what nVidia has used for comparison in the past), which is 136 M/s, and multiply that by 1.5...

You get 204 M/s, and the Ti4600 clock rate is 300 MHz.
Conclusion: the NV30 clock rate would have been 300 MHz instead of the "over 400 MHz" estimate of other rumors.
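
The arithmetic above can be checked with a short sketch. It assumes the vertex rate scales linearly with clock speed and takes the "over 1.5x" per-clock figure as exactly 1.5; the function names are made up for illustration.

```python
# Figures quoted in the post; the per-clock factor treats "over 1.5x" as exactly 1.5.
TI4600_VERTS_M = 136.0     # million vertices/s
TI4600_CLOCK_MHZ = 300.0   # core clock in MHz
PER_CLOCK_SPEEDUP = 1.5    # assumed NV30 vs NV25 factor at equal clock


def implied_rate(clock_mhz):
    """Vertex rate (M/s) at a given clock, assuming linear scaling with clock."""
    per_mhz = TI4600_VERTS_M / TI4600_CLOCK_MHZ * PER_CLOCK_SPEEDUP
    return per_mhz * clock_mhz


def implied_clock(rate_m):
    """Clock (MHz) needed to reach a given vertex rate under the same assumption."""
    return rate_m / implied_rate(1.0)


print(implied_rate(300.0))    # rate at the Ti4600's clock
print(implied_clock(200.0))   # clock implied by "up to 200 M vertices/s"
print(implied_rate(400.0))    # rate if the 400 MHz rumor held
```

Under these assumptions, 200 M vertices/s corresponds to a clock just under 300 MHz, which is where the 300 MHz conclusion comes from.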

But I realized something: that 200 M vertices/s is speculative. When that was written, nVidia wasn't sure what the clock rate was going to be. So they gave the minimum that could happen, to be sure not to give developers false hopes.

Also, in yesterday's investor conference, Jen-Hsun Huang said that yields were higher than expected. As nVidia said before, it's yields that determine clock rates. So that means they'll certainly set clock rates higher than their initial expectations.


Uttar

DadGT
11-08-02, 02:21 PM
I was wondering if you meant:
"16 texturas activas, 8 grupos de coordenadas de texturas interpoladas, 2 texturas a máxima velocidad"
("16 active textures, 8 sets of interpolated texture coordinates, 2 textures at full speed")

Which to me indicates that it will be an 8x2 pipeline. It seems all the rumors are converging on this number.

Uttar
11-08-02, 02:56 PM
Oh, nice, didn't notice that one, DadGT.

BTW, found something else:
"Listas de indices directamente en DMA" ("Index lists directly in DMA")

Sounds like that's what the DMA patent I spotted earlier is about. Faster access to indices? Sounds like an idea, but I doubt that has ever been a bottleneck.

"Posibilidad de reiniciar cintas de triangulos con cualquier indice seleccionado"

That means:
"Possibility of reinitiating strips of triangles with no index selected"

Now, that's a rather rough translation. From my understanding, that means the NV30 will be able to figure out vertices being reused without the programmer giving it such information.
That's a feature which simply doesn't exist in any current GPU AFAIK. No R300 document talks about such a thing.
I've got no idea if this has any real use... I guess we'll see.


Uttar

Nutty
11-08-02, 03:18 PM
Sounds more like being able to use the vertex cache on non-indexed vertices. Although that sounds easy, when you think about it, it isn't, which is precisely why on current GPUs the vertex cache is only used in indexed mode.

Demirug
11-08-02, 07:42 PM
Originally posted by Uttar
Oh, nice, didn't notice that one, DadGT.

BTW, found something else:
"Listas de indices directamente en DMA" ("Index lists directly in DMA")

Sounds like that's what the DMA patent I spotted earlier is about. Faster access to indices? Sounds like an idea, but I doubt that has ever been a bottleneck.


Yes, it looks like the NV30 can finally store the indices in video RAM. The DMA patent is IMO not for graphics chips. It looks like it can be used for chipsets like nForce.


"Posibilidad de reiniciar cintas de triangulos con cualquier indice seleccionado"

That means:
"Possibility of reinitiating strips of triangles with no index selected"

Now, that's a rather rough translation. From my understanding, that means the NV30 will be able to figure out vertices being reused without the programmer giving it such information.
That's a feature which simply doesn't exist in any current GPU AFAIK. No R300 document talks about such a thing.
I've got no idea if this has any real use... I guess we'll see.

Uttar

If I have understood this right, this feature is not new. You can find it in the OpenGL for NV30 document. It simply allows you to draw more than one strip in one call, using a special index value as a separator.

P.S.: Excuse my bad English; it is not my first language.
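
The separator idea can be sketched in a few lines. This is only an illustration of the concept, not the actual NV30 or OpenGL mechanism: the sentinel value `RESTART` and the function name are hypothetical.

```python
RESTART = 0xFFFF  # hypothetical sentinel index meaning "start a new strip"


def strips_to_triangles(indices, restart=RESTART):
    """Expand one index stream holding several strips into individual triangles."""
    triangles = []
    strip = []  # indices seen so far in the current strip
    for i in indices:
        if i == restart:
            strip = []  # the separator ends the current strip
            continue
        strip.append(i)
        if len(strip) >= 3:
            a, b, c = strip[-3:]
            # Every second triangle in a strip has flipped winding; swap to fix it.
            if (len(strip) - 3) % 2 == 1:
                a, b = b, a
            triangles.append((a, b, c))
    return triangles


# Two strips submitted as a single index stream.
stream = [0, 1, 2, 3, RESTART, 4, 5, 6]
print(strips_to_triangles(stream))
```

No triangle spans the separator, so the two strips stay independent even though they arrive in one stream.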

Uttar
11-09-02, 04:48 AM
Originally posted by Demirug
Yes, it looks like the NV30 can finally store the indices in video RAM. The DMA patent is IMO not for graphics chips. It looks like it can be used for chipsets like nForce.

Hmm, makes sense. The nForce's integrated graphics use system RAM as graphics memory. So I guess that's what the nForce 2 uses...


Uttar

MikeC
11-09-02, 10:21 AM
Originally posted by Nutty
Sounds more like being able to use the vertex cache on non-indexed vertices.

Who defines that a vertex is to be indexed or not indexed? The developer?

Uttar
11-09-02, 10:40 AM
Originally posted by MikeC
Who defines that a vertex is to be indexed or not indexed? The developer?

Yep, it's the developer.
In 90% of cases, indexed primitives are better. So I kinda fail to understand why nVidia would even want to implement that. I guess we'll see that soon...


Uttar

MikeC
11-09-02, 02:46 PM
Originally posted by Uttar
Yep, it's the developer.
In 90% of cases, indexed primitives are better.

Interesting. Forgive the questions as I'm getting back into OpenGL programming - again :)

So if each vertex is indexed, then each vertex has a unique identifier or name if you will.

From a database perspective, I understand that indexes are beneficial since they significantly reduce the amount of I/O associated when obtaining a record. On the other hand, an index requires additional storage, which needs to be reorganized on occasion.

Demirug
11-09-02, 03:11 PM
Originally posted by MikeC
Interesting. Forgive the questions as I'm getting back into OpenGL programming - again :)

So if each vertex is indexed, then each vertex has a unique identifier or name if you will.

From a database perspective, I understand that indexes are beneficial since they significantly reduce the amount of I/O associated when obtaining a record. On the other hand, an index requires additional storage, which needs to be reorganized on occasion.

If you work with indexed vertex data, you have two lists. The first list contains the vertex data and nothing more. The position in the list is equal to the index value for that entry. The second list contains only index values. When the chip renders, it gets the indices for the next triangle from the index list and looks each one up in the vertex list, or in the vertex cache if the vertex has already been calculated and is still in the cache.

Hope somebody can understand this.
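
As a rough sketch of the two-list scheme, the following counts how many vertices have to be transformed when the index list is walked with a small cache. The FIFO replacement policy, the cache size and the function name are assumptions for illustration; real hardware may behave differently.

```python
from collections import deque


def count_transforms(indices, cache_size):
    """Count vertex transforms when walking an index list with a FIFO cache."""
    cache = deque(maxlen=cache_size)  # holds indices of recently transformed vertices
    transforms = 0
    for i in indices:
        if i in cache:
            continue  # cache hit: reuse the already transformed vertex
        transforms += 1  # cache miss: fetch from the vertex list and transform
        cache.append(i)
    return transforms


# First list: vertex data only. The position in the list is the index value.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
# Second list: index values only, three per triangle (two triangles forming a quad).
indices = [0, 1, 2, 2, 1, 3]

print(count_transforms(indices, cache_size=0))  # no cache: every index is transformed
print(count_transforms(indices, cache_size=4))  # with a cache: shared vertices are reused
```

With the cache, the two vertices shared by both triangles are transformed only once.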

Uttar
11-09-02, 03:13 PM
Indexed primitives are only at a disadvantage if the vast majority of the vertices are not shared. An index simply stores a number, and that number refers to a vertex in a vertex buffer.

Of course, you can use indexed primitives for part of the scene and non-indexed ones for other parts of the scene.
When you use indices, your DrawIndexedPrimitive calls actually care about the number of indices you want to draw - not vertices.

Now, even if indexing did increase AGP usage and GPU memory bandwidth usage (not sure on that one), several caches are automatically used when you index. Those CANNOT be used if nothing is indexed. And that's why indexed primitive performance is superior.
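
To put rough numbers on the storage side of this trade-off, here is a small sketch comparing a plain triangle list with an indexed one. The 32-byte vertex, the 16-bit indices and the 32x32 grid are made-up example values, not figures from any document discussed here.

```python
def list_bytes(triangles, vertex_bytes):
    """Storage for a non-indexed triangle list: three full vertices per triangle."""
    return triangles * 3 * vertex_bytes


def indexed_bytes(unique_vertices, triangles, vertex_bytes, index_bytes=2):
    """Storage for an indexed list: each vertex once, plus three indices per triangle."""
    return unique_vertices * vertex_bytes + triangles * 3 * index_bytes


# A 32x32 grid of quads: 2048 triangles sharing 33x33 = 1089 vertices.
tris = 32 * 32 * 2
print(list_bytes(tris, 32))
print(indexed_bytes(33 * 33, tris, 32))

# Worst case: no vertex is shared, so the indices are pure overhead.
print(indexed_bytes(tris * 3, tris, 32))
```

The grid case shows the saving when most vertices are shared, and the last line shows the case where indexing only adds the cost of the index list.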

If Nutty is right, then this could be useful in some nearly worst-case scenarios. If Demirug is right, err, I don't know. I didn't really look at the complete NV30 OpenGL document because I don't know OpenGL...


Uttar

Nutty
11-09-02, 03:27 PM
I doubt I am. To compare if 2 non indexed vertices are the same, you have to do (potentially) full floating point comparisons on all of the vertex attributes.

But that's just what it looked like... but I'm prolly wrong. :)
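
For comparison, this is roughly what finding shared vertices looks like when done in software: every attribute of every vertex has to be compared (here via a dictionary lookup on the whole attribute tuple). It is a minimal sketch using exact equality, and says nothing about how hardware would or would not do it; the function name is hypothetical.

```python
def build_index_buffer(raw_vertices):
    """Turn a non-indexed vertex list into (unique vertices, index list)."""
    unique = []   # each distinct vertex, stored once
    lookup = {}   # full attribute tuple -> position in `unique`
    indices = []
    for v in raw_vertices:
        key = tuple(v)  # ALL attributes take part in the comparison
        if key not in lookup:
            lookup[key] = len(unique)
            unique.append(key)
        indices.append(lookup[key])
    return unique, indices


# Two triangles sharing an edge, submitted as six separate vertices (x, y, z).
raw = [
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
    (0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0),
]
unique, indices = build_index_buffer(raw)
print(len(unique), indices)
```

Exact comparison treats vertices that differ by a tiny rounding error as different, which is one reason this is usually done once, offline, rather than per frame.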

MikeC
11-09-02, 03:37 PM
Originally posted by Demirug
If you work with indexed vertex data, you have two lists. The first list contains the vertex data and nothing more. The position in the list is equal to the index value for that entry. The second list contains only index values. When the chip renders, it gets the indices for the next triangle from the index list and looks each one up in the vertex list, or in the vertex cache if the vertex has already been calculated and is still in the cache.

Hope somebody can understand this.

Thanks for the explanation. What kind of data is associated with a vertex? X, Y, and Z coordinates? Anything else?

Looks like I need to create a 3D technology forum :)

Demirug
11-09-02, 03:55 PM
Originally posted by MikeC
Thanks for the explanation. What kind of data is associated with a vertex? X, Y, and Z coordinates? Anything else?

Looks like I need to create a 3D technology forum :)

MikeC, on the input side a vertex contains up to 16 4D vectors. You can store anything you want in this array.

Examples are: coordinates, colors, normals, texture coordinates.

On the output side a vertex contains (DirectX D3D):
2 color values (diffuse/specular) as 4D vectors
1 fog value as a scalar
position as a 4D vector
1 point size as a scalar
8 texture coordinates as 4D vectors

At the very least, the vertex program has to set the position. All other values are optional.
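
The layout described above could be written down roughly as follows. This is only a sketch of the data shape as listed in this post; the class, field and function names are invented and do not correspond to any real API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec4 = Tuple[float, float, float, float]

MAX_INPUT_VECTORS = 16  # input side: up to 16 generic 4D vectors per vertex


@dataclass
class VertexOutput:
    """Output side of a vertex program as listed above; only position is mandatory."""
    position: Vec4
    diffuse: Optional[Vec4] = None
    specular: Optional[Vec4] = None
    fog: Optional[float] = None
    point_size: Optional[float] = None
    texcoords: List[Optional[Vec4]] = field(default_factory=lambda: [None] * 8)


def validate_input(attributes):
    """Reject vertices that carry more input vectors than the limit."""
    if len(attributes) > MAX_INPUT_VECTORS:
        raise ValueError("too many input vectors for one vertex")
    return attributes


# Input: position, normal and one texture coordinate set, each padded to 4D.
vin = validate_input([(0.0, 0.0, 0.0, 1.0), (0.0, 0.0, 1.0, 0.0), (0.5, 0.5, 0.0, 1.0)])
# Output: the program sets the position and passes one texture coordinate through.
vout = VertexOutput(position=vin[0])
vout.texcoords[0] = vin[2]
print(vout)
```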

Demirug
11-09-02, 04:02 PM
Originally posted by Nutty
I doubt I am. To compare if 2 non indexed vertices are the same, you have to do (potentially) full floating point comparisons on all of the vertex attributes.

But that's just what it looked like... but I'm prolly wrong. :)

Yes Nutty, it is very simple to do something like this. But I am asking myself:

Why should NVIDIA tell every developer that using indexed vertices is better and then implement a feature in the chip that works in the other direction?

IMO this is a waste of die area. Making the post vertex cache larger would give us more performance.

Uttar
11-09-02, 04:04 PM
Okay, so just to add a little detail:
Since sending vertices whose size is a multiple of 32 bits is faster, because the memory bus width is 32x4=128 (and 64x4=256 on an R300), it may sometimes be useful to put useless components in the vertex. It may also be useful to compress components, but that's harder to achieve.
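
A quick way to see how much padding such alignment would cost is sketched below. It assumes the target is the full bus width (16 bytes for a 128-bit bus, 32 bytes for a 256-bit bus), which is one reading of the post and not a documented requirement; the helper names and the example vertex layout are made up.

```python
def padded_size(size_bytes, multiple_bytes):
    """Round size_bytes up to the next multiple of multiple_bytes."""
    return -(-size_bytes // multiple_bytes) * multiple_bytes


def padding_floats(size_bytes, multiple_bytes):
    """Number of unused 4-byte components needed to reach the padded size."""
    return (padded_size(size_bytes, multiple_bytes) - size_bytes) // 4


# Example vertex: position (3 floats) + normal (3 floats) + packed color (4 bytes).
vertex_bytes = 3 * 4 + 3 * 4 + 4

for bus_bytes in (16, 32):  # 128-bit and 256-bit bus widths, assumed as targets
    print(bus_bytes, padded_size(vertex_bytes, bus_bytes), padding_floats(vertex_bytes, bus_bytes))
```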


Uttar

Uttar
11-09-02, 04:05 PM
Originally posted by Demirug
Yes Nutty, it is very simple to do something like this. But I am asking myself:

Why should NVIDIA tell every developer that using indexed vertices is better and then implement a feature in the chip that works in the other direction?

IMO this is a waste of die area. Making the post vertex cache larger would give us more performance.

Maybe the vertex cache will be significantly larger than the NV25's and R300's. And maybe they thought that it would be wasted die space if it couldn't be used in all situations?


Uttar

Russell Klenk
11-09-02, 08:22 PM
Originally posted by Demirug
If I have understood this right, this feature is not new. You can find it in the OpenGL for NV30 document. It simply allows you to draw more than one strip in one call, using a special index value as a separator.


I would tend to lean towards this as well. One of the major problems with using strips (in Direct3D with PCs at least) is that you can only render a single strip per DrawIndexedPrimitive call. DIP calls in D3D tend to be expensive, so unless you've combined all of your smaller strips into a single strip connected by degenerate triangles, you end up with a lot of DIP calls. Since strips can only give correct results when all triangles in the strip share the same material and smoothing group, you tend to end up with a lot of small strips. So, before this feature was added, you either ended up with a lot of degenerate tris (2 per strip) or a lot of DIP calls. Either way, you end up with sub-optimal performance, since IIRC degenerate tris don't get rejected until the triangle setup stage (though this isn't as big a performance hit as the extra DIP calls). The Xbox and Dreamcast versions of Direct3D never had this problem, since the API already included this feature.
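
The degenerate-triangle workaround can be sketched as follows. This shows one common stitching scheme (repeat the last index of one strip and the first index of the next); the function names are made up, and how many degenerate triangles result depends on the scheme used.

```python
def stitch(strips):
    """Join several strips into one by repeating indices at each seam."""
    out = []
    for s in strips:
        if out:
            out.append(out[-1])  # repeat the last index of the previous strip
            out.append(s[0])     # repeat the first index of the next strip
        out.extend(s)
    # Caveat: winding order is only preserved if each preceding strip has an
    # even number of indices; otherwise one more repeated index is needed.
    return out


def triangles(strip):
    """All triangles a strip produces, ignoring winding order."""
    return [tuple(strip[i:i + 3]) for i in range(len(strip) - 2)]


def is_degenerate(tri):
    """A triangle with a repeated index has zero area and draws nothing."""
    return len(set(tri)) < 3


joined = stitch([[0, 1, 2, 3], [4, 5, 6, 7]])
tris = triangles(joined)
real = [t for t in tris if not is_degenerate(t)]
print(joined)
print(len(tris), len(real))
```

The joined strip can go out in a single draw call, at the cost of the extra zero-area triangles that the hardware has to discard.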