View Full Version : Technical details about the 5700


msxyz
09-08-04, 12:15 PM
I am curious: did NVidia release any technical papers (no marketing BS) about the internal layout of the NV36, such as a block diagram? I was looking for some in-depth info, but without luck so far. Are they available to the public?

Thanks

Daneel Olivaw
09-08-04, 12:31 PM
All I know is it's a 4x1 with 128-bit RAM.

msxyz
09-08-04, 06:03 PM
It's not even a true 4x1 but rather a 2x2 with a limited ability to write 4 pixels at once in certain situations, and synthetic benchmarks tend to confirm this. But I was looking for something more in-depth.

Daneel Olivaw
09-09-04, 09:11 AM
I'd like some facts to support that it's a 2x2... I'm not convinced.

msxyz
09-09-04, 12:40 PM
Benchmark Date/Time : 09/09/2004 17.27.46

System Information
-----------------------------------------------------------
CPU : Intel(R) Celeron(TM) CPU
GFX : NVIDIA GeForce FX 5700 Ultra
OS : Microsoft Windows XP

Benchmark Result
-----------------------------------------------------------
FrameBuffer Clear : N/A
Color Fill : N/A
Z Fill : 1689,846 M-Pixel/s
Color + Z Fill : N/A
Single Texture : 1105,056 M-Texel/s
Dual Textures : 1579,745 M-Texel/s
Triple Textures : 1141,309 M-Texel/s
Quad Textures : 1032,192 M-Texel/s
1 Floating Poing Texture : N/A
Render to Self : N/A
PS 1.1 Simple : 877,8547 M-Pixel/s
PS 1.4 Simple : 876,8717 M-Pixel/s
PS 2.0 Simple : 875,8886 M-Pixel/s
PS 2.0 PP Simple : 875,8886 M-Pixel/s

The benchmark was run at 640x480x16 @ 60Hz with a 16-bit Z-buffer to minimize the impact of memory bandwidth. The card was clocked at 450MHz for both core and RAM (DDR2).

Judging by the figures:
The card can write up to ~4 Z/stencil values per clock (3.75 measured).
The card can output more than two pixels per clock in single-texturing mode (2.45 measured).
The card works like a plain 2x2 configuration with an odd number of textures (look at the "Triple Textures" fill rate: 1141 M-Texel/s means 0.845 pixels per clock, as you would expect from a fixed 2x2 architecture).
The card has only two pixel-shader-enabled pipelines, for a peak throughput of ~2 pixels per clock (1.94 measured).
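
The per-clock figures above come from dividing each fill rate by the 450 MHz core clock, and by the number of texture layers for the multitexturing rows. A minimal sketch of that arithmetic, using the numbers from the benchmark output (the function and variable names are only illustrative):

```python
# Fill rates copied from the benchmark output above, with decimal commas
# converted to points. Each entry is (rate in M-pixel/s or M-texel/s,
# texture layers per pixel).
CORE_MHZ = 450.0  # core clock stated for this test run

results = {
    "Z Fill": (1689.846, 1),
    "Single Texture": (1105.056, 1),
    "Dual Textures": (1579.745, 2),
    "Triple Textures": (1141.309, 3),
    "Quad Textures": (1032.192, 4),
    "PS 2.0 Simple": (875.8886, 1),
}


def per_clock(rate_m_per_s, layers=1, core_mhz=CORE_MHZ):
    """Return (units per clock, pixels per clock) for one benchmark row."""
    units = rate_m_per_s / core_mhz   # texels (or Z values) per core clock
    return units, units / layers      # each pixel needs `layers` texels


for name, (rate, layers) in results.items():
    units, pixels = per_clock(rate, layers)
    print(f"{name:16s} {units:5.2f} per clock, {pixels:5.2f} pixels per clock")
```

The "Triple Textures" row is the interesting one: about 2.54 texels per clock, which is only about 0.85 finished pixels per clock.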

Conclusion: NV36 has a strange hybrid architecture: either only half of its pipelines are "fully fledged", meaning that they have loopback capabilities and programmable ALUs (shaders), or the card has only two pipelines but their output stage (ROPs) can write 2 Z/color values in certain situations.

This is puzzling me. That's why I asked for help from the resident experts. :)

ChrisRay
09-09-04, 06:26 PM
In my experience its fill rate seemed more in line with a 2x2 card. But its pipeline structure could even be similar to the 6600 line's and have 4 ROPs, while only being capable of 2 pixels per clock and requiring a loopback for 4 pixels/textures.

ragejg
09-09-04, 06:35 PM
...plus no fragment crossbar!! :p

Demirug
09-10-04, 02:35 AM
NV36 is a 2*2 chip that can work like a 4*1 in cases where the loopback is not needed.

There are primarily two conditions that make this possible.

1. On the output side, from the pixel processor to the ROPs, you need 2 data channels for each pixel.

2. The triangle setup needs the ability to split one texture coordinate in two.

In this mode the pixel processor fetches values for 2 pixels with TMU 1 and for 2 additional pixels with TMU 2. The triangle setup has to modify the second texture coordinate with the offset for the second pixel block. After the fetch you have 2 values for each "pixel" in the pixel processor. Both values are transferred to the ROP and interpreted as two pixels.
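
As a rough illustration of this description, here is a toy model of ideal peak throughput. It is only a sketch under simplifying assumptions: no bandwidth or setup limits, loopback costing exactly one extra clock per pass, and hypothetical function names that do not come from any NVIDIA documentation.

```python
from math import ceil

PIPES = 2          # pixel pipelines in the 2x2 layout
TMUS_PER_PIPE = 2  # texture units per pipeline


def pixels_per_clock_2x2_split(textures):
    """Idealized peak for a 2x2 chip that supports the split mode."""
    if textures <= 1:
        # No loopback needed: each TMU serves its own pixel, so the
        # chip hands 4 pixels per clock to the ROPs, like a 4x1.
        return float(PIPES * TMUS_PER_PIPE)
    # Otherwise each pipeline applies up to 2 textures per pass and
    # loops back until all layers are done.
    passes = ceil(textures / TMUS_PER_PIPE)
    return PIPES / passes


def pixels_per_clock_4x1(textures):
    """Idealized peak for a true 4x1 chip: one texture per pipe per clock."""
    return 4.0 / max(textures, 1)


for n in range(1, 5):
    print(n, pixels_per_clock_2x2_split(n), round(pixels_per_clock_4x1(n), 2))
```

In this model the two layouts agree for one, two and four textures and differ only for three (1.0 versus about 1.33 pixels per clock), which is why the triple-texture fill rate is the figure that tells them apart.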

Demirug
09-10-04, 02:39 AM
If the structure were like the 6600's, as ChrisRay suggests, you would get a different "Triple Textures" fill rate.

msxyz
09-10-04, 03:55 AM
Thanks for the info. :)

I wonder why NVidia decided to add this complexity to their midrange cards. Correct me if I am wrong, but the NV30/35 never output 8 pixel values at once, only 8 Z/stencil. And, considering the limits of this particular setup (no loopbacks, single texturing and no pixel shaders), I wonder if there are REAL situations (aside from synthetic benchmarks) where the ability to write 2 pixels at once is used. Dropping this feature would have saved a few transistors without affecting performance in real-world apps too much.

Demirug
09-10-04, 05:02 AM
IMHO nVidia reused the ROP design (and everything behind it) from NV30. In that case the ROPs can already work with 4 pixels/clock.

The ability to write 4 (simple) pixels per clock is not very useful for games at all. But in the case of CAD applications it should help much more.

msxyz
09-10-04, 07:18 AM
I believe you are right. :)

But Nvidia should have dropped the buffer compression technology (as they did for the FX5200) and integrated a true 4x1 pipeline design instead. It would have helped a lot in shading-intensive apps and improved the overall efficiency of the card.

The 14.4 GB/s bandwidth of the Ultra is a little overkill for a 2x2 card whose core runs at almost the same speed as the memory. The GeForce 3/4, in comparison, had twice the pipelines and were not as efficient with AA/Aniso as the NV3x line is (even the tiny 5200U manages to close the gap with the GeForce4 Ti when AA/Aniso are enabled).
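
For reference, the 14.4 GB/s figure can be reproduced from numbers given earlier in the thread: a 128-bit memory bus and a 450 MHz memory clock with two transfers per clock (DDR). A small sketch of that arithmetic (the helper names are only illustrative):

```python
def peak_bandwidth_gb_s(bus_bits, mem_clock_mhz, transfers_per_clock=2):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * mem_clock_mhz * transfers_per_clock * 1e6 / 1e9


def bytes_per_core_clock(bandwidth_gb_s, core_clock_mhz):
    """Peak bytes the memory can move during one core clock."""
    return bandwidth_gb_s * 1e9 / (core_clock_mhz * 1e6)


bw = peak_bandwidth_gb_s(128, 450)    # 128-bit bus, 450 MHz DDR
print(bw)                             # peak bandwidth in GB/s
print(bytes_per_core_clock(bw, 450))  # bytes available per core clock
```

At a 450 MHz core clock this works out to about 32 bytes of peak memory traffic per core clock, which gives a sense of the headroom for a chip that usually finishes around 2 pixels per clock.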

How ironic that the new GeForce 6600GT has almost the same bandwidth as my FX5700U, yet it manages to outperform even previous-generation high-end cards. Honestly, if my old faithful GeForce4 hadn't died, I would have waited a few more months before upgrading. ;)

mariuz
09-17-04, 06:36 AM
The 5700 is a nice card (from the above details). My MX440 is dead (when working in 3D) and I'm thinking of replacing it with a new card (and the 5700 fits my budget).

PS: I need it for a dual-monitor setup too.

Fahim
09-17-04, 07:41 AM
Check here for some more info:

Digit-Life (http://www.digit-life.com/articles2/gffx/nv38-36.html/nv38-36.html)