
New RV350 & R350 info...


Nebuchadnezzar
03-01-03, 01:44 PM
http://www.theinquirer.net/?article=8062

AS WE'VE WRITTEN earlier, ATI's 9600 (RV350) is expected to be introduced during the course of next week, while the Radeon 9800 Pro is also close to release. Let's recap on what we've got so far. The RV350 uses .13µ (micron) technology and has two vertex pipes and four pixel pipes (one TMU on each), while its core clock speed will be around 350MHz, though there's room for that to grow.

The Radeon 9600 will have 64MB of 128-bit DDR memory, while the Radeon 9600 Pro will have higher clock speeds and quite possibly 128MB of memory.

These cards are much cheaper to produce than the R300, we understand.

You won't need any additional power supply connection, and it looks like the RV350 will be pin compatible with the R200, so it uses a much rejigged R2XX family design.

As for performance, it appears that the 9600 Pro may only be between 20 and 50 per cent ahead of the latest Ti 4200 cards because of its inherent design.

The R350, if you recall, uses .15µ technology, has four vertex and eight pixel pipes with Smartshader 2.1 and Smoothvision 2.1. The core may be 375/350 or possibly even 400/350.

Come April, we'll likely see the Radeon 9800 Pro in Europe, which is likely to give around a 20 per cent gain in modern games, and will include 256MB of 256-bit DDR memory.

ATI will call the extended set of DX9 features the DX9++, although we suppose it could add just as many ++++++ as it wanted to.
This includes floating point 3D textures, floating point cube maps, multiple render targets (up to 4), displacement mapping and n-patches. That was done in software for the R300, but we'll see what ATI comes up with this time round the mulberry bush.
Nvidia should perhaps call its own DX9 extensions DX9## or DX9.NET.

The 350s have a very special F buffer for storing temporary pixel shader data. It allows for "virtually unlimited" pixel shader length, meaning full OpenGL 2.0 fragment shading and possibly also RenderMan shaders, all in hardware.
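A toy model may help picture what an F buffer does. The sketch below is a hypothetical illustration of the general idea from the Stanford F-buffer talk linked later in the thread, not ATI's actual hardware design: a long shader is split into passes, and each pass streams fragments through a first-in, first-out buffer, so intermediate results are kept per fragment rather than per screen pixel. The names `run_multipass`, `Fragment` and the pass functions are made up for illustration.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

# Hypothetical toy model: a fragment is a dict of named values.
Fragment = Dict[str, float]
ShaderPass = Callable[[Fragment], Fragment]


def run_multipass(fragments: List[Fragment], passes: List[ShaderPass]) -> List[Fragment]:
    """Run a shader split into passes, keeping intermediates in a FIFO (the "F buffer").

    Fragments stay in rasterization order, so two fragments that land on the
    same pixel (e.g. overlapping transparent surfaces) each keep their own data.
    """
    fbuffer: Deque[Fragment] = deque(fragments)
    for shade in passes:
        next_buffer: Deque[Fragment] = deque()
        while fbuffer:
            # Read one fragment's saved state, run this pass, store the result.
            next_buffer.append(shade(fbuffer.popleft()))
        fbuffer = next_buffer
    return list(fbuffer)


# Two fragments covering the same pixel (0, 0).
frags = [{"x": 0, "y": 0, "a": 1.0}, {"x": 0, "y": 0, "a": 2.0}]
passes = [
    lambda f: {**f, "t0": f["a"] * 2.0},      # pass 1 writes temp t0
    lambda f: {**f, "color": f["t0"] + 1.0},  # pass 2 reads t0 back
]
for f in run_multipass(frags, passes):
    print(f["color"])  # 3.0, then 5.0
```

Because the buffer is ordered by fragment rather than indexed by pixel, neither fragment overwrites the other's intermediate value, which is why this scheme can cope with transparency.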

Both have improved adaptive anisotropic filtering, improved memory controller, six times multisampling and up to 6:1 frame buffer lossless colour compression when in MSAA mode. Both have improved HyperZ III+ technology with 8x depth buffer compression.

This battle in spring will be an interesting one.

And PLEASE don't reply with 'It's the Inquirer, they're all eating BS' comments :rolleyes:

volt
03-01-03, 02:15 PM
Sorry, but they are also known for stealing information from other sites :rolleyes:

Uttar
03-01-03, 02:56 PM
Yeah. This seems fairly reliable, I even rated its reliability as "Medium" at GPU:RW ( *cough* http://www.notforidiots.com/GPURW.php *cough* )

The Inquirer couldn't have invented the F Buffer stuff.
My question, however, is what's the performance hit with that F Buffer.
The R300 has 32 temps in the PS. So you'd have to store all of them.

Each of those 32 temps has 4 components, and each component is FP24.

So... 32*4*24 = 3072 bits :eek:
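The storage figure is just temps × components × bits per component. A minimal Python check, assuming 32 vec4 FP24 temps as stated in the post (the function name is only illustrative):

```python
def saved_state_bits(num_temps: int, components: int = 4, bits_per_component: int = 24) -> int:
    """Bits needed to save every temp register of one pixel between passes."""
    return num_temps * components * bits_per_component


bits = saved_state_bits(32)
print(bits, "bits =", bits // 8, "bytes per pixel")  # 3072 bits = 384 bytes per pixel
```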

If ATI managed to keep the performance hit of that reasonable, I'm very, very impressed. This thing would pretty much increase memory bandwidth costs by 1000%!

A way to implement this would be to have a very big cache. But that would cost about a billion transistors :rolleyes:
Even if you limited the whole thing to 100 pixels, it would be way too much! It would pretty much be like having 512 temps in each PS!

So, how could ATI have managed this?
Well, first, by only storing what's going to be used later, you could easily reduce this to 2000 bits, maybe less.
However, if you actually write the program with that in mind, you might be able to store only 10 temps per pixel. That's 960 bits/pixel.
Thus, in optimal cases, it really ain't as horrible as it looks.
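Storing "only what's going to be used later" is essentially a liveness check at the point where the shader is split into passes. A minimal sketch, assuming a made-up instruction format of `(destination, [sources])`; interpolated inputs and constants are not tracked in this toy model, and `live_across_split` is a hypothetical helper, not any real compiler's API:

```python
from typing import List, Set, Tuple

# Hypothetical toy IR: each instruction is (destination_temp, [source_temps]).
Instr = Tuple[str, List[str]]


def live_across_split(program: List[Instr], split: int) -> Set[str]:
    """Temps written before `split` and still read after it (before being overwritten).

    Only these would need to be saved in the F buffer between the two passes.
    """
    written_before = {dst for dst, _ in program[:split]}
    live: Set[str] = set()
    overwritten: Set[str] = set()
    for dst, srcs in program[split:]:
        for src in srcs:
            if src in written_before and src not in overwritten:
                live.add(src)
        overwritten.add(dst)  # later reads of dst see the new value
    return live


program = [
    ("r0", []),            # e.g. a texture fetch
    ("r1", ["r0"]),
    ("r2", ["r0"]),
    # ---- pass boundary (split = 3) ----
    ("r3", ["r1"]),
    ("r0", ["r3"]),        # r0 is overwritten here, so the old r0 is not needed
    ("r4", ["r0", "r2"]),
]
live = live_across_split(program, 3)
print(sorted(live), len(live) * 4 * 24, "bits/pixel")  # ['r1', 'r2'] 192 bits/pixel
```

Here three temps exist before the split, but only two cross it, so the per-pixel storage drops accordingly.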

But wait! What if you need 10 passes?!
That's 1200 bytes/pixel (120 bytes × 10)!

Considering 1152x864 @ 60FPS:
1200*1152*864*60/1024/1024/1024 = 67GB/s

And that's without counting Color/Z/Stencil writes.
So, does this mean it's a bad feature? No.
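The 67GB/s figure can be reproduced as below. This is a sketch under the post's own assumptions (10 live FP24 vec4 temps, 10 passes, 1152x864 at 60FPS), counting each stored byte once as the post does; dividing by 1024³ gives binary gigabytes.

```python
def fbuffer_traffic_gib_s(bytes_per_pixel: float, width: int, height: int, fps: float) -> float:
    """F-buffer traffic in GiB/s, dividing by 1024**3 as the post does."""
    return bytes_per_pixel * width * height * fps / 1024**3


bits_per_pass = 10 * 4 * 24                # 10 live vec4 FP24 temps = 960 bits
bytes_per_pixel = bits_per_pass // 8 * 10  # 120 bytes x 10 passes = 1200 bytes
print(bytes_per_pixel)                                                   # 1200
print(round(fbuffer_traffic_gib_s(bytes_per_pixel, 1152, 864, 60), 1))   # 66.7
```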

If you considered that 67GB/s figure, you'd also have to realize something else.
Even with that, the Pixel Shading would *still* be the bottleneck.

Let's imagine the total takes 75GB/s ( without AA/AF )
Then, you've got theoretical figures which show the R300 as being able to do 3400 Million Instructions Per Second (MIPS).
That's at 325MHz.
Imagine the R350 at 400/375.
That means you've got 4200 MIPS & 24GB/s
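Both R350 figures follow from two simple rules, sketched below under stated assumptions: shader throughput is assumed to scale linearly with core clock (starting from the post's 3400 MIPS at 325MHz), and peak DDR bandwidth is memory clock × 2 transfers per clock × bus width, using the 256-bit bus mentioned for the 9800 Pro earlier in the thread.

```python
def scale_mips(base_mips: float, base_clock_mhz: float, new_clock_mhz: float) -> float:
    """Assumes shader throughput scales linearly with core clock."""
    return base_mips * new_clock_mhz / base_clock_mhz


def ddr_bandwidth_gb_s(mem_clock_mhz: float, bus_width_bits: int) -> float:
    """Peak DDR bandwidth in decimal GB/s: two transfers per clock."""
    return mem_clock_mhz * 1e6 * 2 * (bus_width_bits / 8) / 1e9


print(round(scale_mips(3400, 325, 400)))  # 4185
print(ddr_bandwidth_gb_s(375, 256))       # 24.0
```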

The R300 has a limit of 160 instructions, but it can only reach that in specific cases. In many, it can only do 96, sometimes even less. That's compared to the NV30's 1024.
So, let's imagine 1024 instructions for every pixel, to make a comparison with the NV30's optimal case. This would require roughly 10 passes.
As I said above, it would thus require 67GB/s.

How many instructions does it require?
509.6MI/frame. That's 30576MIPS at 60 FPS.

Now, we suppose the R350 is capable of 4200MIPS.
So that means the requirement for 60FPS would be 7.28 times the R350's theoretical PS performance.
But now, is memory bandwidth the bottleneck? NO!

75GB / 24GB = 3.125
So the requirement for 60FPS would be 3.125 times the R350's theoretical memory bandwidth.

So, in such a case, Pixel Shading is *still the bottleneck!*
And it might even still be when using 4x FSAA, thanks to lossless compression.
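The bottleneck comparison boils down to two ratios, sketched below with the post's own numbers (4200 MIPS, 24GB/s available, roughly 75GB/s needed). One observation from redoing the arithmetic: the 509.6MI/frame figure matches 512 instructions per pixel at 1152x864, while 1024 instructions would give about 1019MI/frame. The post also mixes binary (the 67GB/s figure) and decimal (24GB/s) gigabytes. In every variant the pixel-shading shortfall stays larger than the bandwidth shortfall, so the conclusion holds.

```python
WIDTH, HEIGHT, FPS = 1152, 864, 60
R350_MIPS = 4200       # assumed in the post
R350_BW_GB_S = 24      # 375MHz DDR on a 256-bit bus
NEEDED_BW_GB_S = 75    # the post's rough total, without AA/AF


def required_mips(instr_per_pixel: int) -> float:
    """Millions of shader instructions per second needed at WIDTH x HEIGHT and FPS."""
    return instr_per_pixel * WIDTH * HEIGHT * FPS / 1e6


def shortfall(required: float, available: float) -> float:
    """How many times the requirement exceeds what the chip provides."""
    return required / available


for n in (512, 1024):
    need = required_mips(n)
    print(n, round(need), round(shortfall(need, R350_MIPS), 2))
print("bandwidth:", shortfall(NEEDED_BW_GB_S, R350_BW_GB_S))
```

This prints `512 30576 7.28`, `1024 61153 14.56` and `bandwidth: 3.125`.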

Note however that if ATI wasn't smart and stored the 32 temps anyway, Memory Bandwidth would indeed be the bottleneck. But not by much.
However, this doesn't mean this won't have a performance hit compared to having 1024 instruction slots. There's still an important performance hit coming from the additional Vertex Shading work, as well as other stuff.

On the plus side, once again, Vertex Shading is unlikely to be the bottleneck, because the PS obviously would be if you used 100 instructions/pass.


So, am I positive about this technology? Yes, I am. If it's what I think it is, it would be a very, very nice thing. nVidia's technological superiority in the PS would then be a lot less significant. While I don't think this replaces a high number of instruction slots, it is a very nice addition. In the current performance state, 100 "raw" instructions and this technology are really all we'd need.

Who's going to benefit most from this? Developers. They won't be required to use software emulation when trying a huge 4096 instruction program. The hardware can still do it. Sure, it isn't very fast, but it's still a LOT faster than on the CPU!
Workstation cards could also benefit from this, and a workstation derived product might get very popular.

Another question I've got about this is if the technology could be applied to the VS. In the VS, it's currently impossible to do Multi Pass, not even in specific cases. If it could work in the VS, then ATI sure got an amazing thing with the F buffer.


Uttar

Nebuchadnezzar
03-01-03, 03:26 PM
http://graphics.stanford.edu/projects/shading/pubs/hwws2001-fbuffer/talk-html/

Uttar
03-01-03, 04:18 PM
Hmm, after rethinking this, I think it's quite unlikely you'd need 10 temps for the next pass. Slightly fewer than that should be sufficient. But then again, it might be different for each program.

Also, this seems like a very nice thing, after reading the whole presentation. Minimal memory storage required, and works with transparency. It's a really, really nice thing.

Still not as good as having infinite instruction slots, obviously :) But its cost-effectiveness seems very impressive!


Uttar

sebazve
03-01-03, 04:45 PM
:cool: :eek: :cool:
Those sound like good features... they have improved AA and AF
*cough* more problems for the gfx****

Bigus Dickus
03-02-03, 01:24 AM
I seriously doubt the F-buffer is intended to render real-time effects, or anything remotely close. After all, both NV30 and R300 are currently much too slow to run fragment programs of 1000 instructions anyway, so the performance penalty of an F-buffer is academic for the most part.

I think this hardware feature is aimed at the professional market, so that shaders of any length can be executed entirely in hardware. Even with any bandwidth penalties (which are swallowed by the overall length of time a tremendously long shader would take to execute anyway), rendering in hardware would still likely be much, much faster than rendering on general-purpose CPUs, even powerful ones.

That seems to be about the only good thing JC had to say about the current NV30... he didn't run into an instruction limit when playing around with things to see what future engines might be like. The F-buffer would remove essentially all restrictions in this regard.

Nv40
03-02-03, 03:07 AM
For me the F-buffer is a fix/workaround in the R350 for the problems of the Radeon 9700 Pro cards, which cannot multipass pixel shaders easily and correctly, as some people have pointed out in the past. It increases the capability beyond the 96/160 instruction limit (with multiple passes).

However, does that now mean that the only ATI card with FP color that will be multipass capable (without errors) will be the R350?

As a side note, the NV30 may be too slow running at its maximum shader instruction count per pass (1024, which I believe can go even higher with driver hacks; the Quadro FX can do more than 2000 PS instructions per pass), but it was very fast running 1/3 of that number.

The Nvidia truck demo uses 350+ instruction pixel shaders and it was running very smoothly in real time, so the extra programmability of the NV30 can still be useful if game developers make wise use of its long shaders and polygon counts in scenes.

I have heard of a few people in the Cg shader contest who have already reached shaders of close to 1000 instructions, so it's not something difficult to do.

BTW, some people have said that the guy from the F-buffer presentation works at Nvidia.

Hellbinder
03-03-03, 04:55 AM
Originally posted by Nv40
For me the F-buffer is a fix/workaround in the R350 for the problems of the Radeon 9700 Pro cards, which cannot multipass pixel shaders easily and correctly, as some people have pointed out in the past. It increases the capability beyond the 96/160 instruction limit (with multiple passes).

Give me a break.. WTF are you talking about.. :rolleyes:

The 9700 Pro does not have any "problems" doing DX9 shaders. Nor does it do anything "incorrectly". The crap that gets posted at this site when people get on their little Nvidia tangents is just ridiculous.

The R300 can already do multipass shading up to 4 passes, meaning a total instruction count of just under 500.

Do your flipping homework before you post completely FALSE propaganda.

-=DVS=-
03-03-03, 05:07 AM
LoL

Just because Nvidia said the GeForce FX can do 1000 PS instructions doesn't mean it actually does it in a single pass; for all we know, the GeForce FX could easily be doing several passes to calculate so many instructions :rolleyes:

And with the truck demo, no one knows the actual number of instructions or how fast it was. It could have been a fluid 30FPS at only 350+ instructions, and that is not fast :rolleyes:

You people should have learned by now that the advertised BullS**t that companies like Nvidia feed you is never the real deal; it's always exaggerated reality ;)

GeForce FX is not even in stores :eek: and by the way, the topic is RV350 & R350 :)

Dazz
03-03-03, 08:17 AM
Well, I am looking into replacing my video card again :rolleyes: either with a Radeon 9600 Pro or a GeForce FX 5600. So I will be hanging around to see how they compare, as both nVidia & ATi are announcing their cards this week :) All this techno babble is just confusing me, so I will just ignore that.

tazdevl
03-03-03, 12:26 PM
Originally posted by Hellbinder
Give me a break.. WTF are you talking about.. :rolleyes:

The 9700 Pro does not have any "problems" doing DX9 shaders. Nor does it do anything "incorrectly". The crap that gets posted at this site when people get on their little Nvidia tangents is just ridiculous.

The R300 can already do multipass shading up to 4 passes, meaning a total instruction count of just under 500.

Do your flipping homework before you post completely FALSE propaganda.

LOL Hell, be nice. Given his name, I'm not sure why a biased and misinformed statement from him is a surprise.

Fotis
03-03-03, 02:33 PM
Check this out.
Link (http://www.fic.com.tw/aboutfic/press/press.aspx?pr_id=91)
This is a rip from FIC's CeBIT press release.
FIC is now producing a huge range of advanced video cards based on ATI technology. New additions to the range include the A98 and A98P based on the ATI Radeon 9800 and 9800 Pro graphics engines respectively. The A98 and A98P both feature 400MHz RAMDAC, 8 pipeline architecture, 256 bit data width, 512Kb of serial flash ROM, AGP 8X/4X/2X compatibility, TV-out, support for simultaneous dual displays and DirectX 9.0/OpenGL support. In addition, the A98 will support 128MB of DDR memory and a core/memory clock speed of 325/310MHz. The A98P will support up to 256MB of DDR memory and has a core/memory clock speed of 400/460MHz.

Gar
03-03-03, 03:04 PM
Originally posted by Fotis
The A98P will support up to 256MB of DDR memory and has a core/memory clock speed of 400/460MHz.


:afraid:

Wow, if that has 920MHz DDR-I I'm going to crap my pants. It doesn't seem likely, but that would be insanely fast. Also, take note of the 9800 non-Pro. It looks like it's basically a 9700 Pro with some enhancements (as far as core clock/mem clock is concerned). Also, does someone want to explain the 512K serial flash ROM?

-=DVS=-
03-03-03, 03:20 PM
Originally posted by Gar
:afraid:

Wow, if that has 920MHz DDR-I I'm going to crap my pants. It doesn't seem likely, but that would be insanely fast. Also, take note of the 9800 non-Pro. It looks like it's basically a 9700 Pro with some enhancements (as far as core clock/mem clock is concerned). Also, does someone want to explain the 512K serial flash ROM?


Well, maybe ATI will allow us to update the BIOS of the card officially :D Who knows, maybe it's just marketing mumbo jumbo :p

Dang, if the new R350 Radeon will have 400/460MHz (920), that's like 29+ GB/s of raw bandwidth :eek: sweet
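The 29+ GB/s figure is peak DDR bandwidth: memory clock × 2 transfers per clock × bus width. A quick check using the clocks and 256-bit width from the FIC press release quoted above (peak figures only; real-world throughput is lower):

```python
def peak_ddr_gb_s(mem_clock_mhz: float, bus_width_bits: int = 256) -> float:
    """Peak DDR bandwidth in decimal GB/s (two transfers per clock)."""
    return mem_clock_mhz * 2 * bus_width_bits / 8 / 1000


for name, mem_mhz in (("A98 (9800)", 310), ("A98P (9800 Pro)", 460)):
    print(f"{name}: {peak_ddr_gb_s(mem_mhz):.2f} GB/s")
# A98 (9800): 19.84 GB/s
# A98P (9800 Pro): 29.44 GB/s
```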

Nebuchadnezzar
03-03-03, 03:34 PM
The 460Mhz number is probably a typo, can't be this high. If it truly is that high, then well, :alc:



:D




EDIT: there's another thread about this: http://www.nvnews.net/vbulletin/showthread.php?threadid=8194

Geo
03-05-03, 02:25 AM
Originally posted by Fotis
512Kb of serial flash ROM

Maybe the news there is that it is FLASH? Aren't the 9700's non-flashable? I seem to remember that from right after the original release when there were issues with some 8x AGP mobos.

Dazz
03-05-03, 10:58 AM
It seems the Radeon 9600 Pro will run at 400MHz core and 600MHz DDR, while the non-Pro runs at 325MHz & 400MHz.

The Radeon 9800 runs at 380/680MHz. The good thing about these RV350 & R350 cards is that they are OpenGL 2.0 compatible :cool: OGL rox over D3D :)

Source http://pc.ign.com/articles/388/388066p1.html

This is how Stan, Eric and Dan relate to you and me. This is how they get our attention. It should now be possible, despite the game being very processor intensive, to run something like Jedi Outcast at 1600 x 1200 with 4x AA and 8x AF on a 9600 and have it move consistently smooth, hovering above and around 60FPS at all times. Of course, this will depend heavily on what kind of processor and RAM configuration you have working

borntosoul
03-05-03, 07:59 PM
I checked out that story about the new Radeons yesterday, called THE NEXT RADEONS hehe. I checked today and now they've taken it down. Yes, it did say 380 clock/680 mem, and it did say that in 3DMark 2003 the 9700 Pro ran at 18 fps and the 9800 ran at 26 fps on one benchmark (this is from memory so don't quote me, but it's very close).



edit typos