Old 07-31-09, 11:27 AM   #27
Registered User
ChrisRay
Join Date: Mar 2003
Location: Tulsa
Posts: 5,101
Re: EQ2 Shader 3.0 upgrade

Dynamic Branching on Geforce 6 wasn't optimal but useful. Once again Chris, you're absolutely wrong about its performance impact. It could certainly be beneficial for performance, and was faster than static branching on the hardware. Geforce 6's coarse granularity obviously hampered branching performance, but it wasn't "broken", just less useful than in future iterations. Let's see:
This is getting boring. Please for the love of god do some research.
*Shakes head*

Dynamic branching is entirely granularity based. You "could" get a performance benefit from it with very careful usage. What you couldn't do is use it to make most shaders much faster. And developers feared the latency on the Geforce 6 would be too great to make any use of it. On newer hardware, using dynamic branching isn't such a "scare", because that latency is very well hidden.

Far Cry is a perfect example of that. The problem with Nvidia's dynamic branching on the Geforce 6 is that it operates at a granularity of 64 pixels, making it very hard to use dynamic branching ((with flow control)) to bring performance levels up. Notice in this preview that the branching performance doesn't improve at all for the Geforce 7 until 64 pixels are reached.

This is why Geforce 7 branching ((and consequently Geforce 6 branching, which is even worse)) wasn't useful for anything in its lifetime. The branching granularity was way too large for performance to go up using it.

For simple "Fractal Rendering" the Geforce 7900GTX would lose up to 60% performance. As you can see, the smaller batches were extremely detrimental to performance. ((Hence the Far Cry scenario)), where branching was unable to improve performance because the granularity was simply too large on the Geforce 6800/7800 cards compared to an X1800/X1900 or an 8800GTX or better card.

So yes. Dynamic branching is extremely weak on the Geforce 6/7 cards. It ranges from not very useful to extremely limited in use, because you couldn't use it for much. And when you could, making use of it was more complicated than just using static branching.

The entire point of branching was to improve performance in small areas where the pixel may or may not need softening ((i.e. in the shader)), and with the huge branch granularity of the Geforce 6 it's nearly impossible to make optimal use of it. Unlike modern Nvidia/ATI hardware, the Geforce 6 cannot mask its branching granularity.
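To put rough numbers on the granularity argument, here is a toy model in Python. It is only a sketch under loud assumptions: the costs (1 for the cheap path, 10 for the expensive one), the 1024-pixel scanline, and the rule that a batch pays the expensive cost for every pixel as soon as one pixel in it takes the expensive path are all made up for illustration. None of it is measured from real hardware.

```python
def shaded_cost(needs_expensive, batch_size, cheap=1, expensive=10):
    """Total cost when pixels are processed in fixed-size batches.

    Assumption (toy model): if any pixel in a batch needs the expensive
    path, every pixel in that batch pays the expensive cost.
    """
    total = 0
    for start in range(0, len(needs_expensive), batch_size):
        batch = needs_expensive[start:start + batch_size]
        per_pixel = expensive if any(batch) else cheap
        total += per_pixel * len(batch)
    return total


# Hypothetical scanline: 1024 pixels, every 64th one needs the expensive path.
pixels = [i % 64 == 0 for i in range(1024)]

# Baseline without branching: every pixel runs the expensive path.
no_branching = 10 * len(pixels)

for size in (1, 16, 64, 1024):
    print(size, shaded_cost(pixels, size), no_branching)
```

In this made-up case, per-pixel branching costs 1168 against a no-branching cost of 10240, batches of 16 cost 3328, and batches of 64 or larger cost the full 10240, so the branch buys nothing once the batch is as large as the spacing between "expensive" pixels.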

((since apparently I need links to back up my assertions)).

It's really tough to assemble what you say into a cohesive thread.
The reason you don't understand is that you simply do not understand where I am coming from. My original point was that EQ 2's stuttering problems were not related to SM 3.0. At the time, before everyone started dumping EQ 2's shader code, it was thought to be a SM 3.0 title. Hence I think it's rather funny that they are just now talking about EQ 2 being a "SM 3.0" title, because many people back then believed it already was.

I was wrong about dynamic flow control with Far Cry, although that was easy to mistake
How on earth you can call this an "easy" mistake is beyond me, since you used it to perpetuate your argument that Nvidia needed dynamic branching to be competitive in the first place. Your head is so cluttered with marketing garbage that you completely missed the entire point of what Far Cry was trying to do.

All the Far Cry SM 3.0 shader implementation does is run 4 light sources within a single shader, using the increased pixel shader instruction count available to it from SM 3.0. All ATI's implementation did was run 3 light sources in a single shader, reduced because it cannot store as many instructions as SM 3.0 allows.
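The 4-versus-3 difference is just arithmetic on how many shader runs a set of lights needs. A minimal Python sketch, assuming (my assumption, not something from Crytek) that lights which don't fit in one shader are simply handled by running another pass; `passes_needed` is a hypothetical helper name:

```python
import math


def passes_needed(num_lights, lights_per_pass):
    """Number of shader passes if each pass handles at most lights_per_pass lights."""
    return math.ceil(num_lights / lights_per_pass)


# 4 lights per shader (the SM 3.0 path) versus 3 lights per shader (ATI's path).
for lights in (3, 4, 12):
    print(lights, passes_needed(lights, 4), passes_needed(lights, 3))
```

With 4 lights that is 1 pass against 2, and with 12 lights it is 3 against 4, so under this assumption the gap comes from instruction room, not from branching.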

Beyond that, Crytek "wanted" to use flow control and dynamic branching for these instructions, but was unable to do so because of the performance impact it had on the Geforce 6 cards. Many at the time even argued that this was not a true SM 3.0 implementation, since it could all be done in SM 2.0 with static branching and the only element of SM 3.0 it actually used was the increased instruction count. ATI did have a point here, as they proved that SM 2.0b could do nearly the same amount of work, only finding itself limited by the max instructions allowed.

never commented on EQ2 stuttering, just the lower overall performance.
This was what my original comment was based on... and it was based on a very, very old discussion that happened four and a half years ago. Yes, it's pretty common knowledge now that the game does not use SM 3.0. Back then it wasn't. Hence why I thought it was funny in hindsight.

One moment you say that people are wrong because they blamed EQ2's poor performance on Geforce 6's SM3.0 implementation,
That's true. People did believe that. And they were wrong.

next you're arguing that Geforce 6 ran SM1.1 - SM2.0 code at the same speed as X800 & X850,
I did not say that. I said that, compared to the Geforce FX, the Geforce 6/7 hardware was capable of running simple PS 2.0 at similar speeds to 1.1. I'm not the one here trying to compare the X800 to the Geforce 6. That's you, buddy.

, Geforce 6 wasn't necessarily slower per clock on all shader code. It sometimes lost quite dramatically (once again, up to 2x slower with SM1.1 shaders) but sometimes the loss would incidental to its lower clock speed
The Geforce 6/7 had lower clock speeds than the X800. But it also had more shader units available to it. I didn't want to get into an argument about the Geforce 6/7 versus the X800 because you had two largely different approaches to shading models.

With the Geforce 7, you were rewarded for using excess MADD, while the Geforce 6 was a bit less efficient due to its second shader unit only being a MUL. But the Geforce 7/6 series also had to hide its shader latency behind a texture unit, as only half of its shader units were dedicated while the other half shared latency with texture mapping.

Chris, pixel fillrate, texture fillrate, zfill, AA cycles within the ROPS, were quite similar between the two
No. They weren't. Nvidia used a hybrid approach to its TMUs/shader units as stated above, which caused latency in heavy texturing scenarios. Hence, depending on the workload, you may have had very different throughput.

Also, the Z-fill techniques between the cards were very, very different. Nvidia used its double-Z approach ((which admittedly was less of an advantage with anti-aliasing enabled)), which ATI has not had. And the TMUs were most definitely not equal during that age, as Nvidia did not decouple its TMUs from its shader units till the Geforce 8 series.

So don't patronize me and tell me that these cards were largely architecturally similar. They weren't.
|CPU: Intel I7 Lynnfield @ 3.0 Ghz|Mobo:Asus P7P55 WS Supercomputer |Memory:8 Gigs DDR3 1333|Video:Geforce GTX 295 Quad SLI|Monitor:Samsung Syncmaster 1680x1080 3D Vision/Olevia 27 Inch Widescreen HDTV 1920x1080

|CPU: AMD Phenom 9600 Black Edition @ 2.5 Ghz|Mobo:Asus M3n HT Deluxe Nforce 780A|Memory: 4 gigs DDR2 800| Video: Geforce GTX 280x2 SLI

SLI Forum Administrator

NVIDIA User Group Members receive free software and/or hardware from NVIDIA from time to time to facilitate the evaluation of NVIDIA products. However, the opinions expressed are solely those of the members