Old 07-31-09, 11:44 AM   #28
ChrisRay
Re: EQ2 Shader 3.0 upgrade

Quote:
You were accusing me of not understanding something about PP and Geforce 6. It was a reading comprehension issue on your part.
I accused you of making an incorrect statement. That's it.

Quote:
Completely pedantic argument. The spec for full precision is a minimum of FP24. That's what I clearly wrote in the next post. With regards to the conversation, the fact that pp calls were allowed under DirectX 9.0 doesn't change the fact that pp didn't benefit G6 enough to give it an overall advantage in SM1.1 code over the X800, which is what is relevant to the question of EQ2 performance.
But that's not what you originally wrote. You wrote that PP is not DirectX spec. That was wrong. I have not argued that it is full precision, because it's not. But it *is* within the DirectX spec. PP isn't really relevant to SM 1.1, so I don't see your point here. I'm not the one discussing the X800; you're the one that seems to care a great deal about the X800.

But you're right, PP doesn't benefit the GeForce 6 that much. Nvidia's documentation recommends it any time 32-bit precision isn't necessary, and most of the time it wasn't. Today you can't get away with anything less than 32-bit without rendering flaws.
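To put a rough number on that, here is a quick NumPy sketch (my own illustration, nothing from EQ2 or the driver) of why FP16 is plenty for colour math but falls apart once values get large, which is roughly the line between "PP is free" and "PP causes rendering flaws":

Code:
# Illustrative only: NumPy float16/float32 standing in for shader FP16 (_pp) and FP32.
import numpy as np

# Colour-style value in [0, 1]: the FP16 rounding error is far smaller than an
# 8-bit framebuffer step (1/255), so partial precision is invisible here.
color = np.float32(0.7231)
print(abs(np.float32(np.float16(color)) - color))   # on the order of 1e-4

# Large texture coordinate / world-space value: representable FP16 values near 2048
# are 2.0 apart, so the fractional part is simply lost -- the kind of error that
# shows up as visible artifacts.
coord = np.float32(2048.37)
print(np.float32(np.float16(coord)))                # 2048.0
print(np.finfo(np.float16).eps * 2048)              # ~2.0 spacing at that magnitude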

And again, I'm not the one comparing the X800 here for EQ2. I am talking about stuttering. You keep bringing up the X800 as if it matters.

Quote:
Dynamic branching will always incur a fee. It's just that it becomes less noticeable with sufficiently complex workloads. Their reasoning was that their workload didn't need dynamic branching, so it wasn't used. You're mistaken on their reasoning behind this.
It doesn't have to, especially if your branch granularity is good and you have a good way of masking its latency. You can see several cases where Nvidia/ATI lose no performance with dynamic branching even at less than 4 pixels, because modern hardware has dedicated units for masking it. It's easy to argue that the X1800 and GeForce 8800 were the first cards that could really use their dynamic branching capabilities feasibly. Don't believe me? Look at the benchmarks.
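To make the granularity point concrete, here is a toy model I threw together (the batch sizes are just the commonly quoted figures, roughly 64x64 for a GeForce 6 class part, 16 pixels for the X1800 and 32 for the GeForce 8, and the "scene" is a circle of lit pixels, nothing measured on real hardware). A batch only gets to skip a branch if every pixel in it agrees, so the coarser the granularity, the more of the frame pays for both sides:

Code:
# Toy model of dynamic-branch divergence: a batch (the hardware's branch granularity)
# only skips work if every pixel in it takes the same side of the branch.
import numpy as np

h, w = 1080, 1920
ys, xs = np.mgrid[0:h, 0:w]
in_light = (xs - w / 2) ** 2 + (ys - h / 2) ** 2 < 400 ** 2   # circular "lit" region

for batch_w, batch_h, name in [(64, 64, "GeForce 6 class (~4096 px)"),
                               (4, 4, "X1800 (16 px)"),
                               (8, 4, "GeForce 8 (32 px)")]:
    tiles = in_light[:h - h % batch_h, :w - w % batch_w]
    tiles = tiles.reshape(tiles.shape[0] // batch_h, batch_h,
                          tiles.shape[1] // batch_w, batch_w)
    lit_per_tile = tiles.sum(axis=(1, 3))
    mixed = np.logical_and(lit_per_tile > 0,
                           lit_per_tile < batch_h * batch_w).mean()
    print(f"{name}: {mixed:.1%} of batches must execute both branch paths")

The exact percentages don't matter; the point is that the wasted fraction grows with the batch size, which is why granularity and latency masking decide whether dynamic branching is worth using at all.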

Quote:
Who cares about Geforce 8? Why are you mentioning that? You really need to read up on this stuff Chris, it's mighty annoying talking to a guy without the basic tools. Dynamic branching performance was certainly improved with Geforce 7, but it was far from useless on Geforce 6. The fact that it wasn't used much probably had a lot more to do with the fact that SM3.0 was only beginning to be supported, and in Far Cry's case, I really don't think you understand their rationale behind using a pre-warmed shader cache and static branching. Read their GDC 05 presentation on it.
Who cares about the X800? Hardware history timelines are important because they give you an idea of what to expect. AMD marketed their hardware as "SM 3.0 done right" because they had a much better implementation (actually usable with limited performance loss at a 16-pixel branch granularity). I admit I don't entirely know what they use with their DX10 cards, but the GeForce 8 did it with 32.

I don't misunderstand it at all. The workload Crytek wanted to do with their shaders simply couldn't make use of the GeForce 6's dynamic branching, because the latency of doing so in their four-light shaders was too large. And don't kid yourself: not every shader is past 128 instructions. Even today small shaders are used for simpler tasks, and branching can benefit them on X1800+ or 8800+ hardware. At least with the X1800/GeForce 8800 and later, developers don't have to fear using dynamic branching, because the latency is so well masked by the dedicated units for it.

Quote:
They don't all behave very differently when operating on FP16 or higher calls
Yes they do.

The GeForce FX 5800/5200/5600 prefer FX12 fixed-point integer operations and are unable to run FP16 at any sort of acceptable speed.

The GeForce FX 5900/5700 replaced two integer units with FP16 units, but still lacked the register space to fully utilize them, and could not run SM 2.0 code with any floating-point proficiency. FP16 did perform better than on the NV30, but it was still largely hampered by register pressure, since the new units did not solve that problem; FP32 simply increases register usage, so the problem only got bigger.

GeForce 6 hardware offered greatly increased register space, so using integer calls in place of floating-point calls no longer gave large performance improvements, which had been a big problem with the FX series, and FP32 was nowhere near as devastating to performance as on the GeForce FX. This is where the distinction is clearly drawn: in games such as Half-Life 2 and Far Cry, forcing FP16 or FP32 only caused minor performance deficits (usually within the range of 2-3 FPS). The percentage loss for going from FP16 to FP32 is not that large, but there is still some benefit to FP16 on the GeForce 6 cards.

The GeForce 7 series further improved on this with more register space, although the change was minor compared to the jump from the GeForce FX to the GeForce 6.

The GeForce 8 just ignores PP and performs all operations at full precision (which is FP32 by the SM 3.0/DX10 specification). DirectX 10 does not support partial precision at all, and the GeForce 8 simply ignores the hint.

So yes, each of these architectures behaves differently when FP16 and FP32 operations are requested.
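The register-space point is easy to see with back-of-the-envelope math. This is purely a sketch: the register file size and temp counts below are made-up placeholders, not actual NV3x/NV4x specs; the only real point is that FP32 temporaries take twice the space of FP16 ones, so fewer pixels can be kept in flight to hide texture latency, and the smaller the register file, the more that hurts:

Code:
# Back-of-the-envelope latency-hiding model. The register file size and per-pixel
# temp counts are hypothetical placeholders, not real hardware specs; the point is
# only the 2:1 ratio between FP32 and FP16 register usage.
def pixels_in_flight(register_file_bytes, live_temps, bytes_per_temp):
    """How many pixels can stay resident given each pixel's register footprint."""
    return register_file_bytes // (live_temps * bytes_per_temp)

REG_FILE = 32 * 1024            # hypothetical register file size in bytes
for live_temps in (4, 8, 16):   # live vec4 temporaries at the shader's widest point
    fp16 = pixels_in_flight(REG_FILE, live_temps, 2 * 4)   # 2 bytes/component x vec4
    fp32 = pixels_in_flight(REG_FILE, live_temps, 4 * 4)   # 4 bytes/component x vec4
    print(f"{live_temps:2d} temps: {fp16} pixels in flight at FP16, {fp32} at FP32")

Half the pixels in flight means half the ability to hide texture fetch latency, which is the mechanism behind the FX choking on FP32 and behind the bigger register files in the GeForce 6 and 7 making the PP hint matter so much less.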

Quote:
About your assertion that factors other than shader performance affected performance: sure, and I never argued to the contrary. The fact that you think that's a novel idea is amazing. However, there's very little evidence that EQ2 was fillrate-bound at 1280x1024, which was the highest comfortable resolution for both cards.
Actually, there was some decent evidence for this. I played EQ2 on a GeForce 6800 GT SLI PCIe machine (shortly after I replaced my AGP machine), and using SFR you could still get scaling in EQ2. The reason is that Split Frame Rendering adjusts its load split based on the pixel fillrate bottleneck (and does nothing to accelerate vertex work). So at the time you could see in the realm of a 20-25% performance increase at 1280x1024 with SFR, and 50-60% increases at 1600x1200. But of course this depends entirely on the area you are in; some places were far more lenient on pixel fillrate bottlenecks than others.

Compare, say, West Freeport to the Commonlands as an example.
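For what it's worth, that SFR scaling lines up with a simple Amdahl-style model: SFR only splits the pixel-fill part of the frame, so the speedup you get is a direct read on how pixel-bound a scene is. The pixel-bound fractions below are illustrative guesses on my part, not measurements:

Code:
# Amdahl-style sketch of 2-GPU SFR scaling: only the pixel-fill fraction of the frame
# is split between the GPUs; vertex/CPU work is not accelerated. The fractions are
# illustrative guesses chosen to land near the scaling numbers mentioned above.
def sfr_speedup(pixel_fraction, gpus=2):
    """Frame speedup when only the pixel-bound fraction of the work is parallelised."""
    serial = 1.0 - pixel_fraction
    return 1.0 / (serial + pixel_fraction / gpus)

for res, pixel_fraction in [("1280x1024", 0.40), ("1600x1200", 0.75)]:
    print(f"{res}: about {sfr_speedup(pixel_fraction) - 1.0:+.0%} over a single GPU")

A heavily fillrate-bound zone pushes that fraction up and the scaling with it, which is the kind of difference you see between zones like West Freeport and the Commonlands.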