07-31-09, 11:10 AM   #26
pakotlar
Guest
 
Re: EQ2 Shader 3.0 upgrade

Quote:
Originally Posted by ChrisRay View Post

Yes. In a full throughput test where the primary bottleneck was register performance, the GeForce 6 series could lose somewhere around 30% performance using FP32. However, this is a "synthetic" test, i.e. it doesn't take into account that no shader in a modern game is designed to "run" like these ones. Shaders pass through the pipeline and then another comes through the pipeline. These specific tests are designed to run the same shader code through the GPU pipeline repeatedly as a benchmark. It's not a real-world result. This is why you never saw this kind of performance deficit using FP32 versus FP16. The performance benefits in Half-Life 2 were not there.
You have no understanding of shader workloads. They're not less "real world"; they're just not the same shaders. Your assertion that PP didn't offer speedups was incorrect; that's what I was pointing out. The shaders used in RightMark were not in any way unrealistic. I haven't taken a look at Half-Life 2's shaders, but suffice it to say that Half-Life 2 used quite a bit of SM1.1 code, and that code ran significantly faster on ATI hardware.
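To make the register-pressure point concrete, here is a minimal sketch of why a reduced-precision path can run faster on register-limited hardware. It's written in CUDA rather than DX9 shader code and is purely hypothetical (not RightMark, EQ2, or Half-Life 2 code): packing two 16-bit values into one 32-bit register slot halves the register footprint of the same arithmetic, which is essentially what a _pp hint bought on register-starved parts.

#include <cuda_fp16.h>

// Full-precision version: every value occupies a full 32-bit register.
__global__ void blend_fp32(const float* a, const float* b, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = 0.75f * a[i] + 0.25f * b[i];
}

// Reduced-precision version: two 16-bit values share each 32-bit register slot.
// Illustrative only; requires a GPU with half arithmetic (sm_53 or later).
__global__ void blend_fp16(const __half2* a, const __half2* b, __half2* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        __half2 w0 = __float2half2_rn(0.75f);
        __half2 w1 = __float2half2_rn(0.25f);
        out[i] = __hadd2(__hmul2(w0, a[i]), __hmul2(w1, b[i]));
    }
}

Whether the half path actually wins depends on whether registers, not ALUs or memory, are the bottleneck, which is exactly why a synthetic throughput test and a real game shader can show different deltas.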

Quote:
Originally Posted by ChrisRay View Post
And your point? I was not arguing that partial precision was good or bad. I said that the performance impact on GeForce 6/7 hardware was not large when moving from FP16 to FP32, especially compared to the GeForce FX. Everyone knows the GeForce FX did not have the register space to run DirectX 9.0 shaders fast in FP32, let alone FP16. The NV35, in comparison to the NV30, did replace two of its integer units with FP units, but it was still constrained by register space, making it unable to cope with heavy register usage.
You were accusing me of not understanding something about PP and GeForce 6. It was a reading comprehension issue on your part; you were restating what I said. It's a waste of time talking to someone who won't spend the time, or doesn't have the ability, to understand the topic.

Quote:
Originally Posted by ChrisRay View Post
That's not what you said. You said PP is not a part of the DirectX 9.0 spec. That is wrong. Partial precision is in fact a part of all recent DirectX 9.0 builds, and has been since the debut of the GeForce FX. Yes, it may not have originally been intended for the DirectX 9.0 specification; it was, however, updated to be included. I am not saying otherwise either.
Completely pedantic argument. The spec for full precision is a minimum of FP24; that's what I clearly wrote in the next post. With regard to the conversation, the fact that PP calls were allowed under DirectX 9.0 doesn't change the fact that PP didn't benefit the GeForce 6 enough to give it an overall advantage in SM1.1 code over the X800, which is what is relevant to the question of EQ2 performance.

Quote:
Originally Posted by ChrisRay View Post
There is nothing "non-factual" about what I said. SM 3.0 was not initially a part of the DirectX 9.0 spec either. The fact is, PP is and has been a supported standard for the majority of the time DX 9.0 accelerators have been available. Yes, it may have been changed; that doesn't change that your initial comment about PP "not being DX spec" was wrong, because it is a part of the baseline DirectX 9.0 spec: before SM 2.0A, before SM 2.0B, and before SM 3.0. You seem to suffer a major point of confusion regarding the GeForce 5, GeForce 6, GeForce 7, and GeForce 8 series, all of which behave very differently when handling FP16 or higher-precision calls. You do realise that all your links have done is reinforce my original point?
No Chris, they don't. They don't all behave very differently when handling FP16 or higher-precision calls. GeForce 6 and 7 are extremely similar to one another: both are SIMD/MIMD architectures running full precision for all calls. And while the GeForce 8 is a significantly different architecture, with regard to precision it operates in precisely the same way, in an array of SIMT processing elements (not a very different programming model compared to NV40 and G70/G71). The FX 5800, a VLIW design with its lower-precision call handling, is obviously different. True story, but you have no idea how they process SM2.0 calls.

Quote:
Originally Posted by ChrisRay View Post
This is wrong. There were some drawbacks to Nvidia's SM 3.0 implementation. Firstly, its dynamic branching performance was insufferably slow and unusable, hence why anything that used SM 3.0 was using static branching (see the Far Cry SM 3.0 implementation). ATI even did a whole campaign around the X1900 XT where they stated they did "SM 3.0 right", because their dynamic branching granularity was much better than Nvidia's, as well as supporting HDR with AA (though HDR was not specific to SM 3.0). It wasn't until the GeForce 8 that Nvidia actually supported AA + HDR via the ROPs.
Who cares about the GeForce 8? Why are you mentioning that? You really need to read up on this stuff, Chris; it's mighty annoying talking to a guy without the basic tools. Dynamic branching performance was certainly improved with the GeForce 7, but it was far from useless on the GeForce 6. The fact that it wasn't used much probably had a lot more to do with the fact that SM3.0 was only beginning to be supported, and in Far Cry's case, I really don't think you understand the rationale behind using a pre-warmed shader cache and static branching. Read Crytek's GDC 05 presentation on it.

Here's an excerpt:

"1. Dynamic indexing only allowed on input registers; prevents passing light data via constant registers and index them in loop
2. Passing light info via input registers not feasible as there are not enough of them (only 10)
3. Dynamic branching is not free"

Dynamic branching will always incur a cost. It's just that the cost becomes less noticeable with sufficiently complex workloads and sufficiently fine branch granularity. Crytek's reasoning was that their workload wouldn't benefit from dynamic branching because their shaders were short enough that they could unroll and cache them at load. You're mistaken that their reasoning was that dynamic branching offered no performance benefits on the GeForce 6.
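For illustration only, here is a small CUDA sketch of that trade-off (hypothetical code, not Crytek's or Nvidia's): the dynamic version skips the lighting math for pixels outside a light's radius but pays a divergence penalty whenever neighbouring threads take different paths, while the "static" version does the full work unconditionally, the way a short unrolled shader would.

// Dynamic branch: early-out for fragments outside the light radius.
// Wins when whole groups of threads agree; hurts when they diverge.
__global__ void light_dynamic(const float* dist, float* out, int n, float radius)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (dist[i] > radius) {               // data-dependent branch
        out[i] = 0.0f;
        return;
    }
    out[i] = 1.0f - dist[i] / radius;     // attenuation math only for lit fragments
}

// "Static" equivalent: no data-dependent branch, every fragment does the full math,
// the shape of a short shader that has simply been unrolled ahead of time.
__global__ void light_static(const float* dist, float* out, int n, float radius)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    out[i] = fmaxf(1.0f - dist[i] / radius, 0.0f);
}

Which version wins depends on how coherent the branch is across a batch of pixels, which is exactly the granularity argument: the finer the branching granularity, the more often the early-out pays for itself.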

Nvidia themselves give a good explanation of GeForce 6 dynamic branching performance and its usage. It certainly is beneficial with careful use; a "broken" implementation it was not: http://techreport.com/articles.x/6627/2
No one is arguing that it wasn't mostly a checkbox feature for SM3.0 compatibility, but it is patently wrong to argue that it wasn't also beneficial for performance.

Finally, dynamic branching performance has NOTHING to do with this discussion, beyond showing that SM3.0 was not broken on the GeForce 6.

About your assertion that factors other than shader performance affected performance: sure, and I never argued to the contrary. The fact that you think that's a novel idea is amazing. In terms of fillrate, especially with triple and quad texturing, their performance was nearly identical; ATI held the advantage in polygon setup, Nvidia had twice the z-fill rate and a lower AA hit, ATI had a lower AF hit... but again, reading issues at work, Chris. Find me where I said that the only differentiating factor was shader performance. What I actually said, and what I actually meant, was that differences in real-world shader performance correlated well with differences in EQ2 performance. The fact that you say otherwise just shows that you have no understanding of the concept.

As far as drivers go, a year after release, at the time of the 7800 GTX launch, the GeForce 6800 lagged quite a bit.

Here's Anand's take: http://www.anandtech.com/video/showdoc.aspx?i=2451&p=11 "Despite the fact that Everquest 2 is an MMORPG, it has some of the most demanding graphics of any game to date. The extreme quality mode manages to tax the system so severely that even at 1280x1024 we aren't able to get above 25 FPS with the 7800 GTX. ATI pulls ahead of the single 6800U by over 100% in the widescreen 1920x1200 resolution, though in more reasonable settings the performance is closer."

So if you think that comes down to some CPU limitation and a little extra fillrate, that's great. I don't care; it's a waste of effort. You glean nothing new and regurgitate complete nonsense. Outside of your convictions about where EQ2's performance challenges are/were, you have literally zero support beyond your alleged conversation with Nvidia and Sony, and like I said, appeals to authority are pathetic. Smart people don't use them.

It's been a complete waste of time talking to you, to be honest. Actually, I'd like you to post for all of us this information from Sony saying that SM1.1 performance on the GeForce 6 wasn't a factor in its performance profile. Otherwise, let's just agree that you're out of your league.