#25
Guest
Posts: n/a

#26
Guest
Posts: n/a
Here's an excerpt:

"1. Dynamic indexing only allowed on input registers; prevents passing light data via constant registers and indexing them in a loop
2. Passing light info via input registers not feasible as there are not enough of them (only 10)
3. Dynamic branching is not free"

Dynamic branching will always incur a cost. It just becomes less noticeable with sufficiently complex workloads and a fine enough granularity. Crytek's reasoning was that their workload wouldn't benefit from dynamic branching because their shaders were short enough that they could unroll them and cache them at load. You're mistaken that their reasoning was that dynamic branching offered no performance benefit on GeForce 6.

Nvidia themselves give a good explanation of GeForce 6 dynamic branching performance and its usage. It certainly is beneficial with careful use. A "broken" implementation it was not: http://techreport.com/articles.x/6627/2

No one is arguing that it wasn't more of a checkbox feature for SM3.0 compatibility, but it is patently wrong to argue that it wasn't at the same time beneficial for performance. Finally, dynamic branching performance has NOTHING to do with this discussion, beyond showing that SM3.0 was not broken on GeForce 6.

About your assertion that factors other than shader performance affected performance: sure, and I never argued to the contrary. The fact that you think that's a novel idea is amazing. However, in terms of fill rate, especially with triple and quad texturing, their performance was nearly identical; ATI held the advantage in polygon setup and had a lower AF hit, while Nvidia had 2x higher Z-fill and a lower AA hit. But again, reading issues at work, Chris. Find me where I said that the only differentiating factor was shader performance. What I actually said, and what I actually meant, was that differences in real-world shader performance correlated well with differences in EQ2 performance.
The fact that you say otherwise just shows that you have no understanding of the concept.

As far as drivers go, a year after release, at the time of the 7800 GTX launch, the GeForce 6800 lagged quite a bit. Here's Anand's take: http://www.anandtech.com/video/showdoc.aspx?i=2451&p=11

"Despite the fact that Everquest 2 is an MMORPG, it has some of the most demanding graphics of any game to date. The extreme quality mode manages to tax the system so severely that even at 1280x1024 we aren't able to get above 25 FPS with the 7800 GTX. ATI pulls ahead of the single 6800U by over 100% in the widescreen 1920x1200 resolution, though in more reasonable settings the performance is closer."

So if you think that comes down to some CPU limitation and a little extra fill rate, that's great. I don't care; it's a waste of effort. You glean nothing new and regurgitate complete nonsense. Outside of your convictions about where EQ2's performance challenges are/were, you have literally zero support beyond your alleged conversations with Nvidia and Sony, and like I said, appeals to authority are pathetic. Smart people don't use them. It's been a complete waste of time talking to you, to be honest. Actually, I'd like you to post for all of us this information from Sony saying that SM1.1 performance on GeForce 6 wasn't a factor in its performance profile. Otherwise, let's just agree that you're out of your league.

#27
Registered User
Join Date: Mar 2003
Location: Tulsa
Posts: 5,101
Dynamic branching is entirely granularity based. You "could" get a performance benefit from it with very careful usage. What you couldn't do is use it to produce much faster performance on most shaders, and developers feared the latency would be too great on GeForce 6 to make any use of it; Far Cry is a perfect example of that. On newer hardware, using dynamic branching isn't such a "scare" because that latency is very well hidden.

The problem with Nvidia's dynamic branching in the GeForce 6 is that it operates at a granularity of 64 pixels, making it very hard to use dynamic branching (with flow control) to bring performance levels up. Notice in this preview that branching performance doesn't improve at all on the GeForce 7 until 64 pixels are reached. This is why branching on the GeForce 7 (and consequently the GeForce 6, which is even worse) was never really useful for anything in its lifetime: the branching granularity was way too large for performance to go up by using it. In a simple "fractal rendering" test, the GeForce 7900 GTX would lose up to 60% of its performance. As you can see, the smaller batches were extremely detrimental to performance. Hence the Far Cry scenario, where branching was unable to improve performance because the granularity was simply too large on the GeForce 6800/7800 cards compared to an X1800/X1900, 8800 GTX or better card.

So yes, dynamic branching is extremely weak on the GeForce 6/7 cards, somewhere between not very useful and of extremely limited use, because you couldn't use it for much, and when you could, making use of it was more complicated than just using static branching. The entire point of branching was to improve performance in small areas where a pixel may or may not need softening (i.e. in the shader), and with the huge branch granularity of the GeForce 6, it's nearly impossible to make optimal use of it. Unlike modern Nvidia/ATI hardware, the GeForce 6 cannot mask its branching granularity.
http://www.behardware.com/articles/6...-8800-gts.html (since apparently I need links to back up my assertions).
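To make the granularity argument concrete, here is a toy cost model in Python. It is a sketch under stated assumptions, not a model of any real GPU: pixels are shaded in fixed-size batches (16, 48 and 64 are the batch sizes mentioned in this thread), a batch in which every pixel takes the same path pays only that path's cost, and a divergent batch pays for both paths. The cycle counts and the striped branch pattern are hypothetical.

```python
def average_cost(needs_slow, batch_size, fast_cost, slow_cost):
    """Average per-pixel cost when pixels are shaded in fixed-size batches.

    Toy model (assumption): a batch executes every path that at least one of
    its pixels takes, so a divergent batch pays fast_cost + slow_cost.
    """
    total = 0
    for start in range(0, len(needs_slow), batch_size):
        batch = needs_slow[start:start + batch_size]
        any_slow = any(batch)
        any_fast = not all(batch)
        cost = (slow_cost if any_slow else 0) + (fast_cost if any_fast else 0)
        total += cost * len(batch)
    return total / len(needs_slow)


def striped_mask(num_pixels, run_length):
    """Alternating runs of `run_length` fast pixels and `run_length` slow pixels."""
    return [(i // run_length) % 2 == 1 for i in range(num_pixels)]


if __name__ == "__main__":
    FAST, SLOW = 4, 40  # hypothetical cycle counts for the two branch paths
    mask = striped_mask(num_pixels=4096, run_length=32)
    print(f"no branching (every pixel runs the slow path): {SLOW:.1f}")
    for batch in (16, 48, 64, 1024):
        print(f"batch {batch:>4}: {average_cost(mask, batch, FAST, SLOW):.1f}")
```

With these made-up numbers, 16-pixel batches stay coherent and roughly halve the average cost (22 versus 40 cycles per pixel), while 48- and 64-pixel batches straddle every 32-pixel stripe and end up slightly worse than not branching at all (about 44). Once the batch is larger than the regions where the branch condition is constant, branching stops paying off.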
All Far Cry's SM 3.0 shader implementation did was run 4 light sources within a single shader, using the increased pixel shader instruction count available in SM 3.0. All ATI's implementation did was run 3 light sources in a single shader, reduced because it cannot store as many instructions as SM 3.0 allows. Beyond that, Crytek "wanted" to use flow control and dynamic branching for these instructions, but was unable to do so because of the performance impact it had on the GeForce 6 cards. Many at the time even argued that this was not a true SM 3.0 implementation, since it could all be done in SM 2.0 with static branching, and the only element of SM 3.0 it actually used was the increased instruction limit. ATI did have a point here, as they proved that SM 2.0b could do nearly the same amount of work, finding itself limited only by the maximum instruction count.
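Why fitting one more light into a single shader matters can be shown with simple pass arithmetic. This is a minimal sketch assuming a multipass scheme in which each extra pass re-submits the geometry and blends into the frame buffer; the 4-lights (SM 3.0 path) and 3-lights (SM 2.0b path) per-shader figures come from the post above, and the light counts are hypothetical.

```python
import math


def passes_needed(num_lights: int, lights_per_pass: int) -> int:
    """Lighting passes needed when one shader can accumulate at most
    `lights_per_pass` lights; an unlit surface still needs one base pass."""
    if lights_per_pass < 1:
        raise ValueError("lights_per_pass must be at least 1")
    return max(1, math.ceil(num_lights / lights_per_pass))


if __name__ == "__main__":
    # Hypothetical numbers of lights touching a surface.
    for lights in (1, 3, 4, 6, 8, 12):
        sm30 = passes_needed(lights, 4)
        sm20b = passes_needed(lights, 3)
        print(f"{lights:>2} lights: SM 3.0 path {sm30} pass(es), SM 2.0b path {sm20b}")
```

For 3 or 6 lights both paths need the same number of passes, while 4, 8 or 12 lights cost the 3-light path one extra pass, which is consistent with the point that SM 2.0b was limited mainly by its instruction count rather than by missing features.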
With the GeForce 7, you were rewarded for using plenty of MADDs, while the GeForce 6 was a bit less efficient because its second shader unit could only do a MUL. But the GeForce 6/7 series also had to hide its shader latency behind a texture unit, as only half of its shader units were dedicated; the other half shared latency with texture mapping.
Also, the Z-fill techniques between the cards were very, very different. Nvidia used its double-Z approach (which admittedly was less of an advantage with anti-aliasing enabled), which ATI has not had. And the TMUs were most definitely not equal during that era, as Nvidia did not decouple its TMUs from its shader units until the GeForce 8 series. So don't patronize me and tell me that these cards were largely architecturally similar. They weren't.
__________________
|CPU: Intel I7 Lynnfield @ 3.0 Ghz|Mobo:Asus P7P55 WS Supercomputer |Memory:8 Gigs DDR3 1333|Video:Geforce GTX 295 Quad SLI|Monitor:Samsung Syncmaster 1680x1080 3D Vision\/Olevia 27 Inch Widescreen HDTV 1920x1080 |CPU: AMD Phenom 9600 Black Edition @ 2.5 Ghz|Mobo:Asus M3n HT Deluxe Nforce 780A|Memory: 4 gigs DDR2 800| Video: Geforce GTX 280x2 SLI SLI Forum Administrator NVIDIA User Group Members receive free software and/or hardware from NVIDIA from time to time to facilitate the evaluation of NVIDIA products. However, the opinions expressed are solely those of the members |

#28
Registered User
Join Date: Mar 2003
Location: Tulsa
Posts: 5,101
But you're right, PP (partial precision) hints don't benefit the GeForce 6 that much. Nvidia's documentation recommends them any time 32-bit precision is not necessary, and most of the time it wasn't. Nowadays, anything less than 32-bit precision is considered unacceptable because of the rendering flaws it can cause. And again, I'm not the one comparing the X800 here for EQ2. I am talking about stuttering. You keep bringing up the X800 like it matters.
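A quick way to see what a partial-precision (`_pp` / HLSL `half`) register gives up is to round values through IEEE 754 half precision. This Python sketch uses the standard library's `struct` half-float format (`"e"`) as a stand-in for FP16 hardware; GPUs of that era did not necessarily round exactly like IEEE 754, so treat it as an approximation. The 4096-texel texture width is a hypothetical example.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to IEEE 754 half precision (FP16) and back to a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to IEEE 754 single precision (FP32) and back to a Python float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def distinct_texel_coords(width: int, rounder) -> int:
    """Count how many texel-center coordinates (i + 0.5) / width stay distinct
    after rounding to the given precision."""
    return len({rounder((i + 0.5) / width) for i in range(width)})


if __name__ == "__main__":
    width = 4096  # hypothetical texture width
    print("fp32:", distinct_texel_coords(width, to_fp32), "of", width)
    print("fp16:", distinct_texel_coords(width, to_fp16), "of", width)
```

FP32 keeps every texel-center coordinate distinct, while FP16, with only 10 explicit mantissa bits, merges neighbouring texels at least in the upper half of the [0, 1) range. That is the kind of artifact that makes full precision necessary for things like texture coordinates.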
I don't misunderstand it at all. The workload Crytek wanted to do with their shaders simply couldn't make use of the GeForce 6's dynamic branching capabilities, because the latency of doing so on their 4-light shaders was too large. And don't kid yourself: not every shader is longer than 128 instructions. Even today, small shaders are used for simpler tasks, and branching can benefit them on X1800+ or 8800+ hardware. At least with the X1800/GeForce 8800 and up, developers don't have to fear using dynamic branching, because the latency is so well masked with dedicated units for it.
The GeForce FX 5800/5200/5600 prefer FX12 integer operations and were unable to run FP16 at any acceptable speed. The GeForce FX 5900/5700 replaced 2 integer units with FP16 units, but lacked the register space to fully utilize them, and still could not run SM 2.0 code with any floating-point proficiency. FP16 did perform better than on the NV30, but was still largely hampered by register space, as the new units did not solve that problem. FP32 simply increases register space usage, so the problem just got bigger.

GeForce 6 hardware offered a greatly increased register file. Therefore, using integer operations in place of floating-point ones no longer offered large performance improvements; register space had been the big problem with the FX series. And FP32 was nowhere near as devastating to performance as on the GeForce FX. This is where the distinction is clearly being drawn. In games such as Half-Life 2 and Far Cry, forcing FP16 versus FP32 only made a minor performance difference (usually within 2-3 FPS). The percentage loss from going from FP16 to FP32 is not that large, but there is still some benefit to FP16 on the GeForce 6 cards. The GeForce 7 series improved on this further with more register space, although that change was minor compared to the GeForce FX to GeForce 6 change.

The GeForce 8 just ignores PP and performs all operations at full precision (which is FP32 per the SM 3.0/DX10 specification). DirectX 10 does not support partial precision, and under DirectX 9 the GeForce 8 simply ignores the hint. So yes, each of these generations of hardware behaves differently when FP16 and FP32 operations are requested.
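The register-space argument can be sketched as a simple occupancy calculation. This assumes, hypothetically, that each pixel pipeline has a fixed register budget shared by all pixels in flight, that every temporary register holds four components, and that an FP16 component takes half the storage of an FP32 one. The budget sizes below are invented for illustration and are not real NV3x or NV4x figures.

```python
def pixels_in_flight(register_bytes: int, fp32_temps: int, fp16_temps: int = 0) -> int:
    """Pixels that can be resident at once when each needs its own set of
    4-component temporaries (FP32 = 4 bytes per component, FP16 = 2)."""
    bytes_per_pixel = fp32_temps * 4 * 4 + fp16_temps * 4 * 2
    if bytes_per_pixel == 0:
        raise ValueError("a shader needs at least one temporary register")
    return register_bytes // bytes_per_pixel


if __name__ == "__main__":
    # Hypothetical budgets: a small register file and one four times larger.
    budgets = (("small register file", 16 * 1024), ("large register file", 64 * 1024))
    for label, budget in budgets:
        fp32 = pixels_in_flight(budget, fp32_temps=4)
        fp16 = pixels_in_flight(budget, fp32_temps=0, fp16_temps=4)
        print(f"{label}: {fp32} pixels in flight at FP32, {fp16} at FP16")
```

More pixels in flight means more independent work available to cover texture-fetch latency. With a small budget, halving register size with FP16 doubles the pixels available to hide latency; with a large enough budget even the FP32 case may keep the pipeline busy, so the FP16 advantage shrinks. That is consistent with, though not proof of, the smaller FP16 gains described for the GeForce 6.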
Compare, say, West Freeport to Commonlands as an example.

#29
Registered User
Join Date: Mar 2003
Location: Tulsa
Posts: 5,101
Some more dynamic branching benchmarks for you to wrap your head around.
http://www.xbitlabs.com/articles/vid...gf8800_17.html shows the X1800 being up to 200% faster at dynamic branching, and 150% faster with lighter dynamic branching usage. Remember, these are just theoretical tests. Here's another dynamic branching benchmark (under OpenGL this time), which shows once again that the GeForce 6/7 are unable to benefit from it much under typical usage (though there seems to be a bug on the GeForce 7; hard to say if this ever got fixed): http://forum.beyond3d.com/showthread.php?t=37430

#30
Join Date: Oct 2005
Posts: 7,998
ChrisRay knowledge =
__________________
• EVGA GeForce GTX 680 • PCP&C 750 Quad • ASUS 12x BD-ROM • DELL U2412M E-IPS • Windows 7 Pro x64 SP1 • Logitech Z5500 5.1 • |

#31
Registered User
Join Date: Mar 2003
Location: Tulsa
Posts: 5,101
Now that I have calmed down, I will be fair, pako. You are right that I don't always write coherently, especially when I get heated in a debate. I'll often find myself editing my own posts over and over again, because even after I post them I sometimes have difficulty understanding what I wrote; it's usually just a rush of thoughts coming out as I type. I do apologize for that. I am in a better mood now that I understand your initial post wasn't working from the timeline I was referring to, so it would have been an easy thing to misunderstand.
Oh, here's another dynamic branching benchmark from BeHardware, which once again shows the major latency problems with even simple branches below 64 pixels: http://www.behardware.com/articles/5...800-xt-xl.html

#32
Guest
Posts: n/a
Nice find with the benchmark. Wasn't the X1800's granularity 16 pixels and the X1900's 48? It seems, then, that ~48-64 pixels is the cutoff for acceptable performance degradation. I suppose that's why even Nvidia was very careful in recommending its use. About the SM3.0 shaders in EQ2, they look cool.

#33
Registered User
Join Date: Jun 2004
Location: Australia
Posts: 820
lol they should get you working on EQ2, Chris.
Better yet, come check out Fallen Earth; it could use a few performance tweaks. It's in beta at the moment, so see how it goes when it's released in about 38 days. It has so much potential to be a great MMO. It reminds me of the way EQ2 ran in cities when it first came out, but this time round I'm using an i7 with a GTX 260. It has the same jerkiness when rendering towns. I keep posting for them to use occlusion culling, because in the beta it doesn't seem to be used. Fallen Earth video: http://www.youtube.com/watch?v=WQctA...eature=related

Yeah, EQ2 flies now on max settings with an i7; it was totally CPU bound. It can still slow down with too many light sources and shadows in guild halls, though.
__________________
i7 920 640g/b Raid 0 Corsair 64gig SSD Gigabyte EX58-UD3R 3x 27" Eyefinity 2x5870 crossfire Antec true 750w Logitech G15 6 gig kingston ddr3 1033 Windows 7 x64 Web design:http://www.advancedws.com.au:http://www.nobletrading.com.au:http://www.rackingaudits.com.au:http://www.imhandling.com.au |

#34
Registered User
Join Date: Mar 2003
Location: Tulsa
Posts: 5,101
This is why having a lot of players casting would constantly cause a large performance deficit. To this day I try to keep the number of visible spellcasters low because of the animation skinning bottleneck. Either way, I'm sorry if I got snappy. It just seemed odd to me that you brought up the X800. But after reading your last few posts, it does seem clear that we were talking about different timelines and different things in regard to EQ2's performance. I was commenting entirely on its stuttering problem, which was a huge issue back then.
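The thread doesn't say how EQ2 skins its characters, so the following is only a generic sketch of linear blend skinning, assuming every skinned vertex is transformed by each bone that influences it on every frame (whether on the CPU or in a vertex shader). It shows why the cost grows with the number of animated characters on screen. The operation count assumes 3x4 bone matrices, and the character and vertex counts are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3x4 = Sequence[Sequence[float]]  # 3 rows of [m0, m1, m2, translation]


def transform(m: Mat3x4, v: Vec3) -> Vec3:
    """Apply a 3x4 affine bone matrix to a point (3 multiply-adds per row)."""
    x, y, z = v
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in m)


def skin_vertex(v: Vec3, influences: List[Tuple[int, float]], bones: List[Mat3x4]) -> Vec3:
    """Linear blend skinning: weighted sum of the vertex as moved by each bone.
    `influences` holds (bone_index, weight) pairs whose weights sum to 1."""
    out = [0.0, 0.0, 0.0]
    for bone, weight in influences:
        p = transform(bones[bone], v)
        for k in range(3):
            out[k] += weight * p[k]
    return (out[0], out[1], out[2])


def skinning_madds_per_frame(characters: int, vertices: int, influences: int) -> int:
    """Rough multiply-add count: 9 for the bone transform plus 3 for blending,
    per influence, per vertex, per character."""
    return characters * vertices * influences * (9 + 3)


if __name__ == "__main__":
    identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
    shifted = [[1, 0, 0, 2], [0, 1, 0, 0], [0, 0, 1, 0]]  # moves points +2 along x
    print(skin_vertex((1.0, 1.0, 1.0), [(0, 0.5), (1, 0.5)], [identity, shifted]))
    for casters in (10, 50, 100):  # hypothetical numbers of visible animated players
        print(casters, skinning_madds_per_frame(casters, vertices=5000, influences=4))
```

Under these assumptions the cost is linear in the number of visible skinned characters, so halving the number of animated spellcasters on screen roughly halves this part of each frame's work.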

#35
Registered User
Join Date: Mar 2003
Location: Tulsa
Posts: 5,101

#36
Registered User
Join Date: Jun 2004
Location: Australia
Posts: 820
Yeah, I noticed that AA bug; the water disappears when it's on, doesn't it?
I don't really play EQ2 much anymore, though I still follow the graphics side of things. And as for Free Realms, hmm, I got as high as I could in mining without paying extra for the quests; the mining minigame was fun, though.