Old 07-31-09, 02:17 PM   #37
pakotlar
Guest
 
Posts: n/a
Default Re: EQ2 Shader 3.0 upgrade

Quote:
Originally Posted by ChrisRay
Yes they do.

The GeForce FX 5800/5200/5600 prefer FX16 integer-based operations and are unable to run FP16 at any acceptable speed.

The GeForce FX 5900/5700 replaced two integer units with FP16 units, but lacked the register space to fully utilize them, so they could not run SM 2.0 code with any real floating-point proficiency. FP16 did perform better than on the NV30 but was still largely hampered by register space, since the new units did not solve that problem; FP32 simply increases register usage, so the problem just got bigger.

GeForce 6 hardware offered a greatly increased register file, so using integer calls in place of floating-point calls did not bring the large performance improvements it did on the FX series, and FP32 was nowhere near as devastating to performance as it was on the GeForce FX. This is where the distinction is clearly drawn: in games such as Half-Life 2 and Far Cry, forcing FP16 or FP32 only caused minor performance deficits ((usually within the range of 2-3 FPS)). The percentage loss going from FP16 to FP32 is not that large, but there is still some benefit to FP16 on the GeForce 6 cards.

The GeForce 7 series further improved on this with more register space, although the change was minor compared to the GeForce FX-to-6 changes.

The GeForce 8 just ignores PP and performs all operations at full precision ((which is FP32 by SM 3.0/DX10 specification)). DirectX 10 does not support partial precision, and the GeForce 8 simply ignores the hint.

So yes, each of these generations behaves differently when FP16 and FP32 operations are requested.
After a little crash course: let me know if my terminology is off, or if the general idea is on track, or vice versa.

From what I see, the only reason NV40 benefits from FP16 pp or FX12 int pp is that the latency associated with executing FP32 operands is higher than with FP16 or int12, plus maybe a smaller memory footprint. NV30's (and NV40's) ability to use lower precisions had to do with the fact that they didn't have constant registers but stored that data in instruction slots instead, and could therefore store either int or variable float precisions... This had some associated penalties for state changes, though. From recollection, although CineFX's ALUs were FP32 precision, it could only execute half the FP32 operands per cycle due to register limitations. So we have 64 output registers, but two count as one FP32 instruction. At the same time, the 5800 could store 64 FP16 instructions in its temporary registers, or 32 FP32, so its registers were orthogonal in that sense. R300 was limited to 32 temp registers regardless of the precision used, so maximum throughput should be 64 FP16 instructions vs. R300's 32 FP24. In the 5900's case, one register counted for one FP32 instruction, so in theory it should be able to execute 64 FP32 instructions. The R300 also has 32 floating-point constant registers. However, as stated above, for flexibility (or because the 5800 was probably geared more toward DirectX 8.1 than 9), the FX 5800 has no constant registers and instead uses its copious instruction slots to store fragment program data. It can store int instructions as well, for FX12, but carries heavy penalties for instruction state changes.
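To make that register bookkeeping concrete, here is a tiny Python sketch using the counts from this post (64 FP16-sized temp slots on NV30 with an FP32 temp costing two of them, 32 FP24 temps on R300). The numbers are the post's figures, not verified hardware specs, and the function name is just for illustration.

Code:
# Rough bookkeeping of the temp-register budgets described above. The counts
# are the post's figures (64 FP16-sized slots on NV30, one FP32 temp costing
# two slots; 32 FP24 temps on R300), not verified hardware specs.

NV30_HALF_SLOTS = 64   # FP16-sized temporary register slots claimed for NV30
R300_FP24_TEMPS = 32   # FP24 temporary registers claimed for R300

def nv30_fp16_slots_left(fp32_temps_live: int) -> int:
    """FP16 temps still free on NV30 when `fp32_temps_live` full-precision
    temps are held live (each one costs two FP16-sized slots)."""
    return NV30_HALF_SLOTS - 2 * fp32_temps_live

if __name__ == "__main__":
    for fp32 in (0, 8, 16, 32):
        print(f"{fp32:2d} live FP32 temps -> {nv30_fp16_slots_left(fp32):2d} FP16 slots left")
    # 32 live FP32 temps exhaust the file, matching the "64 FP16 or 32 FP32"
    # packing above; R300 exposes its 32 FP24 temps regardless of precision.
    print(f"R300: {R300_FP24_TEMPS} FP24 temps at any precision")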

The 6800 changed this up by offering 2 FP32 ALUs in 16 independent pipelines, each capable of executing 8 floating-point instructions in dual issue, so double the theoretical pixel shader performance of the NV35. It is probably faster than that in practice due to the SIMD programming model vs. VLIW (and better utilization as 16x1 rather than 8x2). I'm not sure what's going on with constant registers, but I'm assuming it's the same. It has FP16 normalize, so it still has some FP16-specific optimizations. Registers are once again orthogonal for full FP32 performance. In this case I can't imagine any reason to prefer FP16 over FP32 other than space and some slight latency penalty. Assuming it used the same register combiner scheme, you would have twice the number of FP16 registers needed, which could come in handy to ease the state-change penalty associated with storing constant registers in instruction slots.
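And the "double the NV35" claim as back-of-the-envelope arithmetic; the NV35 figure below is just that ratio applied in reverse, not an independently verified spec.

Code:
# Peak per-clock shader arithmetic using the figures in the paragraph above:
# 16 pipelines x 2 FP32 ALUs, each ALU working on a 4-wide vector. The NV35
# number is only the "half of NV40" ratio stated above, not a verified spec.

def fp_ops_per_clock(pipes: int, alus_per_pipe: int, vector_width: int = 4) -> int:
    """Theoretical FP component operations issued per clock."""
    return pipes * alus_per_pipe * vector_width

nv40 = fp_ops_per_clock(pipes=16, alus_per_pipe=2)  # 128 by this description
nv35 = nv40 // 2                                    # "double the ... NV35"
print(f"NV40 ~{nv40} FP component ops/clock vs NV35 ~{nv35} (the post's ratio)")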


G80 is an obvious extension and evolution of the G70 ALU. Do they do away with register combiners for FP16 ops?

Does this sound right?

Also, with regard to NV35, would it be fair to say that since two registers were still needed for one FP32 instruction, NV35 was limited by latency when issuing FP32 instructions, whereas NV40 had less to arbitrate because 1 FP32 instr. = 1 register? And would 2 FP16 instructions fit in one register? So with a good driver, efficiency can be gained when an FP32 instruction fills out the register (whereas the FP16 instructions can be packed more tightly).

edit: And by the way, the "32" constant registers typically reported for NV30 match nicely with the "512" instruction slots, as this allows 16 FP32 instructions to be issued concurrently, right? So doesn't this in effect mean that NV30 has half the FP32 performance that R300 has for FP24, due to the instruction slot limit?
Old 07-31-09, 04:16 PM   #38
ChrisRay
Registered User
 
 
Join Date: Mar 2003
Location: Tulsa
Posts: 5,101
Default Re: EQ2 Shader 3.0 upgrade

That does sound more or less accurate to me. The NV35 should have been able to use its upgraded FP16 units and combine them into a single 32-bit float; obviously this would cause the performance throughput to halve. There's no doubt that the GeForce FX 5900 was heavily engineered with FP16 in mind, as it was never really capable of running FP32 at anything but half speed. As for the GeForce FX 5800 and the NV30, I don't really think that card was ever "designed" with DirectX 9.0 in mind. It had the capability to run FP32 through one of its shader units, but its other shader units were largely integer based, as you mentioned. So I don't really see anything wrong with your history of the FX 5900/6800.

((Well, one minor mistake: the FX 5800 was 4x2 and the 6800 was 16x1. I don't really consider the Z-pixel capabilities of either piece of hardware as "true" pipelines, because that would make the 6800 32x1.)) But that's a pretty easy thing to forget.
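For anyone unfamiliar with the "AxB" shorthand, here is a quick sketch of what it implies per clock; the doubled Z-only rate is inferred from the "32x1" remark above and is an assumption, and no clock speeds are used.

Code:
# "AxB" = A pixel pipelines, each with B texture units: A pixels written and
# A*B texture samples per clock. Pipe counts are from the post; the 2x Z-only
# rate is assumed from the "32x1" remark, not a quoted spec.

configs = {
    "GeForce FX 5800 (4x2)": {"pipes": 4, "tmus_per_pipe": 2},
    "GeForce 6800 (16x1)":   {"pipes": 16, "tmus_per_pipe": 1},
}

for name, c in configs.items():
    pixels = c["pipes"]
    texels = c["pipes"] * c["tmus_per_pipe"]
    z_only = c["pipes"] * 2  # assumed doubling for Z/stencil-only passes
    print(f"{name}: {pixels} px/clk, {texels} texels/clk, ~{z_only} Z-only/clk")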

Quote:
The 6800 changed this up by offering 2 FP32 ALUs in 16 independent pipelines, each capable of executing 8 floating-point instructions in dual issue, so double the theoretical pixel shader performance of the NV35. It is probably faster than that in practice due to the SIMD programming model vs. VLIW (and better utilization as 16x1 rather than 8x2). I'm not sure what's going on with constant registers, but I'm assuming it's the same. It has FP16 normalize, so it still has some FP16-specific optimizations. Registers are once again orthogonal for full FP32 performance. In this case I can't imagine any reason to prefer FP16 over FP32 other than space and some slight latency penalty. Assuming it used the same register combiner scheme, you would have twice the number of FP16 registers needed, which could come in handy to ease the state-change penalty associated with storing constant registers in instruction slots.
This was actually one of the hardest things for people to understand at the time. People assumed the GeForce 6 used the same shader setup as the FX cards, and it didn't. People assumed FP32 would mean a drastic slowdown on the NV40 and NV45, but the reality was there was too much going on in the pipeline for that ever to become a primary bottleneck. The 3DAnalyze program was great for toying with this back when the FP16/FP32 distinction mattered.

Actually, if you look at these RightMark3D benches, you'll notice the 7800/7900 series copes with FP16 -> FP32 even better than the GeForce 6 series. I can't say why the Blinn shader seems to do better, but my guess is it makes more use of MADD-type instructions, along with other pipeline improvements. Obviously, IMO, the GeForce 6 and GeForce 7 still share heritage from the NV35/NV3x, something a lot of people will not want to admit.

Quote:
Also, with regard to NV35, would it be fair to say that since two registers were still needed for 1 FP32 instruction that NV35 was limited by latency for issuing FP32 instructions.
Perfectly fair. In all my experience, FP32 was roughly half the speed of FP16 on the GeForce FX 5900, which is why PP did so much good for the FX 5900 in comparison to the FX 5800, which really needed FX16 to do anything at acceptable speeds. I'm not sure if you remember the early FX drama, but there were a ton of pictures of "vase renderings": when the FX 5800 was out, Nvidia replaced all FP operations with FX16 operations. This went away when the 5900 came out and could cope with FP16 better.
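A quick illustration of why those substitutions were visible at all: FP16 keeps roughly 10 mantissa bits and a narrow fixed-point format clamps to a small range, so values that FP32 keeps distinct collapse onto the same step and smooth gradients turn into bands. Standard-library Python only; the s1.10 layout over [-2, 2] is assumed purely for illustration, since the exact FX integer format is the FX12-vs-FX16 question above.

Code:
import struct

def to_fp16(x: float) -> float:
    """Round-trip a value through IEEE half precision (10 mantissa bits)."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

def to_fixed(x: float, frac_bits: int = 10, lo: float = -2.0, hi: float = 2.0) -> float:
    """Round-trip through a small signed fixed-point format clamped to [lo, hi].
    The s1.10 layout over [-2, 2] is assumed here purely for illustration."""
    x = max(lo, min(hi, x))
    scale = 1 << frac_bits
    return round(x * scale) / scale

# Adjacent values in a smooth gradient that FP32 keeps distinct land on the
# same FP16 / fixed-point step -- which is exactly what banding looks like.
for i in range(5):
    s = 1.0 + i / 4096.0
    print(f"fp32={s:.6f}  fp16={to_fp16(s):.6f}  fixed={to_fixed(s):.6f}")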
__________________
|CPU: Intel I7 Lynnfield @ 3.0 Ghz|Mobo:Asus P7P55 WS Supercomputer |Memory:8 Gigs DDR3 1333|Video:Geforce GTX 295 Quad SLI|Monitor:Samsung Syncmaster 1680x1080 3D Vision\/Olevia 27 Inch Widescreen HDTV 1920x1080

|CPU: AMD Phenom 9600 Black Edition @ 2.5 Ghz|Mobo:Asus M3n HT Deluxe Nforce 780A|Memory: 4 gigs DDR2 800| Video: Geforce GTX 280x2 SLI

Nzone
SLI Forum Administrator

NVIDIA User Group Members receive free software and/or hardware from NVIDIA from time to time to facilitate the evaluation of NVIDIA products. However, the opinions expressed are solely those of the members
Old 08-01-09, 01:06 AM   #39
Atomizer
Registered User
 
Join Date: Jun 2004
Posts: 379
Default Re: EQ2 Shader 3.0 upgrade

I'm not about to get into the middle of whatever is going on (sorry, I skimmed most of what you guys wrote, so you might have already addressed it), but I found EQ2 still suffered from performance issues the last time I tried it (with my 8800 GTX). I assume that's because all of the game's shaders are written in 1.1 and/or it isn't using shaders at all for a lot of it, making the CPU a big factor. Of course, this is based on my assumption that they ARE using 1.1 shaders exclusively, as I haven't actually delved into the EQ2 shaders myself.
Also, the fact that they are now rewriting the shaders in 3.0 supports that, and as a result we should finally start seeing better performance on any 3.0-capable card (heck, they could have left it at 2.0 and you would see almost the same performance boost).

And obviously, to show my point, these are taken on a GeForce 4 Ti, which is only capable of SM1 (up to PS1.3, not the PS1.4 that only ATI cards did; or rather, DX8 instead of DX8.1).


As you can see they look identical to what any other card would show.
Old 08-01-09, 01:48 AM   #40
ChrisRay
Registered User
 
 
Join Date: Mar 2003
Location: Tulsa
Posts: 5,101
Default Re: EQ2 Shader 3.0 upgrade

The reason the CPU is a big factor is that the game's geometry and animation are offloaded to the CPU, not because it uses 1.1 shaders.
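For a sense of what that CPU-side work looks like, here is a hypothetical sketch of per-vertex matrix-palette skinning; EQ2's actual animation code isn't public, so the names, sizes, and the four-bone blend are made up for illustration.

Code:
from typing import List, Sequence

Vec3 = Sequence[float]
Mat34 = Sequence[Sequence[float]]  # 3x4 bone matrix: rotation plus translation

def transform(m: Mat34, v: Vec3) -> List[float]:
    """Apply a 3x4 affine matrix to a position."""
    return [m[r][0] * v[0] + m[r][1] * v[1] + m[r][2] * v[2] + m[r][3]
            for r in range(3)]

def skin_vertex(v: Vec3, bones: Sequence[Mat34],
                indices: Sequence[int], weights: Sequence[float]) -> List[float]:
    """Blend a vertex by several weighted bone matrices (matrix-palette skinning)."""
    out = [0.0, 0.0, 0.0]
    for i, w in zip(indices, weights):
        p = transform(bones[i], v)
        out = [out[k] + w * p[k] for k in range(3)]
    return out

# A crowded zone means something like dozens of characters x thousands of
# vertices x four weighted transforms each, every frame, all on the CPU --
# the same loop a vertex shader would run per vertex if skinning moved to
# the GPU.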
Old 08-01-09, 02:17 AM   #41
pakotlar
Guest
 
Posts: n/a
Default Re: EQ2 Shader 3.0 upgrade

Quote:
Originally Posted by ChrisRay
This was actually one of the hardest things for people to understand at the time. People assumed the GeForce 6 used the same shader setup as the FX cards, and it didn't. People assumed FP32 would mean a drastic slowdown on the NV40 and NV45, but the reality was there was too much going on in the pipeline for that ever to become a primary bottleneck. The 3DAnalyze program was great for toying with this back when the FP16/FP32 distinction mattered.

Actually, if you look at these RightMark3D benches, you'll notice the 7800/7900 series copes with FP16 -> FP32 even better than the GeForce 6 series. I can't say why the Blinn shader seems to do better, but my guess is it makes more use of MADD-type instructions, along with other pipeline improvements. Obviously, IMO, the GeForce 6 and GeForce 7 still share heritage from the NV35/NV3x, something a lot of people will not want to admit.
Yeah, the 6800 was quite a surprise. VLIW took a lot of the blame, I remember, although that probably had more to do with how it handled registers, constant registers, and combiners. The 6800 was pretty much an NV20 repeat, even more so because it was such a saviour compared to the previous generation (whereas the GeForce 2 was great). By the way, as an aside, I wonder if current programs benefit from the knowledge of optimizing shaders for lower precision targets gleaned during that period.

Last edited by pakotlar; 08-04-09 at 01:06 PM.
Old 08-01-09, 02:19 AM   #42
pakotlar
Guest
 
Posts: n/a
Default Re: EQ2 Shader 3.0 upgrade

Quote:
Originally Posted by ChrisRay
The reason the CPU is a big factor is that the game's geometry and animation are offloaded to the CPU, not because it uses 1.1 shaders.
Chris, what about stencil shadows? Those are CPU-calculated volumes, not GPU, right?
Old 08-01-09, 05:50 AM   #43
ChrisRay
Registered User
 
 
Join Date: Mar 2003
Location: Tulsa
Posts: 5,101
Default Re: EQ2 Shader 3.0 upgrade

Quote:
Originally Posted by pakotlar
Chris, what about stencil shadows? Those are CPU-calculated volumes, not GPU, right?
I am not sure with the latest changes. I just got in today and found there are new shadowing options: CPU shadow volumes and GPU shadow volumes. I am pretty sure the GPU shadows are pixel-shader soft shadows, but alas, I just crashed and I need more time to mess with it.
Old 08-01-09, 05:59 AM   #44
Atomizer
Registered User
 
Join Date: Jun 2004
Posts: 379
Default Re: EQ2 Shader 3.0 upgrade

So the new changes are already live? Might reinstall then and check it out at some point.

And why they would intentionally offload it to the CPU seems odd when shaders would have worked much better. I assume they were using an FVF pipeline for the most part?

Old 08-01-09, 09:47 AM   #45
lognoronon
Registered User
 
Join Date: Jun 2004
Location: Reno, Nevada
Posts: 264
Default Re: EQ2 Shader 3.0 upgrade

Some of the changes are live, primarily the shadow-related stuff. The other shader changes are supposed to go in with the next game update. From personal experience, the game runs much better now, but it still runs like a dog in so many areas.
__________________
System

Cooler Master Storm Trooper case
Asrock Extreme 3 Gen 3
I7-2600k
8 Gig of Corsair Vengeance DDR3 1600
EVGA GTX 560 TI
Lite On 20x DVD +- RW
WD 160 gig 7200 rpm hd
WD 750 gig 7200 rpm hd
Logitech G15 Keyboard
Razer Naga mouse
Corsair GS700
LG 22" L226WTQ

Old 08-01-09, 07:39 PM   #46
Atomizer
Registered User
 
Join Date: Jun 2004
Posts: 379
Default Re: EQ2 Shader 3.0 upgrade

Well, one thing I noticed with two related games the last time I tried them both, EQ2 and Vanguard, was that Vanguard was more advanced technically and ran smoothly, but a lot of the time, due to the artwork, it looked like crap. EQ2, however, looked nice and was less technically advanced, but I still couldn't max the settings, presumably because too much still relies on CPU power and isn't multi-threaded (well, I assume so, since I have a Core 2 Duo, so it really should have enough power if both cores are being used, but each core is slower than a top-end single core).
This should make it perform better than Vanguard, while also providing a slight graphical boost.
Old 08-02-09, 06:23 AM   #47
ChrisRay
Registered User
 
 
Join Date: Mar 2003
Location: Tulsa
Posts: 5,101
Default Re: EQ2 Shader 3.0 upgrade

The new shadows are definitely not stencil volumes. From all I can tell, they are a form of SM 2.0 soft shadowing technique. They look really good and are truly dynamic. There is also a CPU shadowing method, but it doesn't look anywhere near as good.
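As a guess at what an SM 2.0-era pixel-shader soft shadow does under the hood (EQ2's actual shaders aren't public), here is a percentage-closer-filtering sketch; the function and its parameters are illustrative only.

Code:
def pcf_shadow(shadow_map, x: int, y: int, fragment_depth: float,
               radius: int = 1, bias: float = 0.002) -> float:
    """Percentage-closer filtering: the fraction of nearby shadow-map texels
    whose stored depth is at or beyond this fragment's depth from the light.
    1.0 = fully lit, 0.0 = fully shadowed, in between = soft edge."""
    lit = total = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            sx = min(max(x + dx, 0), len(shadow_map[0]) - 1)
            sy = min(max(y + dy, 0), len(shadow_map) - 1)
            lit += 1 if fragment_depth - bias <= shadow_map[sy][sx] else 0
            total += 1
    return lit / total

# Tiny example: a 4x4 light-space depth map with one near occluder texel.
depth_map = [[1.0] * 4 for _ in range(4)]
depth_map[0][0] = 0.2
print(pcf_shadow(depth_map, x=0, y=0, fragment_depth=0.5))  # ~0.56, a soft edge

# Stencil shadow volumes, by contrast, need silhouette edges extracted and
# extruded into volume geometry every frame, which is part of why the CPU
# shadow path scales badly with scene complexity.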
Old 08-03-09, 09:35 AM   #48
Sgt_Pitt
Registered User
 
 
Join Date: Jun 2004
Location: Australia
Posts: 820
Default Re: EQ2 Shader 3.0 upgrade

You think? The CPU shadows are more accurate, and the GPU shadows don't have point lights from torches yet... and they also flicker for me.
__________________
i7 920 640g/b Raid 0 Corsair 64gig SSD Gigabyte EX58-UD3R 3x 27" Eyefinity 2x5870 crossfire Antec true 750w Logitech G15 6 gig kingston ddr3 1033
Windows 7 x64 Web design:http://www.advancedws.com.au:http://www.nobletrading.com.au:http://www.rackingaudits.com.au:http://www.imhandling.com.au