|
|
#37 | |
|
Guest
Posts: n/a
|
Quote:
From what I see, the only reason that NV40 benefits from FP16 partial precision or FX12 integer partial precision is that the latency associated with executing FP32 operands is higher than with FP16 or int12, and maybe the smaller memory footprint. NV30's (and NV40's) ability to use lower precisions had to do with the fact that they didn't have constant registers, but stored the data in instruction slots instead, and could therefore store either int or variable float precisions... This had some associated penalties for state changes, though.
From recollection, although CineFX's ALUs were FP32 precision, it only had the ability to execute 1/2 the FP32 operands per cycle due to register limitations. So we have 64 output registers, but 2 count for 1 FP32 instruction. However, simultaneously the 5800 could store 64 FP16 instructions in its temporary registers, or 32 FP32. So its registers were orthogonal in that sense. R300 was limited to 32 temp registers regardless of the precision used. So maximum throughput should be 64 FP16 instructions vs. R300's 32 FP24. In the 5900's case, 1 register counted for 1 FP32 instruction, so it should in theory be able to execute 64 FP32 instructions.
The R300 also has 32 floating-point constant registers. However, as stated above, for flexibility (or because the 5800 was probably geared more for DirectX 8.1 than 9), the FX 5800 has no constant registers, and instead uses its copious instruction slots to store fragment program instructions. It can store int instructions as well, for FX12. This carries heavy penalties for instruction state changes.
The 6800 changed this up by offering 2 FP32 ALUs in 16 independent pipelines, each capable of executing 8 floating-point instructions in dual issue, so double the theoretical pixel shader performance of the NV35. Probably faster than that due to the use of a SIMD programming model vs. VLIW (and better utilization as a 16*1 and not 8*2). Not sure what's going on with constant registers, but I'm assuming it's the same.
FP16 normalize, so has some FP16-specific optimizations still. Registers are once again orthogonal for full FP32 performance. In this case I can't imagine any reason other than space and some slight latency penalty for using FP32 compared to FP16. Assuming it used the same register combiner scheme, you would have 2x the number of FP16 registers needed, which could come in handy, maybe to ease the state change penalty associated with storing constant registers in instruction slots.
G80 is an obvious extension and evolution of the G70 ALU. Do they do away with register combiners for FP16 ops? Does this sound right?
Also, with regard to NV35, would it be fair to say that since two registers were still needed for 1 FP32 instruction, NV35 was limited by latency for issuing FP32 instructions? Whereas NV40 had less to arbitrate due to 1 FP32 instr. = 1 register. However, would 2 FP16 instructions fit in 1 register? So with a good driver, efficiency can be gained when an FP32 instruction fills out the register (whereas the FP16 instructions can be more tightly packed).
Edit: And btw, the "32" constant registers typically reported for NV30 match nicely with the "512" instruction slots, as this allows for 16 FP32 instructions to be issued concurrently, right? So doesn't this in effect mean that NV30 has 1/2 the FP32 performance that R300 has for FP24, due to the instruction slot limit?
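The register arithmetic in the quoted post (64 FP16 temporaries versus 32 FP32 on the 5800, against a fixed 32 on R300) can be sketched in a few lines of Python. This is only an illustration of the poster's numbers, not a description of the real hardware: the 4-component layout, the total file size, and the names `REGISTER_FILE_BITS`, `FIXED_SLOTS`, and `temporaries_available` are all assumptions chosen so the output matches the figures quoted above.

```python
# Illustrative only. The vec4 layout and the total file size are assumptions,
# picked so the results match the 64 FP16 / 32 FP32 figures from the post.
COMPONENTS = 4                               # assumed xyzw vector per temporary
REGISTER_FILE_BITS = 32 * COMPONENTS * 32    # assumed: room for 32 FP32 vec4 temporaries
FIXED_SLOTS = 32                             # the post's figure for R300 temporaries


def temporaries_available(bits_per_component: int, packed: bool = True) -> int:
    """Count the vec4 temporaries that fit in the assumed register file.

    packed=True models slots that can be split by precision (how the post
    describes the 5800). packed=False models a fixed number of slots
    regardless of precision (how the post describes R300).
    """
    if not packed:
        return FIXED_SLOTS
    return REGISTER_FILE_BITS // (bits_per_component * COMPONENTS)


if __name__ == "__main__":
    print("5800-style, FP16:", temporaries_available(16))
    print("5800-style, FP32:", temporaries_available(32))
    print("R300-style, FP24:", temporaries_available(24, packed=False))
```

Under these assumptions, halving the per-component width doubles the number of live temporaries, which is the "orthogonal registers" point the post is making.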
|
|
|
|
#38 | ||
|
Registered User
Join Date: Mar 2003
Location: Tulsa
Posts: 5,101
|
That does sound more or less accurate to me. The NV35 should have been able to use upgraded FP16 units and combine them into a single 32-bit float. Obviously this would cause throughput to halve. There's no doubt that the GeForce FX 5900 was heavily engineered with FP16 in mind, as it was never really capable of running FP32 at anything but half speed. As for the GeForce FX 5800 and the NV30, I don't really think the card was ever really "designed" with DirectX 9.0 in mind. It had the capability to run FP32 through one of its shader units, but its other shader units were largely integer based, as you mentioned. So I don't really see anything wrong with your history of the FX 5900/6800.
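The FP16 versus FP32 trade-off discussed here is mostly about speed, but for a rough sense of the precision side, the sketch below rounds a few values to half and single precision using Python's standard `struct` module. It assumes IEEE 754 half precision with round-to-nearest on a CPU; whether NV3x-era hardware rounded the same way is not something this thread establishes, and the helper names `to_fp16` and `to_fp32` are made up for the example.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to IEEE 754 half precision and return it as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to IEEE 754 single precision and return it as a Python float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


if __name__ == "__main__":
    # Compare the rounding error of each format for a few sample values.
    for value in (0.1, 1.0 / 3.0, 1000.1):
        fp16_error = abs(to_fp16(value) - value)
        fp32_error = abs(to_fp32(value) - value)
        print(f"{value!r}: fp16 error {fp16_error:.3e}, fp32 error {fp32_error:.3e}")
```

Half precision keeps an 11-bit significand, so integers above 2048 are no longer all representable, which is one reason FP16 suits colour math better than things like texture coordinates.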
(Well, one minor mistake: the FX 5800 was 4x2 and the 6800 was 16x1. I don't really consider the Z-pixel capabilities of either hardware as "true" pipelines, because that would make the 6800 32x1.) But that's a pretty easy thing to forget.
Quote:
Actually, if you look at these RightMark3D benches, you'll notice the 7800/7900 series copes with FP16 -> FP32 even better than the GeForce 6 series. I can't say why the Blinn shader seems to do better, but my guess is that it makes more use of MADD-type instructions, as well as other pipeline improvements. Obviously, IMO, the GeForce 6 and GeForce 7 still share heritage from the NV35/NV3x, something a lot of people will not want to admit.
Quote:
__________________
|CPU: Intel I7 Lynnfield @ 3.0 Ghz|Mobo:Asus P7P55 WS Supercomputer |Memory:8 Gigs DDR3 1333|Video:Geforce GTX 295 Quad SLI|Monitor:Samsung Syncmaster 1680x1080 3D Vision\/Olevia 27 Inch Widescreen HDTV 1920x1080 |CPU: AMD Phenom 9600 Black Edition @ 2.5 Ghz|Mobo:Asus M3n HT Deluxe Nforce 780A|Memory: 4 gigs DDR2 800| Video: Geforce GTX 280x2 SLI SLI Forum Administrator NVIDIA User Group Members receive free software and/or hardware from NVIDIA from time to time to facilitate the evaluation of NVIDIA products. However, the opinions expressed are solely those of the members |
||
|
|
|
|
|
#39 |
|
Registered User
Join Date: Jun 2004
Posts: 379
|
I'm not about to get into the middle of whatever is going on (sorry, I skimmed most of what you guys wrote, so you might have already addressed it), but I find EQ2 still suffered from performance issues last time I tried it (with my 8800 GTX). I assume that is because all of the game's shaders are written in 1.1 and/or it is not using shaders at all for a lot of it, causing the CPU to be a big factor. Of course, this is based on my assumption that they ARE using 1.1 shaders exclusively, as I haven't actually delved into the EQ2 shaders myself.
Also, the fact that they are now rewriting the shaders in 3.0 supports that, and as a result we should finally start seeing better performance on any 3.0-capable card (heck, they could have left it at 2.0 and you would see almost the same performance boost). And obviously, to show off my point, these are taken on a GeForce 4 Ti, which is only capable of SM1 (up to PS1.3, not PS1.4, which only ATI cards did; or rather, DX8 instead of DX8.1).
[two screenshots]
As you can see, they look identical to what any other card would show.
|
|
|
|
|
#40 | |
|
Registered User
Join Date: Mar 2003
Location: Tulsa
Posts: 5,101
|
The reason the CPU is a big factor is that the game's geometry and animation are offloaded to the CPU, not that it uses 1.1 shaders.
|
|
|
|
|
|
#41 | |
|
Guest
Posts: n/a
|
Quote:
Last edited by pakotlar; 08-04-09 at 12:06 PM. |
|
|
|
|
#42 | |
|
Guest
Posts: n/a
|
|
|
|
|
|
#43 |
|
Registered User
Join Date: Mar 2003
Location: Tulsa
Posts: 5,101
|
I am not sure with the latest changes. I just got in today and found there are new shadowing options: CPU shadow volumes and GPU shadow volumes. I am pretty sure the GPU shadows are pixel shader soft shadows. But alas, I just crashed and I need more time to mess with it.
|
|
|
|
|
#44 |
|
Registered User
Join Date: Jun 2004
Posts: 379
|
So the new changes are already live? Might reinstall then and check it out at some point.
And it seems odd that they would intentionally offload it to the CPU when shaders would have worked much better. I assume they were using an FVF pipeline for the most part?
|
|
|
|
|
#45 |
|
Registered User
Join Date: Jun 2004
Location: Reno, Nevada
Posts: 264
|
Some of the changes are live, primarily the shadow-related stuff. The other shader changes are supposed to go in with the next game update. From personal experience, the game runs much better now, but it still runs like a dog in so many areas.
__________________
System Cooler Master Storm Trooper case Asrock Extreme 3 Gen 3 I7-2600k 8 Gig of Corsair Vengeance DDR3 1600 EVGA GTX 560 TI Lite On 20x DVD +- RW WD 160 gig 7200 rpm hd WD 750 gig 7200 rpm hd Logitech G15 Keyboard Razer Naga mouse Corsair GS700 LG 22" L226WTQ |
|
|
|
|
|
#46 |
|
Registered User
Join Date: Jun 2004
Posts: 379
|
Well, one thing I noticed with 2 related games last time I tried them both, EQ2 and Vanguard, was that Vanguard was more advanced technically and ran smooth, but a lot of the time, due to the artwork, looked like crap. EQ2, however, looked nice and was less technically advanced, but I still couldn't max the settings, presumably because too much still relies on CPU power and isn't multi-threaded (well, I assume so, since I have a Core 2 Duo, so really it should have enough power if both cores are being used, but it is slower than a top-end single core).
This should make it perform better than Vanguard, while also providing a slight graphical boost.
|
|
|
|
|
#47 |
|
Registered User
Join Date: Mar 2003
Location: Tulsa
Posts: 5,101
|
The new shadows are definitely not stencil volumes. From all I can tell, they are a form of SM 2.0 soft shadowing technique. They look really good and are truly dynamic. They also have a CPU shadowing method, but it doesn't look anywhere near as good.
|
|
|
|
|
#48 |
|
Registered User
Join Date: Jun 2004
Location: Australia
Posts: 820
|
You think? The CPU shadows are more accurate, and the GPU shadows don't have point lights from torches yet. They also flicker for me.
__________________
i7 920 640g/b Raid 0 Corsair 64gig SSD Gigabyte EX58-UD3R 3x 27" Eyefinity 2x5870 crossfire Antec true 750w Logitech G15 6 gig kingston ddr3 1033 Windows 7 x64 Web design:http://www.advancedws.com.au:http://www.nobletrading.com.au:http://www.rackingaudits.com.au:http://www.imhandling.com.au |
|
|
|