Old 10-21-04, 04:09 AM   #1
netsach
Registered User
 
Join Date: Oct 2004
Location: France
Posts: 6
Multiple card, multiple accelerated display performance decrease

Hello!

While fiddling with graphics I ran into a question: I plan to run an accelerated program of mine on multiple screens, to get a bunch of 3D views for a better feeling of immersion. Today the simulation runs on only one screen, but I plan to make it display on 5 screens.

This is why I bought some extra cards, fiddled with XFree86 and the nvidia driver, and finally got everything working together. I am using the following stuff:
Mandrake 10.0 official (clean install)
xorg-x11 6.7.0-3mdk (from cooker)
1x Asus V8170 GeForce4 MX 440-SE AGP (64M)
2x PNY Verto GeForce FX 5200 PCI Dual VGA (128M)
nVidia driver 53.36 (61.11 wouldn't work whatever I did)
As requested in a stickied post, here is some information about my setup (sorry, I do not know how to get precise information about my motherboard):

Code:
----------
$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA Linux x86 NVIDIA Kernel Module  1.0-5336  Wed Jan 14 18:29:26 PST 2004
GCC version:  gcc version 3.3.2 (Mandrake Linux 10.0 3.3.2-6mdk)
----------
$ cat /proc/driver/nvidia/agp/host-bridge
Host Bridge: 	 Intel Corp. 82865G/PE/P Processor to I/O Controller
Fast Writes: 	 Supported
SBA: 		 Supported
AGP Rates: 	 4x 2x 1x 
Registers: 	 0x1f004217:0x00000104
----------
$ cat /proc/driver/nvidia/agp/card
Fast Writes: 	 Supported
SBA: 		 Not Supported
AGP Rates: 	 4x 2x 1x 
Registers: 	 0x1f000017:0x1f000104
----------
$ cat /proc/driver/nvidia/agp/status
Status: 	 Enabled
Driver: 	 AGPGART
AGP Rate: 	 4x
Fast Writes: 	 Disabled
SBA: 		 Disabled
----------
$ cat /proc/driver/nvidia/cards/2
Model: 		 GeForce FX 5200
IRQ:   		 22
Video BIOS: 	 04.34.20.34.14
Card Type: 	 PCI
----------
$ cat /proc/driver/nvidia/cards/1
Model: 		 GeForce FX 5200
IRQ:   		 18
Video BIOS: 	 04.34.20.34.14
Card Type: 	 PCI
----------
$ cat /proc/driver/nvidia/cards/0
Model: 		 GeForce4 MX 440
IRQ:   		 16
Video BIOS: 	 04.17.00.69.36
Card Type: 	 AGP
----------
$ X -version

Release Date: 18 December 2003
X Protocol Version 11, Revision 0, Release 6.7
Build Operating System: Linux 2.6.8.1-2mdkenterprise i686 [ELF] 
Current Operating System: Linux racer 2.6.3-7mdk #1 Wed Mar 17 15:56:42 CET 2004 i686
Build Date: 23 September 2004
	Before reporting problems, check http://wiki.X.Org
	to make sure that you have the latest version.
Module Loader present
----------
$ dmesg | grep 'CPU:'
CPU:     After generic identify, caps: bfebfbff 00000000 00000000 00000000
CPU:     After vendor identify, caps: bfebfbff 00000000 00000000 00000000
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
CPU:     After all inits, caps: bfebfbff 00000000 00000000 00000080
CPU: Intel(R) Pentium(R) 4 CPU 2.60GHz stepping 09
----------
Here are the details of my xorg configuration file:


Code:
Section "Files"
    FontPath "unix/:-1"
EndSection

Section "ServerFlags"
    AllowMouseOpenFail
EndSection

Section "Module"
    Load "dbe" # Double-Buffering Extension
    Load "v4l" # Video for Linux
    Load "extmod"
    Load "type1"
    Load "freetype"
    Load "glx" # 3D layer
EndSection

Section "InputDevice"
    Identifier "Keyboard1"
    Driver "Keyboard"
    Option "XkbModel" "pc105"
    Option "XkbLayout" "fr"
    Option "XkbOptions" ""
EndSection

Section "InputDevice"
    Identifier "Mouse1"
    Driver "mouse"
    Option "Protocol" "ExplorerPS/2"
    Option "Device" "/dev/mouse"
    Option "ZAxisMapping" "6 7"
EndSection

Section "Monitor"
    Identifier "monitor1"
    HorizSync 30-70
    VertRefresh 50-60
EndSection

Section "Monitor"
    Identifier "monitor2"
    HorizSync 30-70
    VertRefresh 50-60
EndSection

Section "Monitor"
    Identifier "monitor3"
    HorizSync 30-70
    VertRefresh 50-60
EndSection

Section "Monitor"
    Identifier "monitor4"
    HorizSync 30-70
    VertRefresh 50-60
EndSection

Section "Monitor"
    Identifier "monitor5"
    HorizSync 30-70
    VertRefresh 50-60
EndSection

Section "Device"
    Identifier "device1"
    VendorName "NVidia"
    BoardName "NVIDIA GeForce 4 (generic)"
    Driver "nvidia"
    BusID "PCI:1:0:0"
    Option "DPMS"
    Option "UseInt10Module" "on"
EndSection

Section "Device"
    Identifier "device2"
    VendorName "NVidia"
    BoardName "NVIDIA GeForce FX (generic)"
    Driver "nvidia"
    BusID "PCI:3:1:0"
    Screen 0
    Option "DPMS"
    Option "UseInt10Module" "on"
    Option "ConnectedMonitor" "CRT, CRT"
EndSection

Section "Device"
    Identifier "device3"
    VendorName "NVidia"
    BoardName "NVIDIA GeForce FX (generic)"
    Driver "nvidia"
    BusID "PCI:3:1:0"
    Screen 1
    Option "DPMS"
    Option "UseInt10Module" "on"
    Option "ConnectedMonitor" "CRT, CRT"
EndSection

Section "Device"
    Identifier "device4"
    VendorName "NVidia"
    BoardName "NVIDIA GeForce FX (generic)"
    Driver "nvidia"
    BusID "PCI:3:4:0"
    Screen 0
    Option "DPMS"
    Option "UseInt10Module" "on"
    Option "ConnectedMonitor" "CRT, CRT"
EndSection

Section "Device"
    Identifier "device5"
    VendorName "NVidia"
    BoardName "NVIDIA GeForce FX (generic)"
    Driver "nvidia"
    BusID "PCI:3:4:0"
    Screen 1
    Option "DPMS"
    Option "UseInt10Module" "on"
    Option "ConnectedMonitor" "CRT, CRT"
EndSection

Section "Screen"
    Identifier "screen1"
    Device "device1"
    Monitor "monitor1"
    DefaultColorDepth 24
    Subsection "Display"
        Depth 24
        Virtual 1024 768
    EndSubsection
EndSection

Section "Screen"
    Identifier "screen2"
    Device "device2"
    Monitor "monitor2"
    DefaultColorDepth 24
    Subsection "Display"
        Depth 24
        Virtual 1024 768
    EndSubsection
EndSection

Section "Screen"
    Identifier "screen3"
    Device "device3"
    Monitor "monitor3"
    DefaultColorDepth 24
    Subsection "Display"
        Depth 24
        Virtual 1024 768
    EndSubsection
EndSection

Section "Screen"
    Identifier "screen4"
    Device "device4"
    Monitor "monitor4"
    DefaultColorDepth 24
    Subsection "Display"
        Depth 24
        Virtual 1024 768
    EndSubsection
EndSection

Section "Screen"
    Identifier "screen5"
    Device "device5"
    Monitor "monitor5"
    DefaultColorDepth 24
    Subsection "Display"
        Depth 24
        Virtual 1024 768
    EndSubsection
EndSection

Section "ServerLayout"
    Identifier "layout1"
    InputDevice "Keyboard1" "CoreKeyboard"
    InputDevice "Mouse1" "CorePointer"
    Screen "screen1"
    Screen "screen2" LeftOf "screen1"
    Screen "screen3" LeftOf "screen2"
    Screen "screen4" RightOf "screen1"
    Screen "screen5" RightOf "screen4"
EndSection
I ran glxgears (standard window size) on each display, tried the following combinations, and got the following results (I _KNOW_ glxgears is no benchmark tool, but please read on):
one glxgears on the GF4: 1700 fps
one glxgears on GFX #1: 1300 fps
two glxgears on GFX #1: 250 fps each
one glxgears on GFX #2: 1300 fps
two glxgears on GFX #2: 250 fps each
two glxgears on GFX #2 + one glxgears on the GF4: 245 fps on each GFX output, 945 fps on the GF4
one glxgears on each GFX: 1036 fps each
one glxgears on each GFX + one glxgears on the GF4: 1030 fps on each GFX, 1700 fps on the GF4
two glxgears on each GFX: 90 fps on each 'primary' output, 75 fps on each 'secondary' output
two glxgears on each GFX + one glxgears on the GF4: 65-70 fps on each display
So the fps drop dramatically, from 1300-1700 fps when running on a single display down to 65-70 fps when running one glxgears per display. No need to tell me again that glxgears is not a benchmark tool, but this loss of performance is quite strange and scares me. Please note that I got the same results with TwinView or with separate displays.

I ran some Quake 3 benchmarks (linuxq3apoint-1.32b.x86.run, high quality, quake3 +set timedemo 1 +demo four): GFX @ 87 fps, GF4 @ 170 fps. But I don't know how I could run it at the same time on 3 displays. I tried running quake in the background with &, but each quake instance waits for the others to finish before running (i.e. the demo runs on the GF4 while GFX #1 and GFX #2 wait; when the GF4 finishes, GFX #1 starts and GFX #2 keeps waiting, and so on, even though there are 3 processes running). It seems I don't know how to benchmark simultaneous displays.

Concerning my project, and looking at the glxgears results, my best option so far would be to forget the 5-display idea and keep only 3 of them (one on the GF4, one per GFX). But of course that is not the best option: I want 5 displays.

That said, my questions are the following:

- is there a benchmark tool available to test multiple accelerated displays?
- if this is 'normal' behaviour, why is it getting so slow?
- can I do anything to prevent this, as it may make my project impossible to realize?

Any idea, help or suggestion is welcome; I really do not know what to do now! See you later, and thank you all in advance for your attention!
Old 10-21-04, 04:27 AM   #2
Thunderbird
 
Join Date: Jul 2002
Location: Netherlands, Europe
Posts: 2,105
Re: Multiple card, multiple accelerated display performance decrease

Note that when you run multiple OpenGL apps on one card, or even on multiple cards, the CPU will become a bottleneck. Assuming you could run, let's say, Quake 3 on two cards at the same time, you won't get the fps that you would get on a single card. You might get a lot less.
Old 10-21-04, 07:10 AM   #3
vincoof
Registered User
 
Join Date: Sep 2004
Location: France
Posts: 104
Re: Multiple card, multiple accelerated display performance decrease

Right, the only meaningful test for graphics cards is with applications that are limited exclusively on the graphics card side. Since glxgears uses as much CPU as it can, running two glxgears reflects the CPU limitation rather than the graphics card limitation.

Ideally, a representative benchmark would need to use at most 50% CPU and 50% RAM, so that there is no limitation on those sides when running two tests simultaneously.
For a triple test, use at most 33% CPU, etc.
Old 10-21-04, 11:56 AM   #4
netsach
Registered User
 
Join Date: Oct 2004
Location: France
Posts: 6
Re: Multiple card, multiple accelerated display performance decrease

Thanks for your advice.

As far as I have searched, there is no benchmark tool available for multiple-display tests.

I plan to write my own, and I have a question. According to your answers, I must make sure the CPU usage stays low compared to the GPU part (this is what I understood from your comments). I thought of creating a large landscape filled with objects, lights etc., compiling that into a display list, and moving the camera around. That way the display list render time should be greater than the CPU usage, shouldn't it? Maybe I should also put the drawing function in a separate thread for each window, so that I would not refresh the windows one after the other, i.e. to get FPS per display and not an overall result.

Is this a good idea? What is wrong with it, or what should I change/add to create a 'valid' benchmark tool?

Thanks again!
Old 10-21-04, 09:19 PM   #5
vincoof
Registered User
 
Join Date: Sep 2004
Location: France
Posts: 104
Re: Multiple card, multiple accelerated display performance decrease

Quote:
Originally Posted by netsach
I thought of creating a large landscape filled with objects, lights etc., compiling that into a display list, and moving the camera around. That way the display list render time should be greater than the CPU usage, shouldn't it?
Not sure. There are many factors that can slow down the CPU. For benchmarking purposes, I recommend you set a maximum refresh rate (ideally exposed on the command line) using wait functions, so that the application lets other processes work, and the benchmark consists in counting how many milliseconds the frames took to render. For instance, if you cap the application at 10 Hz, each frame should be rendered in less than 100 ms. So, imagine a frame is rendered in 50 ms: you call a wait/sleep function for 50 ms; and if the next frame is rendered in 20 ms, you wait/sleep for 80 ms. The total time spent rendering the two frames is thus 70 ms (50 for the first and 20 for the second frame).

Of course you have to make sure the idle time is always enough to let a second or a third process work at the same time. For instance, if you plan to set the refresh rate at 10 Hz for a triple view, you should make sure that each frame is rendered in less than 33 ms, otherwise the CPU will be overloaded and the benchmark will be meaningless graphics-wise. You can check CPU usage with tools like "gkrellm" (a full-featured graphical monitor) or "top" (the famous CPU+RAM monitor in text mode). Whenever you see the CPU usage go above 99% you should decrease the refresh rate of your application, hence the idea of leaving this parameter on the command line.

As a last note, please keep in mind that there is no way of fully comparing graphical solutions. A graphics card may be texture-limited, shader-limited, vertex-limited, or even bandwidth-limited. In other words, the perfect benchmark software does not exist and cannot exist. That said, good luck with your program! Feel free to post it around here when it's finished (or, as one may say, "when it's done").

Last edited by vincoof; 10-21-04 at 09:31 PM.
Old 10-22-04, 02:23 AM   #6
netsach
Registered User
 
Join Date: Oct 2004
Location: France
Posts: 6
Re: Multiple card, multiple accelerated display performance decrease

Thank you for your attention and ideas!

One last thing: I thought the type of bus could be responsible for part of the performance "loss" (or it may be the processor). I mean, I have one AGP and 2 PCI cards, and the way OpenGL works is (I think) by sending commands to the graphics driver, which sends commands to the card, transmitting data and commands through the bus.

Now, I have heard that PCI is slow. Could the drop be caused by the PCI bus struggling to transfer data? I mean, with one display per card the fps drop is not dramatic: 1750->1700 on the GF4 (AGP), 1300->1000 on each GFX (PCI). But when I ask for two displays per GFX card, it drops from 1300 to 250 when not using the GF4 (i.e. 4 displays), and from 1300 to 70 when using all 5 displays.

Is it reasonable to think so?

If I measure the time spent in the OpenGL primitives, I will surely get the render time, and the rest of the time could be considered CPU time. But as far as I know, the OpenGL primitives return as soon as the data are sent (to the driver, or maybe to the card, I do not know). So the "transfer" time may be included in the render time. Or so I think.

Well, is there a way I could get a grasp of the time spent using the bus?

If I could, that would clearly show whether the bottleneck comes from the GPU, from the CPU or from the bus.
Old 10-22-04, 04:15 AM   #7
vincoof
Registered User
 
Join Date: Sep 2004
Location: France
Posts: 104
Re: Multiple card, multiple accelerated display performance decrease

Quote:
Originally Posted by netsach
If I measure the time spent in the OpenGL primitives, I will surely get the render time, and the rest of the time could be considered CPU time. But as far as I know, the OpenGL primitives return as soon as the data are sent (to the driver, or maybe to the card, I do not know).
Right, but if you end every frame with glFlush + glFinish, and the driver is not doing anything tricky, you should be able to get the time really spent rendering.
Another technique to force the GL commands to complete is to finish each frame with a complete image readback. Because you read back the whole image, the driver must wait for all GL commands to complete before returning the correct data.

So if you want to be sure the rendering of the frame is really finished at some point, call something like this before swapping buffers:
glFlush();
glFinish();
glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, pixels);
/* width and height are the dimensions of your viewport, and pixels is an array of width*height*4 bytes (typically allocated when the window is resized) */
The GL_BGRA and GL_UNSIGNED_INT_8_8_8_8_REV enums result in the ARGB format, which is NVIDIA's native image format, so the readback is faster because the driver doesn't need to re-order channels.


Quote:
Originally Posted by netsach
Well, is there a way I could get a grasp of the time spent using the bus?
Of course: issue lots of uploads and downloads.
You can upload with functions like glTexImage (warning: don't use glCopyTexImage, because it operates only on the server side) or glDrawPixels, and download with glGetTexImage and glReadPixels.
The easiest way would be e.g. to call lots of glDrawPixels and glReadPixels, alternating between them (i.e. call Draw-Read-Draw-Read, not Draw-Draw-Read-Read) and, of course, to finish with a Read (not a Draw), as mentioned above.

Last edited by vincoof; 10-22-04 at 05:07 AM.
Old 10-22-04, 05:06 AM   #8
netsach
Registered User
 
Join Date: Oct 2004
Location: France
Posts: 6
Re: Multiple card, multiple accelerated display performance decrease

For those who may be interested, here is the first version of multi-bench. The code is based on NeHe's Lesson 05, ported to GLX by Mihael Vrbanec. I added support for multiple displays and the FPS count. To compile/run it you need: the OpenGL headers and libraries, an X server with the GLX and XF86VidMode extensions, and gcc of course! Take a look at the readme for some info.

I know this is very simple, but it shows the limitations (which may be a combination of bus bandwidth, GPU and CPU). Maybe it will be useful for some of you. By the way, nothing graphically fancy is done here, just a box and a tetrahedron rotating (as in lesson05). If you have any comments, please let me know. For the time being, I'll try to make my simulation work on multiple displays to see the real impact. More on that later.

Stay tuned...
Attached Files
File Type: zip multiscr-1.0.zip (7.2 KB, 149 views)

Old 10-22-04, 05:12 AM   #9
vincoof
Registered User
 
Join Date: Sep 2004
Location: France
Posts: 104
Re: Multiple card, multiple accelerated display performance decrease

Great, I just need to get the XF86VidMode thingy, as I'm getting undefined references to functions like XF86VidModeSwitchToMode.

In the end I'm sure you could even submit it to NeHe, especially if you're interested in getting feedback.