#1
Registered User
Join Date: May 2003
Location: flat, steamy Champaign, IL
Posts: 36
Hi. I have a multi-screen Dell XPS710 2-CPU EM64T system with two graphics cards -- currently a Dell Anonymous 10de:0400 and a GeForce 7300 GT. I'm trying to load 2-D RGB textures into the cards as fast as possible and draw each texture once.

I've tried various optimizations: texture size (loading a bunch of 256x256 textures is faster than loading a smaller number of 1024x1024 ones, say), pixel-buffer objects (they slow things down in this case), and pixel formats (on some cards BGR is faster than RGB, though on these it doesn't matter).

If I run this on one graphics card at a time, the best speed I've seen is about 250 Mpixel/sec (of 24-bit textures) on the faster card. That's about 750 MB/sec -- much slower than (a) system memory (~2.5GB/sec according to the STREAM Copy benchmark) or (b) what I'd expect for a 16x PCIe path (lspci -vv says "2.5 Gb/sec"). This is without using either Twinview or Xinerama, so I get two separate screens, one per card.

The test program's inner loop calls glBindTexture, then glTexSubImage2D on a pre-set chunk of memory, and then (optionally) draws a textured quad.

Noting that the program uses about 100% CPU, I also tried running two copies of it, one on each screen (i.e. one on each graphics card). Result on this system: it's MUCH SLOWER, at least 2x, on two cards than on one. E.g. running just on card A (:0.0) gives 250 Mpix/s; running just on B gives about 230 Mpix/s. But running one process each on A and B concurrently gives, at best, about 110 Mpix/s. This is only a little faster than running both processes concurrently on the same screen, where I'd expect lots of degradation due to context switching. But with two cards and two processes on two CPUs, should I suffer from that?

I've also gotten to try this on a Quadroplex system with a pair of Quadro 4500 FX2 cards running in sync, for a total of 8 screens. On those, the peak texture load speed is lower (~150 Mpix/s), but multiple instances running on different screens get better overall performance, until it saturates at about 4-5 instances at 480 Mpix/s. That's ~1.5GB/s, much more reasonable for a bus limitation.

But I'd really like to understand why there's so much interference between two different graphics cards... Can I get better than this?
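For reference, the bandwidth figures quoted in this post can be checked with a little arithmetic. The sketch below assumes 3 bytes per pixel (24-bit RGB, no row padding) and, as an assumption not stated in the post, a PCIe 1.x link running at 2.5 GT/s per lane with 8b/10b encoding. The function names are made up for illustration.

```c
/* Bytes per pixel for tightly packed 24-bit RGB textures (assumed: no padding). */
#define BYTES_PER_PIXEL 3.0

/* Convert a texture upload rate from Mpixel/s to MB/s. */
double mpix_to_mbytes(double mpix_per_sec)
{
    return mpix_per_sec * BYTES_PER_PIXEL;
}

/*
 * Theoretical one-direction bandwidth of a PCIe 1.x link, in MB/s.
 * Assumption: 2.5 Gbit/s raw signalling per lane, of which 8 bits in
 * every 10 carry data (8b/10b encoding). Protocol overhead is ignored,
 * so real transfers will be somewhat slower than this.
 */
double pcie1_mbytes_per_sec(int lanes)
{
    const double raw_gbit_per_lane = 2.5;
    const double encoding_efficiency = 8.0 / 10.0;
    return lanes * raw_gbit_per_lane * encoding_efficiency * 1000.0 / 8.0;
}
```

Under these assumptions, 250 Mpix/s works out to 750 MB/s and 480 Mpix/s to 1440 MB/s, against a theoretical 4000 MB/s per direction for a 16-lane link. So the single-card result sits well under a fifth of the nominal bus rate.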

#2
Registered User
Join Date: May 2003
Location: flat, steamy Champaign, IL
Posts: 36
More on above. Found a surprise with two different code organizations:

(a)
    for( a few dozen times ) {
        glBindTexture( n'th texture object );
        glTexSubImage2D( n'th texture data );
        glBegin( GL_QUADS ); ... glEnd();
    }

vs.

(b)
    glBindTexture( just_one_texture_object );
    for( same number of times ) {
        glTexSubImage2D( n'th texture data );
        glBegin( GL_QUADS ); ... glEnd();
    }

When just one card is in use, (a) and (b) run at nearly the same speed, within a few percent. So glBindTexture() must cost something, but not a whole lot.

But when running two such programs concurrently -- on different X screens, one screen per nVidia card, and in different UNIX processes -- then (b) runs about twice as fast as (a)! Also, two copies of (b) each run almost as fast as a single instance of it, so I seem to have a good workaround for the problem of this thread.

Further, two instances of (b) running in different windows on the same screen run essentially as fast (in texture uploaded pixels/sec) as on different cards. So switching GL contexts on a card must be pretty fast in case (b), though not so in (a).

So I'm happy, but I'd still like to *understand* why it works out this way.

-- Stuart
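To make the structural difference between (a) and (b) concrete, here is a minimal sketch that can run without a GL context. Everything in it is hypothetical: `bind_texture`, `upload_tile` and `draw_quad` are counting stubs standing in for glBindTexture, glTexSubImage2D and the glBegin/glEnd quad, and none of it is taken from the actual test program. It only shows how many calls of each kind the two organizations issue and says nothing about why the driver behaves differently.

```c
#include <stddef.h>

/* Hypothetical call counter standing in for a real GL context. */
typedef struct {
    int binds;      /* calls that would be glBindTexture */
    int uploads;    /* calls that would be glTexSubImage2D */
    int draws;      /* textured quads that would be drawn */
    unsigned bound; /* texture name most recently bound */
} CallLog;

static void bind_texture(CallLog *log, unsigned tex)
{
    log->binds++;
    log->bound = tex;
}

static void upload_tile(CallLog *log, const unsigned char *pixels)
{
    (void)pixels; /* a real program would pass this to glTexSubImage2D */
    log->uploads++;
}

static void draw_quad(CallLog *log)
{
    log->draws++;
}

/* Organization (a): a different texture object is bound for every tile. */
void run_many(CallLog *log, const unsigned *textures,
              const unsigned char *const *tiles, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        bind_texture(log, textures[i]);
        upload_tile(log, tiles[i]);
        draw_quad(log);
    }
}

/* Organization (b): one texture object is bound once and reused for all tiles. */
void run_single(CallLog *log, unsigned texture,
                const unsigned char *const *tiles, size_t n)
{
    bind_texture(log, texture);
    for (size_t i = 0; i < n; i++) {
        upload_tile(log, tiles[i]);
        draw_quad(log);
    }
}
```

For n tiles, `run_many` issues n binds while `run_single` issues one; the upload and draw counts are the same in both. The only per-iteration difference is therefore the bind and the set of texture objects touched.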

#3
NVIDIA Corporation
Join Date: Mar 2005
Posts: 2,487
Hi Stuart,
Can you please attach a copy of your test application?

-- Aaron

#4
Registered User
Join Date: May 2003
Location: flat, steamy Champaign, IL
Posts: 36
Righto -- here it is, a bit cleaned up. Note the GRAPHICS sections and the compilation/usage commentary near the top. Note too that this version uses POSIX semaphores, so it works on e.g. Linux and maybe MacOSX, but probably not Win32.

#5
NVIDIA Corporation
Join Date: Mar 2005
Posts: 2,487
Hmm. I tried your attached app and get the same results with and without the -many option.

#6
Registered User
Join Date: May 2003
Location: flat, steamy Champaign, IL
Posts: 36
Quote:
As mentioned in the original post, the "-many" (a) and non-"-many" (b) options ran at nearly the same speed if there was only a single window. But if there are two windows -- even if they are on different PCIe graphics cards and different X screens -- there's a big performance loss with -many, but little loss when just re-using one texture tile per window.

#7
NVIDIA Corporation
Join Date: Mar 2005
Posts: 2,487
Yes, I ran one on each X screen. Have you tried 169.04?

#8
Registered User
Join Date: May 2003
Location: flat, steamy Champaign, IL
Posts: 36
Quote:
between "-W 128 -many" (~100 Mpix/sec/window) and "-W 128 (without -many)" (~210-220 Mpix/sec/window) -- about the same as with 100.14.23.

I do find one case where -many doesn't matter. (Don't know whether this mattered with 100.14.23 or earlier.) I had found earlier that GL_TEXTURE_RECTANGLE_ARB is much speedier than GL_TEXTURE_2D for the same texture-tile size. Using options like "-W 128" or "-W 256x270" selects TEXTURE_RECTANGLE, but omitting -W tests with GL_TEXTURE_2D. (All the examples listed in txspeed.c's comments used -W.) So: with *no* -W, so that it uses 128x128-pixel GL_TEXTURE_2D textures, I get about 84 Mpix/sec/window, regardless of -many. Is that a clue?

Can I ask what sort of speeds you see? Output from some examples is attached (BENCHMARKS.txt). This is the same setup as for the nvidia-bug-report attached earlier, except that it's now running the upgraded nvidia driver.

Thanks for looking into this!