nV News Forums

 
 


stuartlevy 11-04-07 12:42 PM

contention among multiple cards (poor texture-loading performance)?
 
1 Attachment(s)
Hi. I have a multi-screen Dell XPS710 2-cpu EM64T system
with two graphics cards -- currently a Dell Anonymous 10de:0400
and a GeForce 7300 GT. I'm trying to load 2-D RGB textures into the
cards as fast as possible and draw each texture once.

Tried various sorts of optimizations -- texture size (it's faster to load
a bunch of 256x256 textures than a smaller number of 1024x1024 ones, say),
pixel-buffer objects (they slow things down in this case), pixel formats
(on some cards BGR is faster than RGB, though on these it doesn't matter).


If I just run this on one graphics card at a time, the best speed I've seen is
about 250 Mpixel/sec (of 24-bit textures) on the faster card. That's about 750 MB/sec -- much slower than (a) system memory (~2.5GB/sec according to the STREAM Copy benchmark) or (b) what I'd expect for a 16x PCIe path
(lspci -vv says "2.5 Gb/sec").
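
A minimal sketch of the bandwidth arithmetic above, in C. It assumes 3 bytes per pixel for 24-bit RGB and decimal megabytes (1 MB = 10^6 bytes); the function name is illustrative and is not taken from the test program.

```c
#include <assert.h>

/*
 * Convert a texture-upload rate in Mpixel/s to MB/s.
 * bytes_per_pixel is 3 for tightly packed 24-bit RGB (an assumption:
 * a driver may pad or convert pixels internally, which this ignores).
 */
static double mpix_to_mbytes_per_sec(double mpix_per_sec, int bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    return mpix_per_sec * (double)bytes_per_pixel;
}
```

For example, mpix_to_mbytes_per_sec(250.0, 3) gives 750.0 MB/s, and the 480 Mpix/s figure quoted later in this post works out to 1440 MB/s, i.e. a bit under 1.5 GB/s.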

This is without using either Twinview or Xinerama, so that I get two separate screens, one per card.

The test program's inner loop calls glBindTexture, glTexSubImage2D
on a pre-set chunk of memory, and then (optionally) draws a textured quad.
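
A rough sketch of how a loop like this might be timed. This is an illustration under stated assumptions, not the actual test program's code. The names upload_fn, time_uploads and copy_upload are hypothetical, and copy_upload is a plain memcpy standing in for the glBindTexture/glTexSubImage2D pair so that the sketch runs without a GL context. In a real GL version, the end time should probably be read only after glFinish(), since GL calls may return before the transfer has completed.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical callback: "uploads" one tile of nbytes bytes. */
typedef void (*upload_fn)(const unsigned char *src, size_t nbytes, void *ctx);

/* Wall-clock time in seconds from a monotonic clock. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Pixels moved in a given number of seconds, expressed in Mpixel/s. */
static double mpix_per_sec(double pixels, double seconds)
{
    assert(seconds > 0.0);
    return pixels / seconds / 1e6;
}

/*
 * Call upload() `iterations` times on a w x h tile of 24-bit pixels
 * and return the measured rate in Mpixel/s.
 */
static double time_uploads(upload_fn upload, void *ctx,
                           const unsigned char *tile,
                           int w, int h, int iterations)
{
    size_t nbytes = (size_t)w * (size_t)h * 3;
    double t0 = now_seconds();
    for (int i = 0; i < iterations; i++)
        upload(tile, nbytes, ctx);
    /* A GL version would call glFinish() here before stopping the clock. */
    double elapsed = now_seconds() - t0;
    if (elapsed <= 0.0)
        elapsed = 1e-9;  /* guard against a coarse clock on tiny runs */
    return mpix_per_sec((double)w * h * iterations, elapsed);
}

/* Stand-in for a texture upload: copy the tile into a buffer at ctx. */
static void copy_upload(const unsigned char *src, size_t nbytes, void *ctx)
{
    memcpy(ctx, src, nbytes);
}
```

Calling time_uploads(copy_upload, dst, src, 256, 256, 1000) with two 256*256*3-byte buffers gives a memcpy rate in the same units as the figures in this post, which is one way to sanity-check the harness before swapping in the GL calls.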

Noting that the program uses about 100% of one CPU, I also tried running
two copies of it: one on each screen (i.e. one on each graphics card).
Result on this system: it's MUCH SLOWER, at least 2x, on two cards than
it is on one.

E.g. running just on card A (:0.0) gives 250 Mpix/s; running just on B gives
about 230 Mpix/s.

But running one process each on A and B concurrently gives, at best,
about 110 Mpix/s. This is only a little faster than running both processes
concurrently on the same screen, where I'd expect lots of degradation
due to context switching. But using two cards and two processes on two CPUs, should I suffer from that??

I've also gotten to try this on a Quadro Plex system with a pair of
Quadro FX 4500 X2 cards running in sync, for a total of 8 screens.
On those, the peak texture load speed is lower (~150 Mpix/s), but multiple instances running on different screens get better overall performance,
until it saturates at about 4-5 instances at 480 Mpix/s. That's ~1.5GB/s,
*much* more reasonable for a bus limitation.

But I'd really like to understand why there's so much interference between
two different graphics cards... Can I get better than this?

stuartlevy 11-06-07 11:44 AM

Re: contention among multiple cards (poor texture-loading performance)?
 
More on above. Found a surprise: with two different code organizations,
(a)
    for( a few dozen times ) {
        glBindTexture( n'th texture object );
        glTexSubImage2D( n'th texture data );
        glBegin( GL_QUADS ); ... glEnd();
    }
vs
(b)
    glBindTexture( just_one_texture_object );
    for( same number of times ) {
        glTexSubImage2D( n'th texture data );
        glBegin( GL_QUADS ); ... glEnd();
    }

When just one card is in use, (a) and (b) run at nearly the same speed,
within a few percent. So glBindTexture() must cost something, but
not a whole lot.

But when running two such programs concurrently -- on different X screens,
one screen per nVidia card, and in different UNIX processes --
then (b) runs about twice as fast as (a)!

Also, two copies of (b) each run almost as fast as a single instance of it,
so I seem to have a good workaround for the problem of this thread.

Further, two instances of (b) running in different windows on the same screen
run essentially as fast (in texture uploaded pixels/sec) as on different cards.
So switching GL contexts on a card must be pretty fast in case (b), though
not so in (a).

So I'm happy, but I'd still like to *understand* why it works out this way.
-- Stuart

AaronP 11-06-07 12:24 PM

Re: contention among multiple cards (poor texture-loading performance)?
 
Hi Stuart,

Can you please attach a copy of your test application?

-- Aaron

stuartlevy 11-06-07 06:10 PM

Re: contention among multiple cards (poor texture-loading performance)?
 
1 Attachment(s)
Righto -- here it is, a bit cleaned up. Note the GRAPHICS sections,
and the compilation/usage commentary near the top.

Note too that this version uses POSIX semaphores, so
it works on e.g. Linux and maybe MacOSX but probably not Win32.

AaronP 11-19-07 07:10 PM

Re: contention among multiple cards (poor texture-loading performance)?
 
Hmm. I tried your attached app and get the same results with and without the -many option.

stuartlevy 11-21-07 03:27 PM

Re: contention among multiple cards (poor texture-loading performance)?
 
Quote:

Originally Posted by AaronP
Hmm. I tried your attached app and get the same results with and without the -many option.

Did you try running the program with multiple windows?

As mentioned in my earlier post, the "-many" (a) and non-"-many" (b) options ran at nearly
the same speed if there was only a single window. But if there are two windows -- even if they
are on different PCIe graphics cards and different X screens -- there's a big performance loss
with -many but little loss when just re-using one texture tile per window.

AaronP 11-21-07 04:11 PM

Re: contention among multiple cards (poor texture-loading performance)?
 
Yes, I ran one on each X screen. Have you tried 169.04?

stuartlevy 11-21-07 05:55 PM

Re: contention among multiple cards (poor texture-loading performance)?
 
1 Attachment(s)
Quote:

Originally Posted by AaronP
Yes, I ran one on each X screen. Have you tried 169.04?

OK, I just switched from 100.14.23 to 169.04. I still see a big difference, just about twofold, in performance
between "-W 128 -many" (~100 Mpix/sec/window) and "-W 128 (without -many)" (~210-220 Mpix/sec/window) --
about the same as with 100.14.23.

I do find one case where -many doesn't matter. (Don't know whether this mattered with 100.14.23 or earlier.)
Had found earlier that GL_TEXTURE_RECTANGLE_ARB is much speedier than GL_TEXTURE_2D
for the same texture-tile size. Using things like "-W 128" or "-W 256x270" selects TEXTURE_RECTANGLE,
but omitting -W tests with GL_TEXTURE_2D. (All the examples listed in txspeed.c's comments used -W.)

So: with *no* -W, so that it uses 128x128-pixel GL_TEXTURE_2D's, I get about 84 Mpix/sec/window,
regardless of -many.

Is that a clue? Can I ask what sort of speeds you see?

Output from some examples attached (BENCHMARKS.txt). This is the same setup as for the nvidia-bug-report attached earlier, except that it's now running the upgraded nvidia driver.

Thanks for looking into this!


All times are GMT -5.