I just watched the GPU conference presentations, and it looks like this might help me. What I'm doing is playing multiple HD videos across two (or more, I suppose) GPUs. Currently I pass the video frames as raw pixels (in the decoded pixel format) to each context/GPU, then convert to half-float RGBA.
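For anyone curious what the half-float step amounts to: binary16 is just a narrowing of the IEEE 754 fields. A CPU-side sketch for illustration only (I'm truncating rather than rounding, and skipping denormals/NaN; in the real pipeline the conversion happens GPU-side):

```c
#include <stdint.h>
#include <string.h>

/* Pack a 32-bit float into IEEE 754 binary16.
 * Truncates the mantissa; flushes underflow to signed zero,
 * clamps overflow to infinity. Illustration only. */
static uint16_t float_to_half(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);                      /* raw float bits          */
    uint32_t sign = (x >> 16) & 0x8000u;           /* sign moves to bit 15    */
    int32_t  exp  = (int32_t)((x >> 23) & 0xFF)    /* re-bias exponent:       */
                    - 127 + 15;                    /* 8-bit/127 -> 5-bit/15   */
    uint32_t mant = x & 0x7FFFFFu;                 /* 23-bit mantissa         */

    if (exp <= 0)  return (uint16_t)sign;            /* underflow -> +/- 0    */
    if (exp >= 31) return (uint16_t)(sign | 0x7C00); /* overflow  -> infinity */
    return (uint16_t)(sign | ((uint32_t)exp << 10) | (mant >> 13));
}
```

So an RGBA16F frame is exactly twice the size of RGBA8, which is what makes the transfer cost of big frames hurt.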
I do all this with multiple threads and contexts: one shared context per GPU, then one thread per playing movie per GPU for transfers, and one thread per screen.
My main problem, apart from the theoretical limits of the PCIe bus, is that when I transfer big frames I see blocking between the two GPUs. I've isolated everything and removed (to the best of my ability) all contention in my code, and the problem can be switched on or off by triggering the copy to the GPU. The copy goes through a PBO, is triggered as an asynchronous DMA, and is left alone as long as possible. The contention is affected by the speed of the weaker of the two GPUs, almost as if both GPUs synchronize on that one section.
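For context, the upload path is roughly the following double-buffered PBO pattern (a sketch, not my actual code; `tex`, `pbo`, `W`, and `H` are placeholders, a current GL context and `GL_ARB_pixel_buffer_object` are assumed, and error checking is omitted):

```c
/* Sketch of the async PBO upload path (per GPU, per movie thread). */
GLuint pbo[2];        /* ping-pong pixel buffer objects   */
int    idx = 0;       /* which PBO receives the new frame */

void upload_frame(GLuint tex, const void *pixels, int W, int H)
{
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo[idx]);
    /* Orphan the old store so the driver need not sync with in-flight DMA. */
    glBufferData(GL_PIXEL_UNPACK_BUFFER, W * H * 4, NULL, GL_STREAM_DRAW);
    void *dst = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
    memcpy(dst, pixels, (size_t)W * H * 4);
    glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);

    /* Kick off the transfer; with a PBO bound, the "pixels" argument is an
     * offset into the buffer, so this returns immediately and the DMA runs
     * asynchronously. The texture is consumed as late as possible. */
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, W, H,
                    GL_BGRA, GL_UNSIGNED_BYTE, (const GLvoid *)0);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);

    idx = 1 - idx;    /* next frame goes to the other PBO */
}
```

The blocking shows up even though nothing here should force a sync between the two GPUs' contexts.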
As a side-bar, my other problem is that one GPU drives a 'mixed' screen with OpenGL and X windows, while the other display (half of the TwinView) is OpenGL only, and I've had trouble sharing textures without using TwinView so that they're on the same display connection. The consequence is that I see tearing on the fullscreen texture. If nothing else, this extension will let me render on one screen and 'blit' to the other.
But my real gain might be doing a copy between GPUs, instead of the same copy to each GPU.
- Should that avoid the contention, or will it just introduce another level of contention?
- Would the speed of the lesser GPU be less of an issue?
- It looks like I won't get DMA without a Quadro. What if one of the two GPUs were a Quadro? So far we've shipped with one GTX 275 and one 8800 (upgraded from an 8400 to minimize this problem).
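For reference, the cross-GPU copy I'm hoping for would look something like this with GLX_NV_copy_image (a sketch; `dpy`, `srcCtx`, `dstCtx`, `srcTex`, `dstTex`, `W`, and `H` are placeholders, and I'm assuming the two contexts are allowed to live on different GPUs):

```c
/* Sketch: copy one HD frame from a texture owned by srcCtx (GPU A)
 * to a texture owned by dstCtx (GPU B), instead of uploading the
 * same frame over PCIe to each GPU separately. */
glXCopyImageSubDataNV(dpy,
                      srcCtx, srcTex, GL_TEXTURE_2D,  /* source          */
                      0, 0, 0, 0,                     /* level, x, y, z  */
                      dstCtx, dstTex, GL_TEXTURE_2D,  /* destination     */
                      0, 0, 0, 0,
                      W, H, 1);                       /* width, height, depth */
```

If that copy can run peer-to-peer (or even host-staged by the driver), it would replace one of my two PCIe uploads entirely.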
By the way, this is a commercial/professional use, but my emails to the pro group seem to vaporize. We sell playback devices that use NVIDIA GPUs as programmable video processors.