Caveat: I don't know anything about XVideo, but I have used GL for a while.
Run glxinfo and see whether you get
as one of the extensions. If so, it may do the colorspace conversion for you in GL. I don't know whether it's fast, though; for example, a similar extension on a Radeon under OS X is too slow to use.
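If you'd rather check from inside your program than eyeball glxinfo, the GL extension list is a single space-separated string (from glGetString(GL_EXTENSIONS) with a context current; GLX extensions such as GLX_SGI_video_sync come from glXQueryExtensionsString()). Here's a rough C sketch of a whole-word match, since a plain strstr() can be fooled when one extension name is a prefix of another. The helper name has_extension is made up, and the extension name is left as a parameter:

```c
#include <string.h>

/* Hypothetical helper: return 1 if `name` appears as a whole,
   space-separated token in `ext_list` (the format used by
   glGetString(GL_EXTENSIONS) and glXQueryExtensionsString()), else 0.
   A bare strstr() is not enough because one extension name can be a
   prefix of another. */
static int has_extension(const char *ext_list, const char *name)
{
    const char *p = ext_list;
    size_t len;

    if (ext_list == NULL || name == NULL)
        return 0;
    len = strlen(name);
    if (len == 0)
        return 0;

    while ((p = strstr(p, name)) != NULL) {
        int starts_token = (p == ext_list || p[-1] == ' ');
        int ends_token = (p[len] == ' ' || p[len] == '\0');
        if (starts_token && ends_token)
            return 1;
        p += len; /* skip this partial match and keep searching */
    }
    return 0;
}
```

In a GL program you'd pass (const char *)glGetString(GL_EXTENSIONS) as ext_list and the extension's name as name.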
Alternatively, to sync to the retrace, you could try polling glXGetVideoSyncSGI; I think there's a glXWaitVideoSyncSGI variant as well. You may have more luck with these (plus judicious usleep() calls) than with waiting for the retrace. I think the Wait function used to burn CPU in earlier driver versions, but I've seen it mentioned in the driver changelogs since, so it may work well by now. Both get confused if you're running more than one monitor.
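Here's a minimal sketch of the poll-and-sleep approach in C. It assumes glXGetVideoSyncSGI's signature, int glXGetVideoSyncSGI(unsigned int *count), which returns 0 on success; in a real program you'd fetch the pointer with glXGetProcAddress((const GLubyte *)"glXGetVideoSyncSGI") after making a context current and cast it to get_video_sync_fn. The names poll_for_retrace and fake_get_video_sync are hypothetical, and the fake exists only so the loop can run without an X server. It does nothing about the multi-monitor problem.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Same shape as glXGetVideoSyncSGI (GLX_SGI_video_sync): stores the
   current retrace counter in *count and returns 0 on success. */
typedef int (*get_video_sync_fn)(unsigned int *count);

/* Hypothetical helper: poll get_sync until the retrace counter changes,
   sleeping sleep_us microseconds between polls. Returns the new counter
   value, or -1 on error or after max_polls polls without a change. */
static long long poll_for_retrace(get_video_sync_fn get_sync,
                                  long sleep_us, int max_polls)
{
    unsigned int start, now;
    struct timespec ts;
    int i;

    if (get_sync(&start) != 0)
        return -1;

    ts.tv_sec = sleep_us / 1000000;
    ts.tv_nsec = (sleep_us % 1000000) * 1000L;

    for (i = 0; i < max_polls; i++) {
        nanosleep(&ts, NULL); /* the "judicious usleep" between polls */
        if (get_sync(&now) != 0)
            return -1;
        if (now != start)
            return (long long)now;
    }
    return -1;
}

/* Hypothetical stand-in for glXGetVideoSyncSGI so the loop can be
   exercised without GL: the counter advances on every third call. */
static unsigned int fake_calls = 0;

static int fake_get_video_sync(unsigned int *count)
{
    fake_calls++;
    *count = fake_calls / 3;
    return 0;
}
```

The sleep length trades CPU for precision: you notice a retrace up to one sleep interval late, so shorter sleeps catch it sooner at the cost of more wakeups.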
Finally, I've found that tuning sleep delays works far better on the 2.6 kernel than on 2.4.
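If you want to see how well your own kernel honours short sleeps, timing them is a quick check. A likely reason for the difference is that 2.6 raised the x86 timer frequency (HZ) above 2.4's 100, so short sleeps get rounded up far less coarsely; take that as a plausible explanation, not something measured here. A minimal C sketch using clock_gettime(CLOCK_MONOTONIC) and nanosleep(); oversleep_us is a made-up name:

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Hypothetical helper: sleep for request_us microseconds and return how
   many microseconds longer than requested the sleep took, or -1 if a
   call failed (e.g. the sleep was interrupted by a signal). */
static long long oversleep_us(long request_us)
{
    struct timespec req, t0, t1;
    long long elapsed_ns;

    req.tv_sec = request_us / 1000000;
    req.tv_nsec = (request_us % 1000000) * 1000L;

    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
        return -1;
    if (nanosleep(&req, NULL) != 0)
        return -1;
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
        return -1;

    /* Work in nanoseconds first so the subtraction can't go wrong
       when tv_nsec wraps across a second boundary. */
    elapsed_ns = (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
               + (t1.tv_nsec - t0.tv_nsec);
    return elapsed_ns / 1000 - request_us;
}
```

Calling it a few hundred times with a 1 ms request and looking at the worst case gives a rough idea of whether sleep-based frame timing is usable on that machine.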