I am working on a video decoder application that should be able to run as many decoder instances as possible. I am using OpenGL because Xv causes screen tearing with the number of decoders I need to render.
I have traced out the following loads used by this decoder:
5% CPU load per decoder
2% vertex load per decoder (quad or tristrip)
1% texture transfer load
Because my company is loading the PC that runs this decoder with other things, such as video encoders and audio decoders and encoders, I need this app to run with the smallest possible CPU footprint.
I have commented out certain parts of the program, and have found that rendering the quad to the screen is taking an inordinate amount of CPU time. This quad is rendered with three textures, using a rendering technique much like this code: http://www.fourcc.org/source/YUV420P-OpenGL-GLSLang.c
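For readers who have not seen the technique: a YUV420P renderer of this kind uploads the Y, U and V planes as three textures and converts each pixel to RGB in a fragment shader. The per-pixel math can be sketched on the CPU as below. This is only a minimal sketch, assuming the common BT.601 video-range coefficients and tightly packed planes with no stride padding; the function names are hypothetical and are not taken from the linked file.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a float to the 0..255 range and round to the nearest integer. */
static uint8_t clamp_u8(float x)
{
    if (x < 0.0f) return 0;
    if (x > 255.0f) return 255;
    return (uint8_t)(x + 0.5f);
}

/* Convert one YUV sample to RGB.
 * Assumption: BT.601 video range (Y in 16..235, U/V centred on 128). */
void yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v, uint8_t rgb[3])
{
    float yf = 1.164f * ((float)y - 16.0f);
    float uf = (float)u - 128.0f;
    float vf = (float)v - 128.0f;

    rgb[0] = clamp_u8(yf + 1.596f * vf);
    rgb[1] = clamp_u8(yf - 0.392f * uf - 0.813f * vf);
    rgb[2] = clamp_u8(yf + 2.017f * uf);
}

/* Convert a planar YUV420P frame to packed RGB24.
 * The U and V planes are (w/2) x (h/2): one chroma sample per 2x2 luma block.
 * Assumption: planes are tightly packed (row stride equals plane width). */
void yuv420p_to_rgb24(const uint8_t *yp, const uint8_t *up, const uint8_t *vp,
                      int w, int h, uint8_t *out)
{
    assert(w > 0 && h > 0 && w % 2 == 0 && h % 2 == 0);

    for (int row = 0; row < h; row++) {
        for (int col = 0; col < w; col++) {
            int c = (row / 2) * (w / 2) + (col / 2);  /* shared chroma index */
            yuv_to_rgb(yp[row * w + col], up[c], vp[c],
                       &out[3 * (row * w + col)]);
        }
    }
}
```

In the GPU version this arithmetic runs in the fragment shader, so it should not show up as CPU load by itself; the CPU side only binds the three textures and submits the quad.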
I am wondering why rendering a single quad onto the screen takes 2% of the CPU, even when I use vertex buffer objects and a display list to render it. This 2% scales linearly, so when I am rendering four decoders, I have 8% CPU load just from sending the quads to the screen. I have also tried using a triangle strip, but the change in CPU load is negligible. I can't imagine why a single quad would take 2% CPU load. I established the 2% figure by commenting out the texture transfers and rendering just a black quad.
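To make the scaling concrete, the arithmetic can be written out as a small helper. This is just a sketch using the figures quoted above; it assumes every cost is per decoder and scales linearly, and the struct, function names and CPU budget parameter are hypothetical.

```c
#include <assert.h>

/* Per-decoder CPU costs in percent, as measured in this post. */
struct decoder_load {
    double decode_pct;  /* 5%: decoding */
    double quad_pct;    /* 2%: submitting the quad */
    double upload_pct;  /* 1%: texture transfer (assumed to be per decoder) */
};

/* CPU load spent only on submitting quads for n decoders. */
double quad_load_pct(struct decoder_load l, int n)
{
    assert(n >= 0);
    return n * l.quad_pct;
}

/* Total CPU load for n decoders, assuming linear scaling. */
double total_load_pct(struct decoder_load l, int n)
{
    assert(n >= 0);
    return n * (l.decode_pct + l.quad_pct + l.upload_pct);
}

/* How many decoders fit into a given CPU budget (hypothetical parameter). */
int max_decoders(struct decoder_load l, double budget_pct)
{
    double per_decoder = l.decode_pct + l.quad_pct + l.upload_pct;
    assert(per_decoder > 0.0);
    return (int)(budget_pct / per_decoder);
}
```

With the measured values {5.0, 2.0, 1.0}, four decoders spend 8% on quads and 32% in total, so the quad submission alone accounts for a quarter of the per-decoder cost.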
I am running on a dual-core Intel 2.4 GHz machine with 1 GB of RAM and an NVidia 7600 (AGP) video card. I am running the NVidia closed-source driver, version 9746.
I have tried vertex buffer objects and display lists to reduce this CPU load, to no avail. Are there any other tricks that can reduce it? Any NVidia-specific tricks?