I've known about this driver bug for a while, and I'm bringing it up now in the hope that it will be fixed. The problem: when glXSwapBuffers() waits for the vblank interval, glFinish() and glXWaitGL() appear to busy-loop while waiting for the buffer swap to complete, causing 100% CPU usage. I've attached a small sample program that reproduces the problem. It can be compiled as follows:
$ gcc glbusy.c -o glbusy -lX11 -lGL
after renaming the attached glbusy.c.txt to glbusy.c.
Run the application with vsync enabled in nvidia-settings or with __GL_SYNC_TO_VBLANK=1 set in the environment,
and it will use 100% of one CPU core. After commenting out the glXWaitGL() call on line 55 of the source and recompiling, the same application uses very little CPU time, because glXSwapBuffers() itself correctly waits for the previous swap to finish, presumably via an interrupt or some other blocking mechanism.
This bug affects situations where a GL window is nested inside another GL window, e.g. when running OpenGL applications under a compositing window manager. To maintain 1:1 frame-to-frame correspondence, a compositor needs to know that the framebuffers of the child windows are complete/swapped before deriving textures from them. Either glFinish() or glXWaitGL() is needed here, because glFlush() and friends don't make this guarantee; glXSwapBuffers() only performs an implicit glFlush().
I've also attached an nvidia-bug-report.log for completeness.