My guess is similar to yours, but I didn't trace it in such detail. Sometimes different hardware exposes problems hidden in the driver/library pair. I mean, a buggy driver can occasionally leave the application waiting/checking forever (busy-wait or not) without necessarily causing a kernel oops. I've seen this in some open-source Linux drivers.
Thought about it.
If nVidia has a non-blocking function where the GL library issues the command and then continuously uses another ioctl to poll its completion, the behavior could be the same.
When you poll something, you usually sleep or reschedule between polling attempts. Since the user-land application eats all of the CPU, I assume the GL busy-waiting function was designed with the assumption that completion would be (close to) immediate.
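To make the difference concrete, here is a minimal user-space sketch of the two polling styles. Everything in it is hypothetical: `fake_cmd` and `cmd_done` stand in for whatever ioctl the GL library might use to check completion (the real nVidia interface is closed, so this is only a guess at the pattern), and the poll limit exists only so the example terminates.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a "has the command completed?" ioctl.
 * polls_left > 0: completes on that many polls; polls_left < 0: never. */
struct fake_cmd {
    int  polls_left;
    long polls_seen;
};

static bool cmd_done(struct fake_cmd *c)
{
    c->polls_seen++;
    if (c->polls_left > 0)
        c->polls_left--;
    return c->polls_left == 0;
}

/* Busy-wait: never gives up the CPU between polls. Reasonable only if
 * completion is expected almost immediately; if it never comes, the
 * process spins at 100% CPU (bounded here by max_polls). */
static bool wait_busy(struct fake_cmd *c, long max_polls)
{
    for (long i = 0; i < max_polls; i++)
        if (cmd_done(c))
            return true;
    return false;
}

/* Polite polling: sleep between attempts so other tasks can run. */
static bool wait_sleeping(struct fake_cmd *c, long max_polls, long sleep_ns)
{
    assert(sleep_ns >= 0 && sleep_ns < 1000000000L); /* tv_nsec range */
    struct timespec ts = { .tv_sec = 0, .tv_nsec = sleep_ns };

    for (long i = 0; i < max_polls; i++) {
        if (cmd_done(c))
            return true;
        nanosleep(&ts, NULL);
    }
    return false;
}
```

With a command that never completes (`polls_left = -1`), both loops would run forever without the `max_polls` cap; only the first one would pin a CPU while doing so.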
What I don't get is this: if the driver indeed fails in user-land context, how come the user-land application doesn't get stuck (be that in the first function or in the subsequent poll attempts) in an un-interruptible sleep (D state)?
Anyway, it would have been nice if I could get my hands on a printk-able version of the driver... nVidia should release two sets of drivers: a normal and a debug (or actually, trace) version. If the normal version dies on you, install the trace version and post your findings instead of posting an uninformative "I run Quake and the machine locks up".
DEV-NG: Intel S2600C0, 2xE52658V2, 32GB, 4x2TB, GTX680, F19/x86_64, Dell U2711.
DEV: Intel S5520SC, 2xX5680, 36GB, 5x320GB, GTX550, F19/x86_64, Dell U2711 (^).
SRV: Tyan Tempest i5400XT, 2xE5335, 8GB, 4x2TB, 9800GTX, F19/x86_64, Dell U2412.
LAP: ASUS N56VJ, i7-3630QM, 16GB, 1TB, 635M, F19/x86_64.