There seem to be problems on my machine with pretty much any driver version. I'm currently using 256.35, but it also happens with 256.44. I have also had problems with pre-256 drivers (195, 190, 185, 180), but I can't use any of those versions now because my current display locks at an eye-burning 100% brightness with pre-256 series drivers. The display is a 120 Hz Alienware, but the problems used to happen with my previous (60 Hz) display as well. I have tried 8x and 16x PCI-E 2.0 slots on several different mainboards (DFI T3eH6, Asus P6T, currently Gigabyte X58A-UD3R 8x slot).
Message signaled interrupts are enabled for the card, but disabling that doesn't help.
I am currently using KDE 4.4.5 with Compiz, but the hangs can still occur under KWin with compositing disabled.
The hangs rarely happen shortly after a reboot; they usually occur after the system has been running for 3-7 days (7 days is pushing it, usually it has already happened by then). The most common trigger seems to be task switching between a game and some other application. This happens with both native games such as Nexuiz and Quake 3 and games running under Wine, e.g. Left 4 Dead.
I believe a related symptom is that a task switch away from a game sometimes causes all applications to redraw extremely slowly. Sometimes this leads to a complete hang; other times I can switch back to the game and things then continue to run normally.
Also related (I believe): after a few days it usually becomes impossible to rmmod nvidia (for example, to change driver versions), even if there have been no problems so far and Xorg has stopped cleanly. The module is allegedly still in use, but ps shows no instances of Xorg.
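For anyone wanting to check the "still in use" state without guessing, here is a minimal sketch that reads the reference count rmmod looks at. It assumes the standard /proc/modules layout (name, size, refcount, dependent modules, state, address); the function name module_refcount is just something I made up for illustration. lsmod reports the same count in its "Used by" column.

```python
from pathlib import Path


def module_refcount(modules_text, name="nvidia"):
    """Return (refcount, dependent_modules) for `name`, or None if not loaded.

    Expects text in /proc/modules format:
    name size refcount deps state address
    """
    for line in modules_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == name:
            # refcount is "-" on kernels built without module unloading
            refs = int(fields[2]) if fields[2].isdigit() else None
            # deps is "-" or a comma-terminated list such as "a,b,"
            users = [d for d in fields[3].split(",") if d and d != "-"]
            return refs, users
    return None


if __name__ == "__main__":
    path = Path("/proc/modules")
    if path.exists():
        # Prints None when the module is not loaded at all
        print(module_refcount(path.read_text()))
```

A nonzero refcount with no Xorg running only says that something still holds a reference; it does not say what. Logging this value periodically over a few days might show when the count stops returning to zero.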
When the hangs occur, sometimes the mouse cursor can still be moved and switching to a VT might be possible; other times there's no response to any input. However, the system is always still reachable via ssh, and I have observed a number of different strange behaviours when the system is in this state:
- Xorg is fully utilising one CPU core; this happens pretty much every time
- kill -9 is required to stop Xorg; again pretty much every time
- often, it's not possible to rmmod nvidia even when ps shows there are no instances of Xorg (module is apparently still in use)
- if the module is still in use and I attempt rmmod -f nvidia, the kernel oopses and startx fails until after a reboot; often, after trying rmmod -f nvidia, the system hangs when I try to reboot and a power cycle is required
- sometimes it is impossible to kill Xorg, even with repeated kill -9
- sometimes the GPU seems to lock up completely, and it isn't even possible to do a chvt (the command just hangs until ^C)
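The difference between "Xorg is spinning on a core" and "Xorg can't be killed" can be seen in the process state, so here is a rough sketch for recording it over ssh. It assumes the usual /proc/<pid>/stat format, where the state letter follows the parenthesised command name; parse_stat and find_processes are hypothetical helper names, and the process names "Xorg" and "X" are my guess at what the server shows up as.

```python
import os


def parse_stat(stat_text):
    """Split a /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ")",
    # so locate the first "(" and the last ")".
    start = stat_text.index("(")
    end = stat_text.rindex(")")
    pid = int(stat_text[:start])
    comm = stat_text[start + 1:end]
    state = stat_text[end + 1:].split()[0]
    return pid, comm, state


def find_processes(names=("Xorg", "X"), proc="/proc"):
    """Return (pid, comm, state) for every process whose comm is in `names`."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the entry was unreadable
        if comm in names:
            found.append((pid, comm, state))
    return found


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        for pid, comm, state in find_processes():
            print(pid, comm, state)
```

State R (running) would fit the case where Xorg is using a full core. State D (uninterruptible sleep) would fit the case where kill -9 has no effect, since a process in that state does not act on signals until the kernel call it is blocked in returns. That is general Linux behaviour and not a diagnosis of this driver.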
I can't always access another machine to ssh from. I'll try to find a working laptop so I can get a post-crash bug report (or any other info that might be useful) next time it happens.