03-03-11, 07:26 AM (post #1)
Join Date: Mar 2011
Location: Melbourne, Australia
Scrambled scanout frequent with 260 series drivers: locking issue?
Summary: The corruption shown in the linked photo happens frequently with the 260 series drivers and a preemptible kernel. When I rebuild the kernel with voluntary preemption, the problem goes away.
Details: My system is a Dell Inspiron 9300 with a Pentium M and a GeForce Go 6800. The photo linked above shows the problem: a garbled screen. It looks like the video scanout itself is bogus, because a screenshot taken at the same time is fine. Sometimes the problem resolves itself after a few seconds, sometimes not. The only reliable way to fix it is to suspend and resume, or hibernate.
I've been having this problem for some time. It's obviously a bug that has been happily lurking and has finally decided to go for its 15 minutes of fame. With the 195 series drivers and Xorg 1.7 it occurs only infrequently (every few days). With the 260 series drivers and Xorg 1.9 it happens roughly once every five minutes, which makes the system effectively unusable. The problem occurs with kernels 2.6.32, 2.6.34 and 2.6.36, and possibly earlier ones too, but that was too long ago for me to be sure.
After reading the pinned topics in this forum, I tried various kernel parameters related to the PCI bus, in particular pci=nommconf, but also others (pci=bios, pci=nocrs, pci=use_crs, pci=nobar). I also tried parameters for the PCI Express bus/bridge (pcie_aspm=off, pcie_aspm=force, pcie_ports=native, pcie_ports=compat). None of them had any effect.
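When trying boot options like these, it helps to confirm the running kernel actually received them. Below is a minimal sketch that parses /proc/cmdline and keeps only the PCI/PCIe parameters. The helper names (`parse_cmdline`, `pci_options`, `read_cmdline`) are hypothetical, and quoted values containing spaces are not handled.

```python
# Hypothetical helper: report which PCI / PCI Express boot options the
# running kernel actually received. Quoted values containing spaces are
# not handled; the options discussed here never need them.

PCI_PARAMS = ("pci", "pcie_aspm", "pcie_ports")


def parse_cmdline(text):
    """Map each parameter name to its list of values.

    Comma-separated values (pci=bios,nocrs) become separate list items;
    bare flags such as "quiet" map to an empty list. A parameter given
    more than once accumulates all of its values.
    """
    params = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        params.setdefault(name, []).extend(value.split(",") if sep else [])
    return params


def pci_options(text):
    """Keep only the PCI-related parameters from a command line."""
    params = parse_cmdline(text)
    return {name: params[name] for name in PCI_PARAMS if name in params}


def read_cmdline(path="/proc/cmdline"):
    """Read the kernel command line of the running system (Linux only)."""
    with open(path) as f:
        return f.read()
```

For a command line containing `pci=nommconf pcie_aspm=off` (and no other PCI options), `pci_options(read_cmdline())` returns `{'pci': ['nommconf'], 'pcie_aspm': ['off']}`.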
Then I noticed that the IRQ used by the video card was shared with the ICH6 AC'97 audio and two USB controllers. Having had problems before with video cards on shared IRQs, I tried passing NVreg_EnableMSI=1 to the nvidia kernel module, which moved the card to a different (MSI) interrupt. That didn't stop the problem, but it did noticeably reduce its frequency, to maybe once or twice an hour.
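Shared IRQs like this one show up in /proc/interrupts as a single line listing several device names. Here is a rough sketch that finds them. It assumes the 2.6-era layout, where the controller type is one token (e.g. `IO-APIC-fasteoi`) followed by a comma-separated device list, and that the driver registers its interrupt under the name `nvidia`. Newer kernels split the controller column in two, so the parsing would need adjusting there. The function names are hypothetical.

```python
# Sketch: find IRQ lines shared by several devices in /proc/interrupts.
# Assumes the 2.6-era layout: per-CPU counts, then a single-token
# controller type (e.g. "IO-APIC-fasteoi"), then comma-separated devices.


def shared_irqs(text):
    """Return {irq_number: [device names]} for IRQs with more than one device."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        irq, sep, rest = line.partition(":")
        irq = irq.strip()
        if not sep or not irq.isdigit():
            continue  # skip NMI, LOC, ERR and other non-numeric rows
        fields = rest.split()
        # Drop the per-CPU counters and the controller type; the rest is
        # the device list, rejoined so names containing spaces survive.
        names = " ".join(fields[ncpus + 1:]).split(",")
        devices = [d.strip() for d in names if d.strip()]
        if len(devices) > 1:
            shared[int(irq)] = devices
    return shared


def shared_with(text, device="nvidia"):
    """Only the shared IRQs that include the given device name."""
    return {irq: devs for irq, devs in shared_irqs(text).items() if device in devs}
```

Running `shared_with(open("/proc/interrupts").read())` before and after enabling MSI should show whether the card still shares its line with other devices.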
I did some more experimenting with interrupts and found that the problem occurred more often when the system was under a high interrupt load (e.g. playing video, heavy I/O). At this point I began to suspect a locking issue, e.g. an interrupt arriving halfway through the programming of the registers that control the video scanout. (I'm no hardware guy, so I'm fuzzy on all this, but I'm sure you get my drift.)
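To make the suspected race concrete, here is a toy model of a torn two-register update. It is purely illustrative and not how the nvidia driver works (nothing in this post shows that). A "mode" is an assumed pair of base address and line pitch that is only valid as a whole. Each `yield` marks a point where an interrupt or preemption could delay the next write while the display latches the registers.

```python
# Toy model of a torn two-register update (hypothetical, not NVIDIA's code).
# A display mode here is a (base, pitch) pair; only whole pairs are valid.
OLD = (0x1000, 2048)
NEW = (0x2000, 4096)


def unlocked_update(regs, new):
    """Write the registers one at a time, yielding after each write."""
    regs["base"] = new[0]
    yield  # a delay here leaves the new base paired with the old pitch
    regs["pitch"] = new[1]
    yield


def locked_update(regs, new):
    """Write both registers with no yield point in between."""
    regs["base"], regs["pitch"] = new
    yield


def latched_states(update, old=OLD, new=NEW):
    """Every (base, pitch) pair the display could latch during the update."""
    regs = {"base": old[0], "pitch": old[1]}
    seen = [(regs["base"], regs["pitch"])]
    for _ in update(regs, new):
        seen.append((regs["base"], regs["pitch"]))
    return seen


def torn(states, valid=(OLD, NEW)):
    """States that match neither the old nor the new mode."""
    return [s for s in states if s not in valid]
```

`torn(latched_states(unlocked_update))` returns `[(8192, 2048)]`, a base/pitch mix that would scan out as garbage, while `torn(latched_states(locked_update))` is empty. A preemptible kernel widens the window between the writes, which would be consistent with what I'm seeing, but that is a guess, not a diagnosis.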
The kernel RCU subsystem has changed a lot over the last couple of years, and once before I'd had system lockups with a GeForce 5 series card on an SMP system when the newer hierarchical RCU came in. So I tried building kernels with each of the three available RCU options. No change.
Then, I tried building the kernel with CONFIG_PREEMPT_VOLUNTARY (until then I'd been using CONFIG_PREEMPT). Bingo.
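If you want to check which preemption model a given kernel was built with, the answer is in its .config. Below is a small sketch (helper names hypothetical) that reads /proc/config.gz when the kernel exposes it (this requires CONFIG_IKCONFIG_PROC) and otherwise falls back to /boot/config-<release>, which many distributions install. It assumes the three-way choice of CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_VOLUNTARY and CONFIG_PREEMPT found in these 2.6 kernels.

```python
import gzip
import os
import platform

# The preemption-model choice in the 2.6.3x kernel configs; exactly one
# of these is normally set to "y".
PREEMPT_MODELS = ("CONFIG_PREEMPT_NONE", "CONFIG_PREEMPT_VOLUNTARY", "CONFIG_PREEMPT")


def preemption_model(config_text):
    """Return the preemption option set to y in a kernel .config, or None."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # comments include "# CONFIG_FOO is not set"
        name, value = line.split("=", 1)
        if value == "y":
            enabled.add(name)
    for model in PREEMPT_MODELS:
        if model in enabled:
            return model
    return None


def read_running_config():
    """Load the running kernel's config, if the kernel or distro exposes it."""
    if os.path.exists("/proc/config.gz"):
        with gzip.open("/proc/config.gz", "rt") as f:
            return f.read()
    with open("/boot/config-" + platform.release()) as f:
        return f.read()
```

On the kernel that shows the corruption, `preemption_model(read_running_config())` should report `CONFIG_PREEMPT`; on the workaround build it should report `CONFIG_PREEMPT_VOLUNTARY`.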
This took me a fair while to track down, and very nearly made me ditch this laptop. I'm glad I found a workaround, but it's not a solution. I don't know whether this is a problem with the nvidia driver or with the kernel, but since I can reproduce it reliably, I'm happy to test any fixes; it would be good to stop this happening to other people.