Originally Posted by rockob
For the record, setting persistence mode on the nvidia card doesn't stop the crash either. (To do this, you have to set KeepUnusedXServer=true in /etc/bumblebee/bumblebee.conf, "sudo restart bumblebeed" and then "sudo optirun /usr/lib/nvidia-current/bin/nvidia-smi -pm 1" before running the game, because by default persistence mode is disabled if bumblebeed unloads the nvidia module and turns the card to low power mode.)
This time I got an Xid 31 error:
Aug 24 17:21:23 sierra kernel: [75848.971725] pci 0000:01:00.0: power state changed by ACPI to D0
Aug 24 17:21:23 sierra kernel: [75849.008904] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=none,decodes=none:owns=none
Aug 24 17:21:23 sierra kernel: [75849.009029] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 304.37 Wed Aug 8 19:52:48 PDT 2012
Aug 24 17:21:23 sierra kernel: [75849.023087] nvidia 0000:01:00.0: irq 55 for MSI/MSI-X
Aug 24 17:21:24 sierra kernel: [75849.915655] NVRM: GPU at 0000:01:00: GPU-1b1589e9-15df-5ca5-919b-2f748fae640f
Aug 24 17:26:17 sierra kernel: [76143.099200] NVRM: Xid (0000:01:00): 31, Ch 00000006, engmask 00000101, intr 10000000
Aug 24 17:26:17 sierra kernel: [76143.103234] NVRM: Xid (0000:01:00): 39, CCMDs 00000007 000090b5
Aug 24 17:26:19 sierra kernel: [76145.102562] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Aug 24 17:26:40 sierra kernel: [76166.103395] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
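In case it helps anyone comparing logs: a minimal Python sketch for pulling just the Xid events out of kernel log lines like the ones above. The function name find_xid_events and the regular expression are my own, written against the line format shown in this log; other driver versions may format these lines differently.

```python
import re

# Matches kernel log lines of the form seen above, e.g.
#   ... kernel: [76143.099200] NVRM: Xid (0000:01:00): 31, Ch 00000006, ...
# Assumption: the "[uptime] NVRM: Xid (device): code" layout from this log.
XID_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+NVRM: Xid \((?P<dev>[^)]+)\): (?P<xid>\d+)"
)


def find_xid_events(lines):
    """Return a list of (uptime_seconds, pci_device, xid_code) tuples."""
    events = []
    for line in lines:
        match = XID_RE.search(line)
        if match:
            events.append(
                (
                    float(match.group("uptime")),
                    match.group("dev"),
                    int(match.group("xid")),
                )
            )
    return events
```

It accepts any iterable of lines, so an open log file can be passed directly, for example find_xid_events(open("/var/log/kern.log")), assuming that is where your distribution writes kernel messages. On the log above it would pick up the Xid 31 and Xid 39 lines and skip the rest.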
I tried changing the clock source at boot, but it only got rid of that message.
Activating MSI did some good, but the crash still seemed inevitable.
With taskset, one processor locks up and a hard reboot is needed to get it back.
With persistence mode, something new happened: the same nvidia errors appeared, but the process was killable and everything went back to normal. To test this, I started the game again, and VGL complained about something OpenGL-related. I lost the exact message.
Is there a debugging mode for the nvidia module, or some kernel setting that would help debug this?