06-18-12, 06:34 PM   #1
amonakov
"GPU has fallen off the bus" error on 650M unless a CUDA program run first

Shorter version:
If you're trying to get Bumblebee working on a laptop with a 650M or another 6xx-series card, and you get a "NVRM: GPU at ... has fallen off the bus" error in dmesg, or your laptop reboots immediately when optirun is invoked, running a CUDA program before optirun may work around that.
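
To make the ordering explicit, here is a minimal Python sketch of the workaround. It is an illustration only: the helper name run_with_cuda_warmup, the ./deviceQueryDrv path, and the choice to skip optirun when the CUDA step fails are all assumptions, not part of Bumblebee.

```python
import subprocess


def run_with_cuda_warmup(app_cmd, cuda_cmd=("./deviceQueryDrv",), run=subprocess.call):
    """Run a CUDA program first, then launch app_cmd under optirun.

    Returns the optirun exit status, or None if the CUDA step failed.
    The default cuda_cmd path is hypothetical; point it at any CUDA binary.
    """
    # Step 1: touch the GPU through CUDA before anything else.
    if run(list(cuda_cmd)) != 0:
        # Assumption: if the CUDA program itself fails, do not try optirun.
        return None
    # Step 2: only now invoke optirun with the real application.
    return run(["optirun"] + list(app_cmd))
```

For example, run_with_cuda_warmup(["glxgears"]) would run the CUDA sample and then optirun glxgears (glxgears is just a placeholder application here).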


Longer version:
Just for the fun of it, I'm trying to get Bumblebee working on an MSI GE60, a laptop with Optimus graphics: GT 650M and HD4000. It looks like all video outputs are routed via the Intel chip. Several Nvidia Linux drivers have been released in recent weeks; I've tried most of them without success, and they all exhibit similar behaviour.

First, invoking optirun while the driver is loaded (and the GPU is powered on) triggers the "NVRM: GPU at ... has fallen off the bus" error, unless some CUDA code was run beforehand. After the error is triggered, the laptop fan speeds up to maximum. If some CUDA code (anything will do, even something simple from the SDK like deviceQueryDrv) is run before optirun, the error does not appear. Instead, two ACPI errors are logged:
Code:
[   38.784883] ACPI Error: Field [TMPB] at 282624 exceeds Buffer [ROM1] size 262144 (bits) (20120320/dsopcode-236)
[   38.784920] ACPI Error: Method parse/execution failed [\_SB_.PCI0.PEG0.PEGP._ROM] (Node ffff8801a608b000), AE_AML_BUFFER_LIMIT (20120320/psparse-536)
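
To put the numbers in the first ACPI error into bytes, here is a small Python sketch. It assumes the first number is the bit position the TMPB field reaches and the second is the ROM1 buffer size, both in bits as the message says; the function name field_overrun is made up for this example.

```python
import re

ACPI_LINE = ("ACPI Error: Field [TMPB] at 282624 exceeds Buffer [ROM1] "
             "size 262144 (bits) (20120320/dsopcode-236)")


def field_overrun(line):
    """Return (field_bits, buffer_bits, overrun_bytes), or None if no match."""
    m = re.search(r"at (\d+) exceeds Buffer \[\w+\] size (\d+) \(bits\)", line)
    if m is None:
        return None
    field_bits, buffer_bits = int(m.group(1)), int(m.group(2))
    # Both values are in bits; divide the difference by 8 to get bytes.
    return field_bits, buffer_bits, (field_bits - buffer_bits) // 8


print(field_overrun(ACPI_LINE))  # (282624, 262144, 2560)
```

Under that reading, the buffer is 32 KiB (262144 bits) and the field runs 2560 bytes past its end.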
However, while that workaround gives someone else a working Bumblebee, it does not for me. When the secondary X server is started, the nvidia blob refuses to work, claiming there are no attached monitors (relevant excerpt from Xorg.8.log):
Code:
[    43.316] (II) NVIDIA(0): Creating default Display subsection in Screen section
	"Default Screen Section" for depth/fbbpp 24/32
[    43.316] (==) NVIDIA(0): Depth 24, (==) framebuffer bpp 32
[    43.316] (==) NVIDIA(0): RGB weight 888
[    43.316] (==) NVIDIA(0): Default visual is TrueColor
[    43.316] (==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
[    43.316] (**) NVIDIA(0): Option "NoLogo" "true"
[    43.316] (**) NVIDIA(0): Option "UseEDID" "false"
[    43.317] (**) NVIDIA(0): Option "ConnectedMonitor" "DFP-0"
[    43.317] (**) NVIDIA(0): Option "CustomEDID" "DFP-0:/etc/bumblebee/LGD0259.bin"
[    43.317] (**) NVIDIA(0): Enabling 2D acceleration
[    43.317] (**) NVIDIA(0): ConnectedMonitor string: "DFP-0"
[    43.317] (**) NVIDIA(0): Ignoring EDIDs
[    43.688] (WW) NVIDIA(0): Failed to enable display hotplug notification
[    43.692] (II) NVIDIA(0): NVIDIA GPU GeForce GT 650M (GK107) at PCI:1:0:0 (GPU-0)
[    43.692] (--) NVIDIA(0): Memory: 2097152 kBytes
[    43.692] (--) NVIDIA(0): VideoBIOS: 80.07.1b.00.0b
[    43.692] (II) NVIDIA(0): Detected PCI Express Link width: 16X
[    43.692] (--) NVIDIA(0): Interlaced video modes are supported on this GPU
[    43.694] (--) NVIDIA(0): Valid display device(s) on GeForce GT 650M at PCI:1:0:0
[    43.694] (--) NVIDIA(0):     none
[    43.694] (EE) NVIDIA(0): Failed to assign any connected display devices to X screen 0
[    43.696] (EE) NVIDIA(0): Failing initialization of X screen 0
[    43.713] (II) UnloadModule: "nvidia"
[    43.713] (II) UnloadSubModule: "wfb"
[    43.713] (II) UnloadSubModule: "fb"
[    43.713] (EE) Screen(s) found, but none have a usable configuration.
That's not too surprising, given that all outputs seem to be routed via the Intel chip, so the Nvidia card has no display to drive on this secondary X server. Can Nvidia say whether any accommodations for such configurations will be added in future drivers? Or is it supposed to work via some ACPI display hotplug/switching magic?
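
For anyone comparing their own Xorg.8.log against the excerpt above, a rough Python sketch like the following pulls out only the warning and error lines. The function name xorg_problems and the regular expression are assumptions based on the line format seen in this log, not an official parser.

```python
import re

# Matches lines such as "[    43.694] (EE) NVIDIA(0): Failed to ..."
_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+\((EE|WW)\)\s+(.*)$")


def xorg_problems(log_text):
    """Return (timestamp, level, message) for each (EE) or (WW) line."""
    found = []
    for line in log_text.splitlines():
        m = _LINE.match(line.strip())
        if m:
            found.append((float(m.group(1)), m.group(2), m.group(3)))
    return found
```

Run on the excerpt above, this should leave only the hotplug warning and the three (EE) lines, which makes the "no display devices" failure easier to spot.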

For what it's worth, nvidia-bug-report.log.gz is attached.
Attached Files
File Type: gz nvidia-bug-report.log.gz (43.5 KB, 81 views)