nV News Forums (http://www.nvnews.net/vbulletin/index.php)
-   NVIDIA Linux (http://www.nvnews.net/vbulletin/forumdisplay.php?f=14)
-   -   GPU at $BUSID$ has fallen of the bus (http://www.nvnews.net/vbulletin/showthread.php?t=167363)

apriori 10-14-11 01:07 PM

GPU at $BUSID$ has fallen of the bus
 
Hi guys,

Let's gather all issues related to this bug. My observation, at least, is that all distros are seeing an increasing number of reports of this issue. Currently I only experience it on my notebook, which has an 8800M GTX, with all drivers from 275.09 up to 285.05.09 (the last one tested). The 275.09 starting point is questionable, though, because I think I used the 275.x.x series without problems. This also doesn't seem to be a kernel bug; at least all kernels from 2.6.32 to 3.0.6 seem to be affected.
More likely it's Xorg-related, I think. Unfortunately I can't easily revert that one (using 1.11.1 right now).

Distro is Arch Linux 2010.05, upgraded to latest stable.
Funny thing is, another machine with the same distro (not quite sure whether it has the exact same packages) and the latest NVIDIA drivers, with a 560 Ti, works just fine.

So, please, let's try to track this issue down by providing as much data as possible.

luudee 10-15-11 12:49 AM

Re: GPU at $BUSID$ has fallen of the bus
 
I am having this problem as well, running Fedora 14 with all the latest updates. I updated the NVIDIA driver to 285.05.09: I had 270.41.19, tried 280.13, and am now at 285.05.09. The card is a GTX 580. I am running x86_64, two 30" monitors, KDE ...

The "Sticky: Stability Issues ..." post is 6 years old; perhaps somebody from NVIDIA could update it?


Thanks,
rudi



Oct 15 00:06:54 cpu11 kernel: NVRM: Xid (0000:04:00): 13, 0006 00000000 00009297 000023ac 00000000 00000000
Oct 15 00:44:38 cpu11 kernel: NVRM: Xid (0000:04:00): 13, 0001 00000000 00009297 00001158 3f800000 00000000
Oct 15 00:44:38 cpu11 kernel: NVRM: Xid (0000:04:00): 32, Channel ID 00000001 intr 00040000
Oct 15 00:44:38 cpu11 kernel: NVRM: Xid (0000:04:00): 32, Channel ID 00000001 intr 00040000
Oct 15 00:44:38 cpu11 kernel: NVRM: Xid (0000:04:00): 32, Channel ID 00000001 intr 00040000
Oct 15 00:44:38 cpu11 kernel: NVRM: Xid (0000:04:00): 32, Channel ID 00000001 intr 00040000
Oct 15 00:44:38 cpu11 kernel: NVRM: Xid (0000:04:00): 32, Channel ID 00000001 intr 00040000
Oct 15 00:44:38 cpu11 kernel: NVRM: Xid (0000:04:00): 32, Channel ID 00000001 intr 00040000
Oct 15 00:45:10 cpu11 kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 15 00:45:12 cpu11 kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 15 00:45:14 cpu11 kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 15 00:45:16 cpu11 kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 15 00:45:18 cpu11 kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 15 00:45:21 cpu11 kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 15 00:45:24 cpu11 kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 15 00:45:27 cpu11 kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 15 00:45:29 cpu11 kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 15 00:45:32 cpu11 kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 15 00:45:35 cpu11 kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 15 00:45:37 cpu11 kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 15 00:45:39 cpu11 kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 15 00:45:40 cpu11 kernel: NVRM: GPU at 0000:04:00.0 has fallen off the bus.
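
For anyone comparing logs across machines, here is a minimal sketch (not an NVIDIA tool) that counts the Xid codes per bus ID and notes any "fallen off the bus" lines. The function name `summarize_nvrm` and both regular expressions are assumptions based only on the line formats shown in the log above; other driver versions may format these messages differently.

```python
import re
from collections import Counter

# Patterns derived from the kernel log lines quoted in this post (assumption:
# other driver versions may print these messages in a different format).
XID_RE = re.compile(r"NVRM: Xid \((?P<bus>[0-9a-fA-F:.]+)\): (?P<code>\d+),")
FALLEN_RE = re.compile(r"NVRM: GPU at (?P<bus>[0-9a-fA-F:.]+) has fallen off the bus")


def summarize_nvrm(lines):
    """Count Xid codes per bus ID and collect bus IDs that fell off the bus."""
    xid_counts = Counter()
    fallen = []
    for line in lines:
        m = XID_RE.search(line)
        if m:
            xid_counts[(m.group("bus"), int(m.group("code")))] += 1
            continue
        m = FALLEN_RE.search(line)
        if m:
            fallen.append(m.group("bus"))
    return xid_counts, fallen


# Three lines copied from the log above.
sample = [
    "Oct 15 00:06:54 cpu11 kernel: NVRM: Xid (0000:04:00): 13, 0006 00000000 00009297 000023ac 00000000 00000000",
    "Oct 15 00:44:38 cpu11 kernel: NVRM: Xid (0000:04:00): 32, Channel ID 00000001 intr 00040000",
    "Oct 15 00:45:40 cpu11 kernel: NVRM: GPU at 0000:04:00.0 has fallen off the bus.",
]
counts, fallen = summarize_nvrm(sample)
print(counts)
print(fallen)
```

To run it on a real log, pass an open file instead of the sample list, e.g. `summarize_nvrm(open("/var/log/messages"))`; the path depends on the distro.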

luudee 10-15-11 12:53 AM

Re: GPU at $BUSID$ has fallen of the bus
 
One more thing, noticed this in my Xorg.0.log:


[ 1445.558] [mi] EQ overflowing. The server is probably stuck in an infinite loop.
[ 1445.558]
Backtrace:
[ 1445.629] 0: /usr/bin/Xorg (xorg_backtrace+0x28) [0x4a0908]
[ 1445.629] 1: /usr/bin/Xorg (mieqEnqueue+0x1f4) [0x49fe04]
[ 1445.629] 2: /usr/bin/Xorg (xf86PostMotionEventP+0xc4) [0x47c904]
[ 1445.629] 3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f755b045000+0x453f) [0x7f755b04953f]
[ 1445.629] 4: /usr/bin/Xorg (0x400000+0x6a5f7) [0x46a5f7]
[ 1445.629] 5: /usr/bin/Xorg (0x400000+0x119103) [0x519103]
[ 1445.629] 6: /lib64/libc.so.6 (0x31da400000+0x33140) [0x31da433140]
[ 1445.629] 7: /lib64/libc.so.6 (__sched_yield+0x7) [0x31da4c8607]
[ 1445.629] 8: /usr/lib64/libnvidia-glcore.so.285.05.09 (0x322c800000+0x12e9fbb) [0x322dae9fbb]
[ 1445.629] 9: /usr/lib64/libnvidia-glcore.so.285.05.09 (0x322c800000+0x12ea11b) [0x322daea11b]
[ 1445.629] 10: /usr/lib64/libnvidia-glcore.so.285.05.09 (0x322c800000+0x128b4ad) [0x322da8b4ad]
[ 1445.629] 11: /usr/lib64/libnvidia-glcore.so.285.05.09 (0x322c800000+0x100cf75) [0x322d80cf75]
[ 1445.629] 12: /usr/lib64/xorg/modules/extensions/libglx.so (0x7f755cc3a000+0x4793e1) [0x7f755d0b33e1]



and



[ 1009.459] [mi] EQ overflowing. The server is probably stuck in an infinite loop.
[ 1009.459]
Backtrace:
[ 1009.460] 0: /usr/bin/Xorg (xorg_backtrace+0x28) [0x4a0908]
[ 1009.460] 1: /usr/bin/Xorg (mieqEnqueue+0x1f4) [0x49fe04]
[ 1009.460] 2: /usr/bin/Xorg (xf86PostMotionEventP+0xc4) [0x47c904]
[ 1009.460] 3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f36a7025000+0x453f) [0x7f36a702953f]
[ 1009.460] 4: /usr/bin/Xorg (0x400000+0x6a5f7) [0x46a5f7]
[ 1009.460] 5: /usr/bin/Xorg (0x400000+0x119103) [0x519103]
[ 1009.460] 6: /lib64/libc.so.6 (0x31da400000+0x33140) [0x31da433140]
[ 1009.460] 7: /usr/lib64/xorg/modules/extensions/libglx.so (0x7f36a8c1a000+0x337c29) [0x7f36a8f51c29]

apriori 10-18-11 04:59 AM

Re: GPU at $BUSID$ has fallen of the bus
 
@luudee:

Please tell me your xorg* versions (especially of the server) and attach the complete Xorg.0.log;
in your case it looks more like an incompatible ABI version. The funny thing is that I don't get any of these os_schedule or Xid messages.

ColdFeetBob 10-23-11 01:02 PM

Re: GPU at $BUSID$ has fallen of the bus
 
Apart from the "NVRM: GPU at 0000:04:00.0 has fallen off the bus." message, I have the exact same symptoms as luudee.

In my case, it is usually mplayer (both windowed and fullscreen) that triggers the freeze.
I'm running a fully up-to-date Arch Linux, x86_64, with the latest official NVIDIA drivers and a GT240.

monty.clift 10-24-11 06:12 AM

Re: GPU at $BUSID$ has fallen of the bus
 
I am having the same problem running x86_64 Fedora 15 and Fedora 16. I have a Dell Precision 6500 laptop with a Quadro FX 2800M. Using Gnome 3 with an additional monitor connected to the DisplayPort, I immediately get a kernel panic with the following lines in /var/log/messages:
Oct 24 14:23:36 kernel: [ 1833.815841] dell_wmi: Received unknown WMI event (0x11)
Oct 24 14:23:36 kernel: [ 1833.874781] NVRM: GPU at 0000:01:00.0 has fallen off the bus.

This bug also happens with no external monitor; however, it takes a bit more time.

The driver I am using is 285.05.09

Thanks in advance,
Monty

apriori 11-04-11 04:25 AM

Re: GPU at $BUSID$ has fallen of the bus
 
@monty.clift:

You might want to try disabling WMI completely. I'm currently in the process of finding out how to do that myself. In my case, on the Clevo570RU notebook, this mess is not resolved even when ACPI is completely deactivated, so it's not even related to that (although this machine has a hell of a lot of ACPI-related issues I need to get fixed).

The only useful workaround I have come up with so far is to revert Xorg to version 1.10 and use a 260.x driver, which is a bigger pain the more recent your distro is. Funnily enough, it's even possible to use OpenCL with such old drivers if the OpenCL libraries of the newer drivers are still around.

apriori 11-06-11 04:49 AM

Re: GPU at $BUSID$ has fallen of the bus
 
1 Attachment(s)
Here is the bug report log from my latest attempt with 290.06.
So far nothing has changed.

cehoyos 11-07-11 05:45 AM

Re: GPU at $BUSID$ has fallen of the bus
 
Quote:

Originally Posted by luudee (Post 2490891)
Two 30" monitors, KDE ...

I saw similar symptoms when using two screens because the GPU was overheating. You could monitor the GPU temperature to find out whether that is the problem.
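
One way to watch the temperature while reproducing the freeze is a small polling loop like the sketch below. It assumes a driver whose `nvidia-smi` supports `--query-gpu=temperature.gpu`; the drivers discussed in this thread may not, in which case `nvidia-settings -q GPUCoreTemp` is worth trying instead. The function names `parse_temperatures`, `read_gpu_temperatures` and `log_temperatures` are made up for illustration.

```python
import subprocess
import time


def parse_temperatures(output):
    """Parse one integer temperature (degrees C) per non-empty output line."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def read_gpu_temperatures():
    """Ask nvidia-smi for the current temperature of each GPU.

    Assumption: the installed nvidia-smi understands --query-gpu.
    """
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_temperatures(result.stdout)


def log_temperatures(interval_s=5.0, samples=12, reader=read_gpu_temperatures):
    """Print a timestamped temperature reading every interval_s seconds."""
    for _ in range(samples):
        print(time.strftime("%H:%M:%S"), reader())
        time.sleep(interval_s)
```

Calling `log_temperatures()` from a text console or over SSH, with output redirected to a file, keeps the last readings available even if X locks up.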

vojta 11-07-11 11:04 AM

Re: GPU at $BUSID$ has fallen of the bus
 
2 Attachment(s)
I have similar problems. The X server freezes while using OpenGL or VDPAU. I have described my problem here.

CPU: Intel Core i5 520M
Memory: 8 GB (2x 4GB)
Graphics card: NVIDIA Quadro NVS 5100M

Using Gentoo Linux
nvidia-drivers version: 290.06 (also tried 285.05.09, nothing has changed)
X.org server version: 1.11.1 (also tried 1.10.4, nothing has changed)
Linux version: 3.1.0 (using ck- patches)

I will update xorg-server to 1.11.2 soon and report if anything changed.

apriori 11-08-11 04:28 AM

Re: GPU at $BUSID$ has fallen of the bus
 
Yeah, I'd like to add that my issue starts about 15 seconds after starting X. I hardly ever manage to log in to KDE 4 completely (using kdm as login manager).

Currently the only semi-stable combination I have found (lockups are quite rare) is:

Kernel 2.6.32 (yeah, I know it's ancient)
Xorg 1.10.4
NVIDIA Drivers 270.41.19

My issue is definitely not related to hardware failure or temperature. The machine runs non-stop for days using this driver or Windows.

@vojta: The only reason I said something about "reverting to Xorg 1.10.4" is that this enables you to revert to older NVIDIA drivers too, which only support that ABI, e.g. in my case 270.41.19.


All times are GMT -5.