06-22-12, 03:33 PM   #1
theroot
Registered User
 
Join Date: Jun 2012
Posts: 3
xorg locks up with newest nvidia drivers w/ vdpau

Hello,

It seems that since nvidia-drivers 295.53 I have had weird xorg crashes and lock-ups. I am running a GTX 580 on an up-to-date gentoo x64 system, currently with nvidia 302.17.

Attached is my bug report log.

I can replicate this with a movie player (mplayer, vlc, dragonplayer) using VDPAU as the video output driver. Switching to XV usually works better, but I've still had it lock up using XV.

I think playback has to run for a while before it reproduces. Say I play an entire folder of shows; there are two ways I've seen it happen:

After a few hours the video gets distorted and the system becomes unresponsive. I have less than 10 s to kill the window before X freezes completely. Logging in over ssh and running kill -9 on mplayer doesn't work; I have to kill kdm and X to get the system back.

After a few hours I go to play another movie, and as soon as the new one opens the output is corrupted or black and the screen is unresponsive. I usually can't recover from it without restarting X through an ssh session.

I've tried recompiling everything that seems related. I've tried all the various drivers (295.49, 295.53, 295.59, 302.17) and had this issue with all of them.

The card is not running hot. The card itself isn't water cooled (the rest of my case is), but its average temperature is about 50°C during movie playback.

Related versions:

xorg-server 1.12.2
vdpau-video 0.7.3
libvdpau 0.4.1-r1
kde/kdm 4.8.4 (just updated today from 4.8.3 - didn't resolve it)
kernel 3.4.3 (had same issue on 3.4.0)

/var/log/messages - last time it locked up
Code:
Jun 22 15:18:14 alpha-centauri kernel: [250992.096032] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Jun 22 15:18:16 alpha-centauri kernel: [250994.093441] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Jun 22 15:18:23 alpha-centauri kernel: [251001.437827] usb 3-4.3: unlink qh1-3008/ffff880214cf8d80 start 0 [1/2 us]
Jun 22 15:18:32 alpha-centauri acpid: client 29903[0:0] has disconnected
Jun 22 15:18:32 alpha-centauri acpid: client 29903[0:0] has disconnected
Jun 22 15:18:32 alpha-centauri acpid: client connected from 29903[0:0]
Jun 22 15:18:32 alpha-centauri acpid: 1 client rule loaded
Jun 22 15:18:32 alpha-centauri acpid: client connected from 29903[0:0]
Jun 22 15:18:32 alpha-centauri acpid: 1 client rule loaded
Jun 22 15:18:32 alpha-centauri kernel: [251010.142999] ehci_hcd 0000:00:16.2: reused qh ffff880214cf8d80 schedule
Jun 22 15:18:32 alpha-centauri kernel: [251010.143003] usb 3-4.3: link qh1-3008/ffff880214cf8d80 start 0 [1/2 us]
Messages since the last reboot containing NVRM:
Code:
Jun 19 01:05:24 alpha-centauri kernel: [2265402.215178] NVRM: VM: nv_vm_malloc_pages: failed to allocate contiguous memory
Jun 19 01:42:02 alpha-centauri kernel: [2267597.474833] NVRM: Xid (0000:01:00): 13, 0003 00000000 00009297 00001614 00000000 00000000
Jun 19 17:27:02 alpha-centauri kernel: [2324222.788229] NVRM: Xid (0000:01:00): 31, Ch 00000009, engmask 00000101, intr 10000000
Jun 19 17:32:05 alpha-centauri kernel: [  150.191361] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  302.17  Tue Jun 12 16:03:22 PDT 2012
Jun 19 17:37:06 alpha-centauri kernel: [  451.055297] NVRM: Xid (0000:01:00): 31, Ch 00000009, engmask 00000101, intr 10000000
Jun 19 18:10:25 alpha-centauri kernel: [ 2447.539598] NVRM: Xid (0000:01:00): 31, Ch 00000005, engmask 00000101, intr 10000000
Jun 19 19:18:17 alpha-centauri kernel: [ 6513.821785] NVRM: Xid (0000:01:00): 31, Ch 00000007, engmask 00000101, intr 10000000
Jun 19 20:39:03 alpha-centauri kernel: [11353.907557] NVRM: Xid (0000:01:00): 31, Ch 00000007, engmask 00000101, intr 10000000
Jun 20 03:12:34 alpha-centauri kernel: [34933.856679] NVRM: Xid (0000:01:00): 13, 0007 00000000 00009097 00001b0c 0000f010 00000000
Jun 20 04:13:42 alpha-centauri kernel: [38597.351936] NVRM: Xid (0000:01:00): 13, 0007 00000000 00009097 00001b0c 0000f010 00000000
Jun 20 05:58:37 alpha-centauri kernel: [44884.002690] NVRM: Xid (0000:01:00): 13, 0007 00000000 00009097 00001b0c 0000f010 00000000
Jun 22 11:31:00 alpha-centauri kernel: [237376.323721] NVRM: Xid (0000:01:00): 13, 0005 00000000 00009097 00002484 00004203 00000000
Jun 22 15:18:14 alpha-centauri kernel: [250992.096032] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Jun 22 15:18:16 alpha-centauri kernel: [250994.093441] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
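
In case it helps anyone comparing logs, here is a rough sketch of how the Xid lines above could be tallied by number. This is only an illustration: the function name `count_xids`, the regular expression, and the default log path are my own assumptions, not anything the driver or nvidia-bug-report provides.

```python
import re
import sys
from collections import Counter

# Matches lines such as:
#   NVRM: Xid (0000:01:00): 31, Ch 00000009, engmask 00000101, ...
# The pattern is an assumption based on the log excerpt in this post.
XID_RE = re.compile(r"NVRM: Xid \(([0-9a-fA-F:.]+)\): (\d+),")


def count_xids(lines):
    """Return a Counter mapping Xid number -> number of occurrences."""
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Default path is an assumption; pass another log file as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    with open(path, errors="replace") as log:
        for xid, n in sorted(count_xids(log).items()):
            print(f"Xid {xid}: {n} occurrence(s)")
```

Run against the excerpt above, this would count five Xid 13 and five Xid 31 entries; the os_schedule lines are ignored because they carry no Xid number.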

It seems this is a common issue with no resolution yet, although I didn't see anyone recently mention how to reproduce it with VDPAU video playback. This is completely unstable and unacceptable :-( Come on nvidia - help us out!

Let me know if there's any further information I can provide, or if you have suggested steps for resolving this. I appreciate everyone's help!

Thanks!


EDIT -

I deleted and reinstalled everything nvidia related and rebooted the PC. I then attempted to play a video in mplayer2 with vdpau output, and it froze within 0.5 s, completely unresponsive immediately. I had to ssh in to kill X and related processes. Here are the messages for the lockup, same as before:
Code:
Jun 22 17:19:23 alpha-centauri kernel: [  142.441039] NVRM: Xid (0000:01:00): 31, Ch 00000006, engmask 00000180, intr 10000000
Jun 22 17:19:23 alpha-centauri kernel: [  144.438393] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Jun 22 17:19:23 alpha-centauri kernel: [  146.435798] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Jun 22 17:19:25 alpha-centauri kernel: [  148.433217] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Jun 22 17:19:27 alpha-centauri kernel: [  150.430620] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context



EDIT AGAIN:

Had a hard lockup that I couldn't ssh into; SysRq keys and ACPI shutdown did nothing either. First time this has happened. Again it looks nvidia related. I wasn't doing anything this time, just web browsing. I don't recall if there was any flash or anything specific on the page.

Code:
Jun 23 00:51:14 alpha-centauri kernel: [27222.124571] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Jun 23 00:51:14 alpha-centauri kernel: [27222.386364] hub 1-0:1.0: hub_suspend
Jun 23 00:51:14 alpha-centauri kernel: [27222.386370] usb usb1: bus auto-suspend, wakeup 1
Jun 23 00:51:14 alpha-centauri kernel: [27222.386372] ehci_hcd 0000:00:12.2: suspend root hub
Jun 23 00:51:15 alpha-centauri kernel: [27223.401034] ------------[ cut here ]------------
Jun 23 00:51:15 alpha-centauri kernel: [27223.401040] WARNING: at net/sched/sch_generic.c:256 dev_watchdog+0xfd/0x15d()
Jun 23 00:51:15 alpha-centauri kernel: [27223.401042] Hardware name: To be filled by O.E.M.
Jun 23 00:51:15 alpha-centauri kernel: [27223.401043] NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Jun 23 00:51:15 alpha-centauri kernel: [27223.401044] Modules linked in: nvidia(PO) cpufreq_userspace cpufreq_powersave cpufreq_conservative cpufreq_ondemand cpufreq_stats it87 hwmon_vid vboxnetflt(O) vboxdrv(O) uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev i2c_piix4 powernow_k8 mperf freq_table k10temp fam15h_power hwmon [last unloaded: nvidia]
Jun 23 00:51:15 alpha-centauri kernel: [27223.401059] Pid: 0, comm: swapper/3 Tainted: P           O 3.4.3-gentoo #1
Jun 23 00:51:15 alpha-centauri kernel: [27223.401060] Call Trace:
Jun 23 00:51:15 alpha-centauri kernel: [27223.401061]  <IRQ>  [<ffffffff810586a2>] warn_slowpath_common+0x7e/0x96
Jun 23 00:51:15 alpha-centauri kernel: [27223.401067]  [<ffffffff8105874e>] warn_slowpath_fmt+0x41/0x43
Jun 23 00:51:15 alpha-centauri kernel: [27223.401070]  [<ffffffff81574992>] dev_watchdog+0xfd/0x15d
Jun 23 00:51:15 alpha-centauri kernel: [27223.401072]  [<ffffffff81064ba2>] run_timer_softirq+0x1ac/0x27b
Jun 23 00:51:15 alpha-centauri kernel: [27223.401075]  [<ffffffff81037cf5>] ? read_tsc+0x9/0x19
Jun 23 00:51:15 alpha-centauri kernel: [27223.401078]  [<ffffffff81574895>] ? netif_tx_unlock+0x57/0x57
Jun 23 00:51:15 alpha-centauri kernel: [27223.401080]  [<ffffffff8104681d>] ? apic_write+0x11/0x13
Jun 23 00:51:15 alpha-centauri kernel: [27223.401083]  [<ffffffff8105e02d>] __do_softirq+0xc5/0x18b
Jun 23 00:51:15 alpha-centauri kernel: [27223.401086]  [<ffffffff8108de47>] ? tick_program_event+0x1f/0x21
Jun 23 00:51:15 alpha-centauri kernel: [27223.401089]  [<ffffffff8169aa8c>] call_softirq+0x1c/0x30
Jun 23 00:51:15 alpha-centauri kernel: [27223.401091]  [<ffffffff81033bbe>] do_softirq+0x33/0x69
Jun 23 00:51:15 alpha-centauri kernel: [27223.401093]  [<ffffffff8105e315>] irq_exit+0x3f/0xa7
Jun 23 00:51:15 alpha-centauri kernel: [27223.401095]  [<ffffffff81046d3c>] smp_apic_timer_interrupt+0x76/0x84
Jun 23 00:51:15 alpha-centauri kernel: [27223.401097]  [<ffffffff8169a1c7>] apic_timer_interrupt+0x67/0x70
Jun 23 00:51:15 alpha-centauri kernel: [27223.401099]  <EOI>  [<ffffffff813d1464>] ? acpi_idle_enter_simple+0xc0/0xfd
Jun 23 00:51:15 alpha-centauri kernel: [27223.401103]  [<ffffffff813d145f>] ? acpi_idle_enter_simple+0xbb/0xfd
Jun 23 00:51:15 alpha-centauri kernel: [27223.401106]  [<ffffffff81501ecf>] cpuidle_enter+0x12/0x14
Jun 23 00:51:15 alpha-centauri kernel: [27223.401108]  [<ffffffff815023ef>] cpuidle_idle_call+0xf5/0x1a1
Jun 23 00:51:15 alpha-centauri kernel: [27223.401110]  [<ffffffff810396a8>] cpu_idle+0x97/0xf5
Jun 23 00:51:15 alpha-centauri kernel: [27223.401113]  [<ffffffff81684663>] start_secondary+0x1dc/0x1e5
Jun 23 00:51:15 alpha-centauri kernel: [27223.401115] ---[ end trace 002053365cb71a43 ]---
Jun 23 00:51:15 alpha-centauri kernel: [27223.401122] e1000e 0000:02:00.0: eth0: Reset adapter
Jun 23 00:51:18 alpha-centauri kernel: [27225.587834] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
Jun 23 00:51:18 alpha-centauri kernel: [27226.425521] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
Jun 23 00:51:19 alpha-centauri kernel: [27227.263210] Clocksource tsc unstable (delta = -209686961 ns)
Attached Files
File Type: gz nvidia-bug-report.log.gz (120.1 KB, 120 views)