08-21-12, 12:29 PM   #68
Iesos
Re: Random crashes, NVRM Xid messages

I have played around with some options, and I don't want it all to go to waste, so I'll summarize them here.

* With no changes at all, the usual Xid is 13, together with a trace of some driver crashing, an "Attempted to yield the CPU" message, etc.

* With MSI activated, the error seems to change to an Xid 32. Since there is nothing that documents what these codes mean, this may not be so helpful for us, but maybe it is for the nv devs.

* Running it with "taskset 1", so that the process stays on one CPU, even made it possible to kill the process without the computer crashing. The whole output in messages was then:

Code:
Aug 21 18:18:24 localhost kernel: NVRM: Xid (0000:01:00): 31, Ch 00000003, engmask 00000101, intr 30000000
Aug 21 18:18:24 localhost kernel: NVRM: Xid (0000:01:00): 39, CCMDs 00000004 000090b5
Aug 21 18:18:26 localhost kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Aug 21 18:18:28 localhost kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Aug 21 18:19:25 localhost kernel: Clocksource tsc unstable (delta = -1962370138 ns)
Aug 21 18:19:25 localhost kernel: ------------[ cut here ]------------
Aug 21 18:19:25 localhost kernel: WARNING: at drivers/gpu/drm/i915/i915_irq.c:649 ironlake_irq_handler+0x4f2/0x500()
Aug 21 18:19:25 localhost kernel: Hardware name: Dell System XPS L502X
Aug 21 18:19:25 localhost kernel: Missed a PM interrupt
Aug 21 18:19:25 localhost kernel: Modules linked in: nvidia(PO) coretemp bbswitch(O) rtc cdc_ether usbnet cdc_acm snd_hda_codec_hdmi snd_hda_codec_realtek dell_wmi sparse_keymap sg dcdbas snd_hda_intel snd_hda_codec xhci_hcd ehci_hcd thermal [last unloaded: nvidia]
Aug 21 18:19:25 localhost kernel: Pid: 29938, comm: Diablo III.exe Tainted: P           O 3.3.8-gentoo-jesus23 #2
Aug 21 18:19:25 localhost kernel: Call Trace:
Aug 21 18:19:25 localhost kernel: <IRQ>  [<ffffffff8105697b>] ? warn_slowpath_common+0x7b/0xc0
Aug 21 18:19:25 localhost kernel: [<ffffffff81056a75>] ? warn_slowpath_fmt+0x45/0x50
Aug 21 18:19:25 localhost kernel: [<ffffffff813b3412>] ? ironlake_irq_handler+0x4f2/0x500
Aug 21 18:19:25 localhost kernel: [<ffffffff81011c75>] ? read_tsc+0x5/0x20
Aug 21 18:19:25 localhost kernel: [<ffffffff810b917a>] ? handle_irq_event_percpu+0x3a/0x140
Aug 21 18:19:25 localhost kernel: [<ffffffff810b92ba>] ? handle_irq_event+0x3a/0x70
Aug 21 18:19:25 localhost kernel: [<ffffffff810bc107>] ? handle_edge_irq+0x67/0x100
Aug 21 18:19:25 localhost kernel: [<ffffffff8100c5b5>] ? handle_irq+0x15/0x20
Aug 21 18:19:25 localhost kernel: [<ffffffff8100c283>] ? do_IRQ+0x53/0xd0
Aug 21 18:19:25 localhost kernel: [<ffffffff816602ee>] ? common_interrupt+0x6e/0x6e
Aug 21 18:19:25 localhost kernel: [<ffffffff8105c900>] ? __do_softirq+0x50/0x120
Aug 21 18:19:25 localhost kernel: [<ffffffff81098b9f>] ? clockevents_program_event+0x6f/0x120
Aug 21 18:19:25 localhost kernel: [<ffffffff81661e5c>] ? call_softirq+0x1c/0x30
Aug 21 18:19:25 localhost kernel: [<ffffffff8100c625>] ? do_softirq+0x65/0xa0
Aug 21 18:19:25 localhost kernel: [<ffffffff8105cc3e>] ? irq_exit+0x8e/0xb0
Aug 21 18:19:25 localhost kernel: Switching to clocksource hpet
Aug 21 18:19:25 localhost kernel: [<ffffffff81025178>] ? smp_apic_timer_interrupt+0x68/0xa0
Aug 21 18:19:25 localhost kernel: [<ffffffff816615de>] ? apic_timer_interrupt+0x6e/0x80
Aug 21 18:19:25 localhost kernel: <EOI>  [<ffffffffa0694015>] ? _nv014794rm+0x36/0x3a [nvidia]
Aug 21 18:19:25 localhost kernel: [<ffffffffa00c40f8>] ? _nv014846rm+0x2d/0x33 [nvidia]
Aug 21 18:19:25 localhost kernel: [<ffffffffa043ef86>] ? _nv009814rm+0xea/0x13a [nvidia]
Aug 21 18:19:25 localhost kernel: [<ffffffffa04612ea>] ? _nv004046rm+0x4a81/0xae8b [nvidia]
Aug 21 18:19:25 localhost kernel: [<ffffffffa041d220>] ? _nv008399rm+0x60/0xa2 [nvidia]
Aug 21 18:19:25 localhost kernel: [<ffffffffa0430d1f>] ? _nv008400rm+0xcbf/0xf94 [nvidia]
Aug 21 18:19:25 localhost kernel: [<ffffffffa00ba2bd>] ? _nv001092rm+0x404/0x485 [nvidia]
Aug 21 18:19:25 localhost kernel: [<ffffffffa00b79e3>] ? _nv001073rm+0x1998/0x2d09 [nvidia]
Aug 21 18:19:25 localhost kernel: [<ffffffffa00b7986>] ? _nv001073rm+0x193b/0x2d09 [nvidia]
Aug 21 18:19:25 localhost kernel: [<ffffffffa00b5f94>] ? _nv001039rm+0xd23/0xd59 [nvidia]
Aug 21 18:19:25 localhost kernel: [<ffffffffa00b6033>] ? _nv016414rm+0xe/0x26 [nvidia]
Aug 21 18:19:25 localhost kernel: [<ffffffffa00b654f>] ? _nv001073rm+0x504/0x2d09 [nvidia]
Aug 21 18:19:25 localhost kernel: [<ffffffffa00b5f94>] ? _nv001039rm+0xd23/0xd59 [nvidia]
Aug 21 18:19:25 localhost kernel: [<ffffffffa00b6033>] ? _nv016414rm+0xe/0x26 [nvidia]
Aug 21 18:19:25 localhost kernel: [<ffffffffa00b62d7>] ? _nv001073rm+0x28c/0x2d09 [nvidia]
Aug 21 18:19:25 localhost kernel: [<ffffffffa00b5f94>] ? _nv001039rm+0xd23/0xd59 [nvidia]
Aug 21 18:19:25 localhost kernel: [<ffffffffa00b6007>] ? _nv016416rm+0x3d/0x5b [nvidia]
Aug 21 18:19:25 localhost kernel: [<ffffffffa0699a77>] ? _nv001082rm+0xdf/0x1c3 [nvidia]
Aug 21 18:19:25 localhost kernel: [<ffffffffa069c00c>] ? rm_free_unused_clients+0x98/0x12d [nvidia]
Aug 21 18:19:25 localhost kernel: [<ffffffff8107b27c>] ? __wake_up_sync_key+0x4c/0x90
Aug 21 18:19:25 localhost kernel: [<ffffffffa06bb14b>] ? nv_kern_ctl_close+0x7b/0x130 [nvidia]
Aug 21 18:19:25 localhost kernel: [<ffffffff811028fa>] ? fput+0xea/0x240
Aug 21 18:19:25 localhost kernel: [<ffffffff810fecbf>] ? filp_close+0x5f/0x90
Aug 21 18:19:25 localhost kernel: [<ffffffff8105a0f5>] ? put_files_struct+0x75/0xf0
Aug 21 18:19:25 localhost kernel: [<ffffffff8105a3a6>] ? do_exit+0x166/0x7e0
Aug 21 18:19:25 localhost kernel: [<ffffffff8165eb90>] ? __schedule+0x2a0/0x6e0
Aug 21 18:19:25 localhost kernel: [<ffffffff8105acc3>] ? do_group_exit+0x53/0xd0
Aug 21 18:19:25 localhost kernel: [<ffffffff81066f19>] ? get_signal_to_deliver+0x199/0x4e0
Aug 21 18:19:25 localhost kernel: [<ffffffff81066f62>] ? get_signal_to_deliver+0x1e2/0x4e0
Aug 21 18:19:25 localhost kernel: [<ffffffff81009ecd>] ? do_signal+0x9d/0x780
Aug 21 18:19:25 localhost kernel: [<ffffffff81093599>] ? ktime_get_ts+0xb9/0xe0
Aug 21 18:19:25 localhost kernel: [<ffffffff81011c75>] ? read_tsc+0x5/0x20
Aug 21 18:19:25 localhost kernel: [<ffffffff8109354d>] ? ktime_get_ts+0x6d/0xe0
Aug 21 18:19:25 localhost kernel: [<ffffffff8100a635>] ? do_notify_resume+0x65/0x90
Aug 21 18:19:25 localhost kernel: [<ffffffff810a65c3>] ? compat_sys_clock_gettime+0x83/0xa0
Aug 21 18:19:25 localhost kernel: [<ffffffff81660df2>] ? int_signal+0x12/0x17
Aug 21 18:19:25 localhost kernel: ---[ end trace b40b7c165a9fd073 ]---
Aug 21 18:19:30 localhost kernel: NVRM: GPU at 0000:01:00.0 has fallen off the bus.
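For anyone who wants to repeat the "taskset 1" experiment from a script, here is a rough Python sketch of what I understand that command to do: the mask is read as hex, and bit n stands for CPU n, so mask 1 means CPU 0 only. The helper names cpus_from_mask and run_pinned are made up for this post, and os.sched_setaffinity is Linux-only. Treat it as an approximation, not a drop-in replacement for taskset.

```python
import os
import subprocess


def cpus_from_mask(mask: int) -> set:
    """Turn a taskset-style bitmask into a set of CPU indices (bit n = CPU n)."""
    return {cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1}


def run_pinned(mask: int, argv: list) -> int:
    """Run argv restricted to the CPUs in mask, roughly like `taskset MASK argv...`.

    Hypothetical helper: the affinity is set in the child just before exec,
    so the calling script itself is not pinned.
    """
    def pin():
        os.sched_setaffinity(0, cpus_from_mask(mask))

    return subprocess.run(argv, preexec_fn=pin).returncode


# Example (not executed here): run_pinned(0x1, ["wine", "Diablo III.exe"])
```

Mask 0x1 gives {0}, which matches pinning to the first CPU. The wine command line in the comment is only a guess at how the game would be started.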
Can anyone help me decipher these messages? What is the hpet/tsc clocksource switch about? Why is the Intel driver complaining? What exactly is crashing it? etc.
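To compare runs (plain, MSI, taskset), it may help to pull just the Xid numbers out of the messages file. Below is a minimal Python sketch that assumes the line format seen in the log above ("NVRM: Xid (<bus id>): <number>, ..."); other driver versions may print it differently. The function name xid_events is made up, and the code only extracts the numbers, it says nothing about what they mean.

```python
import re

# Pattern assumed from the two Xid lines in the log above.
XID_RE = re.compile(r"NVRM: Xid \(([0-9a-fA-F:.]+)\): (\d+)")


def xid_events(lines):
    """Return (bus_id, xid_number) for every NVRM Xid line, in log order."""
    events = []
    for line in lines:
        match = XID_RE.search(line)
        if match:
            events.append((match.group(1), int(match.group(2))))
    return events


sample = [
    "Aug 21 18:18:24 localhost kernel: NVRM: Xid (0000:01:00): 31, Ch 00000003, engmask 00000101, intr 30000000",
    "Aug 21 18:18:24 localhost kernel: NVRM: Xid (0000:01:00): 39, CCMDs 00000004 000090b5",
    "Aug 21 18:18:26 localhost kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context",
]
print(xid_events(sample))
```

Passing an open file works as well, e.g. xid_events(open("/var/log/messages")), assuming that is where the kernel messages end up on your system.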