Old 10-17-11, 08:56 PM   #1
cheechr1
Registered User
Join Date: Nov 2010
Posts: 94
Constant Xids: 285.05.09 + Wine + SLI + BOINC

Hello again. I have been getting many Xid errors and "Attempted to yield the CPU while in atomic or interrupt context" errors lately. Right now the problem reproduces like clockwork:

1. Have BOINC running for half an hour or more.
2. Start World of Tanks in Wine using the WotFlix Wine package found online.
3. Inside the game hangar there are no problems, but playing a round freezes this computer, and when I log back in there are always Xid errors and sometimes the atomic context errors as well. Occasionally the game will get through one round after BOINC has been running (namely PrimeGrid or Einstein@Home), but it almost always crashes during the second round. This setup used to run reasonably well in SLI with three GTX 480s, but with the newer drivers, and especially 285.05.09, things are getting really unstable.

When playing this game after a fresh boot, I can usually play 15-20 rounds of this game before this error happens again.

Oct 16 11:55:23 zod kernel: [94788.171601] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 16 11:55:25 zod kernel: [94790.172324] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 16 11:55:27 zod kernel: [94792.172427] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 16 11:55:29 zod kernel: [94794.172979] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 16 12:39:35 zod kernel: [97440.367662] NVRM: Xid (0000:04:00): 13, 0006 00000000 00009097 0000114c 1004441d 0000000c
Oct 16 12:39:35 zod kernel: [97440.375469] NVRM: Xid (0000:02:00): 13, 0006 00000000 00009097 0000114c 1004441d 0000000c

Added a bug report while SSH'ed in.

Oct 25/11 update: While using driver 275.09.07 I had another crash, but this time it wasn't the Xids that finally froze the system. It was two messages that are relatively new in my logs:

Oct 25 16:34:19 zod kernel: [ 1922.559591] NVRM: Xid (0000:04:00): 13, 0006 00000000 00009097 0000238c 08000050 0000000c
Oct 25 16:45:12 zod kernel: [ 2575.470654] NVRM: Xid (0000:04:00): 13, 0006 00000000 00009097 00002384 08000000 0000000c
Oct 25 17:26:33 zod kernel: [ 5056.287698] NVRM: Xid (0000:04:00): 32, Channel ID 00000006 intr 00040000
Oct 25 17:26:33 zod kernel: [ 5056.295296] NVRM: Xid (0000:04:00): 32, Channel ID 00000006 intr 00200000
Oct 25 17:26:33 zod kernel: [ 5056.302745] NVRM: Xid (0000:04:00): 32, Channel ID 00000006 intr 02000000
Oct 25 17:26:33 zod kernel: [ 5056.310186] NVRM: Xid (0000:04:00): 32, Channel ID 00000006 intr 00200000
Oct 25 17:26:33 zod kernel: [ 5056.317644] NVRM: Xid (0000:04:00): 32, Channel ID 00000006 intr 00200000
Oct 25 17:26:33 zod kernel: [ 5056.325088] NVRM: Xid (0000:04:00): 32, Channel ID 00000006 intr 00200000
Oct 25 17:54:02 zod kernel: [ 6705.510462] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 25 17:54:04 zod kernel: [ 6707.510395] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 25 17:54:05 zod kernel: [ 6708.232353] NVRM: GPU at 0000:03:00.0 has fallen off the bus.
Oct 25 17:54:05 zod kernel: [ 6708.495033] NVRM: GPU at 0000:04:00.0 has fallen off the bus.

Oct 25 17:54:05 zod kernel: [ 6708.538898] NVRM: os_pci_init_handle: invalid context!
Oct 25 17:54:05 zod kernel: [ 6708.538900] NVRM: os_pci_init_handle: invalid context!


This happened while playing World of Tanks in Wine 1.3.16. The nice thing is that with driver 275.09.07, reboots always work. With 290.03, only about every third reboot succeeds; the others freeze during bootup, right after the panels and background appear.

Another interesting tidbit: I just got an MCE while writing this post, with a bunch of Xids before and after it.

Oct 25 18:00:17 zod kernel: [ 279.193194] NVRM: Xid (0000:04:00): 13, 0004 00000000 00009097 0000163c 00000021 0000000c
Oct 25 18:00:17 zod kernel: [ 279.200932] NVRM: Xid (0000:03:00): 13, 0004 00000000 00009097 0000163c 00000021 0000000c
Oct 25 18:00:36 zod kernel: [ 298.664338] NVRM: Xid (0000:04:00): 13, 0004 00000000 00009039 00000300 00100111 00000053
Oct 25 18:00:36 zod kernel: [ 298.671910] NVRM: Xid (0000:04:00): 13, 0004 00000000 00009039 00000304 00000000 00000000
Oct 25 18:00:36 zod kernel: [ 298.679484] NVRM: Xid (0000:04:00): 13, 0004 00000000 00009039 00000304 c0c05000 00000000
Oct 25 18:00:37 zod kernel: [ 299.083939] [Hardware Error]: Machine check events logged
Oct 25 18:20:29 zod kernel: [ 1487.027365] NVRM: Xid (0000:03:00): 31, Ch 00000004, engmask 00000101, intr 10000000
Oct 25 18:20:33 zod kernel: [ 1491.061260] NVRM: Xid (0000:02:00): 32, Channel ID 00000004 intr 00800000
Oct 25 18:20:33 zod kernel: [ 1491.068894] NVRM: Xid (0000:03:00): 32, Channel ID 00000004 intr 00800000
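
For anyone wanting to compare which Xid codes hit which card, here is a rough Python sketch that tallies Xid lines by PCI address and code. The regular expression is only inferred from the log lines in this post, and the `tally_xids` name is made up for illustration; it is not an NVIDIA tool.

```python
import re
from collections import Counter

# Pattern inferred from lines in this post, e.g.
#   NVRM: Xid (0000:04:00): 13, 0006 00000000 ...
# It is an assumption that other driver versions use the same layout.
XID_RE = re.compile(r"NVRM: Xid \(([0-9a-fA-F:.]+)\): (\d+),")


def tally_xids(lines):
    """Count (pci_address, xid_code) pairs in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# Sample lines copied from the logs above.
sample = [
    "Oct 25 18:00:17 zod kernel: [ 279.193194] NVRM: Xid (0000:04:00): 13, 0004 00000000 00009097 0000163c 00000021 0000000c",
    "Oct 25 18:20:29 zod kernel: [ 1487.027365] NVRM: Xid (0000:03:00): 31, Ch 00000004, engmask 00000101, intr 10000000",
    "Oct 25 18:20:33 zod kernel: [ 1491.061260] NVRM: Xid (0000:02:00): 32, Channel ID 00000004 intr 00800000",
    "Oct 25 18:20:33 zod kernel: [ 1491.068894] NVRM: Xid (0000:03:00): 32, Channel ID 00000004 intr 00800000",
    "Oct 25 18:00:37 zod kernel: [ 299.083939] [Hardware Error]: Machine check events logged",
]

for (gpu, code), n in sorted(tally_xids(sample).items()):
    print(f"GPU {gpu}  Xid {code}  x{n}")
```

To run it against a real log, pass an open file instead of `sample`, for example `tally_xids(open(path))`, where `path` is wherever your distribution writes kernel messages.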

And here is the mcelog output:

mcelog: Unsupported new Family 6 Model 2c CPU: only decoding architectural errors
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
MCE 0
CPU 4 BANK 5
MISC 1 ADDR 3ff4b3835bda
TIME 1319588475 Tue Oct 25 18:21:15 2011
MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: Internal Timer error
STATUS fe00000000800400 MCGSTATUS 0
MCGCAP 1c09 APICID 12 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44
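
As a cross-check on the mcelog text, the STATUS value can be decoded by hand. The sketch below assumes the architectural IA32_MCi_STATUS layout from the Intel SDM (flag bits 63 down to 57, MCA error code in bits 15:0, model-specific code in bits 31:16); the `decode_mci_status` helper is a made-up name for illustration, not part of mcelog.

```python
# Architectural flag bits of IA32_MCi_STATUS (assumed from the Intel SDM).
FLAGS = [
    (63, "VAL: register valid"),
    (62, "OVER: error overflow"),
    (61, "UC: uncorrected error"),
    (60, "EN: error enabled"),
    (59, "MISCV: MCi_MISC register valid"),
    (58, "ADDRV: MCi_ADDR register valid"),
    (57, "PCC: processor context corrupt"),
]


def decode_mci_status(status):
    """Split a 64-bit MCi_STATUS value into its flags and error codes."""
    return {
        "flags": [name for bit, name in FLAGS if (status >> bit) & 1],
        "mca_error_code": status & 0xFFFF,              # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


# STATUS value copied from the mcelog output above.
decoded = decode_mci_status(0xFE00000000800400)
for flag in decoded["flags"]:
    print(flag)
print(f"MCA error code:      {decoded['mca_error_code']:#06x}")
print(f"Model-specific code: {decoded['model_specific_code']:#06x}")
```

All seven flag bits come out set, which matches the flags mcelog printed, and the low 16 bits are 0x0400, the simple error code the SDM lists as an internal timer error.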

I was getting MCEs with my last processor, so maybe it is time to look at the board. This is the first MCE I have gotten with this new Intel 990X, which leads me to believe it is not related. Although I have been wrong before.
Attached Files
File Type: gz nvidia-bug-report.log.gz (60.7 KB, 53 views)

Last edited by cheechr1; 10-25-11 at 07:35 PM. Reason: Added Bug-report