nV News Forums > Linux Support Forums > NVIDIA Linux

10-17-11, 09:56 PM   #1
cheechr1
Registered User
Join Date: Nov 2010
Posts: 94
Constant Xid's 285.05.09 + Wine + SLi + Boinc

Hello again. I have been getting many Xids and "Attempted to yield the CPU while in atomic or interrupt context" errors lately. Right now the error reproduces like clockwork:

1. Have Boinc running for half an hour or more.
2. Start World of Tanks in Wine using the WotFlix Wine package found online.
3. Inside the game hangar there are no problems, but playing a round always freezes the computer, and when I log back in the logs always contain Xids and sometimes the atomic context errors. Sometimes the game will play for one round after using Boinc (namely PrimeGrid or Einstein@Home), but it almost always crashes during the second round. This setup used to run reasonably well in SLI with three GTX 480s, but with the newer drivers, especially 285.05.09, things are getting really unstable.

When playing this game after a fresh boot, I can usually play 15-20 rounds before this error happens again.

Oct 16 11:55:23 zod kernel: [94788.171601] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 16 11:55:25 zod kernel: [94790.172324] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 16 11:55:27 zod kernel: [94792.172427] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 16 11:55:29 zod kernel: [94794.172979] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 16 12:39:35 zod kernel: [97440.367662] NVRM: Xid (0000:04:00): 13, 0006 00000000 00009097 0000114c 1004441d 0000000c
Oct 16 12:39:35 zod kernel: [97440.375469] NVRM: Xid (0000:02:00): 13, 0006 00000000 00009097 0000114c 1004441d 0000000c

Added a bug report while SSH'ed in.
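
To see at a glance which GPU is throwing which Xid, the log lines can be tallied by PCI address and Xid number. This is a minimal sketch that assumes the line format shown above; `xid_summary` is a hypothetical helper name, and the kern.log path in the comment is an assumption based on the Ubuntu setup mentioned in this thread.

```shell
# Tally NVRM Xid events by PCI address and Xid number.
# Reads kernel log lines on stdin and prints "count address xid" per pair.
xid_summary() {
    # Keep only Xid lines, reduced to "address xid", then count duplicates.
    sed -n 's/.*NVRM: Xid (\([^)]*\)): \([0-9][0-9]*\),.*/\1 \2/p' |
        sort | uniq -c | awk '{ print $1, $2, $3 }'
}

# Example invocation (log path is an assumption):
# xid_summary < /var/log/kern.log
```

Lines that are not Xid reports, such as the os_schedule messages, are ignored by the filter.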

Oct 25/11 update: While using driver 275.09.07 I had another crash, but it wasn't the Xids that finally caused the system to freeze. It was two messages that are relatively new in my logs:

Oct 25 16:34:19 zod kernel: [ 1922.559591] NVRM: Xid (0000:04:00): 13, 0006 00000000 00009097 0000238c 08000050 0000000c
Oct 25 16:45:12 zod kernel: [ 2575.470654] NVRM: Xid (0000:04:00): 13, 0006 00000000 00009097 00002384 08000000 0000000c
Oct 25 17:26:33 zod kernel: [ 5056.287698] NVRM: Xid (0000:04:00): 32, Channel ID 00000006 intr 00040000
Oct 25 17:26:33 zod kernel: [ 5056.295296] NVRM: Xid (0000:04:00): 32, Channel ID 00000006 intr 00200000
Oct 25 17:26:33 zod kernel: [ 5056.302745] NVRM: Xid (0000:04:00): 32, Channel ID 00000006 intr 02000000
Oct 25 17:26:33 zod kernel: [ 5056.310186] NVRM: Xid (0000:04:00): 32, Channel ID 00000006 intr 00200000
Oct 25 17:26:33 zod kernel: [ 5056.317644] NVRM: Xid (0000:04:00): 32, Channel ID 00000006 intr 00200000
Oct 25 17:26:33 zod kernel: [ 5056.325088] NVRM: Xid (0000:04:00): 32, Channel ID 00000006 intr 00200000
Oct 25 17:54:02 zod kernel: [ 6705.510462] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 25 17:54:04 zod kernel: [ 6707.510395] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 25 17:54:05 zod kernel: [ 6708.232353] NVRM: GPU at 0000:03:00.0 has fallen off the bus.
Oct 25 17:54:05 zod kernel: [ 6708.495033] NVRM: GPU at 0000:04:00.0 has fallen off the bus.

Oct 25 17:54:05 zod kernel: [ 6708.538898] NVRM: os_pci_init_handle: invalid context!
Oct 25 17:54:05 zod kernel: [ 6708.538900] NVRM: os_pci_init_handle: invalid context!


This happened while playing World of Tanks in Wine 1.3.16. The nice thing is that with driver 275.09.07, reboots always work. With 290.03, only about every third reboot works; the others freeze during bootup, right after the panels and background appear.

Another interesting tidbit: I just got an MCE while writing this post, with a bunch of Xids before and after it.

Oct 25 18:00:17 zod kernel: [ 279.193194] NVRM: Xid (0000:04:00): 13, 0004 00000000 00009097 0000163c 00000021 0000000c
Oct 25 18:00:17 zod kernel: [ 279.200932] NVRM: Xid (0000:03:00): 13, 0004 00000000 00009097 0000163c 00000021 0000000c
Oct 25 18:00:36 zod kernel: [ 298.664338] NVRM: Xid (0000:04:00): 13, 0004 00000000 00009039 00000300 00100111 00000053
Oct 25 18:00:36 zod kernel: [ 298.671910] NVRM: Xid (0000:04:00): 13, 0004 00000000 00009039 00000304 00000000 00000000
Oct 25 18:00:36 zod kernel: [ 298.679484] NVRM: Xid (0000:04:00): 13, 0004 00000000 00009039 00000304 c0c05000 00000000
Oct 25 18:00:37 zod kernel: [ 299.083939] [Hardware Error]: Machine check events logged
Oct 25 18:20:29 zod kernel: [ 1487.027365] NVRM: Xid (0000:03:00): 31, Ch 00000004, engmask 00000101, intr 10000000
Oct 25 18:20:33 zod kernel: [ 1491.061260] NVRM: Xid (0000:02:00): 32, Channel ID 00000004 intr 00800000
Oct 25 18:20:33 zod kernel: [ 1491.068894] NVRM: Xid (0000:03:00): 32, Channel ID 00000004 intr 00800000

And the output of mcelog is this:

mcelog: Unsupported new Family 6 Model 2c CPU: only decoding architectural errors
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
MCE 0
CPU 4 BANK 5
MISC 1 ADDR 3ff4b3835bda
TIME 1319588475 Tue Oct 25 18:21:15 2011
MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: Internal Timer error
STATUS fe00000000800400 MCGSTATUS 0
MCGCAP 1c09 APICID 12 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44
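
The flags mcelog lists under "MCi status" can be cross-checked against the raw STATUS value by hand. The following is a minimal sketch, assuming the architectural IA32_MCi_STATUS bit layout from the Intel SDM (bit 63 VAL down to bit 57 PCC, MCA error code in bits 15:0, model-specific code in bits 31:16). `decode_mci_status` and `flag` are hypothetical helper names and are not part of mcelog.

```shell
# Print a label if the given bit of a value is set.
# Usage: flag <value> <bit> <label>
flag() {
    if [ $(( ($1 >> $2) & 1 )) -eq 1 ]; then echo "$3"; fi
}

# Decode the architectural flags of an IA32_MCi_STATUS value.
# Argument: 16 hex digits without "0x", as printed by mcelog after STATUS.
decode_mci_status() {
    status=$1
    # Split into 32-bit halves so no value exceeds a signed 64-bit integer.
    hi=$(( 0x${status%????????} ))   # bits 63:32
    lo=$(( 0x${status#????????} ))   # bits 31:0
    flag "$hi" 31 "VAL: status register valid"
    flag "$hi" 30 "OVER: error overflow"
    flag "$hi" 29 "UC: uncorrected error"
    flag "$hi" 28 "EN: error enabled"
    flag "$hi" 27 "MISCV: MCi_MISC register valid"
    flag "$hi" 26 "ADDRV: MCi_ADDR register valid"
    flag "$hi" 25 "PCC: processor context corrupt"
    printf 'MCA error code: 0x%04x\n' $(( lo & 0xffff ))
    printf 'Model-specific error code: 0x%04x\n' $(( (lo >> 16) & 0xffff ))
}
```

For the value above, `decode_mci_status fe00000000800400` reports all seven flags from VAL through PCC and an MCA error code of 0x0400, the SDM encoding for an internal timer error, which agrees with what mcelog printed.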

I was getting MCEs with my last processor, so maybe it is time to look at the board. This is the first MCE I have gotten with this new Intel 990X, which leads me to believe it is not related. Although I have been wrong before.
Attached Files
File Type: gz nvidia-bug-report.log.gz (60.7 KB, 60 views)

Last edited by cheechr1; 10-25-11 at 08:35 PM. Reason: Added Bug-report
10-24-11, 10:13 PM   #2
cheechr1
Re: Constant Xid's 285.05.09 + Wine + SLi + Boinc

Bump. I am getting really tired of CONSTANT Xids, and not only when gaming. I just got a couple while typing this message, just after a fresh boot with 290.03. NVIDIA, please look into this issue: 3x GTX 480 in SLI on a Rampage II Extreme with the latest BIOS and an Intel 990X. I can play for maybe 1 hour, then I encounter many freezes and Xids after reboot, whether or not I am in game. I have a feeling this has to do with the EQ overflowing issue, for which I was directed to try another keyboard/mouse combo (non-Logitech). I have done that and it has not improved the situation.

I am also receiving atomic context errors in kern.log, usually during boot, resulting in freezes. The computer boots successfully 1 time out of 3 or 4. Only a few times during boot have I been able to catch the error in kern.log:

Oct 24 17:43:45 zod kernel: [ 46.933183] NVRM: Xid (0000:02:00): 56, CMDre 00000000 0000089c 0100cb1a 00000007 00000000
Oct 24 17:43:45 zod kernel: [ 46.951442] NVRM: Xid (0000:02:00): 31, Ch 00000001, engmask 00000101, intr 10000000
Oct 24 17:43:47 zod kernel: [ 48.951291] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 24 17:43:49 zod kernel: [ 50.951203] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 24 17:43:53 zod kernel: [ 55.016447] NVRM: Xid (0000:02:00): 31, Ch 00000001, engmask 00000101, intr 10000000
Oct 24 17:43:55 zod kernel: [ 57.016307] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 24 17:43:57 zod kernel: [ 59.016207] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 24 17:44:07 zod kernel: [ 68.578193] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
Oct 24 17:44:07 zod kernel: [ 68.578198] IP: [<ffffffffa042f465>] _nv005652rm+0xb4/0xc1 [nvidia]
Oct 24 17:44:07 zod kernel: [ 68.578287] PGD 0
Oct 24 17:44:07 zod kernel: [ 68.578289] Oops: 0002 [#1] SMP

To give you an idea, this is the last normal kern.log entry before the errors above. Note the time.

Oct 24 17:43:37 zod kernel: [ 38.933474] hda-intel: IRQ timing workaround is activated for card #4. Suggest a bigger bdl_pos_adj.

That's how fast the freeze usually happens. I am on kernel 2.6.38-11, Ubuntu 11.04, x64.
Please let me know if you need any more info.
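
The delay mentioned above can be read off the bracketed uptime stamps the kernel puts on each line. A minimal sketch, assuming log lines in the format shown in this post are fed on stdin with the last normal entry first; `nvrm_delay` is a hypothetical helper name.

```shell
# Print the seconds between the first log line on stdin and the first
# NVRM line, based on the kernel's "[seconds.micros]" uptime stamps.
nvrm_delay() {
    awk '
        match($0, /\[ *[0-9]+\.[0-9]+\]/) {
            # Strip the brackets and convert the stamp to a number.
            ts = substr($0, RSTART + 1, RLENGTH - 2) + 0
            if (!seen) { first = ts; seen = 1 }
            if ($0 ~ /NVRM:/) { printf "%.3f\n", ts - first; exit }
        }
    '
}
```

Fed the hda-intel line followed by the first Xid 56 line from this post, it prints 8.000, so about eight seconds pass between the last normal entry and the first Xid.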
10-28-11, 10:30 PM   #3
rockob
Registered User
Join Date: Nov 2008
Posts: 95
Re: Constant Xid's 285.05.09 + Wine + SLi + Boinc

I found that if I increase the Wine app's priority, I no longer get the Xids, the attempts to yield while atomic, or the system crashes.

E.g., if my app is called test.exe:

Code:
sudo renice -n -15 `pidof test.exe`
(You can check whether it has worked with schedtool `pidof test.exe`.)
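
If the app is not running yet, or runs as several processes, the one-liner can be wrapped in a small function. This is only a sketch of the same renice approach: `boost_priority` is a hypothetical name, and the default of -15 is simply the value used above.

```shell
# Raise the scheduling priority of every process with the given name.
# Usage: boost_priority <process-name> [niceness]
boost_priority() {
    name=$1
    niceness=${2:--15}          # default to the -15 used above
    pids=$(pidof "$name") || {
        echo "no running process named $name" >&2
        return 1
    }
    # pidof prints space-separated PIDs, so $pids is left unquoted on
    # purpose. A negative niceness needs root, hence sudo.
    sudo renice -n "$niceness" -p $pids
}
```

Calling `boost_priority test.exe` after the game has started has the same effect as the command above. Without schedtool, `ps -o pid,ni,comm -p $(pidof test.exe)` also shows the resulting nice value.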

Does that help with your issue?

All times are GMT -5.