nV News Forums

 
 

Mysterious Message (http://www.nvnews.net/vbulletin/showthread.php?t=101751)

jesmith 11-05-07 08:40 AM

Mysterious Message
 
After running all weekend, our system was hung this morning. I don't know if it's related, but the syslog had this in it...

Nov 2 16:21:00 (none) kernel: [ 41.575114] NVRM: loading NVIDIA UNIX x86 Kernel Module 100.14.19 Wed Sep 12 14:12:24 PDT 2007
Nov 2 16:32:59 (none) kernel: [ 761.104514] NVRM: Xid (0001:00): 8, Channel 00000003
Nov 2 16:32:59 (none) kernel: [ 761.104792] NVRM: Xid (0001:00): 30, L1 -> L0
Nov 2 17:52:44 (none) kernel: [ 5544.415779] NVRM: Xid (0001:00): 8, Channel 00000003
Nov 2 17:52:44 (none) kernel: [ 5544.416056] NVRM: Xid (0001:00): 30, L1 -> L0
Nov 2 18:28:09 (none) kernel: [ 7669.082997] NVRM: Xid (0001:00): 8, Channel ffffffff
Nov 2 18:28:09 (none) kernel: [ 7669.083297] NVRM: Xid (0001:00): 30, L1 -> L0

Any idea what that means?

-Joshua

netllama 11-05-07 09:27 AM

Re: Mysterious Message
 
Please see the forum sticky posts.

jesmith 11-05-07 10:49 AM

Re: Mysterious Message
 
I see them. Could you be more specific?

Are you passive-aggressively saying that you want the whole nvidia-bug-report log stuff? I'd be happy to do that, but I figured you could tell me what the cryptic message meant without my having to go through all that. After all, it might be a perfectly harmless message...

-Joshua

netllama 11-05-07 10:51 AM

Re: Mysterious Message
 
Xids are never harmless; they represent something that is going wrong somewhere.

I'm not passive-aggressively telling you anything. The forum sticky posts are quite explicit about what steps you need to take if you'd like assistance here.

jesmith 11-05-07 10:56 AM

Re: Mysterious Message
 
Are the Xids documented?

(There are a half-dozen sticky threads. So if you are going to be so terse, the *least* you could do is point out which one you wanted me to read. If I hadn't already gone through this logs thing with you about 5 times now, I would have had ABSOLUTELY no clue what you were trying to tell me.)

-Joshua

netllama 11-05-07 10:59 AM

Re: Mysterious Message
 
The driver README explains the purpose and potential meaning of Xids.

energyman76b 11-05-07 03:36 PM

Re: Mysterious Message
 
To quote the README:
Q. My kernel log contains messages that are prefixed with "Xid"; what do these
messages mean?

A. "Xid" messages indicate that a general GPU error occurred, most often due
to the driver misprogramming the GPU or to corruption of the commands sent
to the GPU. These messages provide diagnostic information that can be used
by NVIDIA to aid in debugging reported problems.


Xid means you are in deep ****. Your hardware is confused and your driver is yelling. Please post the output of nvidia-bug-report.sh and reboot.

For example, I sometimes got Xids with a certain combination of damaged WMV files, mplayer, and my old 6600 (non-GT), resulting not only in Xids in dmesg but also in blue video windows (aka, Xv is dead, Jim).

The sticky posts tell you to post the output of that script. Instead of attacking netllama, you would have saved everybody a lot of time if you had just done so.

jesmith 11-07-07 08:51 AM

Re: Mysterious Message
 
The log is attached.

This morning the system was hung, hard. Even pressing the power button couldn't unhang it. I have a script that captures the output of tail -f on the syslog, and it said:

Nov 6 15:33:48 (none) kernel: [ 42.369694] nvidia: module license 'NVIDIA' taints kernel.
Nov 6 15:33:48 (none) kernel: [ 42.620009] NVRM: loading NVIDIA UNIX x86 Kernel Module 100.14.19 Wed Sep 12 14:12:24 PDT 2007
Nov 6 17:41:35 (none) kernel: [ 7707.799925] NVRM: Xid (0001:00): 8, Channel ffffffff
Nov 6 17:41:35 (none) kernel: [ 7707.800194] NVRM: Xid (0001:00): 30, L1 -> L0
Nov 6 20:34:32 (none) kernel: [18083.596385] NVRM: Xid (0001:00): 8, Channel ffffffff
Nov 6 20:34:32 (none) kernel: [18083.596655] NVRM: Xid (0001:00): 30, L1 -> L0
Nov 6 22:48:08 (none) kernel: [26097.742084] NVRM: Xid (0001:00): 8, Channel ffffffff
Nov 6 22:48:08 (none) kernel: [26097.742352] NVRM: Xid (0001:00): 30, L1 -> L0
Nov 7 01:29:22 (none) kernel: [35769.579245] NVRM: Xid (0001:00): 8, Channel ffffffff
Nov 7 01:29:22 (none) kernel: [35769.579524] NVRM: Xid (0001:00): 30, L1 -> L0

I do not know if it hung at 1:30 am, just that it was hung at 8 am.

Note that I've never seen this error before, and we put together a lot of systems with a configuration identical to this one, so it feels like we're looking at a hardware fault, either in the motherboard or the Quadro. A pointer to which is more likely would be MUCH appreciated.

Also, my apologies if you think I wasted anyone's time. However, I would point out that something like "Please read the forum sticky post; we need to see the log to diagnose that" wouldn't have taken netllama any more time to write, and would have been far more customer-friendly. (My company buys a lot of hardware from NVIDIA, so pardon me for expecting to be treated like a customer, not a nuisance.)

netllama 11-07-07 10:23 AM

Re: Mysterious Message
 
If you have other identical systems that are not experiencing the instability, then it's very likely that this is a hardware problem. Swapping the Quadro 4600 between good and bad systems would be the best way to isolate the problem.

I don't see anything in the bug report which points to a specific problem.

jesmith 11-07-07 10:30 AM

Re: Mysterious Message
 
Thanks. We're starting those tests now. But is it at all possible that you could tell me what that message actually means?

netllama 11-07-07 10:40 AM

Re: Mysterious Message
 
The Xids that you've posted refer to status changes in the driver. Without knowing specifically what triggered them, I couldn't comment further.

jesmith 11-07-07 03:49 PM

Re: Mysterious Message
 
It's not a hardware problem.

I just ran the same test on a completely different unit, and it had the same error in the logs:

Nov 7 00:44:34 (none) kernel: [ 4673.803135] NVRM: Xid (0001:00): 8, Channel 00000003
Nov 7 00:44:34 (none) kernel: [ 4673.803434] NVRM: Xid (0001:00): 30, L1 -> L0
Nov 7 00:47:10 (none) kernel: [ 4829.780294] NVRM: Xid (0001:00): 8, Channel 00000003
Nov 7 00:47:10 (none) kernel: [ 4829.780593] NVRM: Xid (0001:00): 30, L1 -> L0

The system was completely hung. I had to hold the power button for 5 seconds to get it to shut down.

So it appears this problem is in the driver. How do we proceed with diagnosing this?
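One low-effort first step when comparing units is to pull the Xid entries out of each unit's syslog and line them up side by side. The following is a minimal Python sketch, assuming the syslog line format shown in the logs in this thread; `parse_xid`, `summarize`, and the script name `xid_summary.py` are hypothetical. It only extracts fields (wall-clock time, uptime, the "(0001:00)" field, the Xid number, and the trailing detail text) and counts Xid numbers; it does not try to interpret what any Xid number means.

```python
import re
import sys
from collections import Counter
from typing import Iterable, NamedTuple, Optional, Tuple

# Assumed line format (taken from the logs in this thread), e.g.:
# Nov 7 00:44:34 (none) kernel: [ 4673.803135] NVRM: Xid (0001:00): 8, Channel 00000003
XID_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+.*?"
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+NVRM: Xid \((?P<dev>[0-9a-fA-F:]+)\):\s*"
    r"(?P<code>\d+),\s*(?P<detail>.*)$"
)


class XidEvent(NamedTuple):
    stamp: str     # syslog wall-clock time (this format carries no year)
    uptime: float  # seconds since boot, from the kernel's [ ... ] field
    device: str    # the "(0001:00)" field, kept verbatim and not interpreted
    code: int      # the Xid number
    detail: str    # text after the number, e.g. "Channel ffffffff"


def parse_xid(line: str) -> Optional[XidEvent]:
    """Return an XidEvent for an NVRM Xid line, or None for any other line."""
    m = XID_RE.match(line.strip())
    if m is None:
        return None
    return XidEvent(m["stamp"], float(m["uptime"]), m["dev"],
                    int(m["code"]), m["detail"].strip())


def summarize(lines: Iterable[str]) -> Tuple[Counter, Optional[XidEvent]]:
    """Count Xid numbers and return (counts, last Xid event seen)."""
    events = [e for e in map(parse_xid, lines) if e is not None]
    counts = Counter(e.code for e in events)
    return counts, (events[-1] if events else None)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Only runs when a log file path is given on the command line.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        counts, last = summarize(f)
    for code, n in sorted(counts.items()):
        print(f"Xid {code}: {n}")
    if last is not None:
        print(f"last: {last.stamp} (uptime {last.uptime:.0f}s) "
              f"Xid {last.code}, {last.detail}")
```

Usage would be something like `python3 xid_summary.py /var/log/syslog`, where the log path is an assumption that depends on the distribution's syslog setup. The "last" line is only the last Xid that made it to disk, so it gives a lower bound on when the hang happened, not the hang time itself.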


All times are GMT -5.

Copyright 1998 - 2014, nV News.