nV News Forums

 
 


paulbjr 06-05-09 06:37 PM

Intermittent problems with Quadro NVS 290 cards
 
We are developing a new release of an established Solaris-based product that we have sold for years on Solaris SPARC. We have now ported it to Solaris 10 x86 using Nvidia Quadro NVS 290 cards. One system has one Quadro NVS 290 (two displays); the other has two Quadro NVS 290 cards and four displays.

Of the twenty-two development and test machines we are using, TWO have on a number of occasions exhibited graphics freezes and/or system crashes. At other times, NVRM messages show up in our /var/adm/messages with no *apparent* harm to the system.

No other systems appear to have any NVRM messages!

We are running the driver that was packaged with the version of Solaris 10 we are using:

- version 100.14.19 dated Sept 12, 2007.

We are updating our systems to the latest 180.51 driver but I have kept this system at the old driver level in case the old driver is helping unmask hardware issues.

The errors occur at unpredictable times (not right after booting). When related crashes occur they are often a half hour to several hours after NVRM messages appear in /var/adm/messages.
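To put a number on that delay, here is a minimal Python sketch (lag_before_crash and parse_stamp are hypothetical helpers, not part of any NVIDIA or Solaris tool) that reports, for each crash, how long it came after the most recent earlier NVRM message. It assumes syslog-style "Mon D HH:MM:SS" stamps and supplies the year 2009 (taken from the post dates) because syslog omits it; the crash dates are assumed to be Jun 4, matching the log lines around them in the history below.

```python
from datetime import datetime, timedelta
from typing import List, Optional

# Assumption: stamps look like "Jun 4 15:43:23" (syslog style, no year).
# The year is prepended explicitly so month/day parsing is unambiguous.
STAMP_FMT = "%Y %b %d %H:%M:%S"


def parse_stamp(stamp: str, year: int = 2009) -> datetime:
    return datetime.strptime(f"{year} {stamp}", STAMP_FMT)


def lag_before_crash(nvrm_stamps: List[str], crash_stamps: List[str],
                     year: int = 2009) -> List[Optional[timedelta]]:
    """For each crash, return the time since the latest NVRM message at or before it.

    A crash with no earlier NVRM message yields None.
    """
    nvrm = sorted(parse_stamp(s, year) for s in nvrm_stamps)
    lags: List[Optional[timedelta]] = []
    for stamp in crash_stamps:
        crash = parse_stamp(stamp, year)
        earlier = [t for t in nvrm if t <= crash]
        lags.append(crash - earlier[-1] if earlier else None)
    return lags
```

Fed the Jun 4 NVRM stamps (10:40:14, 15:43:23, 15:49:49) and the two crash times (15:44:36, 19:01:43), it returns 0:01:13 and 3:11:54 respectively.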

This week, after several weeks of very few NVRM messages, a swarm of NVRM messages showed up and two crashes occurred. I did NOT notice any freeze on this box.

The history looks like this:

Jun 4 10:40:14 pluto nvidia: [ID 702911 kern.notice] NVRM: Xid (0001:00): 13, 0001 00000000 0000502d 00000184 00000466 00000008
Jun 4 10:40:14 pluto nvidia: [ID 702911 kern.notice] NVRM: Xid (0001:00): 13, 0001 00000000 0000502d 00000100 00000000 00000100
Jun 4 15:43:23 pluto nvidia: [ID 702911 kern.notice] NVRM: Xid (0001:00): 13, 0001 00000000 00005000 00000000 00000478 00000000
Jun 4 15:43:23 pluto nvidia: [ID 702911 kern.notice] NVRM: Xid (0001:00): 13, 0001 00000000 0000502d 00000100 00000000 00000100

system crash with core dump occurred at 15:44:36

Jun 4 15:49:49 pluto nvidia: [ID 702911 kern.notice] NVRM: Xid (0001:00): 13, 0001 00000000 00005000 00000000 00000478 00000000
Jun 4 15:49:49 pluto nvidia: [ID 702911 kern.notice] NVRM: Xid (0001:00): 13, 0001 00000000 0000502d 00000100 00000000 00000100

system crash with core dump occurred at 19:01:43

Jun 5 04:05:34 pluto nvidia: [ID 702911 kern.notice] NVRM: Xid (0001:00): 6, PE0001
Jun 5 04:05:34 pluto nvidia: [ID 702911 kern.notice] NVRM: Xid (0001:00): 13, 0001 00000000 0000502d 00000100 00000000 00000100
Jun 5 04:05:34 pluto nvidia: [ID 702911 kern.notice] NVRM: Xid (0001:00): 13, 0001 00000000 0000502d 000005e0 007f06a2 00000100
Jun 5 08:15:41 pluto nvidia: [ID 702911 kern.notice] NVRM: Xid (0001:00): 13, 0001 00000000 0000502d 00000184 00000466 00000008
Jun 5 11:11:05 pluto nvidia: [ID 702911 kern.notice] NVRM: Xid (0001:00): 13, 0001 00000000 0000502d 00000184 00000466 00000008
Jun 5 12:17:41 pluto nvidia: [ID 702911 kern.notice] NVRM: Xid (0001:00): 13, 0001 00000000 0000002d 00000000 00000478 00000000
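With messages arriving on several machines, a tally of Xid codes per host makes the pattern easier to compare. A minimal Python sketch, assuming every relevant line contains the "NVRM: Xid (<bus>:<dev>): <code>, ..." text seen above, optionally prefixed by a grep file name such as "/var/adm/messages.0:". XidEvent, parse_xid, and tally are hypothetical names, and the script only counts codes; it does not interpret what each one means.

```python
import re
from collections import Counter
from typing import Iterable, List, NamedTuple


class XidEvent(NamedTuple):
    stamp: str   # "Mon D HH:MM:SS" exactly as logged (syslog omits the year)
    host: str
    device: str  # the "0001:00" field inside "Xid (...)"
    code: int    # Xid number, e.g. 13, 6, 1, 16
    detail: str  # remaining driver-specific words, kept verbatim


# Matches e.g. "Jun 4 10:40:14 pluto nvidia: [ID 702911 kern.notice] NVRM: Xid (0001:00): 13, ..."
XID_RE = re.compile(
    r"(?P<mon>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+nvidia:.*?NVRM: Xid \((?P<dev>[^)]*)\): (?P<code>\d+),\s*(?P<detail>.*)"
)


def parse_xid(lines: Iterable[str]) -> List[XidEvent]:
    events = []
    for line in lines:
        m = XID_RE.search(line)
        if m:  # anything that is not an NVRM Xid line is skipped
            events.append(XidEvent(
                f"{m['mon']} {m['day']} {m['time']}", m["host"], m["dev"],
                int(m["code"]), m["detail"].strip()))
    return events


def tally(events: Iterable[XidEvent]) -> Counter:
    """Count events per (host, Xid code)."""
    return Counter((e.host, e.code) for e in events)
```

For example, `tally(parse_xid(open("/var/adm/messages", errors="replace")))` would give one count per (host, code) pair found in that file.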


Since I could not find an nvidia-bug-report.sh script on our system, I am uploading bug-report.gz, which contains /var/adm/messages, Xorg.0.log, and xorg.conf.
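Without nvidia-bug-report.sh, the same files can be gathered by hand. A minimal Python sketch using the standard-library tarfile module; /var/adm/messages comes from the post, while /var/log/Xorg.0.log and /etc/X11/xorg.conf are assumed default locations that may differ on a given install. bundle and DEFAULT_FILES are hypothetical names. It writes a .tar.gz because plain gzip compresses only a single file.

```python
import os
import tarfile

# Only /var/adm/messages is confirmed by the post; the other two paths are
# assumed common defaults and may need adjusting.
DEFAULT_FILES = [
    "/var/adm/messages",
    "/var/log/Xorg.0.log",
    "/etc/X11/xorg.conf",
]


def bundle(paths, out_path="bug-report.tar.gz"):
    """Write the existing regular files among `paths` into a gzip-compressed tar.

    Missing paths are skipped. Returns the paths actually included.
    """
    included = []
    with tarfile.open(out_path, "w:gz") as tar:
        for p in paths:
            if os.path.isfile(p):
                # Drop the leading "/" so the archive extracts relative to cwd.
                tar.add(p, arcname=p.lstrip("/"))
                included.append(p)
    return included
```

Calling `bundle(DEFAULT_FILES)` on the affected machine would produce bug-report.tar.gz in the current directory and report which of the three files were found.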

Any insight will be much appreciated.

paulbjr 06-11-09 01:45 PM

Re: Intermittent problems with Quadro NVS 290 cards
 
As an update to my initial post, the following error messages have shown up in the past few days:

Note: while the machine radisys-ha has two Quadro NVS 290 cards (x16 and x1), the machine named arrow2 has a Quadro NVS 440 card.

/var/adm/messages:Jun 10 17:03:16 radisys nvidia: [ID 702911 kern.notice] NVRM: Xid (0001:00): 1, Channel 00000002 Method 00000064 Data 000000b0
/var/adm/messages:Jun 10 17:05:46 radisys nvidia: [ID 702911 kern.notice] NVRM: Xid (0001:00): 1, Channel 00000002 Method 00000064 Data 000000b0

/var/adm/messages.0:Jun 9 09:42:54 radisys-ha nvidia: [ID 702911 kern.notice] NVRM: Xid (0001:00): 1, Channel 00000002 Method 00000064 Data 000000b0

/var/adm/messages.0:Jun 10 10:56:50 arrow2 nvidia: [ID 702911 kern.notice] NVRM: Xid (0004:00): 16, Head 00000001 Count 00004b7e
/var/adm/messages.0:Jun 10 10:56:57 arrow2 nvidia: [ID 702911 kern.notice] NVRM: Xid (0004:00): 16, Head 00000001 Count 00004b7f
/var/adm/messages.0:Jun 10 10:57:05 arrow2 nvidia: [ID 702911 kern.notice] NVRM: Xid (0004:00): 16, Head 00000001 Count 00004b80
/var/adm/messages.0:Jun 10 10:57:13 arrow2 nvidia: [ID 702911 kern.notice] NVRM: Xid (0004:00): 16, Head 00000001 Count 00004b81
/var/adm/messages.0:Jun 10 10:57:21 arrow2 nvidia: [ID 702911 kern.notice] NVRM: Xid (0004:00): 16, Head 00000001 Count 00004b82
/var/adm/messages.0:Jun 10 10:57:28 arrow2 nvidia: [ID 702911 kern.notice] NVRM: Xid (0004:00): 16, Head 00000001 Count 00004b83
/var/adm/messages.0:Jun 10 10:57:36 arrow2 nvidia: [ID 702911 kern.notice] NVRM: Xid (0004:00): 16, Head 00000001 Count 00004b84
/var/adm/messages.0:Jun 10 10:57:43 arrow2 nvidia: [ID 702911 kern.notice] NVRM: Xid (0004:00): 16, Head 00000001 Count 00004b85

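The Xid 16 lines from arrow2 carry a Head number and a hexadecimal Count, so the gap between messages and the change in Count can be checked directly. A minimal Python sketch (xid16_steps is a hypothetical helper) assuming the lines come from one host and are in chronological order:

```python
import re
from datetime import datetime

# Matches "... 10:56:50 arrow2 nvidia: ... NVRM: Xid (0004:00): 16, Head 00000001 Count 00004b7e"
XID16_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2})\s.*?NVRM: Xid \([^)]*\): 16, "
    r"Head ([0-9a-fA-F]+) Count ([0-9a-fA-F]+)"
)


def xid16_steps(lines):
    """Map each display head to a list of (seconds_since_previous, count_delta).

    Only time of day is used; a negative gap is treated as one midnight rollover.
    """
    last = {}   # head -> (time, count) of that head's previous Xid 16 line
    steps = {}
    for line in lines:
        m = XID16_RE.search(line)
        if not m:
            continue
        t = datetime.strptime(m.group(1), "%H:%M:%S")
        head, count = int(m.group(2), 16), int(m.group(3), 16)
        if head in last:
            prev_t, prev_count = last[head]
            gap = int((t - prev_t).total_seconds())
            if gap < 0:
                gap += 24 * 3600
            steps.setdefault(head, []).append((gap, count - prev_count))
        last[head] = (t, count)
    return steps
```

On the eight arrow2 lines above it returns seven steps for head 1, each 7 or 8 seconds apart with Count rising by exactly one.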

These errors seem to occur at random times and only sometimes cause crashes.

paulbjr 06-11-09 01:50 PM

Re: Intermittent problems with Quadro NVS 290 cards
 
Going back a bit in my records, I noticed that over the past month or so we have been getting a good number of seemingly "innocuous" NVRM messages of the form:

May 27 12:30:26 radisys-ha nvidia: [ID 702911 kern.notice] NVRM: Xid (0001:00): 1, Channel 00000002 Method 00000064 Data bfef0007

on our "radisys-ha" machine. This box has two Quadro NVS 290 (X16 & X1) cards.


All times are GMT -5.

Copyright 1998 - 2014, nV News.