nV News Forums

 
 


Biffidus 07-31-07 09:45 PM

System Lockups - 8800 + Xeon System
 
3 Attachment(s)
System: Intel Xeon workstation. 2x Xeon 5160 (dual core) CPUs. Intel S5000XVN motherboard (latest firmware as of July 2007). 8800 GTX video card (100.14.11 drivers).

The system is running CentOS 4.4 i386 (32-bit).

3D applications cause the system to lock up. The system cannot be accessed locally or remotely and does not respond to pings. The crash can be reliably triggered within 10 seconds with the following commands:
Code:

xmms song.ogg &
glxgears &

I have tried two other video cards:
  • 8800GS - same problem
  • 7600GT - no crashes!
I have a Core Duo system running the same OS, drivers and software. This does not crash with any of the video cards (7600GT, 8800GS or 8800GTX).

I have tried the following fixes, with no success:
  • pci=nommconf
  • idle=poll
  • different X configurations: single display, dual-head with Xinerama on/off, TwinView
I have attached logs for some of the configurations described above. Any suggestions are appreciated. Please let me know if there is any other information that might be helpful.

netllama 08-01-07 10:37 AM

Re: System Lockups - 8800 + Xeon System
 
The X configuration that you're attempting to use is not possible with the hardware that you have. You cannot drive two separate X screens along with TwinView with only 1 GPU. Granted, this shouldn't cause instability, but you should correct the configuration to simplify things.

I have a few questions:
0) Does this problem persist with the latest RHEL-4.5 kernel?
1) Can you set up a serial console to capture any kernel messages at the time of the crash?

thanks,
Lonni

Biffidus 08-01-07 07:25 PM

Re: System Lockups - 8800 + Xeon System
 
The X conf file is a bit of a mess. I was using multiple screens, then I used the nvidia-xconfig utility to set it up for TwinView and then just a simple display.

I will try to set up a serial console for error logging.
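For reference, a minimal serial-console sketch, assuming the stock GRUB legacy configuration that CentOS 4 uses (/boot/grub/grub.conf) and that the first serial port (ttyS0) is cabled with a null-modem cable to a second machine. The kernel version and root device are placeholders, not values from this system.

```
# /boot/grub/grub.conf (excerpt): append the two console= options to the
# existing kernel line; <version> and <root-device> are placeholders.
kernel /vmlinuz-<version> ro root=<root-device> console=tty0 console=ttyS0,115200n8
```

With console=ttyS0 listed last, kernel messages also go to the serial port; on the receiving machine, a terminal program such as minicom or screen set to 115200 baud, 8N1, can log them.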

Do you have any other suggestions?

Biffidus 09-09-07 08:37 PM

Re: System Lockups - 8800 + Xeon System
 
Quote:

Originally Posted by netllama
I have a few questions:
0) Does this problem persist with the latest RHEL-4.5 kernel?
1) Can you set up a serial console to capture any kernel messages at the time of the crash?

The serial console captured the same occasional "NVRM: Xid" messages from the nvidia kernel module as I was seeing in the system logs.

Some of my crashes were caused by the card not sitting securely in the PCIe slot. The little plastic clips that some cases use to hold the cards in place are no match for the weight of the 8800GTX cards.

Moving from RHEL 4.4 to RHEL 4.5 seems to have fixed the remaining crashes.

Biffidus 09-24-07 02:59 AM

Re: System Lockups - 8800 + Xeon System
 
I spoke too soon. I have had a couple more crashes since moving to CentOS 4.5. I found the usual NVRM Xid messages in my system logs. What do they mean?

Code:

Sep 24 16:49:03 ridcully kernel: NVRM: Xid (0007:00): 8, Channel 00000003
Sep 24 16:49:11 ridcully kernel: NVRM: Xid (0007:00): 8, Channel 00000003
Sep 24 16:49:11 ridcully kernel: NVRM: Xid (0007:00): 13, 0003 00000000 00005097 000015e0 00000000 00000080
Sep 24 16:49:11 ridcully kernel: NVRM: Xid (0007:00): 13, 0003 00000000 0000502d 00000860 00000000 00000100
Sep 24 16:49:11 ridcully kernel: NVRM: Xid (0007:00): 13, 0003 00000000 0000502d 00000860 00000000 00000100

I'll just point out that both of these crashes have occurred since I plugged the second monitor back in. I ran it with a single screen for a week with no problems, so the problem may be related to running multiple monitors.
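To see which Xid codes recur over time, a small helper can tally them from syslog. This is a hypothetical sketch (not an NVIDIA tool), assuming kernel messages land in /var/log/messages as on a default CentOS 4 install and that every Xid line has the "NVRM: Xid (bus:dev): CODE," shape shown in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: count how often each NVRM Xid code appears in a
# syslog file. The file defaults to /var/log/messages (an assumption).
xid_summary() {
    logfile="${1:-/var/log/messages}"
    # Keep only the numeric code after "NVRM: Xid (bus:dev): ",
    # sort numerically, count duplicates, then print one line per code.
    grep 'NVRM: Xid' "$logfile" \
        | sed -n 's/.*NVRM: Xid ([^)]*): \([0-9][0-9]*\),.*/\1/p' \
        | sort -n \
        | uniq -c \
        | awk '{ printf "Xid %s: %s occurrence(s)\n", $2, $1 }'
}
```

Run against the five lines quoted above, `xid_summary` reports Xid 8 twice and Xid 13 three times, which makes it easier to line the codes up against crash times.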

Biffidus 11-20-07 08:04 PM

Re: System Lockups - 8800 + Xeon System
 
1 Attachment(s)
The system is now running with a Quadro 5600 card. Crash frequency has decreased but it is still crashing. I am running out of things to try here. What can I try next?

The crashes seem to happen more gradually now: they used to be an instant hard-lock but now I am seeing the system gradually become non-responsive over 10-20 seconds before everything locks up. X, console and network connections are all unresponsive. The reset button always works.

The latest crash generated the following syslog entry:
Code:

Nov 16 14:08:22 ridcully kernel: NVRM: Xid (0007:00): 6, PE0005
Nov 16 14:08:22 ridcully kernel: NVRM: Xid (0007:00): 30,  L1 -> L0
Nov 16 14:08:48 ridcully kernel: NVRM: Xid (0007:00): 8, Channel ffffffff
Nov 16 14:08:48 ridcully kernel: NVRM: Xid (0007:00): 30,  L0 -> L0


Biffidus 11-26-07 06:28 PM

Re: System Lockups - 8800 + Xeon System
 
Quote:

Originally Posted by netllama
The X configuration that you're attempting to use is not possible with the hardware that you have. You cannot drive two separate X screens along with TwinView with only 1 GPU. Granted, this shouldn't cause instability, but you should correct the configuration to simplify things.

I have a few questions:
0) Does this problem persist with the latest RHEL-4.5 kernel?
1) Can you set up a serial console to capture any kernel messages at the time of the crash?

Lonni

I'm still waiting for a response from you on this!
  • I have disabled TwinView and am running multiple screens with Xinerama
  • The problem does indeed persist with the latest CentOS-4.5 kernel - and it still occurs now that I've replaced the 8800 card with a Quadro 5600.
  • I set up a serial console but it only gave me the same information that was available in the system logs and was unable to communicate with the system once X had locked up.
Please let me know what else I can do to help diagnose this problem.
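For reference, a minimal xorg.conf sketch of the "multiple screens with Xinerama" layout on a single nvidia GPU: two Device sections that point at the same card, one per X screen, joined by Xinerama. All identifiers are hypothetical, and the BusID value is a placeholder to be filled in from lspci; monitor sections are omitted.

```
# Both Device sections refer to the same GPU (same BusID); "Screen 0/1"
# selects which head each X screen uses. <bus> is a placeholder.
Section "Device"
    Identifier "Card0-Screen0"
    Driver     "nvidia"
    BusID      "PCI:<bus>:0:0"
    Screen     0
EndSection

Section "Device"
    Identifier "Card0-Screen1"
    Driver     "nvidia"
    BusID      "PCI:<bus>:0:0"
    Screen     1
EndSection

Section "Screen"
    Identifier "Screen0"
    Device     "Card0-Screen0"
EndSection

Section "Screen"
    Identifier "Screen1"
    Device     "Card0-Screen1"
EndSection

# Xinerama merges the two X screens into one logical desktop.
Section "ServerLayout"
    Identifier "Layout0"
    Screen     0 "Screen0" 0 0
    Screen     1 "Screen1" RightOf "Screen0"
    Option     "Xinerama" "on"
EndSection
```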


All times are GMT -5.
