07-31-07, 09:45 PM   #1
Biffidus
Registered User

Join Date: Nov 2005
Posts: 7
System Lockups - 8800 + Xeon System

System: Intel Xeon workstation. 2x Xeon 5160 (dual core) CPUs. Intel S5000XVN motherboard (latest firmware as of July 2007). 8800 GTX video card (100.14.11 drivers).

The system is running CentOS 4.4 i386 (32-bit).

3D applications cause the system to lock up. The system cannot be accessed locally or remotely and does not respond to pings. The crash can be reliably triggered within 10 seconds with the following commands:
Code:
xmms song.ogg &
glxgears &
I have tried two other video cards:
  • 8800GS - same problem
  • 7600GT - no crashes!
I have a Core Duo system running the same OS, drivers and software. This does not crash with any of the video cards (7600GT, 8800GS or 8800GTX).

I have tried the following fixes, with no success:
  • pci=nommconf
  • idle=poll
  • different X configurations: single display, dual-head with Xinerama on/off, TwinView
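
For anyone repeating these tests, it can be worth confirming that boot parameters such as pci=nommconf and idle=poll actually reached the running kernel before ruling them out. The following is a minimal sketch, not something from the original post: it assumes Python 3 and a Linux /proc filesystem, and the function names missing_boot_params and read_cmdline are hypothetical helpers made up for illustration.

```python
# Sketch only: report which requested boot parameters are absent from a
# kernel command line. Assumes Python 3; the helper names are hypothetical.

def missing_boot_params(cmdline, wanted):
    """Return the entries of `wanted` that are not whole tokens in `cmdline`."""
    present = set(cmdline.split())
    return [param for param in wanted if param not in present]


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with (Linux only)."""
    with open(path) as handle:
        return handle.read().strip()
```

Calling missing_boot_params(read_cmdline(), ["pci=nommconf", "idle=poll"]) on the affected machine would return an empty list if both options are active. If the interpreter on the machine is too old for this, the same check can be done by pasting the contents of /proc/cmdline into the first argument on another system.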
I have attached logs for some of the configurations described above. Any suggestions are appreciated. Please let me know if there is any other information that might be helpful.
Attached Files
File Type: log nvidia-bug-report-CoreDuo+8800GS.log (113.2 KB, 114 views)
File Type: log nvidia-bug-report-Xeon+7600GT.log (139.8 KB, 91 views)
File Type: log nvidia-bug-report-Xeon+8800GTX.log (140.1 KB, 113 views)
08-01-07, 10:37 AM   #2
netllama
NVIDIA Corporation

Join Date: Dec 2004
Posts: 8,763
Re: System Lockups - 8800 + Xeon System

The X configuration that you're attempting to use is not possible with the hardware that you have. You cannot drive two separate X screens along with TwinView with only one GPU. Granted, this shouldn't cause instability, but you should correct the configuration to simplify things.

I have a few questions:
0) Does this problem persist with the latest RHEL-4.5 kernel?
1) Can you set up a serial console to capture any kernel messages at the time of the crash?

thanks,
Lonni
08-01-07, 07:25 PM   #3
Biffidus
Registered User

Join Date: Nov 2005
Posts: 7
Re: System Lockups - 8800 + Xeon System

The X conf file is a bit of a mess. I was using multiple screens, then I used the nvidia-xconfig utility to set it up for TwinView and then just a simple display.

I will try to set up a serial console for error logging.

Do you have any other suggestions?
09-09-07, 08:37 PM   #4
Biffidus
Registered User

Join Date: Nov 2005
Posts: 7
Re: System Lockups - 8800 + Xeon System

Quote:
Originally Posted by netllama
I have a few questions:
0) Does this problem persist with the latest RHEL-4.5 kernel?
1) Can you set up a serial console to capture any kernel messages at the time of the crash?
The serial console captured the same occasional "NVRM: Xid" messages from the nvidia kernel module that I was seeing in the system logs.

Some of my crashes were caused by the card not sitting securely in the PCIe slot. The little plastic clips that some cases use to hold the cards in place are no match for the weight of the 8800GTX cards.

Moving from CentOS 4.4 to CentOS 4.5 seems to have fixed the remaining crashes.
09-24-07, 02:59 AM   #5
Biffidus
Registered User

Join Date: Nov 2005
Posts: 7
Re: System Lockups - 8800 + Xeon System

I spoke too soon. I have had a couple more crashes since moving to CentOS 4.5. I found the usual "NVRM: Xid" messages in my system logs. What do they mean?

Code:
Sep 24 16:49:03 ridcully kernel: NVRM: Xid (0007:00): 8, Channel 00000003
Sep 24 16:49:11 ridcully kernel: NVRM: Xid (0007:00): 8, Channel 00000003
Sep 24 16:49:11 ridcully kernel: NVRM: Xid (0007:00): 13, 0003 00000000 00005097 000015e0 00000000 00000080
Sep 24 16:49:11 ridcully kernel: NVRM: Xid (0007:00): 13, 0003 00000000 0000502d 00000860 00000000 00000100
Sep 24 16:49:11 ridcully kernel: NVRM: Xid (0007:00): 13, 0003 00000000 0000502d 00000860 00000000 00000100
Both of these crashes happened after I plugged the second monitor back in. I ran the system with a single screen for a week with no problems, so the crashes may be related to running multiple monitors.
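
When comparing configurations (one monitor versus two, or one card versus another), it may help to tally which Xid numbers show up in the logs for each run. The following is a rough sketch rather than an official tool: it assumes Python 3, the line pattern is inferred only from the excerpts in this thread, and parse_xid and count_xids are hypothetical names. It does not interpret what any Xid number means.

```python
import re
from collections import Counter

# Matches lines such as:
#   "... kernel: NVRM: Xid (0007:00): 13, 0003 00000000 ..."
# The pattern is an assumption based on the log excerpts quoted in this thread.
XID_PATTERN = re.compile(r"NVRM: Xid \(([0-9a-fA-F:.]+)\): (\d+),\s*(.*)")


def parse_xid(line):
    """Return (device, xid_number, detail) for an NVRM Xid line, else None."""
    match = XID_PATTERN.search(line)
    if match is None:
        return None
    device, number, detail = match.groups()
    return device, int(number), detail.strip()


def count_xids(lines):
    """Tally how often each Xid number appears in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        parsed = parse_xid(line)
        if parsed is not None:
            counts[parsed[1]] += 1
    return counts
```

Passing an open log file to count_xids (for example a copy of the system log taken from the affected machine) gives a per-number count that can be compared between runs, which makes it easier to see whether a configuration change alters which errors precede a lockup.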
11-20-07, 08:04 PM   #6
Biffidus
Registered User

Join Date: Nov 2005
Posts: 7
Re: System Lockups - 8800 + Xeon System

The system is now running with a Quadro 5600 card. Crash frequency has decreased but it is still crashing. I am running out of things to try here. What can I try next?

The crashes seem to happen more gradually now: they used to be an instant hard-lock but now I am seeing the system gradually become non-responsive over 10-20 seconds before everything locks up. X, console and network connections are all unresponsive. The reset button always works.

The latest crash generated the following syslog entry:
Code:
Nov 16 14:08:22 ridcully kernel: NVRM: Xid (0007:00): 6, PE0005 
Nov 16 14:08:22 ridcully kernel: NVRM: Xid (0007:00): 30,  L1 -> L0
Nov 16 14:08:48 ridcully kernel: NVRM: Xid (0007:00): 8, Channel ffffffff
Nov 16 14:08:48 ridcully kernel: NVRM: Xid (0007:00): 30,  L0 -> L0
Attached Files
File Type: log nvidia-bug-report.log (140.8 KB, 97 views)
11-26-07, 06:28 PM   #7
Biffidus
Registered User

Join Date: Nov 2005
Posts: 7
Re: System Lockups - 8800 + Xeon System

Quote:
Originally Posted by netllama
The X configuration that you're attempting to use is not possible with the hardware that you have. You cannot drive two separate X screens along with TwinView with only one GPU. Granted, this shouldn't cause instability, but you should correct the configuration to simplify things.

I have a few questions:
0) Does this problem persist with the latest RHEL-4.5 kernel?
1) Can you set up a serial console to capture any kernel messages at the time of the crash?
Lonni,

I'm still waiting for a response from you on this!
  • I have disabled TwinView and am running multiple screens with Xinerama
  • The problem does indeed persist with the latest CentOS-4.5 kernel - and it still occurs now that I've replaced the 8800 card with a Quadro 5600.
  • I set up a serial console but it only gave me the same information that was available in the system logs and was unable to communicate with the system once X had locked up.
Please let me know what else I can do to help diagnose this problem.