11-08-07, 10:27 AM   #14
Re: Mysterious Message

If you are seeing severe stability problems and you are using a Linux 2.6 SMP kernel on a system with multiple processors (or processor cores) in combination with more than one GPU, please search the output of `dmesg` for the presence of the message below after the system has just been started:

If this message is present, please boot the system with the pci=nommconf kernel parameter and check if the stability problems continue to reproduce.
I only have one GPU installed, so this probably does not apply. However, my dmesg says:
[ 0.069897] PCI: Not using MMCONFIG.
so I'm OK either way.

If your system is equipped with a dual-core processor, booting with the idle=poll and/or maxcpus=1 kernel parameters may improve reliability with some Linux kernels.
dmesg says:
[ 21.976468] using mwait in idle threads.
and I have two cores, so I can try both of these.
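For anyone repeating these checks on several machines, here is a rough sketch of how the two `dmesg` lookups above could be scripted. The names `scan_dmesg` and `read_dmesg` are made up for illustration, and the search strings are just the substrings from the lines quoted in this post (not the exact message the troubleshooting text refers to), so adjust them to whatever you are actually looking for.

```python
import re
import subprocess

# Substrings taken from the dmesg lines quoted in this post.
# They are assumptions about what is worth matching, not official message text.
PATTERNS = {
    "mmconfig": re.compile(r"MMCONFIG"),
    "mwait": re.compile(r"mwait"),
}


def scan_dmesg(text):
    """Return {label: [matching lines]} for each pattern in PATTERNS."""
    hits = {label: [] for label in PATTERNS}
    for line in text.splitlines():
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[label].append(line.strip())
    return hits


def read_dmesg():
    """Run dmesg and return its output (may require root on some systems)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True)
    return result.stdout
```

Calling `scan_dmesg(read_dmesg())` right after boot should list any lines mentioning MMCONFIG or mwait. An empty list for a label only means nothing matched that substring.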

However, idle=poll could make the system run very hot, and losing one of my cores would significantly hurt the performance of our application. So I'll try both to see whether they have any effect, but they certainly cannot be considered a solution (or even a work-around).

These tests are now running on two systems. I'll get back to you when I have some results.
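Since the tests run on more than one machine, it may help to confirm which of the suggested parameters each kernel was actually booted with. Below is a minimal sketch that parses a kernel command line string such as the contents of `/proc/cmdline`. The function names are hypothetical, and the parsing is deliberately simple: it splits on whitespace and does not handle quoted values.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {key: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def active_workarounds(cmdline):
    """Report which of the three parameters discussed above are present."""
    params = parse_cmdline(cmdline)
    # pci= accepts a comma-separated option list, so check membership.
    pci_options = (params.get("pci") or "").split(",")
    return {
        "pci=nommconf": "nommconf" in pci_options,
        "idle=poll": params.get("idle") == "poll",
        "maxcpus=1": params.get("maxcpus") == "1",
    }
```

On a running Linux system, `active_workarounds(open("/proc/cmdline").read())` reports which parameters the current boot used.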

If you are using an AGP graphics card, please test setting NvAGP to 0 in xorg.conf. If this eliminates the instability, then you are experiencing a problem outside of the NVIDIA X driver, either in the motherboard BIOS, kernel, kernel AGP driver, or possibly in the motherboard itself.
Nope, PCI-E.

If you are using a Linux/x86-64 2.6 kernel and see the warning message below

If you are using the 1.0-7676 NVIDIA Linux/x86-64 graphics driver release and a Linux/x86-64 2.6 kernel < Linux 2.6.11,

If you are using a Linux/x86-64 kernel >= Linux 2.6.11 and < Linux 2.6.14,
Nope, 2.6.20.

If you see warning messages similar to those below in the system log file(s) when starting the X server or OpenGL applications,

For any problem that involves instability, you should always verify that you are using the most recently released BIOS for the motherboard.
Intel says the latest is:
BIOS Update 1709 [MQ96510J.86A] (1094KB)
1709 10/11/2007
Which is exactly what I'm running.

To make sure this log file includes as much relevant information as possible, please start the X server with `startx -- -logverbose 6` and run `` after the problem has occurred.
The bug report I posted before was captured before the problem occurred and I didn't have the logverbose 6 option turned on. There is no way for me to get the report after the problem occurs, since the machine is completely and utterly wedged. Would turning on that X option and rerunning the pre-death report be of any use?
jesmith