nV News Forums

 
 

Forum: NVIDIA Linux (http://www.nvnews.net/vbulletin/forumdisplay.php?f=14)
Thread: 195.36.15 + kernel 2.6.32 + dual cards crash (Ubuntu 10.04 Lucid Lynx) (http://www.nvnews.net/vbulletin/showthread.php?t=150212)

vincefn 04-20-10 04:19 PM

195.36.15 + kernel 2.6.32 + dual cards crash (Ubuntu 10.04 Lucid Lynx)
 
Hi,

A few days ago I updated my PCs to try (k)ubuntu Lucid Lynx (10.04) beta2. The installation (clean, on a reformatted partition) went fine until I installed the nvidia 195.36.15 driver (using the nvidia-current Ubuntu package). After rebooting, the system crashes completely at about the point where X should start: it becomes totally unresponsive, and nothing shows up in the logs.

The PC has two nVidia cards: a 9600GT with two screens attached, and a GTX295 used only for CUDA computing. After searching a bit, I found that removing one of the cards let the system boot without any problem. The kernel is 2.6.32-21-{generic|server}, on the amd64 architecture.

Following a previous thread (http://www.nvnews.net/vbulletin/show....php?p=2220574), I tried recompiling the 2.6.32-21 kernel with the VGA arbiter disabled, but simply setting CONFIG_VGA_ARB=n does not work: the option is re-enabled automagically during compilation, apparently due to some kernel dependency (?). In fact, it does not even appear in make {x|menu}config.

Finally I found a slightly older Ubuntu kernel (2.6.31-21), which works perfectly fine with both cards and the nvidia-current/195.36.15 nVidia kernel module; CUDA (3.0) is back to normal as well.

So it seems some change between 2.6.31 and 2.6.32 has a very bad side effect on systems with two nVidia cards. Here is the relevant bug on Launchpad:
https://bugs.launchpad.net/ubuntu/+s...rs/+bug/548362

Here are other threads that seem to be relevant:
http://www.nvnews.net/vbulletin/show....php?p=2220574
http://www.nvnews.net/vbulletin/showthread.php?t=149072
http://www.nvnews.net/vbulletin/show....php?p=2199119

I am not at the problematic PC at the moment; I can post more logs tomorrow.

vincefn 04-28-10 04:31 AM

195.36.15(or 24) + kernel 2.6.32 + two nVidia cards = crash
 
2 Attachment(s)
Hi,

This bug, which occurs when using two nVidia cards with kernel 2.6.32 and driver 195.36.15 (and now 195.36.24 as well), has been confirmed by several people. See the updated bug report on Launchpad:

https://bugs.launchpad.net/ubuntu/+s...rs/+bug/548362

This morning I also tried 195.36.24 (from the x-swat PPA), with the same result as with 195.36.15: a hard crash around the time X should start, with the computer completely unresponsive (for more than 2 minutes). Only a reset lets it reboot.
With kernel 2.6.31-21, it works fine.

I have attached the two bug reports. For each one, I first booted the 2.6.32(-21-server) kernel, which crashed, and then booted the 2.6.31 kernel, which worked fine and let me generate the bug report.
The first bug report is when using 195.36.15, the second was made immediately after installing driver 195.36.24.

Note that in the launchpad bug report, all people seem to be using the amd64 architecture so far.

danix 04-28-10 01:06 PM

Re: 195.36.15 + kernel 2.6.32 + dual cards crash (Ubuntu 10.04 Lucid Lynx)
 
Do you still get a crash with boot option intel_iommu=off?

vincefn 04-29-10 03:45 AM

Re: 195.36.15 + kernel 2.6.32 + dual cards crash (Ubuntu 10.04 Lucid Lynx)
 
Quote:

Originally Posted by danix (Post 2241671)
Do you still get a crash with boot option intel_iommu=off?

Yes, same crash with intel_iommu=off.
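When testing a boot option such as intel_iommu=off, it helps to confirm that the running kernel actually received it. A minimal Python sketch, assuming the standard Linux /proc/cmdline interface (space-separated parameters); the helper names parse_cmdline and iommu_disabled are hypothetical and only illustrate the check:

```python
from pathlib import Path
from typing import Dict, Optional


def parse_cmdline(text: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}; bare flags map to None.

    Simplification: quoted values containing spaces are not handled.
    If a parameter appears more than once, the last occurrence wins.
    """
    params: Dict[str, Optional[str]] = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def iommu_disabled(text: str) -> bool:
    """True if the command line contains exactly intel_iommu=off."""
    return parse_cmdline(text).get("intel_iommu") == "off"


if __name__ == "__main__":
    cmdline = Path("/proc/cmdline")
    if cmdline.exists():
        print("intel_iommu=off active:", iommu_disabled(cmdline.read_text()))
    else:
        print("/proc/cmdline not available (not running on Linux?)")
```

The check only compares the literal value "off"; combined values such as "on,igfx_off" are reported as not disabled.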

Vadim 05-01-10 06:39 AM

Re: 195.36.15 + kernel 2.6.32 + dual cards crash (Ubuntu 10.04 Lucid Lynx)
 
I had exactly the same problem, but it was resolved by turning off the CONFIG_VGA_ARB option.
This option can be found under "Device Drivers" > "Graphics support", but you first need to enable the "Configure standard kernel features" option in "General setup" to make it visible.
However, after switching off VGA arbiter VESA framebuffer doesn't work anymore. :(
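To confirm whether a given kernel build actually ended up with the arbiter disabled, one can inspect its configuration file. A minimal Python sketch, assuming the standard .config format written by the kernel's Kconfig tools (`CONFIG_FOO=y` for enabled options, `# CONFIG_FOO is not set` for disabled ones) and that, as on Ubuntu, the running kernel's config is installed as /boot/config-<release>; read_kconfig and vga_arb_enabled are hypothetical helpers, not part of any existing tool:

```python
import platform
import re
from pathlib import Path
from typing import Dict

# Enabled/valued options look like "CONFIG_VGA_ARB=y";
# disabled ones are written as the comment "# CONFIG_VGA_ARB is not set".
_SET = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def read_kconfig(text: str) -> Dict[str, str]:
    """Parse kernel .config text into {option: value}; disabled options map to 'n'."""
    options: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def vga_arb_enabled(config_text: str) -> bool:
    """True if CONFIG_VGA_ARB=y; an absent or 'n' entry counts as disabled."""
    return read_kconfig(config_text).get("CONFIG_VGA_ARB", "n") == "y"


if __name__ == "__main__":
    # Assumption: Ubuntu-style location of the running kernel's config.
    path = Path("/boot") / f"config-{platform.release()}"
    if path.exists():
        print(path, "-> CONFIG_VGA_ARB enabled:", vga_arb_enabled(path.read_text()))
    else:
        print("no kernel config found at", path)
```

Pointing vga_arb_enabled at the `.config` in the kernel build tree after the build is one way to see whether the setting was silently re-enabled, as reported earlier in the thread.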

vincefn 05-01-10 02:10 PM

Re: 195.36.15 + kernel 2.6.32 + dual cards crash (Ubuntu 10.04 Lucid Lynx)
 
Quote:

Originally Posted by Vadim (Post 2243543)
I had exactly the same problem, but it was resolved by turning off the CONFIG_VGA_ARB option.
This option can be found under "Device Drivers" > "Graphics support", but you first need to enable the "Configure standard kernel features" option in "General setup" to make it visible.
However, after switching off VGA arbiter VESA framebuffer doesn't work anymore. :(

Ah, so this is how you manage to disable that option during the build; when I tried, the option was always re-enabled. But for now I'd rather use the 2.6.31 kernel and keep a working console, and hope the nVidia developers can reproduce and fix this for the next driver release.

btw, in launchpad [https://bugs.launchpad.net/bugs/548362] there are more new reports with the same issue - so far all with the amd64 arch.

AlexLG 05-01-10 08:47 PM

Re: 195.36.15 + kernel 2.6.32 + dual cards crash (Ubuntu 10.04 Lucid Lynx)
 
I have the same problem. I tried plugging in a PCI card alongside the two nVidia PCI-E cards and set the BIOS to boot from the PCI card, and then everything works fine. So I can't manage to capture a kernel panic from the driver, since it doesn't freeze :(

vincefn 05-02-10 05:23 AM

Re: 195.36.15 + kernel 2.6.32 + dual cards crash (Ubuntu 10.04 Lucid Lynx)
 
One further note: the crash apparently occurs before Xorg even starts logging. If you open the nvidia-bug-report attached to a previous message, you'll see that in both Xorg.0.log and Xorg.0.log.old the kernel is 2.6.31.
What I did was:
1) Start from working 2.6.31
2) reboot on 2.6.32 => crash
3) wait 2 minutes
4) hard reboot to 2.6.31
5) nvidia-bug-report.sh

In other words, during the 2.6.32 boot X never got as far as writing a log: the logs correspond to steps (1) and (4), and nothing was written during (2).

danix 05-03-10 09:52 PM

Re: 195.36.15 + kernel 2.6.32 + dual cards crash (Ubuntu 10.04 Lucid Lynx)
 
1 Attachment(s)
Thanks for all the reports. We have tracked this down to a problem with the kernel's VGA arbiter trying to move VGA ownership to a GPU that hasn't POSTed. We're working with the developers of the VGA arbiter to get this fixed in the kernel.

In the meantime, we have a patch which works around the issue by preventing the VGA arbiter from moving VGA ownership away from the default device. It is attached as "NVIDIA_kernel-195.36.24-682377.diff.txt".

You can apply the patch by downloading the latest installer from http://www.nvidia.com/object/linux-d...195.36.24.html (32-bit) or http://www.nvidia.com/object/linux-d...195.36.24.html (64-bit) and running the installer with the "--apply-patch /path/to/patch.diff" option. This will create a patched installer with a name ending in "-custom.run", which will install a driver with this workaround.

Vadim 05-04-10 05:35 AM

Re: 195.36.15 + kernel 2.6.32 + dual cards crash (Ubuntu 10.04 Lucid Lynx)
 
Thx for the patch.
It has the same effect as turning off the VGA arbiter: the system no longer crashes when loading X, but it's still impossible to use the framebuffer.

vincefn 05-05-10 09:45 AM

Re: 195.36.15 + kernel 2.6.32 + dual cards crash (Ubuntu 10.04 Lucid Lynx)
 
Thanks for the quick fix of this issue.

I have tested the updated ubuntu package including your patch (in -proposed for Lucid) and it works fine with kernel 2.6.32.

Cheers,

Arup 05-05-10 11:37 AM

Re: 195.36.15 + kernel 2.6.32 + dual cards crash (Ubuntu 10.04 Lucid Lynx)
 
This update has also been applied to the Ubuntu xswat PPA; I'm quite surprised they acted this quickly. :)


All times are GMT -5.
Copyright 1998 - 2014, nV News.