#1
Registered User
Join Date: Apr 2010
Posts: 6
Hi,

A few days ago I updated my PCs to try (K)ubuntu Lucid Lynx (10.04) beta2. The installation (clean, on a reformatted partition) went fine until I installed the nvidia 195.36.15 driver (using the nvidia-current Ubuntu package). After a reboot, the system crashes completely around the time X should start: it becomes totally unresponsive, and I cannot see anything in the logs.

The PC has two nVidia cards, one 9600GT with two screens attached, and one GTX295 used for CUDA computing only. After searching a bit, I found that removing one card allows the system to boot without any problem. The kernel used is 2.6.32-21-{generic|server}, on the amd64 architecture.

Following a previous thread (http://www.nvnews.net/vbulletin/show....php?p=2220574), I tried recompiling the 2.6.32-21 kernel with the VGA Arbiter disabled, but simply setting CONFIG_VGA_ARB=n does not work: it gets re-enabled automatically during compilation, apparently due to some kernel dependency (?), and the option is not even visible in make {x|menu}config.

Finally I found a slightly older (2.6.31-21) Ubuntu kernel, which works perfectly fine with both cards and the nvidia-current/195.36.15 driver. CUDA (3.0) is back to normal as well. So it seems some change between 2.6.31 and 2.6.32 has a very bad side effect with dual nVidia cards.

Here is the relevant bug in launchpad: https://bugs.launchpad.net/ubuntu/+s...rs/+bug/548362

Here are other threads that seem to be relevant:
http://www.nvnews.net/vbulletin/show....php?p=2220574
http://www.nvnews.net/vbulletin/showthread.php?t=149072
http://www.nvnews.net/vbulletin/show....php?p=2199119

I am not at the problematic PC at the moment; I can post more logs tomorrow.

#2
Registered User
Join Date: Apr 2010
Posts: 6
Hi,

This bug, which occurs when using two nvidia cards with kernel 2.6.32 and driver 195.36.15 (now 195.36.24 as well), has now been confirmed by several people. See the updated bug report on launchpad: https://bugs.launchpad.net/ubuntu/+s...rs/+bug/548362

This morning I also tried 195.36.24 (from the x-swat PPA archive), with the same result as with 195.36.15: a hard crash around the time X should start. The computer stays completely unresponsive (for more than 2 minutes), and only a hard reset gets it to reboot. With kernel 2.6.31-21, it works fine.

I have attached the two bug reports. For both, I first booted the 2.6.32(-21-server) kernel, which led to a crash, and then the 2.6.31 kernel, which booted fine and allowed me to generate the bug report. The first bug report was made with 195.36.15, the second immediately after installing driver 195.36.24.

Note that in the launchpad bug report, everyone affected so far seems to be using the amd64 architecture.

#3
NVIDIA Corporation
Join Date: Feb 2010
Location: Santa Clara, CA
Posts: 237
Do you still get a crash with boot option intel_iommu=off?
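To confirm that the option actually reached the kernel after editing the boot entry, the running kernel's command line can be inspected (on Linux it is exposed in /proc/cmdline). Below is a minimal Python sketch of that check. The helper name `has_boot_option` and the sample command line are made up for illustration and are not taken from anyone's machine in this thread.

```python
def has_boot_option(cmdline, option):
    """Return True if `option` appears as a whole word on the kernel command line."""
    # Kernel parameters are whitespace-separated, so compare whole tokens
    # rather than substrings (avoids matching e.g. "intel_iommu=offfoo").
    return option in cmdline.split()


# Hypothetical command line, for illustration only.
sample_cmdline = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet intel_iommu=off"
print(has_boot_option(sample_cmdline, "intel_iommu=off"))
```

On a real system, the text to pass in would be the contents of /proc/cmdline read after booting with the modified entry.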

#4
Registered User
Join Date: Apr 2010
Posts: 6

#5
Registered User
Join Date: May 2003
Location: Moscow, Russia
Posts: 10
I had exactly the same problem, but it was resolved by turning off the CONFIG_VGA_ARB option.

This option can be found under "Device Drivers" > "Graphics support". But it's necessary to enable the "Configure standard kernel features" option in "General setup" first to make this option visible. However, after switching off the VGA arbiter, the VESA framebuffer doesn't work anymore.
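After rebuilding, it may be worth verifying that the arbiter really ended up disabled, since the first post reports the option being silently re-enabled. Here is a minimal Python sketch that checks a kernel config given as text. The helper name `config_option_state` and the sample config are my own illustration; on Ubuntu the real file is usually /boot/config-<kernel version>, but that path is an assumption and not something stated in this thread.

```python
import re


def config_option_state(config_text, option):
    """Return the value of a kernel config option ('y', 'm', ...),
    or None if it is marked "is not set" or does not appear at all."""
    set_re = re.compile(r"^%s=(.*)$" % re.escape(option))
    unset_line = "# %s is not set" % option
    for line in config_text.splitlines():
        line = line.strip()
        if line == unset_line:
            return None
        match = set_re.match(line)
        if match:
            return match.group(1)
    return None


# Made-up config fragment, for illustration only.
sample_config = """\
CONFIG_PCI=y
# CONFIG_VGA_ARB is not set
"""

print(config_option_state(sample_config, "CONFIG_VGA_ARB"))
```

A result of `None` for CONFIG_VGA_ARB means the arbiter is not built in; `'y'` means the option was turned back on during configuration.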

#6
Registered User
Join Date: Apr 2010
Posts: 6
Quote:
By the way, on launchpad [https://bugs.launchpad.net/bugs/548362] there are more new reports of the same issue, so far all on the amd64 arch.

#7
Registered User
Join Date: Feb 2010
Posts: 9
I have the same problem. I tried plugging in a PCI card alongside the two nvidia PCI-E cards and set the BIOS to boot from the PCI card, and everything works fine. So I can't manage to dump a kernel panic from the driver, as the system doesn't freeze in that setup.

#8
Registered User
Join Date: Apr 2010
Posts: 6
One further note: apparently the crash occurs before Xorg even starts logging. If you open the nvidia-bug-report attached to a previous message, you'll see that in both Xorg.0.log and Xorg.0.log.old, the kernel is 2.6.31.

What I did was:
1) Start from a working 2.6.31
2) Reboot on 2.6.32 => crash
3) Wait 2 minutes
4) Hard reboot to 2.6.31
5) Run nvidia-bug-report.sh

So in other words, during the 2.6.32 boot X did not manage to start logging. The logs correspond to steps (1) and (4), and nothing was written during (2)...
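For anyone repeating this check on their own logs, here is a minimal Python sketch that pulls the kernel release out of an Xorg log. It assumes the log contains a "Current Operating System:" line whose fields follow uname order (system name, host name, kernel release, ...). The helper name `xorg_log_kernel` and the sample line (host name and build string included) are made up for illustration and are not copied from the attached reports.

```python
def xorg_log_kernel(log_text):
    """Return the kernel release recorded in an Xorg log, or None if absent."""
    prefix = "Current Operating System:"
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            # Remaining fields are assumed to be: sysname, nodename, release, ...
            fields = line[len(prefix):].split()
            if len(fields) >= 3:
                return fields[2]
    return None


# Made-up log excerpt, for illustration only.
sample_log = """\
X.Org X Server
Current Operating System: Linux examplehost 2.6.31-21-generic #1 SMP x86_64
"""

print(xorg_log_kernel(sample_log))
```

If both Xorg.0.log and Xorg.0.log.old report the older kernel, X never got far enough to write a log during the crashing boot.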

#9
NVIDIA Corporation
Join Date: Feb 2010
Location: Santa Clara, CA
Posts: 237
Thanks for all the reports. We have tracked this down to a problem with the kernel's VGA arbiter trying to move VGA ownership to a GPU that hasn't POSTed. We're working with the developers of the VGA arbiter to get this fixed in the kernel.

In the meantime, we have a patch which works around the issue by preventing the VGA arbiter from moving VGA ownership away from the default device. It is attached as "NVIDIA_kernel-195.36.24-682377.diff.txt".

You can apply the patch by downloading the latest installer from http://www.nvidia.com/object/linux-d...195.36.24.html (32-bit) or http://www.nvidia.com/object/linux-d...195.36.24.html (64-bit) and running the installer with the "--apply-patch /path/to/patch.diff" option. This will create a patched installer with a name ending in "-custom.run", which will install a driver with this workaround.
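As a rough illustration of the invocation described above, here is a small Python sketch that assembles the command line without running it. Only the "--apply-patch" option and the patch file name come from this post. The installer file name below is a placeholder, the helper name `apply_patch_command` is hypothetical, and running the .run file through `sh` is an assumption about how the installer is normally started.

```python
import shlex


def apply_patch_command(installer_path, patch_path):
    """Build the argument list for creating a patched installer."""
    # Assumption: the .run installer is a shell archive started via "sh".
    return ["sh", installer_path, "--apply-patch", patch_path]


# Placeholder installer name; substitute the file actually downloaded.
cmd = apply_patch_command(
    "./NVIDIA-installer.run",
    "./NVIDIA_kernel-195.36.24-682377.diff.txt",
)

# Print a copy-pasteable form of the command instead of executing it.
print(" ".join(shlex.quote(part) for part in cmd))
```

The printed command can then be run by hand from the directory holding both files, after which the "-custom.run" installer mentioned above should appear alongside them.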

#10
Registered User
Join Date: May 2003
Location: Moscow, Russia
Posts: 10
Thanks for the patch.

It has the same effect as turning off the VGA arbiter: the system doesn't crash anymore when loading X, but it's still impossible to use the framebuffer.

#11
Registered User
Join Date: Apr 2010
Posts: 6
Thanks for the quick fix of this issue.

I have tested the updated Ubuntu package that includes your patch (in -proposed for Lucid), and it works fine with kernel 2.6.32.

Cheers,

#12
Registered User
Join Date: May 2009
Posts: 122
This update has also been applied to the Ubuntu x-swat PPA. I'm quite surprised that they acted this quickly.