04-20-10, 03:19 PM   #1
vincefn
Registered User
Join Date: Apr 2010
Posts: 6
195.36.15 + kernel 2.6.32 + dual cards crash (Ubuntu 10.04 Lucid Lynx)

Hi,

A few days ago I updated my PCs to try (K)ubuntu Lucid Lynx (10.04) beta2. The installation (clean, on a reformatted partition) went fine until I installed the nvidia 195.36.15 driver (using the nvidia-current Ubuntu package). After rebooting, the system crashes completely around the time X should start: it becomes totally unresponsive, and I cannot find anything in the logs.

The PC has two nVidia cards: a 9600GT with two screens attached, and a GTX295 used for CUDA computing only. After some searching, I found that removing one card allows the system to boot without any problem. The kernel used is 2.6.32-21-{generic|server}, on the amd64 architecture.

Following a previous thread (http://www.nvnews.net/vbulletin/show....php?p=2220574), I tried recompiling the 2.6.32-21 kernel with the VGA arbiter disabled, but simply setting CONFIG_VGA_ARB=n does not work: the option gets re-enabled automatically during compilation, apparently due to some kernel dependency (?). In fact, it is not even visible in make {x|menu}config.

Finally I found a slightly older Ubuntu kernel (2.6.31-21), which works perfectly fine with both cards and the nvidia-current/195.36.15 nVidia kernel module. CUDA (3.0) is back to normal as well.

So it seems that some change between 2.6.31 and 2.6.32 has a very bad side effect with dual nVidia cards. Here is the relevant bug on Launchpad:
https://bugs.launchpad.net/ubuntu/+s...rs/+bug/548362

Here are other threads that seem to be relevant:
http://www.nvnews.net/vbulletin/show....php?p=2220574
http://www.nvnews.net/vbulletin/showthread.php?t=149072
http://www.nvnews.net/vbulletin/show....php?p=2199119

I am not at the problematic PC at the moment; I can provide more logs tomorrow.

04-28-10, 03:31 AM   #2
vincefn
Registered User
Join Date: Apr 2010
Posts: 6
195.36.15 (or 195.36.24) + kernel 2.6.32 + two nVidia cards = crash

Hi,

This bug, which occurs when using two nvidia cards with kernel 2.6.32 and driver 195.36.15 (and now 195.36.24 as well), has now been confirmed by several people. See the updated bug report on Launchpad:

https://bugs.launchpad.net/ubuntu/+s...rs/+bug/548362

This morning I also tried 195.36.24 (from the x-swat PPA), with the same result as with 195.36.15: a hard crash around the time X should start. The computer stays completely unresponsive (for more than 2 minutes), and only a reset gets it to reboot.
With kernel 2.6.31-21, it works fine.

I have attached the two bug reports. For both, I first booted the 2.6.32(-21-server) kernel, which led to a crash, and then booted the 2.6.31 kernel, which came up fine and let me generate the bug report.
The first bug report was made with 195.36.15, the second immediately after installing driver 195.36.24.

Note that in the Launchpad bug report, everyone seems to be using the amd64 architecture so far.
Attached Files
File Type: gz nvidia195.36.15-bug-report.log.gz (46.6 KB, 102 views)
File Type: gz nvidia195.36.24-bug-report.log.gz (46.5 KB, 97 views)

04-28-10, 12:06 PM   #3
danix
NVIDIA Corporation
Join Date: Feb 2010
Location: Santa Clara, CA
Posts: 237
Re: 195.36.15 + kernel 2.6.32 + dual cards crash (Ubuntu 10.04 Lucid Lynx)

Do you still get a crash with boot option intel_iommu=off?

04-29-10, 02:45 AM   #4
vincefn
Registered User
Join Date: Apr 2010
Posts: 6
Re: 195.36.15 + kernel 2.6.32 + dual cards crash (Ubuntu 10.04 Lucid Lynx)

Quote:
Originally Posted by danix
Do you still get a crash with boot option intel_iommu=off?
Yes, same crash with intel_iommu=off.
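For anyone repeating this test, it can be worth confirming that the option really reached the kernel before concluding it makes no difference. Below is a minimal sketch that reads the boot command line. The helper name boot_param and its optional file argument are made up for illustration; /proc/cmdline is the standard file where the running kernel exposes its boot parameters.

```shell
# Print the value of a NAME=value kernel boot parameter,
# or "absent" if the parameter is not on the command line.
# Usage: boot_param NAME [CMDLINE_FILE]
# CMDLINE_FILE defaults to /proc/cmdline (the running kernel's command line).
boot_param() {
    name=$1
    file=${2:-/proc/cmdline}
    # Split the command line into one word per line, then look for NAME=value.
    tr -s ' ' '\n' < "$file" |
        awk -F= -v name="$name" '
            $1 == name { print substr($0, length(name) + 2); found = 1; exit }
            END { if (!found) print "absent" }'
}

# Example, run on the affected machine after booting with the option:
#   boot_param intel_iommu
```

If this prints "absent", the option was not picked up by the bootloader and the test should be repeated.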

05-01-10, 05:39 AM   #5
Vadim
Registered User
Join Date: May 2003
Location: Moscow, Russia
Posts: 10
Re: 195.36.15 + kernel 2.6.32 + dual cards crash (Ubuntu 10.04 Lucid Lynx)

I had exactly the same problem, but it was resolved by turning off the CONFIG_VGA_ARB option.
This option can be found under "Device Drivers" > "Graphics support", but it's necessary to enable the "Configure standard kernel features" option in "General setup" to make it visible.
However, after switching off the VGA arbiter, the VESA framebuffer doesn't work anymore.
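To see what a given kernel build actually ended up with, one can look at its config file. The sketch below is only illustrative: kconfig_state is a hypothetical helper, /boot/config-$(uname -r) is where Ubuntu normally installs the running kernel's config, and CONFIG_EMBEDDED is assumed to be the symbol behind "Configure standard kernel features" in 2.6.32 (worth double-checking in your own tree).

```shell
# Report how a Kconfig symbol is set in a kernel config file.
# Usage: kconfig_state CONFIG_FILE SYMBOL
kconfig_state() {
    file=$1
    symbol=$2
    if grep -q "^${symbol}=" "$file"; then
        # Enabled symbols appear as SYMBOL=y (or =m, or a value).
        grep "^${symbol}=" "$file" | head -n 1
    elif grep -q "^# ${symbol} is not set" "$file"; then
        # Disabled symbols are recorded as a comment line.
        echo "${symbol} is not set"
    else
        echo "${symbol} not found"
    fi
}

# Example for the running kernel (path is the usual Ubuntu location):
#   kconfig_state "/boot/config-$(uname -r)" CONFIG_VGA_ARB
#   kconfig_state "/boot/config-$(uname -r)" CONFIG_EMBEDDED
```

Running the same check against the .config in the build directory after make should show whether the arbiter was silently switched back on.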

05-01-10, 01:10 PM   #6
vincefn
Registered User
Join Date: Apr 2010
Posts: 6
Re: 195.36.15 + kernel 2.6.32 + dual cards crash (Ubuntu 10.04 Lucid Lynx)

Quote:
Originally Posted by Vadim
I had exactly the same problem, but it was resolved by turning off the CONFIG_VGA_ARB option.
This option can be found under "Device Drivers" > "Graphics support", but it's necessary to enable the "Configure standard kernel features" option in "General setup" to make it visible.
However, after switching off the VGA arbiter, the VESA framebuffer doesn't work anymore.
Ah, so this is how you manage to change that option during the build. When I tried, it was always re-enabled. For now, though, I'd rather use the 2.6.31 kernel and keep a console, and hope the guys at nVidia can reproduce and fix this for the next driver release.

btw, in launchpad [https://bugs.launchpad.net/bugs/548362] there are more new reports with the same issue - so far all with the amd64 arch.

05-01-10, 07:47 PM   #7
AlexLG
Registered User
Join Date: Feb 2010
Posts: 9
Re: 195.36.15 + kernel 2.6.32 + dual cards crash (Ubuntu 10.04 Lucid Lynx)

I have the same problem. I tried adding a PCI card alongside the two nvidia PCI-E cards and set the BIOS to boot from the PCI card, and everything works fine. So I can't manage to capture a kernel panic from the driver, since it doesn't freeze in that setup.

05-02-10, 04:23 AM   #8
vincefn
Registered User
Join Date: Apr 2010
Posts: 6
Re: 195.36.15 + kernel 2.6.32 + dual cards crash (Ubuntu 10.04 Lucid Lynx)

One further note: apparently the crash occurs before Xorg even starts logging. If you open the nvidia-bug-report logs attached to a previous message, you'll see that in both Xorg.0.log and Xorg.0.log.old, the kernel is 2.6.31.
What I did was:
1) Start from working 2.6.31
2) reboot on 2.6.32 => crash
3) wait 2 minutes
4) hard reboot to 2.6.31
5) nvidia-bug-report.sh

So in other words, during the 2.6.32 boot X did not manage to start logging: the logs correspond to steps (1) and (4), but nothing is written during (2)...
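One way to check which kernel an X log came from: Xorg normally records a "Current Operating System:" line near the top of its log, which includes the kernel release. A rough sketch follows; xorg_log_kernel is a hypothetical helper name, and the log paths in the example are just the usual defaults.

```shell
# Print the kernel release recorded in an Xorg log file.
# Usage: xorg_log_kernel [LOG_FILE]   (defaults to /var/log/Xorg.0.log)
xorg_log_kernel() {
    log=${1:-/var/log/Xorg.0.log}
    # The line looks like:
    #   Current Operating System: Linux <hostname> <kernel-release> ...
    # Keep only the third word after the colon (the kernel release).
    sed -n 's/.*Current Operating System: Linux [^ ]* \([^ ]*\).*/\1/p' "$log" |
        head -n 1
}

# Example: compare the current and the previous X log
#   xorg_log_kernel /var/log/Xorg.0.log
#   xorg_log_kernel /var/log/Xorg.0.log.old
```

If both logs report the working kernel, as in the sequence above, X never got far enough to open its log during the crashing boot.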

05-03-10, 08:52 PM   #9
danix
NVIDIA Corporation
Join Date: Feb 2010
Location: Santa Clara, CA
Posts: 237
Re: 195.36.15 + kernel 2.6.32 + dual cards crash (Ubuntu 10.04 Lucid Lynx)

Thanks for all the reports. We have tracked this down to a problem with the kernel's VGA arbiter trying to move VGA ownership to a GPU that hasn't POSTed. We're working with the developers of the VGA arbiter to get this fixed in the kernel.

In the meantime, we have a patch which works around the issue by preventing the VGA arbiter from moving VGA ownership away from the default device. It is attached as "NVIDIA_kernel-195.36.24-682377.diff.txt".

You can apply the patch by downloading the latest installer from http://www.nvidia.com/object/linux-d...195.36.24.html (32-bit) or http://www.nvidia.com/object/linux-d...195.36.24.html (64-bit) and running the installer with the "--apply-patch /path/to/patch.diff" option. This will create a patched installer with a name ending in "-custom.run", which will install a driver with this workaround.
Attached Files
File Type: txt NVIDIA_kernel-195.36.24-682377.diff.txt (419 Bytes, 268 views)
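As a rough sketch of the procedure described above: the wrapper below just checks that both files exist and then runs the installer with the --apply-patch option mentioned in the post. The function name apply_nvidia_patch and the file names in the example are placeholders, not the real download names; substitute whatever .run file and patch you actually saved.

```shell
# Build a patched NVIDIA installer from a downloaded .run file and a patch.
# Usage: apply_nvidia_patch INSTALLER.run PATCH.diff
apply_nvidia_patch() {
    installer=$1
    patch=$2
    # Fail early with a clear message if either file is missing.
    [ -f "$installer" ] || { echo "installer not found: $installer" >&2; return 1; }
    [ -f "$patch" ] || { echo "patch not found: $patch" >&2; return 1; }
    # The installer is a shell archive, so it can be run through sh directly.
    sh "$installer" --apply-patch "$patch"
}

# Example (placeholder file names):
#   apply_nvidia_patch ./nvidia-installer.run ./NVIDIA_kernel-195.36.24-682377.diff.txt
```

Per the post, the result should be a new installer whose name ends in "-custom.run"; that is the file to run for the actual installation.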

05-04-10, 04:35 AM   #10
Vadim
Registered User
Join Date: May 2003
Location: Moscow, Russia
Posts: 10
Re: 195.36.15 + kernel 2.6.32 + dual cards crash (Ubuntu 10.04 Lucid Lynx)

Thanks for the patch.
It has the same effect as turning off the VGA arbiter: the system doesn't crash anymore when loading X, but it's still impossible to use the framebuffer.

05-05-10, 08:45 AM   #11
vincefn
Registered User
Join Date: Apr 2010
Posts: 6
Re: 195.36.15 + kernel 2.6.32 + dual cards crash (Ubuntu 10.04 Lucid Lynx)

Thanks for the quick fix of this issue.

I have tested the updated ubuntu package including your patch (in -proposed for Lucid) and it works fine with kernel 2.6.32.

Cheers,

05-05-10, 10:37 AM   #12
Arup
Registered User
Join Date: May 2009
Posts: 122
Re: 195.36.15 + kernel 2.6.32 + dual cards crash (Ubuntu 10.04 Lucid Lynx)

This update has also been applied to the Ubuntu x-swat PPA. I'm quite surprised that they acted this quickly.

All times are GMT -5.