Old 03-26-04, 02:58 PM   #1
Jaymz031602
Registered User
 
Join Date: Mar 2004
Location: Illinois
Posts: 2
Badness in pci_find_subsys at drivers/pci/search.c:167 -- kernel 2.6.3-gentoo-r1

I've been experiencing some problems with the latest nVidia 5336 driver in combination with my 2.6.3-gentoo-r1 (sys-kernel/gentoo-dev-sources) kernel. X starts perfectly well, and KDE boots up just fine. 3D hardware acceleration works beautifully, but here's the kicker:

After anywhere from half an hour to a few hours of operation, the keyboard and mouse stop responding in X. The pointer still moves, but clicks and keystrokes are ignored, and the Num Lock / Scroll Lock / Caps Lock lights do not respond.

Other times the screen goes blank (I'm not sure whether this follows the previous symptoms or is just random) and the keyboard and mouse still don't work. The only way to get the system back is to use SysRq+K to kill the X display, switch to a console with CTRL+ALT+F1, log in as root, and run killall -9 X and killall -9 kdm. Then I get the framebuffer back and can restart KDE fine. But it's a vicious cycle: it happens again about half an hour later.

This seems to happen completely randomly, even when I'm not using any hardware-accelerated applications (e.g. games). But I do understand that X uses hardware acceleration to render, so don't flame me for that.

I experienced this problem in kernel-2.6.5-rc1-mm1 as well. I read somewhere that dumping the mm1 extensions to the kernel might fix the problem, which is why I switched to the gentoo-dev-sources (2.6.3-gentoo-r1) but alas it did not fix the issue.

Here's my /var/log/messages dump when this happens (I confirmed the time):

Mar 26 02:48:49 localhost Badness in pci_find_subsys at drivers/pci/search.c:167
Mar 26 02:48:49 localhost Call Trace:
Mar 26 02:48:49 localhost [<c02704a8>] pci_find_subsys+0xe8/0xf0
Mar 26 02:48:49 localhost [<c02704df>] pci_find_device+0x2f/0x40
Mar 26 02:48:49 localhost [<c02702e8>] pci_find_slot+0x28/0x50
Mar 26 02:48:49 localhost [<e1eac48e>] os_pci_init_handle+0x39/0x68 [nvidia]
Mar 26 02:48:49 localhost [<e1d4085f>] _nv001243rm+0x1f/0x24 [nvidia]
Mar 26 02:48:49 localhost [<e1e87115>] _nv000816rm+0x2f5/0x384 [nvidia]
Mar 26 02:48:49 localhost [<e1def92c>] _nv003801rm+0xd8/0x100 [nvidia]
Mar 26 02:48:49 localhost [<e1e86c4f>] _nv000809rm+0x2f/0x34 [nvidia]
Mar 26 02:48:49 localhost [<e1df0750>] _nv003816rm+0xf0/0x104 [nvidia]
Mar 26 02:48:49 localhost [<e1df14c7>] _nv000013rm+0x77/0x84 [nvidia]
Mar 26 02:48:49 localhost [<e1df0e6b>] _nv003780rm+0x1df/0x2c8 [nvidia]
Mar 26 02:48:49 localhost [<e1df0c77>] _nv000012rm+0x43/0x58 [nvidia]
Mar 26 02:48:49 localhost [<e1df0c34>] _nv000012rm+0x0/0x58 [nvidia]
Mar 26 02:48:49 localhost [<e1d3469c>] _nv001219rm+0xa8/0x124 [nvidia]
Mar 26 02:48:49 localhost [<e1ea9bd9>] nv_kern_rc_timer+0x0/0x37 [nvidia]
Mar 26 02:48:49 localhost [<e1d44eb6>] rm_run_rc_callback+0x36/0x4c [nvidia]
Mar 26 02:48:49 localhost [<e1ea9bec>] nv_kern_rc_timer+0x13/0x37 [nvidia]
Mar 26 02:48:49 localhost [<c012a88b>] run_timer_softirq+0xcb/0x1b0
Mar 26 02:48:49 localhost [<c012aa5f>] do_timer+0xdf/0xf0
Mar 26 02:48:49 localhost [<c0126210>] do_softirq+0x90/0xa0
Mar 26 02:48:49 localhost [<c010da6d>] do_IRQ+0xfd/0x130
Mar 26 02:48:49 localhost [<c010bd88>] common_interrupt+0x18/0x20

(the same exact messages are duplicated twice sequentially in /var/log/messages)
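For anyone who wants to sift a trace like this without reading it by eye, here is a rough Python sketch that splits each frame into symbol and owning module. The regular expression and the names `FRAME_RE`, `parse_frames`, and `SAMPLE` are my own assumptions, not part of any kernel or nVidia tool, and `SAMPLE` is only an abbreviated subset of the log above.

```python
import re
from collections import Counter

# Matches one call-trace frame in the format shown in the log above, e.g.
#   [<e1eac48e>] os_pci_init_handle+0x39/0x68 [nvidia]
# (hypothetical pattern, written for this log format only)
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<symbol>\S+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)


def parse_frames(lines):
    """Return (symbol, module) pairs; frames without a [module] tag count as 'kernel'."""
    frames = []
    for line in lines:
        match = FRAME_RE.search(line)
        if match:
            frames.append((match.group("symbol"), match.group("module") or "kernel"))
    return frames


# Abbreviated subset of the trace posted above.
SAMPLE = """\
Mar 26 02:48:49 localhost Badness in pci_find_subsys at drivers/pci/search.c:167
Mar 26 02:48:49 localhost Call Trace:
Mar 26 02:48:49 localhost [<c02704a8>] pci_find_subsys+0xe8/0xf0
Mar 26 02:48:49 localhost [<c02704df>] pci_find_device+0x2f/0x40
Mar 26 02:48:49 localhost [<c02702e8>] pci_find_slot+0x28/0x50
Mar 26 02:48:49 localhost [<e1eac48e>] os_pci_init_handle+0x39/0x68 [nvidia]
Mar 26 02:48:49 localhost [<e1ea9bec>] nv_kern_rc_timer+0x13/0x37 [nvidia]
Mar 26 02:48:49 localhost [<c012a88b>] run_timer_softirq+0xcb/0x1b0
""".splitlines()

frames = parse_frames(SAMPLE)
per_module = Counter(module for _, module in frames)
# The trace lists the innermost call first, so the first non-kernel frame
# is the module function that called into the PCI lookup code.
first_module_frame = next(sym for sym, mod in frames if mod != "kernel")

print(per_module)
print("entered the PCI code from:", first_module_frame)
```

On the sample this points at os_pci_init_handle in the nvidia module as the caller, reached from a timer softirq, which matches what the full trace shows.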

I did some poking around in the kernel-2.6.3 sources and found that pci_find_subsys has a NOTE by it saying the following:

* NOTE: Do not use this function anymore, use pci_get_subsys() instead, as
* the pci device returned by this function can disappear at any moment in
* time.
*/ (my own emphasis)

I checked the differences between the two functions (which are defined within a few lines of each other), and was baffled to find they differ by only 2 lines of code, i.e. THEY'RE ALMOST IDENTICAL. I'm hoping there is some logic behind having two separate functions, but you never know. It doesn't make much sense to me, but then again I do not claim to be a kernel expert by any means.
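Going by the NOTE quoted above, a plausible reading is that the two differing lines are reference-count bookkeeping: the "get" variant hands back a device the caller holds a reference on, while the "find" variant hands back a bare pointer. The toy Python model below only illustrates that idea. It is not kernel code, and the `Device` and `Bus` classes and their methods are hypothetical names made up for this sketch.

```python
class Device:
    """Stand-in for a PCI device with a reference count (toy model only)."""

    def __init__(self, name):
        self.name = name
        self.refcount = 1   # the reference held by the bus's own device list
        self.freed = False

    def put(self):
        """Drop one reference; the device is released when none remain."""
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True   # a real kernel would free the memory here


class Bus:
    def __init__(self):
        self.devices = {}

    def add(self, name):
        self.devices[name] = Device(name)

    def find(self, name):
        """Deprecated-style lookup: returns the device without taking a reference."""
        return self.devices.get(name)

    def get(self, name):
        """Replacement-style lookup: takes a reference the caller must put() later."""
        dev = self.devices.get(name)
        if dev is not None:
            dev.refcount += 1
        return dev

    def remove(self, name):
        """Device goes away (e.g. hot-unplug): the bus drops its own reference."""
        self.devices.pop(name).put()


bus = Bus()
bus.add("gpu0")
bus.add("gpu1")

borrowed = bus.find("gpu0")   # no reference taken
owned = bus.get("gpu1")       # reference taken

bus.remove("gpu0")
bus.remove("gpu1")

borrowed_freed_after_remove = borrowed.freed   # True: the pointer is now stale
owned_freed_after_remove = owned.freed         # False: our reference keeps it alive

owned.put()                                    # caller is done with the device
owned_freed_after_put = owned.freed            # True: last reference dropped

print(borrowed_freed_after_remove, owned_freed_after_remove, owned_freed_after_put)
```

In this model the only difference between `find` and `get` is one increment, yet it decides whether the caller's device can vanish underneath it, which would explain why two nearly identical functions exist.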

So, is this a kernel issue, or an nVidia issue? It seems to be invoked in the nvidia kernel module, but the badness was found in the kernel!

I googled the line 'Badness in pci_find_subsys...' and came up with one thread on LKML with no solution in sight, so I gave up googling for it. I created this thread for help from you guys.

Would this problem be solved (from my admittedly narrow point of view) by changing the calls to pci_find_subsys() in the nvidia kernel module into calls to pci_get_subsys()? I'll do some kernel hacking to make pci_find_subsys() behave exactly like pci_get_subsys(), and post the results here if I find anything out.

Thanks in advance to those kind hackers who'll reply here. Sorry for the long post!

BTW, posting my hardware information here would be completely useless, as the card works fine under Windows on this box, and also worked under 2.4 kernels with the same driver (5336). I do realize that the 5336 version of the nVidia driver only just introduced 2.6 kernel support, so there are bound to be problems like this.
Old 03-27-04, 05:25 AM   #2
maro
Registered User
 
Join Date: Feb 2004
Location: Holy Roman Empire
Posts: 64
Re: Badness in pci_find_subsys at drivers/pci/search.c:167 -- kernel 2.6.3-gentoo-r1

Hi, if you search around in this forum you'll find this bug has been reported a number of times, including by myself. If I am allowed to recap:

Apparently the real problem that hangs the machine has already happened by the time the routine that logs the syslog message is called. The driver does not log any further information that would let you troubleshoot. Apparently there are many possible causes, usually AGP related. A lot of experts give a lot of advice to try this, that, and the other to alleviate the problem (do a forum search for details). Usually nothing works, other than removing AGP support altogether. Some experts in this forum blame the Linux AGP driver. The Linux kernel developers will not work on it because the nvidia driver is closed source.

There.

Try sending a bug report to linux-bugs@nvidia.com.
Old 03-27-04, 06:58 AM   #3
energyman76b
Registered User
 
Join Date: Dec 2002
Location: Clausthal/Germany
Posts: 1,104
Re: Badness in pci_find_subsys at drivers/pci/search.c:167 -- kernel 2.6.3-gentoo-r1

Hi,
I get such problems when my passively cooled FX 5200 gets too hot.
Maybe you should monitor the temperatures in your case?
Oh, and in my experience, the vanilla kernels are more stable than the Gentoo-patched ones.
Old 03-29-04, 07:23 PM   #4
Jaymz031602
Registered User
 
Join Date: Mar 2004
Location: Illinois
Posts: 2
Re: Badness in pci_find_subsys at drivers/pci/search.c:167 -- kernel 2.6.3-gentoo-r1

Thanks for your replies, guys! Guess the only thing to do is to wait for the next NVIDIA release, and see if anything happens. Oh well.
Copyright 1998 - 2014, nV News.