nV News Forums > Linux Support Forums > NVIDIA Linux

03-06-04, 11:57 AM   #37
ByteEnable
Registered User
Join Date: Jun 2003
Posts: 24
Re: Why flame?

Quote:
Originally posted by tamran
ByteEnable,
Zander has provided nothing but insight as far as I'm concerned. I have read his posts in other topics in this forum and whether or not he knows what he's talking about (I think the former), his suggestions were an excellent basis for helping a few of us troubleshoot and isolate the problem.
Tamran
I agree Zander has provided some helpful troubleshooting tips. That's it. I was disturbed by his "It works on Windows" remarks, after which he pretty much blames the hardware or Linux. I'm not flaming; I'm pointing out the flaws in his logic, his untruths, and my personal experience working with NVIDIA FAEs in a Hardware Design Engineer (motherboard designer) capacity when I worked for Dell.

Byte
03-07-04, 02:10 AM   #38
Andy Mecham
l33t master
Join Date: Jul 2002
Location: Santa Clara, CA
Posts: 1,163

The e-mail address for reporting problems is featured on the driver download page, several times in this forum, in the README, in the open files, and in the installer, both in the binary (as an error message) and in the source.

If you have a problem, send a *full*, detailed description to linux-bugs@nvidia.com.

--andy
__________________
Andy Mecham
NVIDIA Corporation
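For anyone putting together that *full* report, here is a minimal sketch, assuming a POSIX shell, that bundles several diagnostic files into one text file you can attach. The function name `bundle_report` is hypothetical, and the file paths in the usage line below are assumptions about a typical 2.6-era setup, not a list NVIDIA asked for.

```shell
#!/bin/sh
# Hypothetical helper (not part of the NVIDIA driver): concatenate several
# diagnostic files into one report, each under a "=== path ===" header.
# Files that do not exist or cannot be read are noted and skipped.
bundle_report() (
    out=$1
    shift
    : > "$out"                          # start with an empty report file
    for f in "$@"; do
        if [ -r "$f" ]; then
            printf '=== %s ===\n' "$f" >> "$out"
            cat "$f" >> "$out"
            printf '\n' >> "$out"
        else
            printf '=== %s (missing) ===\n' "$f" >> "$out"
        fi
    done
)
```

Usage, with assumed paths: `bundle_report report.txt /proc/driver/nvidia/version /proc/driver/nvidia/agp/status /var/log/XFree86.0.log`, then describe the steps that trigger the problem in the mail itself.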
03-07-04, 10:14 AM   #39
tamran
Registered User
Join Date: Feb 2004
Location: Ft. Myers, FL
Posts: 67

Thanks Andy,

I should have sent my request for the NVIDIA AGP driver there instead of posting it here. I'll do that now. As for the crashes, I just needed more clarification on the issue before barking at NVIDIA for a fix without having enough information myself. I think I probably have enough for a decent report now.
03-07-04, 02:17 PM   #40
maro
Registered User
Join Date: Feb 2004
Location: Holy Roman Empire
Posts: 64

Thanks Andy, I will also send a report shortly.
03-08-04, 07:33 PM   #41
helamonster
Registered User
Join Date: Jan 2004
Posts: 4
Reproducible method of crashing X with nvidia driver bug

I have recently had problems with X locking up when switching from other framebuffer virtual terminals back to the X vt. I have found a way to crash X every time, which might be helpful in debugging this problem.

First of all, I am only assuming this problem is related, because I get this error in my logs:
Code:
[kernel] Badness in pci_find_subsys at drivers/pci/search.c:167

I recently started using fbi (framebuffer image viewer), which is pretty neat, but I soon noticed that it seems to encourage the nvidia driver to become defunct.
Here is the essence of what I do and what happens:

1. I have an X session running in virtual terminal 7 (vt7)
2. I switch to another framebuffer vt (say vt6) with Ctrl+Alt+F6
3. I run fbi in vt6 to view an image file and then exit fbi (or leave it open)
4. I attempt to switch back to vt7 (from vt6) with Ctrl+Alt+F7
5. I get a blank black screen (with some garbled artifacts along the top) and X locks along with the keyboard and mouse (100% CPU by the X process).

Of course, I just log in remotely from another machine and run killall -9 X, and everything seems OK after that.

Added:
After you do the "killall -9 X" from the remote machine, you can also do "chvt 1" to switch the local machine to a text console vt (which you should be able to read). Of course, if your keyboard is not still locked up you can also use Ctrl+Alt+Fn.
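The remote recovery above (killall -9 X, then chvt 1) can be wrapped in a small function. This is a minimal sketch assuming a POSIX shell run as root on the hung machine over ssh; `recover_console`, `run`, and the `DRY_RUN` variable are hypothetical names, not part of any existing tool.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1 (useful to preview).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Sketch of the recovery described above.
recover_console() {
    run killall -9 X      # kill the spinning X server (process name as in the post)
    run chvt 1            # switch the local display to text console vt1
}
```

`DRY_RUN=1 recover_console` only prints the two commands; plain `recover_console` runs them.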

I noticed that at the same time I attempt to switch to vt7 I get
that pci_find_subsys error in my system log 9 times in a row.
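To check whether those warnings really line up with the vt switch, one option is to count them before and after. A minimal sketch, assuming a POSIX shell; `count_badness` is a hypothetical name, and the log location is an assumption (commonly /var/log/messages on systems of this era). With no argument it reads standard input, so dmesg output can be piped in.

```shell
#!/bin/sh
# Count lines containing the "Badness in pci_find_subsys" warning.
# $1 = log file; if omitted, read standard input ("-").
count_badness() {
    grep -c 'Badness in pci_find_subsys' "${1:--}"
}
```

For example, run `dmesg | count_badness` from an ssh session before and after pressing Ctrl+Alt+F7; a jump of nine would match what I see.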

FYI, I am running:
Linux Kernel 2.6.3
XFree86 v4.3.0 ( http://xfree86.org/ )
fbi v1.3.1 ( http://bytesex.org/fbi.html )
nvidia-kernel/nvidia-glx v5336 ( http://nvidia.com/ )

Note:
This problem does NOT occur when using the open source nvidia driver
module ("nv") instead of the "nvidia" module.

I hope this can help someone find the root cause of the problem. I don't mean to blame anyone; I just report what I see.

Last edited by helamonster; 03-11-04 at 01:01 AM.
03-10-04, 09:48 PM   #42
tamran
Registered User
Join Date: Feb 2004
Location: Ft. Myers, FL
Posts: 67
More news

OK, I just redid my Gentoo system with a fresh install, and again I got the lockup. But this time I was able to ssh in and kill X remotely (it was taking all the CPU). Sure enough, as others have stated, the text was garbled, but "reset" worked (I had to type it blindly). I restarted X and kept running ... thank heaven for screen.

I am strongly convinced at this point that the blame belongs with the Linux agpgart driver. I emailed my findings to NVIDIA (along with a request for NVIDIA AGP driver support on AMD64) but have not yet gotten a response.

Regards,

Tamran
03-11-04, 01:33 AM   #43
bdw
Registered User
Join Date: May 2003
Posts: 13

I'm also getting the same error, with 2.6.4:

Mar 9 14:31:19 localhost kernel: Badness in pci_find_subsys at drivers/pci/search.c:167
Mar 9 14:31:19 localhost kernel: Call Trace:
Mar 9 14:31:19 localhost kernel: [<c01c12a8>] pci_find_subsys+0xe8/0xf0
Mar 9 14:31:19 localhost kernel: [<c01c12df>] pci_find_device+0x2f/0x40
Mar 9 14:31:19 localhost kernel: [<c01c10e8>] pci_find_slot+0x28/0x50
Mar 9 14:31:19 localhost kernel: [<f9d61258>] os_pci_init_handle+0x39/0x68 [nvidia]
Mar 9 14:31:19 localhost kernel: [<f9bf585f>] _nv001243rm+0x1f/0x24 [nvidia]
Mar 9 14:31:19 localhost kernel: [<f9d3c115>] _nv000816rm+0x2f5/0x384 [nvidia]
Mar 9 14:31:19 localhost kernel: [<f9ca492c>] _nv003801rm+0xd8/0x100 [nvidia]
Mar 9 14:31:19 localhost kernel: [<f9d3bc4f>] _nv000809rm+0x2f/0x34 [nvidia]
Mar 9 14:31:19 localhost kernel: [<f9cd3b48>] _nv003606rm+0xe4/0x114 [nvidia]
Mar 9 14:31:19 localhost kernel: [<f9cd36b8>] _nv003564rm+0x688/0x908 [nvidia]
Mar 9 14:31:19 localhost kernel: [<f9c0e267>] _nv004046rm+0x3a3/0x3b0 [nvidia]
Mar 9 14:31:19 localhost kernel: [<f9d0fb03>] _nv001476rm+0x1d3/0x45c [nvidia]
Mar 9 14:31:19 localhost kernel: [<f9bf839a>] _nv000896rm+0x4a/0x64 [nvidia]
Mar 9 14:31:19 localhost kernel: [<f9bf9bb4>] rm_isr_bh+0xc/0x10 [nvidia]
Mar 9 14:31:19 localhost kernel: [<f9d5eab1>] nv_kern_isr_bh+0xf/0x13 [nvidia]
Mar 9 14:31:19 localhost kernel: [<c0121bf6>] tasklet_action+0x46/0x70
Mar 9 14:31:19 localhost kernel: [<c0121a10>] do_softirq+0x90/0xa0
Mar 9 14:31:19 localhost kernel: [<c010b02d>] do_IRQ+0xfd/0x130
Mar 9 14:31:19 localhost kernel: [<c01094f4>] common_interrupt+0x18/0x20
Mar 9 14:31:19 localhost kernel: [<c02c7de0>] sock_poll+0x0/0x40
Mar 9 14:31:19 localhost kernel: [<c0168023>] do_select+0x243/0x2d0
Mar 9 14:31:19 localhost kernel: [<c0167c30>] __pollwait+0x0/0xd0
Mar 9 14:31:19 localhost kernel: [<c016839f>] sys_select+0x2bf/0x4c0
Mar 9 14:31:19 localhost kernel: [<c01551b0>] vfs_read+0xf0/0x130
Mar 9 14:31:19 localhost kernel: [<c0109335>] sysenter_past_esp+0x52/0x71
Mar 9 14:31:19 localhost kernel:
Mar 9 14:31:19 localhost kernel: Badness in pci_find_subsys at drivers/pci/search.c:167
Mar 9 14:31:19 localhost kernel: Call Trace:
Mar 9 14:31:19 localhost kernel: [<c01c12a8>] pci_find_subsys+0xe8/0xf0
Mar 9 14:31:19 localhost kernel: [<c01c12df>] pci_find_device+0x2f/0x40
Mar 9 14:31:19 localhost kernel: [<c01c10e8>] pci_find_slot+0x28/0x50
Mar 9 14:31:19 localhost kernel: [<f9d61258>] os_pci_init_handle+0x39/0x68 [nvidia]
Mar 9 14:31:19 localhost kernel: [<f9bf585f>] _nv001243rm+0x1f/0x24 [nvidia]
Mar 9 14:31:19 localhost kernel: [<f9ca6a5d>] _nv003797rm+0xa9/0x128 [nvidia]
Mar 9 14:31:19 localhost kernel: [<f9d134a1>] _nv001490rm+0x55/0xe4 [nvidia]
Mar 9 14:31:19 localhost kernel: [<f9d3c154>] _nv000816rm+0x334/0x384 [nvidia]
Mar 9 14:31:19 localhost kernel: [<f9ca492c>] _nv003801rm+0xd8/0x100 [nvidia]
Mar 9 14:31:19 localhost kernel: [<f9d3bc4f>] _nv000809rm+0x2f/0x34 [nvidia]
Mar 9 14:31:19 localhost kernel: [<f9cd3b48>] _nv003606rm+0xe4/0x114 [nvidia]
Mar 9 14:31:19 localhost kernel: [<f9cd36b8>] _nv003564rm+0x688/0x908 [nvidia]
Mar 9 14:31:19 localhost kernel: [<f9c0e267>] _nv004046rm+0x3a3/0x3b0 [nvidia]
Mar 9 14:31:19 localhost kernel: [<f9d0fb03>] _nv001476rm+0x1d3/0x45c [nvidia]
Mar 9 14:31:19 localhost kernel: [<f9bf839a>] _nv000896rm+0x4a/0x64 [nvidia]
Mar 9 14:31:19 localhost kernel: [<f9bf9bb4>] rm_isr_bh+0xc/0x10 [nvidia]
Mar 9 14:31:19 localhost kernel: [<f9d5eab1>] nv_kern_isr_bh+0xf/0x13 [nvidia]
Mar 9 14:31:19 localhost kernel: [<c0121bf6>] tasklet_action+0x46/0x70
Mar 9 14:31:19 localhost kernel: [<c0121a10>] do_softirq+0x90/0xa0
Mar 9 14:31:19 localhost kernel: [<c010b02d>] do_IRQ+0xfd/0x130
Mar 9 14:31:19 localhost kernel: [<c01094f4>] common_interrupt+0x18/0x20
Mar 9 14:31:19 localhost kernel: [<c02c7de0>] sock_poll+0x0/0x40
Mar 9 14:31:19 localhost kernel: [<c0168023>] do_select+0x243/0x2d0
Mar 9 14:31:19 localhost kernel: [<c0167c30>] __pollwait+0x0/0xd0
Mar 9 14:31:19 localhost kernel: [<c016839f>] sys_select+0x2bf/0x4c0
Mar 9 14:31:19 localhost kernel: [<c01551b0>] vfs_read+0xf0/0x130
Mar 9 14:31:19 localhost kernel: [<c0109335>] sysenter_past_esp+0x52/0x71

I have VIA AGPGART compiled in, and did a status on the AGP:

$ cd /proc/driver/nvidia/agp
$ cat card
Fast Writes: Supported
SBA: Not Supported
AGP Rates: 4x 2x 1x
Registers: 0x1f000017:0x1f000104

$ cat host-bridge
Host Bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133]
Fast Writes: Supported
SBA: Supported
AGP Rates: 4x 2x 1x
Registers: 0x1f000217:0x00000104

$ cat status
Status: Enabled
Driver: AGPGART
AGP Rate: 4x
Fast Writes: Disabled
SBA: Disabled
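The /proc/driver/nvidia/agp files above are plain "Key: Value" lines, so a single field can be pulled out for comparison. A minimal sketch, assuming a POSIX shell with awk; `agp_field` is a hypothetical name, and the key names are taken from the listing above.

```shell
#!/bin/sh
# Print the value of one "Key: Value" line from a /proc/driver/nvidia/agp
# file (card, host-bridge or status).
# $1 = file, $2 = key, e.g. agp_field status "AGP Rate"
agp_field() {
    awk -v key="$2" '
        index($0, key ": ") == 1 {          # line starts with "Key: "
            print substr($0, length(key) + 3)
            exit
        }' "$1"
}
```

From /proc/driver/nvidia/agp, `agp_field card 'AGP Rates'` and `agp_field host-bridge 'AGP Rates'` both report "4x 2x 1x" in the output above, while `agp_field status Driver` shows AGPGART in use at 4x with Fast Writes and SBA disabled.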

I set NvAGP to 0, and there are no problems at all. Of course, the display is a tad sluggish, but OpenGL works OK.
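NvAGP is set in the Device section of the XFree86 config file with a line like `Option "NvAGP" "0"` (0 disables AGP; the driver README describes the other values). A minimal sketch, assuming a POSIX shell with awk, to confirm which value a given config file actually sets; `nvagp_setting` is a hypothetical name, and the config path in the usage note is an assumption.

```shell
#!/bin/sh
# Report the NvAGP value from an XFree86 config file, i.e. a line such as
#     Option "NvAGP" "0"
# Prints "unset" if no such line is found; if several exist, the last wins.
# The option name is matched case-insensitively; lines starting with "#"
# do not match the pattern and are ignored.
nvagp_setting() {
    awk -F'"' '
        tolower($1) ~ /^[ \t]*option[ \t]*$/ && tolower($2) == "nvagp" { v = $4 }
        END { if (v == "") print "unset"; else print v }' "$1"
}
```

For example, `nvagp_setting /etc/X11/XF86Config-4` should print 0 with the setting described above.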

Xine works fine with DVDs, but mplayer doesn't: it won't play all DVDs with AGP disabled.

I've read this thread, and it seems the causes are legion. Disabling ACPI doesn't help, but I still need to disable vesaFB and see what happens there.

--Brian
03-11-04, 01:39 AM   #44
Andy Mecham
l33t master
Join Date: Jul 2002
Location: Santa Clara, CA
Posts: 1,163

Quote:
and to request NVIDIA agp driver support for AMD64
This is not possible. AGPGART provides IOMMU functionality for AMD64, and this functionality is needed at boot time. You'll notice that you can't compile AGPGART as a module for AMD64, which implies that you can't use AGP unless support is in the kernel. As NVAGP is only initialized when X starts, it simply isn't an option.

--andy
__________________
Andy Mecham
NVIDIA Corporation
03-11-04, 07:44 AM   #45
tamran
Registered User
Join Date: Feb 2004
Location: Ft. Myers, FL
Posts: 67
Actually, it _IS_ possible

Andy,

I think the following is the assumption by most:

Quote:
This is not possible. AGPGART provides IOMMU functionality for AMD64, and this functionality is needed at boot time. You'll notice that you can't compile AGPGART as a module for AMD64, which implies that you can't use AGP unless support is in the kernel. As NVAGP is only initialized when X starts, it simply isn't an option.
However, it is not entirely accurate. If you disable IOMMU support, you can select or deselect agpgart as a module in the kernel config. (I have a dual Opteron, and my system works fine without IOMMU.) I've tried my system with and without this configuration. Currently (after yesterday's "crash", actually) I have IOMMU support disabled and agpgart loaded as a module. IOMMU support is only "necessary" if you have more than 4GB of RAM.

Given that, disabling IOMMU support (which I assume would be fine for anyone with less than 4GB of RAM) should allow the NVIDIA AGP driver to be loaded on AMD64 systems for those who don't mind doing so. Of course, maybe I'm totally missing something?
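To see which combination a given kernel was built with, the relevant symbols can be read from its .config. A minimal sketch, assuming a POSIX shell; `kconfig_value` is a hypothetical name, and the symbol names GART_IOMMU (the x86_64 "IOMMU support" option), AGP and AGP_AMD64 are my assumptions about 2.6-era kernels, so check them against your own Kconfig.

```shell
#!/bin/sh
# Print y, m or n for a symbol in a kernel .config file.
# "# CONFIG_FOO is not set" and a missing line both count as n.
# $1 = path to .config, $2 = symbol name without the CONFIG_ prefix
kconfig_value() (
    v=$(sed -n "s/^CONFIG_$2=//p" "$1" | tail -n 1)
    echo "${v:-n}"
)
```

Per the description above, a configuration with `kconfig_value .config GART_IOMMU` printing n is the one where `kconfig_value .config AGP` can be m.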

Regards,

Tamran
03-14-04, 04:05 PM   #46
maro
Registered User
Join Date: Feb 2004
Location: Holy Roman Empire
Posts: 64

Just out of interest, has anybody had a reply from linux-bugs@nvidia.com?

BTW, today I observed the problem again with kernel 2.6.4, nvidia driver 1.0-5336 and an uptime of nearly 3 days. Back to the built-in XFree driver...
03-14-04, 06:23 PM   #47
tamran
Registered User
Join Date: Feb 2004
Location: Ft. Myers, FL
Posts: 67

Nope ... no reply yet

Problem still exists here too. It doesn't seem to matter what changes I make.

Tamran
03-14-04, 06:25 PM   #48
SuLinUX
Join Date: Sep 2003
Location: UK
Posts: 847

I have not seen this problem reflected in a forum I moderate, and I don't have this problem either. It seems to me that it's a distro/nvidia driver issue rather than the nvidia driver itself.

Also, looking back through my /var/log/messages, I can see no sign of those errors at all.
__________________
AthlonXP 2600+ / nForce2 Asus A7N8X-X / PNY GeForce FX5900 Ultra / 1024Mb Samsung Ram /nForce Sound / Hansol 920D Plus 19" monitor / Lite-On 32x12x40 / 2x Maxtor HD 40Gb/80Gb / nVidia 7174 driver / Gnome 2.10.1 / Kernel 2.6.11.9 / Slackware 10.0

All times are GMT -5.

Copyright 1998 - 2014, nV News.