Old 01-16-07, 10:33 AM   #1
baswesterbaan
Registered User
 
Join Date: Jan 2007
Posts: 11
Default kernel page fault on modprobe of 9746's nvidia.ko

When I modprobe the nvidia module, the kernel page-faults at address ffffffff6d723538; the top of the call trace is os_get_cpu_frequency+0xb. From dmesg:

Code:
Unable to handle kernel paging request at ffffffff6d723538 RIP: 
 [<ffffffff6d723538>]
PGD 203027 PUD 0 
Oops: 0010 [1] PREEMPT SMP 
CPU 0 
Modules linked in: nvidia(P) bcraid
Pid: 7028, comm: modprobe Tainted: P      2.6.19-gentoo-r4 #1
RIP: 0010:[<ffffffff6d723538>]  [<ffffffff6d723538>]
RSP: 0018:ffff81003b25de40  EFLAGS: 00010296
RAX: 0000000000181b00 RBX: ffff81003b25de98 RCX: 00000000078bfbff
RDX: 0000000000181100 RSI: 0000000000000001 RDI: ffff81003ddef000
RBP: ffff81003ddef000 R08: ffff81003b25de8c R09: ffff81003b25de88
R10: 0000000000000002 R11: 0000000000000001 R12: ffffffff88848380
R13: 00002ae561bda010 R14: 00000000005080f8 R15: 00002ae561bda010
FS:  00002ae561bd8ae0(0000) GS:ffffffff80691000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffff6d723538 CR3: 000000003bdf7000 CR4: 00000000000006e0
Process modprobe (pid: 7028, threadinfo ffff81003b25c000, task ffff81003df1f880)
Stack:  ffffffff88426e17 0000000000000286 ffff81003b25de98 ffff81003ddef000
 ffffffff88119bbd ffff81003b25de98 ffffffff88104316 ffff81003b25de90
 0000000000003200 0000080000000000 00000f5a078bfbff 69746e6568747541
Call Trace:
 [<ffffffff88426e17>] :nvidia:os_get_cpu_frequency+0xb/0x44
 [<ffffffff88119bbd>] :nvidia:_nv003359rm+0x9/0xe
 [<ffffffff88104316>] :nvidia:_nv002562rm+0x1f6/0x362
 [<ffffffff881079ce>] :nvidia:_nv002556rm+0x80/0xa6
 [<ffffffff88122751>] :nvidia:rm_init_rm+0x9/0xe
 [<ffffffff8884e0e3>] :nvidia:nvidia_init_module+0xe3/0x7aa
 [<ffffffff802215cf>] __up_read+0x13/0x8a
 [<ffffffff8029aa76>] sys_init_module+0xaf/0x227
 [<ffffffff8025ba1e>] system_call+0x7e/0x83


Code:  Bad RIP value.
RIP  [<ffffffff6d723538>]
 RSP <ffff81003b25de40>
CR2: ffffffff6d723538
I see similar problems with 9742, but not with 9631 (which crashes randomly, though; that is why I upgraded).

I cannot remove the module with rmmod; it seems to be stuck initializing because of this page fault. When I let Xorg load the nvidia module, it crashes in the same way. (Local input freezes, so I confirmed this over ssh.)
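One way to confirm that a module is stuck part-way through initialization is to read its state from /proc/modules, where the fifth field is normally Live, Loading, or Unloading. The following is a minimal sketch under that assumption; the helper name `module_state` and the sample line are invented for illustration and do not come from the attached log.

```shell
#!/bin/sh
# module_state NAME [FILE]: print the state column for module NAME.
# FILE defaults to /proc/modules; the helper name is hypothetical.
module_state() {
    awk -v m="$1" '$1 == m { print $5 }' "${2:-/proc/modules}"
}

# Illustrative input only (not taken from the reporter's machine).
sample=$(mktemp)
printf '%s\n' 'nvidia 4575504 0 - Loading 0xffffffff88000000' > "$sample"
module_state nvidia "$sample"
rm -f "$sample"
```

On a live system, `module_state nvidia` with no second argument reads /proc/modules directly. A module that never leaves the Loading state would be consistent with rmmod refusing to remove it.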
Attached Files
File Type: log nvidia-bug-report.log (55.4 KB, 75 views)
Old 01-16-07, 12:10 PM   #2
netllama
NVIDIA Corporation
 
Join Date: Dec 2004
Posts: 8,763
Default Re: kernel page fault on modprobe of 9746's nvidia.ko

I have a few questions:
0) Is this problem specific to your 2.6.19-gentoo-r4 kernel? Does it reproduce with an older kernel and/or a kernel.org kernel?
1) What kind of motherboard & graphics card are you using?
2) Have you verified that you're using the latest motherboard BIOS?

Thanks,
Lonni
Old 01-16-07, 02:23 PM   #3
baswesterbaan
Registered User
 
Join Date: Jan 2007
Posts: 11
Default Re: kernel page fault on modprobe of 9746's nvidia.ko

Quote:
Originally Posted by netllama
I have a few questions:
0) Is this problem specific to your 2.6.19-gentoo-r4 kernel? Does it reproduce with an older kernel and/or a kernel.org kernel?
Doesn't seem so. 2.6.18 produces the same error.

Quote:
1) What kind of motherboard & graphics card are you using?
Tyan K8W Dual Opteron S2885ANRF, with two Opteron 244's (MB is 8X AGP). The card is a Club3d Geforce 7600 GT for AGP with DVI and VGA.

Quote:
2) Have you verified that you're using the latest motherboard BIOS?
Yes.


To reiterate: startup and most other things work fine with the older 9631 nvidia-drivers, but the system tends to freeze completely at random after about 20 minutes in a window manager such as compiz.
Old 01-16-07, 06:52 PM   #4
netllama
NVIDIA Corporation
 
Join Date: Dec 2004
Posts: 8,763
Default Re: kernel page fault on modprobe of 9746's nvidia.ko

I'm not able to reproduce this problem with a GeForce 7600 in a Tyan 2885 motherboard with 1.0-9746. X starts up fine (which exercises much more than just modprobing nvidia). My only guess at this point is that your crash is something specific to the Gentoo environment that you're running.
Old 01-17-07, 07:51 AM   #5
baswesterbaan
Registered User
 
Join Date: Jan 2007
Posts: 11
Default Re: kernel page fault on modprobe of 9746's nvidia.ko

Quote:
Originally Posted by netllama
I'm not able to reproduce this problem with a GeForce 7600 in a Tyan 2885 motherboard with 1.0-9746. X starts up fine (which exercises much more than just modprobing nvidia). My only guess at this point is that your crash is something specific to the Gentoo environment that you're running.
Which kernel did you use? I might try yours and see if it makes a difference.

Edit: Oh, and the cause might be in the kernel configuration too.
Old 01-17-07, 11:01 AM   #6
netllama
NVIDIA Corporation
 
Join Date: Dec 2004
Posts: 8,763
Default Re: kernel page fault on modprobe of 9746's nvidia.ko

I was using the latest Fedora Core 6 kernel, which is based off of 2.6.18.x. You can get its source & configuration here:
http://mirrors.kernel.org/fedora/cor...69.fc6.src.rpm
Old 01-18-07, 06:47 AM   #7
baswesterbaan
Registered User
 
Join Date: Jan 2007
Posts: 11
Default Re: kernel page fault on modprobe of 9746's nvidia.ko

Quote:
Originally Posted by netllama
I was using the latest Fedora Core 6 kernel, which is based off of 2.6.18.x. You can get its source & configuration here:
http://mirrors.kernel.org/fedora/cor...69.fc6.src.rpm
I just tried a vanilla 2.6.18.6 kernel.

Right after I compiled the 9746 drivers, everything worked fine. Then I restarted and ran modprobe on the nvidia module. Unlike with the newer kernel, this worked.

However, when I started Xorg, it hung. I ssh-ed into the machine and found this in dmesg:

Code:
Unable to handle kernel paging request at ffffffff6d723638 RIP: 
 [<ffffffff6d723638>]
PGD 203027 PUD 0 
Oops: 0010 [1] PREEMPT SMP 
CPU 0 
Modules linked in: stir4200 usbhid parport_pc nvidia parport uhci_hcd ohci_hcd eth1394 bcraid
Pid: 9956, comm: Xorg Tainted: P      2.6.18.6 #1
RIP: 0010:[<ffffffff6d723638>]  [<ffffffff6d723638>]
RSP: 0018:ffff81007f1bbba0  EFLAGS: 00010202
RAX: ffff810037868000 RBX: ffff810037868000 RCX: ffff810037489110
RDX: ffff810037489110 RSI: ffff810037489110 RDI: ffffffff8886c200
RBP: ffff8100384ca8c0 R08: ffff810037489110 R09: 0000000000000001
R10: 0000000000000000 R11: ffffffff80492204 R12: ffff810037489110
R13: ffff810037489110 R14: 0000000000000001 R15: 0000000000000000
FS:  00002b686123fae0(0000) GS:ffffffff806c6000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffff6d723638 CR3: 000000007ed45000 CR4: 00000000000006e0
Process Xorg (pid: 9956, threadinfo ffff81007f1ba000, task ffff81007f331180)
Stack:  ffffffff8813ae60 ffff810037489110 ffffffff8838f0a2 00000000bfef0020
 ffff810037458800 0000000000000000 00000000bfef0100 0000000000000000
 ffff81003c917400 ffff81003d05b000 ffffffff88112ece 00000000bfef0100
Call Trace:
 [<ffffffff8813ae60>] :nvidia:_nv003253rm+0x34/0x3a
 [<ffffffff8838f0a2>] :nvidia:_nv004835rm+0x74/0xd8
 [<ffffffff88112ece>] :nvidia:_nv002598rm+0x6e/0x94
 [<ffffffff88112cdb>] :nvidia:_nv002595rm+0xcd/0xee
 [<ffffffff882a1742>] :nvidia:_nv009103rm+0x8c/0xae
 [<ffffffff8811dfee>] :nvidia:_nv002597rm+0x1da/0x2d2
 [<ffffffff804922bf>] pci_conf1_read+0xbb/0xc6
 [<ffffffff8811dd2d>] :nvidia:_nv002600rm+0xef/0x1d6
 [<ffffffff8811da06>] :nvidia:_nv002603rm+0x42/0x27a
 [<ffffffff88143883>] :nvidia:rm_set_interrupts+0x11f/0x136
 [<ffffffff8844834e>] :nvidia:os_acquire_sema+0x5f/0x77
 [<ffffffff88119592>] :nvidia:_nv004373rm+0x70/0xaa
 [<ffffffff88146461>] :nvidia:_nv002552rm+0x1a9/0x63a
 [<ffffffff88143b4d>] :nvidia:rm_ioctl+0x9/0xe
 [<ffffffff884450da>] :nvidia:nv_kern_ioctl+0x35a/0x3eb
 [<ffffffff884451aa>] :nvidia:nv_kern_unlocked_ioctl+0x1c/0x23
 [<ffffffff8024152d>] do_ioctl+0x21/0x6b
 [<ffffffff8022fa49>] vfs_ioctl+0x252/0x26b
 [<ffffffff8022fa83>] __up_write+0x21/0x10d
 [<ffffffff8024bf49>] sys_ioctl+0x3c/0x5c
 [<ffffffff8025d452>] system_call+0x7e/0x83


Code:  Bad RIP value.
RIP  [<ffffffff6d723638>]
 RSP <ffff81007f1bbba0>
CR2: ffffffff6d723638
It looks like essentially the same problem. This time the invalid address isn't ffffffff6d723538 but ffffffff6d723638.
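A side observation on these addresses, offered as a sketch and not as a diagnosis: the low 32 bits of each faulting address are printable ASCII when read in little-endian (in-memory) byte order. The helper name `addr_to_ascii` below is hypothetical. The decoded text resembles the tail of the `_nv…rm` symbol names in the call traces, which may hint that a pointer ended up holding string data; that interpretation is an assumption and nothing in the thread confirms it.

```shell
#!/bin/sh
# addr_to_ascii HEXADDR: print the low 32 bits of HEXADDR as four
# characters, least significant byte first (x86 in-memory order).
addr_to_ascii() {
    low=$(printf '%s' "$1" | tail -c 8)    # last 8 hex digits
    out=""
    for pos in 7 5 3 1; do                 # byte pairs, lowest byte first
        byte=$(printf '%s' "$low" | cut -c "$pos-$((pos + 1))")
        # Convert the hex byte to octal, then let printf emit that character.
        out="$out$(printf "\\$(printf '%03o' "$((0x$byte))")")"
    done
    printf '%s\n' "$out"
}

addr_to_ascii ffffffff6d723538   # prints "85rm"
addr_to_ascii ffffffff6d723638   # prints "86rm"
```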


I attached the .config of the 2.6.18.6 kernel I used.
Attached Files
File Type: txt config-2.6.18.6.txt (36.7 KB, 88 views)
Old 01-18-07, 11:39 AM   #8
netllama
NVIDIA Corporation
 
Join Date: Dec 2004
Posts: 8,763
Default Re: kernel page fault on modprobe of 9746's nvidia.ko

You stated that you "restarted". What did you restart?

Does reinstalling 1.0-9746 have any impact?

Also, please post a new bug report.

thanks,
Lonni

Old 01-18-07, 02:51 PM   #9
baswesterbaan
Registered User
 
Join Date: Jan 2007
Posts: 11
Default Re: kernel page fault on modprobe of 9746's nvidia.ko

I rebooted the entire system.

I'll try to reproduce it now. First I made sure there wasn't any leftover nvidia module that udevd could load on startup, with a `rm `find -name nvidia.ko`'.
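Note that `find -name nvidia.ko` with no starting directory searches from the current directory (a GNU find default), so the result depends on where the command is run. A slightly more defensive sketch lists the matches first and deletes them in a separate step. The function names and the choice of /lib/modules as the search root are assumptions, not something stated in the post.

```shell
#!/bin/sh
# list_nvidia_modules DIR: print every nvidia.ko found under DIR.
list_nvidia_modules() {
    find "$1" -type f -name nvidia.ko -print
}

# remove_nvidia_modules DIR: delete every nvidia.ko found under DIR.
remove_nvidia_modules() {
    find "$1" -type f -name nvidia.ko -exec rm -f {} +
}

# Typical use (as root); review the list before deleting:
#   list_nvidia_modules /lib/modules
#   remove_nvidia_modules /lib/modules
```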

Then I rebooted into the 2.6.18.6 kernel.

I prevented Xorg from starting (it would only complain about the missing module) and reinstalled the nvidia-drivers from the VT.

To make sure this isn't all a filesystem corruption bug I noted the md5sum:

Code:
henk ~ # md5sum /lib/modules/2.6.18.6/video/nvidia.ko 
c9ce219812c25a258ee2bed93222214e  /lib/modules/2.6.18.6/video/nvidia.ko
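The same before/after check can be automated by saving the checksum to a file and letting `md5sum -c` do the comparison after the reboot. This is a minimal sketch; the helper names and the /root/nvidia.ko.md5 location are placeholders chosen for illustration.

```shell
#!/bin/sh
# record_sum FILE SUMFILE: store FILE's md5sum in SUMFILE.
record_sum() { md5sum "$1" > "$2"; }

# verify_sum SUMFILE: succeed only if every file listed still matches.
verify_sum() { md5sum -c "$1" > /dev/null 2>&1; }

# Before the reboot:
#   record_sum /lib/modules/2.6.18.6/video/nvidia.ko /root/nvidia.ko.md5
# After the reboot:
#   verify_sum /root/nvidia.ko.md5 || echo "nvidia.ko changed on disk"
```

If the disk itself is suspect, it may be safer to keep the checksum file on a different device, since a file stored next to the module could be damaged in the same way.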
I ran `modprobe nvidia', and it worked just fine. I started Xorg, which also worked fine.
I created a bug report log.

Then I shut down Linux, powered the system off completely, and booted the computer again into the 2.6.18.6 kernel.

udevd automatically loaded the nvidia kernel module for me, and during init I saw a kernel page fault scroll by. Init continued normally. This time I again prevented Xorg from starting, because it would just hang.

I checked the nvidia module, and to my amazement:

Code:
henk ~ # md5sum /lib/modules/2.6.18.6/video/nvidia.ko 
a59547309e66c1e7d98ee6de0f9e26dc  /lib/modules/2.6.18.6/video/nvidia.ko
It has changed! I moved this `changed' module to nvidia.old.ko and reinstalled the drivers:

Code:
henk ~ # md5sum /lib/modules/2.6.18.6/video/*         
c9ce219812c25a258ee2bed93222214e  /lib/modules/2.6.18.6/video/nvidia.ko
a59547309e66c1e7d98ee6de0f9e26dc  /lib/modules/2.6.18.6/video/nvidia.old.ko
So what's the difference?

Code:
henk video # objdump -D nvidia.ko > nvidia.ko.dump
henk video # objdump -D nvidia.old.ko > nvidia.old.ko.dump 
henk video # diff nvidia.ko.dump nvidia.old.ko.dump 
2c2
< nvidia.ko:     file format elf64-x86-64
---
> nvidia.old.ko:     file format elf64-x86-64
objdump shows no difference apart from the file name.
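A byte-level comparison avoids any question of which parts of the file a disassembly covers. The sketch below wraps `cmp -l`, which reports each differing byte with a 1-based decimal offset and octal values, and converts the output to 0-based hexadecimal offsets and hex bytes. The helper name `byte_diff` is hypothetical.

```shell
#!/bin/sh
# byte_diff A B: print "offset old new" (all hex, offset 0-based)
# for every byte that differs between files A and B.
byte_diff() {
    cmp -l "$1" "$2" | while read -r off a b; do
        # cmp -l prints a 1-based decimal offset and octal byte values.
        printf '%08x %02x %02x\n' "$(($off - 1))" "$((0$a))" "$((0$b))"
    done
}

# Usage with the two modules from the post:
#   byte_diff nvidia.ko nvidia.old.ko
```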

Then I tried a diff on two hexdumps:

Code:
henk video # hexdump nvidia.ko > nvidia.ko.dump
henk video # hexdump nvidia.old.ko > nvidia.old.ko.dump
henk video # diff nvidia.ko.dump nvidia.old.ko.dump 
424513c424513
< 06f09f0 c77c 0001 0000 0000 0002 0000 16db 0000
---
> 06f09f0 c77c 0001 0000 0000 0002 0000 17db 0000
424528c424528
< 06f0ae0 c9d8 0001 0000 0000 0002 0000 16db 0000
---
> 06f0ae0 c9d8 0001 0000 0000 0002 0000 17db 0000
424530c424530
< 06f0b00 000b 0000 16db 0000 0000 0000 0000 0000
---
> 06f0b00 000b 0000 17db 0000 0000 0000 0000 0000
425382c425382
< 06f4040 0002 0000 16db 0000 fffb ffff ffff ffff
---
> 06f4040 0002 0000 17db 0000 fffb ffff ffff ffff
426295c426295
< 06f7950 fe5c 0002 0000 0000 0002 0000 36df 0000
---
> 06f7950 fe5c 0002 0000 0000 0002 0000 37df 0000
435540c435540
< 071bb20 0002 0000 16fd 0000 fffc ffff ffff ffff
---
> 071bb20 0002 0000 17fd 0000 fffc ffff ffff ffff
459937c459937
< 077aff0 042b 0000 0000 0000 0002 0000 36db 0000
---
> 077aff0 042b 0000 0000 0000 0002 0000 37db 0000
Is this just some housekeeping that Linux performs on modules, or is it the cause of the crash?
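One quick way to characterize the hexdump diff above is to XOR each pair of differing 16-bit words. The sketch below does that for the four distinct pairs in the output; the helper name `xor_words` is hypothetical. Every pair differs by the same single bit (0x0100), a pattern that looks more like bit-level corruption than a deliberate rewrite of the file, although this alone does not show where the corruption happens.

```shell
#!/bin/sh
# xor_words OLD NEW: print the XOR of two 16-bit hex words.
xor_words() { printf '0x%04x\n' "$((0x$1 ^ 0x$2))"; }

# Distinct differing word pairs taken from the hexdump diff.
for pair in 16db:17db 36df:37df 16fd:17fd 36db:37db; do
    xor_words "${pair%%:*}" "${pair##*:}"   # prints 0x0100 each time
done
```

Because hexdump's default output shows 16-bit words in host byte order, bit 8 of a displayed word corresponds to the lowest bit of the second byte of that pair on a little-endian x86 machine.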

This is what I found in dmesg:

Code:
NVRM: loading NVIDIA UNIX x86_64 Kernel Module  1.0-9746  Fri Dec 15 10:19:35 PST 2006
Unable to handle kernel paging request at ffffffff6d723938 RIP: 
 [<ffffffff880f7428>] :nvidia:nvidia_init_module+0x428/0x7aa
PGD 203027 PUD 0 
Oops: 0000 [1] PREEMPT SMP 
CPU 0 
Modules linked in: nvidia parport_pc parport ohci_hcd uhci_hcd eth1394 bcraid
Pid: 2568, comm: modprobe Tainted: P      2.6.18.6 #2
RIP: 0010:[<ffffffff880f7428>]  [<ffffffff880f7428>] :nvidia:nvidia_init_module+0x428/0x7aa
RSP: 0018:ffff81003de9ff08  EFLAGS: 00010282
RAX: ffffffff88875240 RBX: 0000000000000000 RCX: ffff81003de9fe28
RDX: ffff81003dc80c80 RSI: ffffffff8854c7cf RDI: 0000033d00000000
RBP: 0000000000000000 R08: 0000000000000000 R09: ffff81003dc80a40
R10: ffff81003dc80c80 R11: ffff8100021ed000 R12: 0000000000972eed
R13: 00002b80ee39c010 R14: 00000000005080e8 R15: ffff81003d6ad740
FS:  00002b80ee39aae0(0000) GS:ffffffff806c6000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffff6d723938 CR3: 000000003d068000 CR4: 00000000000006e0
Process modprobe (pid: 2568, threadinfo ffff81003de9e000, task ffff810037de2740)
Stack:  ffffffff805c6de0 ffffffff80221455 ffffffff805c6de0 ffffffff88875240
 0000000000508120 0000000000972eed 00002b80ee39c010 00000000005080e8
 00002b80ee39c010 ffffffff8029a8c4 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff80221455>] __up_read+0x13/0x8a
 [<ffffffff8029a8c4>] sys_init_module+0xaf/0x228
 [<ffffffff8025d452>] system_call+0x7e/0x83


Code: 48 8b 15 09 c5 62 e5 48 89 42 48 83 3d b6 1c 78 00 00 0f 84 
RIP  [<ffffffff880f7428>] :nvidia:nvidia_init_module+0x428/0x7aa
 RSP <ffff81003de9ff08>
CR2: ffffffff6d723938
This one looks surprisingly similar to the earlier ones, where the page fault was also at an f..f6.. address.

I made a second nv bug report. This will be the 'after' one.
Attached Files
File Type: bz2 nvidia-bug-report-before.log.bz2 (17.4 KB, 77 views)
File Type: bz2 nvidia-bug-report-after.log.bz2 (17.4 KB, 81 views)
Old 01-18-07, 04:04 PM   #10
netllama
NVIDIA Corporation
 
Join Date: Dec 2004
Posts: 8,763
Default Re: kernel page fault on modprobe of 9746's nvidia.ko

If the md5sum of the nvidia kernel module is changing during reboots, then that seems like an OS or hardware problem. There's certainly nothing in the nvidia driver itself that would cause such behavior.
Old 01-21-07, 07:03 AM   #11
baswesterbaan
Registered User
 
Join Date: Jan 2007
Posts: 11
Default Re: kernel page fault on modprobe of 9746's nvidia.ko

Quote:
Originally Posted by netllama
If the md5sum of the nvidia kernel module is changing during reboots, then that seems like an OS or hardware problem. There's certainly nothing in the nvidia driver itself that would cause such behavior.
OK, I looked into that a bit more. I found that nvidia.ko gets corrupted not only in its installed location under the modules directory, but practically anywhere on the disk. I believe this is caused by a rare bug in my RAID drivers.

At the moment I'm running my Linux system from a normal IDE hard disk (I haven't even loaded the RAID drivers). Now installing the latest drivers works fine, and so does restarting.

However, the random crashes I experienced (which was the original reason to upgrade to the latest drivers) still persist. Because the raid drivers aren't loaded, they can't be the problem.

After a while the whole system freezes. SysRq doesn't respond. sshd doesn't respond. My two monitors still display everything I was doing, but frozen. No artifacts, though.

I've attached the new bug report.
Attached Files
File Type: log nvidia-bug-report.log (72.4 KB, 70 views)
Old 01-21-07, 07:06 AM   #12
baswesterbaan
Registered User
 
Join Date: Jan 2007
Posts: 11
Default Re: kernel page fault on modprobe of 9746's nvidia.ko

Oh, maybe useful to note: there aren't any records of the crash in /var/log/messages or in any other log file after reboot. I guess everything freezes, which prevents logging.