01-18-07, 02:51 PM   #9
baswesterbaan
Re: kernel page fault on modprobe of 9746's nvidia.ko

I rebooted the entire system.

I'll try to reproduce it now. First I made sure there wasn't any leftover nvidia module that udevd could load on startup, with a `rm `find -name nvidia.ko`'.

Then I rebooted into the 2.6.18.6 kernel.

I prevented Xorg from starting (it would only complain about the missing module) and reinstalled nvidia-drivers from a VT.

To rule out a filesystem corruption bug, I noted the md5sum of the freshly installed module:

Code:
henk ~ # md5sum /lib/modules/2.6.18.6/video/nvidia.ko 
c9ce219812c25a258ee2bed93222214e  /lib/modules/2.6.18.6/video/nvidia.ko
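Noting the checksum by hand works, but the comparison can also be left to md5sum itself. The following is only a sketch: record_sum and verify_sum are made-up helper names, and the location of the .md5 file in the usage comment is an assumption, not something from this setup.

```shell
# Hypothetical helpers (not part of the original procedure).

# record_sum FILE SUMFILE: store FILE's md5 line in SUMFILE.
record_sum() {
    md5sum "$1" > "$2"
}

# verify_sum SUMFILE: re-read the file named in SUMFILE and compare.
# md5sum -c returns non-zero if the contents no longer match.
verify_sum() {
    md5sum -c "$1"
}

# Usage (the .md5 path is an assumption; keep it off the suspect filesystem if possible):
#   record_sum /lib/modules/2.6.18.6/video/nvidia.ko /root/nvidia.ko.md5
#   ... reboot ...
#   verify_sum /root/nvidia.ko.md5
```

Giving record_sum an absolute path keeps the check independent of the current directory after the reboot.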
I ran `modprobe nvidia', and it worked just fine. I started Xorg, which also worked fine.
I created a bug report log (the 'before' one).

Then I shut down Linux, powered the machine off completely, booted it again, and selected the 2.6.18.6 kernel.

udevd automatically loaded the nvidia kernel module, and during init I saw a kernel page fault scroll by. Init continued normally. This time I again prevented Xorg from starting, because it would just hang.

I check the nvidia module, and to my amazement:

Code:
henk ~ # md5sum /lib/modules/2.6.18.6/video/nvidia.ko 
a59547309e66c1e7d98ee6de0f9e26dc  /lib/modules/2.6.18.6/video/nvidia.ko
It has changed! I moved this `changed' module to nvidia.old.ko and reinstalled the drivers:

Code:
henk ~ # md5sum /lib/modules/2.6.18.6/video/*         
c9ce219812c25a258ee2bed93222214e  /lib/modules/2.6.18.6/video/nvidia.ko
a59547309e66c1e7d98ee6de0f9e26dc  /lib/modules/2.6.18.6/video/nvidia.old.ko
So what's the difference?

Code:
henk video # objdump -D nvidia.ko > nvidia.ko.dump
henk video # objdump -D nvidia.old.ko > nvidia.old.ko.dump 
henk video # diff nvidia.ko.dump nvidia.old.ko.dump 
2c2
< nvidia.ko:     file format elf64-x86-64
---
> nvidia.old.ko:     file format elf64-x86-64
Apart from the file name, objdump doesn't notice any difference.
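When a disassembly diff comes up empty, a plain byte-level comparison is another option. A minimal sketch, assuming a cmp that supports -l (it prints the 1-based decimal offset and the two byte values in octal); list_byte_diffs is a made-up helper name that converts that output to hex:

```shell
# Hypothetical helper: list differing bytes of two files as
# "zero-based hex offset, byte in first file, byte in second file".
list_byte_diffs() {
    cmp -l "$1" "$2" | while read -r offset old new; do
        # cmp -l counts offsets from 1 and prints byte values in octal;
        # a leading 0 makes the shell read them as octal constants.
        printf '%08x %02x %02x\n' $(( offset - 1 )) $(( 0$old )) $(( 0$new ))
    done
}

# Usage:
#   list_byte_diffs nvidia.ko nvidia.old.ko
```

This prints one line per differing byte, so it shows exactly which bytes changed instead of whole 16-byte rows.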

Then I tried a diff on two hexdumps:

Code:
henk video # hexdump nvidia.ko > nvidia.ko.dump
henk video # hexdump nvidia.old.ko > nvidia.old.ko.dump
henk video # diff nvidia.ko.dump nvidia.old.ko.dump 
424513c424513
< 06f09f0 c77c 0001 0000 0000 0002 0000 16db 0000
---
> 06f09f0 c77c 0001 0000 0000 0002 0000 17db 0000
424528c424528
< 06f0ae0 c9d8 0001 0000 0000 0002 0000 16db 0000
---
> 06f0ae0 c9d8 0001 0000 0000 0002 0000 17db 0000
424530c424530
< 06f0b00 000b 0000 16db 0000 0000 0000 0000 0000
---
> 06f0b00 000b 0000 17db 0000 0000 0000 0000 0000
425382c425382
< 06f4040 0002 0000 16db 0000 fffb ffff ffff ffff
---
> 06f4040 0002 0000 17db 0000 fffb ffff ffff ffff
426295c426295
< 06f7950 fe5c 0002 0000 0000 0002 0000 36df 0000
---
> 06f7950 fe5c 0002 0000 0000 0002 0000 37df 0000
435540c435540
< 071bb20 0002 0000 16fd 0000 fffc ffff ffff ffff
---
> 071bb20 0002 0000 17fd 0000 fffc ffff ffff ffff
459937c459937
< 077aff0 042b 0000 0000 0000 0002 0000 36db 0000
---
> 077aff0 042b 0000 0000 0000 0002 0000 37db 0000
Is this just some housekeeping Linux does on modules, or is this the cause of the crash?
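One thing the changed words have in common shows up when each pair is XORed. The sketch below uses the four distinct word pairs from the diff above; xor_words is a made-up helper name. Note that hexdump's default output is 16-bit words in host byte order, so on this little-endian machine `16db' stands for the bytes db 16.

```shell
# Hypothetical helper: XOR two 16-bit words given as hex strings.
xor_words() {
    printf '%04x\n' $(( 0x$1 ^ 0x$2 ))
}

# The distinct old:new word pairs that appear in the hexdump diff.
for pair in 16db:17db 36df:37df 16fd:17fd 36db:37db; do
    xor_words "${pair%%:*}" "${pair##*:}"
done
```

Every pair gives 0100, so in each changed word exactly one bit differs, and it is the same bit every time: the lowest bit of the second byte (16 became 17, 36 became 37). That alone doesn't say what flipped it.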

This is what I found in dmesg:

Code:
NVRM: loading NVIDIA UNIX x86_64 Kernel Module  1.0-9746  Fri Dec 15 10:19:35 PST 2006
Unable to handle kernel paging request at ffffffff6d723938 RIP: 
 [<ffffffff880f7428>] :nvidia:nvidia_init_module+0x428/0x7aa
PGD 203027 PUD 0 
Oops: 0000 [1] PREEMPT SMP 
CPU 0 
Modules linked in: nvidia parport_pc parport ohci_hcd uhci_hcd eth1394 bcraid
Pid: 2568, comm: modprobe Tainted: P      2.6.18.6 #2
RIP: 0010:[<ffffffff880f7428>]  [<ffffffff880f7428>] :nvidia:nvidia_init_module+0x428/0x7aa
RSP: 0018:ffff81003de9ff08  EFLAGS: 00010282
RAX: ffffffff88875240 RBX: 0000000000000000 RCX: ffff81003de9fe28
RDX: ffff81003dc80c80 RSI: ffffffff8854c7cf RDI: 0000033d00000000
RBP: 0000000000000000 R08: 0000000000000000 R09: ffff81003dc80a40
R10: ffff81003dc80c80 R11: ffff8100021ed000 R12: 0000000000972eed
R13: 00002b80ee39c010 R14: 00000000005080e8 R15: ffff81003d6ad740
FS:  00002b80ee39aae0(0000) GS:ffffffff806c6000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffff6d723938 CR3: 000000003d068000 CR4: 00000000000006e0
Process modprobe (pid: 2568, threadinfo ffff81003de9e000, task ffff810037de2740)
Stack:  ffffffff805c6de0 ffffffff80221455 ffffffff805c6de0 ffffffff88875240
 0000000000508120 0000000000972eed 00002b80ee39c010 00000000005080e8
 00002b80ee39c010 ffffffff8029a8c4 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff80221455>] __up_read+0x13/0x8a
 [<ffffffff8029a8c4>] sys_init_module+0xaf/0x228
 [<ffffffff8025d452>] system_call+0x7e/0x83


Code: 48 8b 15 09 c5 62 e5 48 89 42 48 83 3d b6 1c 78 00 00 0f 84 
RIP  [<ffffffff880f7428>] :nvidia:nvidia_init_module+0x428/0x7aa
 RSP <ffff81003de9ff08>
CR2: ffffffff6d723938
This one looks surprisingly similar to the other ones, where we also got page faults at f..f6.. addresses.
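The faulting address can be cross-checked against the Code: bytes in the oops. Assuming those bytes start at RIP (an assumption, though the arithmetic supports it), 48 8b 15 09 c5 62 e5 is a 7-byte RIP-relative load, mov with a 32-bit displacement of 0xe562c509 (the last four bytes, little-endian). Its target is the address of the next instruction plus the sign-extended displacement. The sketch below does that sum in 32-bit halves so it works in any POSIX shell; rip_relative_target is a made-up helper name.

```shell
# Hypothetical helper: rip_relative_target RIP_HI RIP_LO INSN_LEN DISP32
# All values are hex without 0x, except INSN_LEN (decimal).
rip_relative_target() {
    rip_hi=$1; rip_lo=$2; insn_len=$3; disp32=$4
    # Sign-extend the 32-bit displacement to 64 bits.
    if [ $(( 0x$disp32 & 0x80000000 )) -ne 0 ]; then
        disp_hi=$(( 0xffffffff ))
    else
        disp_hi=0
    fi
    # Low half: RIP of the next instruction plus displacement (may carry).
    lo=$(( 0x$rip_lo + $insn_len + 0x$disp32 ))
    # High half: add the carry out of the low half, wrap at 32 bits.
    hi=$(( (0x$rip_hi + $disp_hi + ($lo >> 32)) & 0xffffffff ))
    printf '%08x%08x\n' "$hi" $(( $lo & 0xffffffff ))
}

# Values from the oops: RIP ffffffff880f7428, 7-byte instruction.
rip_relative_target ffffffff 880f7428 7 e562c509
```

This prints ffffffff6d723938, which is exactly the CR2 value. So the page fault comes from the first instruction at nvidia_init_module+0x428 reading through a RIP-relative reference that points far outside any mapped area.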

I made a second nv bug report. This will be the 'after' one.
Attached Files
File Type: bz2 nvidia-bug-report-before.log.bz2 (17.4 KB, 82 views)
File Type: bz2 nvidia-bug-report-after.log.bz2 (17.4 KB, 85 views)