#1
Registered User
Join Date: Apr 2003
Posts: 38
Yes, I *know* it's a -mm kernel, and what that means.

Things worked well under 2.6.21-rc5-mm2. Having said that, I encountered problems on a Dell Latitude D820. lspci says:

01:00.0 VGA compatible controller: nVidia Corporation G72M [Quadro NVS 110M/GeForce Go 7300] (rev a1)

The symptom is a hard system crash when starting the X server: the screen goes black, no SysRq works, netconsole logging to another box doesn't catch anything, and a power cycle is needed to recover.

Bisection shows the offending patch to be x86_64-mm-__pa-and-__pa_symbol-address-space-separation.patch. This changes __pa() and provides a __pa_symbol() variant, which apparently tickles the __pa() usage in nv_get_kern_phys_address in nv.c (it *might* be one of the 2 uses in nv-linux.h, except those are apparently DMA-related, which means the new __pa_symbol() isn't applicable).

Unfortunately, my kernel-fu isn't quite strong enough to figure out what a proper fix is...
#2
NVIDIA Corporation
Join Date: Aug 2002
Posts: 3,661
Without having looked at the problem in any detail, my guess is that the patch does not consider kernel symbols tagged with __init, which are freed after initialization. If the NVIDIA graphics driver gets a page from this "pool", it will still call __pa() and not __pa_symbol(). It is unclear to me if drivers are supposed to account for this themselves; it would seem that retaining __pa()'s traditional behavior and adding optimized __pa_direct() and __pa_symbol() versions would be safer.
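To make the distinction concrete, here is a minimal userspace sketch of the three translation flavors discussed above. It is a model, not kernel source: the two base addresses are assumptions modeled on the x86_64 layout of that era, `phys_base` stands in for a relocated kernel's load offset, and the function names (`pa_direct`, `pa_symbol`, `pa_traditional`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed base addresses, modeled on the x86_64 layout of that era.
 * Treat the exact values as illustrative, not as kernel source. */
#define DIRECT_MAP_BASE UINT64_C(0xffff810000000000) /* linear map of RAM */
#define KERNEL_MAP_BASE UINT64_C(0xffffffff80000000) /* kernel image mapping */

/* Hypothetical load offset of a relocated kernel (0 = not relocated). */
static uint64_t phys_base = 0;

/* Linear-map-only translation: valid for ordinary allocated pages. */
static uint64_t pa_direct(uint64_t vaddr)
{
    return vaddr - DIRECT_MAP_BASE;
}

/* Symbol translation: valid for addresses inside the kernel image,
 * adjusted by the relocation offset. */
static uint64_t pa_symbol(uint64_t vaddr)
{
    return vaddr - KERNEL_MAP_BASE + phys_base;
}

/* "Traditional" behavior: accept either kind of address and pick the
 * right translation by range, so callers need not know the origin. */
static uint64_t pa_traditional(uint64_t vaddr)
{
    return vaddr >= KERNEL_MAP_BASE ? pa_symbol(vaddr) : pa_direct(vaddr);
}
```

In this model, feeding a kernel-image address to `pa_direct()` produces a physical address that is far off, which is the failure mode a driver would hit if it received such a page and only had the linear-map translation available.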
#3
Registered User
Join Date: Dec 2004
Posts: 87
zander: Could you post that on LKML, perchance?
#4
Registered User
Join Date: Apr 2003
Posts: 38
Fortunately, there are only two or three places where the nvidia driver plays with __pa, so it *should* be feasible to produce a patch. If properly coded, it should even work OK on older kernels (though I admit not being sure when __pa_symbol() got added).

Unfortunately, the patch *does* intentionally alter the default behavior of __pa(), which is why the problem happens. It actually simplifies the work involved in __pa() for the usual case, but with the advent of relocatable kernel support, you need to use __pa_symbol() for kernel symbols - if you use __pa() instead, you'll basically get the unrelocated address, which will almost certainly miss.

Damn, that reminds me - I should test if it works if the kernel is built with CONFIG_RELOCATABLE=n. If it does, that will tell us a bunch about what needs to be fixed.
#5
NVIDIA Corporation
Join Date: Aug 2002
Posts: 3,661
Assuming the problem is what I suspect it is, I don't think the NVIDIA Linux graphics driver should have to know that whatever piece of memory it allocated and needs to get the physical address of used to be a symbol. In any case, I'll try to take a closer look when I get a chance to.
#6
Registered User
Join Date: Apr 2003
Posts: 38
Following up - I rebuilt the kernel with CONFIG_RELOCATABLE=n, and the X server will now start with the stock 9755 code. So obviously RELOCATABLE and its interaction with the patch I identified is an issue.

Also, although it *starts*, it consistently takes a nasty panic on the way down:

[ 151.825464] Unable to handle kernel paging request at ffff81006a77fed0 RIP:
[ 151.825475] [<ffffffff80249480>] hrtimer_run_queues+0xde/0x188
[ 151.825487] PGD 8063 PUD a063 PMD 800000006a6901e3 BAD
[ 151.825495] Oops: 0009 [1] PREEMPT SMP
[ 151.825502] last sysfs file: devices/system/cpu/cpu0/cache/index2/shared_cpu_map
[ 151.825512] CPU 1

(That's all that netconsole to another machine captured - if I had a stacktrace, I'd list it.)
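For anyone reading the trace: the number after "Oops:" is the x86 page-fault error code, and its low bits are architecturally defined. Below is a small userspace sketch that decodes them; the helper name `describe_fault` and the wording of the output strings are my own, not anything the kernel prints.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* x86 page-fault error code bits (architecturally defined). */
enum {
    PF_PROT  = 1 << 0, /* set: fault on a present page; clear: not present */
    PF_WRITE = 1 << 1, /* set: write access; clear: read access */
    PF_USER  = 1 << 2, /* set: user mode; clear: kernel mode */
    PF_RSVD  = 1 << 3, /* set: reserved bit found in a page-table entry */
    PF_INSTR = 1 << 4  /* set: fault on an instruction fetch */
};

/* Hypothetical helper: render an error code as a readable summary. */
static void describe_fault(unsigned code, char *buf, size_t len)
{
    snprintf(buf, len, "%s, %s, %s mode%s%s",
             (code & PF_PROT)  ? "protection violation" : "page not present",
             (code & PF_WRITE) ? "write" : "read",
             (code & PF_USER)  ? "user" : "kernel",
             (code & PF_RSVD)  ? ", reserved bit set" : "",
             (code & PF_INSTR) ? ", instruction fetch" : "");
}
```

Calling `describe_fault(0x0009, buf, sizeof buf)` yields "protection violation, read, kernel mode, reserved bit set", which lines up with the "BAD" flag on the PMD entry in the trace: a kernel-mode read walked into a page-table entry with a reserved bit set.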
#7
Registered User
Join Date: Apr 2003
Posts: 38
OK, somebody on the lkml list today hit the same exact problem with a Radeon 9200se. So it's probably not an NVidia driver-specific problem.
#8
Registered User
Join Date: Apr 2003
Posts: 38
9755 and 100.14.003 both were borked on 21-rc6-mm*, but work OK on 21-rc7-mm1, 21-mm1 and 22-rc1-mm1 (am typing on an -rc1-mm1 kernel with Thomas Gleixner's dynticks patch). Yee-hah bleeding edge.
#9
NVIDIA Corporation
Join Date: Aug 2002
Posts: 3,661
That's good to know. Thanks for the update.