11-14-06, 08:07 PM   #1
bwkaz
Registered User
 
Join Date: Sep 2002
Posts: 2,262
Pixmap corruption with 32-bit-nvidia_drv.o / 64-bit-nvidia.ko

I'll first explain what I'm doing, and why. I have two kernels (one 32-bit and one 64-bit). I have only one userspace (all of which is 32-bit binaries). My X server must run under both kernels, so it must be a 32-bit binary. This means that it must use the 32-bit nvidia_drv.o. However, the nvidia.ko driver must match the current kernel's bit-ness -- if I want to load it under the 64-bit kernel, I have to load the 64-bit driver. (I need the 64-bit kernel because I have another system that's being built whose userspace stuff is 64-bit, and I chroot into it.)
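
As a sanity check on the bit-ness of each piece, the ELF header can be read directly. The sketch below is only an illustration, assuming nvidia_drv.o and nvidia.ko are ordinary ELF objects. The function names are made up for this example, and the paths in the usage comment are placeholders, not the real locations on this system. It relies on the ELF convention that byte 4 of the file (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit objects.

```python
ELF_MAGIC = b"\x7fELF"
# EI_CLASS values defined by the ELF format
EI_CLASS_NAMES = {1: "32-bit", 2: "64-bit"}


def elf_class(header: bytes) -> str:
    """Return '32-bit' or '64-bit' given the first bytes of an ELF file."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class = header[4]
    if ei_class not in EI_CLASS_NAMES:
        raise ValueError(f"unknown EI_CLASS value {ei_class}")
    return EI_CLASS_NAMES[ei_class]


def elf_class_of_file(path: str) -> str:
    """Read just enough of the file at 'path' to classify it."""
    with open(path, "rb") as f:
        return elf_class(f.read(5))


# Hypothetical usage (placeholder paths, adjust for the actual install):
# print(elf_class_of_file("/path/to/nvidia_drv.o"))
# print(elf_class_of_file("/path/to/nvidia.ko"))
```

In the setup described here, the X driver would be expected to report 32-bit, and the kernel module should report whatever matches the running kernel.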

I install the drivers somewhat strangely. First, I run the 32-bit installer, which works fine. Then, I reboot into the 64-bit kernel and use --extract-only on the x86_64 installer. I manually build and install its kernel driver (because I have to modify my $PATH to get the 64-bit gcc 4.1.1 cross-compiler in it, I have to set CC for the nvidia checks, and I have to set both ARCH and CROSS_COMPILE for the kernel to figure out what's going on). I do not install the 64-bit userspace stuff, because none of my userspace programs will use it. So the attached nvidia-installer.log (in nvidia-bug-report.log) is for the 32-bit stuff only, not the 64-bit kernel driver.
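
The environment for that manual build can be sketched roughly as follows. This is an assumption-laden illustration, not the exact procedure used: the toolchain directory and cross prefix are hypothetical, the helper name is invented, and the actual make invocation is left out because it depends on the extracted installer. It only shows the four settings mentioned above: the cross-compiler on $PATH, CC for the nvidia checks, and ARCH plus CROSS_COMPILE for the kernel build system.

```python
import os


def cross_build_env(toolchain_bin, cross_prefix, arch="x86_64", base_env=None):
    """Return a copy of the environment set up for a cross-compiled module build.

    toolchain_bin -- directory holding the cross-compiler (hypothetical path)
    cross_prefix  -- tool prefix such as 'x86_64-unknown-linux-gnu-' (hypothetical)
    """
    env = dict(os.environ if base_env is None else base_env)
    old_path = env.get("PATH", "")
    # Put the cross toolchain first so its gcc is found before the native one.
    env["PATH"] = toolchain_bin + (os.pathsep + old_path if old_path else "")
    env["CC"] = cross_prefix + "gcc"      # compiler for the nvidia checks
    env["ARCH"] = arch                    # target architecture for the kernel build
    env["CROSS_COMPILE"] = cross_prefix   # kernel build prepends this to tool names
    return env


# Hypothetical usage: pass the result as env= to subprocess.run() together
# with whatever make command the extracted kernel driver directory expects.
# env = cross_build_env("/opt/cross/bin", "x86_64-unknown-linux-gnu-")
```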

None of this has been an issue so far; I've been able to use a 32-bit nvidia_drv.o with both 32-bit and 64-bit nvidia.ko files. I've used 8774, the 9625 beta, and 9629, all (mostly) successfully. But now I'm using 9742, and it looks like it's broken (see below).

I went to 9742 because of some very strange periodic lockups I was seeing under both 9625 and 9629. Every once in a while, gkrellm would start showing about a second of 100% CPU usage in the kernel, followed by a five- or ten-second complete lockup (not even timer interrupts were being handled), followed by another second of 100% in-kernel CPU usage. Then the pattern repeated maybe two or three seconds later. If I managed to get the X server to exit, this behavior completely stopped.

Anyway, I think 9742 may have fixed that one. The problem is, now mixing 32-bit userspace with the 64-bit kernel is broken. I get seemingly random pixmap corruption; see the attached screenshot of gkrellm for instance. The screenshot file is the right size, but has the wrong pixels. (Note that at the time this screenshot was taken, the gkrellm window was fully exposed, and the screenshot's contents are not what I was seeing on the screen. But it illustrates the problem well enough -- it's similar to what I was seeing -- so I'll post it anyway.)

Running a 32-bit kernel with this X server works fine. Booting into my other 64-bit-X-server system and running its 9742 user driver against the 64-bit 9742 kernel driver also works fine. But 32-bit userspace + 64-bit kernel driver gives me grief.

Stuff I've already looked at that isn't in the log file I'm going to attach:

Booting with noapic: No change, still corrupts stuff
Booting with pci=noacpi: Kernel hangs at boot time just after initializing vesafb
Booting with acpi=off: Same kernel hang

I first saw this with the RenderAccel option on; turning it off did not help. (It's off in the log file.) I also had AGP fast writes and SBA turned on using module parameters (though fast writes are not supported by my host bridge); turning those off didn't actually change anything in the /proc/driver/nvidia/agp/status file (which is odd...), and didn't fix the corruption issue either. (They're set to zero in the log.)

Anything else I should look at, or is this a known bug in the beta driver? If it's known, I assume it should be fixed whenever 97xx is released (or possibly in the next beta), right?

Thanks!

(Edit: Apparently libGL-using programs are segfaulting too, but I'm going to assume for the moment that that's a separate issue. We'll see. It seems to be both 32-bit libGL from the same system as the 64-bit kernel driver, and 64-bit libGL from the other system inside chroot. I wonder if I broke the kernel module...)

(Edit again: I have a GF 6800 GT, so it's not the NV2x issue. In case that isn't obvious from nvidia-bug-report.log.)
Attached Thumbnails
gkrellm.png (37.3 KB)
Attached Files
nvidia-bug-report.log (116.3 KB)
__________________
Registered Linux User #219692

11-20-06, 05:53 PM   #2
bwkaz
Re: Pixmap corruption with 32-bit-nvidia_drv.o / 64-bit-nvidia.ko

OK, I don't think it's something wrong with my kernel module.

I booted into my 64-bit multilib system, and rebuilt 2.6.18 (after "make mrproper"). Then I installed its modules, and copied the kernel to that system's /boot directory. Then I copied the same kernel to my other system's /boot directory, and also copied its entire module tree.

Then, after verifying that both systems worked (they did), I booted into the 64-bit multilib system, and installed the driver using the .run file (with 64- and 32-bit libraries, not using a chroot path). I then copied the nvidia.ko file to the other (non-multilib) system.

Then I booted back to the non-multilib system, under the 64-bit kernel, did the required "depmod -ae", and tried to startx -- and got the exact same pixmap corruption.

This tells me that the problem really looks like some kind of incompatibility (probably accidental) between the 32-bit nvidia_drv.o and the 64-bit nvidia.ko. Is anybody else seeing this at all, or is everyone else running a 64-bit Xorg, or what?

Thanks!

11-20-06, 10:14 PM   #3
AaronP
NVIDIA Corporation
Join Date: Mar 2005
Posts: 2,487
Re: Pixmap corruption with 32-bit-nvidia_drv.o / 64-bit-nvidia.ko

Hi bwkaz,
I've reproduced this problem and opened bug 269306. Thanks!

11-23-06, 11:37 AM   #4
bwkaz
Re: Pixmap corruption with 32-bit-nvidia_drv.o / 64-bit-nvidia.ko

Just out of curiosity, is the bug database public, so I can look at the state of this bug from time to time? I suspect not, but it can't hurt to ask.

Either way, thanks! I'll try to remember to check for new betas from time to time, too, in case you release another beta that has the issue fixed.

12-01-06, 04:41 PM   #5
AaronP
Re: Pixmap corruption with 32-bit-nvidia_drv.o / 64-bit-nvidia.ko

I'm afraid the bug database is only available to NVIDIA employees. You can ask for a status update here or by emailing linux-bugs@nvidia.com. Anyway, this bug should be fixed in the next 1.0-97xx release.