nV News Forums

nV News Forums (http://www.nvnews.net/vbulletin/index.php)
-   NVIDIA Linux (http://www.nvnews.net/vbulletin/forumdisplay.php?f=14)
-   -   >=3G installed crashes X server due to simplistic PCI allocation by kernel (http://www.nvnews.net/vbulletin/showthread.php?t=106114)

cksony1 01-12-08 02:42 PM

>=3G installed crashes X server due to simplistic PCI allocation by kernel
 
Apologies for starting a new thread about an issue I already raised in another thread, but I felt I should bring it to the attention of NVIDIA and the kernel developers under a more appropriate title.

I'm running a 64-bit (AMD64) system with 4G of memory and a GeForce Go 7600 (128M onboard). With every NVIDIA driver I have tried, the X server either crashes on startup or shows a black screen. Everything works fine with 2G installed. I traced the problem down to the following:

With 2G installed, the address map basically looks like this:
00000000 - 7FFFFFFF main memory
80000000 - AFFFFFFF system components
B0000000 - BFFFFFFF GeForce (256M)
C0000000 - CFFFFFFF free or system components
D0000000 - D1FFFFFF GeForce (32M)
D2000000 - DFFFFFFF free or system components
E0000000 - FFFFFFFF reserved by the BIOS, don't know why.
Everything ok here.

With 4G installed, things get crowded:
00000000 - BFFFFFFF main memory (3G only, as usual)
C0000000 - CFFFFFFF system components
D0000000 - D1FFFFFF GeForce (32M)
D2000000 - DFFFFFFF free or system components
E0000000 - FFFFFFFF reserved by the BIOS, don't know why.

Because there is no sufficiently large free region left for the 256M chunk, PCI allocation fails (the kernel log says so, too) and the kernel reprograms the chunk to start at 100000000, which is no longer a 32-bit address. That in turn is a problem for the X server, which tries in vain to correct the memory allocation and then fails or crashes.

So, some questions:

- I have 128M of video memory and an AGP aperture of 256M (at least according to Vista; the notebook BIOS doesn't show it), so why does it allocate 32M + 256M?

- Could I patch my kernel, much as was suggested in the related thread titled "dual-channel RAMs", to allocate the 256M at C0000000-CFFFFFFF before any other PCI devices are allocated? This is what Vista does, as I can tell from the device resources in the Device Manager.

Best regards,
Chris

chunkey 01-12-08 03:13 PM

Re: >=3G installed crashes X server due to simplistic PCI allocation by kernel
 
Have you tried the latest BIOS yet? (Yours is BUGGY!!!)
Another thing: could you please generate an nvidia-bug-report.log? (See the sticky "If you have a problem, PLEASE read this first".)

regards,
Christian

cksony1 01-12-08 03:26 PM

Re: >=3G installed crashes X server due to simplistic PCI allocation by kernel
 
Well, it is certainly buggy, since assigning the GeForce to B0000000 when that range is actually main memory is not a good idea. Still, this is a notebook (Sony Vaio FE31M) and Sony only supports up to 2G of main memory. So from their POV the BIOS is OK, and yes, I have the newest revision flashed.

I'll file a bug report, yes. However, this is more of a bug in the Linux kernel's PCI code than an NVIDIA driver bug.

cksony1 01-12-08 05:06 PM

Re: >=3G installed crashes X server due to simplistic PCI allocation by kernel
 
Ok, this was my very first linux kernel hack.

I went to arch/i386/pci/i386.c and hacked the PCI address allocation code as suggested in the other thread. I checked for the wrong address range B0000000-BFFFFFFF and replaced it with C0000000-CFFFFFFF. The devices that come later in the bus list then fail to allocate, but they are properly reallocated by the code that handles unassigned PCI devices.

My X server now starts properly with 4GB installed, and I have MemTotal=3021M in /proc/meminfo.

The lesson to be learned is probably that the kernel's PCI assignment code should be even more careful than it already is about wrong BIOS settings when a lot of main memory is installed.

To those of you who are more familiar with the usual way of doing things than I am: what would be the best way to make the kernel developers aware of the issue? I'm sure a lot of folks will run into it, given the low DRAM prices.

It would also be nice if future NVIDIA drivers could at least detect addresses above 4G being assigned to the PCI device and print a suitable message.

Thanks for reading this :-)

pe1chl 01-13-08 05:12 AM

Re: >=3G installed crashes X server due to simplistic PCI allocation by kernel
 
I think the kernel developers are aware of this issue, and they want to avoid putting hacks that are specific to a certain buggy BIOS into the mainline kernel, because such hacks might affect other systems that work OK without them.
This whole issue is only temporary, because people are not only using more than 2GB of memory these days, they are also getting 64-bit capable processors and installing 64-bit versions of Linux. That solves the issue.

cksony1 01-13-08 07:15 AM

Re: >=3G installed crashes X server due to simplistic PCI allocation by kernel
 
You are wrong, I'm afraid.

I _have_ a 64-bit CPU (Core 2 Duo T5600) and I _do_ run a 64-bit Linux (AMD64 Gentoo, kernel 2.6.23-r5).
Sorry for not having expressed this clearly beforehand.

The issue is not temporary and will not vanish with time. As I described above, the issue is with the allocation of PCI device address spaces. It will persist as long as the X server cannot cope with true 64-bit addresses.

Again, my BIOS' ACPI information lists a wrong start address for the GeForce: B0000000, even though that range is already mapped to main memory. It should have been C0000000 or higher. The PCI allocation code in arch/i386/pci/i386.c detects this and postpones resource allocation for the GeForce Go 7600 until _after_ all other devices have had their chance to allocate resources. Unfortunately, by the time the postponed allocation for the GeForce runs, all other devices have acquired their address spaces and there is no contiguous 256M chunk free below 4G any more. So the device gets an address above 4G, which in turn kills the X server.

I have fixed this by manually moving the wrong B0000000 address to C0000000. I understand that this is not a generally applicable fix, and I do not propose to fix the kernel this way for _everyone_. However, I wanted to point out that individuals can fix it this way, in case the information is useful.

What I _do_ propose is to make PCI resource allocation a bit more clever. Two issues stand out: postponing failed allocations is bad, and allocating addresses >=4G seems bad given the current state of drivers and software, EVEN THOUGH it is true 64-bit software.

What I also _do_ propose is to harden the NVIDIA drivers against this issue. They should definitely check for framebuffer addresses >=4G assigned to the NVIDIA device, notify the user, and stop the X server startup.

pe1chl 01-13-08 09:08 AM

Re: >=3G installed crashes X server due to simplistic PCI allocation by kernel
 
OK, I thought I had read that using a 64-bit OS (and of course a 64-bit X server as well) fixes this problem. Apparently that is not the case. Maybe that problem should be investigated and fixed.
(I am using a 64-bit system but I have "only" 2GB RAM so there is no allocation problem on my system anyway)

ssake 02-17-08 03:29 AM

Re: >=3G installed crashes X server due to simplistic PCI allocation by kernel
 
I just want to confirm that I have the same problem.

I am running an ASUS M2A-VM motherboard with an AMD dual-core 64-bit CPU and an NVIDIA GeForce 7200 card, so I use the 169.09 driver. With up to 3GB of RAM everything works fine. Going to 4GB (I also went up to 6GB just to check) puts the X server into an infinite loop (even under sax2, so I cannot even install the driver with more than 3GB), as I can see from the Xorg log:

(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
(WW) NVIDIA(0): The NVIDIA X driver has encountered too many errors. Falling
(WW) NVIDIA(0): back to legacy PCI mode.
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.

and it goes on and on...

Using just the VESA driver works fine. Does anyone know whether the NVIDIA developers are working on this and whether we can expect a solution soon?

LucidTaZ 04-02-08 07:11 PM

Re: >=3G installed crashes X server due to simplistic PCI allocation by kernel
 
Hi,

I'm using an ASUS M2A-VM HDMI motherboard and a GeForce 8600 GTS. I have 4GB of memory and this problem happens to me too. I have zero experience with hacking kernels, so what would be the best way to resolve this problem?

Would a BIOS update work?

uOpt 04-02-08 09:13 PM

Re: >=3G installed crashes X server due to simplistic PCI allocation by kernel
 
Hmmm, could this also be a problem on 32-bit with PAE?

I have had several panics now with a freshly built AMD64 machine under Linux/i386/2.4.24-linux/169-nvidia.

logan 04-02-08 09:54 PM

Re: >=3G installed crashes X server due to simplistic PCI allocation by kernel
 
I'm running 32bit unstable Debian on a MSI P35 PLATINUM+E8400 with 4GB ram and 169.12 drivers. No problems getting into X, just some hangs while gaming.

My .config has CONFIG_HIGHMEM4G=y and CONFIG_RESOURCES_64BIT=y, which seem like pretty standard options (Red Hat/Fedora use them anyway). I don't have anything for CONFIG_X86_PAE in my .config and I'm not sure how to enable it otherwise, but I know there are other issues with it, so I never bothered.

A few weeks back, several people experienced hangs when attempting to start X, and all but one fixed the problem with a BIOS update (http://www.nvnews.net/vbulletin/showthread.php?t=109359). I don't see anything about memory configuration there; it just sounds like most of them were using motherboards that were 1.5 to 2+ years old.

When I asked for help in the past, a BIOS update was always one of the first things mentioned. If there's something more recent available, you should try it, LucidTaZ.

LucidTaZ 04-03-08 04:34 AM

Re: >=3G installed crashes X server due to simplistic PCI allocation by kernel
 
Thanks for the replies. :)

I forgot to mention that I'm running 64-bit Linux. Also, this motherboard is brand new (several weeks old), so could the BIOS really be the problem? The card runs fine in Windows XP, by the way.


All times are GMT -5.

Copyright 1998 - 2014, nV News.