02-18-04, 10:01 AM   #9
zander
NVIDIA Corporation
 
 
Join Date: Aug 2002
Posts: 3,740

The "Badness in pci_find_subsys at drivers/pci/search.c" problem is really just one of many possible symptoms of common stability problems; it has been discussed here and elsewhere in the past.

Many of these stability problems are old "friends" and depend on neither the kernel version nor the driver version (many are Linux specific in the sense that they do not reproduce on Windows). The warning message itself is harmless: by the time it is printed, the actual problem has already occurred, the NVIDIA driver has detected it, and a recovery attempt is under way. Note that the warning is not necessarily followed by a hard system lockup; in many cases the error condition can be corrected and normal operation resumed (the system may be unresponsive for a few seconds when this happens). There is no single works-for-all solution, or even workaround, for those who experience this or similar stability problems: the best one can do is to eliminate possible error sources one by one. It is unfortunate that one needs to deal with this at all, but it seems that this is still one of the prices one pays for using GNU/Linux...
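
To get a feel for how often the warning actually shows up (as opposed to how often the system locks up), one could scan the kernel log for it. The following is only a rough sketch: the function name is made up for illustration, and it assumes nothing about the log format beyond the message text quoted above.

```python
import re

# The warning text as quoted in this thread. Anything else on the line
# (timestamps, source line numbers, ...) is left unparsed.
WARNING = re.compile(r"Badness in pci_find_subsys at drivers/pci/search\.c")


def find_badness_lines(log_text):
    """Return the lines of log_text that contain the pci_find_subsys warning.

    find_badness_lines is a hypothetical helper name, not part of any tool.
    """
    return [line for line in log_text.splitlines() if WARNING.search(line)]
```

Feed it text captured from dmesg or from wherever your distribution keeps its kernel log; the length of the returned list tells you how many times the driver hit the error condition and tried to recover.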

The most common error sources are bad AGP/ACPI/APIC configurations (hardware/software), conflicts with vesafb/rivafb (using two independent drivers for a single piece of hardware is asking for trouble), ..., heavily patched (experimental) kernels, bad RAM and thermal stress (excess heat) due to insufficient cooling. There's the possibility of bugs in the NVIDIA driver as well, naturally, but they seem to account for no more than a fraction of the stability problems encountered.

If you find that your system doesn't work reliably, try to approach the problem systematically: disable ACPI and/or APIC support, disable vesafb (or rivafb), check the RAM with a dedicated memory test such as memtest86, ..., monitor the CPU and case temperatures, check if disabling AGP works, try different (older, newer, less experimental) kernels, try a different driver release, ... . If disabling AGP does help, experiment with the AGP configuration: switch the AGP GART driver, disable FW/SBA, throttle the transfer rate, and check the README for additional things to try. These are just some suggestions; you will find more in the official documentation, on forums (such as this one), ..., and on mailing lists.
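
When working through suggestions like these one at a time, it helps to confirm which settings the running kernel was actually booted with. Here is a minimal sketch, assuming the usual boot parameters acpi=off and noapic are how ACPI and APIC get disabled, and that a vga= or video= option hints at a framebuffer console such as vesafb being enabled; the function and key names are invented for illustration.

```python
def summarize_cmdline(cmdline):
    """Report troubleshooting-relevant parameters found on a kernel command line.

    cmdline is the text of the kernel command line (for example the contents
    of /proc/cmdline). The dictionary keys are hypothetical names.
    """
    tokens = cmdline.split()
    return {
        # True if ACPI support was switched off at boot.
        "acpi_disabled": "acpi=off" in tokens,
        # True if the I/O APIC was switched off at boot.
        "apic_disabled": "noapic" in tokens,
        # vga=/video= options are one common way a framebuffer console is
        # enabled; an empty list does not prove that none is active.
        "framebuffer_options": [
            t for t in tokens if t.startswith(("vga=", "video="))
        ],
    }
```

Change one thing, reboot, check the summary, then test for stability before moving on to the next item; that way you know which change made the difference.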