AMD64 dual-core/multi-core + NVIDIA 8200, Ubuntu 8.10, kernel 2.6.27: freeze/hang/lock-up unless booted with maxcpus=1
The system freezes on an amd64 multi-core machine with integrated NVIDIA 8200 graphics. The screen goes blank or turns white (or another solid colour), Num Lock cannot be toggled, switching ttys does nothing, and Alt-SysRq combinations get no response. A hard reset is required. No message is output to the F8 console (when the freeze is provoked in text mode) and no errors are written to the system logs.
(nvidia-bug-report.log attached in two parts.)
STEPS TO REPRODUCE
The freeze is easily provoked by bringing up networking with knetworkmanager on an Atheros-chipset 802.11n PCI card using the ath9k driver. The freeze usually follows within seconds of starting networking, or at most a few minutes, depending on network activity. However, it still happens eventually even if networking is never initialized.
The only known workaround is to limit the system to a single CPU core with the maxcpus=1 kernel boot parameter, which is clearly not a desirable solution. With a single core the system is rock solid, even under heavy network, graphics, CPU or memory load.
Because there is a history of amd64 multi-core + NVIDIA systems freezing, and the freeze eventually happens even without networking configured, my hypothesis is that it is related to the NVIDIA driver, possibly an interrupt-handling or spinlock problem. Having tested with the kernel limited to less than 4GB of memory, and with the IOMMU disabled (GART or swiotlb), I believe I’ve ruled out a DMA memory issue.
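One way to probe the interrupt-handling hypothesis is to check which core actually services the nvidia and ath9k interrupts before a freeze. Below is a minimal Python sketch that parses /proc/interrupts, assuming its usual layout (a header row of CPU columns, then one row per IRQ with per-CPU counts followed by the controller type and device names). The function name `irq_counts` and the SAMPLE numbers are hypothetical and illustrative, not taken from this machine.

```python
def irq_counts(text, device):
    """Return {irq: [count_on_cpu0, count_on_cpu1, ...]} for rows naming `device`."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row looks like "CPU0 CPU1"
    result = {}
    for line in lines[1:]:
        irq, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = fields[:ncpus]
        # Summary rows such as "ERR:" have fewer numeric columns; skip them.
        if len(counts) < ncpus or not all(c.isdigit() for c in counts):
            continue
        names = " ".join(fields[ncpus:])  # controller type plus device names
        if device in names:
            result[irq.strip()] = [int(c) for c in counts]
    return result


# Illustrative /proc/interrupts excerpt (made-up numbers, not from this system).
SAMPLE = """\
           CPU0       CPU1
  0:        125          0   IO-APIC-edge      timer
 19:     482113          0   IO-APIC-fasteoi   ath9k
 20:    1022345          0   IO-APIC-fasteoi   nvidia
NMI:          0          0   Non-maskable interrupts
ERR:          0
"""
```

On the real machine, `irq_counts(open("/proc/interrupts").read(), "nvidia")` shows whether the GPU interrupt is being spread across both cores; comparing a snapshot taken shortly before a freeze with one taken under maxcpus=1 might hint at whether interrupt routing matters.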
Motherboard: Asus M3N78-VM
BIOS: Ver. 0804 (10/15/2008) - latest available from Asus web site
Video: Integrated NVIDIA GeForce 8200 GPU (nvidia drivers 177.82, 180.08 beta)
Processor: AMD Athlon 64 X2 Dual Core 6000+
Memory: 4GB (2x OCZ 2GB DDR2 PC2-6400)
Peripherals: Seagate 1TB SATA II hard drive (AHCI mode, ST31000340AS)
Network: D-Link DWA-552 802.11n PCI (Atheros 5416)
Distribution: Kubuntu 8.10 with all updates
Latest kernel: 2.6.27-7-generic
Earliest kernel tried: 2.6.27-4
WHAT I’VE TRIED
- Collecting mcelog entries: none logged
- Running Memtest (complete pass, no errors)
- Limiting RAM to less than 4GB and disabling the IOMMU
- Compiling ath9k (compat-wireless-2008-11-17) with debugging enabled (ATH_DBG_ANY): no errors reported
- Disabling kpowernowd
- Resetting CMOS and loading BIOS defaults
- Restricting all interrupt handling to the first core
- Trying a range of BIOS options
- Kernel boot options (all either still freeze or fail to boot):
  - noapic nolapic (system will not boot)
  - noacpi (system will not boot)
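For anyone repeating the interrupt-pinning experiment, the usual mechanism is writing a hexadecimal CPU bitmask to /proc/irq/<N>/smp_affinity, where CPU 0 alone is mask 1. The hedged Python sketch below does this for every IRQ. The helper names are hypothetical, it defaults to a dry run, writing for real needs root, some IRQs (such as the timer) may refuse the write, and an irqbalance daemon, if running, may rewrite the masks afterwards.

```python
import os


def cpu_mask(cpus):
    """Hex bitmask as /proc/irq/N/smp_affinity expects (fine for fewer than 32 CPUs)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def pin_irqs(cpus, root="/proc/irq", dry_run=True):
    """Write the mask for `cpus` to every numbered IRQ directory under `root`.

    Returns {irq: "ok" | "dry-run" | error text}. Non-numeric entries such as
    default_smp_affinity are skipped.
    """
    mask = cpu_mask(cpus)
    results = {}
    for irq in sorted((n for n in os.listdir(root) if n.isdigit()), key=int):
        path = os.path.join(root, irq, "smp_affinity")
        if not os.path.exists(path):
            continue
        if dry_run:
            results[irq] = "dry-run"
            continue
        try:
            with open(path, "w") as f:
                f.write(mask + "\n")
            results[irq] = "ok"
        except OSError as exc:  # e.g. permission denied or IRQ not movable
            results[irq] = str(exc)
    return results
```

For example, `pin_irqs([0], dry_run=False)` run as root would attempt to move every interrupt to the first core. Checking the result dictionary shows which IRQs, if any, stayed on both cores.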
POSSIBLY RELATED BUG REPORTS/DISCUSSIONS
WHAT TO TRY NEXT?
I’m not sure where to go next with this. Any suggestions? Some ideas I’m considering:
- Is there a more generic driver, such as nv, that I could try in order to rule out the NVIDIA driver?
- Try the onboard networking instead of the 802.11n PCI card?
- Enable kernel debugging to try to capture the cause of the hang? (How would I go about this?)
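On the kernel-debugging question, one commonly used way to catch messages from a machine that dies before anything reaches disk is netconsole. It sends kernel log output as UDP packets to another host, with 6666 as the default target port. It relies on netpoll support in the network driver, so the onboard wired port is likely a better candidate than the ath9k card. It can only capture what the kernel manages to print before hanging, so given that SysRq already gets no response, it may record nothing. On the receiving machine, a plain UDP listener is enough. Below is a minimal Python sketch; the function name `listen` and its parameters are hypothetical.

```python
import socket
import sys


def listen(port=6666, out=sys.stdout, max_messages=None, sock=None):
    """Copy every UDP datagram received on `port` to `out`; return the count.

    Pass an already-bound `sock` to reuse it (it is then left open).
    """
    own_socket = sock is None
    if own_socket:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("", port))  # listen on all interfaces
    received = 0
    try:
        while max_messages is None or received < max_messages:
            data, _addr = sock.recvfrom(4096)
            out.write(data.decode("utf-8", "replace"))
            out.flush()  # flush each message so the last lines before a hang are kept
            received += 1
    finally:
        if own_socket:
            sock.close()
    return received
```

Running `listen(out=open("netconsole.log", "a"))` on a second machine keeps a copy of whatever the frozen box sent last.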