06-01-04, 01:04 PM   #1
gallafent
Registered User
 
Join Date: Mar 2004
Posts: 10
Frequent server crash with 53.36 on SuSE 9.1 ...

Hi,

I recently installed SuSE 9.1 on my system (details below). Since then, I have been experiencing frequent server crashes: the server hangs, the mouse pointer still moves, but NumLock/CapsLock etc. are unresponsive and I cannot switch to a console. The server can usually be killed with signal 9 (SIGKILL), but it ignores e.g. signal 15 (SIGTERM). Most often it is KMail (starting up or redrawing) that triggers the crash, though it can also happen when other applications are drawing to the screen (e.g. VMware Workstation, Konqueror).

The following appears in /var/log/messages and seems to implicate the nvidia kernel driver fairly emphatically. There are other variations of the entry point (a gettimeofday interrupt rather than APIC, etc.), but they always end up in the same place, with the "Badness" shown in the dump:

====================

Jun 1 15:50:29 linux kernel: Badness in pci_find_subsys at drivers/pci/search.c:167
Jun 1 15:50:29 linux kernel: Call Trace:
Jun 1 15:50:29 linux kernel: [pci_find_subsys+251/256] pci_find_subsys+0xfb/0x100
Jun 1 15:50:29 linux kernel: [pci_find_device+24/32] pci_find_device+0x18/0x20
Jun 1 15:50:29 linux kernel: [pci_find_slot+31/96] pci_find_slot+0x1f/0x60
Jun 1 15:50:29 linux kernel: [__crc_vc_cons_allocated+280323/1876935] os_pci_init_handle+0x29/0x4f [nvidia]
Jun 1 15:50:29 linux kernel: [__crc_vc_cons_allocated+387099/1876935] _nv001243rm+0x1f/0x24 [nvidia]
Jun 1 15:50:29 linux kernel: [<faed884f>] _nv001243rm+0x1f/0x24 [nvidia]
Jun 1 15:50:29 linux kernel: [__crc_vc_cons_allocated+1724625/1876935] _nv000816rm+0x2f5/0x384 [nvidia]
Jun 1 15:50:29 linux kernel: [<fb01f105>] _nv000816rm+0x2f5/0x384 [nvidia]
Jun 1 15:50:29 linux kernel: [__crc_vc_cons_allocated+1104104/1876935] _nv003801rm+0xd8/0x100 [nvidia]
Jun 1 15:50:29 linux kernel: [<faf8791c>] _nv003801rm+0xd8/0x100 [nvidia]
Jun 1 15:50:29 linux kernel: [__crc_vc_cons_allocated+1723403/1876935] _nv000809rm+0x2f/0x34 [nvidia]
Jun 1 15:50:29 linux kernel: [<fb01ec3f>] _nv000809rm+0x2f/0x34 [nvidia]
Jun 1 15:50:29 linux kernel: [__crc_vc_cons_allocated+1107724/1876935] _nv003816rm+0xf0/0x104 [nvidia]
Jun 1 15:50:29 linux kernel: [__crc_vc_cons_allocated+1101770/1876935] _nv003795rm+0x6ea/0xaec [nvidia]
Jun 1 15:50:29 linux kernel: [<faf86ffe>] _nv003795rm+0x6ea/0xaec [nvidia]
Jun 1 15:50:29 linux kernel: [__crc_vc_cons_allocated+487971/1876935] _nv004046rm+0x3a3/0x3b0 [nvidia]
Jun 1 15:50:29 linux kernel: [<faef1257>] _nv004046rm+0x3a3/0x3b0 [nvidia]
Jun 1 15:50:29 linux kernel: [__crc_vc_cons_allocated+1543011/1876935] _nv001476rm+0x277/0x45c [nvidia]
Jun 1 15:50:29 linux kernel: [<faff2b97>] _nv001476rm+0x277/0x45c [nvidia]
Jun 1 15:50:29 linux kernel: [__crc_vc_cons_allocated+398166/1876935] _nv000896rm+0x4a/0x64 [nvidia]
Jun 1 15:50:29 linux kernel: [<faedb38a>] _nv000896rm+0x4a/0x64 [nvidia]
Jun 1 15:50:29 linux kernel: [__crc_vc_cons_allocated+404336/1876935] rm_isr_bh+0xc/0x10 [nvidia]
Jun 1 15:50:29 linux kernel: [<faedcba4>] rm_isr_bh+0xc/0x10 [nvidia]
Jun 1 15:50:29 linux kernel: [__crc_vc_cons_allocated+273804/1876935] nv_kern_isr_bh+0xb/0xf [nvidia]
Jun 1 15:50:29 linux kernel: [<faebcdc0>] nv_kern_isr_bh+0xb/0xf [nvidia]
Jun 1 15:50:29 linux kernel: [tasklet_action+85/192] tasklet_action+0x55/0xc0
Jun 1 15:50:29 linux kernel: [<c0129fa5>] tasklet_action+0x55/0xc0
Jun 1 15:50:29 linux kernel: [do_softirq+114/224] do_softirq+0x72/0xe0
Jun 1 15:50:29 linux kernel: [do_IRQ+332/432] do_IRQ+0x14c/0x1b0
Jun 1 15:50:29 linux kernel: [<c010c2fc>] do_IRQ+0x14c/0x1b0
Jun 1 15:50:29 linux kernel: [smp_apic_timer_interrupt+234/352] smp_apic_timer_interrupt+0xea/0x160
Jun 1 15:50:29 linux kernel: [<c011adca>] smp_apic_timer_interrupt+0xea/0x160
Jun 1 15:50:29 linux kernel: [common_interrupt+24/32] common_interrupt+0x18/0x20
Jun 1 15:50:29 linux kernel: [<c010a148>] common_interrupt+0x18/0x20
Jun 1 15:50:29 linux kernel:
Jun 1 15:50:29 linux kernel: Badness in pci_find_subsys at drivers/pci/search.c:167
Jun 1 15:50:29 linux kernel: Call Trace:
Jun 1 15:50:29 linux kernel: [pci_find_subsys+251/256] pci_find_subsys+0xfb/0x100
Jun 1 15:50:29 linux kernel: [<c02072cb>] pci_find_subsys+0xfb/0x100
Jun 1 15:50:29 linux kernel: [pci_find_device+24/32] pci_find_device+0x18/0x20
Jun 1 15:50:29 linux kernel: [<c02072e8>] pci_find_device+0x18/0x20
Jun 1 15:50:29 linux kernel: [pci_find_slot+31/96] pci_find_slot+0x1f/0x60
Jun 1 15:50:29 linux kernel: [<c020730f>] pci_find_slot+0x1f/0x60
Jun 1 15:50:29 linux kernel: [__crc_vc_cons_allocated+280323/1876935] os_pci_init_handle+0x29/0x4f [nvidia]
Jun 1 15:50:29 linux kernel: [<faebe737>] os_pci_init_handle+0x29/0x4f [nvidia]
Jun 1 15:50:29 linux kernel: [__crc_vc_cons_allocated+387099/1876935] _nv001243rm+0x1f/0x24 [nvidia]
Jun 1 15:50:29 linux kernel: [<faed884f>] _nv001243rm+0x1f/0x24 [nvidia]
Jun 1 15:50:29 linux kernel: [__crc_vc_cons_allocated+1112601/1876935] _nv003797rm+0xa9/0x128 [nvidia]
Jun 1 15:50:29 linux kernel: [<faf89a4d>] _nv003797rm+0xa9/0x128 [nvidia]
Jun 1 15:50:29 linux kernel: [__crc_vc_cons_allocated+1557597/1876935] _nv001490rm+0x55/0xe4 [nvidia]
Jun 1 15:50:29 linux kernel: [<faff6491>] _nv001490rm+0x55/0xe4 [nvidia]
Jun 1 15:50:29 linux kernel: [__crc_vc_cons_allocated+1724688/1876935] _nv000816rm+0x334/0x384 [nvidia]
Jun 1 15:50:29 linux kernel: [<fb01f144>] _nv000816rm+0x334/0x384 [nvidia]
Jun 1 15:50:29 linux kernel: [__crc_vc_cons_allocated+1104104/1876935] _nv003801rm+0xd8/0x100 [nvidia]
Jun 1 15:50:29 linux kernel: [<faf8791c>] _nv003801rm+0xd8/0x100 [nvidia]
Jun 1 15:50:29 linux kernel: [__crc_vc_cons_allocated+1723403/1876935] _nv000809rm+0x2f/0x34 [nvidia]
Jun 1 15:50:29 linux kernel: [<fb01ec3f>] _nv000809rm+0x2f/0x34 [nvidia]
Jun 1 15:50:29 linux kernel: [__crc_vc_cons_allocated+1107724/1876935] _nv003816rm+0xf0/0x104 [nvidia]
Jun 1 15:50:29 linux kernel: [__crc_vc_cons_allocated+1101770/1876935] _nv003795rm+0x6ea/0xaec [nvidia]
Jun 1 15:50:29 linux kernel: [<faf86ffe>] _nv003795rm+0x6ea/0xaec [nvidia]
Jun 1 15:50:29 linux kernel: [__crc_vc_cons_allocated+487971/1876935] _nv004046rm+0x3a3/0x3b0 [nvidia]
Jun 1 15:50:29 linux kernel: [<faef1257>] _nv004046rm+0x3a3/0x3b0 [nvidia]
Jun 1 15:50:29 linux kernel: [__crc_vc_cons_allocated+1543011/1876935] _nv001476rm+0x277/0x45c [nvidia]
Jun 1 15:50:29 linux kernel: [<faff2b97>] _nv001476rm+0x277/0x45c [nvidia]
Jun 1 15:50:29 linux kernel: [__crc_vc_cons_allocated+398166/1876935] _nv000896rm+0x4a/0x64 [nvidia]
Jun 1 15:50:29 linux kernel: [<faedb38a>] _nv000896rm+0x4a/0x64 [nvidia]
Jun 1 15:50:29 linux kernel: [__crc_vc_cons_allocated+404336/1876935] rm_isr_bh+0xc/0x10 [nvidia]
Jun 1 15:50:29 linux kernel: [<faedcba4>] rm_isr_bh+0xc/0x10 [nvidia]
Jun 1 15:50:29 linux kernel: [__crc_vc_cons_allocated+273804/1876935] nv_kern_isr_bh+0xb/0xf [nvidia]
Jun 1 15:50:29 linux kernel: [<faebcdc0>] nv_kern_isr_bh+0xb/0xf [nvidia]
Jun 1 15:50:29 linux kernel: [tasklet_action+85/192] tasklet_action+0x55/0xc0
Jun 1 15:50:29 linux kernel: [<c0129fa5>] tasklet_action+0x55/0xc0
Jun 1 15:50:29 linux kernel: [do_softirq+114/224] do_softirq+0x72/0xe0
Jun 1 15:50:29 linux kernel: [<c012a422>] do_softirq+0x72/0xe0
Jun 1 15:50:29 linux kernel: [do_IRQ+332/432] do_IRQ+0x14c/0x1b0
Jun 1 15:50:29 linux kernel: [<c010c2fc>] do_IRQ+0x14c/0x1b0
Jun 1 15:50:29 linux kernel: [smp_apic_timer_interrupt+234/352] smp_apic_timer_interrupt+0xea/0x160
Jun 1 15:50:29 linux kernel: [<c011adca>] smp_apic_timer_interrupt+0xea/0x160
Jun 1 15:50:29 linux kernel: [common_interrupt+24/32] common_interrupt+0x18/0x20
Jun 1 15:50:29 linux kernel: [<c010a148>] common_interrupt+0x18/0x20
Jun 1 15:50:29 linux kernel:

====================
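
For anyone comparing several dumps like the two above, here is a minimal Python sketch that groups the frames under each "Badness in ..." header and reports the first frame that belongs to a given module. The function names (`summarize`, `first_module_frame`) are made up for illustration, and the regular expressions assume the syslog layout shown in this post; other kernels or syslog setups may format traces differently.

```python
import re

# Header line, e.g. "kernel: Badness in pci_find_subsys at drivers/pci/search.c:167"
BADNESS_RE = re.compile(r"kernel: Badness in (?P<func>\S+) at (?P<where>\S+)")

# Frame line, e.g. "[<faebe737>] os_pci_init_handle+0x29/0x4f [nvidia]".
# Assumes the layout seen in this post: "[...] symbol+0xOFF/0xLEN [module]".
FRAME_RE = re.compile(
    r"\]\s+(?P<sym>[\w.]+)\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+"
    r"(?:\s+\[(?P<mod>\w+)\])?\s*$"
)


def summarize(lines):
    """Return one dict per 'Badness in ...' header, with its frames in order."""
    traces = []
    for line in lines:
        head = BADNESS_RE.search(line)
        if head:
            traces.append(
                {"func": head["func"], "where": head["where"], "frames": []}
            )
            continue
        frame = FRAME_RE.search(line)
        if frame and traces:
            entry = (frame["sym"], frame["mod"])  # mod is None for core kernel
            frames = traces[-1]["frames"]
            # Most frames are logged twice in a row (symbolic form, then raw
            # address form), so consecutive identical entries are collapsed.
            if not frames or frames[-1] != entry:
                frames.append(entry)
    return traces


def first_module_frame(trace, module="nvidia"):
    """Return the first symbol in the trace that lives in `module`, or None."""
    for sym, mod in trace["frames"]:
        if mod == module:
            return sym
    return None
```

Passing it the lines of /var/log/messages (for example `summarize(open("/var/log/messages"))`) gives one entry per warning, which makes it easier to see whether every dump really enters the driver at the same point. Note that collapsing consecutive duplicates would also hide a genuinely repeated frame, so treat the result as a summary rather than a faithful copy of the trace.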

System info:

Gainward ultra780xp GeForceFX 5600, dual DVI-D outputs (using twinview). (BIOS: 04.31.20.40.00)

Driver 53.36 (5336).

SuSE 9.1 with all current updates, i.e. SuSE kernel kernel-smp-2.6.4-54.5, and the nvidia driver rebuilt against that kernel.

Dual processor machine: 2x AMD Athlon MP 1900+ (1.6GHz), in Tyan S2460 Tiger MP motherboard.

The problem occurs with Render acceleration turned on or off, and with APIC enabled or disabled; neither setting seems to make any difference.

I used to experience occasional (less than once per day) crashes with my previous configuration (same machine, SuSE 8.2), but it is now crashing far more often, making the machine virtually unusable if I use applications which trigger the crash... such as my web browser or email client!

Any suggestions welcome, as would be a fixed kernel module!

[update]: I seem to have gained some stability by forcing AGP 2x rather than the default 4x on my system. Does that suggest a race condition somewhere? Obviously I'd rather stay at 4x, but for the time being I'll settle for 2x and stability, I think!