nV News Forums

 
 


andersgd 01-11-04 08:52 AM

Call trace with 5328 and kernel 2.6
 
Hi!

I'm using 5328 with minion.de patches with the 2.6.1 kernel. I'm getting a lot of Call traces, every 10 or 15 seconds. At each call trace my second monitor in my twinview configuration flickers once.

Any solutions? Is this common?

Any advice is welcome!
Thanks
/Anders


Jan 11 15:34:17 gonzo kernel: Debug: sleeping function called from invalid context at mm/slab.c:1856
Jan 11 15:34:17 gonzo kernel: in_atomic():1, irqs_disabled():0
Jan 11 15:34:17 gonzo kernel: Call Trace:
Jan 11 15:34:17 gonzo kernel: [<c011ce2b>] __might_sleep+0xab/0xd0
Jan 11 15:34:17 gonzo kernel: [<c01422c6>] __kmalloc+0x96/0xa0
Jan 11 15:34:17 gonzo kernel: [<e0be0cba>] os_alloc_mem+0x7a/0x90 [nvidia]
Jan 11 15:34:17 gonzo kernel: [<e0a75a20>] _nv001308rm+0x10/0x28 [nvidia]
Jan 11 15:34:17 gonzo kernel: [<e0b8e1bd>] _nv001518rm+0x7c9/0xb34 [nvidia]
Jan 11 15:34:17 gonzo kernel: [<e0b182ac>] _nv002464rm+0x88/0x35c [nvidia]
Jan 11 15:34:17 gonzo kernel: [<e0a6b595>] _nv001338rm+0x1d/0x24 [nvidia]
Jan 11 15:34:17 gonzo kernel: [<e0a5c185>] _nv005601rm+0xd/0x34 [nvidia]
Jan 11 15:34:17 gonzo kernel: [<e0a63868>] _nv000858rm+0x300/0xe14 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<c01fe956>] serial8250_interrupt+0x36/0x100
Jan 11 15:34:18 gonzo kernel: [<c010b66a>] handle_IRQ_event+0x3a/0x70
Jan 11 15:34:18 gonzo kernel: [<c010ba24>] do_IRQ+0xb4/0x130
Jan 11 15:34:18 gonzo kernel: [<c0109d88>] common_interrupt+0x18/0x20
Jan 11 15:34:18 gonzo kernel: [<e0a5fb1d>] _nv002962rm+0x2c5/0x3b8 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0a7a4d9>] _nv000899rm+0x4c9/0xf70 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0a7a4ec>] _nv000899rm+0x4dc/0xf70 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0a9e21a>] _nv005046rm+0x52/0x70 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0b4250f>] _nv001614rm+0x23/0x84 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0b4140b>] _nv001556rm+0x5b/0x6c [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0b97170>] _nv001988rm+0x1f0/0x298 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0a6b42e>] _nv001344rm+0x46/0x6c [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0b9308a>] _nv001480rm+0x3a/0x7c [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0a6b595>] _nv001338rm+0x1d/0x24 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0b4250f>] _nv001614rm+0x23/0x84 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0aa306c>] _nv004220rm+0x28/0x30 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<c01fe7b5>] transmit_chars+0xa5/0xd0
Jan 11 15:34:18 gonzo kernel: [<c01fe956>] serial8250_interrupt+0x36/0x100
Jan 11 15:34:18 gonzo kernel: [<c010b66a>] handle_IRQ_event+0x3a/0x70
Jan 11 15:34:18 gonzo kernel: [<c010ba24>] do_IRQ+0xb4/0x130
Jan 11 15:34:18 gonzo kernel: [<c0109d88>] common_interrupt+0x18/0x20
Jan 11 15:34:18 gonzo kernel: [<e0bd6d5f>] _nv000176rm+0x57/0x3ec [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0bd6d5f>] _nv000176rm+0x57/0x3ec [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0a6b595>] _nv001338rm+0x1d/0x24 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0a90adc>] _nv005307rm+0x54/0x544 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0b412cb>] _nv001532rm+0x1f/0x28 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0b4127c>] _nv001534rm+0x20/0x28 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0b41a72>] _nv003621rm+0x1a/0x20 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0bd6d5f>] _nv000176rm+0x57/0x3ec [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0b90623>] _nv003073rm+0x1b/0x30 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0bd6d5f>] _nv000176rm+0x57/0x3ec [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0a6b595>] _nv001338rm+0x1d/0x24 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0a90adc>] _nv005307rm+0x54/0x544 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<c011aa5e>] recalc_task_prio+0x8e/0x1b0
Jan 11 15:34:18 gonzo kernel: [<c0127a40>] process_timeout+0x0/0x10
Jan 11 15:34:18 gonzo kernel: [<c011baa1>] __wake_up_common+0x31/0x60
Jan 11 15:34:18 gonzo kernel: [<c0127686>] update_process_times+0x46/0x60
Jan 11 15:34:18 gonzo kernel: [<c013e9b1>] buffered_rmqueue+0xd1/0x170
Jan 11 15:34:18 gonzo kernel: [<e0a6b595>] _nv001338rm+0x1d/0x24 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<c01109e0>] convert_fxsr_to_user+0xc0/0x160
Jan 11 15:34:18 gonzo kernel: [<c0110c2f>] save_i387_fxsave+0xbf/0xf0
Jan 11 15:34:18 gonzo kernel: [<c0108b01>] setup_sigcontext+0xe1/0x130
Jan 11 15:34:18 gonzo kernel: [<c0108c3d>] setup_frame+0xed/0x1f0
Jan 11 15:34:18 gonzo kernel: [<e0a79be1>] rm_ioctl+0x19/0x20 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0bde5d5>] nv_kern_ioctl+0x85/0x440 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<c0110aa5>] convert_fxsr_from_user+0x25/0xf0
Jan 11 15:34:18 gonzo kernel: [<c0110def>] restore_i387_fxsave+0x7f/0x90
Jan 11 15:34:18 gonzo kernel: [<c0110e96>] restore_i387+0x96/0xa0
Jan 11 15:34:18 gonzo kernel: [<c01087f9>] restore_sigcontext+0x119/0x140
Jan 11 15:34:18 gonzo kernel: [<c0168e13>] sys_ioctl+0xf3/0x2a0
Jan 11 15:34:18 gonzo kernel: [<c01093c9>] sysenter_past_esp+0x52/0x71

zander 01-11-04 08:56 AM

You can disable the warning messages by turning off CONFIG_DEBUG_SPINLOCK_SLEEP in your kernel configuration; when did you download the 1.0-5328 patch? Are you still seeing these with the patch posted on 01-09-2004?

andersgd 01-11-04 09:02 AM

My 5328 patch is not the one posted on 01-09-2004. I'll try the new one in a few minutes.

Thanks for your help!!
/Anders

caffeine 01-24-04 11:55 AM

Me Too!
 
I'm having the same problem. It's making my machine quite unstable. I'm on 2.6.1 and nvidia-kernel 1.0.5328.

Code:

Jan 24 19:55:17 espresso kernel: Debug: sleeping function called from invalid context at mm/slab.c:1856
Jan 24 19:55:17 espresso kernel: in_atomic():1, irqs_disabled():0
Jan 24 19:55:17 espresso kernel: Call Trace:
Jan 24 19:55:17 espresso kernel:  [<c011e86c>] __might_sleep+0xac/0xe0
Jan 24 19:55:17 espresso kernel:  [<c0151e4c>] __kmalloc+0x24c/0x260
Jan 24 19:55:17 espresso kernel:  [<f0d2878a>] os_alloc_mem+0x5c/0x87 [nvidia]
Jan 24 19:55:17 espresso kernel:  [<f0d42ba0>] _nv001308rm+0x10/0x28 [nvidia]
Jan 24 19:55:17 espresso kernel:  [<f0e5b33d>] _nv001518rm+0x7c9/0xb34 [nvidia]
Jan 24 19:55:17 espresso kernel:  [<c01517f8>] cache_flusharray+0xd8/0x2c0
Jan 24 19:55:17 espresso kernel:  [<f0d29305>] _nv005601rm+0xd/0x34 [nvidia]
Jan 24 19:55:17 espresso kernel:  [<f0d309e8>] _nv000858rm+0x300/0xe14 [nvidia]
Jan 24 19:55:17 espresso kernel:  [<f0abcba8>] bttv_irq+0x68/0x3e0 [bttv]
Jan 24 19:55:17 espresso kernel:  [<c010bdeb>] handle_IRQ_event+0x3b/0x70
Jan 24 19:55:17 espresso kernel:  [<c010c494>] do_IRQ+0x1c4/0x3b0
Jan 24 19:55:17 espresso kernel:  [<f0d2cc9d>] _nv002962rm+0x2c5/0x3b8 [nvidia]
Jan 24 19:55:17 espresso kernel:  [<f0d47659>] _nv000899rm+0x4c9/0xf70 [nvidia]
Jan 24 19:55:17 espresso kernel:  [<f0d4766c>] _nv000899rm+0x4dc/0xf70 [nvidia]
Jan 24 19:55:17 espresso kernel:  [<c011a778>] kernel_map_pages+0x28/0x60
Jan 24 19:55:17 espresso kernel:  [<c011a778>] kernel_map_pages+0x28/0x60
Jan 24 19:55:17 espresso kernel:  [<c03cb955>] kfree_skbmem+0x25/0x30
Jan 24 19:55:17 espresso kernel:  [<c03cb955>] kfree_skbmem+0x25/0x30
Jan 24 19:55:17 espresso kernel:  [<c011a778>] kernel_map_pages+0x28/0x60
Jan 24 19:55:17 espresso kernel:  [<c011a778>] kernel_map_pages+0x28/0x60
Jan 24 19:55:17 espresso kernel:  [<c03cb955>] kfree_skbmem+0x25/0x30
Jan 24 19:55:17 espresso kernel:  [<c03cb955>] kfree_skbmem+0x25/0x30
Jan 24 19:55:17 espresso kernel:  [<c03cb9cb>] __kfree_skb+0x6b/0xe0
Jan 24 19:55:17 espresso kernel:  [<c02ce71a>] rtl8139_start_xmit+0x14a/0x2b0
Jan 24 19:55:17 espresso kernel:  [<c03d0173>] dev_queue_xmit_nit+0xc3/0x120
Jan 24 19:55:17 espresso kernel:  [<c011a8a8>] recalc_task_prio+0xa8/0x1d0
Jan 24 19:55:17 espresso kernel:  [<f0ea3edf>] _nv000176rm+0x57/0x3ec [nvidia]
Jan 24 19:55:17 espresso kernel:  [<c011f2d5>] autoremove_wake_function+0x25/0x50
Jan 24 19:55:17 espresso kernel:  [<c011a491>] change_page_attr+0x101/0x1c0
Jan 24 19:55:17 espresso kernel:  [<f0ea3edf>] _nv000176rm+0x57/0x3ec [nvidia]
Jan 24 19:55:17 espresso kernel:  [<f0e0e44b>] _nv001532rm+0x1f/0x28 [nvidia]
Jan 24 19:55:17 espresso kernel:  [<f0e0ebf2>] _nv003621rm+0x1a/0x20 [nvidia]
Jan 24 19:55:17 espresso kernel:  [<f0ea54b8>] _nv000183rm+0x750/0x774 [nvidia]
Jan 24 19:55:17 espresso kernel:  [<f0e0e44b>] _nv001532rm+0x1f/0x28 [nvidia]
Jan 24 19:55:17 espresso kernel:  [<f0d38715>] _nv001338rm+0x1d/0x24 [nvidia]
Jan 24 19:55:17 espresso kernel:  [<f0d5dc5c>] _nv005307rm+0x54/0x544 [nvidia]
Jan 24 19:55:17 espresso kernel:  [<f0e0e44b>] _nv001532rm+0x1f/0x28 [nvidia]
Jan 24 19:55:17 espresso kernel:  [<c012de24>] update_process_times+0x44/0x50
Jan 24 19:55:17 espresso kernel:  [<c012dc96>] update_wall_time+0x16/0x40
Jan 24 19:55:17 espresso kernel:  [<c012e3a0>] do_timer+0xe0/0xf0
Jan 24 19:55:17 espresso kernel:  [<c011254f>] timer_interrupt+0x8f/0x270
Jan 24 19:55:17 espresso kernel:  [<c010c525>] do_IRQ+0x255/0x3b0
Jan 24 19:55:17 espresso kernel:  [<c03cb758>] alloc_skb+0x48/0xf0
Jan 24 19:55:17 espresso kernel:  [<c011a778>] kernel_map_pages+0x28/0x60
Jan 24 19:55:17 espresso kernel:  [<c03cb758>] alloc_skb+0x48/0xf0
Jan 24 19:55:17 espresso kernel:  [<c011c11a>] __wake_up_common+0x3a/0x70
Jan 24 19:55:17 espresso kernel:  [<c0424907>] unix_stream_sendmsg+0x337/0x530
Jan 24 19:55:17 espresso kernel:  [<c03cb955>] kfree_skbmem+0x25/0x30
Jan 24 19:55:17 espresso kernel:  [<c03c708e>] sock_sendmsg+0x8e/0xb0
Jan 24 19:55:17 espresso kernel:  [<c03c721e>] sock_aio_read+0xbe/0xe0
Jan 24 19:55:17 espresso kernel:  [<f0d46d61>] rm_ioctl+0x19/0x20 [nvidia]
Jan 24 19:55:17 espresso kernel:  [<f0d25406>] nv_kern_ioctl+0x4de/0x529 [nvidia]
Jan 24 19:55:17 espresso kernel:  [<c03c74df>] sock_writev+0x4f/0x60
Jan 24 19:55:17 espresso kernel:  [<c017243d>] do_readv_writev+0x21d/0x2c0
Jan 24 19:55:17 espresso kernel:  [<c018ce27>] sys_ioctl+0x217/0x420
Jan 24 19:55:17 espresso kernel:  [<c0127976>] sys_gettimeofday+0x66/0xe0
Jan 24 19:55:17 espresso kernel:  [<c010a26b>] syscall_call+0x7/0xb


caffeine 01-24-04 12:35 PM

OK, enabling CONFIG_DEBUG_SPINLOCK_SLEEP=y seems to at least suppress error messages. Will there be any performance penalty?

zander 01-24-04 12:53 PM

The warning messages will have an impact on overall system performance if they occur frequently; in any case, the current (as of 01-09-2004) 1.0-5328 patch shouldn't trigger the condition.
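
To judge how frequently the warnings occur, the log can be scanned for them. The following is only a minimal sketch in Python, assuming the syslog line format shown in the traces posted in this thread; the function name count_sleep_warnings is made up for illustration and is not part of any NVIDIA or kernel tool.

```python
import re
from collections import Counter

# Matches the warning line as it appears in the logs quoted in this thread.
# The syslog prefix layout (timestamp, host, "kernel:") is an assumption
# based on those logs and may differ on other systems.
WARNING_RE = re.compile(
    r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+\S+\s+kernel:\s+"
    r"Debug: sleeping function called from invalid context at (?P<where>\S+)"
)


def count_sleep_warnings(lines):
    """Return a Counter mapping source location to number of warnings."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.match(line)
        if match:
            counts[match.group("where")] += 1
    return counts
```

To use it, pass an open log file (for example /var/log/messages, although the path depends on the syslog setup) to count_sleep_warnings and print the resulting counts.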

caffeine 01-25-04 03:29 AM

My crashes have been getting more frequent. My last crash was such that it left the graphics card in a state where my system wouldn't POST.

So, I started unplugging monitors, and discovered that the left monitor's plug (the one with the flickering) will actually spark when it touches the PC case. I discovered this by accidentally holding the plug and touching the grounded case and getting quite a shock. Is this normal? Should the monitor plug be carrying enough voltage to throw a spark? Maybe that's been screwing with my card? Well, I've unplugged that monitor so I'll find out I guess.

[edit]I just discovered the fan on my north bridge wasn't running! Perhaps this is my problem?[/edit]

In my log files now:

Code:

Badness in pci_find_subsys at drivers/pci/search.c:132

caffeine 01-25-04 03:43 PM

Still no stability. I'm trying the open source "nv" driver to see if I can narrow it down to the nvidia stuff.

zander 01-25-04 04:17 PM

The pci_find_subsys warning messages are after-the-fact symptoms of severe error conditions. There are numerous possible error sources, most common among them broken AGP configurations (hardware, software or both), ACPI and APIC bugs (again in hardware, software or both), fbdev drivers (vesafb, rivafb), insufficient cooling and defective RAM; unfortunately, there are many more. The best I can suggest is that you try to identify the root cause of the failures you're experiencing via experimentation, i.e. check if disabling AGP helps, check if disabling ACPI helps, and so on.

caffeine 01-26-04 12:35 AM

Hi Zander. I appreciate your help.

Using the nv module has been stable overnight. My latest theory is that it's something to do with 2.6.1 / 2.6.2 + nvidia module, as 2.6.0 had been quite stable. (I'm currently on 2.6.1 btw)

I've tried setting NvAGP to 0 and 1 to no effect. I've also tried enabling/disabling sideband addressing and fast writes using

#options nvidia NVreg_EnableAGPSBA=1 NVreg_EnableAGPFW=1

I've read about a boot option "mem=nopentium" something about 4k vs 4M pages, but I'm not sure it's needed - it didn't seem to help.

I've seen two methods of killing ACPI, "pci=noacpi" & "acpi=off", although I'm not sure which is the correct method. I don't have any ACPI or APIC compiled into my kernel.

For what it's worth, I've got an ABit KT7A-RAID.

zander 01-26-04 04:47 AM

The XFree86 nv driver interacts very differently with both the installed graphics hardware and the operating system; if you find it to be stable and don't require features it doesn't offer, there's no reason why you shouldn't use it (aside, possibly, from performance considerations), but the fact that it works (or doesn't work) allows no conclusions to be drawn with respect to the nvidia driver.

Both AGP SBA and FW frequently cause stability problems; you will want to make sure they are disabled (/proc/driver/nvidia/agp/status), which they are by default. Setting NvAgp to "0" prohibits use of the AGP port; you will want to do this for now. The mem=nopentium kernel parameter was an interim workaround for a Linux kernel bug and is no longer required or even desirable with recent kernels; don't pass the parameter to Linux 2.6 kernels. pci=noacpi disables ACPI PCI IRQ routing, while acpi=off disables ACPI support altogether; if your kernel wasn't configured to feature ACPI support, neither option will modify system behavior. Check /proc/interrupts to determine if an APIC is used; if so, the noapic kernel parameter will disable it.
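
The /proc/interrupts check can also be scripted. This is a rough sketch in Python, assuming the usual layout in which each IRQ row names its interrupt controller (for example XT-PIC or IO-APIC-edge) in one of its columns; the helper names interrupt_controllers and uses_apic are hypothetical and chosen only for this example.

```python
def interrupt_controllers(proc_interrupts_text):
    """Return the set of controller names found in /proc/interrupts text."""
    controllers = set()
    for line in proc_interrupts_text.splitlines():
        fields = line.split()
        # IRQ rows start with a label such as "0:"; the CPU header row does not.
        if not fields or not fields[0].endswith(":"):
            continue
        # Take the first column after the label that looks like a PIC/APIC name.
        for field in fields[1:]:
            if "PIC" in field.upper():
                controllers.add(field)
                break
    return controllers


def uses_apic(proc_interrupts_text):
    """Return True if any IRQ row is routed through an APIC."""
    return any(
        "APIC" in name.upper()
        for name in interrupt_controllers(proc_interrupts_text)
    )
```

Reading the file with open("/proc/interrupts").read() and passing the text to uses_apic gives a quick yes/no answer; if it returns True, booting with noapic is the experiment described above.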

These are only some of the possible error sources, however, and while I named a few others (fbdev, other hardware problems, ...), you should browse the archives of this and similar forums to get additional suggestions.

caffeine 01-27-04 02:14 PM

I've been testing my card's performance under Windows 2000. It's very unstable, with plenty of Errors in the log files regarding True Vector Engine.

I thought it could be a power problem, but I've unplugged 2 extra hard drives and removed my TV-in card, and there are still errors.

Is it possible the card is faulty?


All times are GMT -5.