#1
Registered User
Join Date: Nov 2003
Location: Sweden
Posts: 11
Hi!
I'm using 5328 with minion.de patches with the 2.6.1 kernel. I'm getting a lot of Call traces, every 10 or 15 seconds. At each call trace my second monitor in my twinview configuration flickers once. Any solutions? Is this common? Any advice is welcome!
Thanks /Anders
Jan 11 15:34:17 gonzo kernel: Debug: sleeping function called from invalid context at mm/slab.c:1856
Jan 11 15:34:17 gonzo kernel: in_atomic():1, irqs_disabled():0
Jan 11 15:34:17 gonzo kernel: Call Trace:
Jan 11 15:34:17 gonzo kernel: [<c011ce2b>] __might_sleep+0xab/0xd0
Jan 11 15:34:17 gonzo kernel: [<c01422c6>] __kmalloc+0x96/0xa0
Jan 11 15:34:17 gonzo kernel: [<e0be0cba>] os_alloc_mem+0x7a/0x90 [nvidia]
Jan 11 15:34:17 gonzo kernel: [<e0a75a20>] _nv001308rm+0x10/0x28 [nvidia]
Jan 11 15:34:17 gonzo kernel: [<e0b8e1bd>] _nv001518rm+0x7c9/0xb34 [nvidia]
Jan 11 15:34:17 gonzo kernel: [<e0b182ac>] _nv002464rm+0x88/0x35c [nvidia]
Jan 11 15:34:17 gonzo kernel: [<e0a6b595>] _nv001338rm+0x1d/0x24 [nvidia]
Jan 11 15:34:17 gonzo kernel: [<e0a5c185>] _nv005601rm+0xd/0x34 [nvidia]
Jan 11 15:34:17 gonzo kernel: [<e0a63868>] _nv000858rm+0x300/0xe14 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<c01fe956>] serial8250_interrupt+0x36/0x100
Jan 11 15:34:18 gonzo kernel: [<c010b66a>] handle_IRQ_event+0x3a/0x70
Jan 11 15:34:18 gonzo kernel: [<c010ba24>] do_IRQ+0xb4/0x130
Jan 11 15:34:18 gonzo kernel: [<c0109d88>] common_interrupt+0x18/0x20
Jan 11 15:34:18 gonzo kernel: [<e0a5fb1d>] _nv002962rm+0x2c5/0x3b8 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0a7a4d9>] _nv000899rm+0x4c9/0xf70 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0a7a4ec>] _nv000899rm+0x4dc/0xf70 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0a9e21a>] _nv005046rm+0x52/0x70 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0b4250f>] _nv001614rm+0x23/0x84 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0b4140b>] _nv001556rm+0x5b/0x6c [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0b97170>] _nv001988rm+0x1f0/0x298 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0a6b42e>] _nv001344rm+0x46/0x6c [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0b9308a>] _nv001480rm+0x3a/0x7c [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0a6b595>] _nv001338rm+0x1d/0x24 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0b4250f>] _nv001614rm+0x23/0x84 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0aa306c>] _nv004220rm+0x28/0x30 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<c01fe7b5>] transmit_chars+0xa5/0xd0
Jan 11 15:34:18 gonzo kernel: [<c01fe956>] serial8250_interrupt+0x36/0x100
Jan 11 15:34:18 gonzo kernel: [<c010b66a>] handle_IRQ_event+0x3a/0x70
Jan 11 15:34:18 gonzo kernel: [<c010ba24>] do_IRQ+0xb4/0x130
Jan 11 15:34:18 gonzo kernel: [<c0109d88>] common_interrupt+0x18/0x20
Jan 11 15:34:18 gonzo kernel: [<e0bd6d5f>] _nv000176rm+0x57/0x3ec [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0bd6d5f>] _nv000176rm+0x57/0x3ec [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0a6b595>] _nv001338rm+0x1d/0x24 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0a90adc>] _nv005307rm+0x54/0x544 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0b412cb>] _nv001532rm+0x1f/0x28 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0b4127c>] _nv001534rm+0x20/0x28 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0b41a72>] _nv003621rm+0x1a/0x20 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0bd6d5f>] _nv000176rm+0x57/0x3ec [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0b90623>] _nv003073rm+0x1b/0x30 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0bd6d5f>] _nv000176rm+0x57/0x3ec [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0a6b595>] _nv001338rm+0x1d/0x24 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0a90adc>] _nv005307rm+0x54/0x544 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<c011aa5e>] recalc_task_prio+0x8e/0x1b0
Jan 11 15:34:18 gonzo kernel: [<c0127a40>] process_timeout+0x0/0x10
Jan 11 15:34:18 gonzo kernel: [<c011baa1>] __wake_up_common+0x31/0x60
Jan 11 15:34:18 gonzo kernel: [<c0127686>] update_process_times+0x46/0x60
Jan 11 15:34:18 gonzo kernel: [<c013e9b1>] buffered_rmqueue+0xd1/0x170
Jan 11 15:34:18 gonzo kernel: [<e0a6b595>] _nv001338rm+0x1d/0x24 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<c01109e0>] convert_fxsr_to_user+0xc0/0x160
Jan 11 15:34:18 gonzo kernel: [<c0110c2f>] save_i387_fxsave+0xbf/0xf0
Jan 11 15:34:18 gonzo kernel: [<c0108b01>] setup_sigcontext+0xe1/0x130
Jan 11 15:34:18 gonzo kernel: [<c0108c3d>] setup_frame+0xed/0x1f0
Jan 11 15:34:18 gonzo kernel: [<e0a79be1>] rm_ioctl+0x19/0x20 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<e0bde5d5>] nv_kern_ioctl+0x85/0x440 [nvidia]
Jan 11 15:34:18 gonzo kernel: [<c0110aa5>] convert_fxsr_from_user+0x25/0xf0
Jan 11 15:34:18 gonzo kernel: [<c0110def>] restore_i387_fxsave+0x7f/0x90
Jan 11 15:34:18 gonzo kernel: [<c0110e96>] restore_i387+0x96/0xa0
Jan 11 15:34:18 gonzo kernel: [<c01087f9>] restore_sigcontext+0x119/0x140
Jan 11 15:34:18 gonzo kernel: [<c0168e13>] sys_ioctl+0xf3/0x2a0
Jan 11 15:34:18 gonzo kernel: [<c01093c9>] sysenter_past_esp+0x52/0x71

#2
NVIDIA Corporation
Join Date: Aug 2002
Posts: 3,740
You can disable the warning messages by turning off the CONFIG_DEBUG_SPINLOCK_SLEEP kernel option. When did you download the 1.0-5328 patch? Are you still seeing these with the patch posted on 01-09-2004?
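For readers unsure whether their kernel was built with this option, here is a minimal Python sketch. It assumes the kernel configuration is available as plain text (for example the `.config` file in the kernel source tree); the function name `config_option_state` is made up for illustration and is not part of any tool.

```python
def config_option_state(config_text, option):
    """Return the option's value ('y', 'm', ...) or None if it is not set.

    Assumes the usual kernel .config layout: enabled options appear as
    CONFIG_FOO=y and disabled ones as '# CONFIG_FOO is not set'.
    """
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
        if line == "# " + option + " is not set":
            return None
    return None  # option is not mentioned at all


if __name__ == "__main__":
    # Hypothetical sample; replace with the contents of your own .config.
    sample = (
        "CONFIG_DEBUG_KERNEL=y\n"
        "CONFIG_DEBUG_SPINLOCK_SLEEP=y\n"
        "# CONFIG_DEBUG_SLAB is not set\n"
    )
    print(config_option_state(sample, "CONFIG_DEBUG_SPINLOCK_SLEEP"))
```

If the function returns a value, the sleep-in-atomic check is compiled in and the "sleeping function called from invalid context" messages can appear.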

#3
Registered User
Join Date: Nov 2003
Location: Sweden
Posts: 11
My 5328 patch is not the one posted on 01-09-2004. I'll try the new one in a few minutes.
Thanks for your help!! /Anders

#4
Registered User
Join Date: Jan 2004
Posts: 9
I'm having the same problem. It's making my machine quite unstable. I'm on 2.6.1 and nvidia-kernel 1.0.5328.
Code:
Jan 24 19:55:17 espresso kernel: Debug: sleeping function called from invalid context at mm/slab.c:1856
Jan 24 19:55:17 espresso kernel: in_atomic():1, irqs_disabled():0
Jan 24 19:55:17 espresso kernel: Call Trace:
Jan 24 19:55:17 espresso kernel: [<c011e86c>] __might_sleep+0xac/0xe0
Jan 24 19:55:17 espresso kernel: [<c0151e4c>] __kmalloc+0x24c/0x260
Jan 24 19:55:17 espresso kernel: [<f0d2878a>] os_alloc_mem+0x5c/0x87 [nvidia]
Jan 24 19:55:17 espresso kernel: [<f0d42ba0>] _nv001308rm+0x10/0x28 [nvidia]
Jan 24 19:55:17 espresso kernel: [<f0e5b33d>] _nv001518rm+0x7c9/0xb34 [nvidia]
Jan 24 19:55:17 espresso kernel: [<c01517f8>] cache_flusharray+0xd8/0x2c0
Jan 24 19:55:17 espresso kernel: [<f0d29305>] _nv005601rm+0xd/0x34 [nvidia]
Jan 24 19:55:17 espresso kernel: [<f0d309e8>] _nv000858rm+0x300/0xe14 [nvidia]
Jan 24 19:55:17 espresso kernel: [<f0abcba8>] bttv_irq+0x68/0x3e0 [bttv]
Jan 24 19:55:17 espresso kernel: [<c010bdeb>] handle_IRQ_event+0x3b/0x70
Jan 24 19:55:17 espresso kernel: [<c010c494>] do_IRQ+0x1c4/0x3b0
Jan 24 19:55:17 espresso kernel: [<f0d2cc9d>] _nv002962rm+0x2c5/0x3b8 [nvidia]
Jan 24 19:55:17 espresso kernel: [<f0d47659>] _nv000899rm+0x4c9/0xf70 [nvidia]
Jan 24 19:55:17 espresso kernel: [<f0d4766c>] _nv000899rm+0x4dc/0xf70 [nvidia]
Jan 24 19:55:17 espresso kernel: [<c011a778>] kernel_map_pages+0x28/0x60
Jan 24 19:55:17 espresso kernel: [<c011a778>] kernel_map_pages+0x28/0x60
Jan 24 19:55:17 espresso kernel: [<c03cb955>] kfree_skbmem+0x25/0x30
Jan 24 19:55:17 espresso kernel: [<c03cb955>] kfree_skbmem+0x25/0x30
Jan 24 19:55:17 espresso kernel: [<c011a778>] kernel_map_pages+0x28/0x60
Jan 24 19:55:17 espresso kernel: [<c011a778>] kernel_map_pages+0x28/0x60
Jan 24 19:55:17 espresso kernel: [<c03cb955>] kfree_skbmem+0x25/0x30
Jan 24 19:55:17 espresso kernel: [<c03cb955>] kfree_skbmem+0x25/0x30
Jan 24 19:55:17 espresso kernel: [<c03cb9cb>] __kfree_skb+0x6b/0xe0
Jan 24 19:55:17 espresso kernel: [<c02ce71a>] rtl8139_start_xmit+0x14a/0x2b0
Jan 24 19:55:17 espresso kernel: [<c03d0173>] dev_queue_xmit_nit+0xc3/0x120
Jan 24 19:55:17 espresso kernel: [<c011a8a8>] recalc_task_prio+0xa8/0x1d0
Jan 24 19:55:17 espresso kernel: [<f0ea3edf>] _nv000176rm+0x57/0x3ec [nvidia]
Jan 24 19:55:17 espresso kernel: [<c011f2d5>] autoremove_wake_function+0x25/0x50
Jan 24 19:55:17 espresso kernel: [<c011a491>] change_page_attr+0x101/0x1c0
Jan 24 19:55:17 espresso kernel: [<f0ea3edf>] _nv000176rm+0x57/0x3ec [nvidia]
Jan 24 19:55:17 espresso kernel: [<f0e0e44b>] _nv001532rm+0x1f/0x28 [nvidia]
Jan 24 19:55:17 espresso kernel: [<f0e0ebf2>] _nv003621rm+0x1a/0x20 [nvidia]
Jan 24 19:55:17 espresso kernel: [<f0ea54b8>] _nv000183rm+0x750/0x774 [nvidia]
Jan 24 19:55:17 espresso kernel: [<f0e0e44b>] _nv001532rm+0x1f/0x28 [nvidia]
Jan 24 19:55:17 espresso kernel: [<f0d38715>] _nv001338rm+0x1d/0x24 [nvidia]
Jan 24 19:55:17 espresso kernel: [<f0d5dc5c>] _nv005307rm+0x54/0x544 [nvidia]
Jan 24 19:55:17 espresso kernel: [<f0e0e44b>] _nv001532rm+0x1f/0x28 [nvidia]
Jan 24 19:55:17 espresso kernel: [<c012de24>] update_process_times+0x44/0x50
Jan 24 19:55:17 espresso kernel: [<c012dc96>] update_wall_time+0x16/0x40
Jan 24 19:55:17 espresso kernel: [<c012e3a0>] do_timer+0xe0/0xf0
Jan 24 19:55:17 espresso kernel: [<c011254f>] timer_interrupt+0x8f/0x270
Jan 24 19:55:17 espresso kernel: [<c010c525>] do_IRQ+0x255/0x3b0
Jan 24 19:55:17 espresso kernel: [<c03cb758>] alloc_skb+0x48/0xf0
Jan 24 19:55:17 espresso kernel: [<c011a778>] kernel_map_pages+0x28/0x60
Jan 24 19:55:17 espresso kernel: [<c03cb758>] alloc_skb+0x48/0xf0
Jan 24 19:55:17 espresso kernel: [<c011c11a>] __wake_up_common+0x3a/0x70
Jan 24 19:55:17 espresso kernel: [<c0424907>] unix_stream_sendmsg+0x337/0x530
Jan 24 19:55:17 espresso kernel: [<c03cb955>] kfree_skbmem+0x25/0x30
Jan 24 19:55:17 espresso kernel: [<c03c708e>] sock_sendmsg+0x8e/0xb0
Jan 24 19:55:17 espresso kernel: [<c03c721e>] sock_aio_read+0xbe/0xe0
Jan 24 19:55:17 espresso kernel: [<f0d46d61>] rm_ioctl+0x19/0x20 [nvidia]
Jan 24 19:55:17 espresso kernel: [<f0d25406>] nv_kern_ioctl+0x4de/0x529 [nvidia]
Jan 24 19:55:17 espresso kernel: [<c03c74df>] sock_writev+0x4f/0x60
Jan 24 19:55:17 espresso kernel: [<c017243d>] do_readv_writev+0x21d/0x2c0
Jan 24 19:55:17 espresso kernel: [<c018ce27>] sys_ioctl+0x217/0x420
Jan 24 19:55:17 espresso kernel: [<c0127976>] sys_gettimeofday+0x66/0xe0
Jan 24 19:55:17 espresso kernel: [<c010a26b>] syscall_call+0x7/0xb
Last edited by caffeine; 01-24-04 at 12:05 PM.
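For readers comparing this trace with the one in the first post, the following Python sketch pulls the stack frames out of a pasted syslog excerpt and counts them per module. It is only an illustration: the helper names `parse_frames` and `frames_per_module` are invented here, and the regular expression assumes the frame layout seen in this thread (`[<address>] symbol+0xoff/0xsize [module]`, with the module tag missing for symbols built into the kernel).

```python
import re
from collections import Counter

# One frame looks like: [<e0be0cba>] os_alloc_mem+0x7a/0x90 [nvidia]
# The trailing [module] is absent for symbols built into the kernel.
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def parse_frames(log_text):
    """Return (address, symbol, module) tuples; module is 'kernel' if untagged."""
    return [
        (addr, sym, mod or "kernel")
        for addr, sym, mod in FRAME_RE.findall(log_text)
    ]


def frames_per_module(log_text):
    """Count how many stack frames belong to each module."""
    return Counter(mod for _, _, mod in parse_frames(log_text))


if __name__ == "__main__":
    # Three frames copied from the first post's trace.
    sample = (
        "Jan 11 15:34:17 gonzo kernel: [<c011ce2b>] __might_sleep+0xab/0xd0\n"
        "Jan 11 15:34:17 gonzo kernel: [<c01422c6>] __kmalloc+0x96/0xa0\n"
        "Jan 11 15:34:17 gonzo kernel: [<e0be0cba>] os_alloc_mem+0x7a/0x90 [nvidia]\n"
    )
    for frame in parse_frames(sample):
        print(frame)
    print(frames_per_module(sample))
```

The pattern works on both one-frame-per-line logs and logs that were flattened onto a single line, since it does not rely on line boundaries.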

#5
Registered User
Join Date: Jan 2004
Posts: 9
OK, enabling CONFIG_DEBUG_SPINLOCK_SLEEP=y seems to at least suppress error messages. Will there be any performance penalty?

#6
NVIDIA Corporation
Join Date: Aug 2002
Posts: 3,740
The warning messages will have an impact on overall system performance if they occur frequently; in any case, the current (as of 01-09-2004) 1.0-5328 patch shouldn't trigger the condition.

#7
Registered User
Join Date: Jan 2004
Posts: 9
My crashes have been getting more frequent. The last crash left the graphics card in a state where my system wouldn't POST.
So I started unplugging monitors, and discovered that the left monitor's plug (the one with the flickering) will actually spark when it touches the PC case. I discovered this by accidentally holding the plug while touching the grounded case and getting quite a shock. Is this normal? Should the monitor plug be carrying enough voltage to throw a spark? Maybe that's been screwing with my card? Well, I've unplugged that monitor, so I'll find out, I guess.
[edit]I just discovered the fan on my north bridge wasn't running! Perhaps this is my problem?[/edit]
In my log files now:
Code:
Badness in pci_find_subsys at drivers/pci/search.c:132
Last edited by caffeine; 01-25-04 at 09:03 AM.

#8
Registered User
Join Date: Jan 2004
Posts: 9
Still no stability. I'm trying the open source "nv" driver to see if I can narrow it down to the nvidia stuff.

#9
NVIDIA Corporation
Join Date: Aug 2002
Posts: 3,740
The pci_find_subsys warning messages are after-the-fact symptoms of severe error conditions. There are numerous possible error sources, most common among them broken AGP configurations (hardware, software or both), ACPI and APIC bugs (again in hardware, software or both), fbdev drivers (vesafb, rivafb), insufficient cooling and defective RAM; unfortunately, there are many more. The best I can suggest is that you try to identify the root cause of the failures you're experiencing via experimentation, i.e. check if disabling AGP helps, check if disabling ACPI helps, and so on.

#10
Registered User
Join Date: Jan 2004
Posts: 9
Hi Zander. I appreciate your help.
Using the nv module has been stable overnight. My latest theory is that it's something to do with 2.6.1 / 2.6.2 + the nvidia module, as 2.6.0 had been quite stable. (I'm currently on 2.6.1, btw.)
I've tried setting NvAGP to 0 and 1 to no effect. I've also tried enabling/disabling sideband addressing and fast writes using
#options nvidia NVreg_EnableAGPSBA=1 NVreg_EnableAGPFW=1
I've read about a boot option, "mem=nopentium", something about 4k vs 4M pages, but I'm not sure it's needed; it didn't seem to help.
I've seen two methods of killing ACPI, "pci=noacpi" and "acpi=off", although I'm not sure which is the correct method. I don't have any ACPI or APIC support compiled into my kernel.
For what it's worth, I've got an ABit KT7A-RAID.

#11
NVIDIA Corporation
Join Date: Aug 2002
Posts: 3,740
The XFree86 nv driver interacts very differently with both the installed graphics hardware and the operating system; if you find it to be stable and don't require features it doesn't offer, there's no reason why you shouldn't use it (aside, possibly, from performance considerations), but the fact that it works (or doesn't work) allows no conclusions to be drawn with respect to the nvidia driver.
AGP SBA and FW both frequently cause stability problems; you will want to make sure they are disabled (/proc/driver/nvidia/agp/status), which they are by default. Setting NvAgp to "0" prohibits use of the AGP port; you will want to do this for now. The mem=nopentium kernel parameter was an interim workaround for a Linux kernel bug and is no longer required or even desirable with recent kernels; don't pass the parameter to Linux 2.6 kernels. pci=noacpi disables ACPI PCI IRQ routing, while acpi=off disables ACPI support altogether; if your kernel wasn't configured to feature ACPI support, neither option will modify system behavior. Check /proc/interrupts to determine if an APIC is used; if so, the noapic kernel parameter will disable it. These are only some of the possible error sources, however, and while I named a few others (fbdev, other hardware problems, ...), you should browse the archives of this and similar forums to get additional suggestions.
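As a rough aid for the two checks mentioned above, here is a small Python sketch. It rests on assumptions that are not stated in this thread: that /proc/interrupts shows "IO-APIC" in the controller column when an I/O APIC is in use (and "XT-PIC" otherwise), and that the NVIDIA status file consists of simple "Key: Value" lines. The function names `uses_io_apic` and `parse_status` are invented for illustration.

```python
def uses_io_apic(interrupts_text):
    """True if any line of /proc/interrupts output mentions an IO-APIC."""
    return any("IO-APIC" in line for line in interrupts_text.splitlines())


def parse_status(status_text):
    """Parse 'Key: Value' lines into a dict (the file layout is an assumption)."""
    result = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            result[key.strip()] = value.strip()
    return result


if __name__ == "__main__":
    checks = (
        ("/proc/interrupts", uses_io_apic),
        ("/proc/driver/nvidia/agp/status", parse_status),
    )
    for path, handler in checks:
        try:
            with open(path) as fh:
                print(path, "->", handler(fh.read()))
        except OSError as exc:
            # The NVIDIA file only exists while the nvidia module is loaded.
            print(path, "not readable:", exc)
```

If the first check reports True, booting with noapic is one experiment to try; the second simply prints whatever the driver reports so the SBA and fast-write entries can be read off.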

#12
Registered User
Join Date: Jan 2004
Posts: 9
I've been testing my card's performance under Windows 2000. It's very unstable, with plenty of errors in the log files regarding True Vector Engine.
I thought it could be a power problem, but I've unplugged 2 extra hard drives and removed my TV-in card, and there are still errors. Is it possible the card is faulty?
|