|
|
#13 | |
|
Registered User
Join Date: Feb 2004
Location: Ft. Myers, FL
Posts: 67
|
Well, 5 days of solid uptime with ACPI, APIC and fast writes enabled; the only thing I've disabled is the framebuffer (VesaFB). Even while playing GL games such as quake2 and ut2004, and running xscreensaver, it's running solid with no hiccups. I guess I'll have to do without bootsplash for now. I can live with that.
Tamran
|
|
|
|
|
|
#14 | |
|
Registered User
Join Date: Feb 2004
Location: Ft. Myers, FL
Posts: 67
|
Darnit, I had another one today. A full week up and running without a problem and then it returns. How is it going for you guys? Teknicolor? Spudman?
|
|
|
|
|
|
|
#15 |
|
550Ti
Join Date: Jan 2004
Location: New Zealand
Posts: 854
|
It has been mentioned before: "low" quality RAM plays havoc with some AMD CPUs. Run memtest86. The folks that sold me this Athlon 64 went through a few DIMMs before finding Kingmax was stable.
|
|
|
|
|
|
#16 | |
|
Registered User
Join Date: Jun 2003
Posts: 5
|
Don't blame AMD. This happens with "Genuine" Intel processors too.
|
|
|
|
|
|
|
#17 |
|
Registered User
Join Date: Feb 2004
Location: Ft. Myers, FL
Posts: 67
|
OK, funny thing. I've had crashes 3 times in 3 days now, after it had all been running so smoothly. I took another look through my /var/log/messages file and was surprised to see multiple instances of the "badness found" message throughout the week. They happened more than once a day, but I didn't get a crash every time. So I'd say disabling VesaFB was only part of the solution: the problem still exists, but disabling VesaFB avoided most of the crashes. I have not changed my system configuration in any way that *should* cause a change ...
I found another interesting thing to try in the following thread: http://www.nvnews.net/vbulletin/show...threadid=25167
In that thread, there was mention of using:
Option "IgnoreDisplayDevices" "TV"
I'm going to keep the configuration that I have and see if I find any more incidents. I'll keep posting with the results.
And for what it's worth, I've run memtest86, prime95 (torture test) and memtester for 24-hour periods several times. I'm pretty certain it's not a RAM problem. Also, using the NV driver seems to make the problem go away, but I did not use it for more than a week.
Regards,
Tamran
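For anyone who wants to check how often the warning shows up compared with how often the machine actually locks up, here is a minimal Python sketch that counts "Badness in" lines per day. It assumes the classic syslog timestamp prefix ("Mar  3 14:22:01 host kernel: ..."); the function name `count_badness_by_day` is made up for this example, and lines with a different prefix are simply counted under "unknown".

```python
import re
from collections import Counter

# Assumption: classic syslog prefix, e.g. "Mar  3 14:22:01 host kernel: ..."
SYSLOG_RE = re.compile(
    r"^(?P<mon>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+\d{2}:\d{2}:\d{2}\s"
)

def count_badness_by_day(lines):
    """Count lines containing 'badness in' (any case), grouped by day."""
    counts = Counter()
    for line in lines:
        if "badness in" not in line.lower():
            continue
        m = SYSLOG_RE.match(line)
        # Lines without a recognisable timestamp go into one bucket.
        key = f"{m.group('mon')} {int(m.group('day'))}" if m else "unknown"
        counts[key] += 1
    return counts
```

To use it, open the log (for example `open("/var/log/messages", errors="replace")`, usually as root) and pass the file object to `count_badness_by_day`. Comparing the per-day counts with the days on which the box froze shows whether every warning leads to a crash or only some of them.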
|
|
|
|
|
#18 | |||
|
Registered User
Join Date: Feb 2004
Location: Holy Roman Empire
Posts: 64
|
Quote:
Quote:
Quote:
I got the "Badness in pci_find_subsys()" problem with nvidia modules 4496, 5328, 5336 and with kernels 2.6.0, 2.6.1, 2.6.2, never with 2.4.X. I have stopped using the nvidia kernel module for now. Reading the documentation (i.e. the source code), it appears the problem is triggered by the line
Code:
WARN_ON(in_interrupt());
Proposal: would it be possible to log some information about the actual problem so that the user isn't left in the dark as to what he can troubleshoot?
|||
|
|
|
|
|
#19 | ||
|
Registered User
Join Date: Jan 2004
Posts: 10
|
Quote:
Quote:
|
||
|
|
|
|
|
#20 |
|
NVIDIA Corporation
Join Date: Aug 2002
Posts: 3,740
|
@maro: if you read my posts carefully, you will notice that I named no more than the most common error sources; I stressed that the list is by no means complete and merely intended as a starting point for those who have absolutely no clue as to what might be causing their systems to fail. The basic idea is that problems of this kind need to be approached systematically and can have any number of root causes. If you find that disabling AGP proves ineffectual, as do any of the other suggestions, and you feel that you are experiencing a genuine driver bug, then report it to NVIDIA; nobody denied the existence of such bugs. If you would like the NVIDIA driver to be more verbose about error conditions it encounters, please request this of NVIDIA. When I claimed that a lot of the common stability problems won't be seen on Windows, I did so based on my experience, without the intent to cast blame; a number of (more or less integral) components simply don't receive the test coverage or (vendor) attention they receive on Windows (best example: AGP GART drivers).
|
|
|
|
|
|
#21 | |||
|
Registered User
Join Date: Feb 2004
Location: Holy Roman Empire
Posts: 64
|
Quote:
Anyway, I really wanted to say, the machine has never frozen with 2.4.X, whereas with 2.6.X it does, once a week or once a fortnight.
Quote:
Quote:
The whole thing is beginning to look like a deadlock situation to me, where NVIDIA and the Linux kernel developers are both denying responsibility, whereas they should get together and sort this out. One of the reasons why I have been buying NVIDIA-based cards in the past has been my confidence in their software development capabilities, both for Windows and Linux drivers. Let's see whether it was justified.
|||
|
|
|
|
|
#22 |
|
Registered User
Join Date: Feb 2004
Location: Ft. Myers, FL
Posts: 67
|
OK, the following:
Option "IgnoreDisplayDevices" "TV"
didn't seem to change things. I got yet another crash today, with the same "badness" message followed by a load of nvidia stuff. Alas .... I'll keep you posted if anything changes.
I "think" I've been noticing a trend: the crashes happen when my CPU temp is right at 60 deg C. I'm not totally sure about this, but the last three times it happened were right when the BIOS kicked in the fans. I'll use lm_sensors (which pretty much blasts my fans at full speed .. LOUD) and see if that changes anything. One thing at a time, I guess.
Regards,
Tamran
|
|
|
|
|
#23 |
|
Registered User
Join Date: Feb 2004
Location: GREECE
Posts: 4
|
I have the same problem! It must be a bug in the nvidia driver. The problem appears on all 2.6 kernels (currently I am using 2.6.3) but not on 2.4 kernels. My motherboard is an Asus A7N266-VM with the nForce chipset and integrated GPU. The problem has nothing to do with APIC, ACPI, framebuffer or AGP, because I have tried every possible combination! It is also independent of the NVIDIA driver version (currently 5336). The symptoms are slowdowns in 2D and 3D graphics and "badness in pci_find_subsys" messages. I have noticed that IRQ 11 is shared between the GPU and the soundcard in my system (I cannot change this, it is a motherboard limitation!). Could this have something to do with the problem? Where can I send a bug report?
Sample dmesg error:
badness in pci_find_subsys at drivers/pci/search.c:167
Call Trace:
[<c020b1c9>] pci_find_subsys+0xe9/0x100
[<c020b20f>] pci_find_device+0x2f/0x40
[<c020b008>] pci_find_slot+0x28/0x50
[<cecbd228>] os_pci_init_handle+0x3e/0x6d [nvidia]
[<ceb5185f>] _nv001243rm+0x1f/0x24 [nvidia]
[<cebc0edf>] _nv002881rm+0x203/0xbc0 [nvidia]
[<ceb6c8ec>] _nv004223rm+0x54/0x1e0 [nvidia]
[<cec1d34b>] _nv001532rm+0x1f/0x28 [nvidia]
[<ceb7a20a>] _nv005046rm+0x52/0x70 [nvidia]
[<cec1e58f>] _nv001614rm+0x23/0x84 [nvidia]
[<ceb3ad5d>] _nv005573rm+0x171/0x188 [nvidia]
[<cecbcd1f>] os_alloc_mem+0x51/0xa0 [nvidia]
[<ceb51839>] _nv001247rm+0x15/0x1c [nvidia]
[<ceb3987b>] _nv005631rm+0x97/0x100 [nvidia]
[<ceb81c0a>] _nv004919rm+0x3e/0x48 [nvidia]
[<ceb864ba>] _nv004950rm+0x3a/0x44 [nvidia]
[<ceb7a44c>] _nv004960rm+0x70/0x90 [nvidia]
[<ceb87673>] _nv004961rm+0x13/0x18 [nvidia]
[<ceb7a780>] _nv005068rm+0x114/0x148 [nvidia]
[<ceb7e35d>] _nv005069rm+0x31/0x3c [nvidia]
[<cec1e58f>] _nv001614rm+0x23/0x84 [nvidia]
[<cebc443c>] _nv002534rm+0x6cc/0x8bc [nvidia]
[<cebc21f8>] _nv002547rm+0x4c/0x58 [nvidia]
[<ceb46d0e>] _nv001370rm+0x2e/0xcc [nvidia]
[<ceb46d0e>] _nv001370rm+0x2e/0xcc [nvidia]
[<cec1d3b3>] _nv001558rm+0x5f/0x70 [nvidia]
[<ceb5ca32>] _nv004363rm+0x72/0x90 [nvidia]
[<ceb80e01>] _nv004556rm+0x25/0x34 [nvidia]
[<ceb935e3>] _nv004083rm+0x288b/0x313c [nvidia]
[<ceb4740a>] _nv001344rm+0x22/0x6c [nvidia]
[<ceb4740a>] _nv001344rm+0x22/0x6c [nvidia]
[<ceb4740a>] _nv001344rm+0x22/0x6c [nvidia]
[<cec1d48b>] _nv001556rm+0x5b/0x6c [nvidia]
[<ceca0d44>] _nv001803rm+0x14/0x18 [nvidia]
[<cec1d48b>] _nv001556rm+0x5b/0x6c [nvi
[<c0157e17>] __find_get_block+0x67/0x
[<c0157ebb>] __getblk+0x2b/0x60
[<c0157e17>] __find_get_block+0x67/0x
[<c0157ebb>] __getblk+0x2b/0x60
[<c01a7f6b>] is_tree_node+0x6b/0x70
[<c01a84a2>] search_by_key+0x532/0xee
[<c01b182a>] check_journal_end+0x18a/
[<c01b1e4b>] do_journal_end+0xeb/0xc8
[<c0158bc3>] __block_commit_write+0x9
[<c01593da>] generic_commit_write+0x4
[<c0197136>] reiserfs_commit_write+0x
[<ceca0d44>] _nv001803rm+0x14/0x18 [n
[<ceb47595>] _nv001338rm+0x1d/0x24 [n
[<cec866ac>] _nv005722rm+0x888/0x960
[<ceb5a267>] _nv005638rm+0x5f/0xb0 [n
[<cebffc2d>] _nv003795rm+0x309/0xaec
[<ceb6a267>] _nv004046rm+0x3a3/0x3b0
[<cec6bba7>] _nv001476rm+0x277/0x45c
[<ceb5439a>] _nv000896rm+0x4a/0x64 [n
[<ceb55bb4>] rm_isr_bh+0xc/0x10 [nvid
[<cecbab01>] nv_kern_isr_bh+0xf/0x13
[<c01242f6>] tasklet_action+0x46/0x70
[<c0124115>] do_softirq+0x95/0xa0
[<c010bc07>] do_IRQ+0x107/0x140
[<c0109e08>] common_interrupt+0x18/0x
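On the shared IRQ question: the sharing can be checked directly from /proc/interrupts. The Python sketch below lists every IRQ that has more than one device attached. It assumes the 2.4/2.6-era layout (one count column per CPU, then a single controller token such as XT-PIC, then a comma-separated device list); newer kernels print more columns, so treat it as a starting point. The function name `shared_irqs` is made up for this example.

```python
def shared_irqs(text):
    """Return {irq: [devices]} for IRQs with more than one device.

    Assumes the 2.4/2.6-era /proc/interrupts layout:
    header of CPU names, then "IRQ: counts... controller dev1, dev2".
    """
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: one name per CPU
    shared = {}
    for line in lines[1:]:
        head, sep, rest = line.partition(":")
        irq = head.strip()
        if not sep or not irq.isdigit():
            continue  # skip NMI:, ERR: and similar summary rows
        tokens = rest.split()
        # Skip the per-CPU counters and the single controller token.
        names = " ".join(tokens[ncpu + 1:])
        devices = [d.strip() for d in names.split(",") if d.strip()]
        if len(devices) > 1:
            shared[int(irq)] = devices
    return shared
```

Feed it the contents of the file, e.g. `shared_irqs(open("/proc/interrupts").read())`. A shared line is not proof of a problem by itself, since PCI interrupt sharing is normal, but it does confirm which drivers are serviced on the same IRQ.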
|
|
|
|
|
#24 |
|
Registered User
Join Date: Jun 2003
Posts: 5
|
xaos, your stack trace looks like chaos. It looks like two stack traces in parallel, one for nvidia and one for reiserfs. The latter is definitely serious.
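To make a mixed trace like that easier to read, the frames can be sorted into nvidia frames and everything else. The Python sketch below is only a heuristic: it treats a frame as nvidia if the text after the symbol carries a module tag starting with "[n" (the tags in the posted trace are partly cut off) or if the symbol starts with "_nv" or "nv_". The function name `split_frames` is made up for this example, and it cannot tell whether the output really came from two separate traces.

```python
import re

# Matches "[<c020b1c9>] pci_find_subsys" and captures address and symbol.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<sym>[A-Za-z_][A-Za-z0-9_]*)"
)

def split_frames(trace):
    """Group stack frame symbols into 'nvidia' and 'kernel' lists."""
    groups = {"nvidia": [], "kernel": []}
    for line in trace.splitlines():
        m = FRAME_RE.search(line)
        if not m:
            continue  # not a frame line, e.g. "Call Trace:"
        sym = m.group("sym")
        tail = line[m.end():]
        # Heuristic: a (possibly truncated) "[nvidia]" tag or an nv-style name.
        is_nvidia = "[n" in tail or sym.startswith(("_nv", "nv_"))
        groups["nvidia" if is_nvidia else "kernel"].append(sym)
    return groups
```

Run on the trace above, this puts the reiserfs and block-layer frames (`reiserfs_commit_write`, `do_journal_end`, `__getblk` and so on) in the "kernel" list, which makes it easier to see which non-nvidia code paths appear in the dump.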
|
|
|
|
|
|