Old 02-18-04, 09:10 PM   #13
tamran
Registered User
 
Join Date: Feb 2004
Location: Ft. Myers, FL
Posts: 67
Default Removing framebuffer did it for me.

Well, 5 days of solid uptime with ACPI, APIC and fast writes enabled; the only thing I've disabled is the framebuffer (VesaFB). Even with GL games such as quake2 and ut2004, plus xscreensaver, it's running solid with no hiccups. I guess I'll have to do without bootsplash for now. I can live with that.

Tamran
Old 02-20-04, 07:31 PM   #14
tamran
Registered User
 
Join Date: Feb 2004
Location: Ft. Myers, FL
Posts: 67
Default Oh Oh ... more badness.

Darnit, I had another one today. A full week up and running without a problem and then it returns. How is it going for you guys? Teknicolor? Spudman?
Old 02-20-04, 10:03 PM   #15
whig
550Ti
 
Join Date: Jan 2004
Location: New Zealand
Posts: 854
Default

It has been mentioned before: "low" quality RAM plays havoc with some AMD CPUs. Run memtest86. The folks that sold me this Athlon 64 went through a few DIMMs before finding Kingmax was stable.
Old 02-22-04, 12:15 PM   #16
robinr
Registered User
 
Join Date: Jun 2003
Posts: 5
Default

Don't blame AMD. This happens with "Genuine" Intel processors too.
Old 02-22-04, 02:22 PM   #17
tamran
Registered User
 
Join Date: Feb 2004
Location: Ft. Myers, FL
Posts: 67
Default Strange ... it was all working then boom!

OK, funny thing. I've had crashes 3 times in 3 days now. It was all running so smoothly. I took another look through my /var/log/messages file and was surprised to see multiple instances of the "badness found" message throughout the week. They happened more than once a day, but I didn't get a crash every time. To that end, I'd say disabling VesaFB was only part of the solution. The problem still exists, but disabling VesaFB solved most of the crashes. I have not changed my system configuration in any way that *should* cause a change ...
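
For anyone who wants to check how often the message shows up in their own logs, here is a minimal Python sketch. It assumes the traditional syslog line prefix ("Feb 20 19:31:02 host kernel: ...") and matches the message case-insensitively. The function name count_badness_per_day is made up for illustration and is not part of any tool mentioned in this thread.

```python
import re
from collections import Counter

# Assumption: classic syslog prefix, e.g. "Feb 20 19:31:02 host kernel: ..."
SYSLOG_DAY = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s")
# The message text as quoted in this thread; case varies between posts.
BADNESS = re.compile(r"badness in pci_find_subsys", re.IGNORECASE)


def count_badness_per_day(lines):
    """Count 'Badness in pci_find_subsys' lines, keyed by 'Mon D'."""
    counts = Counter()
    for line in lines:
        if not BADNESS.search(line):
            continue
        m = SYSLOG_DAY.match(line)
        # Lines without a syslog date (e.g. raw dmesg output) go to "unknown".
        day = f"{m.group(1)} {int(m.group(2))}" if m else "unknown"
        counts[day] += 1
    return counts
```

Calling it on an open log file, e.g. count_badness_per_day(open("/var/log/messages", errors="replace")), gives a per-day count that can be compared against the days a crash actually happened.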

I found another interesting thing to try from the following thread:

http://www.nvnews.net/vbulletin/show...threadid=25167

In that thread, there was mention of using:

Option "IgnoreDisplayDevices" "TV"

I'm going to keep the configuration that I have and see if I find any more incidents. I'll keep posting with the results.

And for what it's worth, I've run memtest86, prime95 (torture test) and memtester for 24-hour periods several times. I'm pretty certain it's not a RAM problem. Using the NV driver also seems to make the problem go away, but I did not use it for more than a week.

Regards,

Tamran
Old 02-23-04, 01:50 AM   #18
maro
Registered User
 
Join Date: Feb 2004
Location: Holy Roman Empire
Posts: 64
Default

Quote:
Originally posted by zander
The Badness in pci_find_subsys at drivers/pci/search.c problem really is one of many possible symptoms of common stability problems; it has been discussed here and elsewhere in the past.
accepted

Quote:
Originally posted by zander
many are Linux specific in the sense that they do not reproduce on Windows
I hope you are not trying to put the blame elsewhere, since this would not be a logical conclusion.

Quote:
Originally posted by zander
For what it's worth, it typically is UP I/O APIC configurations that are broken, SMP systems should work fine. ACPI mileage varies with the system BIOS and system software (i.e. the kernel), the latter is still not as mature as one would like it to be, the former broken more often than not. The problem with vesafb is highly configuration (hardware, BIOS, kernel, driver) dependent and supposedly got worse with recent NVIDIA driver releases, 1.0-4365 and earlier have been reported to be able to coexist with vesafb where 1.0-4496 and later releases fail; most of the fbdev related problems occur at the time of the initial startup or in response to vt switches, though. I don't know how well the vesafb vs. nvidia interaction difficulties are understood, but since neither driver is aware of the other, trouble is to be expected.
I think you slowly have to accept you have a problem. My machine is SMP, has ACPI disabled in the BIOS and no vesafb compiled in. The reg. ECC RAM has been tested intensively and the machine is otherwise rock stable.

I got the "Badness in pci_find_subsys()" problem with nvidia modules 4496, 5328, 5336 and with kernels 2.6.0, 2.6.1, 2.6.2, never with 2.4.X. I have stopped using the nvidia kernel module for now.

Reading the documentation (i.e. the source code), it appears the problem is triggered by the line
Code:
WARN_ON(in_interrupt());
It looks like the driver occasionally calls pci_find_subsys() from inside an interrupt, which apparently it shouldn't. On the other hand, you have already stated that this happens in the recovery procedure for a problem that has already occurred at that time, so it probably doesn't matter...

Proposal: would it be possible to log some information about the actual problem so that the user isn't left in the dark as to what he can troubleshoot?
Old 02-23-04, 04:46 AM   #19
Spudman
Registered User
 
Join Date: Jan 2004
Posts: 10
Default Re: Strange ... it was all working then boom!

Quote:
Originally posted by tamran
OK, funny thing. I've had crashes 3 times in 3 days now. It was all running so smoothly. I took another look through my /var/log/messages file and was surprised to see multiple instances of the "badness found" message throughout the week. They happened more than once a day, but I didn't get a crash every time. To that end, I'd say disabling VesaFB was only part of the solution. The problem still exists, but disabling VesaFB solved most of the crashes. I have not changed my system configuration in any way that *should* cause a change ...
I was getting the badness error many times a minute, but to date I've not had a single crash. In the end I had to comment out the offending line in search.c because it rendered syslog useless & just kept filling up the partition. I haven't tried any games yet, as it's a new machine which I'm still in the process of setting up, however I do use a GL screen-saver.

Quote:
Originally posted by maro
I got the "Badness in pci_find_subsys()" problem with nvidia modules 4496, 5328, 5336 and with kernels 2.6.0, 2.6.1, 2.6.2, never with 2.4.X. I have stopped using the nvidia kernel module for now.
If you check the source for 2.4.X, that error doesn't exist, so you'll never see it.
Old 02-23-04, 05:46 AM   #20
zander
NVIDIA Corporation
 
Join Date: Aug 2002
Posts: 3,740
Default

@maro: if you read my posts carefully, you will notice that I named no more than the most common error sources; I stressed that the list is by no means complete and merely intended as a starting point for those who have absolutely no clue as to what might be causing their systems to fail. The basic idea is that problems of this kind need to be approached systematically and can have any number of root causes. If you find that disabling AGP proves ineffectual, as do any of the other suggestions, and you feel that you are experiencing a genuine driver bug, then report it to NVIDIA; nobody denied the existence of such bugs. If you would like the NVIDIA driver to be more verbose about error conditions it encounters, please request this of NVIDIA. When I claimed that a lot of the common stability problems won't be seen on Windows, I did so based on my experience, without the intent to cast blame; a number of (more or less integral) components simply don't receive the test coverage or (vendor) attention they receive on Windows (best example: AGP GART drivers).

Old 02-23-04, 02:19 PM   #21
maro
Registered User
 
Join Date: Feb 2004
Location: Holy Roman Empire
Posts: 64
Default

Quote:
Originally posted by Spudman
If you check the source for 2.4.X, that error doesn't exist, so you'll never see it.
Good point. Anyway, what I really wanted to say is that the machine has never frozen with 2.4.X, whereas with 2.6.X it does, once a week or once a fortnight.

Quote:
Originally posted by zander
@maro: if you read my posts carefully, you will notice that I named no more than the most common error sources; I stressed that the list is by no means complete and merely intended as a starting point for those who have absolutely no clue as to what might be causing their systems to fail. The basic idea is that problems of this kind need to be approached systematically and can have any number of root causes.
fair enough.

Quote:

If you find that disabling AGP proves ineffectual, as do any of the other suggestions, and you feel that you are experiencing a genuine driver bug, then report it to NVIDIA; nobody denied the existence of such bugs. If you would like the NVIDIA driver to be more verbose about error conditions it encounters, please request this of NVIDIA.
Sorry if I thought you were in some way affiliated to nvidia. In any case, do you know how to contact nvidia support about this topic? I have tried sending an email before via the only contact I could find on their web site, but I don't think it got anywhere.

The whole thing is beginning to look like a deadlock situation to me, where nvidia and the linux kernel developers are both denying responsibility, whereas they should get together and sort this out.

One of the reasons why I have been buying nvidia based cards in the past has been my confidence in their software development capabilities, both for windows and linux drivers. Let's see whether it was justified.
Old 02-24-04, 07:14 PM   #22
tamran
Registered User
 
Join Date: Feb 2004
Location: Ft. Myers, FL
Posts: 67
Default Again!

ok, the following:

Option "IgnoreDisplayDevices" "TV"

Didn't seem to change things. I got yet another crash today. Same "badness" found with a load of nvidia stuff. Alas....

I'll keep you posted if anything changes.

I "think" I've been noticing a trend: the crashes seem to happen when my CPU temp is right on 60 deg C. I'm not totally sure about this, but the last three times it happened were right when the BIOS kicked in the fans. I'll use lm_sensors (which pretty much blasts my fans at full speed .. LOUD) and see if that changes anything. One thing at a time I guess.

Regards,

Tamran
Old 02-25-04, 04:30 AM   #23
xaos
Registered User
 
Join Date: Feb 2004
Location: GREECE
Posts: 4
Default

I have the same problem! It must be a bug in the nvidia driver. The problem appears on all 2.6 kernels (currently I am using 2.6.3) but not on 2.4 kernels.
My motherboard is an Asus A7N266-VM with the nForce chipset and integrated GPU.
The problem has nothing to do with APIC, ACPI, framebuffer or AGP, because I have tried every possible combination! It is also independent of the NVIDIA driver version (currently 5336). The symptoms are slowdowns in 2D and 3D graphics and "badness in pci_find_subsys" messages. I have noticed that IRQ 11 is shared between the GPU and the sound card in my system (I cannot change this, it is a motherboard limitation!). Could this have something to do with the problem? Where can I send a bug report?
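
A quick way to see which IRQs are shared is to look at /proc/interrupts. The sketch below is only a rough illustration in Python: it assumes the device list is the last field on each line (separated from the rest by two or more spaces) and that shared devices are comma-separated. The helper name shared_irqs is made up, and the sample layout in the comment is hypothetical, not output from the system described above.

```python
import re


def shared_irqs(proc_interrupts_text):
    """Return {irq_number: [device, ...]} for IRQs listing more than one device."""
    shared = {}
    for line in proc_interrupts_text.splitlines():
        m = re.match(r"^\s*(\d+):\s+(.*)$", line)
        if not m:
            continue  # CPU header line, or named rows such as NMI / ERR
        irq, rest = int(m.group(1)), m.group(2).rstrip()
        # Assumption: the last field after a run of 2+ spaces is the device
        # list, e.g. " 11:   987654   XT-PIC  nvidia, NVidia nForce Audio"
        last_field = re.split(r"\s{2,}", rest)[-1]
        devices = [d.strip() for d in last_field.split(",")]
        if len(devices) > 1:
            shared[irq] = devices
    return shared
```

Feeding it the contents of /proc/interrupts, e.g. shared_irqs(open("/proc/interrupts").read()), lists only the lines where more than one driver is registered on the same IRQ. Sharing by itself is normal on PCI, so this only narrows down where to look.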

Sample dmesg error:


badness in pci_find_subsys at drivers/pci/search.c:167
Call Trace:
[<c020b1c9>] pci_find_subsys+0xe9/0x100
[<c020b20f>] pci_find_device+0x2f/0x40
[<c020b008>] pci_find_slot+0x28/0x50
[<cecbd228>] os_pci_init_handle+0x3e/0x6d [nvidia]
[<ceb5185f>] _nv001243rm+0x1f/0x24 [nvidia]
[<cebc0edf>] _nv002881rm+0x203/0xbc0 [nvidia]
[<ceb6c8ec>] _nv004223rm+0x54/0x1e0 [nvidia]
[<cec1d34b>] _nv001532rm+0x1f/0x28 [nvidia]
[<ceb7a20a>] _nv005046rm+0x52/0x70 [nvidia]
[<cec1e58f>] _nv001614rm+0x23/0x84 [nvidia]
[<ceb3ad5d>] _nv005573rm+0x171/0x188 [nvidia]
[<cecbcd1f>] os_alloc_mem+0x51/0xa0 [nvidia]
[<ceb51839>] _nv001247rm+0x15/0x1c [nvidia]
[<ceb3987b>] _nv005631rm+0x97/0x100 [nvidia]
[<ceb81c0a>] _nv004919rm+0x3e/0x48 [nvidia]
[<ceb864ba>] _nv004950rm+0x3a/0x44 [nvidia]
[<ceb7a44c>] _nv004960rm+0x70/0x90 [nvidia]
[<ceb87673>] _nv004961rm+0x13/0x18 [nvidia]
[<ceb7a780>] _nv005068rm+0x114/0x148 [nvidia]
[<ceb7e35d>] _nv005069rm+0x31/0x3c [nvidia]
[<cec1e58f>] _nv001614rm+0x23/0x84 [nvidia]
[<cebc443c>] _nv002534rm+0x6cc/0x8bc [nvidia]
[<cebc21f8>] _nv002547rm+0x4c/0x58 [nvidia]
[<ceb46d0e>] _nv001370rm+0x2e/0xcc [nvidia]
[<ceb46d0e>] _nv001370rm+0x2e/0xcc [nvidia]
[<cec1d3b3>] _nv001558rm+0x5f/0x70 [nvidia]
[<ceb5ca32>] _nv004363rm+0x72/0x90 [nvidia]
[<ceb80e01>] _nv004556rm+0x25/0x34 [nvidia]
[<ceb935e3>] _nv004083rm+0x288b/0x313c [nvidia]
[<ceb4740a>] _nv001344rm+0x22/0x6c [nvidia]
[<ceb4740a>] _nv001344rm+0x22/0x6c [nvidia]
[<ceb4740a>] _nv001344rm+0x22/0x6c [nvidia]
[<cec1d48b>] _nv001556rm+0x5b/0x6c [nvidia]
[<ceca0d44>] _nv001803rm+0x14/0x18 [nvidia]
[<cec1d48b>] _nv001556rm+0x5b/0x6c [nvi
[<c0157e17>] __find_get_block+0x67/0x
[<c0157ebb>] __getblk+0x2b/0x60
[<c0157e17>] __find_get_block+0x67/0x
[<c0157ebb>] __getblk+0x2b/0x60
[<c01a7f6b>] is_tree_node+0x6b/0x70
[<c01a84a2>] search_by_key+0x532/0xee
[<c01b182a>] check_journal_end+0x18a/
[<c01b1e4b>] do_journal_end+0xeb/0xc8
[<c0158bc3>] __block_commit_write+0x9
[<c01593da>] generic_commit_write+0x4
[<c0197136>] reiserfs_commit_write+0x
[<ceca0d44>] _nv001803rm+0x14/0x18 [n
[<ceb47595>] _nv001338rm+0x1d/0x24 [n
[<cec866ac>] _nv005722rm+0x888/0x960
[<ceb5a267>] _nv005638rm+0x5f/0xb0 [n
[<cebffc2d>] _nv003795rm+0x309/0xaec
[<ceb6a267>] _nv004046rm+0x3a3/0x3b0
[<cec6bba7>] _nv001476rm+0x277/0x45c
[<ceb5439a>] _nv000896rm+0x4a/0x64 [n
[<ceb55bb4>] rm_isr_bh+0xc/0x10 [nvid
[<cecbab01>] nv_kern_isr_bh+0xf/0x13
[<c01242f6>] tasklet_action+0x46/0x70
[<c0124115>] do_softirq+0x95/0xa0
[<c010bc07>] do_IRQ+0x107/0x140
[<c0109e08>] common_interrupt+0x18/0x
Old 02-25-04, 04:56 AM   #24
robinr
Registered User
 
Join Date: Jun 2003
Posts: 5
Default

xaos, your stack trace looks like chaos. It looks like two stack traces in parallel, one for nvidia and one for reiserfs. The latter is definitely serious.
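
To make the interleaving easier to see, the frames can be grouped into consecutive runs by the top byte of the address. This is only a heuristic sketch in Python, based on what is visible in the trace above: the core kernel symbols sit at 0xc0...... and the frames tagged [nvidia] at 0xce....... Other systems will have different layouts, and the name frame_runs is made up for illustration. Grouping by address rather than by the [nvidia] tag is deliberate, since several lines in the paste are cut off before the tag.

```python
import re
from itertools import groupby

# Matches "[<c020b1c9>] pci_find_subsys+0xe9/0x100" and captures the
# address and the symbol name (everything up to the "+offset").
FRAME = re.compile(r"\[<([0-9a-f]{8})>\]\s+([^\s+]+)")


def frame_runs(trace_text):
    """Group consecutive frames by the top byte of their address.

    Returns a list of (address_prefix, [symbol, ...]) in trace order.
    """
    frames = FRAME.findall(trace_text)
    return [
        (prefix, [symbol for _, symbol in group])
        for prefix, group in groupby(frames, key=lambda f: f[0][:2])
    ]
```

Each change of prefix in the result marks a point where the trace switches between core kernel code and module code, which is where the nvidia and reiserfs frames alternate.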
All times are GMT -5.