Old 08-12-06, 08:03 PM   #13
JaXXoN
Registered User
 
Join Date: Jul 2005
Location: Munich
Posts: 910
Default Re: Crashing when SMP enabled

BTW: applying "pci=conf1" as a boot option sometimes helps
(mostly for multi-GPU, multi-CPU setups, but you may give it
a try nevertheless).

Concerning realtime preemption, have you checked the following thread
http://www.nvnews.net/vbulletin/showthread.php?t=70776
and applied the appropriate patch to the nvidia driver?

Concerning high latencies with realtime preemption: the nforce4
SATA controller can cause latencies of up to 16.6 ms (there seems
to be no easy way to fix that in software). Maybe the ATA
controller is also affected.

regards

Bernhard
Old 08-12-06, 08:28 PM   #14
StevenChamberla
Registered User
 
Join Date: Jul 2006
Posts: 14
Default Re: Crashing when SMP enabled

Quote:
Originally Posted by nukem
I found that disabling any parts of the motherboard that I did not use helped a lot. So for example I disabled the nvidia ethernet card (I use the Marvell one) and the Sil SATA controller (I use the nvidia one). What I was finding is that the problem really comes down to a timing issue. It seems the system speeds up the clock under load. Try compiling something and look at your time (make sure you can see seconds). I put ntp on my system and all seems to be fine. Oh, one more thing: it was more of a problem on older kernels (like yours); I'm on the latest (2.6.17.8) and it's fine.
Oh, very interesting.

Most of the motherboard features are already disabled; at one point I'd disabled everything except the built-in ethernet. Please could you explain about using the 'marvell' instead of 'nvidia ethernet'? I'm currently using the 'forcedeth' kernel driver for the built-in ethernet. I know the kernel has drivers for a Marvell PHY which is apparently featured on this motherboard but I haven't been able to figure out what it does. I only have one RJ-45 port on the back of this motherboard (it's not the 'premium' model or anything).

I've heard a lot about Athlon 64 X2 suffering from timer issues with SMP enabled, so I was curious about this. My 'dmesg' under 2.6.14 does report some timer-related errors. Under 2.6.17 it doesn't, but I still get the problem with the 'nvidia' driver. I will try 2.6.17.8 without the realtime-preempt patch right now, since it's working for you.

Under 2.6.17 once while running a kernel compile 'make' warned of a clock slew, even though ntp isn't installed on here. I had to restart the build because some files ended up with timestamps 3000 seconds in the future. I doubt the timer error is as severe as that though, so I assume that was something else.

I'll try and monitor the system clock somehow. I can get nanoseconds from 'date +%s.%N' but then I need a reliable reference. I think it's best if I instead use something like 'ntpdate': check the offset, put the CPU under load for some time, and then run it again to see how much the offset has changed.
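
A minimal sketch of that before/after measurement, under some assumptions that are not from this thread: 'ntpdate' is installed, pool.ntp.org is reachable, and a 'make -j2' in the current directory stands in for the CPU load. The parsing assumes the summary line printed by 'ntpdate -q' contains "offset <seconds>"; if your version prints something different, adjust the awk part.

```shell
# Sketch only: estimate clock drift under load using ntpdate in query mode.

# Subtract two offsets given in seconds (fractional and negative values are fine).
offset_delta() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.6f\n", after - before }'
}

# Ask an NTP server for the current offset without touching the clock (-q).
# Takes the value following the last "offset" word in the output;
# prints 0 if nothing could be parsed.
query_offset() {
    ntpdate -q "$1" |
        awk '{ for (i = 1; i < NF; i++) if ($i == "offset") v = $(i + 1) }
             END { print v + 0 }'
}

# Measure the offset, run a load, measure again, and report the change.
measure_drift() {
    server=${1:-pool.ntp.org}        # assumed server, pick any you trust
    before=$(query_offset "$server")
    make -j2 >/dev/null 2>&1         # hypothetical load, substitute any long build
    after=$(query_offset "$server")
    echo "drift under load: $(offset_delta "$before" "$after") s"
}
```

Calling 'measure_drift' from a kernel source tree would then print the change in offset over the duration of the build. A drift of more than a few milliseconds over a short build would point at the timer rather than at normal oscillator error.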

Thanks for this... it's really helpful to hear from someone with very similar hardware.
Old 08-12-06, 09:23 PM   #15
netllama
NVIDIA Corporation
 
Join Date: Dec 2004
Posts: 8,763
Default Re: Crashing when SMP enabled

I've seen the vesa driver spew this warning on systems as well, and there was no instability associated with it, so I believe it's just a harmless warning (or perhaps a harmless vesa driver bug):
(WW) VESA(0): Bad V_BIOS checksum

If you only have one onboard NIC, and it's running with forcedeth, then you don't have a Marvell NIC.

The fact that older kernels were stable, yet newer kernels are not, could suggest a kernel bug, or perhaps a BIOS bug that gets triggered by changes in newer kernels' behavior/codepaths.

I'd definitely encourage you to get a serial console set up. That will provide a good starting point to debug the instability, regardless of its source.

Thanks,
Lonni
Old 08-12-06, 09:27 PM   #16
nukem
Registered User
 
Join Date: Dec 2004
Posts: 226
Default Re: Crashing when SMP enabled

Hmm, I'm not sure about the ethernet card; mine actually has two ethernet ports. I'm not sure whether forcedeth causes this problem. I just used the Marvell because it has Linux support for gigabit ethernet while the nForce one did not (only 100 Mbit). If dmesg is reporting timer issues, that's your problem: as soon as I got those timer issues to go away, everything was fine. I'm using gentoo-sources; attached is my kernel config in case you want to look at it or try it (I'm not sure of your distro or what sources you use, but you should be able to get it to work with little effort). To see how far off my clock was getting, I'd start compiling something and then I'd do this:

Code:
date && time ./setclock && date
setclock is a little script I wrote

Code:
rdate -s time-b.nist.gov
hwclock --systohc --localtime
date tells you the current time, time tells you how long a program/script took to run, and setclock gets the current time from nist and sets the system time to that.

Finally, about my motherboard: I'm using the latest stable BIOS (1009), nForce ethernet is off, Sil SATA is off, and the on-board sound card is off (I have an Audigy).
Attached Files
File Type: txt kconfig.txt (31.8 KB, 141 views)
__________________
AMD64 X2 4400, ASUS A8N-SLI Premium, eVGA nvidia 7800 GT, OCZ Platinum EL-PC3200 2-3-2-5 2gig, 200gig WD SATA150, SB Audigy 2 Platinum, Pioneer 108, Dell 2405W, all running on Gentoo Linux ~amd64.
Old 08-12-06, 09:31 PM   #17
StevenChamberla
Registered User
 
Join Date: Jul 2006
Posts: 14
Default Re: Crashing when SMP enabled

I've got a 2.6.17.8 kernel to 'work' at the moment. It sometimes takes several hours before a crash happens though. I also need a realtime-preemption patch, and those are available for 2.6.17 but not for 2.6.17.x. Using realtime-preempt with 2.6.17 is what I had originally, where the 'nvidia' driver worked but the rest of the system was not stable.

Quote:
Originally Posted by JaXXoN
Concerning realtime preemption, have you checked the following thread
http://www.nvnews.net/vbulletin/showthread.php?t=70776
and applied the appropriate patch to the nvidia driver?
Thanks for showing me this, I didn't think to check for a realtime patch to the nvidia driver itself. I'm now using this patch; this might stop the infrequent xruns in jackd that I used to suffer from, but until I get a realtime kernel working I won't know.
Old 08-12-06, 10:39 PM   #18
StevenChamberla
Registered User
 
Join Date: Jul 2006
Posts: 14
Default Re: Crashing when SMP enabled

Quote:
Originally Posted by netllama
I'd definitely encourage you to get a serial console setup. That will provide a good starting point to debug the instability, regardless of its source.
Okay, I'll do that as soon as I possibly can.

I now have older kernels that crash when the 'nvidia' driver loads or don't work at all, and newer kernels that crash under load whether the driver is loaded or not. In both cases kernel output seems like the only thing to pin it down, so I'll have to find a serial cable. Hopefully the kernel will still dump the error message over the serial port even though it crashes hard (even magic SysRq breaks).
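
For reference, a rough sketch of what the serial console setup usually involves. The 'console=' kernel parameter is standard; the port name 'ttyS0', the 115200 baud rate, and the log file name are assumptions for illustration, not details from this thread.

```shell
# Sketch only: build the kernel parameters that send kernel messages
# to both the local screen and a serial port.
serial_console_args() {
    port=${1:-ttyS0}      # assumed: first on-board serial port
    baud=${2:-115200}     # assumed speed; n8 below = no parity, 8 data bits
    echo "console=tty0 console=${port},${baud}n8"
}

# On a second machine connected with a null-modem cable, the output can be
# captured with something like (file name is hypothetical):
#   stty -F /dev/ttyS0 115200 raw -echo
#   cat /dev/ttyS0 | tee crash.log
```

The string printed by 'serial_console_args' would be appended to the kernel line in the boot loader configuration. Because the kernel writes to the serial port directly, an oops is often still visible there even when the machine no longer responds to the keyboard.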

Thanks for helping.
Old 08-12-06, 10:43 PM   #19
JaXXoN
Registered User
 
Join Date: Jul 2005
Location: Munich
Posts: 910
Default Re: Crashing when SMP enabled

Quote:
Originally Posted by StevenChamberla
I only have one RJ-45 port on the back of this motherboard (it's not the 'premium' model or anything).
The "plain" A8N-SLI (no "deluxe" or "premium") has only the
nvidia on-chip ethernet device (plus an off-chip/on-board
"external" Marvell PHY). The deluxe and premium models have
an additional on-board Marvell ethernet chip (88E8001).
So you don't need to be worried about using the forcedeth driver.

Quote:
Originally Posted by StevenChamberla
I've heard a lot about Athlon 64 X2 suffering from timer issues with SMP enabled
AFAIK, the boot option "idle=poll" should fix that: without this
option, I sometimes had strange "time warp" effects while playing
ut2004.

BTW: if 2.6.13 and earlier kernels worked ok for you, then you should
definitely try "pci=conf1": 2.6.14 and later changed the default PCI
configuration method to "mmconfig", which doesn't seem to work
very well for some mainboards.
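
After adding options like these to the boot loader, it is worth confirming that the running kernel actually received them. A small sketch, assuming a Linux system with /proc mounted; the helper name 'has_boot_option' is made up for this example.

```shell
# Sketch only: check whether a given option is on the kernel command line.
# $1 = option to look for, $2 = command line (defaults to the running kernel's).
has_boot_option() {
    cmdline=${2:-$(cat /proc/cmdline)}
    # Pad with spaces so only whole words match (idle=poll, not idle=pollx).
    case " $cmdline " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example:
#   for opt in pci=conf1 idle=poll; do
#       has_boot_option "$opt" && echo "$opt is active" || echo "$opt is missing"
#   done
```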

regards

Bernhard
Old 08-12-06, 10:53 PM   #20
JaXXoN
Registered User
 
Join Date: Jul 2005
Location: Munich
Posts: 910
Default Re: Crashing when SMP enabled

Quote:
Originally Posted by StevenChamberla
this might stop the infrequent xruns in jackd that I used to suffer from
Is it possible to determine the worst case latency (i.e. in micro
or milliseconds) with jackd? Otherwise "cyclictest" gives some
accurate numbers, please check
http://www.tglx.de/projects/misc/cyclictest/

The situation is that even with PAT support enabled (see other post),
the worst case latency for a high priority task should always stay well
below one millisecond. So if you have high latencies even when not using
any 3D applications, then I suspect some kind of hardware problem.

If you have a PCI IDE controller lying around, you may try that one
instead of the on-board nforce4 IDE controller, just to check if it makes
a difference.

regards

Bernhard

Old 08-13-06, 12:17 AM   #21
StevenChamberla
Registered User
 
Join Date: Jul 2006
Posts: 14
Default Re: Crashing when SMP enabled

Quote:
Originally Posted by JaXXoN
Is it possible to determine the worst case latency (i.e. in micro
or milliseconds) with jackd? Otherwise "cyclictest" gives some
accurate numbers, please check
http://www.tglx.de/projects/misc/cyclictest/
I'm trying to use this tool now but the value for 'Max' just rapidly increases, into 6 figures and beyond. I didn't find any documentation but I'm guessing it's not supposed to do that.

Luckily I've got 2.6.17-rt8 to work for now, and it hasn't crashed yet after several minutes, which is good going. I'm using 'noirqbalancer pci=conf1 idle=poll' as boot options but I haven't tried this exact build without those options.

I guess whatever problems I've had getting the 'nvidia' driver working on older kernels probably don't matter if this kernel does everything properly.

The only bad thing about this kernel is the occasional latencies. jackd suffers xruns coinciding with heavy disk access (e.g. running 'sync'). The realtime patch for the 'nvidia' driver is probably working, and although 'glxgears' slows things down a lot it doesn't seem to trigger any of these latencies.

Quote:
Originally Posted by JaXXoN
So if you have high latencies even when not using any 3D applications, then I suspect some kind of hardware problem.
With my previous setup the 'nvidia' driver may have been causing some of the latencies but now I'm only suspicious of the disk controller (an old Compaq SCSI RAID controller with the 'cpqarray' kernel driver) or maybe just a configuration issue as this is a fresh Linux install.

I triggered a 4000 millisecond xrun by running 'sync' but I haven't been able to repeat that. If I'm not doing anything on the machine I don't get any xruns, but setting a compile going can cause xruns of 10 to 500 milliseconds, so I'll investigate that.

Quote:
Originally Posted by JaXXoN
If you have a PCI IDE controller lying around, you may try that one
instead of the on-board nforce4 IDE controller, just to check if it makes
a difference.
I don't actually use IDE disks now, or SATA. I have a CD-ROM drive on the second IDE channel but other than that they're not used. SATA is disabled in the BIOS.
Old 08-13-06, 11:00 AM   #22
JaXXoN
Registered User
 
Join Date: Jul 2005
Location: Munich
Posts: 910
Default Re: Crashing when SMP enabled

Quote:
Originally Posted by StevenChamberla
I'm trying to use this tool now but the value for 'Max' just rapidly increases, into 6 figures and beyond.
This typically means that the High Resolution Timer support (HRT)
is not enabled (kernel configuration option).
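
A quick way to check this on a built kernel, as a sketch. It assumes the option is named CONFIG_HIGH_RES_TIMERS (the name may differ between patch versions) and that your distribution installs the kernel config as /boot/config-<version>; neither detail comes from this thread.

```shell
# Sketch only: report whether high resolution timers are built into a kernel,
# judging by its config file.
# $1 = path to a kernel config file
hrt_enabled() {
    grep -q '^CONFIG_HIGH_RES_TIMERS=y' "$1"
}

# Example (config path is an assumption, it varies by distribution):
#   hrt_enabled "/boot/config-$(uname -r)" && echo "HRT enabled" || echo "HRT not enabled"
# If the kernel was built with CONFIG_IKCONFIG_PROC, this also works:
#   zcat /proc/config.gz | grep HIGH_RES_TIMERS
```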

BTW.: can you please check if jackd is actually working at real
time priority? Download http://www.tglx.de/projects/misc/rtnice/rtnice.tar.bz2
and compile it with "rm rtnice ; gcc -W -Wall -O2 -o rtnice rtnice.c".
Then do something like "rtnice -p `pidof jackd`". If it says
"scheduler is TS", then jackd doesn't run at realtime priority.

I'm not familiar with jackd, but I guess that any processes supplying
jackd with audio data also need to run at real time priority?
Meaning: even if jackd runs at high priority, but the processes
supplying jackd do not, then it's quite clear that there might
be underruns under heavy disk load.

You may use "rtnice" to give all sound related applications a
realtime priority, then check if there are still high latencies.

Attention: make sure not to give a process that sucks up 100%
CPU time a real time priority: your machine will lock up because
the X-Server, bash or whatever won't get any processing time
any more.
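
If rtnice is not at hand, 'chrt' from util-linux can do the same job. This is a swapped-in alternative, not the tool described above, and the wrapper below is only a sketch: the function names are made up, and the priority check reflects the usual 1..99 range for SCHED_FIFO on Linux.

```shell
# Sketch only: give a process SCHED_FIFO priority with chrt, after a sanity check.

# Accept only whole numbers from 1 to 99.
valid_rt_priority() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 99 ]
}

# $1 = priority, $2 = pid. Needs root (or a suitable rtprio limit).
set_fifo() {
    valid_rt_priority "$1" || { echo "priority must be 1..99" >&2; return 1; }
    chrt -f -p "$1" "$2"
}

# Example:
#   chrt -p "$(pidof jackd)"        # show current policy and priority
#   set_fifo 50 "$(pidof jackd)"    # switch to SCHED_FIFO, priority 50
```

The same warning applies: a CPU-bound process at realtime priority can starve everything else, so keep a root shell at a higher priority open while experimenting.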

regards

Bernhard
Old 08-13-06, 11:46 PM   #23
StevenChamberla
Registered User
 
Join Date: Jul 2006
Posts: 14
Default Re: Crashing when SMP enabled

Quote:
Originally Posted by JaXXoN
This typically means that the High Resolution Timer support (HRT)
is not enabled (kernel configuration option).
Enabling this seems to have had no effect, the 'Max' figure is at zero for about 50 iterations then rapidly increases.

Quote:
Originally Posted by JaXXoN
BTW.: can you please check if jackd is actually working at real
time priority?
Oh thanks, I've been trying to find a tool like 'rtnice'. I've applied the necessary patches to PAM in order for users in the 'audio' group to gain realtime priority, but until now had no way to be sure they were working.

Quote:
Originally Posted by JaXXoN
Then do something like "rtnice -p `pidof jackd`".
Running that command line did initially tell me jackd was using the TS scheduler, but I examined '/proc/`pidof jackd`/task/' and there are in fact five 'jackd' threads. Three are TS but there are also two realtime threads (one for the actual processing, and the other is a watchdog I believe).

I've done a little JACK programming recently and believe all apps using JACK I/O make use of a callback, whereby 'jackd' will run the audio processing parts of your app in a realtime thread. So the apps 'feeding' JACK don't have to be told to run in realtime. I was able to verify this by looking at XMMS's threads, and one of them (presumably the callback function for JACK) was running with realtime priority.
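
The per-thread check can also be done in one go with 'ps' from procps, which prints a scheduling class column (TS for the normal time-sharing class, FF and RR for the realtime classes). A sketch, with made-up helper names:

```shell
# Sketch only: list the scheduling class of every thread of a process
# and count how many of them are realtime.

# True if the class string is one of the realtime classes.
is_realtime_class() {
    case "$1" in
        FF|RR) return 0 ;;
        *)     return 1 ;;
    esac
}

# $1 = pid. Prints one line per thread: TID CLS RTPRIO COMMAND (no header).
list_thread_classes() {
    ps -L -p "$1" -o tid= -o cls= -o rtprio= -o comm=
}

# Reads lines in the format above on stdin and prints the number of
# realtime threads.
count_realtime_threads() {
    n=0
    while read -r tid cls rest; do
        if is_realtime_class "$cls"; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Example:
#   list_thread_classes "$(pidof jackd)"
#   list_thread_classes "$(pidof jackd)" | count_realtime_threads
```

For the jackd process described above (three TS threads and two realtime ones), the count would come out as 2.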

I'm happy to say that I'm free of any xruns so far on 2.6.16-rt29, and the 'nvidia' driver isn't giving me any of the hassle I had with earlier kernels. My problems with 2.6.17 are I think largely caused by bugs in the -rt patches, which still seem present in the latest -rt8 patch. I can hopefully do as Lonni suggested and get serial logging to work; this would allow me to report the -rt bugs to LKML or wherever is appropriate.

Also the realtime patch for the 'nvidia' driver appears to be doing the job perfectly. Thanks for pointing that out, because I really hadn't thought such a thing would exist.

As long as this kernel remains stable, it seems I'm finally back to a working system. I really appreciate everyone's help getting this to work. Thanks!
--
Steven Chamberlain
steven@pyro.eu.org
Old 08-14-06, 08:14 AM   #24
JaXXoN
Registered User
 
Join Date: Jul 2005
Location: Munich
Posts: 910
Default Re: Crashing when SMP enabled

Quote:
Originally Posted by StevenChamberla
Enabling this seems to have had no effect, the 'Max' figure is at zero for about 50 iterations then rapidly increases.
Just to be sure: did you run cyclictest as root user?

regards

Bernhard
All times are GMT -5.

