Nvidia drivers crash SMP systems?

Mathman
02-15-03, 12:29 PM
Hi, I'm wondering whether anyone else has problems with the Nvidia drivers crashing their SMP system. They don't crash my system right away, mind you; initially things are great, but then at random times my machine just locks up. I can't ssh to it, Ctrl-Alt-Backspace doesn't work, nothing. I already tried contacting Nvidia and they just directed me here, so if anyone has any suggestions I would greatly appreciate it.

mpannen
02-15-03, 04:32 PM
I installed the latest driver (4191?) from the source code, and it froze on me as well. When it went bad was almost random: sometimes after 5 minutes, sometimes after half an hour. I'd bet this happened 8 or 9 times within half a day of installing the drivers. :mad:

I then switched back to the previous version of drivers. Those worked for a while longer before crashing. :(

My video card is a GF4 Ti4200 64MB, my mobo is an MSI-6380, and I have an AMD Athlon at 1.4GHz. I am using Mandrake 9.0 with the original kernel that came with it (no patches).

I would describe the freezes as random colored pixels filling the whole screen. It reminds me of the time I overclocked my card a bit too much in Windows and everything got "corrupted". This appears to be a hard freeze. If I hit the reset button, the screen is still funky after the reboot; to get it back to normal I have to actually turn the computer off and on again (reset alone is no good). :confused:

I would appreciate any comments anyone has. Also, can someone tell me how to get the default drivers back? If I recall correctly, installing the Nvidia drivers changed some symlinks to the system OpenGL libraries, which will have to be removed, and some entries in the XF86Config-4 file need to be changed. I can get the config file entries right; I am just worried about the OpenGL files. I would like to get the default drivers back to check stability with those.
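A rough way to see which OpenGL library is currently active is to follow the libGL symlinks. The sketch below is a hypothetical Python helper (resolve_gl_links is not part of any driver package), and the directories you point it at, such as /usr/lib or /usr/X11R6/lib, are assumptions about a typical XFree86 4.x layout; check where your distribution actually puts libGL.

```python
import os


def resolve_gl_links(lib_dir):
    """Map each libGL* symlink in lib_dir to the real file it resolves to.

    Hypothetical helper: lib_dir is whatever directory you want to inspect
    (for example /usr/lib or /usr/X11R6/lib; adjust for your distribution).
    libGLU links are listed too, since their names also start with "libGL".
    """
    result = {}
    for name in sorted(os.listdir(lib_dir)):
        path = os.path.join(lib_dir, name)
        # Only symlinks matter here: they decide which libGL implementation
        # (NVIDIA's or Mesa's) the dynamic linker ends up loading.
        if name.startswith("libGL") and os.path.islink(path):
            result[name] = os.path.realpath(path)
    return result
```

For example, resolve_gl_links("/usr/lib") returns a dict of link name to resolved path. If libGL.so.1 resolves to a file named after the NVIDIA driver version (assuming the usual naming, something ending in 1.0.4191 for the 4191 release), the NVIDIA GLX package is still the one in use.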

Mathman
02-15-03, 05:13 PM
Man, Nvidia needs to open source their drivers and let some real Linux coders take over.

Noth
02-15-03, 06:08 PM
I've been running the 4191s on a dual Athlon for some time now with very few problems. I currently have over 40 days of uptime, and that includes playing Q3 regularly.

Are you using any additional kernel patches like low-latency or preempt?

mpannen
02-16-03, 12:02 PM
No patches like that, as far as I know; I did a normal Mandrake install. How could I check to make sure that stuff is not installed?
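One way to answer that is to look at the configuration the running kernel was built with. The sketch below assumes the distribution installs that config as /boot/config-<release> (common, but not guaranteed; for a self-built kernel it is the .config in the source tree) and that the preempt patch shows up as CONFIG_PREEMPT. The option name the low-latency patch adds is not assumed here; pass it in if you know it. Both helpers are hypothetical.

```python
import platform


def running_kernel_config_path():
    """Guess where the running kernel's config lives (an assumption)."""
    # Many distributions install it as /boot/config-<uname -r>;
    # a self-built kernel's config is the .config in its source tree.
    return "/boot/config-" + platform.release()


def enabled_options(config_text, names):
    """Return the subset of `names` set to y or m in kernel config text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments ("# CONFIG_FOO is not set").
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key in names and value in ("y", "m"):
            enabled.add(key)
    return enabled
```

For example, enabled_options(open(running_kernel_config_path()).read(), {"CONFIG_PREEMPT"}) returns an empty set when the preempt option is not enabled in that config.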

Exner
02-16-03, 03:56 PM
Originally posted by mpannen
No patches like that, as far as I know; I did a normal Mandrake install. How could I check to make sure that stuff is not installed?

If you are trying to make sure you are using the NVIDIA binary drivers, make sure that, after everything else, you force-reinstall the NVIDIA_GLX-...rpm.

If you are trying to rid yourself of NVIDIA drivers, remove both NVIDIA packages (kernel, and GLX) and force reinstall the Mesa packages.

Don't forget to adjust your XF86Config file as well.

michael
02-17-03, 07:19 AM
Hi there,

I experienced frequent crashes while using the Nvidia AGP module. Using agpgart solved the problem.

Regards

Michael
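For switching between NVIDIA's own AGP code and the kernel's agpgart, the NVIDIA driver reads an NvAGP option from the Device section of XF86Config-4. The sketch below reports what that option is set to. The value meanings are paraphrased from the legacy driver README, so check them against the README that came with your driver release; nvagp_setting is a hypothetical helper, not part of any package.

```python
import re

# Meanings of the NvAGP values, paraphrased from the legacy NVIDIA driver
# README; verify them against the README shipped with your driver release.
NVAGP_MEANING = {
    0: "AGP disabled",
    1: "NVIDIA's internal AGP support (NvAGP)",
    2: "kernel AGPGART",
    3: "try AGPGART first, then NVIDIA's internal AGP",
}

# Matches an uncommented line such as:  Option "NvAGP" "2"
NVAGP_RE = re.compile(r'^[ \t]*Option[ \t]+"NvAGP"[ \t]+"(\d)"',
                      re.IGNORECASE | re.MULTILINE)


def nvagp_setting(xf86config_text):
    """Return (value, meaning) for the NvAGP option, or None if it is absent."""
    match = NVAGP_RE.search(xf86config_text)
    if match is None:
        return None  # not set: the driver's built-in default applies
    value = int(match.group(1))
    return value, NVAGP_MEANING.get(value, "unknown value")
```

Commented-out lines are ignored because the pattern requires Option to be the first word on the line.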

disney
02-17-03, 12:27 PM
I use agpgart on my SMP 2.4.20 Debian system, but it crashes consistently throughout the day as well.

I've got an Asus CUVX4-D board with a VIA VT82C693A/694x [Apollo PRO133x] chip, and a GeForce 4 Ti4400.

I'm pretty sure I didn't get this many crashes with the older versions of the drivers.

Any ideas?


BTW, I've got two 1GHz PIII CPUs, in case that matters.

michael
02-18-03, 12:42 AM
*hm* Did you install the source RPMs? It must be _something_ kernel-related.

Michael

disney
02-18-03, 06:49 AM
At this point I've stripped just about everything I didn't need from my kernel, and it doesn't seem to make much difference.

The only recent changes are the card itself and the new drivers. I've also enabled HIGHMEM support in my kernel, but the crashes were happening before that (I'm sure, since I bought the memory about 1-2 weeks after the video card).

I just rolled back the driver to 3123 for a while to see what happens.

Noth
02-18-03, 10:43 PM
I'm running 2.4.20 with XFS patches and the 4191s on dual Athlons and still have no issues. I actually just played Q3 for a while this evening with no lockups. I also just looked, and it appears I'm using the nVidia AGP driver.

My card is a GF3 though, and I don't have a ****ty VIA chipset either.
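To check which AGP backend is actually in use, the sketch below reads a status file exported by the NVIDIA kernel module. The path /proc/driver/nvidia/agp/status and its "Driver:" line are assumptions based on driver releases of this era; if the file is absent, the hypothetical agp_driver helper simply returns None.

```python
def parse_status(text):
    """Parse "Key: value" lines into a dict, stripping surrounding whitespace."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def agp_driver(path="/proc/driver/nvidia/agp/status"):
    """Return the "Driver" entry of the NVIDIA AGP status file, or None.

    Assumption: the NVIDIA kernel module exposes this file with a
    "Driver:" line; other driver releases may not.
    """
    try:
        with open(path) as status_file:
            return parse_status(status_file.read()).get("Driver")
    except OSError:
        return None  # file missing or unreadable (module not loaded, etc.)
```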

disney
02-19-03, 06:44 AM
I'm using 3123 now and still have lockups.

Mind you, I don't have them when playing 3D games, only at random times (although a lot of the time I'm using Mozilla).

I'm going to try some more stuff and see what happens.

Any help would be appreciated.

thx

michael
02-19-03, 07:18 AM
Do you use RenderAccel "true"?

disney
02-19-03, 06:19 PM
Well, I was, but not after your reply. :)

I'll run it for a little while and see how it performs.
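RenderAccel is a boolean Option in the Device section of XF86Config-4. XFree86 accepts several spellings for booleans (true/yes/on/1 and false/no/off/0), and a boolean option listed without a value counts as true, so a plain text search can mislead. The hypothetical helper below normalizes those spellings; as a simplification, it looks only at the first matching line and ignores which Section that line is in.

```python
import re

TRUE_WORDS = {"1", "on", "true", "yes"}
FALSE_WORDS = {"0", "off", "false", "no"}

# Matches an uncommented  Option "RenderAccel"  line with an optional value.
RENDER_ACCEL_RE = re.compile(
    r'^[ \t]*Option[ \t]+"RenderAccel"(?:[ \t]+"([^"]*)")?',
    re.IGNORECASE | re.MULTILINE)


def render_accel(xf86config_text):
    """Return True/False for Option "RenderAccel", or None if it is absent."""
    match = RENDER_ACCEL_RE.search(xf86config_text)
    if match is None:
        return None
    value = (match.group(1) or "").strip().lower()
    # A boolean option given without a value is treated as true.
    if value == "" or value in TRUE_WORDS:
        return True
    if value in FALSE_WORDS:
        return False
    raise ValueError(f"unrecognised boolean value {value!r}")
```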

disney
02-21-03, 06:46 AM
No difference. Still locking up. Yesterday it locked up about 5 times while performing normal operations.

Please help!

michael
02-21-03, 07:03 AM
I am running two AMD Athlon MP 1900+ CPUs with an AMD 760MP chipset and a Ti4200.

What I did to get it running is the following:

BIOS settings:

4x AGP enabled, SBA enabled

Kernel boot params:

mem=nopentium acpi=off

- The kernel is manually configured and installed over the SuSE 8.1 stock kernel

- The NVIDIA source RPMs (SRPMs) are compiled against the customized kernel

- AGPGART is used instead of NVAGP
- RenderAccel is false

It may be worth setting acpi=off. I have run into a lot of problems with ACPI, not only on AMD or SMP machines but even on "normal" Intel boxes, with things like sound or network cards.

Michael
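To confirm that boot parameters like acpi=off or mem=nopentium actually reached the kernel (rather than being lost to a boot loader typo), /proc/cmdline shows the command line the running kernel was booted with. The helpers below are a minimal hypothetical sketch that matches whole tokens only, so acpi=off will not falsely match something like acpi=offline.

```python
def active_params(cmdline, wanted):
    """Return the entries of `wanted` that appear as whole tokens in cmdline."""
    tokens = set(cmdline.split())
    return [param for param in wanted if param in tokens]


def current_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as cmdline_file:
        return cmdline_file.read()
```

For example, active_params(current_cmdline(), ["acpi=off", "noapic", "mem=nopentium"]) lists whichever of those options the kernel actually saw at boot.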

disney
02-21-03, 10:57 AM
I've got 2 PIII's with a VIA chipset m-board and a Ti4400.

I'm also using the .tar.gz version of the drivers.

BIOS:
- 2x AGP (for some reason I can't get anything faster than this; I believe it's a known VIA issue)
- No way of setting SBA, but it looks like it's disabled

Kernel:
- mce=on
- manually configured over Debian Woody
- AGPGART

I don't have ACPI compiled into the kernel; would passing that option make a difference?

I'll try the acpi change and see what happens.

disney
03-10-03, 06:24 PM
I'm not having any luck with these drivers and there's obviously no support to be found from Nvidia.

I've tried just about every combination of BIOS settings and kernel parameters with no luck. Unfortunately, with no logs to speak of, there doesn't seem to be a way to debug this.

I'm going to switch to a card that's a little more friendly to Linux...

Kriston
03-11-03, 12:33 PM
I have had the same problem on all of my SMP systems. The problem has something to do with AGP cache and SMP. Windows XP will do the same thing. I now use a PCI card in the SMP system. I have found that the speed difference is not great for the simulations that we do.

You might try turning off the AGP accel to see if it makes any difference.

eric42
03-11-03, 12:53 PM
I've posted about my problems a couple of times here and found little help. I'm using 2.4.18-3SMP and have managed to somewhat stabilize the system. Sometimes I can ssh in and restart X to fix the problem, sometimes I can't. I'm almost certain this is an X Window/Nvidia driver problem rather than a kernel issue. The longest I've kept the system up so far is 21 days (the average is less than a week). X crashes (without taking the system with it) 3-5 times a day on average.
There also appears to be a memory leak somewhere in my system (I suspect video). I have 2GB of RAM, and in just a day or two the system takes up ~1.7GB of it with no programs running... :eek: If anyone knows a way of tracking down the leak, I would appreciate it. Thanks,
Eric

I'm using
Red Hat 7.3 - 2.4.18-3 SMP Kernel
AGPGART (4x AGP)
acpi=off
4191 Drivers (From source)
XFree86 4.2.1
KDE 3


System :cool:
Dell Precision 530
2 x 2.2GHz Xeons
2GB RAM
Quadro 750 (64 MB)
IDE hard drives (no SCSI)

<rant>For the record, the Linux support from Nvidia SUCKS. These drivers provide horrible performance on my system. My OpenGL apps (not games) don't work worth a $#!+ and a simple 2D screen refresh can take almost 10 seconds. If I didn't get an improvement in processing times compared to Windows, I would have already gone back. This is supposed to be a professional video board, but so far it has performed like an unrehearsed amateur.</rant>

bwkaz
03-11-03, 01:15 PM
Originally posted by eric42
I have 2GB in RAM and in just a day or two the system will take up ~1.7GB of it with no programs running......:eek:

OK, how are you finding this out? From /proc/meminfo? Or from free? Or from ps/top/something like that?

Linux uses physical RAM to cache parts of the filesystem, so that simple things like listing a large directory twice don't take inordinately long the second time through (I don't know if you've noticed, but on Win98 this obviously doesn't happen -- when you open up an Explorer window in a large directory twice, it takes just as long the second time through). This cache is the first candidate for release when a program needs RAM, though, so it doesn't cause issues.

/proc/meminfo, and any programs based on it (like ps and top), can mislead people into thinking there are huge memory leaks, because the space holding cached filesystem data is counted as used. free has a "+/- buffers/cache" line with entries for "free" and "used"; the "free" entry there is the total physical unused RAM (usually very little) plus the buffer and cache sizes (since that memory is effectively free to application programs), so it's a better indication.
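The arithmetic behind that "+/- buffers/cache" line can be reproduced from /proc/meminfo: memory effectively available to programs is MemFree + Buffers + Cached, and what programs really use is MemTotal minus that. The sketch below assumes the "Name: value kB" lines that 2.4 and later kernels both provide; newer versions of free compute their figures differently (for example from MemAvailable), so treat this as an approximation of the older tool.

```python
def meminfo_kb(text):
    """Parse "Name:  value kB" lines from /proc/meminfo into {name: kB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        # Keep only well-formed "Name: <int> kB" entries; this skips the
        # extra summary table that 2.4-era kernels print at the top.
        if sep and len(fields) == 2 and fields[1] == "kB" and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def used_by_programs_kb(info):
    """Approximate the "used" figure on free's buffers/cache line, in kB."""
    free_for_apps = info["MemFree"] + info["Buffers"] + info["Cached"]
    return info["MemTotal"] - free_for_apps
```

Usage: used_by_programs_kb(meminfo_kb(open("/proc/meminfo").read())). If that number stays flat while the raw "used" figure climbs, the growth is cache, not a leak.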

If you're totalling up something like the VSZ column in ps, that's WAY off. That's just the total virtual memory allocated; certain programs (like X) use a ton of it to store off-screen pixmaps and the like. But the RSS and %MEM columns are what are important -- these are the amounts of actual physical memory allocated to different processes.
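The RSS/VSZ distinction can be checked per process from /proc/<pid>/status, which reports VmSize (the virtual size, ps's VSZ) and VmRSS (the resident set, ps's RSS) in kB. The sketch below lists the processes with the largest resident footprint; a genuinely leaking X server would likely show a steadily growing VmRSS there. Both helpers are hypothetical and Linux-only.

```python
import os


def vm_sizes_kb(status_text):
    """Return (VmSize, VmRSS) in kB from the text of /proc/<pid>/status."""
    values = {}
    for line in status_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in ("VmSize", "VmRSS"):
            values[name] = int(rest.split()[0])
    # Kernel threads have no Vm* lines at all, hence the 0 defaults.
    return values.get("VmSize", 0), values.get("VmRSS", 0)


def top_rss(limit=5):
    """List (rss_kb, vsz_kb, pid) for the processes with the largest RSS."""
    rows = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/status") as status_file:
                vsz, rss = vm_sizes_kb(status_file.read())
        except OSError:
            continue  # the process exited while we were scanning
        rows.append((rss, vsz, int(pid)))
    return sorted(rows, reverse=True)[:limit]
```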

disney
03-11-03, 01:54 PM
Thing is, this didn't start until I got my Ti4400. I had a TNT2 card and that worked fine for the last few years.

eric42
03-13-03, 09:17 AM
Thanks for the memory help; it was the buffers and cache that were "taking up" all my memory.
If anyone has a good idea of how to stabilize the system as a whole, I'd appreciate it. I'm willing to try different things and experiment, but I don't know where to start. I can't get at the log files because the system crashes without any warning (it's just a hard lockup).

Thanks,
Eric

Erik
03-13-03, 01:25 PM
Try the noapic boot option.

michael
03-13-03, 05:13 PM
I have to agree on that. noapic is always a good idea; most of the problems I had were solved with it: ICs, sound, etc.

michael