nV News Forums

 
 


rednuht 11-20-03 07:13 AM

X cpu at 95% using 4496
 
GeForce FX5600 (1.0-4496 drivers)
AMD 1.4 Thunderbird (runs at a happy < 45 degrees)
KT-133A (checked for update patches)
512 MB RAM
30 GB HDD
TwinView using two CRT monitors running at 2560x960
Debian 2.2 with kernel 2.4.18-bf2.4
with agpgart as a module
proc file system
KDE 2 (but I did try gnome and the same happened)

OK, this is a classic issue that I have read about here lots of times, but never with a complete resolution.
I run X and after a random amount of time (it can be a few minutes, more likely a few hours, and very likely within a few days) it becomes unresponsive. Telnetting in from another machine shows (via top) that X is consuming 95%+ of the CPU; the only option is to kill -9 X and start again.

I first tried to identify which program(s) were causing the issue.
I tried running the xscreensaver OpenGL hacks ten at a time, but I could not get it to hang reliably.
I disabled aRts in KDE, then switched to GNOME.
Each time the hang still happened, but it was not related to time or to which programs were running.
(I also tried disconnecting one of the monitors and disabling TwinView.)

I added mem=nopentium to the kernel boot parameters and tried NvAGP 1, 2 and 3 (renaming agpgart.o to .old when necessary to ensure it was not loading).

Then I tried NvAGP 0. Everything seemed fine and, although it was slow, I decided this was the only way to run stably (it ran for 6 days before I manually rebooted).

Then I read more on the forums about the KT-133 chipset, went into the BIOS and set AGP to 2x (4x is the max on this board and was the default). With agpgart nothing changed (it still hung randomly), so I renamed the .o and switched to NvAGP. It ran for 9 days with no problem (with OpenGL xscreensaver hacks going most of the time), then disaster: it hung.
So I switched back to NvAGP 0, thinking AGP was to blame; 8 hours later it hung again.

I ran strace on X during the hang and during a normal session a couple of times (both normal and verbose output), but a diff of the files revealed nothing of interest.
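A raw diff of two strace logs tends to be noisy, because addresses, timestamps and the order of calls differ on every run. One alternative might be to count how often each system call appears in the "normal" log and in the "hung" log and compare the totals. The following is only a minimal sketch in Python: the function names `syscall_counts` and `compare` are made up for illustration, and it assumes the usual strace line layout (an optional PID and/or timestamp, then `name(args) = result`).

```python
import re
from collections import Counter

# Syscall name at the start of an strace line, optionally preceded by a
# PID ("1234 " or "[pid 1234] ") and/or a timestamp ("12:00:01.123 ").
# Assumption: default strace output layout; other options may need changes.
SYSCALL_RE = re.compile(
    r"^\s*(?:\[pid\s+\d+\]\s+|\d+\s+)?(?:[\d:.]+\s+)?([a-z_][a-z0-9_]*)\("
)


def syscall_counts(lines):
    """Count how many times each system call appears in strace output."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:  # signal lines and "<... resumed>" lines are skipped
            counts[match.group(1)] += 1
    return counts


def compare(normal, hung):
    """Return {syscall: (normal_count, hung_count)} where the counts differ.

    Both arguments are Counters as returned by syscall_counts().
    """
    names = set(normal) | set(hung)
    return {n: (normal[n], hung[n]) for n in sorted(names) if normal[n] != hung[n]}
```

With two saved logs this could be called as `compare(syscall_counts(open("normal.log")), syscall_counts(open("hung.log")))` (both file names are placeholders). A call that dominates the hung log but is rare in the normal one would be a reasonable place to start looking.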

I doubt there is anything else I can try with these NVIDIA drivers, so my question is: what can I do to help locate what is going wrong?

A lot of people seem to be having this issue with a multitude of configurations, so it is not an isolated case.

I would love to be able to see what is going on inside the process, i.e. which functions are being called, which file handles are open and so on, and more importantly how they relate to the NVIDIA drivers.
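For the file handle part at least, Linux exposes a process's open descriptors under /proc/<pid>/fd, so they can be listed from a remote login while X is stuck. The sketch below is illustrative only: `open_files` and `nvidia_only` are hypothetical helper names, it needs to run as root (or as the user that owns the X process), and the "/dev/nvidia" prefix is an assumption about where the driver's device nodes live.

```python
import os


def open_files(pid):
    """Map each open file descriptor of `pid` to the path it points at."""
    fd_dir = f"/proc/{pid}/fd"
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return result


def nvidia_only(fds, prefix="/dev/nvidia"):
    """Keep only descriptors whose target starts with `prefix`.

    Assumption: the NVIDIA driver's device nodes live under /dev/nvidia*.
    """
    return {fd: path for fd, path in fds.items() if path.startswith(prefix)}
```

Calling `nvidia_only(open_files(pid_of_x))` during a hang and during a normal session would show whether the set of driver descriptors differs. It says nothing about which functions are running; that would need a debugger or a profiler.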

any help appreciated.

Note 1: any time I use the term hang or hung, I did check remotely that X was stuck using 95%+ of the CPU time and that no other processes were causing any issues.

Note 2: the XFree86 drivers had no issue, but of course had only Mesa and no TwinView.

Note 3: most of the hangs happened while xscreensaver was running, but that was because it was running overnight and I found it hung in the morning. Both OpenGL and non-OpenGL hacks were found in the hung state (visually).

Note n: yes, I was going for the longest post without pasting in a file ;)
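Since the hangs in Note 3 were mostly found the morning after, it may help to log X's CPU usage over time so the moment it climbs can be pinned down. The following is a rough Python sketch, not a tested tool: `cpu_ticks`, `cpu_percent` and `watch` are hypothetical names, the 90% threshold is an arbitrary choice, and it relies on the utime and stime fields (14 and 15) of /proc/<pid>/stat as described in proc(5).

```python
import os
import time


def cpu_ticks(stat_text):
    """Return utime + stime (in clock ticks) from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split on the last ')' before counting fields.
    fields = stat_text.rsplit(")", 1)[1].split()
    # fields[0] is the state (field 3 in proc(5)); utime and stime are
    # fields 14 and 15, i.e. indexes 11 and 12 here.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, seconds, ticks_per_second):
    """CPU usage over an interval, as a percentage of one CPU."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_second * seconds)


def watch(pid, interval=5.0, threshold=90.0):
    """Print a timestamped CPU percentage for `pid` every `interval` seconds."""
    hz = os.sysconf("SC_CLK_TCK")
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before, t_before = cpu_ticks(f.read()), time.monotonic()
    while True:
        time.sleep(interval)
        with open(path) as f:
            after, t_after = cpu_ticks(f.read()), time.monotonic()
        pct = cpu_percent(before, after, t_after - t_before, hz)
        flag = "  <-- possible hang" if pct >= threshold else ""
        print(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {pct:5.1f}%{flag}", flush=True)
        before, t_before = after, t_after
```

Running `watch(pid_of_x)` from a telnet session with the output redirected to a file would leave a record of when the usage first rose, which could then be matched against whichever screensaver hack was active at that time.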

rednuht 11-24-03 10:54 AM

70%+ CPU
 
For the first time ever I saw (from a remote telnet) the X process's CPU activity at 70% (ish).
The Matrix screensaver was running at the time and was giving me 1 frame a second (if that); neither mouse movement nor keyboard interaction would exit the screensaver (it was running as part of xscreensaver, but is not a hack distributed with the package).
I killed the X process before it got any higher.

I then (annoyed) switched not only the AGP back to 4x in the BIOS but also selected all the AGP read/write accelerators I could find!
It hung 2 hours later, but since then it has been behaving quite well.

I forgot to note before: if I have one of these hangs and then kill X, kdm restarts X (I get the NVIDIA logo etc.) and then any attempt to exit X is met with a garbled screen, although (carefully) typing root, the password and shutdown -r now works.

Neutro 11-24-03 10:01 PM

moz?
 
When it happens to me, the mouse pointer is still functional, but since I can't telnet / ssh to my machine to kill X, only alt-sysrq-S/U/B does the job for me. ctrl-alt-backspace doesn't restart X, and ctrl-alt-Fn to change VC doesn't do anything.

For me, it seems to always be when Mozilla is open, although the folks at Mozilla's Bugzilla say this can't be Moz's fault.

rednuht 11-25-03 08:52 AM

I do not use mozilla
 
I do not use Mozilla, but only because I do not have broadband and usually surf from a Windows machine at work.

rednuht 11-25-03 09:33 AM

something else to try ? debug source
 
I was wondering if anybody had tried adding debug code to the kernel interface source that comes with the NVIDIA binary drivers.
(If you have looked at the code you will know why I think this is a good idea.)

Is it only used at start-up and shutdown to initialise/de-initialise the NVIDIA binary?

Or is it used continuously as X (etc.) requests access to the GFX resources?

I am fast running out of ideas. I doubt looking at the XFree86 code would help unless we have some idea what the problem is, so we can start looking in the right place.

rednuht 12-04-03 07:34 AM

ran 51 openGL programs but in windows
 
I noted that the most common time my system locks up is when xscreensaver is running.

Well, I ran an OpenGL screensaver (via xscreensaver) for over 48 hours with no problem (it was the bouncing cow, if you are interested).

I also tried running all 51 of the OpenGL hacks in windows; 17 of them failed to display anything other than a blank window.

Interestingly, some were updating at 30 fps or more while others barely managed 1 frame every two seconds.

Screenshot here:

http://www.jumpstation.co.uk/flog/openglmadnessvol1.jpg

I am still investigating.

montyp 12-04-03 05:54 PM

This bug has been a problem for me since 4191. It seems particularly bad for TNT2 cards (I have a TNT2/M64). I would get it when displaying large images in Mozilla (whether scaled down with height and width tags or not) and sometimes when using gv and zooming in a lot. I could kill the X server remotely, and restart but the text mode virtual terminals would be permanently messed up until reboot...

For this reason, I've been using build 3123, which pretty much does what I need it to and doesn't have any stability problems. It's odd that this bug has been outstanding for over a year now, without any fix or mention of it in the errata... Several months ago, it was posted on the boards that NVIDIA had reproduced it, but it never made it into the list of things to fix. It may only affect old cards, though, that don't need the new features in the 4xxx versions anyway. Is anyone having this kind of problem with more recent cards?

mdk_mike 12-04-03 08:47 PM

Re: moz?
 
You know, that's interesting. It only seems to happen on my box when my father-in-law is browsing w/Firebird. Same thing, the mouse is still functional but X is locked.

Quote:

Originally posted by Neutro
When it happens to me, the mouse pointer is still functional, but since I can't telnet / ssh to my machine to kill X, only alt-sysrq-S/U/B does the job for me. ctrl-alt-backspace doesn't restart X, and ctrl-alt-Fn to change VC doesn't do anything.

For me, it seems to always be when Mozilla is open, although the folks at Mozilla's Bugzilla say this can't be Moz's fault.


jago25_98 12-05-03 10:17 AM

possibly not helpful but worth a mention as a check:

- run top and look to see if it's "events/0" using all the CPU; if so, that's a kernel bug, I think 2.6 only

rednuht 12-08-03 09:48 AM

damn everything is working fine
 
Drat, there is nothing worse than trying to find out why something fails when it refuses to fail.

I am now running with

Option "NvAGP" "1"
Option "UseInt10Module" "true"
Option "RenderAccel" "true"

Not only is xscreensaver running (new GL hack every 1 min), but on exiting X I get a normal console, not a random scramble.

Anyone know EXACTLY what the RenderAccel option does?

Neutro 12-10-03 10:36 AM

Concerning CPU usage...
 
Concerning CPU usage, I ran top in batch mode, dumping its output into a file, while provoking the crash with the 4349 drivers and Moz 1.1 (what I was running at the time). After rebooting, examination of the dump file showed that X was using almost all the CPU.

Now with 4496 and Moz 1.4, the problem still sometimes happens, but much less often, and I can't reliably reproduce it.
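For anyone wanting to repeat the batch-mode capture, a dump produced with something like `top -b -d 5 > top.log` can be summarised after the reboot instead of being read by eye. This is only a sketch in Python under stated assumptions: `parse_top_batch` and `peak_cpu` are hypothetical names, each process table is assumed to start with a header row containing PID, %CPU and COMMAND, and a blank line is assumed to separate one snapshot from the next (column layouts differ between top versions).

```python
def parse_top_batch(lines):
    """Yield (command, cpu_percent) for every process row in `top -b` output."""
    cpu_col = cmd_col = None
    for line in lines:
        cols = line.split()
        if not cols:
            # Blank line: assume the current process table has ended.
            cpu_col = cmd_col = None
            continue
        if cols[0] == "PID" and "%CPU" in cols and "COMMAND" in cols:
            cpu_col, cmd_col = cols.index("%CPU"), cols.index("COMMAND")
            continue
        if cpu_col is None or not cols[0].isdigit():
            continue  # summary lines, or rows seen before any header
        if len(cols) <= max(cpu_col, cmd_col):
            continue  # truncated row
        try:
            cpu = float(cols[cpu_col])
        except ValueError:
            continue
        # COMMAND is the last column and may contain spaces.
        yield " ".join(cols[cmd_col:]), cpu


def peak_cpu(lines):
    """Return the highest %CPU seen for each command across all snapshots."""
    peaks = {}
    for command, cpu in parse_top_batch(lines):
        peaks[command] = max(cpu, peaks.get(command, 0.0))
    return peaks
```

`peak_cpu(open("top.log"))` then gives one number per command, so it is easy to see whether X or some other process (a kernel thread, for example) was the one at the top when the machine locked up.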


All times are GMT -5.
