nV News Forums

pgimeno 08-31-08 09:42 AM

wfb-related stability problems caused by wmcpu
 
2 Attachment(s)
I bought a computer some time ago which came with an 8600 GS. I experienced stability problems, so I returned it. Now I've bought another one as separate parts, including an 8600 GT. The stability problems are still there, so I've investigated them as much as I could. Here are the results.

All of them turn out to be related to a small CPU monitor applet called wmcpu. I'm using version 1.3, the Debian package. My X server is also the Debian package, from sid. I'm using WindowMaker, but that doesn't seem relevant, as I could also reproduce the crashes with IceWM. wmcpu is designed as a WindowMaker dock app, but it runs in a normal window under other window managers.

Of course I could live without wmcpu (although it's quite useful), but I'm afraid that some day I'll launch a different program that uses the same drawing function and crashes my X server.

Description of the problem
I'm using version 173.14.12 of the drivers. After starting the X server and wmcpu, everything is fine. It takes a random amount of time, ranging from a few minutes to a couple of days of uninterrupted session, before it starts failing. When it does, the whole screen starts blinking at exactly the refresh rate of wmcpu. A look at /var/log/Xorg.0.log (obtained with -logverbose 6) reveals many repetitions of this, one per blink, I assume:

(II) NVIDIA(0): The NVIDIA X driver has encountered an error; attempting to
(II) NVIDIA(0): recover...
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Error recovery was successful.
...
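To check the "one per blink" assumption, the recovery cycles in the log can be counted and compared with the number of blinks observed. The following is a minimal Python sketch, assuming the driver writes the messages exactly as in the excerpt above; the helper names count_recoveries and summarize are hypothetical, not part of any NVIDIA or X.Org tool.

```python
# Marker strings copied from the Xorg.0.log excerpt (assumed to be stable).
ERROR_MARKER = "The NVIDIA X driver has encountered an error"
RECOVERED_MARKER = "Error recovery was successful"


def count_recoveries(lines):
    """Hypothetical helper: return (errors, recoveries) for an iterable of log lines."""
    errors = recoveries = 0
    for line in lines:
        if ERROR_MARKER in line:
            errors += 1
        elif RECOVERED_MARKER in line:
            recoveries += 1
    return errors, recoveries


def summarize(path="/var/log/Xorg.0.log"):
    """Hypothetical helper: count error/recovery cycles in an X server log file."""
    # errors="replace" keeps the scan going if the log has stray non-UTF-8 bytes.
    with open(path, errors="replace") as f:
        return count_recoveries(f)
```

Calling summarize() before and after a burst of blinking and taking the difference gives the number of recovery cycles in that interval; if it matches the number of blinks, the assumption holds.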

When that happens, killing wmcpu brings apparent stability back. (With the earlier driver version, 173.14.09 I think, the system crashed after just a few blinks, so I couldn't diagnose anything and didn't even know wmcpu was the cause.) If I don't kill it, switching to a console via Ctrl+Alt+F1 (my consoles are in text mode; I have no framebuffer) immediately crashes the X server with a segfault. I often have to regain control of the keyboard with SysRq+R (unRaw); on rare occasions the computer locks up completely, not even responding to ssh, and I have to press the reset button. The same kind of crash happens if I launch wmcpu again instead of using Ctrl+Alt+F1. Sometimes a backtrace is generated and sometimes not; when it is, the first line always contains the same function: wfbCopyNtoN.

I said "apparent" stability because from then on the system (or the card) remains in a permanent failure state, in the following sense. When I start X again, launching wmcpu immediately causes the segfault, unless I first run "nvidia-settings -a InitialPixmapPlacement=2" (seen in the performance problems thread), in which case the blinking reappears instead of a segfault. That setting doesn't prevent the crash if I switch to a console, though. The GPU GART initialization apparently resets InitialPixmapPlacement to 1, so if I kill wmcpu and restart it without setting InitialPixmapPlacement again, I get a segfault again.

Only a reboot makes the system stable again and lets wmcpu work, until it starts failing some time later. Unloading the nvidia module with rmmod does not help. It seems the card itself is left in a failed state.

In case it matters, I always launch X via startx from the console; I have no xdm, kdm, gdm or similar. I have already tried every NvAGP setting and everything else listed in the stability problems thread, to no avail. The only thing I haven't checked is whether my motherboard's BIOS is the latest version, but I doubt that would help.

Attached are the nvidia-bug-report.log (renamed to nvidia-bug-report_1.log) taken after an X crash provoked by starting wmcpu while in the failure state, and a verbose log made by forcing the blinking with the InitialPixmapPlacement=2 setting and then pressing Ctrl+Alt+F1 to force a crash.

pgimeno 09-02-08 04:50 AM

wfb-related stability problems caused by wmcpu - Update
 
I've upgraded to the beta driver 177.70 and I get the same problems, except that now it seems I sometimes have to run Second Life while wmcpu is running for the blinking (flashing) to start. After a while I no longer needed to run Second Life to trigger the blinking, and the symptoms were the same as in the report above.

It might be related to the System CPU indicator of wmcpu, which is drawn in red, as opposed to the User CPU one, which is drawn in green, but I'm not sure. EDIT: That seems unrelated.

I forgot to mention that I also tried using the libwfb library included with the NVIDIA drivers instead of the one included with X11, but that made no difference.

I also forgot to mention that in both computers I tried, the CPU was a Phenom quad-core (9500 in the one I returned, 9550 in the newer one), running in 32-bit mode. As I said, I tried everything in the stability problems thread; specifically, setting maxcpus=1 didn't help. I haven't tried the 64-bit beta drivers yet. I'm not sure which motherboard the former used; this one is a Gigabyte MA790FX-DS5.

pgimeno 10-11-08 02:00 PM

Re: wfb-related stability problems caused by wmcpu
 
I've upgraded to the 177.80 driver and the problem is still there.

pgimeno 02-05-09 03:57 AM

Re: wfb-related stability problems caused by wmcpu
 
After upgrading to 180.22 the problem still persists.

I installed 180.27, but it made the problem much worse, so I downgraded. I don't know whether that was a separate problem; all I know is that when the bug was triggered it made my X session unusable: X became unresponsive for long periods and my window frames were replaced by random bitmaps. That didn't happen with 177.80 and hasn't happened so far with 180.22.

The lack of response on your side, even after I carefully followed the required steps for a bug report, makes me think this is a known hardware problem that you can't solve without replacing my card, and that I should not have bought it in the first place. So, if there is no response, I'll take whatever actions I consider appropriate, which may of course hurt the sales of this card, which I already regret having bought.


All times are GMT -5.

Copyright 1998 - 2014, nV News.