Old 03-27-06, 11:04 AM   #1
tear
Registered User
 
Join Date: Nov 2004
Posts: 6
FC5 T3, 2.6.16, GF6200, 8178, Xorg spinning in XDAMAGE codepath

Here's what I've managed to investigate wrt Xorg eating 100% CPU problem.

The machine is just a regular desktop that mainly serves as a development
workstation; I also browse the web and read my mail, nothing fancy. The GL app
I use most is the GLSlideshow screensaver from XScreenSaver.

Sometimes X 'hangs', eating all CPU; you probably know the symptom.
One can log in remotely and kill -KILL the X server.

FWIW I took an strace and a gdb backtrace of the running X. In three cases out of
three, X was spinning in some XDAMAGE codepath. Also, the strace output
looked like the process was being hit by SIGALRM continuously.
Every case was accompanied by a
NVRM: Xid: 6, PE0000 0400 ffffffff 00005bb0 00000000 03db00e3
class message in the kernel log.
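
For anyone who wants to check their own logs for the same kind of message, here is a minimal Python sketch. The regular expression is an assumption based only on the single line quoted above, and the "kernel: " prefix in the sample is hypothetical; real syslog lines may carry different prefixes.

```python
import re

# Pattern derived from the one message quoted in this post (an assumption):
# "NVRM: Xid: <number>, <payload>"
XID_RE = re.compile(r"NVRM: Xid[^:]*: (\d+), (.*)")

def find_xids(lines):
    """Return (xid_number, payload) tuples for every NVRM Xid line."""
    hits = []
    for line in lines:
        m = XID_RE.search(line)
        if m:
            hits.append((int(m.group(1)), m.group(2).strip()))
    return hits

# Hypothetical log excerpt; only the NVRM part is taken from the post.
sample = [
    "kernel: NVRM: Xid: 6, PE0000 0400 ffffffff 00005bb0 00000000 03db00e3",
    "kernel: some unrelated message",
]
print(find_xids(sample))
```

Feeding it the lines of a saved kernel log gives a quick count of how often the message shows up between hangs.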

After a fresh 2.6.16 boot-up I hit this three times in ca. 24 hours. Details are
in the attached archive and are, I believe, self-explanatory. If something
needs clarification, just ask (nvidia-bug-report.log was generated
after a fresh reboot).

HTH.

Cheers,
tear
Attached Files
File Type: zip nvidia-1.zip (35.4 KB, 121 views)
Old 03-31-06, 03:33 PM   #2
tear

And it seems that I have found a quite reliable way to reproduce it locally (if anyone is interested):
a few minutes and puff, there it goes, pretty much the same way every time.
Will try to brew some code soon...

Anyway, maybe it's me, or some kind of driver setup that is causing this? Any suggestions?
(This is a freshly built system, so anything's possible.)


Cheers,
tear
Old 04-01-06, 12:44 PM   #3
tear

Okaay, here it goes. Since I am not that much into the X protocol, I attached
a patch to alsamixer (from alsa-utils-1.0.11rc2). Running the patched version
of alsamixer (x86/x86_64 binaries are there as well) in an 80x25 xterm
results in a hung X in less than a minute here.

I suspect that the number of mixer 'elements' is a factor here too: the more
elements, the better the chance of hitting the problem.

Does anyone care to try that?

Cheers,
tear
Attached Files
File Type: zip nvidia-2.zip (35.0 KB, 118 views)
Old 04-01-06, 07:29 PM   #4
tear

Lowering the AGP rate to 4x (after enabling the module parameter NVreg_ReqAGPRate in os-registry.c)
kind of 'fixed' that behaviour.

Since the topic seems to be of no interest, I'll trouble you no more.

Cheers,
tear
Old 04-17-06, 05:50 PM   #5
tear

Possibly interesting update (btw, the same things happen with 8756):

It seems like I am hitting MCEs along with those 'hanging' events, and some
corruption (?) occurs in the datapath. Here's mcelog's output:

MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC 6051950744ff8
ADDR 3eac1048
Northbridge GART error
bit61 = error uncorrected
TLB error 'generic transaction, level generic'
STATUS a40000000005001b MCGSTATUS 0
MCE 1
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC 60597092507ac
ADDR 3eac1048
Northbridge GART error
bit61 = error uncorrected
TLB error 'generic transaction, level generic'
STATUS a40000000005001b MCGSTATUS 0

Interestingly, it is always the very same kind of error
at the same memory address (ADDR 3eac1048).
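
To make the "same error, same address" comparison less of an eyeball job, here is a rough Python sketch that pulls ADDR and STATUS out of mcelog-style text and decodes the top flag bits of STATUS. The bit positions are the architectural x86 MCi_STATUS flags; the function names and the trimmed sample text are made up for illustration, and the regexes assume the exact line layout shown above.

```python
import re

# Architectural flag bits of an x86 MCi_STATUS register.
FLAGS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
         59: "MISCV", 58: "ADDRV", 57: "PCC"}

def decode_status(status):
    """Return the names of the flag bits set in a 64-bit status value."""
    return [name for bit, name in sorted(FLAGS.items(), reverse=True)
            if (status >> bit) & 1]

def parse_records(text):
    """Collect (addr, status) integer pairs from mcelog-style text."""
    addrs = re.findall(r"^ADDR ([0-9a-f]+)$", text, re.M)
    statuses = re.findall(r"^STATUS ([0-9a-f]+)", text, re.M)
    return [(int(a, 16), int(s, 16)) for a, s in zip(addrs, statuses)]

# The two records from this post, trimmed to the lines the parser uses.
SAMPLE = """\
MCE 0
ADDR 3eac1048
STATUS a40000000005001b MCGSTATUS 0
MCE 1
ADDR 3eac1048
STATUS a40000000005001b MCGSTATUS 0
"""

records = parse_records(SAMPLE)
print("identical records:", len(set(records)) == 1)
print("flags:", decode_status(records[0][1]))
print("low 16 bits: 0x%04x" % (records[0][1] & 0xffff))
```

For the STATUS value above this reports VAL, UC and ADDRV as set, which agrees with mcelog's "bit61 = error uncorrected" line and with a valid ADDR field being logged.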

BIOS-e820: 0000000000100000 - 000000003fff0000 (usable)

And it seems like it's just a regular memory address. I did a few passes
of memtest86 some time ago and no problems were revealed....

Since I don't know the relation between the northbridge and the DDR RAM
controller, can anyone shed some light on this? Can it be
a problem in the northbridge <-> DDR RAM controller interaction?
If so, why does it only happen in 8x mode? (Shouldn't AGP
speed be irrelevant here?)
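
The "regular memory address" claim is just a range check against the e820 line above. A tiny Python sketch, treating the end of the range as exclusive (an assumption about how the kernel prints it); the names are illustrative only:

```python
# Values copied from the BIOS-e820 line and the mcelog ADDR field.
E820_START = 0x0000000000100000
E820_END = 0x000000003fff0000
MCE_ADDR = 0x3eac1048

def in_range(addr, start, end):
    """True if addr lies in [start, end)."""
    return start <= addr < end

print("inside usable RAM:", in_range(MCE_ADDR, E820_START, E820_END))
print("offset in MiB: %.1f" % (MCE_ADDR / 2**20))
```

The address falls inside the usable range, a little above the 1002 MiB mark.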

I did some digging on the x86-64 list but did not find much.

I am wondering if changing the AGP aperture size can make a difference
here. Or excluding that area (with 3eac1048) from the memory map....

Cheers,
tear


P.S.
I am not whining here