nV News Forums

TheMoken 01-12-06 09:44 PM

Weird graphical trash/hard lock. 6600 AGP
 
So I've been having this problem for a while... Seemingly at random, whether in a 3D game or on the 2D desktop (though it happens sooner in 3D), the screen gets trashed with a weird repeating pattern that covers the whole screen. Shortly thereafter it takes Linux down with it. I say randomly because it takes anywhere from 10 minutes to 8 hours, in both environments.

It's happened with SBA, AGP 8x and Fast Writes all enabled, as well as with nothing but AGP 4x (i.e., both maximal and minimal settings).

It's happened with the 7676 driver and with the 8xxx series too. I'm running kernel 2.6.14-r4 (x86-64) on an AMD64 3000+ with Gentoo. I'm thinking maybe it's a heat or power issue, as my system runs pretty hot (I underclock my proc. to 1.8GHz).

What do you all think?

Can I underclock/undervolt my 6600 AGP in Linux?

whig 01-12-06 09:50 PM

Re: Weird graphical trash/hard lock. 6600 AGP
 
Monitor your GPU (and CPU) temperature; above 70C is getting hot. Check that your PSU is supplying enough juice by looking at the rail voltages in the BIOS setup. Check your RAM with memtest86 (my CPU was picky about RAM).
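A minimal sketch of the rail check, assuming you copy the voltages by hand from the BIOS hardware-monitor page. The +/-5% window is the ATX tolerance for the +3.3V, +5V and +12V rails; the function name and the sample readings below are hypothetical.

```python
# Nominal ATX rail voltages and the +/-5% regulation tolerance allowed on them.
RAILS = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}
TOLERANCE = 0.05


def out_of_spec(readings: dict) -> list:
    """Return the rails whose reading falls outside nominal +/- 5%.

    `readings` maps a rail name (as in RAILS) to the voltage shown in the BIOS.
    Rails missing from `readings` are skipped rather than reported.
    """
    bad = []
    for rail, nominal in RAILS.items():
        if rail not in readings:
            continue
        if abs(readings[rail] - nominal) > nominal * TOLERANCE:
            bad.append(rail)
    return bad
```

For example, `out_of_spec({"+3.3V": 3.31, "+5V": 5.02, "+12V": 11.2})` returns `["+12V"]`, since 11.2V is more than 0.6V below nominal. A rail that only sags under load won't show up in an idle BIOS reading, so a clean result here doesn't fully rule the PSU out.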

TheMoken 01-12-06 11:55 PM

Re: Weird graphical trash/hard lock. 6600 AGP
 
RAM is all good; I've run memtest86+ several times to make sure. I'm already underclocking my proc and it doesn't overheat (the BIOS shutdown temp is 70C, and the motherboard never shuts it down; it just locks).

I'm not really sure what to do about the GPU overheating, since I don't know how to monitor its temperature in Linux, but the PSU might be a problem. That's why I'm wondering whether undervolting might help a bit...

whig 01-13-06 02:33 AM

Re: Weird graphical trash/hard lock. 6600 AGP
 
On Gentoo there is a package called nvidia-settings that you can emerge. It does what the name says, and more, such as reporting temperatures.
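A minimal logging sketch, assuming nvidia-settings can reach the running X display and exposes the GPUCoreTemp attribute (run `nvidia-settings -q all` to see what your version actually offers). Because the lockups here take anywhere from 10 minutes to 8 hours, each reading is fsync'd to disk so the last values survive a hard lock. The log file name, the 10-second interval and the helper names are arbitrary; the 70C flag is the rule of thumb from the earlier reply.

```python
import os
import subprocess
import time

# Rule of thumb from this thread: above 70C is "getting hot".
HOT_C = 70


def parse_temp(output: str) -> int:
    """Parse terse nvidia-settings output (e.g. '68\\n') into degrees C."""
    text = output.strip()
    if not text:
        raise ValueError("empty nvidia-settings output")
    # With several GPUs there may be one line per target; keep the first.
    return int(text.splitlines()[0].strip())


def read_gpu_temp() -> int:
    # -q queries an attribute, -t asks for terse (value-only) output.
    out = subprocess.run(
        ["nvidia-settings", "-q", "GPUCoreTemp", "-t"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_temp(out)


def log_line(path: str, line: str) -> None:
    # Flush and fsync every line so the tail of the log survives a hard lock.
    with open(path, "a") as f:
        f.write(line + "\n")
        f.flush()
        os.fsync(f.fileno())


def monitor(path: str = "gpu-temp.log", interval_s: float = 10.0) -> None:
    """Poll the GPU temperature forever, appending timestamped lines to `path`."""
    while True:
        temp = read_gpu_temp()
        flag = " HOT" if temp > HOT_C else ""
        log_line(path, f"{time.strftime('%Y-%m-%d %H:%M:%S')} {temp}C{flag}")
        time.sleep(interval_s)
```

Call `monitor()` from a terminal inside X, wait for the lockup, and after rebooting look at the last lines of gpu-temp.log to see whether the temperature was climbing beforehand.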

TheMoken 01-18-06 09:18 AM

Re: Weird graphical trash/hard lock. 6600 AGP
 
Well, I couldn't get nvidia-settings to compile because I'm using the newest Xorg 7.0 (weird stuff), but I did put a new Antec 480W PSU in my computer and haven't had the problem (yet). I'll post here if it crops up again.

TheMoken 01-18-06 07:08 PM

Re: Weird graphical trash/hard lock. 6600 AGP
 
Well, this is a doozy. Despite my cooler CPU temps and better voltages, the problem turned out to be solely with the card. After getting fed up with the card when it froze yet again, I metaphorically ripped it out of the motherboard and gave it an inch-by-inch inspection (in actuality I gently removed it and gave it a look-see).

To my surprise, the fan/heatsink assembly was mismounted on the GPU: only one of the two pins that hold it onto the board was tight. After a few attempts to fix it with dental floss =), my roommate dabbed a dot or two of superglue on the post to fix it in the right place.

A few hours later, I'm still running lock free. We'll see what happens.

whig 01-19-06 12:16 AM

Re: Weird graphical trash/hard lock. 6600 AGP
 
Quote:

Originally Posted by TheMoken
Well, I couldn't get nvidia-settings to compile due to using the newest xorg 7.0 (weird stuff), but I did put a new Antec 480W psu in my computer and haven't had the problem (yet). I'll post here if the problem crops up again.

For Gentoo I put >nvidia-settings-1.0.20050729 in package.mask (masking every version newer than 1.0.20050729). Good to see you solved the problem.
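A `>` atom in package.mask masks every version strictly greater than the one given, so Portage falls back to the newest version at or below it; in the file itself, atoms are normally written with their category prefix. Below is a simplified sketch of that comparison. It is an assumption-laden stand-in for Portage's real rules: it only handles plain dotted numeric versions and ignores suffixes such as _beta or -r1 and Gentoo's special handling of leading zeros. The function names are hypothetical.

```python
def parse_version(v: str) -> tuple:
    """Split a plain dotted version like '1.0.20050729' into integers.

    Simplification: Gentoo suffixes (_beta, _rc, -r1, ...) are not handled.
    """
    return tuple(int(part) for part in v.split("."))


def masked_by_greater_than(candidate: str, bound: str) -> bool:
    """True if a '>pkg-<bound>' mask entry would mask `candidate`.

    Tuples compare component by component; when all shared components are
    equal, the version with more components sorts higher.
    """
    return parse_version(candidate) > parse_version(bound)
```

With a hypothetical newer release, `masked_by_greater_than("1.0.20060101", "1.0.20050729")` is `True`, while the bound itself, `masked_by_greater_than("1.0.20050729", "1.0.20050729")`, is `False` and so stays installable.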

TheMoken 02-02-06 11:48 AM

Re: Weird graphical trash/hard lock. 6600 AGP
 
*sigh* not quite so fast...

I know it's been about two weeks, but this is really starting to get weird. I've recompiled the whole system since then (not because I'm a Gentoo fanatic or anything, but because I wanted to consolidate two partitions), and I'm using the modular X.org 7.0-r1 release. I've also been alternating between the 7676-r1 and 8178 nvidia drivers, and I've moved to the latest vanilla kernel (2.6.16-rc1).

Same problem. I got nvidia-settings working and the card isn't overheating, so that's not it. My CPU runs at an easy 55C after compiling for a few hours, so that's not it either. I've tried both drivers, but only 8178-r2 and later will even compile against 2.6.16.

I'm starting to wonder if maybe X.Org is the problem... It seems to be the only constant here... Then again, it seems strange that nobody else would be complaining if it were broken everywhere, considering it's an officially released piece of software...

Could this be the case?

Also, I noticed that even when the screen shows the same repeating pattern while I'm at the console, the machine is still running underneath: I can still switch consoles and execute commands, and it's still pingable... Sadly, when I tried to start X, it BSODed (black screen of death).

Is there any way I can run a memory test on the card itself? That seems to me the most likely cause of the weird graphical trash being dumped to the screen... Or perhaps a way to force the device to do a full refresh or something?

_john_i_ 02-02-06 03:01 PM

Re: Weird graphical trash/hard lock. 6600 AGP
 
I bet you do have a heating problem on the GPU. I have a 6600GT AGP, and I had lots of similar problems and it was all due to GPU overheating.

To monitor the GPU temp (on the 6600GT, anyway), you have to edit the video BIOS to turn on the temp sensor (why it's disabled by default I don't know). Look for a Windows program called "Nibitor" (it runs fine under vanilla Wine if you don't have Windows). Dump your BIOS to a file with the nvflash utility, and use Nibitor to set the temp sensor bit. Save the edited BIOS and flash it onto your card. After that, nvidia-settings will display and monitor your GPU temperature.
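Before flashing anything, it may be worth sanity-checking the dumped (and the edited) ROM file and keeping a spare copy somewhere safe. A minimal sketch, assuming the file starts with a legacy PCI option-ROM image: such an image begins with the bytes 0x55 0xAA, byte 2 gives its length in 512-byte blocks, and all bytes of the image should sum to 0 mod 256. This only checks the first image in the file; it does not validate anything NVIDIA-specific, and the function name and file name are hypothetical.

```python
def check_rom(data: bytes) -> list:
    """Return a list of problems found in the first option-ROM image (empty = OK)."""
    problems = []
    # Legacy PCI expansion ROM images start with the 0x55 0xAA signature.
    if len(data) < 3 or data[0] != 0x55 or data[1] != 0xAA:
        problems.append("missing 0x55 0xAA option ROM signature")
        return problems
    # Byte 2 holds the image length in 512-byte blocks.
    size = data[2] * 512
    if size == 0 or size > len(data):
        problems.append(f"header claims {size} bytes but file has {len(data)}")
        return problems
    # Every byte of the image should add up to 0 modulo 256.
    if sum(data[:size]) % 256 != 0:
        problems.append("checksum over first image does not sum to 0 mod 256")
    return problems
```

Usage would be along the lines of `check_rom(open("backup.rom", "rb").read())`; an empty list means the header and checksum look plausible. Editors like Nibitor are generally expected to fix the checksum on save, so a non-empty result on an edited file is a reason not to flash it.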

I eventually solved my problem by installing a big copper heatsink/fan combo in place of the factory one. It's run great ever since.

TheMoken 02-02-06 05:16 PM

Re: Weird graphical trash/hard lock. 6600 AGP
 
But wouldn't the nvidia-settings temperature readout return nothing if the card's temp sensor were disabled? I get a consistent 50C from nvidia-settings...

TheMoken 02-02-06 06:14 PM

Re: Weird graphical trash/hard lock. 6600 AGP
 
Wow. So I managed to mis-flash my video BIOS and lose the backup =P, but then I reflashed it with the PNY 6600 AGP 256MB ROM (even though my card is an XFX). *shudder* Thank God I could blind flash.

Anyway, I had to enable the temp sensor on this ROM, and now the readings look a little more sane, since the values fluctuate. The previous sensor (with the XFX ROM) sat at 48-52C; this one is chugging along at 68C at the moment (although this is a RenderAccel-enabled desktop).
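One way to compare the two ROMs' sensors is to collect readings from idle and under load (for instance with a polling logger) and look at the spread. A minimal sketch; the "suspiciously flat" rule and its 5C default are hypothetical heuristics chosen for illustration, not anything documented for these cards.

```python
import statistics


def summarize(readings: list, flat_spread_c: int = 5) -> dict:
    """Summarize a series of temperature readings in degrees C.

    `flat_spread_c` is a hypothetical threshold: a sensor whose max-min spread
    stays below it across idle and load may not be reporting a real die temperature.
    """
    if not readings:
        raise ValueError("no readings")
    lo, hi = min(readings), max(readings)
    return {
        "min": lo,
        "max": hi,
        "mean": statistics.mean(readings),
        "suspiciously_flat": (hi - lo) < flat_spread_c,
    }
```

Readings like `[48, 50, 52, 50]` come out flagged as flat, while `[55, 62, 68, 71]` do not. The heuristic only means something if the samples really span idle and load.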

The second post said above 70C is getting hot, and the throttle-down temp is 145C (I don't know if I can change that), so is this a normal temp?

EDIT: Oh, and on a positive note, I've switched kernels (again) to 2.6.15-r2 and the problem hasn't reared its ugly head.

EDIT2: Another reason I don't think it's temperature: this has happened without my doing anything GLX-intensive for hours. In fact, I installed Vista beta 2 and Far Cry, and I'm able to play that until the cows come home... which leads me to believe it's some piece of software within Linux rather than a hardware issue.

gleepy 02-02-06 07:19 PM

Re: Weird graphical trash/hard lock. 6600 AGP
 
Well, I had hard lockups, with the sound stuttering first, on my GeForce 6600 PCI-E card while flying around in X-Plane. I popped the hood on the case and touched the heatsink on the card. I nearly burnt my finger on it, so I shut off the system, pulled the video card out and found the fan motor broken. The commutator contacts made noise when I tried to spin the fan, so here I am without a fan motor for the card, about where I was before my inspection.

Now I have to find a better heatsink/fan or maybe a better chipset.


All times are GMT -5.

Copyright 1998 - 2014, nV News.