nV News Forums

 
 


quantumsummers 07-06-07 11:55 AM

7600 GT is NOT identified correctly, possible cause of MASSIVE crashes
 
2 Attachment(s)
Hi,

I have an XFX 7600GT XXX PCI-e GPU. Currently I am running on the old 87.76 drivers, and they work fine.

However, I have been dealing with a major issue when attempting to upgrade.

The crux of the matter is that the new drivers (100.14.06 through 100.14.11) work only about 1 time in 50. The other 49 times I am greeted with massive screen distortion, which locks the X server and requires a hard reset if I don't switch to a virtual terminal _before_ the login sequence finishes. Even if I do switch to a VT, the screen distorts, but I am still able to enter commands at the CLI.

I have narrowed the issue down a bit to what I think may be the cause.

I notice that the 1 time in 50 that it works, the card is identified by the driver as G73; when it crashes, however, it is identified as G70.

The odd thing is that I can't seem to get any output in the logs that gives much of a clue as to what is happening. I will attach the bug report you request and some other log files. I have also started a thread on Gentoo's forums concerning this issue ( http://forums.gentoo.org/viewtopic-t-567150.html ). There I have posted the only other clue besides the mis-identification of the card.

I also have questions concerning a few modules loaded during X initialization.
ramdac: sometimes it loads; other times nvidia says it doesn't need it.
wfb: sometimes it loads; other times nvidia says it can't load it.

Regardless, this needs to be fixed, whatever the cause. I am a system builder/admin, and I have always recommended nvidia cards. I have many of these cards in other production & mission critical systems, and I am afraid to upgrade due to the issues I am having on the development machine responsible for producing the images for ALL the other machines. These troubles are shaking my belief in your company's ability to produce reliable hardware and drivers. Also I was planning on developing using the CUDA library set & these issues are terribly troubling.

Please, help me to help you and together we can wade through this mess.

Regards,

M. Summers

netllama 07-06-07 12:00 PM

Re: 7600 GT is NOT identified correctly, possible cause of MASSIVE crashes
 
If your hardware is not being correctly identified, it's most likely due to a hardware problem, not a driver bug. Are you seeing this problem with more than one set of hardware or just on a single system?

Also note that CUDA requires a G8X GPU, and will not run natively with a GeForce 7600GT.

quantumsummers 07-11-07 10:45 AM

Re: 7600 GT is NOT identified correctly, possible cause of MASSIVE crashes
 
3 Attachment(s)
netllama,

In response to your suggestion of hardware issues I installed another identical card.

The same MASSIVE crashes & hardlocks continue. Therefore, I do not believe that the hardware is at fault. In addition to this I built a new kernel using gentoo sources for 2.6.22, which resulted in the same issue. This is unacceptable behavior. As a side note, using this same card/drivers, it is running stable on an x86 workstation. Perhaps there is a driver issue with x86_64.

I will attach the most recent bug report as well as an excerpt from /var/log/messages and Xorg.0.log when this occurs. There are interesting error fragments that may lead to a solution.

Regards,

Summers

EDIT: One more thing to note: the correct chipset label is G70, as reported by pciutils on many different machines. Why then is the nvidia driver identifying it as G73?
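
For anyone wanting to compare what the hardware itself reports across good and bad boots, here is a minimal Python sketch that reads the PCI vendor and device IDs from sysfs, which is the same raw information pciutils maps to a name. It assumes a Linux system with sysfs mounted at /sys; the function name `list_nvidia_devices` and the `sysfs_root` parameter are illustrative only. Note that it shows the PCI ID only, not whatever chip detection the nvidia driver does internally.

```python
import os

NVIDIA_VENDOR_ID = 0x10DE  # PCI vendor ID assigned to NVIDIA


def _read_hex(path):
    """Read a sysfs attribute such as '0x10de' and return it as an int."""
    with open(path) as f:
        return int(f.read().strip(), 16)


def list_nvidia_devices(sysfs_root="/sys/bus/pci/devices"):
    """Return (pci_address, device_id) pairs for every NVIDIA PCI device.

    sysfs_root is a parameter only so the function can be pointed at a
    different directory (an assumption made for illustration and testing).
    """
    found = []
    if not os.path.isdir(sysfs_root):
        return found
    for addr in sorted(os.listdir(sysfs_root)):
        dev_dir = os.path.join(sysfs_root, addr)
        try:
            vendor = _read_hex(os.path.join(dev_dir, "vendor"))
            device = _read_hex(os.path.join(dev_dir, "device"))
        except (OSError, ValueError):
            continue  # skip entries that lack readable ID files
        if vendor == NVIDIA_VENDOR_ID:
            found.append((addr, device))
    return found


for addr, device in list_nvidia_devices():
    print(f"{addr}: device ID 0x{device:04x}")
```

If the device ID printed here differs between a working boot and a failing one, that would point toward the hardware or the bus rather than the driver.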

netllama 07-11-07 10:51 AM

Re: 7600 GT is NOT identified correctly, possible cause of MASSIVE crashes
 
In this bug report I see:

(WW) NVIDIA(0): WAIT (0, 6, 0x8000, 0x000007c8, 0x000007c8)
(EE) NVIDIA(0): Error recovery failed.
(EE) NVIDIA(0): *** Aborting ***
(WW) NVIDIA(0): The NVIDIA X driver has encountered too many errors. Falling
(WW) NVIDIA(0): back to legacy PCI mode.
(WW) NVIDIA(0): WAIT (0, 6, 0x8000, 0x00006da8, 0x00006da8)
(WW) NVIDIA(0): WAIT (0, 6, 0x8000, 0x0000d0b0, 0x0000d0b0)
(WW) NVIDIA(0): WAIT (0, 7, 0x8000, 0x00000058, 0x00000058)
(WW) NVIDIA(0): WAIT (0, 6, 0x8000, 0x00000068, 0x00000068)
(WW) NVIDIA(0): WAIT (0, 7, 0x8000, 0x0000008c, 0x0000008c)
(EE) NVIDIA(0): Error recovery failed.
(EE) NVIDIA(0): *** Aborting ***
(EE) NVIDIA(0): Error recovery failed.
(EE) NVIDIA(0): *** Aborting ***
(WW) NVIDIA(0): WAIT (0, 6, 0x8000, 0x00000028, 0x00000028)
(WW) NVIDIA(0): WAIT (0, 7, 0x8000, 0x00000054, 0x00000054)
(WW) NVIDIA(0): WAIT (0, 6, 0x8000, 0x00000028, 0x00000028)

which looks like a different problem from what you originally reported. Those WAIT messages indicate that the nvidia driver stopped receiving interrupts from the graphics card.
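
To get a feel for how often this happens in a given log, the messages quoted above can be tallied with a short script. This is only a sketch: it assumes the lines look exactly like the excerpt in this post, and the function name `summarize_nvidia_log` is made up for illustration.

```python
import re
from collections import Counter

# Patterns modeled on the log excerpt quoted in this thread, e.g.:
#   (WW) NVIDIA(0): WAIT (0, 6, 0x8000, 0x000007c8, 0x000007c8)
#   (EE) NVIDIA(0): Error recovery failed.
WAIT_RE = re.compile(r"\(WW\) NVIDIA\((\d+)\): WAIT \(")
FAIL_RE = re.compile(r"\(EE\) NVIDIA\((\d+)\): Error recovery failed\.")


def summarize_nvidia_log(text):
    """Count WAIT warnings and failed recoveries per X screen number."""
    counts = Counter()
    for line in text.splitlines():
        m = WAIT_RE.search(line)
        if m:
            counts[("WAIT", int(m.group(1)))] += 1
            continue
        m = FAIL_RE.search(line)
        if m:
            counts[("recovery_failed", int(m.group(1)))] += 1
    return counts


SAMPLE = """\
(WW) NVIDIA(0): WAIT (0, 6, 0x8000, 0x000007c8, 0x000007c8)
(EE) NVIDIA(0): Error recovery failed.
(EE) NVIDIA(0): *** Aborting ***
(WW) NVIDIA(0): WAIT (0, 6, 0x8000, 0x00006da8, 0x00006da8)
"""

for (kind, screen), n in sorted(summarize_nvidia_log(SAMPLE).items()):
    print(f"screen {screen}: {kind} x{n}")
```

To run it against a real log, read the file contents (for example /var/log/Xorg.0.log) into a string and pass that to the function instead of SAMPLE.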

Have you verified that you're using the latest motherboard BIOS?
Does this problem persist with a kernel.org kernel as well?

zander 07-11-07 11:26 AM

Re: 7600 GT is NOT identified correctly, possible cause of MASSIVE crashes
 
From what I can tell, the problem may be corrupted DMA transfers, even after the driver has fallen back to PCI mode; this is most likely a system-level problem. Is the card seated correctly?

quantumsummers 07-11-07 12:29 PM

Re: 7600 GT is NOT identified correctly, possible cause of MASSIVE crashes
 
Hi netllama & zander,

I will build a vanilla kernel to check, although the gentoo patches are fairly minimal.

As for the BIOS, I'm running 1205, which is not the latest, but close.

1305 is stable and 1405 is beta; which do you recommend?

zander, yes the card is seated properly.

Regarding possible DMA transfer corruption, how would I check this?

Would it be beneficial for me to post my kernel config?

It is already posted on gentoo forums here:
http://forums.gentoo.org/viewtopic-t-567150.html

Someone mentioned kernel headers possibly causing an issue, but I cannot see this issue on any other machine that I have control over.

Many thanks,

Summers

netllama 07-11-07 12:33 PM

Re: 7600 GT is NOT identified correctly, possible cause of MASSIVE crashes
 
I couldn't recommend a specific BIOS version; however, it's usually a good idea to update to the latest released version whenever there are stability problems.

quantumsummers 07-11-07 12:57 PM

Re: 7600 GT is NOT identified correctly, possible cause of MASSIVE crashes
 
I simply can't believe this, but I am now running the latest nvidia drivers cleanly. I read a random post in your forums from someone having difficulties with an 8600GT on an ASUS board running Windows. The solution for them was to disable PEG Link in the BIOS. So I tried it, and it's working at the moment. I had never touched this BIOS setting before, leaving it at normal.

So the solution for the moment is:

DISABLE PEG LINK MODE IN BIOS!!!!

So perhaps this is causing issues for others as well.

I will update this post after I have some time on the driver, but so far so good.

Many thanks,

Summers


All times are GMT -5.