nV News Forums (http://www.nvnews.net/vbulletin/index.php)
-   General Linux (http://www.nvnews.net/vbulletin/forumdisplay.php?f=27)
-   -   FC6 on Dell XPS710 H2C - OS "loses" hard drives (http://www.nvnews.net/vbulletin/showthread.php?t=90108)

doctor_octagon 04-20-07 10:33 AM

FC6 on Dell XPS710 H2C - OS "loses" hard drives
 
Hey yall,

I inherited a crummy Dell XPS 710 - it's huge, heavy, and loud, there are hardly any slots left, and it's a general PITA.

I installed FC6 x86_64. It uses the sata_nv driver to access the nVidia Mediashield (nforce 590, I think?) RAID -- 2 SATA drives in RAID1. Everything installed ok.

After running for a few hours the OS will lose the ability to write to the disks. The kernel errors with "journal commit I/O error" when it can't write the ext3 journal to the disk. After this any command will generally result in "end_request: I/O error, dev sda".

At this point I can't even shut down cleanly and a hard restart is necessary.

I tried passing iommu=soft at boot time but it didn't seem to make a difference.

Ideas?

TIA,
Doc Oc
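
A quick way to confirm that a boot option such as the IOMMU setting above actually reached the kernel is to look at /proc/cmdline. The sketch below is only an illustration: the helper names boot_params and read_cmdline are made up for this example, and quoted values containing spaces are not handled.

```python
def boot_params(cmdline):
    """Split a kernel command line into {name: value}.

    Bare flags such as 'quiet' map to None. Quoting is NOT handled;
    this is a simplification for illustration.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def read_cmdline(path="/proc/cmdline"):
    """Return the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()
```

On the affected box, boot_params(read_cmdline()).get("iommu") should give "soft" if the option was passed as intended. Anything else suggests the option was missing or misspelled, in which case the kernel would not have acted on it.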

doctor_octagon 04-23-07 02:54 PM

Re: FC6 on Dell XPS710 H2C - OS "loses" hard drives
 
Quote:

Originally Posted by doctor_octagon
Hey yall,

I inherited a crummy Dell XPS 710 - it's huge, heavy, and loud, there are hardly any slots left, and it's a general PITA.

I installed FC6 x86_64. It uses the sata_nv driver to access the nVidia Mediashield (nforce 590, I think?) RAID -- 2 SATA drives in RAID1. Everything installed ok.

After running for a few hours the OS will lose the ability to write to the disks. The kernel errors with "journal commit I/O error" when it can't write the ext3 journal to the disk. After this any command will generally result in "end_request: I/O error, dev sda".

At this point I can't even shut down cleanly and a hard restart is necessary.

I tried passing iommu=soft at boot time but it didn't seem to make a difference.

Ideas?

Nobody has any ideas?? Do you need more information? If so, what? I'll be glad to oblige.

This is a REAL nuisance...
Oc

doctor_octagon 04-25-07 09:38 AM

Re: FC6 on Dell XPS710 H2C - OS "loses" hard drives
 
Quote:

Originally Posted by doctor_octagon
Nobody has any ideas?? Do you need more information? If so, what? I'll be glad to oblige.

This is a REAL nuisance...

OK, come on... is there some debugging I can enable? Another guy here loaded the i386 version of FC6 on his XPS710 and it seems to be ok. Could there be a problem with the x86_64 version of the sata_nv driver??

Tips? Ideas? Some kind of response?

-do

doctor_octagon 05-01-07 11:03 AM

Re: FC6 on Dell XPS710 H2C - OS "loses" hard drives
 
Quote:

Originally Posted by doctor_octagon
OK, come on... is there some debugging I can enable? Another guy here loaded the i386 version of FC6 on his XPS710 and it seems to be ok. Could there be a problem with the x86_64 version of the sata_nv driver??

Tips? Ideas? Some kind of response?

I'll give you 10 dollars for a verbal response. 10 dollars. Anybody want to make 10 dollars and respond verbally?

:thumbdwn:

oc

Wolfhound 05-03-07 04:54 AM

Re: FC6 on Dell XPS710 H2C - OS "loses" hard drives
 
I don't want your 10 dollars, I'll do it for free :D. Please post /var/log/messages and your /var/log/kern.log, so we can see exactly when the error happens.

doctor_octagon 05-03-07 11:56 AM

Re: FC6 on Dell XPS710 H2C - OS "loses" hard drives
 
Quote:

Originally Posted by Wolfhound
I don't want your 10 dollars, I'll do it for free :D. Please post /var/log/messages and your /var/log/kern.log, so we can see exactly when the error happens.

A response! I love it. :afro:

Because the OS loses the ability to write to the hard drives, none of the log files contain any information. Also, I don't have a kern.log.

What I'll try is attaching an external USB hard drive and modifying syslog.conf to write kernel debug messages to a file on this HDD. This should (I hope) give us some more information.

Thanks for the idea, keep 'em coming.

-doc
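
The syslog.conf change described above might look like the rule built below. This is a sketch under assumptions: the mount point /mnt/usb is hypothetical, the selector is the one listed later in the thread (written without the space, as kern.*;*.emerg), and add_rule is an illustrative helper, not part of any syslog package. With classic sysklogd, a target path without a leading "-" is synced after every message, which is useful when the system is about to lose its disks.

```python
def add_rule(conf_text, selector, target):
    """Return conf_text with 'selector<TAB>target' appended.

    If an identical rule is already present, the text is returned
    unchanged, so the helper can be run more than once safely.
    """
    for line in conf_text.splitlines():
        if line.split() == [selector, target]:
            return conf_text  # rule already there, nothing to do
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"  # make sure the new rule starts on its own line
    return conf_text + selector + "\t" + target + "\n"


# Hypothetical usage (paths are assumptions, adjust to the real mount point):
#   old = open("/etc/syslog.conf").read()
#   new = add_rule(old, "kern.*;*.emerg", "/mnt/usb/kern.log")
#   ...write `new` back as root, then restart the syslog service.
```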

Wolfhound 05-04-07 01:10 AM

Re: FC6 on Dell XPS710 H2C - OS "loses" hard drives
 
Post it when you can, hope I can help you

doctor_octagon 05-07-07 11:47 AM

Re: FC6 on Dell XPS710 H2C - OS "loses" hard drives
 
1 Attachment(s)
Quote:

Originally Posted by doctor_octagon
I installed FC6 x86_64. It uses the sata_nv driver to access the nVidia Mediashield (nforce 590, I think?) RAID -- 2 SATA drives in RAID1. Everything installed ok.

After running for a few hours the OS will lose the ability to write to the disks. The kernel errors with "journal commit I/O error" when it can't write the ext3 journal to the disk. After this any command will generally result in "end_request: I/O error, dev sda".

At this point I can't even shut down cleanly and a hard restart is necessary.

All,

I have managed to capture the death of my box using an offboard USB hard drive (attached).

I think you may be able to ignore the USB errors - the OS seems to have a hard time with the card readers in my Dell monitor - but maybe I need a better driver for the MCP55? I'll also attach the lspci output.

Please take a look and let me know if I can collect more information which would help. This happens VERY frequently: 8-10 times a week!

Thanks,
Dr. Ock.

syslog kern.*; *.emerg - attached (file: kern.log.20070504.txt)

lspci:
Code:

00:00.0 Host bridge: nVidia Corporation Unknown device 0071 (rev c1)
00:00.1 RAM memory: nVidia Corporation Unknown device 007f (rev a1)
00:00.2 RAM memory: nVidia Corporation Unknown device 0075 (rev a1)
00:00.3 RAM memory: nVidia Corporation Unknown device 006f (rev a1)
00:00.4 RAM memory: nVidia Corporation Unknown device 00b4 (rev a1)
00:01.0 RAM memory: nVidia Corporation Unknown device 0076 (rev a1)
00:01.1 RAM memory: nVidia Corporation Unknown device 0078 (rev a1)
00:01.2 RAM memory: nVidia Corporation Unknown device 0079 (rev a1)
00:01.3 RAM memory: nVidia Corporation Unknown device 007a (rev a1)
00:01.4 RAM memory: nVidia Corporation Unknown device 007b (rev a1)
00:01.5 RAM memory: nVidia Corporation Unknown device 007c (rev a1)
00:01.6 RAM memory: nVidia Corporation Unknown device 007d (rev a1)
00:02.0 PCI bridge: nVidia Corporation Unknown device 007e (rev a2)
00:04.0 PCI bridge: nVidia Corporation Unknown device 007e (rev a2)
00:05.0 PCI bridge: nVidia Corporation Unknown device 007e (rev a2)
00:09.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a1)
00:0a.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a2)
00:0a.1 SMBus: nVidia Corporation MCP55 SMBus (rev a2)
00:0b.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1)
00:0b.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2)
00:0d.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
00:0e.0 RAID bus controller: nVidia Corporation MCP55 SATA Controller (rev a2)
00:0e.1 RAID bus controller: nVidia Corporation MCP55 SATA Controller (rev a3)
00:0e.2 RAID bus controller: nVidia Corporation MCP55 SATA Controller (rev a4)
00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2)
00:0f.1 Audio device: nVidia Corporation MCP55 High Definition Audio (rev a2)
00:13.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a2)
00:18.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a2)
01:00.0 VGA compatible controller: nVidia Corporation Unknown device 0191 (rev a2)
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5751 Gigabit Ethernet PCI Express (rev 21)
04:04.0 PCI bridge: Digital Equipment Corporation DECchip 21153 (rev 04)
04:0a.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
05:00.0 Bridge: Sun Microsystems Computer Corp. EBUS (rev 01)
05:00.1 Ethernet controller: Sun Microsystems Computer Corp. Happy Meal (rev 01)
05:01.0 Bridge: Sun Microsystems Computer Corp. EBUS (rev 01)
05:01.1 Ethernet controller: Sun Microsystems Computer Corp. Happy Meal (rev 01)
05:02.0 Bridge: Sun Microsystems Computer Corp. EBUS (rev 01)
05:02.1 Ethernet controller: Sun Microsystems Computer Corp. Happy Meal (rev 01)
05:03.0 Bridge: Sun Microsystems Computer Corp. EBUS (rev 01)
05:03.1 Ethernet controller: Sun Microsystems Computer Corp. Happy Meal (rev 01)
07:00.0 VGA compatible controller: nVidia Corporation Unknown device 0191 (rev a2)


netllama 05-07-07 11:48 AM

Re: FC6 on Dell XPS710 H2C - OS "loses" hard drives
 
According to that log, you have bad sectors on the disk. In other words, you have faulty hardware.

doctor_octagon 05-07-07 11:58 AM

Re: FC6 on Dell XPS710 H2C - OS "loses" hard drives
 
Quote:

Originally Posted by netllama
According to that log, you have bad sectors on the disk. In other words, you have faulty hardware.

Which line(s) indicate that?

And since I'm running two SATA drives through an NVIDIA RAID, how do I determine if it's the hard drive (which hdd?) or the controller which is faulty?

Thanks llama,
-Ock

netllama 05-07-07 12:02 PM

Re: FC6 on Dell XPS710 H2C - OS "loses" hard drives
 
The lines that reference bad sectors. Just search for "sector" and you'll find loads of them. Also, you're using DMRAID, not NVIDIA RAID.
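
The search suggested above can be scripted so the hits are grouped per device, which helps when errors show up on more than one disk. A rough sketch, assuming the log contains kernel lines of the form "end_request: I/O error, dev sda, sector N" (the thread quotes only the part up to the device name; the sector suffix is an assumption here), with io_error_sectors being a name invented for this example:

```python
import re
from collections import defaultdict

# Assumed kernel message format; adjust the pattern if the log differs.
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def io_error_sectors(lines):
    """Map each device name to the list of sectors reported in I/O errors."""
    by_dev = defaultdict(list)
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            by_dev[match.group(1)].append(int(match.group(2)))
    return dict(by_dev)


# Hypothetical usage with the attached log file:
#   with open("kern.log.20070504.txt") as f:
#       for dev, sectors in sorted(io_error_sectors(f).items()):
#           print(dev, len(sectors), "errors,", len(set(sectors)), "distinct sectors")
```

As a rule of thumb only: a few sectors repeating on one device would point more toward that disk, while errors spread across both devices at about the same time might point somewhere else.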

doctor_octagon 05-07-07 02:20 PM

Re: FC6 on Dell XPS710 H2C - OS "loses" hard drives
 
Quote:

Originally Posted by netllama
The lines that reference bad sectors. Just search for "sector" and you'll find loads of them. Also, you're using DMRAID, not NVIDIA RAID.

There are lines listing I/O errors on both sda and sdb. There's no way both drives have bad sectors - this is a brand new machine. I'll run hard drive diagnostics overnight and let you know tomorrow, but I've got $10 that says those errors are a result of something else - the controller maybe.

So, questions: can I get debug information from dmraid? And my problem sounds a lot like this guy's issue http://lkml.org/lkml/2006/11/14/290 including the reference to USB errors (my log shows issues with USB too). I'm also running a similar kernel - 2.6.18.1.

So, with that in mind, any other information I can collect which may help?

-Ock


All times are GMT -5.