FC6 on Dell XPS710 H2C - OS "loses" hard drives
doctor_octagon
04-20-07, 12:33 PM
Hey yall,
I inherited a crummy Dell XPS 710 - it's huge, heavy, loud, there are hardly any slots left, and it's a general PITA.
I installed FC6 x86_64. It uses the sata_nv driver to access the nVidia MediaShield (nForce 590, I think?) RAID -- 2 SATA drives in RAID1. Everything installed OK.
After running for a few hours the OS will lose the ability to write to the disks. The kernel errors with "journal commit I/O error" when it can't write the ext3 journal to the disk. After this any command will generally result in "end_request: I/O error, dev sda".
At this point I can't even shut down cleanly and a hard restart is necessary.
I tried passing iommu=soft at boot time but it didn't seem to make a difference.
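The kernel generally does not complain about a boot option it does not recognize, so a misspelled option can silently have no effect. A quick sanity check is to read /proc/cmdline, which shows the command line the running kernel was booted with, and look for the exact option name. The Python sketch below does that; the helper names `boot_option` and `read_cmdline` are hypothetical, and the check only confirms the option was passed, not that the kernel or driver acted on it.

```python
from typing import Optional, Union


def boot_option(cmdline: str, name: str) -> Optional[Union[str, bool]]:
    """Return the value of `name` from a kernel command line string.

    Returns the text after '=' for key=value options, True for a bare flag,
    and None if the option is absent (for example, because it was misspelled).
    """
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name:
            return value if sep else True
    return None


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with (Linux only)."""
    with open(path) as f:
        return f.read()
```

On the affected box, `boot_option(read_cmdline(), "iommu")` should return `"soft"` if the option made it onto the command line, and `None` if it did not.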
Ideas?
TIA,
Doc Oc
doctor_octagon
04-23-07, 04:54 PM
Nobody has any ideas?? Do you need more information? If so, what? I'll be glad to oblige.
This is a REAL nuisance...
Oc
doctor_octagon
04-25-07, 11:38 AM
OK, come on... is there some debugging I can enable? Another guy here loaded the i386 version of FC6 on his XPS710 and it seems to be ok. Could there be a problem with the x86_64 version of the sata_nv driver??
Tips? Ideas? Some kind of response?
-do
doctor_octagon
05-01-07, 01:03 PM
I'll give you 10 dollars for a verbal response. 10 dollars. Anybody want to make 10 dollars and respond verbally?
oc
Wolfhound
05-03-07, 06:54 AM
I don't want your 10 dollars, I'll do it for free :D. Please post your /var/log/messages and /var/log/kern.log so we can see exactly when the error happens.
doctor_octagon
05-03-07, 01:56 PM
A response! I love it.
Because the OS loses the ability to write to the hard drives, none of the log files contain any information. Also, I don't have a kern.log.
What I'll try is attaching an external USB hard drive and modifying syslog.conf to write kernel debug messages to a file on that drive. This should (I hope) give us some more information.
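Once a capture like this exists, the interesting part is usually what the kernel logged just before the first write failure (the "when exactly the error happens" question above). A minimal Python sketch, assuming the capture is a plain-text syslog file: it scans for the first line containing one of the two error messages quoted in this thread and returns the lines leading up to it. The function name `first_failure` and the 20-line default context window are arbitrary choices for illustration.

```python
from collections import deque
from typing import Iterable, List, Optional, Tuple

# The two error messages quoted earlier in this thread.
FAILURE_MARKERS = ("journal commit I/O error", "end_request: I/O error")


def first_failure(lines: Iterable[str], context: int = 20) -> Optional[Tuple[int, List[str]]]:
    """Return (line_number, preceding_lines) for the first line that contains a
    failure marker, or None if no marker is found. Line numbers start at 1."""
    recent: deque = deque(maxlen=context)  # keeps only the last `context` lines
    for number, line in enumerate(lines, start=1):
        if any(marker in line for marker in FAILURE_MARKERS):
            return number, list(recent)
        recent.append(line.rstrip("\n"))
    return None
```

An open file object can be passed straight in as `lines`, so usage is just `first_failure(f)` with `f` opened on the log file on the USB drive (its path depends on where that drive is mounted).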
Thanks for the idea, keep 'em coming.
-doc
Wolfhound
05-04-07, 03:10 AM
Post it when you can, hope I can help you
doctor_octagon
05-07-07, 01:47 PM
All,
I have managed to capture the death of my box using an offboard USB hard drive (attached).
I think you may be able to ignore the USB errors - the OS seems to have a hard time with the card readers in my Dell monitor - but maybe I need a better driver for the MCP55? I'll also attach the lspci output.
Please take a look and let me know if I can collect more information which would help. This happens VERY frequently: 8-10 times a week!
Thanks,
Dr. Ock.
syslog kern.*; *.emerg - attached (file: kern.log.20070504.txt)
lspci:
00:00.0 Host bridge: nVidia Corporation Unknown device 0071 (rev c1)
00:00.1 RAM memory: nVidia Corporation Unknown device 007f (rev a1)
00:00.2 RAM memory: nVidia Corporation Unknown device 0075 (rev a1)
00:00.3 RAM memory: nVidia Corporation Unknown device 006f (rev a1)
00:00.4 RAM memory: nVidia Corporation Unknown device 00b4 (rev a1)
00:01.0 RAM memory: nVidia Corporation Unknown device 0076 (rev a1)
00:01.1 RAM memory: nVidia Corporation Unknown device 0078 (rev a1)
00:01.2 RAM memory: nVidia Corporation Unknown device 0079 (rev a1)
00:01.3 RAM memory: nVidia Corporation Unknown device 007a (rev a1)
00:01.4 RAM memory: nVidia Corporation Unknown device 007b (rev a1)
00:01.5 RAM memory: nVidia Corporation Unknown device 007c (rev a1)
00:01.6 RAM memory: nVidia Corporation Unknown device 007d (rev a1)
00:02.0 PCI bridge: nVidia Corporation Unknown device 007e (rev a2)
00:04.0 PCI bridge: nVidia Corporation Unknown device 007e (rev a2)
00:05.0 PCI bridge: nVidia Corporation Unknown device 007e (rev a2)
00:09.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a1)
00:0a.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a2)
00:0a.1 SMBus: nVidia Corporation MCP55 SMBus (rev a2)
00:0b.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1)
00:0b.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2)
00:0d.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
00:0e.0 RAID bus controller: nVidia Corporation MCP55 SATA Controller (rev a2)
00:0e.1 RAID bus controller: nVidia Corporation MCP55 SATA Controller (rev a3)
00:0e.2 RAID bus controller: nVidia Corporation MCP55 SATA Controller (rev a4)
00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2)
00:0f.1 Audio device: nVidia Corporation MCP55 High Definition Audio (rev a2)
00:13.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a2)
00:18.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a2)
01:00.0 VGA compatible controller: nVidia Corporation Unknown device 0191 (rev a2)
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5751 Gigabit Ethernet PCI Express (rev 21)
04:04.0 PCI bridge: Digital Equipment Corporation DECchip 21153 (rev 04)
04:0a.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
05:00.0 Bridge: Sun Microsystems Computer Corp. EBUS (rev 01)
05:00.1 Ethernet controller: Sun Microsystems Computer Corp. Happy Meal (rev 01)
05:01.0 Bridge: Sun Microsystems Computer Corp. EBUS (rev 01)
05:01.1 Ethernet controller: Sun Microsystems Computer Corp. Happy Meal (rev 01)
05:02.0 Bridge: Sun Microsystems Computer Corp. EBUS (rev 01)
05:02.1 Ethernet controller: Sun Microsystems Computer Corp. Happy Meal (rev 01)
05:03.0 Bridge: Sun Microsystems Computer Corp. EBUS (rev 01)
05:03.1 Ethernet controller: Sun Microsystems Computer Corp. Happy Meal (rev 01)
07:00.0 VGA compatible controller: nVidia Corporation Unknown device 0191 (rev a2)
netllama
05-07-07, 01:48 PM
According to that log, you have bad sectors on the disk. In other words, you have faulty hardware.
doctor_octagon
05-07-07, 01:58 PM
Which line(s) indicate that?
And since I'm running two SATA drives through an NVIDIA RAID, how do I determine if it's the hard drive (which hdd?) or the controller which is faulty?
Thanks llama,
-Ock
netllama
05-07-07, 02:02 PM
The lines that reference bad sectors. Just search for "sector" and you'll find loads of them. Also, you're using dmraid, not NVIDIA RAID.
doctor_octagon
05-07-07, 04:20 PM
There are lines listing I/O errors on both sda and sdb. There's no way both drives have bad sectors - this is a brand new machine. I'll run hard drive diagnostics overnight and let you know tomorrow, but I've got $10 that says those errors are the result of something else - the controller, maybe.
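Both the "search on sector" suggestion and the one-disk-or-controller question can be checked mechanically against the captured log. A minimal Python sketch: it collects, per device, the sector numbers reported in `end_request: I/O error` lines. The trailing `, sector N` part of the pattern is an assumption about the full message (only the `dev sda` part is quoted in this thread), and `shared_path_suspected` is a rough heuristic, not a diagnosis: errors on both members of a RAID1 pair point at something the disks share (controller, driver, cabling, power), which is consistent with, but does not prove, a controller problem.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Matches the "end_request: I/O error, dev sda" message quoted in this thread.
# The optional ", sector N" suffix is an assumption about the full log line.
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+)(?:, sector (\d+))?")


def failing_sectors(lines: Iterable[str]) -> Dict[str, Set[int]]:
    """Map each device name to the set of sector numbers reported in its I/O errors."""
    result: Dict[str, Set[int]] = defaultdict(set)
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            device, sector = match.groups()
            sectors = result[device]  # create the entry even if no sector was logged
            if sector is not None:
                sectors.add(int(sector))
    return dict(result)


def shared_path_suspected(errors: Dict[str, Set[int]]) -> bool:
    """Heuristic: I/O errors on more than one member disk point at something
    the disks share rather than at a single bad drive."""
    return len(errors) > 1
```

If the same sector numbers show up for both sda and sdb, that fits mirrored writes failing on the way to the disks better than two drives developing identical bad sectors.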
So, questions: can I get debug information from dmraid? And my problem sounds a lot like this guy's issue http://lkml.org/lkml/2006/11/14/290 including the reference to USB errors (my log shows issues with USB too). I'm also running a similar kernel - 2.6.18.1.
So, with that in mind, any other information I can collect which may help?
-Ock
netllama
05-07-07, 04:23 PM
Perhaps it's the controller, or just a dmraid bug. You'd be best off filing a bug with Fedora.
doctor_octagon
05-10-07, 12:47 PM
All,
Things are looking promising. After noticing that someone on the 2.6.18 kernel was having similar issues I upgraded my FC6 box to 2.6.20 and the box has been running for 3 days without experiencing the same issue.
I also disabled irqbalance at the same time, so there are two variables, but I'll try turning irqbalance on again in a few days and see how that goes. My guess is that if it's fixed (fingers crossed), it was the kernel that fixed it.
-Ock
doctor_octagon
05-14-07, 01:02 PM
All,
Things are looking promising. After noticing that someone on the 2.6.18 kernel was having similar issues I upgraded my FC6 box to 2.6.20 and the box has been running for 3 days without experiencing the same issue.
I also disabled irqbalance at the same time, so there's two variables, but I'll try turning on irqbalance again in a few days and see how that goes. My guess is that if it's fixed (cross your fingers) it was the kernel that fixed it.
-Ock
Just an update for anyone who finds this thread - my uptime is over 6 days now so I think I'm going to say it's fixed. Thanks to all who replied.
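A rough way to judge whether 6 days of uptime means anything: if the box really was failing 8-10 times a week and failures arrived independently at a steady rate (a Poisson-process assumption, not something measured here), the chance of going 6 days without a single failure would be about 0.1% or less. The Python sketch computes that probability for the 3-day and 6-day uptimes reported in this thread. It cannot say whether the kernel upgrade or disabling irqbalance made the difference.

```python
import math


def p_no_failures(failures_per_week: float, days: float) -> float:
    """Probability of zero failures in `days`, assuming failures arrive as a
    Poisson process at the given weekly rate (an assumption, not a measurement)."""
    rate_per_day = failures_per_week / 7.0
    return math.exp(-rate_per_day * days)


for weekly in (8, 10):  # the "8-10 times a week" failure rate reported earlier
    for days in (3, 6):  # the 3-day and 6-day uptimes reported after the upgrade
        print(f"{weekly}/week, {days} days: {p_no_failures(weekly, days):.4%}")
```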
-Ock
AbdusSalam
05-26-07, 10:48 AM
The XPS 710 is loud? I have FC6 installed and it's whisper quiet. I'm guessing you don't have the correct graphics card driver installed, so the graphics card fans aren't running at the correct speed. I have two NVIDIA GeForce 7900 GS cards running in SLI mode.
As for the rest of the hardware in my particular XPS 710 - no problems whatsoever with FC6, and I assume the same goes for other Linux distributions.
Abdus Salam
doctor_octagon
05-29-07, 12:09 PM
Yes, I find it extremely loud. Mine has dual 8800s; I don't know if they're in SLI mode or not, and I don't even know what SLI mode is. I assumed it was loud because of its liquid-cooled BS. I briefly looked into ACPI but with no success.
Quieter would be better. I'm just glad it's not hanging on me anymore. You didn't have that problem? Maybe you're not using the (phony) NVIDIA RAID...?
F
afloyd77
06-16-07, 03:14 PM
I have an ASUS M2N32-SLI Deluxe board. I downloaded the NVIDIA RAID drivers for Linux (Fedora 6) and loaded them, but the install stopped before the drive formatting step. After that I went back to a non-RAID drive and installed that way, using sata_nv.c. Then I hooked up my RAID drives from Windows and ran the dmraid tool. It shows the set as loaded, but it says it detects both nvidia and hpt43x formats and is using the hpt43x format, which is not what I have. I do have a HighPoint RAID controller that I used at one time in my other machine, but I've reformatted those drives and am using them now in my RAID setup. If only it would use the nvidia format, maybe I could see and mount the RAID and perhaps move everything over to the RAID drives. PLEASE, anyone with any ideas or who has done this???