Old 04-20-07, 10:33 AM   #1
doctor_octagon
Registered User
 
Join Date: Apr 2007
Posts: 13
Question FC6 on Dell XPS710 H2C - OS "loses" hard drives

Hey yall,

I inherited a crummy Dell XPS 710 - it's huge, heavy, and loud, there are hardly any slots left, and it's a general PITA.

I installed FC6 x86_64. It uses the sata_nv driver to access the nVidia Mediashield (nforce 590, I think?) RAID -- 2 SATA drives in RAID1. Everything installed ok.

After running for a few hours the OS will lose the ability to write to the disks. The kernel errors with "journal commit I/O error" when it can't write the ext3 journal to the disk. After this any command will generally result in "end_request: I/O error, dev sda".

At this point I can't even shut down cleanly and a hard restart is necessary.

I tried passing iommu=soft at boot time but it didn't seem to make a difference.

Ideas?

TIA,
Doc Oc
Old 04-23-07, 02:54 PM   #2
doctor_octagon
Registered User
 
Join Date: Apr 2007
Posts: 13
Unhappy Re: FC6 on Dell XPS710 H2C - OS "loses" hard drives

Quote:
Originally Posted by doctor_octagon
Hey yall,

I inherited a crummy Dell XPS 710 - it's huge, heavy, and loud, there are hardly any slots left, and it's a general PITA.

I installed FC6 x86_64. It uses the sata_nv driver to access the nVidia Mediashield (nforce 590, I think?) RAID -- 2 SATA drives in RAID1. Everything installed ok.

After running for a few hours the OS will lose the ability to write to the disks. The kernel errors with "journal commit I/O error" when it can't write the ext3 journal to the disk. After this any command will generally result in "end_request: I/O error, dev sda".

At this point I can't even shut down cleanly and a hard restart is necessary.

I tried passing iommu=soft at boot time but it didn't seem to make a difference.

Ideas?
Nobody has any ideas?? Do you need more information? If so, what? I'll be glad to oblige.

This is a REAL nuisance...
Oc
Old 04-25-07, 09:38 AM   #3
doctor_octagon
Registered User
 
Join Date: Apr 2007
Posts: 13
Angry Re: FC6 on Dell XPS710 H2C - OS "loses" hard drives

Quote:
Originally Posted by doctor_octagon
Nobody has any ideas?? Do you need more information? If so, what? I'll be glad to oblige.

This is a REAL nuisance...
OK, come on... is there some debugging I can enable? Another guy here loaded the i386 version of FC6 on his XPS710 and it seems to be ok. Could there be a problem with the x86_64 version of the sata_nv driver??

Tips? Ideas? Some kind of response?

-do
Old 05-01-07, 11:03 AM   #4
doctor_octagon
Registered User
 
Join Date: Apr 2007
Posts: 13
Default Re: FC6 on Dell XPS710 H2C - OS "loses" hard drives

Quote:
Originally Posted by doctor_octagon
OK, come on... is there some debugging I can enable? Another guy here loaded the i386 version of FC6 on his XPS710 and it seems to be ok. Could there be a problem with the x86_64 version of the sata_nv driver??

Tips? Ideas? Some kind of response?
I'll give you 10 dollars for a verbal response. 10 dollars. Anybody want to make 10 dollars and respond verbally?



oc
Old 05-03-07, 04:54 AM   #5
Wolfhound
Moving to new home
 
Join Date: Jan 2005
Location: Spain
Posts: 1,358
Default Re: FC6 on Dell XPS710 H2C - OS "loses" hard drives

I don't want your 10 dollars, I'll do it for free. Please post your /var/log/messages and /var/log/kern.log so we can see exactly when the error happens.
__________________
Intel Quad Core Q6600 @ 3,4 | 4Gigs (2x2) OCZ Reaper PC-8500 | Gigabyte P35-DS3R | Western Digital 500G Caviar SE16 SATA2 NCQ | Seagate160 Gig 7200.9 SATA NCQ | Maxtor 300Gig SATA NCQ | Asus Geforce 260GTX | Samsung Syncmaster 930BF 19'' | BeQuiet Straight Power 700W | Cooler Master CM690 |Windows Vista x64 SP2 |Windows XP SP3 [Deceased PSU, no money for new one, stucked with laptop and XBOX 360]

Macbook Pro 13': 4gigs | Core2 Duo 2,53Mhz | 250Gigs HDD |geforce 9400M| aluminium unibody | Mac OS X 10.6.2 Snow Leopard |gentoo Linux 2.631 kernel


Wolfhound´s Brute: http://wolfhound77.mybrute.com
Old 05-03-07, 11:56 AM   #6
doctor_octagon
Registered User
 
Join Date: Apr 2007
Posts: 13
Default Re: FC6 on Dell XPS710 H2C - OS "loses" hard drives

Quote:
Originally Posted by Wolfhound
I don't want your 10 dollars, I'll do it for free. Please post your /var/log/messages and /var/log/kern.log so we can see exactly when the error happens.
A response! I love it.

Because the OS loses the ability to write to the hard drives, none of the log files contain any information. Also, I don't have a kern.log.

What I'll try is attaching an external USB hard drive and modifying syslog.conf to write kernel debug to a file on this HDD. This should (I hope) give us some more information.
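For anyone trying the same thing without touching syslog.conf, here is a rough Python sketch of the idea: dump the kernel ring buffer to a file on the external drive and force it to disk, so the last snapshot survives when the internal disks go away. The function name and the `read_log` parameter are made up for illustration, the output path is whatever mount point your USB drive uses, and `dmesg` may need root on some systems. Treat it as a starting point, not a tested tool.

```python
import os
import subprocess


def snapshot_kernel_log(out_path, read_log=None):
    """Write the current kernel ring buffer to out_path and force it to disk.

    read_log is a hypothetical hook: a zero-argument callable returning the
    log text. By default it shells out to dmesg.
    """
    if read_log is None:
        def read_log():
            # dmesg prints the kernel ring buffer to stdout.
            result = subprocess.run(
                ["dmesg"], capture_output=True, text=True, check=True
            )
            return result.stdout

    text = read_log()

    # Write to a temporary file first, then rename, so a crash in the
    # middle of a write never leaves a half-written snapshot behind.
    tmp_path = out_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(text)
        f.flush()
        os.fsync(f.fileno())  # push the data past the page cache
    os.replace(tmp_path, out_path)

    return len(text.splitlines())
```

Calling this every few seconds from a loop (with `time.sleep` in between) would keep a recent copy on the external drive. The ring buffer is finite, so on a very chatty system older messages can scroll out between snapshots.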

Thanks for the idea, keep 'em coming.

-doc
Old 05-04-07, 01:10 AM   #7
Wolfhound
Moving to new home
 
Join Date: Jan 2005
Location: Spain
Posts: 1,358
Default Re: FC6 on Dell XPS710 H2C - OS "loses" hard drives

Post it when you can. I hope I can help you.
Old 05-07-07, 11:47 AM   #8
doctor_octagon
Registered User
 
Join Date: Apr 2007
Posts: 13
Default Re: FC6 on Dell XPS710 H2C - OS "loses" hard drives

Quote:
Originally Posted by doctor_octagon
I installed FC6 x86_64. It uses the sata_nv driver to access the nVidia Mediashield (nforce 590, I think?) RAID -- 2 SATA drives in RAID1. Everything installed ok.

After running for a few hours the OS will lose the ability to write to the disks. The kernel errors with "journal commit I/O error" when it can't write the ext3 journal to the disk. After this any command will generally result in "end_request: I/O error, dev sda".

At this point I can't even shut down cleanly and a hard restart is necessary.
All,

I have managed to capture the death of my box using an offboard USB hard drive (attached).

I think you can probably ignore the USB errors - the OS seems to have a hard time with the card readers in my Dell monitor - but maybe I need a better driver for the MCP55? I'll also attach the lspci output.

Please take a look and let me know if I can collect more information which would help. This happens VERY frequently: 8-10 times a week!

Thanks,
Dr. Ock.

syslog kern.*; *.emerg - attached (file: kern.log.20070504.txt)

lspci:
Code:
00:00.0 Host bridge: nVidia Corporation Unknown device 0071 (rev c1)
00:00.1 RAM memory: nVidia Corporation Unknown device 007f (rev a1)
00:00.2 RAM memory: nVidia Corporation Unknown device 0075 (rev a1)
00:00.3 RAM memory: nVidia Corporation Unknown device 006f (rev a1)
00:00.4 RAM memory: nVidia Corporation Unknown device 00b4 (rev a1)
00:01.0 RAM memory: nVidia Corporation Unknown device 0076 (rev a1)
00:01.1 RAM memory: nVidia Corporation Unknown device 0078 (rev a1)
00:01.2 RAM memory: nVidia Corporation Unknown device 0079 (rev a1)
00:01.3 RAM memory: nVidia Corporation Unknown device 007a (rev a1)
00:01.4 RAM memory: nVidia Corporation Unknown device 007b (rev a1)
00:01.5 RAM memory: nVidia Corporation Unknown device 007c (rev a1)
00:01.6 RAM memory: nVidia Corporation Unknown device 007d (rev a1)
00:02.0 PCI bridge: nVidia Corporation Unknown device 007e (rev a2)
00:04.0 PCI bridge: nVidia Corporation Unknown device 007e (rev a2)
00:05.0 PCI bridge: nVidia Corporation Unknown device 007e (rev a2)
00:09.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a1)
00:0a.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a2)
00:0a.1 SMBus: nVidia Corporation MCP55 SMBus (rev a2)
00:0b.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1)
00:0b.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2)
00:0d.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
00:0e.0 RAID bus controller: nVidia Corporation MCP55 SATA Controller (rev a2)
00:0e.1 RAID bus controller: nVidia Corporation MCP55 SATA Controller (rev a3)
00:0e.2 RAID bus controller: nVidia Corporation MCP55 SATA Controller (rev a4)
00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2)
00:0f.1 Audio device: nVidia Corporation MCP55 High Definition Audio (rev a2)
00:13.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a2)
00:18.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a2)
01:00.0 VGA compatible controller: nVidia Corporation Unknown device 0191 (rev a2)
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5751 Gigabit Ethernet PCI Express (rev 21)
04:04.0 PCI bridge: Digital Equipment Corporation DECchip 21153 (rev 04)
04:0a.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
05:00.0 Bridge: Sun Microsystems Computer Corp. EBUS (rev 01)
05:00.1 Ethernet controller: Sun Microsystems Computer Corp. Happy Meal (rev 01)
05:01.0 Bridge: Sun Microsystems Computer Corp. EBUS (rev 01)
05:01.1 Ethernet controller: Sun Microsystems Computer Corp. Happy Meal (rev 01)
05:02.0 Bridge: Sun Microsystems Computer Corp. EBUS (rev 01)
05:02.1 Ethernet controller: Sun Microsystems Computer Corp. Happy Meal (rev 01)
05:03.0 Bridge: Sun Microsystems Computer Corp. EBUS (rev 01)
05:03.1 Ethernet controller: Sun Microsystems Computer Corp. Happy Meal (rev 01)
07:00.0 VGA compatible controller: nVidia Corporation Unknown device 0191 (rev a2)
Attached Files
File Type: txt kern.log.20070504.txt (15.8 KB, 216 views)

Old 05-07-07, 11:48 AM   #9
netllama
NVIDIA Corporation
 
Join Date: Dec 2004
Posts: 8,763
Default Re: FC6 on Dell XPS710 H2C - OS "loses" hard drives

According to that log, you have bad sectors on the disk. In other words, you have faulty hardware.
Old 05-07-07, 11:58 AM   #10
doctor_octagon
Registered User
 
Join Date: Apr 2007
Posts: 13
Default Re: FC6 on Dell XPS710 H2C - OS "loses" hard drives

Quote:
Originally Posted by netllama
According to that log, you have bad sectors on the disk. In other words, you have faulty hardware.
Which line(s) indicate that?

And since I'm running two SATA drives through an NVIDIA RAID, how do I determine if it's the hard drive (which hdd?) or the controller which is faulty?

Thanks llama,
-Ock
Old 05-07-07, 12:02 PM   #11
netllama
NVIDIA Corporation
 
Join Date: Dec 2004
Posts: 8,763
Default Re: FC6 on Dell XPS710 H2C - OS "loses" hard drives

The lines that reference bad sectors. Just search on sector, and you'll find loads of them. Also, you're using DMRAID, not NVIDIA RAID.
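If searching by hand gets tedious, a small Python sketch like the one below can tally the error lines per device. It assumes the log uses the usual 2.6-era block layer wording ("end_request: I/O error, dev sda, sector N", the same message quoted in the first post); the function names are made up for illustration, and the pattern may need adjusting for other kernels.

```python
import re
from collections import Counter

# Matches kernel lines such as "end_request: I/O error, dev sda, sector 12345".
IO_ERROR = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def count_io_errors(lines):
    """Return a Counter mapping device name -> number of I/O error lines."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def summarize(path):
    """Tally I/O error lines per device for the log file at path."""
    # errors="replace" keeps stray non-UTF-8 bytes from aborting the scan.
    with open(path, errors="replace") as f:
        return count_io_errors(f)
```

Running `summarize("kern.log.20070504.txt")` on the attached log would show how the errors split across devices. A count confined to one device points toward that drive, while errors spread over every device on the same controller are at least consistent with a shared cause. The tally alone does not prove either.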
Old 05-07-07, 02:20 PM   #12
doctor_octagon
Registered User
 
Join Date: Apr 2007
Posts: 13
Default Re: FC6 on Dell XPS710 H2C - OS "loses" hard drives

Quote:
Originally Posted by netllama
The lines that reference bad sectors. Just search on sector, and you'll find loads of them. Also, you're using DMRAID, not NVIDIA RAID.
There are lines listing I/O errors on both sda and sdb. There's no way both drives have bad sectors - this is a brand new machine. I'll run hard drive diagnostics overnight and let you know tomorrow, but I've got $10 that says those errors are a result of something else - the controller maybe.

So, questions: can I get debug information from dmraid? And my problem sounds a lot like this guy's issue http://lkml.org/lkml/2006/11/14/290 including the reference to USB errors (my log shows issues with USB too). I'm also running a similar kernel - 2.6.18.1.

So, with that in mind, any other information I can collect which may help?

-Ock