sata_nv issues with MCP51 SATA controller
I'm having serious disk problems when using the on-board nvidia SATA controller for my HDDs. (My motherboard is a Gigabyte GA-N650SLI-DS4 with an nvidia chipset; the CPU is an Intel Core 2 Quad.)
Excerpt from "lspci":
00:0d.0 IDE interface: nVidia Corporation MCP51 IDE (rev a1)
00:0e.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller (rev a1)
00:0f.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller (rev a1)
I have a normal IDE disk attached to the IDE bus, and that works fine (/dev/hda).
However, any number of disks (I have tried 2 and 4) connected to the SATA controllers will eventually fail. See the attached log (excerpt from /var/log/messages).
At first the disks were REALLY unstable, but then I disabled S.M.A.R.T. (both in the BIOS and in Linux) and updated from the official CentOS 5 (equivalent of RHEL 5) kernel (2.6.18) to the latest (at that time) official kernel from kernel.org:
> uname -a
Linux mirakel 184.108.40.206-custom_jir #2 SMP Thu Aug 30 22:06:21 CEST 2007 i686 i686 i386 GNU/Linux
Now it normally takes a day or two before SATA crashes, so things are better, but the setup is still rather useless.
The first error when sata_nv gets into trouble is always:
"exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen"
(as shown in the attached log file). When this happens to one device, it almost instantly happens to the other disk attached to that controller as well. A couple of minutes (or so) later, the disk(s) connected to the other controller start acting up in the same manner. I/O freezes, and nothing helps except a reboot.
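To make the failure order easier to see (which port freezes first, and how soon the others follow), something like the rough sketch below could pull the exception lines out of /var/log/messages. It assumes each such line carries an "ataN.NN:" port prefix directly before the message quoted above. That prefix, and the helper names `find_exceptions`, `first_failure_per_port` and `scan_file`, are assumptions for illustration, not something taken from the attached log.

```python
import re

# Pattern for the exception line quoted above. The "ataN.NN:" port prefix is
# an assumption about how the kernel tags the message in /var/log/messages.
EXC_RE = re.compile(
    r"(?P<port>ata\d+(?:\.\d+)?): exception "
    r"Emask (?P<emask>0x[0-9a-fA-F]+) "
    r"SAct (?P<sact>0x[0-9a-fA-F]+) "
    r"SErr (?P<serr>0x[0-9a-fA-F]+) "
    r"action (?P<action>0x[0-9a-fA-F]+)"
    r"(?P<frozen> frozen)?"
)


def find_exceptions(lines):
    """Return one dict per matching line, in the order the lines appear."""
    events = []
    for lineno, line in enumerate(lines, start=1):
        match = EXC_RE.search(line)
        if match is None:
            continue
        events.append({
            "lineno": lineno,
            "port": match.group("port"),
            "emask": match.group("emask"),
            "sact": match.group("sact"),
            "serr": match.group("serr"),
            "action": match.group("action"),
            "frozen": match.group("frozen") is not None,
        })
    return events


def first_failure_per_port(events):
    """Map each port to the line number of its first exception."""
    first = {}
    for event in events:
        first.setdefault(event["port"], event["lineno"])
    return first


def scan_file(path="/var/log/messages"):
    """Hypothetical convenience wrapper: scan a log file on disk."""
    with open(path, errors="replace") as handle:
        return find_exceptions(handle)
```

Calling `first_failure_per_port(scan_file())` as a user who can read the log would then list, per port, the line where the first freeze shows up, which should make the "one controller first, then the other" pattern visible at a glance.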
As I run a rather large software (md) RAID-5 array on this server (I'm doing a bit of video editing), every crash means a time-consuming rebuild of the array.
I have given up on the sata_nv / nvidia-controllers for the time being.
I now resort to some old PCI-connected SATA controllers, which work fine (but slowly, as they are outdated and "overloaded").
So, if anyone has a good solution, suggestion, or improved driver (over the one supplied with the official 220.127.116.11-kernel), I am eager to give it a go and see if the situation can be resolved.
I appreciate any sensible suggestions.