Old 12-05-06, 03:58 AM   #1
domasj
Registered User
 
Join Date: Dec 2006
Posts: 11
Default nForce 4 corrupting data written to HDD

Hello,
I want to describe a serious problem I have had since I got my new PC with an Asus M2NPV-VM motherboard and two Samsung HDDs (one SATA and one PATA). When larger files are written they get corrupted; sometimes it happens even with smaller ones. The issue occurs with both 32-bit and 64-bit kernels, 2.6.17 through 2.6.19 (some compiled by me, some stock Debian kernels). The problem is explained in more detail on LKML: http://lkml.org/lkml/2006/12/2/197

My lspci:
00:00.0 RAM memory: nVidia Corporation C51 Host Bridge (rev a2)
00:00.1 RAM memory: nVidia Corporation C51 Memory Controller 0 (rev a2)
00:00.2 RAM memory: nVidia Corporation C51 Memory Controller 1 (rev a2)
00:00.3 RAM memory: nVidia Corporation C51 Memory Controller 5 (rev a2)
00:00.4 RAM memory: nVidia Corporation C51 Memory Controller 4 (rev a2)
00:00.5 RAM memory: nVidia Corporation C51 Host Bridge (rev a2)
00:00.6 RAM memory: nVidia Corporation C51 Memory Controller 3 (rev a2)
00:00.7 RAM memory: nVidia Corporation C51 Memory Controller 2 (rev a2)
00:02.0 PCI bridge: nVidia Corporation C51 PCI Express Bridge (rev a1)
00:03.0 PCI bridge: nVidia Corporation C51 PCI Express Bridge (rev a1)
00:04.0 PCI bridge: nVidia Corporation C51 PCI Express Bridge (rev a1)
00:05.0 VGA compatible controller: nVidia Corporation C51PV [GeForce 6150] (rev a2)
00:09.0 RAM memory: nVidia Corporation MCP51 Host Bridge (rev a2)
00:0a.0 ISA bridge: nVidia Corporation MCP51 LPC Bridge (rev a3)
00:0a.1 SMBus: nVidia Corporation MCP51 SMBus (rev a3)
00:0a.2 RAM memory: nVidia Corporation MCP51 Memory Controller 0 (rev a3)
00:0b.0 USB Controller: nVidia Corporation MCP51 USB Controller (rev a3)
00:0b.1 USB Controller: nVidia Corporation MCP51 USB Controller (rev a3)
00:0d.0 IDE interface: nVidia Corporation MCP51 IDE (rev a1)
00:0e.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller (rev a1)
00:10.0 PCI bridge: nVidia Corporation MCP51 PCI Bridge (rev a2)
00:10.1 Audio device: nVidia Corporation MCP51 High Definition Audio (rev a2)
00:14.0 Bridge: nVidia Corporation MCP51 Ethernet Controller (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
04:05.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
04:08.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)

Last edited by domasj; 12-05-06 at 04:14 AM.
Old 12-05-06, 10:58 AM   #2
netllama
NVIDIA Corporation
 
Join Date: Dec 2004
Posts: 8,763
Default Re: nForce 4 corrupting data written to HDD

How reliably does this corruption reproduce?
Do you have a file that will become corrupted every time it is copied?

Thanks,
Lonni
Old 12-05-06, 03:53 PM   #3
domasj
Registered User
 
Join Date: Dec 2006
Posts: 11
Default Re: nForce 4 corrupting data written to HDD

Although I can't reproduce it reliably right now, the fact is that it occurs on this computer. My previous computer had an nForce 2 chipset and worked very well.
However, when I moved to the newer computer I transferred a PATA disk to it, and it didn't last a month before I got the first ext3 inconsistency. After fsck passed everything looked all right, until it started to happen more and more frequently. Some system files seemed to be corrupted as they were being loaded.
So I decided to do a clean Debian amd64 install on my second disk, the SATA one. I lived happily until recently, when it started to show the same signs. Now it boots with some scary messages and stops when it comes to starting xorg (the nvidia logo flashes a few times), with a message that xorg startup didn't succeed. Even mplayer segfaults.
I don't have a spare HDD to corrupt right now, but I will probably get one after the weekend. Then I will be able to do some more testing.
I would be glad if you looked at the mailing list thread I wrote about. It contains some more sophisticated problem reports than mine.

Thank you
Old 12-05-06, 04:03 PM   #4
netllama
NVIDIA Corporation
 
Join Date: Dec 2004
Posts: 8,763
Default Re: nForce 4 corrupting data written to HDD

I reviewed the LKML thread that you referenced, and the problem descriptions there sound vastly different from what you're reporting. Your issue sounds like filesystem and/or in-memory corruption, whereas the issue on LKML isn't occurring at the filesystem level but in the files themselves.

Additionally, you stated that "it boots with some scary messages" and "mplayer segfaults". It sounds like the data on your disk(s) is getting corrupted even when it's not being actively written to, which is not the same issue as was reported on LKML.

At this point, the information that you've provided suggests a hardware problem (faulty RAM or disk). If you can provide information that suggests otherwise, I can look into your issue further.

I'll look into the LKML issue; however, the information that you've provided here is not the same as what was reported on LKML.

Thanks,
Lonni
Old 12-05-06, 06:32 PM   #5
krader
Registered User
 
Join Date: Dec 2006
Posts: 11
Default Re: nForce 4 corrupting data written to HDD

I think it highly likely the problem reported in the LKML thread is the same problem reported here by ~domasj. Please read all my messages in that LKML thread. On my system the problem is reproducible 100% of the time. My system had exhibited odd behavior from day one. But the problems occurred extremely infrequently and typically involved symptoms such as being unable to uncompress large archives of diagnostic data sent by customers. I simply shrugged off such problems as being due to the file being corrupted before it reached me.

But eventually my VMware Windows XP guest image exhibited problems. Again, I initially shrugged it off as MS Windows being its usual flaky self. But attempts to do a scratch install kept randomly failing during the installation. So I rebuilt the VMware image on my home Linux server. I then brought that image to work and finished the install by loading IBM-specific apps (e.g., Lotus Notes). I then compressed the VMware image with bzip2 and took a copy home. When I attempted to uncompress it I found three files were corrupted. When I attempted to uncompress the original files on my nVidia-based workstation I found it impossible to do so. At that point I started doing controlled tests. The results are documented in the LKML thread.

Feel free to contact me if you wish to investigate this further. But as a system support specialist for the past sixteen years who has seen all manner of silent data corruption, I'm pretty confident there is an nForce chipset problem.
Old 12-05-06, 06:42 PM   #6
netllama
NVIDIA Corporation
 
Join Date: Dec 2004
Posts: 8,763
Default Re: nForce 4 corrupting data written to HDD

krader,
I read through your input on the LKML thread, and have set up a system with the same A8N-SLI Deluxe motherboard that you're using. I've been copying a single 30GB file between two SATA disks on the nForce SATA controller, and running a comparison of the sha512sum of the two files after each copying iteration, and haven't run into any differences thus far. I'll let it continue to run; however, I should note that we had an issue exactly like this brought to our attention several months ago, and after investigation it was determined to be an SBIOS bug on that particular vendor's board (it was not Asus, however).
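
A minimal sketch of the kind of copy-and-compare loop described above, assuming GNU coreutils (`sha512sum`). The function name `copy_and_verify` and its three arguments are hypothetical and are not taken from the actual test setup.

```shell
#!/bin/sh
# Copy a file repeatedly and compare SHA-512 checksums after every copy.
# copy_and_verify SRC DST ROUNDS   (hypothetical helper, for illustration only)
copy_and_verify() {
    src=$1
    dst=$2
    rounds=$3
    i=1
    while [ "$i" -le "$rounds" ]; do
        # Copy, then ask the kernel to write dirty pages out to disk.
        cp "$src" "$dst" && sync
        # Reading via stdin keeps file names out of the checksum output,
        # so the two strings can be compared directly.
        a=$(sha512sum < "$src")
        b=$(sha512sum < "$dst")
        if [ "$a" != "$b" ]; then
            echo "iteration $i: MISMATCH"
            return 1
        fi
        echo "iteration $i: ok"
        i=$((i + 1))
    done
}
```

It could be called as `copy_and_verify /mnt/disk1/big.img /mnt/disk2/big.img 100` (paths hypothetical). One caveat: if the file fits in RAM, the second checksum may be served from the page cache rather than read back from the disk, so a test file much larger than memory is the safer choice.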

From your LKML post, you stated "copying certain 2 GiB files would result in at least five bytes, and as many as thirty, being corrupted every single time". Can you provide me with these 2GB files, along with details on how you were detecting the corruption? If this issue is specific to the type of data in the file(s), then I'll need you or someone else experiencing this problem to provide the files that trigger the problem.

Thanks,
Lonni
Old 12-06-06, 03:56 PM   #7
krader
Registered User
 
Join Date: Dec 2006
Posts: 11
Default Re: nForce 4 corrupting data written to HDD

Based on my testing, and the behavior of my system, I don't think there is much doubt that the failure is sensitive to the data pattern and overall "load" on the system. I attempted to upload the four files which consistently exhibit corruption when copied but the attempt failed. They range from 316 MiB to 1.1 GiB bzip2 compressed. If you give me an FTP location I'll be happy to upload them.

When you eventually get the files you'll need to uncompress them first. Note that my system had a 2.2 GHz AMD Athlon 64 dual-core CPU. The filesystems were ext3 with no unusual options. Also, while others have reported corruption with PATA disks, I was unable to reproduce it with them. It appeared that the speed of the disks is a factor.

You'll also note from the LKML thread that I updated the BIOS to the current GA version. That had no noticeable effect. A coworker and I think a likely explanation is that the BIOS is being too aggressive in configuring the chipset (i.e., choosing settings that maximize performance at the expense of stability), which leads to exceeding the capabilities of some component. But I triple-checked that all BIOS settings were at their most conservative values, so if the BIOS is at fault then it is beyond my control.
Old 12-06-06, 04:04 PM   #8
netllama
NVIDIA Corporation
 
Join Date: Dec 2004
Posts: 8,763
Default Re: nForce 4 corrupting data written to HDD

krader,
I'll send you a PM shortly with details on where you can upload the files.

In this thread, please detail specifically how you are reproducing the problem, along with how much time is needed to reliably & consistently reproduce the problem.

Thanks,
Lonni

Old 12-06-06, 07:52 PM   #9
krader
Registered User
 
Join Date: Dec 2006
Posts: 11
Default Re: nForce 4 corrupting data written to HDD

Below is the script I was using. It should be fairly obvious how to adapt it to your configuration. In short:

1) cd to the destination directory
2) copy from the source directory to the current directory
3) flush dirty pages to disk by reading a couple of large files
4) calculate md5sums for the copied files
5) compare the checksums to known correct checksums
6) if any checksums are incorrect do a byte comparison of all files

On my system this would report at least one file being corrupted on every iteration. How long this takes will depend on the speed of your system. Obviously, since we don't have a root cause and therefore don't know where the problem lies, there is a risk that you won't be able to reproduce the failure. Your best chance will be to adhere as closely to my configuration as possible.

I'll upload the output of a few runs of the following script as attachments to this thread.

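#!/bin/ksh
# Repeatedly copy the VMware image files and check the copies for corruption.
integer i=0
# 1) cd to the destination directory (stop if it is missing).
cd /vmware/c || exit 1
while : ; do
    i=i+1
    # 2) Copy from the source directory to the current directory.
    cp /home/krader/WinXP/* .
    # 3) Flush dirty pages to disk by reading a couple of large files.
    cat /vmware/WinXP/Windows\ XP-f001.vmdk /vmware/WinXP/Windows\ XP-f002.vmdk > /dev/null
    # 4) Calculate md5sums for the copied files.
    md5sum Windows* > /tmp/x
    echo "iteration $i"
    # 5) Compare the checksums to the known correct checksums.
    if diff ~/good.vmware.md5sums /tmp/x ; then
        :
    else
        # 6) A checksum differs: byte-compare every copy with its source.
        for f in Windows* ; do
            echo "$f"
            cmp -l "/home/krader/WinXP/$f" "$f"
        done
    fi
done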
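
The thread does not show how the reference file `~/good.vmware.md5sums` was produced or how the `cmp -l` output was read. The following is only a sketch of one plausible way, run on throwaway data and assuming GNU coreutils; every path and file name in it is hypothetical.

```shell
#!/bin/sh
# Demonstration on scratch data: record known-good checksums, simulate a
# single corrupted byte in a copy, then detect and count the damage.
set -eu
work=$(mktemp -d)
mkdir "$work/src" "$work/dst"

# Stand-in for a large source file: 1 MiB of zero bytes.
head -c 1048576 /dev/zero > "$work/src/disk.img"

# One-time step: record checksums of the known-good source files.
(cd "$work/src" && md5sum -- * > "$work/good.md5sums")

# Copy the file, then overwrite one byte of the copy (offset 4096) with 0xFF.
cp "$work/src/disk.img" "$work/dst/"
printf '\377' | dd of="$work/dst/disk.img" bs=1 seek=4096 conv=notrunc 2>/dev/null

# md5sum -c re-reads each listed file and exits non-zero on any mismatch.
if (cd "$work/dst" && md5sum -c "$work/good.md5sums" > /dev/null 2>&1); then
    status=clean
else
    status=corrupt
fi

# cmp -l prints one line per differing byte, so the line count is the
# number of corrupted bytes.
bad_bytes=$(cmp -l "$work/src/disk.img" "$work/dst/disk.img" | wc -l)

echo "status: $status, differing bytes: $bad_bytes"
rm -rf "$work"
```

Each line of `cmp -l` output gives the byte offset (counted from 1) followed by the two differing byte values in octal, which is the format to expect in logs produced by the script above.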
Attached Files
File Type: log cmp_sata.nv.log (6.3 KB, 223 views)
File Type: log cmp_sata.promise.log (5.1 KB, 214 views)
Old 12-06-06, 09:22 PM   #10
netllama
NVIDIA Corporation
 
Join Date: Dec 2004
Posts: 8,763
Default Re: nForce 4 corrupting data written to HDD

It doesn't look like you've uploaded the files required to reproduce this.

-Lonni
Old 12-06-06, 09:42 PM   #11
krader
Registered User
 
Join Date: Dec 2006
Posts: 11
Default Re: nForce 4 corrupting data written to HDD

Correct, I have not uploaded the files which should reproduce this. From my update five hours ago (comment #7):

I attempted to upload the four files which consistently exhibit corruption when copied but the attempt failed. They range from 316 MiB to 1.1 GiB bzip2 compressed. If you give me an FTP location I'll be happy to upload them.
Old 12-06-06, 10:51 PM   #12
netllama
NVIDIA Corporation
 
Join Date: Dec 2004
Posts: 8,763
Default Re: nForce 4 corrupting data written to HDD

Right, and see my comment #8.