Oh no....!


Subtestube
09-06-06, 05:09 PM
So, this morning I fired up SP2004 on my new C2D rig, and was _very_ disappointed when it failed the test after about 10 minutes. Coretemp, seemingly the only program that gives me believable temperature reports (everything else reports them as absurdly low: 15c idle, 25c load - Coretemp gives me 25 idle, 40 load), did not report anything out of the ordinary, and everything else seemed fine. I've had no crashes since I set up the system on Sunday, and as far as I know, all hardware was installed properly (though this is my first complete system build, so I guess I could be wrong).

Basically, I'm looking for suggestions - I guess the first thing to do is run Memtest to see if it's the memory that's failing or the CPU. If it is the CPU failing, what then? Should I RMA? Or is there a chance it's the motherboard (God forbid)? Everything is at stock - no OCing at all.

Considering the system seems stable apart from that, any ideas?

EDIT: Note, I'm at work for the next 9 hours, so won't be able to actually do anything until I get home tonight.

Bearclaw
09-06-06, 06:19 PM
Try running memtest as you suggested there. If that doesn't bring up anything out of the ordinary try testing RAM. If not, it could more than likely be a MOBO failure of some kind. I am not the best at problem solving on this kind of stuff but that's what I would suggest.

Roadhog
09-06-06, 06:23 PM
I would go for memory.

Heinz68
09-06-06, 09:02 PM
Try running memtest as you suggested there. If that doesn't bring up anything out of the ordinary try testing RAM.
MemTest is the RAM reliability test.

Make sure none of the connections are loose. You can also try clearing the CMOS.

Bearclaw
09-06-06, 09:08 PM
MemTest is the RAM reliability test.

Make sure none of the connections are loose. You can also try clearing the CMOS.
Haha, wow, I don't know what I was thinking when I wrote that. :p

Subtestube
09-06-06, 09:35 PM
A'right guys... cheers! It's such a pain... but I guess it's better to pick up hardware errors than be ignorant of them.

ArrowMk84
09-06-06, 09:40 PM
Also, make sure your RAM voltage is high enough. The XMS CL5 should be 1.9v, but I know the CL4 is 2.1v.

Subtestube
09-07-06, 05:02 PM
Hmmm.. Weird question - is it at all possible that a rounding error could be caused by a weird software interaction? The only reason I ask is that last night I spent the evening running various diagnostics (Memtest, SP2004 on two different settings), and everything showed up fine - no errors at all. That seemed especially weird after I got an error after 10 mins yesterday morning. Even stranger though was that when I ran it this morning, it errored at almost exactly 8 AM - after an hour of Prime95, and at roughly the same 'real-time' it would've yesterday. Coincidentally, that's the time that AVG runs its timed daily scan, and I just wondered if it were possible that that was causing some kind of weird error. I have seen AVG cause bizarre problems before - its daily scan started crashing my old computer for no apparent reason back in the day, so I wonder if it could be the cause... though I've never heard of that kind of problem anywhere else.

Does anyone know if that's even possible?

(Also Arrow - cheers for that - I actually found a post from a Corsair Memory Rep on another board suggesting that on these Gigabyte boards, the RAM should be running at 2.1v anyhow! I'm still running it at 1.9v, but if I have further problems, and can't relate them to AVG, I'll ramp up the RAM voltage a little and see if it goes away. Link: http://www.houseofhelp.com/forums/showthread.php?t=53068)

ArrowMk84
09-07-06, 06:36 PM
Oh, one more thing. There was a post in HardOCP's Intel forum about some errata for the C2Ds, and someone thought it could be causing the rounding error they had in Prime95. This could be your issue, and if so, a BIOS update might solve your problem (that was stated in that thread).
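
For anyone wondering what a "rounding error" means here: as far as I understand it, Prime95-style tests multiply very large numbers using floating-point FFTs, and every result should come out very close to a whole number. If a result lands too far from the nearest integer, the test flags it, because on healthy hardware that shouldn't happen. Below is a minimal Python sketch of that idea, not Prime95's actual code. The function names, the base-10 digit lists and the 0.4 limit are all assumptions made for illustration.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def multiply_with_rounding_check(a_digits, b_digits, limit=0.4):
    """Multiply two little-endian digit lists via FFT.

    Returns (coefficients before carrying, worst distance from an integer).
    Raises ArithmeticError if that distance exceeds `limit` (assumed value).
    """
    size = 1
    while size < len(a_digits) + len(b_digits):
        size *= 2
    fa = fft([complex(d) for d in a_digits] + [0j] * (size - len(a_digits)))
    fb = fft([complex(d) for d in b_digits] + [0j] * (size - len(b_digits)))
    raw = fft([x * y for x, y in zip(fa, fb)], invert=True)

    coeffs, worst = [], 0.0
    for value in raw:
        exact = value.real / size      # inverse FFT needs the 1/size scaling
        nearest = round(exact)         # the true answer is always an integer
        worst = max(worst, abs(exact - nearest))
        coeffs.append(nearest)
    if worst > limit:
        raise ArithmeticError(f"rounding was {worst}, expected less than {limit}")
    return coeffs, worst


def digits_to_int(coeffs, base=10):
    """Collapse the un-carried coefficients back into an ordinary integer."""
    return sum(c * base**i for i, c in enumerate(coeffs))


if __name__ == "__main__":
    coeffs, worst = multiply_with_rounding_check([5, 4, 3, 2, 1], [9, 8, 7, 6])
    print(digits_to_int(coeffs), worst)
```

On working hardware the worst-case distance for small inputs like these is tiny. The point is that a flipped bit in RAM or a marginal CPU can push a value far from an integer, which is why this kind of check catches hardware faults that normal use doesn't.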

Subtestube
09-07-06, 08:59 PM
You don't happen to have a link to that post, do you?

ArrowMk84
09-07-06, 09:55 PM
You don't happen to have a link to that post, do you?

This is it: http://www.hardforum.com/showthread.php?t=1092146

Subtestube
09-08-06, 05:26 PM
Thanks for all that help guys. I'm now beginning to wonder if it might be a heat problem. I noticed that the Zalman heatsink (9500) can actually still twist on top of the CPU, which would probably suggest that I haven't quite seated it properly (I'm not afraid to admit when I've made a mistake). Now, I would've thought that the low temps that Core Temp is reporting would indicate that heat dissipation wasn't a problem; however, the heatsink feels pretty cool to touch, even when the computer is running, and I get the feeling it shouldn't be. Basically, I'm thinking that the heat isn't being properly spread across the surface because there isn't sufficient surface pressure (or possibly the thermal grease has become smeared a little unevenly _because_ of the insufficient surface pressure). That, I suppose, would allow certain parts of the CPU to run very cool (as the temps are reporting), and other parts to run hot enough to not do any damage, but cause problems such as those that occur when overclocking too high.

So, what I'm going to do later today is remove the heatsink, make sure that the thermal grease is spread evenly, and put it back on, basically - then see if SP2004 still fails (I've now confirmed that it's not AVG). If it does, I'll raise the RAM voltage as that chap in the post I linked to earlier suggested, and if that _still_ doesn't work... well.. then I guess it's RMA time, and I'll try to see if I can figure out what exactly is faulting. Again.

Any suggestions regarding that? I do appreciate all the advice I've had here, and any comments on my plan (or theories) would be very much.. um... appreciated!

Subtestube
09-10-06, 05:27 PM
If anyone is interested - it looks a lot like it's a RAM problem. With Stick 1 (arbitrarily numbered) in slot one alone, the computer is prime stable for in excess of 9 hours (which should be long enough, I think). With Stick 2 in slot one alone, the computer is prime stable for between a minute and an hour, at which point it will always fail.

So... yes. RMAing the RAM now.
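
The one-stick-at-a-time approach above can be written down as a small bookkeeping sketch, in case it helps anyone log their own runs. This is only an illustration: the `StressRun` record, the `classify_sticks` helper and the 8-hour "stable" threshold are made up for the example, and the 0.5 hour figure for Stick 2 is a stand-in for "somewhere between a minute and an hour".

```python
from dataclasses import dataclass


@dataclass
class StressRun:
    stick: str        # label of the only module installed for this run
    hours_run: float  # how long the stress test ran before stopping
    failed: bool      # True if the stress test reported an error


def classify_sticks(runs, stable_hours=8.0):
    """Give each stick a verdict: 'suspect', 'stable' or 'inconclusive'.

    Any failure marks a stick as suspect for good. A clean run only counts
    as stable if it lasted at least `stable_hours` (an assumed threshold).
    """
    verdicts = {}
    for run in runs:
        if run.failed:
            verdicts[run.stick] = "suspect"
        elif verdicts.get(run.stick) != "suspect":
            if run.hours_run >= stable_hours:
                verdicts[run.stick] = "stable"
            else:
                # Short clean runs prove little; don't overwrite a better verdict.
                verdicts.setdefault(run.stick, "inconclusive")
    return verdicts


if __name__ == "__main__":
    runs = [
        StressRun("Stick 1", 9.0, failed=False),
        StressRun("Stick 2", 0.5, failed=True),
    ]
    print(classify_sticks(runs))
```

Keeping the slot and every other setting the same between runs is what makes the comparison meaningful: the only thing that changes is the module, so a repeatable failure points at that module.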