What would cause memory errors

Tagged: linux, memory, dell-poweredge

We had three boxes in the span of 3 days that went down due to memory errors, two of which went down within 2 hours of each other. All the boxes showed errors like:

ECC single bit correction warning rate exceeded, ECC single bit correction failure rate exceeded.

which is pretty self-explanatory. My question is: was it random luck that they all had issues within a few days, or could something environmental be causing it? On reboot, one box hangs at

Configuring memory ...Done.

The other two boxes came up after a reboot. I want to be scientific about the issue. If there is a bad DIMM, should a stress test show the issue, or can the issue randomly creep up?

I am running some basic tests and so far everything looks clean. Shouldn’t a stress test reproduce the issue?

Update: I tested with Memtest86+ and it came back clean.

If several machines fail at the same time (or report significantly increased error rates) it’s either a vast coincidence, bad power, heat, or radiation.

You’ll want to check the power and temperatures, then locate the errors: find out which DIMMs report them, swap the DIMMs around a bit, and check whether the errors move along with them.
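One way to locate the errors on Linux is to read the per-DIMM counters that the kernel's EDAC subsystem exposes in sysfs. The following is only a rough sketch under some assumptions: an EDAC driver for the memory controller is loaded, and the kernel provides the per-DIMM layout (`mcX/dimmY/dimm_label`, `dimm_ce_count`, `dimm_ue_count`) under `/sys/devices/system/edac/mc`. Not every platform does, and on some servers the system event log may be the only record. The function names `read_dimm_error_counts` and `slots_with_new_errors` are made up for this illustration.

```python
from pathlib import Path


def _read(path, default=""):
    """Read a small sysfs-style text file, returning default if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return default


def read_dimm_error_counts(edac_root="/sys/devices/system/edac/mc"):
    """Map 'mcX/dimmY' to its label and corrected/uncorrected error counts.

    Assumes the per-DIMM EDAC sysfs layout; returns an empty dict if
    no matching directories are found.
    """
    result = {}
    for dimm_dir in sorted(Path(edac_root).glob("mc*/dimm*")):
        if not dimm_dir.is_dir():
            continue
        key = f"{dimm_dir.parent.name}/{dimm_dir.name}"
        result[key] = {
            "label": _read(dimm_dir / "dimm_label"),
            "corrected": int(_read(dimm_dir / "dimm_ce_count", "0") or 0),
            "uncorrected": int(_read(dimm_dir / "dimm_ue_count", "0") or 0),
        }
    return result


def slots_with_new_errors(before, after):
    """Return slots whose corrected-error count grew between two snapshots."""
    return sorted(
        slot
        for slot, info in after.items()
        if info["corrected"] > before.get(slot, {}).get("corrected", 0)
    )


if __name__ == "__main__":
    # Print one line per DIMM: slot, label, corrected count, uncorrected count.
    for slot, info in read_dimm_error_counts().items():
        print(slot, info["label"], info["corrected"], info["uncorrected"])
```

Taking a snapshot, letting the machine run for a while, and comparing with `slots_with_new_errors` shows which slots are accumulating corrected errors. If the errors follow a DIMM after it is moved to another slot, the module is suspect. If they stay with the slot, or show up across many DIMMs and machines at once, that points more towards the board, power, or environment.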
