[Novalug] need help: server freezing -- How to troubleshoot

Jay Hart jhart at kevla.org
Fri Oct 23 08:59:04 EDT 2009


The only thing I saw was this:

Oct 22 17:38:09 fileserver kernel: [   29.076525] ata3.00: failed to set max address (err_mask=0x1)


Don't know if it has anything to do with anything (though it might).
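If it helps, here is a minimal sketch (Python 3.6 or later, a throwaway script rather than an existing tool; the name ata_scan.py and both regular expressions are my own assumptions) that pulls the libata ("ataN:" / "ataN.NN:") kernel messages out of /var/log/messages and numbers them by boot. That way you can check whether errors like the ata3.00 one above show up right before each lockup. It assumes the default syslog line layout shown above and treats the kernel uptime in brackets going backwards as a reboot.

```python
import re
import sys

# Assumed layout (default syslog, as in the excerpt above):
# "Oct 22 17:38:09 fileserver kernel: [   29.076525] ata3.00: failed to set ..."
KERNEL_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s+(?P<host>\S+)\s+kernel:\s+"
    r"\[\s*(?P<uptime>[\d.]+)\]\s+(?P<msg>.*)$"
)
# libata messages are prefixed with the port, e.g. "ata3:" or "ata3.00:".
ATA_MSG = re.compile(r"^ata\d+(\.\d+)?:")


def ata_messages(lines):
    """Yield (boot_number, syslog_stamp, message) for libata kernel lines.

    Boot 0 is the first boot seen in the file; a new boot is assumed
    whenever the bracketed kernel uptime goes backwards.
    """
    boot = 0
    last_uptime = None
    for line in lines:
        m = KERNEL_LINE.match(line.rstrip("\n"))
        if not m:
            continue  # not a kernel line with an uptime stamp
        uptime = float(m.group("uptime"))
        if last_uptime is not None and uptime < last_uptime:
            boot += 1
        last_uptime = uptime
        if ATA_MSG.match(m.group("msg")):
            yield boot, m.group("stamp"), m.group("msg")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: ata_scan.py /var/log/messages", file=sys.stderr)
    else:
        with open(sys.argv[1], errors="replace") as f:
            for boot, stamp, msg in ata_messages(f):
                print(f"boot {boot}  {stamp}  {msg}")
```

Run it as root with: python3 ata_scan.py /var/log/messages. Boots that end with a burst of ata errors and then nothing would point back at the SATA side (drives, cables, controller or power).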

I'm wondering if your CPU is somehow being heavily taxed, so that for all
intents and purposes your box "appears" to be locked up.

Install gkrellm and monitor the CPU usage (visually) to see if your CPU
utilization spikes.

In summary (following Jerry W's suggestion):

1. Install gkrellm
2. Monitor CPU utilization
3. Report findings to list
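One caveat with watching gkrellm: if the box truly hard-locks, the graph can't update either, so you only see what it showed last. A minimal sketch of a file-based alternative (Python 3.6 or later, Linux only; the script name cpulog.py, the 5-second interval and the log path are arbitrary assumptions) samples the aggregate "cpu" line of /proc/stat and appends a timestamped busy percentage to a file, calling fsync after each line so the last entry written before a freeze should survive the power cycle.

```python
import os
import sys
import time


def read_cpu_times(path="/proc/stat"):
    """Return (idle, total) jiffies from the aggregate "cpu" line of /proc/stat."""
    with open(path) as f:
        fields = f.readline().split()
    # fields[0] is "cpu"; the counters are user nice system idle iowait irq
    # softirq steal [guest guest_nice]. Only the first eight are summed,
    # since guest time is already counted in user/nice; older kernels
    # report fewer columns, which the slice tolerates.
    values = [int(v) for v in fields[1:9]]
    # iowait is counted as idle here (the CPU is waiting, not working).
    idle = values[3] + (values[4] if len(values) > 4 else 0)
    return idle, sum(values)


def busy_percent(prev, curr):
    """Percentage of CPU time spent busy between two (idle, total) samples."""
    d_idle = curr[0] - prev[0]
    d_total = curr[1] - prev[1]
    if d_total <= 0:
        return 0.0
    return 100.0 * (d_total - d_idle) / d_total


def log_cpu(logfile, interval=5.0):
    """Append one timestamped sample every `interval` seconds until killed."""
    prev = read_cpu_times()
    with open(logfile, "a") as out:
        while True:
            time.sleep(interval)
            curr = read_cpu_times()
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            out.write(f"{stamp} cpu_busy={busy_percent(prev, curr):.1f}%\n")
            out.flush()
            os.fsync(out.fileno())  # push the line to disk before sleeping again
            prev = curr


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: cpulog.py LOGFILE", file=sys.stderr)
    else:
        log_cpu(sys.argv[1])
```

Start it with something like: nohup python3 cpulog.py /var/tmp/cpu-usage.log &
After the next lockup, the tail of that file shows whether load was climbing or flat right before the box stopped responding.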

Jay

> Jay, here is my /var/log/messages. Ugh, it's big.
> I was running the system with just the two RAID-1 drives last night
> (Oct 22), then it locked up, so I powered it down and went to bed. I
> turned it back on this morning (Oct 23) around 6am. Anyone looking at
> this log should start there and work backwards to the previous boot, I
> guess.
>
>
> On Thu, Oct 22, 2009 at 21:56, Jay Hart <jhart at kevla.org> wrote:
>> Richard,
>>
>> Please post your messages file here.  Paul has a good idea, but hardware
>> problems are not always captured in log files if the *Sg&S(# PC locks up
>> before the entry can be written.
>>
>> I have successfully troubleshot hundreds of PCs, and the first thing I
>> always try to do is go bare bones and see if the problem still exists,
>> then add components back into the system one at a time.  I used this
>> method in the Nuclear Navy to great effect.
>>
>> So post your messages file so we can look it over.  The dmesg output
>> from startup would be nice too.  If you post the dmesg output, capture
>> it with the PC fully configured.
>>
>> Jay
>>
>>> Richard Ertel wrote:
>>>> *sigh*
>>>>
>>>> OK, so my fileserver is locking up. It seems to always happen anywhere
>>>> from 1 minute to 4 hours after booting.  If I disconnect all four SATA
>>>> hard drives (all for storage) and just have the boot drive (PATA)
>>>> connected, it seems to stay up indefinitely.
>>>>
>>>> I've run the SATA drives that I thought were problematic through
>>>> Seagate's SeaTools, and they passed all tests.
>>>>
>>>> I've looked through /var/log/messages for entries around the time the
>>>> lockups occurred, but nothing looks odd (to me; what do I know?).
>>>>
>>>> Can anyone tell me where to start troubleshooting to get to the bottom
>>>> of this?
>>>>
>>>> Ubuntu Server 8.04.3, all updates as of this morning.
>>>
>>> Rich Ertel,
>>>
>>>       On the one hand, the responses that you have received from Jay Hart
>>> and Gerald Williams are not bad; indeed, they are good ideas. On the
>>> other hand, that is not the best way to troubleshoot. IMO, blindly
>>> guessing at the cause of a problem is rarely the best way to
>>> troubleshoot. When
>>> troubleshooting, you should always _first_ attempt to generate
>>> diagnostic information (DI), and, in order to do that, you must identify
>>> the tools (e.g., software packages) that will help you to generate DI.
>>> Some of these tools are built into a standard Linux distribution, but
>>> others must be installed. I cannot recall the names of any such tools
>>> for HDDs and HDD controllers, but I am certain that they exist.
>>>
>>> Sincerely,
>>> Paul Bain
>>>
>>> _______________________________________________
>>> Novalug mailing list
>>> Novalug at calypso.tux.org
>>> http://calypso.tux.org/mailman/listinfo/novalug
>>>
>>
>




