[Novalug] need help: server freezing -- How to troubleshoot

Richard Ertel richard.ertel at gmail.com
Fri Oct 23 09:04:21 EDT 2009


gkrellm appears to be a graphical application. my server is
console-only. are there any daemon-type cpu monitors that can write to
a log file, to be checked after a lockup?
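
A minimal sketch of the kind of logger being asked about, assuming Python 3 is
available on the server. The script, its function names, the 5-second interval,
and the example log path are all hypothetical, not taken from any existing
package. It appends the load averages to a file and forces each line to disk,
so the last samples before a lockup have a better chance of surviving.

```python
#!/usr/bin/env python3
"""Illustrative CPU load logger for a console-only box (a sketch, not a packaged tool)."""
import os
import sys
import time

INTERVAL_SECONDS = 5  # hypothetical sampling interval


def format_sample(now, load):
    """Return one log line: local timestamp plus 1/5/15-minute load averages."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return "%s load1=%.2f load5=%.2f load15=%.2f\n" % (stamp, load[0], load[1], load[2])


def log_sample(path):
    """Append one sample and fsync it so it is on disk before any lockup."""
    line = format_sample(time.time(), os.getloadavg())
    with open(path, "a") as log:
        log.write(line)
        log.flush()
        os.fsync(log.fileno())
    return line


def run(path, interval=INTERVAL_SECONDS):
    """Log a sample every `interval` seconds until the process is killed."""
    while True:
        log_sample(path)
        time.sleep(interval)


# Only start looping when a log path is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    run(sys.argv[1])
```

Started in the background with a writable path as its only argument (for
example /var/log/cpu-load.log, an assumed location), it leaves a file that can
be read after the next reboot. Load average is only a rough stand-in for CPU
utilization; it also rises when processes are stuck waiting on disk I/O, which
may be useful to see given the SATA drives involved.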

On Fri, Oct 23, 2009 at 08:59, Jay Hart <jhart at kevla.org> wrote:
> The only thing I saw was this:
>
> Oct 22 17:38:09 fileserver kernel: [   29.076525] ata3.00: failed to set max address (err_mask=0x1)
>
>
> Don't know if it has anything to do with anything (though it might).
>
> I'm wondering if maybe somehow your CPU is being heavily taxed and it
> "appears," for all intents and purposes, that your box is locked up.
>
> Install gkrellm and monitor the CPU usage (visually) to see if your CPU
> utilization spikes.
>
> In summary (following Jerry W's suggestion):
>
> 1. Install gkrellm
> 2. Monitor CPU utilization
> 3. Report findings to list
>
> Jay
>
>> Jay, here is my /var/log/messages. ugh it's big.
>> i was running the system with just the two raid-1 drives last night
>> (oct 22), then it locked, so i powered it down and went to bed. turned
>> it back on this morning (oct 23) around 6am. anyone looking at this
>> log should start there and work backwards to the previous boot, i
>> guess.
>>
>>
>> On Thu, Oct 22, 2009 at 21:56, Jay Hart <jhart at kevla.org> wrote:
>>> Richard,
>>>
>>> Please post your messages file here.  Paul has a good idea, but hardware
>>> problems are not always captured in log files if the *Sg&S(# PC locks up
>>> prior to the entry being written.
>>>
>>> I have successfully troubleshot hundreds of PCs, and the first thing I
>>> always try to do is go bare bones and see if the problem still exists, then
>>> start adding one thing back into the system at a time.  Used this type of
>>> method in the Nuclear Navy to great effect.
>>>
>>> So post your messages file, so we can look it over.  Dmesg on startup would
>>> be nice too.  If you post the DMESG file, go with a fully configured PC.
>>>
>>> Jay
>>>
>>>> Richard Ertel wrote:
>>>>> *sigh*
>>>>>
>>>>> ok, so my fileserver is locking up. seems to always happen, anywhere
>>>>> from 1 minute to 4 hours after booting.  if i disconnect all four SATA
>>>>> hard drives (all for storage) and just have the boot drive (PATA)
>>>>> connected, it seems to stay up indefinitely.
>>>>>
>>>>> i've run the SATA drives that i thought were problematic through
>>>>> Seagate's SeaTools, and they passed all tests.
>>>>>
>>>>> i've looked through /var/log/messages for entries when the lockup
>>>>> occurred, but nothing looks odd (to me, what do i know?)
>>>>>
>>>>> can anyone tell me where to start troubleshooting to get to the bottom of
>>>>> this?
>>>>>
>>>>> Ubuntu Server 8.04.3, all updates as of this morning.
>>>>
>>>> Rich Ertel,
>>>>
>>>>       On the one hand, the responses that you have received from Jay Hart
>>>> and Gerald Williams are not bad; indeed, they are good ideas. On the other
>>>> hand, that is not the best way to troubleshoot. IMO, blindly guessing as
>>>> to the cause (of a problem) is rarely the best way to troubleshoot. When
>>>> troubleshooting, you should always _first_ attempt to generate
>>>> diagnostic information (DI), and, in order to do that, you must identify
>>>> the tools (e.g., software packages) that will help you to generate DI.
>>>> Some of these tools are built into a standard Linux distribution, but
>>>> others must be installed. I cannot recall the names of any such tools
>>>> for HDDs and HDD controllers, but I am certain that they exist.
>>>>
>>>> Sincerely,
>>>> Paul Bain
>>>>
>>>> _______________________________________________
>>>> Novalug mailing list
>>>> Novalug at calypso.tux.org
>>>> http://calypso.tux.org/mailman/listinfo/novalug
>>>>
>>>
>>
>