[Gimle-users] Gimle login node rebooted again

Kent Engström kent at nsc.liu.se
Thu Nov 18 16:44:11 CET 2010


kent at nsc.liu.se (Kent Engström) writes:
> kent at nsc.liu.se (Kent Engström) writes:
>> The Gimle login node was not responding to logins as it should
>> and had to be rebooted.
>
> And tonight the login node seemed to be in a bad state again, so I had
> to reboot it. We will try to do more troubleshooting when we are back in
> the office.

Another reboot was needed recently. We will analyze saved data and try
to work out what causes this, as soon as we can. In the meantime, we
will restart the login node when needed. Please send an email to
smhi-support at nsc.liu.se if you experience problems.

As always, we urge you to run code that may be CPU or filesystem
intensive on a compute node (using batch scripts or interactively 
using "interactive"). An interesting point is that we have not seen any
compute node failing in the same way as the login node. If we saw
a compute node fail in the same way (absurdly high "load average" value,
Lustre access problems), it could be easier to pinpoint the problem.
If you run on a compute node and see this, please contact us.
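
For anyone who wants to check for the "absurdly high load average"
symptom described above, the following is a minimal sketch, not an
official NSC tool. The helper name load_looks_abnormal and the factor
of 4 times the CPU count are assumptions chosen for illustration only;
the message above does not define what counts as abnormal. It does not
check for Lustre access problems.

```python
import os


def load_looks_abnormal(load1, cpu_count, factor=4.0):
    """Return True if the 1-minute load average exceeds factor * cpu_count.

    The factor is a hypothetical threshold, not a value given by NSC.
    """
    return load1 > factor * cpu_count


if __name__ == "__main__":
    # os.getloadavg() is available on Unix-like systems only.
    load1, load5, load15 = os.getloadavg()
    cpus = os.cpu_count() or 1
    print(f"load averages: {load1:.2f} {load5:.2f} {load15:.2f} on {cpus} CPUs")
    if load_looks_abnormal(load1, cpus):
        print("Load looks abnormally high; consider reporting it to support.")
```

Run on the node where the job is executing, this prints the current
load averages and flags a 1-minute value far above the number of CPUs.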

-- 
Kent Engström, National Supercomputer Centre
kent at nsc.liu.se, +46 13 28 4444

