[Neolith-users] We're yet again back online
Mattias Slabanja
slabanja at nsc.liu.se
Fri Apr 4 19:24:43 CEST 2008
Once again, we're back.
The recent unexpected power loss unfortunately killed all compute nodes,
and hence all running jobs.
The login node, system nodes, and storage servers were unaffected though.
For readers interested in the technical details, here is a rough
description of how things work in our server room.
In the case of a power loss, our UPS system is configured to power the
compute nodes for at most 5 minutes (after that, the UPS simply cuts
their power), while it keeps powering the central servers, switches,
storage, and cooling for as long as there is any electrochemical juice
left in the batteries (roughly two hours; think of it as a laptop
computer, only less portable).
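For illustration, here is a minimal sketch of that policy, modeled as a
function of the time since the outage began. The two constants come from
the description above; the function name and group labels are hypothetical,
and the two-hour figure is only a rough estimate that in reality depends on
the load on the batteries.

```python
# Hypothetical model of the UPS load-shedding policy described above.
# Assumption: a fixed 5-minute limit for compute nodes and a fixed
# ~120-minute battery runtime for everything else (the real runtime varies).

COMPUTE_LIMIT_MIN = 5        # UPS cuts compute-node power after this many minutes
BATTERY_RUNTIME_MIN = 120    # rough battery runtime for the central equipment

CENTRAL_GROUPS = ("central servers", "switches", "storage", "cooling")


def powered_groups(minutes_since_outage: float) -> list[str]:
    """Return the equipment groups still on UPS power at a given time."""
    if minutes_since_outage >= BATTERY_RUNTIME_MIN:
        return []  # batteries exhausted: everything goes down
    groups = list(CENTRAL_GROUPS)
    if minutes_since_outage < COMPUTE_LIMIT_MIN:
        groups.insert(0, "compute nodes")  # still inside the 5-minute window
    return groups


for t in (1, 10, 130):
    print(t, powered_groups(t))
```

An outage shorter than 5 minutes is thus invisible to running jobs, while
anything longer takes the compute nodes (and their jobs) down but leaves
the login node and storage reachable for a while.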
The idea is to fully tolerate short power failures and, in case of a
longer unexpected outage, to give users and admins time to perform any
crucial operations (saving source code and such) before the system goes
down.
I wish you all a pleasant weekend!
Regards,
Mattias