[Neolith-users] We're yet again back online

Mattias Slabanja slabanja at nsc.liu.se
Fri Apr 4 19:24:43 CEST 2008


Once again, we're back.

The recent unexpected power loss unfortunately killed all compute nodes, 
and hence all running jobs.
The login node, system nodes, and storage servers were unaffected though.

For the reader interested in technical details, here is a rough 
description of how it works in our server room.

In the case of a power loss, our UPS system is configured to power 
compute nodes for a maximum of 5 minutes (after that, the UPS will 
simply cut their power), and to keep powering central servers, 
switches, storage, and cooling for as long as there is still any 
electrochemical juice left in the batteries (which would be roughly two 
hours; think of it as a laptop computer, only less portable).
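
For those who prefer code to prose, the policy above can be sketched as 
a small Python model. This is only an illustration under stated 
assumptions: the constant and function names are made up here, the 
120-minute figure is just the "roughly two hours" estimate, and the 
real UPS is of course not configured this way.

```python
# Hypothetical model of the UPS policy described above. All names are
# illustrative; this is not the actual UPS configuration.

COMPUTE_HOLDUP_MIN = 5     # compute nodes are cut after at most 5 minutes
BATTERY_RUNTIME_MIN = 120  # assumed: "roughly two hours" of battery

CENTRAL_GROUPS = {"central servers", "switches", "storage", "cooling"}


def powered_groups(minutes_since_power_loss):
    """Return the equipment groups the UPS still powers at a given time."""
    if minutes_since_power_loss < 0:
        raise ValueError("time since power loss cannot be negative")

    groups = set()
    # Central infrastructure stays up until the batteries are drained.
    if minutes_since_power_loss < BATTERY_RUNTIME_MIN:
        groups |= CENTRAL_GROUPS
    # Compute nodes only ride through short outages.
    if minutes_since_power_loss < COMPUTE_HOLDUP_MIN:
        groups.add("compute nodes")
    return groups


if __name__ == "__main__":
    for minutes in (2, 10, 130):
        print(minutes, sorted(powered_groups(minutes)))
```

In this model, an outage shorter than 5 minutes leaves everything 
running, while a longer one costs the compute nodes (and their jobs) 
but leaves the login node and storage reachable for a while.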

The idea is to fully tolerate short power failures and, in case of a 
longer unexpected outage, to give users and admins time to perform any 
crucial operations before the system goes down (saving source code and 
such).

I wish you all a pleasant weekend!

Regards,
Mattias

