[Snic-users] Update on power outage at NSC

Mats Kronberg kronberg at nsc.liu.se
Fri Apr 7 12:17:39 CEST 2017


Dear Triolith and Gamma users,

Our Kärnhuset data center is still running on emergency diesel
power, and will probably continue to do so until at least early next
week.

When we get normal power back, we can start all compute nodes
immediately. However, to physically disconnect the diesel generator we
will need to stop the compute nodes for a short time (30 minutes or
so). This will happen within a few days of getting normal power back.

To be able to arrange a stop of all compute nodes at such short
notice, we have placed a reservation on all compute nodes starting
Wednesday at 10:00. This reservation will prevent any job that cannot
finish before Wednesday at 10:00 from starting. We will then move the
reservation forward until we know exactly when the stop will be.

If you want to see exactly when the current reservation starts (e.g. to
adjust the time limit of your jobs), run this command:
  sinfo -Tl | egrep '^service-powermaint-maybe' | awk '{print $3}'
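
If you want to turn that start time into a time limit for a job, the
following is a minimal sketch, not an official NSC tool. It assumes GNU
date, that the command above prints a timestamp such as
2017-04-12T10:00:00, and a safety margin of 5 minutes. The names
minutes_until, MARGIN_MINUTES and job.sh are made up for illustration.

```shell
#!/bin/bash
# Sketch: compute a job time limit (in minutes) that ends before the
# reservation starts. Assumes GNU date and a start time formatted like
# 2017-04-12T10:00:00.

# minutes_until START [NOW_EPOCH]
# Prints the whole minutes from now (or NOW_EPOCH) until START, minus a
# safety margin (hypothetical default: 5 minutes, set via MARGIN_MINUTES).
minutes_until() {
    local start_epoch now_epoch margin
    start_epoch=$(date -d "$1" +%s) || return 1
    now_epoch=${2:-$(date +%s)}
    margin=${MARGIN_MINUTES:-5}
    echo $(( (start_epoch - now_epoch) / 60 - margin ))
}

# Possible use on a login node (job.sh is a placeholder script name):
#   start=$(sinfo -Tl | egrep '^service-powermaint-maybe' | awk '{print $3}')
#   limit=$(minutes_until "$start")
#   [ "$limit" -gt 0 ] && sbatch --time="$limit" job.sh
```

A plain number given to sbatch --time is interpreted as minutes. Since
we will move the reservation, rerun the check before each submission.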


Current status:
===========

Storage and login nodes are available, and we believe they will
continue to be available without planned stops.

Gamma compute nodes: 96 nodes available, a stop will be needed as
described above.

Triolith compute nodes: 189 nodes available, a stop will be needed as
described above. We might add a few more Triolith nodes later today.

We have tried to keep the best nodes (e.g. all the "fat" nodes) online,
and a subset of all "special" nodes (e.g. GPU nodes, DCS nodes, ...).
If you absolutely need a certain node type that we've left off,
contact support at nsc.liu.se and we'll see what we can do.


What happened?
============

A short summary of the recent power outages and especially the last
one can be found on
https://www.nsc.liu.se/systemstatus/#recent-power-outages



-- 
Mats Kronberg, NSC Support <support at nsc.liu.se>


