[Snic-users] Update on power outage at NSC

Mats Kronberg kronberg at nsc.liu.se
Mon Apr 10 10:51:29 CEST 2017


Dear Triolith and Gamma users,

Power has been restored to Kärnhuset, and we have started all Triolith
and Gamma compute nodes.

On Wednesday from 10:00 there will be a short stop (approximately 30
minutes) of all Triolith and Gamma compute nodes. Login nodes and
storage will continue to be available during the stop. Please note
that this means that only jobs that can finish before Wednesday at
10:00 will start.
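
For those who want to size a job's time limit so that it fits before the
reservation, the following is a minimal sketch, not an official NSC tool.
It assumes GNU date is available; the function name minutes_until, the
10-minute safety margin and the script name job.sh are made up for
illustration. The commented-out usage lines reuse the sinfo command from
the earlier message quoted below and assume that its output is a
timestamp that "date -d" can parse.

```shell
#!/bin/sh
# Print the number of whole minutes between two points in time,
# minus a safety margin. Fails (prints nothing) if no time is left.
#   $1 = reservation start, $2 = current time (anything GNU "date -d" accepts)
#   $3 = safety margin in minutes (optional, defaults to 10)
minutes_until() {
    start=$(date -d "$1" +%s) || return 1
    now=$(date -d "$2" +%s) || return 1
    margin=${3:-10}
    left=$(( (start - now) / 60 - margin ))
    [ "$left" -gt 0 ] || return 1
    echo "$left"
}

# Possible usage on a login node (job.sh is a hypothetical job script):
#   start=$(sinfo -Tl | egrep '^service-powermaint-maybe' | awk '{print $3}')
#   sbatch --time="$(minutes_until "$start" now)" job.sh
```

sbatch accepts a plain number of minutes for --time, so the printed value
can be passed on directly. The margin is there because a job whose time
limit ends exactly at the reservation start may not be scheduled in time.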

This extra stop is needed to physically disconnect the diesel
generator. Since Kärnhuset was never designed to run on an external
generator, it had to be connected in a way that means it cannot safely
be disconnected while normal power is being supplied to the building.
To disconnect it we will have to run on battery power for around 30
minutes, and to make the batteries last that long we need to shut down
most compute nodes.

-- 
Mats Kronberg, NSC Support <support at nsc.liu.se>



On Fri, Apr 7, 2017 at 12:17 PM, Mats Kronberg <kronberg at nsc.liu.se> wrote:
> Dear Triolith and Gamma users,
>
> Our Kärnhuset data center is still running on emergency diesel
> power, and will probably continue to do so until at least early next
> week.
>
> When we get normal power back, we can start all compute nodes
> immediately. However, to physically disconnect the diesel generator we
> will need to stop the compute nodes for a short time (30 minutes or
> so). This will happen within a few days of getting normal power back.
>
> To be able to arrange a stop of all compute nodes at such short
> notice, we have placed a reservation on all compute nodes starting
> Wednesday at 10:00. This reservation will prevent any job that cannot
> finish before Wednesday from starting. We will then move the
> reservation forward until we know exactly when the stop will be.
>
> If you want to see exactly when the current reservation starts (e.g. to
> adjust the time limit of your jobs), run this command:
>   sinfo -Tl | egrep '^service-powermaint-maybe' | awk '{print $3}'
>
>
> Current status:
> ===============
>
> Storage and login nodes are available, and we believe they will
> continue to be available without planned stops.
>
> Gamma compute nodes: 96 nodes available, a stop will be needed as
> described above.
>
> Triolith compute nodes: 189 nodes available, a stop will be needed as
> described above. We might add a few more Triolith nodes later today.
>
> We have tried to keep the best nodes (e.g. all the "fat" nodes) online,
> and a subset of all "special" nodes (e.g. GPU nodes, DCS nodes, ...).
> If you absolutely need a certain node type that we've left off,
> contact support at nsc.liu.se and we'll see what we can do.
>
>
> What happened?
> ==============
>
> A short summary of the recent power outages, especially the latest
> one, can be found at
> https://www.nsc.liu.se/systemstatus/#recent-power-outages
>
>
>
> --
> Mats Kronberg, NSC Support <support at nsc.liu.se>


