[Snic-users] Notice: temporary unplanned outage on NSC Centre Storage tonight between 03:30 and 05:30

Mats Kronberg kronberg at nsc.liu.se
Wed Oct 31 06:37:37 CET 2012


Dear Triolith, Kappa, Matter and Neolith users,

Between approximately 03:48 and 05:30 CET tonight, the shared
file systems (/home, /nobackup and /software) on NSC's Centre Storage
(i.e. on Triolith, Kappa, Matter and Neolith) were unavailable due to
a technical problem.

As far as we can tell, no data on the file systems was damaged or lost.

All processes trying to access the file systems during this time would
"hang" and then continue, so no jobs failed as a direct result of the
outage.

If you have jobs still running that might not finish correctly if they
run for 2 hours longer than normal, please send the job IDs to
support at nsc.liu.se to have their time limit extended.

If you still see any disk-related problems (e.g. hangs or bad data),
or if you need help determining whether a failed job was caused by
this outage, please contact support at nsc.liu.se.

The problem was caused by a deadlock triggered by a compute node
failing in a very unusual way. As soon as that node was powered off,
the system returned to normal.


-- 
Mats Kronberg, NSC Support <support at nsc.liu.se>

