[Snic-users] Notice: unplanned outage on NSC Centre Storage tonight between 00:00 and 04:08

Mats Kronberg kronberg at nsc.liu.se
Thu Dec 6 08:18:28 CET 2012


Dear Triolith, Kappa, Matter and Neolith users,

Between approximately 00:00 and 04:08 CET tonight (2012-12-06), the
shared file systems (/home, /nobackup and /software) on NSC's Centre
Storage (i.e. on Triolith, Kappa, Matter and Neolith) were unavailable
due to a technical problem.

As far as we can tell, no data on the file systems was damaged or lost.

All processes trying to access the file systems during this time would
"hang" and then continue, so no jobs failed as a direct result of the
outage.

If you have jobs still running that might not finish correctly if they
run for 4 hours longer than normal, please send the job IDs to
support at nsc.liu.se to have their time limit extended.

If you still see some kind of disk-related problem (e.g. hangs or bad
data) or if you need help determining whether a failed job was caused
by this outage, please contact support at nsc.liu.se.

The problem was caused by a deadlock triggered by a compute node
failing in a very unusual way. As soon as that node was powered off,
the system returned to normal.


-- 
Mats Kronberg, NSC Support <support at nsc.liu.se>

