[Snic-users] Notice: unplanned outage on NSC Centre Storage

Mats Kronberg kronberg at nsc.liu.se
Mon Aug 11 07:43:54 CEST 2014


Dear Triolith, Kappa, Matter and Neolith users,

Since at least yesterday afternoon (possibly longer) and until 07:15
today, accessing the shared file systems (/home, /nobackup) was
impossible or slow from Triolith, Kappa and Matter.

As far as we can tell, no data on the file systems was damaged or lost.

All processes trying to access the file systems during this time would
"hang" and then continue once the problem was solved, so no jobs
failed directly due to this problem. The outage would also have made
it impossible for some or all users to log in to the login nodes.

If you have jobs still running that might not finish correctly if they
run for longer than normal due to this problem, please send the job
IDs to support at nsc.liu.se to have their time limit extended.

If you still see some kind of disk-related problem (e.g. hangs or bad
data) or if you need help determining if a failed job was due to this,
please contact support at nsc.liu.se.

The problem was caused by a deadlock triggered by a compute node
failing in a very unusual way. As soon as that node was powered off,
the system returned to normal.


-- 
Mats Kronberg, NSC Support <support at nsc.liu.se>

