[Snic-users] Slow file system access on NSC Centre Storage (/home, /proj)

Mats Kronberg kronberg at nsc.liu.se
Wed May 20 07:42:53 CEST 2015


Status update: things should be back to normal now, but the storage
system is operating with slightly reduced performance and redundancy
while it's being repaired.

Impact: several long and short periods of file system "freezes"
between approximately 03:29 and 06:50, and most jobs that ran on
Kappa/Matter around 06:32 failed.


Details:

We have now identified a faulty network connection between one of the
storage servers and Kappa/Matter. The affected storage server has been
shut down.

We believe that this led to an intermittent connection between
Kappa/Matter and the storage system starting around 03:29 tonight.
Because GPFS is a distributed file system, it depends heavily on a
working network connection between all parts of the system. This is
why Triolith was also affected by the file system freezes.

While we were investigating the problem, around 06:32 the faulty
storage server completely lost its Kappa/Matter network connection,
which unfortunately caused many Kappa/Matter jobs to fail. Triolith
was not affected by this, as far as I can tell.


-- 
Mats Kronberg, NSC Support <support at nsc.liu.se>

