[Snic-users] Slow file system access on NSC Centre Storage (/home, /proj)
Mats Kronberg
kronberg at nsc.liu.se
Wed May 20 07:42:53 CEST 2015
Status update: things should be back to normal now, but the storage
system is operating with slightly reduced performance and redundancy
while it's being repaired.
Impact: several long and short periods of file system "freezes"
between approximately 03:29 and 06:50, and most jobs that ran on
Kappa/Matter around 06:32 failed.
Details:
We have now identified a faulty network connection between one of the
storage servers and Kappa/Matter. The affected storage server has been
shut down.
We believe that this led to an intermittent connection between
Kappa/Matter and the storage system starting around 03:29 tonight.
Because GPFS is a distributed file system, it depends heavily on a
working network connection between all parts of the system. This is
why Triolith was also affected by the file system freezes.
While we were investigating the problem, around 06:32 the faulty
storage server completely lost its Kappa/Matter network connection,
which unfortunately caused many Kappa/Matter jobs to fail. Triolith
was not affected by this, as far as I can tell.
--
Mats Kronberg, NSC Support <support at nsc.liu.se>