[Bi-users] Problems with fouo5 filesystem last night/this morning (also smhid12/rossby19)

Kent Engström kent at nsc.liu.se
Tue Jul 19 11:09:28 CEST 2016


Dear Bi Users,

During the night (problems from ~ 2 o'clock, up again ~ 7) and again
this morning (problems from ~ 8 o'clock, up again recently), the mds8
file system server has been hanging. This has affected the fouo5,
smhid12 and rossby19 filesystems.

During these incidents, 26 nodes (night) and 1 node (morning) on Bi also
became unresponsive and had to be rebooted. We will check whether the
jobs running there at the time give us any clues.

Based on the symptoms seen, we gather that fouo5 is the filesystem that
is the cause and/or victim in this. If you have recently (since
yesterday evening or so) started doing something substantially new on
that filesystem (a new type of run, a lot more activity, etc.), we would
like to hear from you, so that we can confirm or rule out whether your
jobs may be triggering the problems.

-- 
Kent Engström, National Supercomputer Centre
kent at nsc.liu.se, +46 13 28 4444


