[Bi-users] Emergency service of rossby22 and smhid15

Peter Bortas zino at nsc.liu.se
Sun Feb 17 13:56:12 CET 2019


The filesystems are now available again on the Bi compute nodes, but
not on the login node. The login node will have to be rebooted to get
access back, and we will come back with a time for that after the
weekend.

Note that the disk system that caused this outage has a serious
hardware failure that is still not completely fixed. The vendor is
rushing in spare parts, and we will need another stop this week, on
relatively short notice, to replace them and fix things permanently.
Exact times will be announced when the parts have been delivered to
Sweden.

Regards,
-- 
Peter Bortas, NSC

On Fri, 15 Feb 2019 at 16:25, Peter Bortas <zino at nsc.liu.se> wrote:
>
> With some more data we now have a better estimate: Recovery should be
> complete this Sunday evening unless more parts break between now and
> then.
>
> Regards,
> --
> Peter Bortas, NSC
>
> On Thu, 14 Feb 2019 at 21:54, Peter Bortas <zino at nsc.liu.se> wrote:
> >
> > We have a path to recovery, but the filesystems will remain down until
> > - best case - Friday lunch.
> >
> > Regards,
> > --
> > Peter Bortas, NSC
> >
> > On Thu, 14 Feb 2019 at 14:46, Peter Bortas <zino at nsc.liu.se> wrote:
> > >
> > > Dear Bi/Accumulus users
> > >
> > > Due to a serious issue with one of the servers for rossby22 and
> > > smhid15, those two filesystems will shortly stall all reads and writes
> > > while we take the server down for maintenance. We do not currently
> > > have a time for when it will be back up, but will return with that
> > > information when we have it.
> > >
> > > Regards,
> > > --
> > > Peter Bortas, NSC

