[Vagnekman-users] Gimle and Vagn Service Downtime Monday May 23

Kent Engström kent at nsc.liu.se
Mon May 16 09:43:29 CEST 2011


Dear Gimle and Vagn Users,

we do not want to risk your jobs being killed or getting bad data
during the stop, and there is system work to be done on Vagn
that would otherwise require a separate stop anyway, so:

During the Accumulus downtime on Monday May 23, the Gimle and Vagn
compute nodes will not be running jobs, and the login servers may be
inaccessible too.

On Gimle, as we are less than 7 days from the start of the downtime,
this means that you are yet again required to submit jobs with a correct
"-t TIME" flag; otherwise the scheduler will assume a running time of
7*24 hours and decide that your jobs cannot run before the stop. This
also applies to "interactive" invocations.
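
As a rough illustration of the arithmetic behind this (not the scheduler's
actual logic), the sketch below checks whether a job with a given time limit
could finish before the stop. The 09:00 start time is taken from the quoted
announcement below; the function names are hypothetical, and the assumption
that a job only starts if its whole time limit fits before the stop is a
simplification.

```python
from datetime import datetime, timedelta

# Assumption: the stop begins Monday May 23, 2011 at 09:00 local time,
# as stated in the quoted Accumulus announcement.
DOWNTIME_START = datetime(2011, 5, 23, 9, 0)

# The limit the scheduler is said to assume when no "-t TIME" is given.
DEFAULT_LIMIT = timedelta(hours=7 * 24)


def fits_before_downtime(now, limit=DEFAULT_LIMIT, downtime_start=DOWNTIME_START):
    """Return True if a job started at `now` with `limit` ends before the stop."""
    return now + limit <= downtime_start


def longest_usable_limit(now, downtime_start=DOWNTIME_START):
    """Return the longest time limit that still fits before the stop."""
    return max(downtime_start - now, timedelta(0))


if __name__ == "__main__":
    now = datetime(2011, 5, 16, 9, 43, 29)  # time this message was sent
    print(fits_before_downtime(now))                       # default 7*24 h limit
    print(fits_before_downtime(now, timedelta(hours=24)))  # explicit 24 h limit
    print(longest_usable_limit(now))
```

With the default limit the job would end after the stop has begun, so it
cannot be started; with an explicit shorter limit it can.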

kent at nsc.liu.se (Kent Engström) writes:
> Dear Users of the Accumulus Storage,
>
> some of you have for some time been reporting spurious problems with
> the Lustre filesystems, such as recently created files later being
> seen with size 0 on other nodes.
>
> Thanks to your effort in isolating the problems and a lot of work here
> at NSC, we have been able to reproduce the problem reliably. We have
> reported the error to the Lustre developers and helped test the fix,
> and there is now a working fix for the problem.
>
> The fix is in the Lustre server, not the client. That means that we will
> need to update the servers. After talking to our user contacts,
> we have scheduled this downtime for Monday May 23, starting at 09.00
> CEST and continuing until we are done. We expect to be up and running
> again during the day.
>
> During the downtime, the Accumulus Lustre filesystems (all of /nobackup
> except the oldest parts) will not be available on Gimle or Vagn.
>
>
> Sincerely,

-- 
Kent Engström, National Supercomputer Centre
kent at nsc.liu.se, +46 13 28 4444


