[Neolith-users] service-stop, removing all queued jobs 2008-07-01

Pär Andersson paran at nsc.liu.se
Tue Jun 24 18:37:58 CEST 2008


Dear Neolith Users,


Short version:

* Neolith will have a service stop on 2008-07-01, lasting the entire day from 08:00.
* ALL QUEUED JOBS WILL BE REMOVED. (They need to be resubmitted after the service stop.)


Long version:

On Tuesday 2008-07-01 Neolith will have a service stop. We will perform a series of software upgrades, primarily of the job scheduler MOAB and the resource manager SLURM. The stop is scheduled to start at 08:00 and we expect it to take the entire day. During the stop the entire cluster will be unavailable, including the login node.

The SLURM upgrade is from the 1.2.x series to the newer 1.3.x series. Versions 1.2 and 1.3 have incompatible local state files, so the upgrade can NOT preserve queued jobs. This means that jobs still in the queue at 2008-07-01 08:00 will be deleted, and you will have to resubmit them when the service stop is over. Jobs that have finished before the service stop begins will of course not be affected.

We apologize for this inconvenience, but SLURM 1.3 brings enough improvements for both users and administrators that we feel the upgrade is justified. Here are some highlights:

* Clear error messages are written to slurm-XXXXX.out files when a job exceeds its walltime.

* Support is now provided for feature counts in job constraints. For example: srun --nodes=16 --constraint=fat*4 ...

* Support has been added for a much richer job dependency specification including testing of exit codes and multiple dependencies.
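As a rough illustration of the dependency feature, here is a minimal sketch of how a dependency argument could be assembled. It assumes jobs are submitted with sbatch and that the option takes the form --dependency=<type>:<jobid>[:<jobid>...], with types such as afterok and afternotok. The helper name build_dependency and the job IDs are made up for the example, so please check "man sbatch" on the upgraded system for the exact syntax.

```shell
#!/bin/sh
# Hypothetical helper (not part of SLURM): build a --dependency argument
# from a dependency type and one or more job IDs.
# Assumed types: afterok (run only if the listed jobs exited with code 0),
# afternotok (run only if they failed), afterany (run once they have ended).
build_dependency() {
    dep_type=$1
    shift
    # Prefix every job ID with ':' so the IDs are joined as :id1:id2...
    ids=$(printf ':%s' "$@")
    printf -- '--dependency=%s%s\n' "$dep_type" "$ids"
}

# Print the argument for a job that should wait for two example job IDs.
build_dependency afterok 1001 1002
```

The printed argument could then be passed at submission time, for example: sbatch $(build_dependency afterok 1001 1002) postprocess.sh, where postprocess.sh stands for your own job script.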

If you have any questions regarding this stop, please do not hesitate to contact us.

Regards,

Pär Andersson
NSC-staff