[Triolith-users] Maximum allowable wall time limit changed from 7 days to 3 days

Mats Kronberg kronberg at nsc.liu.se
Thu Sep 13 13:58:06 CEST 2012


Dear Triolith users,

Executive summary: the maximum wall time limit of a job on Triolith has
been lowered from 7 days to 3 days (72 hours), effective immediately.
I.e., you cannot request more than 3 days of running time for your jobs
(at most "sbatch -t 3-00:00:00").


Background and details:

Due to the limited number of nodes in Triolith phase 1 (240 nodes) and
the high average length of jobs, the number of nodes that become
available for new jobs per hour is low.

The result is that even high-priority jobs will sometimes have to wait
in the queue for a long time.

To improve this situation and give more predictable queue times for
jobs (especially high-priority ones), we will temporarily lower the
maximum allowable wall time on Triolith from seven days to three days
(72 hours).

Please note that you will in most cases get more work done in a 3-day
Triolith job than in a 7-day Neolith job due to the higher core count
per node and higher performance per core.

Currently running jobs with a wall time limit over 3 days will be
allowed to run until they finish.

Queued jobs with a wall time limit over 3 days will not start until you
resubmit them with a shorter wall time limit, or modify the queued job
to use a shorter wall time limit, like this:

  scontrol update JobId=YOUR_JOB_ID TimeLimit=3-00:00:00

To find the JobIds of your queued jobs that are affected, you can run:

  squeue --state=PENDING -u $USER | grep PartitionTimeLimit
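If you have many affected jobs, the two commands above can be combined.
The following is only a sketch, assuming the default squeue output
format (JobId in the first column). The function names are made up for
this example, and the script only prints the scontrol commands so that
you can review them before running anything.

```shell
# Print the JobId (first column) of every squeue line that mentions
# PartitionTimeLimit. Reads squeue-style output on stdin.
affected_job_ids() {
    awk '/PartitionTimeLimit/ { print $1 }'
}

# Dry run: print one scontrol command per JobId read from stdin
# instead of executing it.
print_update_commands() {
    while read -r jobid; do
        echo "scontrol update JobId=$jobid TimeLimit=3-00:00:00"
    done
}

# Usage on a login node (requires squeue):
#   squeue --state=PENDING -u "$USER" | affected_job_ids | print_update_commands
```

Review the printed commands, then run the ones you want to apply.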

If your job will not be able to run successfully in 3 days, you should
not simply modify its wall time limit. Instead, cancel it and submit
a modified job that can finish within 3 days.

If your job cannot be modified to finish in 3 days, we can allow you
to run selected jobs with a higher wall time limit (just as we do on
other systems). If you need to do this, please email
support at nsc.liu.se and explain why you cannot use e.g. checkpointing or
other methods to split your job into several shorter runs.
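As an illustration of splitting a long run, the sketch below estimates
how many 3-day segments a run needs and prints a chain of sbatch
commands in which each segment starts only after the previous one has
finished successfully. It assumes that your application can restart
from its own checkpoint files. The script name run_segment.sh and the
placeholder PREVIOUS_JOB_ID are hypothetical; replace them with your
own job script and the JobId that sbatch reports for the previous
segment.

```shell
# Number of 72-hour segments needed for a run estimated at $1 hours
# (integer division, rounded up).
segments_needed() {
    total_hours=$1
    max_hours=72
    echo $(( (total_hours + max_hours - 1) / max_hours ))
}

# Dry run: print (do not execute) one sbatch command per segment.
# run_segment.sh is a hypothetical job script that resumes from the
# latest checkpoint; PREVIOUS_JOB_ID is a placeholder to fill in by hand.
print_chain() {
    n=$(segments_needed "$1")
    i=1
    while [ "$i" -le "$n" ]; do
        if [ "$i" -eq 1 ]; then
            echo "sbatch -t 3-00:00:00 run_segment.sh"
        else
            echo "sbatch -t 3-00:00:00 --dependency=afterok:PREVIOUS_JOB_ID run_segment.sh"
        fi
        i=$((i + 1))
    done
}

# Example: a run that used to need 7 days (168 hours).
print_chain 168
```

With "--dependency=afterok", a later segment stays in the queue until
the segment it depends on has completed with exit code zero.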

When phase 2 of Triolith is in production we will consider raising the
wall time limit again.

A reminder: If you need quick access to a node for testing or
development, you can use one or more of the four nodes reserved for
development (add "--reservation=devel" and request less than 1 hour of
wall time to use these nodes).
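For example, a short test job using the development nodes could look
roughly like the job script below. This is only a sketch: the node
count, the 30 minute limit and the echo command are assumptions chosen
for illustration, so adjust them to your own test.

```shell
#!/bin/bash
# Use the nodes reserved for development.
#SBATCH --reservation=devel
# One node (an assumption for this example).
#SBATCH -N 1
# 30 minutes of wall time, which is under the 1 hour limit.
#SBATCH -t 00:30:00

# Replace this line with your own short test command.
echo "test job started"
```

Submit the script with sbatch as usual.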

If you have any questions regarding this change, please contact
support at nsc.liu.se.


-- 
Mats Kronberg, NSC Support <support at nsc.liu.se>

