[Tetralith-users] Tetralith job scheduling change
Mats Kronberg
kronberg at nsc.liu.se
Fri Apr 19 13:43:03 CEST 2019
Dear Tetralith users,
Summary:
--------
A Tetralith project can now exceed the previous hard "QOSUsageThreshold"
limit of 1.5 times its allocation, as long as the jobs are shorter than
24h. Such jobs are run "for free", i.e. they do not affect the priority
of future jobs.
This can give significantly higher job throughput for projects that are
already hitting the 1.5x "QOSUsageThreshold" limit often and that can
use jobs shorter than 24h.
Details and background:
-----------------------
As some of you might know already, Tetralith has always had (and
Triolith has had since late 2017) a hard limit that prevents a project
from running
more than 1.5 times its allocated time, even if idle nodes are available
and no higher-priority jobs are in the queue.
This limit was introduced mainly due to low-priority projects often
being able to get long-running jobs started during periods of low demand
(e.g. starting 7-day jobs on a Sunday evening). This could then prevent
higher-priority projects from starting their jobs not just on Monday
morning but for much of the work week.
What we have seen in the last few months is that the utilization of the
system has been lower than expected. We believe the reason for this is
that Tetralith tripled in size in January (and existing and new projects
were sized accordingly). Many projects have not adjusted fully to their
new larger sizes and therefore don't use all of their allocated
computing time. The few projects that have had plenty of jobs waiting to
run have often been blocked by the 1.5x limit.
We believe that this will eventually sort itself out as projects start
running more and bigger jobs, and as the next round of SNAC Large
projects starts in July. But in order to keep the system well utilized
until then without bringing back the old problems, we have modified the
job scheduling policy.
A project can now run more than 1.5x its allocation, but only jobs
shorter than 24h may do so. We call such jobs "bonus" jobs and they do
not affect the project's future priority. Bonus jobs are only started
when many nodes are unused (to avoid interfering with normal jobs).
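The rule above can be illustrated with a small sketch. This is not NSC's
scheduler configuration, only a reading of the policy as described in
this message. The names Project, classify_job and many_nodes_idle are
hypothetical, and since this message does not say how many idle nodes
count as "many", that condition is passed in as a plain yes/no value.

```python
from dataclasses import dataclass

USAGE_THRESHOLD = 1.5    # the 1.5x "QOSUsageThreshold" factor from this message
BONUS_MAX_HOURS = 24.0   # bonus jobs must be shorter than this


@dataclass
class Project:
    allocated_hours: float  # allocation (hypothetical unit, e.g. core hours)
    used_hours: float       # recent usage, in the same unit


def classify_job(project, walltime_hours, many_nodes_idle):
    """Return 'normal', 'bonus' or 'held' for a job under the described policy."""
    over_limit = project.used_hours > USAGE_THRESHOLD * project.allocated_hours
    if not over_limit:
        # Below the 1.5x limit nothing changes: the job is scheduled as usual.
        return "normal"
    # Assumption: strict "<", because the message says "shorter than 24h".
    if walltime_hours < BONUS_MAX_HOURS and many_nodes_idle:
        # Runs "for free": does not affect the project's future priority.
        return "bonus"
    # Over the limit and either too long or the system is busy: keeps waiting.
    return "held"
```

For example, a project that has used 1600 hours of a 1000-hour
allocation would get "bonus" for a 12h job on an idle system, but "held"
for a 48h job.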
If you have the ability to do so, structuring your jobs to run for e.g.
24h rather than 48h can give you a significant increase in throughput.
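As a rough illustration of how a long run could be planned as several
shorter jobs, the sketch below splits a total run time into equal chunks.
It assumes that your application can checkpoint and restart between
chunks, which this message does not cover. The function name
split_walltime and the 23h default (chosen to stay safely under 24h) are
hypothetical.

```python
import math


def split_walltime(total_hours, max_chunk_hours=23.0):
    """Split total_hours into equal chunks no longer than max_chunk_hours."""
    if total_hours <= 0 or max_chunk_hours <= 0:
        raise ValueError("durations must be positive")
    # Smallest number of chunks that keeps every chunk within the limit.
    n_chunks = math.ceil(total_hours / max_chunk_hours)
    return [total_hours / n_chunks] * n_chunks
```

With these assumptions, a 48h run becomes three 16h jobs, each of which
would be short enough to qualify as a bonus job.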
https://www.nsc.liu.se/support/batch-jobs/tetralith/ has been updated to
describe this change (and many other details of how jobs are scheduled).
The change was actually made a little over a month ago, but we wanted to
make sure everything worked as intended before announcing the change.
We will also review the amount of time allocated to various project
types (Medium, Large, etc.) to ensure that we can give as much time as
possible to projects that are actually able to use it.
--
Mats Kronberg
National Supercomputer Centre (NSC)