[Berzelius-users] Important updates to scheduling policy and automatic job termination

Henrik Henriksson hx at nsc.liu.se
Mon Feb 26 11:37:13 CET 2024


Dear Berzelius users,

We will implement some changes to the automatic job termination.



# Upcoming changes

  - A new reservation has been created, "safe". Jobs running within this
    reservation will be safe from automatic job termination. However, this
    reservation will be intentionally underprovisioned, so expect longer queue
    times. Add `--reservation=safe` to your Slurm invocation to use the
    reservation. With the creation of this reservation, we will apply stricter
    rules to jobs outside of this reservation. Within this reservation we may
    impose limits on job width.

  - Job termination will no longer be based on average wattage since the start of
    the job. Instead, we will use an exponential moving average [0].

  - The power limit will be gradually increased to at least 100W.

  - Interactive jobs have so far been fully exempted from automatic job
    termination. In the future, interactive jobs will only be exempt from
    automatic job termination for the first 8 hours of walltime.

The changes will be rolled out gradually over the next week or so. The `safe`
reservation is available as of today. Note that the `safe` reservation is not
intended as a long-term solution for projects, but rather as a stop-gap while
you work on improving your code. Efficient use of resources is taken into
account by the Berzelius allocation staff.




# Motivation

We are updating the scheduling policy in this manner based on metrics,
experiences and user interactions collected since we started working on
improving the efficiency. So far, we see a measurable and significant
improvement in how the system is used. These changes are intended to mitigate
limitations imposed on users, and to allow automatic job termination in some
common situations where this hasn't been done so far.

  - The old scheduling policy places a small subset of users in a position where
    they can't work at all. This is partially mitigated by the
    `1g.10gb`-reservation, but not all users are able to use that. The
    `safe`-reservation is intended to mitigate this by providing a way for jobs
    to always run safely. However, to provide an incentive to use the system
    efficiently, the size of this reservation will be limited. That means that
    queue times will be artificially and intentionally longer.

  - Basing job termination on average use works fine in most cases. "Delayed
    starts", where nothing happens, are terminated after one hour, as are jobs
    that simply don't manage to saturate the GPU. However, for "forgotten" jobs,
    where the GPU was used for a few hours and then went idle, we need a
    different averaging function to allow for a faster decay. In practice, we
    don't think this particular change will affect users noticeably.

  - A common pattern for interactive jobs is the forgotten session: users
    allocate resources, use them for a while and then forget to terminate the
    job. So far, we have intentionally exempted interactive jobs entirely. In
    the future we will terminate them according to the same policy as other
    jobs, but interactive jobs will have a grace period of 8 hours (= a full
    workday), instead of the normal one hour.
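
To illustrate why an average since job start decays too slowly for "forgotten"
jobs, here is a minimal sketch comparing it with an exponential moving average.
Only the 100 W limit comes from this announcement. The full-load and idle
wattages, the one-minute sampling interval and the 60-minute time constant are
assumptions chosen for illustration, not the parameters used on Berzelius.

```python
import math

# All values below are illustrative assumptions, except LIMIT_W,
# which is the 100 W figure mentioned in the announcement.
FULL_LOAD_W = 300.0     # assumed draw of a busy GPU
IDLE_W = 50.0           # assumed draw of an idle GPU
LIMIT_W = 100.0         # power limit from the announcement
TAU_MINUTES = 60.0      # assumed time constant of the moving average
SAMPLE_MINUTES = 1.0    # assumed sampling interval


def idle_minutes_until_below_limit(busy_minutes, use_ema):
    """Count idle samples needed before the chosen average drops below LIMIT_W.

    The job runs at FULL_LOAD_W for busy_minutes samples and then goes idle.
    """
    alpha = 1.0 - math.exp(-SAMPLE_MINUTES / TAU_MINUTES)
    total, count = 0.0, 0
    ema = FULL_LOAD_W

    # Busy phase: both averages sit at full load.
    for _ in range(busy_minutes):
        total += FULL_LOAD_W
        count += 1
        ema += alpha * (FULL_LOAD_W - ema)

    # Idle phase: feed idle samples until the average falls below the limit.
    idle = 0
    while True:
        idle += 1
        total += IDLE_W
        count += 1
        ema += alpha * (IDLE_W - ema)
        average = ema if use_ema else total / count
        if average < LIMIT_W:
            return idle


since_start = idle_minutes_until_below_limit(180, use_ema=False)
moving = idle_minutes_until_below_limit(180, use_ema=True)
print(since_start, moving)
```

Under these assumptions, a job that was busy for three hours would sit idle for
about twelve hours before the average since job start falls below the limit,
and that delay grows with the length of the busy phase. The moving average
crosses the limit after roughly an hour and a half regardless of how long the
job was busy beforehand.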



As always, please contact berzelius-support at nsc.liu.se with any questions or comments.


[0] We will leave the exact parameters open for us to adjust. To start with, we
     aim for a "step function" response in which a job going from full load
     down to idle is allowed to continue running for approximately one hour.
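
     As a rough sketch of what such a target implies, the time constant of an
     exponential moving average can be derived from the desired one-hour step
     response. The function names and the full-load and idle wattages below
     are hypothetical, and the real parameters may be chosen differently.

```python
import math


def time_constant_minutes(full_w, idle_w, limit_w, grace_minutes):
    """Time constant for which an average starting at full_w and fed idle_w
    samples reaches limit_w after grace_minutes."""
    if not idle_w < limit_w < full_w:
        raise ValueError("expected idle_w < limit_w < full_w")
    # Step response: avg(t) = idle_w + (full_w - idle_w) * exp(-t / tau)
    return grace_minutes / math.log((full_w - idle_w) / (limit_w - idle_w))


def smoothing_factor(tau_minutes, sample_minutes):
    """Per-sample weight of the newest measurement for a given time constant."""
    return 1.0 - math.exp(-sample_minutes / tau_minutes)


# Assumed values: 300 W at full load, 50 W idle, one sample per minute.
# The 100 W limit and the one-hour target come from the announcement.
tau = time_constant_minutes(300.0, 50.0, 100.0, 60.0)
alpha = smoothing_factor(tau, 1.0)
print(f"tau = {tau:.1f} min, alpha = {alpha:.4f}")
```

     With these assumed wattages the time constant comes out at roughly 37
     minutes. A larger gap between full load and the limit gives a shorter
     time constant for the same one-hour target.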


Kind regards,
Berzelius Staff

