[Triolith-users] Downtime and changes on Triolith 2016-03-01

Pär Lindfors paran at nsc.liu.se
Tue Feb 23 09:30:24 CET 2016


Dear Triolith users,

Triolith will have planned downtime the entire day Tuesday 2016-03-01
starting at 08:00. Jobs that cannot finish before the downtime will not
be started until after the downtime is over.

We will perform various software upgrades and configuration
changes. Several of the changes will be noticeable; please see the list
below for details. Our user guides on the web will also be updated.

Slurm upgrade
-------------

The Slurm workload manager will be upgraded from version 2.4 to
14.11. This includes many bug fixes, new features and performance
improvements. Other clusters at NSC have been running Slurm 14.11 for
over a year.

Most job scripts will not need any changes to work with the new
version. There are however some minor changes in the default output
format for commands like squeue and sinfo.
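
Scripts that parse squeue output may be sensitive to such changes. One
way to reduce that risk is to request an explicit output format instead
of relying on the defaults. The following is only a sketch: the function
name list_my_jobs and the choice of columns are illustrative assumptions,
not something provided by NSC.

```shell
#!/bin/bash
# Hypothetical helper: list the current user's jobs with a fixed,
# pipe-separated column layout that does not depend on squeue defaults.
# %i job ID, %P partition, %j job name, %u user, %t state,
# %M time used, %D node count, %R reason or node list
list_my_jobs() {
    local fmt="%i|%P|%j|%u|%t|%M|%D|%R"
    # -h suppresses the header line, -o selects the output format
    squeue -h -u "${USER:-$(id -un)}" -o "$fmt"
}
```

Calling list_my_jobs on a login node would then print one line per job,
with fields separated by '|'.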

FairShare within projects
-------------------------

The current scheduling policy uses fairshare to prioritise jobs from
different projects. Jobs submitted by different members of a project are
only prioritised based on the order they were submitted.

This will be changed so that fairshare is also used to prioritise
between jobs from different users within a project.

Development node usage limit
----------------------------

Usage of the development nodes will be restricted so that no user can
ever use more than 4 of the 8 nodes at the same time. This has become
necessary to prevent a single user from filling up all the development
nodes.

Node local scratch directory
----------------------------

The path to the node local scratch directory will change from
/scratch/local/$SLURM_JOB_ID to /scratch/local.

Please note that the supported method is to get this path from the
environment variable SNIC_TMP. Jobs using the supported method will
not be affected by this change.
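
As an illustration of the supported method, a job script can read the
scratch path from SNIC_TMP rather than building it from the job ID. This
is a minimal sketch; the function name resolve_scratch and the file
names in the usage comments are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the node local scratch directory for this
# job, taken from the SNIC_TMP environment variable. Fails if the
# variable is unset or empty, for example when run outside a job.
resolve_scratch() {
    if [ -z "${SNIC_TMP:-}" ]; then
        echo "SNIC_TMP is not set" >&2
        return 1
    fi
    printf '%s\n' "$SNIC_TMP"
}

# Example use inside a job script (file names are placeholders):
#   scratch="$(resolve_scratch)" || exit 1
#   cp input.dat "$scratch/"
#   cd "$scratch" && ./my_program input.dat
```

A script written this way does not contain the /scratch/local path at
all, so it keeps working when the underlying path changes.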

New 'interactive' script
------------------------

The interactive script will be replaced with a new version. The messages
printed when running the 'interactive' command will be different, and
once the job is started the terminal will behave more like a normal
terminal.

Technically the old version relied on the software 'screen' while the
new version uses native functionality in Slurm.

Improved memory limits
----------------------

Memory limits have not always been set or enforced correctly for jobs
running on shared compute nodes. After the downtime they will be.

Accelerator access
------------------

Job scripts requesting access to accelerators (NVIDIA Tesla or Intel
Xeon Phi) will need minor changes. The accelerators will be specified
using Slurm generic resources instead of node features. Our user
guide will be updated with information about this.
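
For orientation only, a request using Slurm generic resources generally
looks like the sketch below. The resource name "gpu" and the count are
assumptions; the names actually configured on Triolith will be given in
the user guide.

```shell
#!/bin/bash
# Sketch of a job script header requesting one accelerator through
# Slurm generic resources (--gres). The resource name "gpu" is an
# assumption and may differ from what is configured on Triolith.
#SBATCH --nodes=1
#SBATCH --gres=gpu:1

echo "Job running on $(hostname)"
```

Scripts that currently select accelerator nodes through a node feature
(the --constraint option) are the ones that will need this kind of
change.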

External IP addresses
---------------------

Triolith's external IPv4 addresses will change.

ThinLinc upgrade
----------------

ThinLinc on Triolith will be upgraded to the latest version.


Kind regards,
Pär Lindfors, NSC


