[Triolith-users] Fixed: incorrect memory limits on fat nodes

Pär Lindfors paran at nsc.liu.se
Thu Jun 7 15:56:31 CEST 2018


Dear Triolith users,

We have identified and fixed a problem affecting jobs that use nodes
with extra memory on Triolith.

Jobs that were submitted between 2018-05-23 and today, and that
requested nodes with extra memory (fat/huge/dcs nodes), were not
allowed to use the extra memory when they started. Instead they were
restricted to 32G, as on normal "thin" nodes.

This affected jobs submitted using -C/--constraint, for example:

  sbatch -C fat ...
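
If you are unsure which nodes carry which features and how much memory
they have, something like the following sinfo command should show the
node list, the feature tags and the memory (in megabytes) per node:

  sinfo -o "%N %f %m"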


During the downtime on 2018-05-23 the resource manager Slurm was
upgraded. The new version changes how memory limits are stored
internally, from 32-bit to 64-bit integers. We failed to update one of
our scripts to handle this change, causing memory limits not to be set
as expected.
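
As a purely hypothetical illustration (this is not the actual NSC
script), shell code that still treats a memory value as a 32-bit
integer silently drops everything above bit 31:

  # 2T expressed in bytes needs more than 32 bits (2**41)
  mem_bytes=$(( 2 * 1024**4 ))
  # masking to the low 32 bits loses the value entirely
  echo $(( mem_bytes & 0xFFFFFFFF ))   # prints 0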

If you have been running jobs on fat/dcs/huge nodes during the last two
weeks and have seen unexpected out-of-memory conditions or poor
performance, this problem is the likely cause.
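
One way to check is the sacct accounting tool. Something like the
following should list requested memory (ReqMem) next to peak usage
(MaxRSS) for your jobs since the upgrade; adjust the time window as
needed:

  sacct -S 2018-05-23 -o JobID,JobName,ReqMem,MaxRSS,State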


NSC apologizes for this inconvenience.


Regards,
Pär Lindfors, NSC

