[Bi-users] Fixed: incorrect memory limits on fat nodes

Kent Engström kent at nsc.liu.se
Thu Jun 7 16:25:01 CEST 2018


Dear Bi users,

We have identified and fixed a problem that affected jobs using nodes
with extra memory on Bi.

Jobs that were submitted between 2018-05-30 and today and requested
nodes with extra memory (fat nodes) were not allowed to use the extra
memory when started, but were instead restricted to 32G, as on normal
"thin" nodes.

This affected jobs submitted using -C/--constraint, for example:

  sbatch -C fat ...



During the downtime on 2018-05-30 the resource manager Slurm was
upgraded. The new version changes how memory limits are stored
internally, from 32-bit to 64-bit integers. We failed to update one of
our scripts to handle this change, so memory limits were not set as
expected.
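
As a purely illustrative sketch (this is not the actual NSC script, and
the marker values and function names below are hypothetical), the
following Python shows how a check written for 32-bit values can
silently stop matching once the same field is stored as a 64-bit
integer:

```python
# Hypothetical "no limit set" markers; the names and values are
# assumptions chosen for illustration only.
UNSET_32 = 0xFFFFFFFE            # marker as a 32-bit integer
UNSET_64 = 0xFFFFFFFFFFFFFFFE    # the same idea as a 64-bit integer


def is_unset_old(value):
    """Outdated check: only recognizes the 32-bit marker."""
    return value == UNSET_32


def is_unset_new(value):
    """Updated check: recognizes the 64-bit marker."""
    return value == UNSET_64


if __name__ == "__main__":
    # After the field widens to 64 bits, the old check no longer
    # matches, so any logic that depends on it takes the wrong branch.
    print("old check matches 64-bit marker:", is_unset_old(UNSET_64))
    print("new check matches 64-bit marker:", is_unset_new(UNSET_64))
```

A script relying on the outdated check would not raise an error; it
would simply behave as if the condition were false, which is why this
kind of problem can go unnoticed.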

If you have been running jobs on fat nodes during the last week and
have seen unexpected out-of-memory conditions or poor performance,
this problem is the likely cause.


NSC apologizes for this inconvenience.

-- 
Kent Engström, National Supercomputer Centre
kent at nsc.liu.se, +46 13 28 4444


