[Bi-users] Pilot access on Thursday Feb 26

Kent Engström kent at nsc.liu.se
Wed Feb 25 19:05:10 CET 2015


> Dear Bi Pilot Testers,
>
> we are getting close to the date when Bi will be taken into production
> (the preliminary date March 2 is still the one we are aiming for).
>
> To prepare for this, we need to work on configuring SLURM (the scheduler
> and resource manager) on Bi. It is hard to do that without disturbing
> users and jobs, so we will have to kick out people and kill jobs on
> Wednesday morning when we start working on this.
>
> If we do not run into problems, we can let you in again on Thursday Feb
> 26 to continue pilot testing. If we need to continue working on SLURM to
> get it ready for production, we will have to keep the cluster for
> ourselves on Thursday-Friday too.

It looks like we can give you access again tomorrow (Thursday)
morning. We will need to reclaim the system for some exclusive work
on Thursday afternoon. More information about Friday and onwards will
follow later.

Changes you need to be aware of for the rest of the pilot phase (and for
the start of production on Bi):

- Hyper-threading is now off by default. Instead of disabling it with
  --ntasks-per-core=1, you now enable it with --ntasks-per-core=2 (see
  the example batch script after this list).

- The implementation of "interactive" is brand new. If it fails, try the
  old one available as "interactive-old" and tell smhi-support at nsc.liu.se
  how the new one failed while the old one worked! (An example of both
  commands follows after this list.)

- You cannot log in to a node running a job using plain ssh. Use "jobsh"
  instead. You will have to supply the job number too (e.g. "jobsh -j
  1234 n375") when you run it on the login node; see the example after
  this list.

- Node sharing is available (as on Triolith). If you say something like
  "sbatch -n 1 ..." the job may share a node with other jobs smaller
  than a node. Jobs using a full node or more will not experience this
  (we will not pack two 24-core jobs into 3 nodes). You can turn off
  node sharing for otherwise eligible jobs using --exclusive (see the
  example after this list).

- As part of the node sharing, each job now has a private
  /scratch/local, /tmp and /var/tmp. You should still put your
  node-local files in $SNIC_TMP (which now points to your job-private
  /scratch/local, without a job number in the path, but that is an
  implementation detail); see the sketch after this list.

-- 
Kent Engström, National Supercomputer Centre
kent at nsc.liu.se, +46 13 28 4444


