[Neolith-users] New testpilots activated, welcome!

Peter Kjellstrom cap at nsc.liu.se
Wed Aug 8 20:41:11 CEST 2007


Neolith is still in its early stages and almost nothing is tested. Please 
don't assume that things work in a specific way, read the instructions, try 
_and_ verify what happened.

Report everything (good and bad) to support at nsc.liu.se (preferably with the 
word neolith included in the subject line).

Instructions:
 http://www.nsc.liu.se/systems/neolith/testpilot.html

I'm also including a snapshot of the above instructions in this e-mail. Note 
that this copy should be considered out of date almost as soon as I paste it...

Good luck,
 Peter Kjellström - NSC


Current testpilot instructions:


Minimal Neolith testpilot instructions
What we expect

    * that you read the e-mail sent to testpilots carefully
    * lots of feedback (problems, wishes, questions, ...) to 
support at nsc.liu.se
    * performance figures (see previous info)
    * flexibility, be prepared to resubmit jobs, run things in new ways, etc.
    * that you invest more time than usual in understanding what you're doing 
(don't expect things to work, verify that they do) 

What you can expect from the system

    * frequent updates
    * changes in documentation (read all e-mail sent to testpilots)
    * no long jobs allowed without special requests
    * that things can change from one day to the next
    * bugs 

System description
Final configuration will be:

    * 805 compute nodes each with 8 x86_64 cores and 16-32 GiB RAM
    * next generation Infiniband interconnect (ConnectX)
    * around 60 TiB of fast /nobackup storage space
    * three login nodes
    * Centos-5 64-bit Linux
    * Intel compilers, Scali MPI, and SLURM batch queueing system 

Phase 1 limitations:

    * 36 compute nodes (or fewer)
    * "normal" Infiniband
    * one login node
    * no /nobackup storage, only home directories 

Main differences compared to Monolith

    * 8 processor cores per node instead of 2
    * 64-bit addressing instead of 32-bit
    * 16-32 GiB RAM per node instead of 2 GiB
    * SLURM batch queue system instead of PBS/torque 

Significant similarities compared to Monolith

    * Intel compilers
    * MKL (serial version) located at /software/intel/cmkl/9.1/lib_serial/em64t/
    * Scali MPI, located at /opt/scali
    * Storage layout with backup protected home directories and /nobackup 
(/nobackup was called /global on Monolith and will only be available on the 
final system (stage2))
    * Moab scheduler (showq command)
    * Scratch disk on compute nodes is: /disk/local 

How to use, with examples
Compiling
Compilers are loaded by default (cf. module list), so building applications 
that do not require MPI should be straightforward.
To build with MPI support, the corresponding MPI module must be loaded first.

 $ module list
 Currently loaded modules:
   1) ifort
   2) icc
   3) idb
   4) dotmodules
   5) base-config
   6) default

... compiler already loaded (ifort/icc), no mpi...
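
For a serial program that is already enough (a minimal sketch; the file name 
and the -O2 flag are only examples):

 $ icc -O2 my_serial_prog.c -o my_serial_prog.bin

For an MPI build, load the MPI module first: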

 $ module load scampi
 $ icc -Nmpi my_prog.c -o my_prog.mpibin

Submitting jobs

 $ module load scampi
 $ sbatch my_prog.sh

my_prog.sh:

 #!/bin/sh
 #
 # 4 nodes (total 32 cores/mpi ranks)
 #SBATCH -N 4
 #
 # 60 minutes
 #SBATCH -t 60

 /software/tools/bin/mpprun my_prog.mpibin

Note1: mpprun needs the correct MPI to be loaded with the module command prior 
to running sbatch.
Note2: mpprun does not allow you to choose the size (-np). If you want to 
launch fewer (or more) than nodes x 8 ranks, use the SLURM option -n. If you 
wanted to use 16 cores instead of the default of all 32 in the example above, 
this line should be added to the submit script (a complete sketch follows 
below):

 #SBATCH -n 16
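
For clarity, the whole submit script with that line added might look like this 
(a sketch based on the example above, nothing else changed):

 #!/bin/sh
 #
 # 4 nodes, but only 16 mpi ranks in total
 #SBATCH -N 4
 #SBATCH -n 16
 #
 # 60 minutes
 #SBATCH -t 60

 /software/tools/bin/mpprun my_prog.mpibin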

Interactive jobs (like qsub -I on Monolith)
Use the command interactive to submit an interactive job. Time, number of 
nodes, etc. are specified on the command line using the same syntax as in a 
batch script.
Request an interactive job using four nodes for 60 minutes:

 $ interactive -N4 -t 60
 Waiting for JOBID 640 to start
 ....
 $
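
Once the prompt returns you have your allocation. As a sketch, assuming mpprun 
behaves the same inside an interactive allocation as in a batch job (an 
assumption, please verify), an MPI test run could look like:

 $ module load scampi
 $ /software/tools/bin/mpprun my_prog.mpibin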

Monitoring jobs

    * squeue (shows jobs from SLURM perspective, like qstat on Monolith)
    * scancel (cancels a job, like qdel on Monolith)
    * showq (same as on Monolith)
    * sinfo (shows node overview, free, used, down...)
    * ssh to the node (and run top, ps, etc.) 
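
A quick check of the queue and the nodes, followed by cancelling a job, might 
look like this (a sketch; the job ID 640 is hypothetical, taken from the 
interactive example above):

 $ squeue
 $ showq
 $ sinfo
 $ scancel 640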

Login-node and storage quota

    * top (press M (shift+m) to sort by memory usage)
    * /home/diskinfo (file with nightly disk usage summary)
    * quota
    * df 
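
Checking your own usage on the login node could look like this (a minimal 
sketch using standard options; /home/diskinfo is the nightly summary mentioned 
above):

 $ quota
 $ df -h /home
 $ cat /home/diskinfo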

More information

    * SLURM user guides: quickstart guide (note: general documentation, not 
100% applicable to Neolith)
    * scampi userguide: /opt/scali/doc/SMC55_UserGuide.pdf 
