[Dunder-users] A new kind of job on Dunder: riskjobb

Lennart Karlsson Lennart.Karlsson at nsc.liu.se
Wed Apr 4 13:50:13 CEST 2007


Dear Dunder user,

We are now introducing risk jobs on Dunder.

Risk jobs have long been available on Tornado and Blixt. They
wait in the queue with a very low priority and also run with a
very low priority.

In practice, a risk job is automatically terminated and put back
into the "eligible" queue as soon as a normal job is queued that
can make use of the nodes the risk job is running on.

This means that it is normal for a risk job to be terminated and
rerun from the beginning several times before it gets lucky and
is allowed to run to its normal exit. Not every application is
written in a way that suits such treatment.
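
One way to make an application tolerate this treatment is to let
it save its progress regularly and continue from the last saved
point when it is rerun. The following is only a minimal sketch,
not an NSC recipe: the file name riskjob.state, the variables
STATE_FILE and TOTAL_STEPS, the function run_job and the idea of
work divided into numbered steps are all hypothetical, and it
assumes that the state file lives on a file system that survives
the termination of the job.

```shell
#!/bin/sh
# Hypothetical restartable job; all names here are illustrative only.
STATE_FILE="${STATE_FILE:-./riskjob.state}"   # assumed to survive a terminated job
TOTAL_STEPS="${TOTAL_STEPS:-5}"

run_job() {
    # Resume after the last recorded step, or start from 0 on the first run.
    if [ -f "$STATE_FILE" ]; then
        step=$(cat "$STATE_FILE")
    else
        step=0
    fi

    while [ "$step" -lt "$TOTAL_STEPS" ]; do
        step=$((step + 1))
        echo "running step $step of $TOTAL_STEPS"
        # ... the real work for this step would go here ...

        # Record the finished step: write a temporary file, then rename it,
        # so a termination in mid-write cannot leave a half-written state file.
        echo "$step" > "$STATE_FILE.tmp"
        mv "$STATE_FILE.tmp" "$STATE_FILE"
    done

    # All steps are done; remove the state so the next submission starts fresh.
    rm -f "$STATE_FILE"
}

run_job
```

If the job is terminated after step 3 has been recorded, the
rerun starts at step 4 instead of step 1.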

The sunny side of risk jobs is that they can use any number of
free nodes on Tornado, including the development nodes,
that they have no walltime limit, and that they will get out of
the way automatically whenever a more important job needs to be
run.

You submit a risk job with the qsub command, just as you do with
normal jobs, but with the added queue parameter '-q riskjobb'.
For example:

	qsub -q riskjobb my_job_script
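
If the qsub on Dunder reads '#PBS' directive lines from the job
script, as PBS-style batch systems usually do, the queue could
also be requested inside the script itself. This is an assumption
about the local setup and is not confirmed anywhere in this
message; the script below is hypothetical and only a sketch.

```shell
#!/bin/sh
#PBS -q riskjobb
# Assumption: qsub honours the '#PBS' directive line above; to sh it is
# just a comment, so the script also runs outside the batch system.

# PBS_O_WORKDIR is set by PBS-style systems to the directory where qsub
# was run; fall back to the current directory when it is not set.
workdir="${PBS_O_WORKDIR:-$PWD}"
cd "$workdir" || exit 1

echo "risk job starting in $workdir"
# ... start the (hypothetical) application here ...
```

Such a script would then be submitted with a plain
'qsub my_job_script'.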

On Tornado we have previously seen some problems with risk jobs
that were not terminated as quickly as expected, so we would
like to hear from you if you experience such problems.

If you have any questions or problems, please contact us at
smhi-support at nsc.liu.se as usual.

-- Lennart Karlsson <smhi-support at nsc.liu.se>
   National Supercomputer Centre, Linkoping University
   http://www.nsc.liu.se




