[Kappa-users] Re: Neolith and Kappa downtime 2010-06-22 08:00-18:00 CEST

Tue Jun 22 22:31:19 CEST 2010

Kappa is now available again.

[Changes]

 . OS upgraded to the latest release (CentOS 5.5)
 . Infiniband software stack upgraded
 . GPFS client upgraded (3.2.1-20)
 . Batch queueing system upgraded (slurm 2.1.9)

 . Green-like batch queueing system configuration for members of the
   local liu project liu1 (*). See the section further below. Users who
   are not members of liu1 are not directly affected by this, but the
   information may still be of interest.

(*) In the process of configuring the batch queueing system, all
queued jobs failed and have to be resubmitted. We sincerely apologize
for this.

[Green-like batch queueing system config]

   This configuration guarantees each user in liu1 a minimum number of
   nodes that is always available when needed. To make use of otherwise
   unused nodes, there are also so-called "risk jobs", which have no
   walltime limit and may use up to liu1's full share, but may be
   cancelled at any time if a normal (guaranteed) job needs their
   resources.

   This is configured as two separate, overlapping partitions in
   slurm. liu1's share of Kappa is currently 29 nodes, computed as:

        (nodes_all - nodes_afm - nodes_grid) / 2 * (liu1_allocation/liu_allocation) =  (364-26-16)/2.0 * (16/(16*5+9)) = 28.94 nodes. 

   There are currently 10 users in liu1, which gives each user a
   guaranteed 2 nodes (29/10, rounded down) when not running risk jobs.
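   A minimal Python sketch that reproduces the arithmetic above with the
   figures from this announcement. Rounding the share to the nearest
   whole node and rounding the per-user figure down are assumptions
   inferred from the quoted numbers (28.94 -> 29, 29/10 -> 2); they are
   not a description of how slurm itself is configured.

```python
# Figures copied from the June 2010 announcement; they will change over time.
nodes_all = 364              # total nodes in Kappa
nodes_afm = 26               # nodes subtracted in the formula (afm)
nodes_grid = 16              # nodes subtracted in the formula (grid)
liu1_allocation = 16         # liu1's allocation
liu_allocation = 16 * 5 + 9  # total liu allocation, as written in the formula


def liu1_share() -> float:
    """Fractional number of Kappa nodes corresponding to liu1's share."""
    available = nodes_all - nodes_afm - nodes_grid  # 322 nodes
    return available / 2.0 * (liu1_allocation / liu_allocation)


def guaranteed_per_user(share_nodes: int, users: int) -> int:
    """Whole nodes guaranteed to each user (assumed: rounded down)."""
    return share_nodes // users


share = liu1_share()
share_nodes = round(share)  # assumed rounding to the nearest whole node
print(f"{share:.2f} -> {share_nodes} nodes")
print(guaranteed_per_user(share_nodes, 10))
```

   Running it prints "28.94 -> 29 nodes" followed by "2", matching the
   figures given above.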

   Args to sbatch/srun/interactive when using "risk jobs" (max 29
   nodes, no walltime limit, may be killed by any theophys job):

        -A liu1 -p theophys_risk

   Args to sbatch/srun/interactive when using "normal" jobs (max 2
   nodes, max 7 days walltime, may kill a theophys_risk job if
   necessary):

        -A liu1 -p theophys

   Note: liu1 jobs may no longer run in the kappa/devel partition.
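   As an illustration, the choice between the two sets of arguments can
   be encoded in a small Python helper. The function name
   liu1_partition_args is hypothetical, the limits are copied from this
   announcement and may change, and -N is added here as the standard
   sbatch/srun node-count option. This is a sketch, not an NSC-provided
   tool.

```python
from datetime import timedelta
from typing import List, Optional

# Limits as stated in the June 2010 announcement (they may have changed since).
RISK_MAX_NODES = 29                      # theophys_risk: liu1's whole share
NORMAL_MAX_NODES = 2                     # theophys: per-user guarantee
NORMAL_MAX_WALLTIME = timedelta(days=7)  # theophys: walltime cap


def liu1_partition_args(nodes: int, walltime: Optional[timedelta]) -> List[str]:
    """Return sbatch/srun arguments for a liu1 job (hypothetical helper).

    Prefers the guaranteed 'theophys' partition and falls back to
    'theophys_risk' only when the job exceeds the normal-job limits.
    walltime=None means the job requests no walltime limit.
    """
    if nodes < 1:
        raise ValueError("a job needs at least one node")
    if nodes > RISK_MAX_NODES:
        raise ValueError(f"liu1 jobs are limited to {RISK_MAX_NODES} nodes")
    fits_normal = (
        nodes <= NORMAL_MAX_NODES
        and walltime is not None
        and walltime <= NORMAL_MAX_WALLTIME
    )
    partition = "theophys" if fits_normal else "theophys_risk"
    return ["-A", "liu1", "-p", partition, "-N", str(nodes)]
```

   For example, liu1_partition_args(8, timedelta(days=1)) selects
   theophys_risk, since 8 nodes exceeds the normal-job limit of 2. The
   helper prefers theophys whenever the job fits, because a normal job
   cannot be killed to make room for other jobs, while a risk job can.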

Please contact support if you have any questions or encounter any
problems.

best regards,
/Per Lundqvist

On Mon, 21 Jun 2010, Mats Kronberg wrote:

> Dear Neolith and Kappa users,
> 
> This is just a reminder that Kappa and Neolith will be unavailable
> tomorrow from 08:00.
> 
> 
> Also, if you want to run short jobs, today is a good day for that. There
> will be many nodes available on Neolith and Kappa, since no long jobs
> can start until after the downtime.
> 
> //Mats
> 
> 
> On 2010-06-10 16:57, Mats Kronberg wrote:
> > Dear Neolith and Kappa users,
> >
> > On Tuesday June 22nd (2010-06-22) Neolith and Kappa will have planned
> > downtime from 08:00 CEST.
> >
> > We have scheduled the whole day for this, but will of course try to
> > keep the actual downtime as short as possible.
> >
> > Both clusters including the login nodes will be unavailable during the
> > downtime.
> >
> >
> > The scheduled work consists of upgrades to the Kappa Infiniband
> > interconnect firmware and software, as well as operating system
> > upgrades on both Kappa and Neolith.
> >
> > Jobs that cannot finish before the downtime will remain queued until
> > the downtime is over.
> >
> > Hint: short jobs will find it easier than usual to run in the days
> > immediately before downtime, as they don't have to compete with longer
> > jobs.
> >
> >
> > Kind regards,
> >
> > Mats Kronberg <support at nsc.liu.se>
> > NSC
> >
> >   
> 
> _______________________________________________
> Kappa-users mailing list
> Kappa-users at lists.nsc.liu.se
> http://www.nsc.liu.se/mailman/listinfo/kappa-users
> 

-- 
Per Lundqvist

National Supercomputer Centre
Linköping University, Sweden

http://www.nsc.liu.se