[Vagnekman-users] Acceptable use of Vagn
Mats Kronberg
kronberg at nsc.liu.se
Fri Aug 24 15:38:50 CEST 2012
Dear Vagn users,
There have been some complaints, both today and earlier this summer,
about some users using so many resources (CPU cores and memory) on
Vagn that it becomes difficult for other users to get access to the
system without having to wait for hours or even days.
Please remember that the queue on Vagn is just a FIFO (first in, first out): jobs (both
batch jobs and interactive sessions) are started in the order they are
submitted. Each job is allocated a certain number of CPU cores and
memory, and those resources cannot be used by other jobs, so when Vagn
runs out of unused CPU cores or memory, no new jobs can start until an
old one ends.
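To show why one large request can hold up everyone else, here is a minimal Python sketch of strict first-in-first-out start logic. It is a simplification under stated assumptions: the whole machine is treated as a single pool of cores and memory (Vagn really has several nodes of different sizes), there is no backfilling, and `Job` and `start_fifo` are hypothetical names, not part of Vagn's scheduler.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job request: a name plus the cores and memory it asks for."""
    name: str
    cores: int
    mem_mb: int


def start_fifo(waiting, free_cores, free_mem_mb):
    """Start jobs strictly in submission order (assumption: no backfilling).

    Stops at the first job that does not fit, so that job blocks every job
    behind it even if a later, smaller job would fit. Returns the names of
    the started jobs and the cores/memory still free afterwards.
    """
    started = []
    while waiting:
        job = waiting[0]
        if job.cores > free_cores or job.mem_mb > free_mem_mb:
            break  # head of the queue does not fit: nothing else may start
        waiting.popleft()
        free_cores -= job.cores
        free_mem_mb -= job.mem_mb
        started.append(job.name)
    return started, free_cores, free_mem_mb


# One free core and about 14 GB free (roughly node a5 in the sample output).
waiting = deque([Job("eight_core", 8, 4000), Job("one_core", 1, 2000)])
print(start_fifo(waiting, free_cores=1, free_mem_mb=14186))
# ([], 1, 14186): the 8-core job waits, and the 1-core job waits behind it
```

Swapping the two jobs in the queue would let the 1-core job start immediately, which is exactly why the submission order, and the size of what is already running, matters so much on a FIFO system.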
The responsibility for not "hogging" too large a part of the machine is yours!
If you want to run several batch jobs on Vagn, I suggest that
you consider using the method described on
http://www.nsc.liu.se/systems/vagn/#sec-4-9 to make sure that you
don't use too many resources at any one time.
How many cores and how much RAM is it OK to use? That is difficult to say. If you
are the only user from your user group (e.g. MISU) using Vagn on a
certain day, it might be acceptable to use a large chunk of Vagn, but
if many other users from your group are also active you should
probably be more careful. Also remember that it is the amount of
resources used that matters, not the number of jobs. A 32-core/256GB
job uses just as many resources as 32 1-core/8GB jobs.
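The arithmetic behind that comparison is simply the per-job request multiplied by the number of jobs. A small Python sketch (the `footprint` helper is hypothetical, and the 104-core total is taken from the sample vagn-usage output in this message):

```python
VAGN_CORES = 104  # total cores on Vagn according to the sample vagn-usage output


def footprint(n_jobs, cores_per_job, mem_gb_per_job):
    """Total (cores, GB of memory) held by n_jobs identical jobs."""
    return n_jobs * cores_per_job, n_jobs * mem_gb_per_job


one_big = footprint(1, 32, 256)   # one 32-core/256GB job
many_small = footprint(32, 1, 8)  # 32 separate 1-core/8GB jobs
print(one_big, many_small)        # (32, 256) (32, 256)
print(f"{one_big[0] / VAGN_CORES:.1%} of Vagn's cores")  # 30.8% of Vagn's cores
```

Either way, the same 32 cores and 256 GB are unavailable to everyone else until the jobs finish.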
Since it is difficult to get an overview of who is actually using what
resources on Vagn, I have written a small script, "vagn-usage", that might be
useful. Please try it out. Sample output:
[kronberg@analys1 ~]$ vagn-usage
Vagn usage at 2012-08-24T15:21:26

Usage by group
Group         #cores  Memory (MB)
---------------------------------
kthmech           39       424000
misu               7        38000
rossby             6        22000
sm_fouo            3        10000

Usage by user
User          #cores  Memory (MB)
---------------------------------
sm_annli           1         4000
sm_louca           1         2000
sm_mkola           2         8000
sm_ppemb           1         4000
sm_rohor           1         2000
sm_semsc           1         4000
sm_stran           1         4000
sm_torko           1         4000
x_andci           32        64000
x_iulib            2         8000
x_janju            1         2000
x_julsa            2         4000
x_larah            1        20000
x_laubr            1         4000
x_liawe            2       100000
x_maber            1         4000
x_phisc            4       256000

           Cores             Memory (MB)
Node in_use total     %   in_use   total     %  Full?
-----------------------------------------------------
a2        4     8  50.0    32000   32186  99.4  yes
a3        1     8  12.5    32000   32186  99.4  yes
a4        1     8  12.5    32000   32186  99.4  yes
a5        7     8  87.5    18000   32186  55.9  no
a6        4     8  50.0    62000   64372  96.3  yes
a7        6    32  18.8   254000  257488  98.6  yes
a8       32    32 100.0    64000  257488  24.9  yes
all      55   104  52.9   494000  708092  69.8  n/a
(Full == no available cores or <4GB RAM available)

There are jobs waiting in the queue:
JOBID  PARTITION  USER     ACCOUNT  NODES  CPUS  MIN_MEMORY  NODELIST(REASON)
72746  share      x_julsa  misu         1     8        4000  (Resources)
72991  share      x_phisc  kthmech      1     1       48000  (Priority)
72992  share      x_phisc  kthmech      1     1       48000  (Priority)
As you can see, Vagn was almost full: there was just one node (a5) where a
small single-core job could be started. You can also see that a single
user group (kthmech) was responsible for most of the usage.
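Both observations can be checked directly against the numbers in the sample output. The Python sketch below re-derives the "Full?" column using the rule printed under the node table (no free cores, or less than 4 GB free memory) and computes kthmech's share of the resources in use. It is only an illustration, not the actual vagn-usage script; reading "4GB" as 4000 MB is an assumption (4096 MB gives the same result for this snapshot).

```python
# Per-node figures copied from the sample vagn-usage output:
# (cores in use, total cores, memory in use MB, total memory MB)
NODES = {
    "a2": (4, 8, 32000, 32186),
    "a3": (1, 8, 32000, 32186),
    "a4": (1, 8, 32000, 32186),
    "a5": (7, 8, 18000, 32186),
    "a6": (4, 8, 62000, 64372),
    "a7": (6, 32, 254000, 257488),
    "a8": (32, 32, 64000, 257488),
}
MIN_FREE_MB = 4000  # assumption: "4GB" in the legend read as 4000 MB


def is_full(cores_used, cores_total, mem_used_mb, mem_total_mb):
    """Full == no available cores or less than 4 GB of RAM available."""
    return cores_used >= cores_total or mem_total_mb - mem_used_mb < MIN_FREE_MB


open_nodes = [name for name, stats in NODES.items() if not is_full(*stats)]
print(open_nodes)  # ['a5']

# Per-group usage (cores, memory MB) from the "Usage by group" table.
GROUPS = {"kthmech": (39, 424000), "misu": (7, 38000),
          "rossby": (6, 22000), "sm_fouo": (3, 10000)}
used_cores = sum(c for c, _ in GROUPS.values())  # 55
used_mem = sum(m for _, m in GROUPS.values())    # 494000
cores, mem = GROUPS["kthmech"]
print(f"kthmech: {cores / used_cores:.1%} of cores in use, "
      f"{mem / used_mem:.1%} of memory in use")
# kthmech: 70.9% of cores in use, 85.8% of memory in use
```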
--
Mats Kronberg, NSC Support <support at nsc.liu.se>