[Vagnekman-users] Important news regarding the cluster filesystem on Ekman

Tue Dec 8 19:01:59 CET 2009

Dear Ekman users,

In an effort to prevent the cluster file system on Ekman (/cfs/ekman/...)
from becoming full, we are now:
1. Asking you to move or remove any files you do not need for your runs
on Ekman.
2. Introducing an enforced quota on the cluster file system. The aim is to
ensure that the jobs running at any given time have room for both their
input and output data, which is the intended purpose of that file system.

While we understand that (1) is made more difficult by the current
unavailability of Vagn, we still need to make sure that there is space
available for jobs that are running or will run on Ekman.

We hope that by the 21st of December you will have removed or moved data so
that file system usage is stable at no more than 70%. If that is not the
case, we will need to enforce space allocation immediately. This
enforcement will try to adhere to the policy that will be used for (2) (see
below).

The following is an overview of the situation as of 20091207:

Available:  79T
Total used: 62.1474T

Per group (group, used, percentage of used):

misu    1.46T   2%
smhi   49.71T  80%
mech   10.30T  17%
nsc     0.34T   0.55%
pdc     0.35T   0.56%
 8 users with more than 1T using 59.9T in total
45 users with less than 1T using 2.25T in total
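
For anyone who wants to check the per-group percentages, the following is
a minimal Python sketch that recomputes each group's share of the used
space from the figures in the overview. The names group_usage_tb,
TOTAL_USED_TB and share_of_used are made up for this illustration and are
not part of any tool on Ekman.

```python
# Terabytes used per group, copied from the overview in this email.
group_usage_tb = {
    "misu": 1.46,
    "smhi": 49.71,
    "mech": 10.30,
    "nsc": 0.34,
    "pdc": 0.35,
}
TOTAL_USED_TB = 62.1474  # "Total used" from the overview


def share_of_used(usage_tb, total_tb):
    """Return each group's share of the used space, in percent."""
    return {group: 100.0 * tb / total_tb for group, tb in usage_tb.items()}


shares = share_of_used(group_usage_tb, TOTAL_USED_TB)

# Print the groups from largest to smallest share.
for group, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{group:<5}{group_usage_tb[group]:>7.2f}T {pct:6.2f}%")
```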

Emails have been or will be sent to users using more than 1T detailing their
individual usage, and emails will be sent to each group detailing the
group's usage.

Forthcoming enforcement of space allocation
The details of the policy used for (2) are yet to be decided, but the general
outline is:

a. Only files needed by or recently produced by jobs running on Ekman should
be on the cluster file system.

b. The system is owned by three different groups with differing storage
needs. To let each group allocate space flexibly among its own members, and
to keep the effects of over-usage confined to the group that causes it,
there will be a monitored quota per group. The limits for the respective
groups are yet to be decided.

c. The file system will be divided into two branches, one called nobackup
and one called scratch, with the following purposes:

    nobackup: Use for files that are needed by jobs frequently running on
Ekman but are not of a transient nature. Examples of this could be large
input data sets that are used by several jobs running over several months.
The current plan is that this file system will be cleaned manually if the
need arises.

    scratch: Use for files that are used by jobs on Ekman but do not fall
into the nobackup category. The current plan is that this file system will
be cleaned automatically by removing files older than a certain time. This
time will be tuned so that:
       a) The files are available while a job is running on the cluster.
       b) After the job has run there is a reasonable chance to move the
files to e.g. Vagn.
       c) There is a low likelihood of jobs failing due to a full file
system.

   Please note: NONE OF THESE FILES ARE BACKED UP.
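
To get a feel for which of your files an age-based cleanup of scratch could
affect, something like the following Python sketch may help. It only lists
files and removes nothing. It is an illustration under assumptions: the
function name find_stale_files is made up, the age limit has not been
decided, and using the modification time as the age criterion is a guess
about how the cleanup might work, not a description of the actual policy.

```python
import os
import time


def find_stale_files(root, max_age_days, now=None):
    """List files under root whose modification time is older than max_age_days.

    Assumption: age is judged by modification time (mtime). The real
    cleanup may use a different criterion.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat does not follow symlinks, so a link is judged by
                # its own timestamp rather than its target's.
                if os.lstat(path).st_mtime < cutoff:
                    stale.append(path)
            except OSError:
                # The file vanished or cannot be read; skip it.
                continue
    return sorted(stale)
```

Calling find_stale_files on your own directory with a trial age limit in
days returns the paths that are older than that limit, which are the ones
worth moving elsewhere first.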

If you have questions about this, please direct them to:

* Your principal investigator for questions about why we prioritize the
continuous running of jobs on Ekman even if it will require you to rerun
jobs to recreate removed results.
* Your technical contact person for questions about the reasons behind the
division into scratch and nobackup.
* vagn-ekman-support at snic.vr.se for other questions.

Best regards,
Daniel Ahlin
PDC