[Vagnekman-users] Vagn is back in normal operation again

Johan Raber raber at nsc.liu.se
Mon Dec 21 22:39:53 CET 2009


Dear Vagn users,

After Vagn's long downtime, we are happy to say that Vagn is once more up
to satisfy your computing urges. Not only is Vagn up, it has also been
enhanced storage-wise: 29 TB, to be precise, has been added to
/nobackup/vagn1.

Since the last update, we have compared checksums between a set of files
from the restored filesystem and the corresponding files we read out
before the restoration, for which we luckily had reliable checksums. About
3500 large files were checked in this way, and no differences between the
two sets were found. The /nobackup/vagn1 filesystem was consequently
rebuilt from scratch, and the read-out files were rsync'ed back from
temporary storage, which was then in turn added to vagn1.
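
For those who want to run a similar comparison on their own data, here is a
minimal sketch of a checksum comparison between two directory trees. It is
not the procedure we used: the choice of SHA-256 and the function names
sha256_of and compare_trees are assumptions made for illustration only.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files never have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(reference_root, restored_root):
    """Compare each file under reference_root with its restored counterpart.

    Returns a sorted list of relative paths that are missing from
    restored_root or whose contents differ.
    """
    reference_root = Path(reference_root)
    restored_root = Path(restored_root)
    mismatches = []
    for ref_file in reference_root.rglob("*"):
        if not ref_file.is_file():
            continue
        relative = ref_file.relative_to(reference_root)
        candidate = restored_root / relative
        if not candidate.is_file() or sha256_of(ref_file) != sha256_of(candidate):
            mismatches.append(str(relative))
    return sorted(mismatches)
```

An empty list from compare_trees means every file in the reference tree has
an identical counterpart in the restored tree.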

The results of our damage assessment are as follows: ~180 files are lost,
515 are definitely affected to some extent, and as many as ~50000 files
could be affected. A short explanation is merited. The contents of almost
every larger file in a GPFS filesystem are spread over several disks,
possibly non-contiguously on each disk, so any file with extents within
the affected regions may be partly damaged. A comparatively small amount
of data (~4 GB) was actually written to disk, most of it densely at the
start of the first disk.
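
To illustrate why a small overwritten region can touch many files, here is
a simplified sketch of the overlap test involved. It is a toy model, not
GPFS's actual allocation logic: the Extent type, the disk numbering and
the sizes in the usage note are all hypothetical.

```python
from typing import NamedTuple


class Extent(NamedTuple):
    """A contiguous byte range on one disk (hypothetical model)."""
    disk: int
    start: int
    length: int


def overlaps(a, b):
    """Return True if two extents share at least one byte on the same disk."""
    return (
        a.disk == b.disk
        and a.start < b.start + b.length
        and b.start < a.start + a.length
    )


def damaged_extents(file_extents, damaged_regions):
    """Return the extents of a file that intersect any damaged region."""
    return [
        extent
        for extent in file_extents
        if any(overlaps(extent, region) for region in damaged_regions)
    ]
```

In this model a file is suspect as soon as damaged_extents returns a
non-empty list, for example when one of its extents lies near the start of
disk 0 while the rest sit untouched on other disks.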

We have therefore put lists of the affected and possibly affected files in
each user's home directory, classified as very suspect and suspect and
named very.suspect.files.${username} and suspect.files.${username}
respectively. The names of the lost files are in the file
"definitely_gone.txt", and these are listed without path. If you recognize
a name in this list as likely yours, you are probably correct, as your
naming schemes all differ markedly. We cannot judge the actual extent to
which these files are affected, or which of them are unaffected. To be
perfectly honest, we can't guarantee that this list is exhaustive, but we
have no reason to think it is not.
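
If you want to search "definitely_gone.txt" for names that follow your own
naming scheme, something like the sketch below may help. It assumes the
file holds one name per line, which you should verify first, and the
helper names and the example pattern are hypothetical.

```python
import fnmatch


def read_name_list(list_path):
    """Return the non-empty lines of a list file, assuming one name per line."""
    with open(list_path, encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]


def names_matching(names, patterns):
    """Return the names that match at least one shell-style pattern."""
    return [
        name
        for name in names
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns)
    ]
```

For example, names_matching(read_name_list("definitely_gone.txt"),
["run_*.out"]) would list the lost names that look like run_*.out, where
that pattern stands in for whatever scheme you use.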

It is difficult to predict exactly how errors in files will manifest
themselves. A text file, for instance, may seem perfectly intact apart
from some garbled section. The corresponding damage could quite possibly
make a binary format file completely useless and unreadable by the
application intended to use it. However, you will most likely notice it
clearly when damaged files turn up.

To ensure this mishap doesn't repeat, the following steps have been taken
so far:
* The driver for the interface to the GPFS storage has been removed from
  the installation system, in effect making the GPFS storage invisible
  during the installation process.
* Checks have been put in place that ensure installation is only done on
  disks of "blessed" sizes and that the GPFS storage can't be seen.
This is an ongoing process and we are investigating additional safety
measures as well.
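
The kind of pre-installation check described in the second step can be
sketched roughly as follows. This is an illustration only, not our actual
installer code: the function name, its parameters and the device names in
the usage note are hypothetical.

```python
def safe_to_install(disk_size_bytes, blessed_sizes_bytes,
                    visible_devices, gpfs_devices):
    """Return True only if the target disk looks like a legitimate system disk.

    The check refuses to proceed when the disk size is not on the list of
    approved ("blessed") sizes, or when any known GPFS storage device is
    visible to the installer at all.
    """
    if disk_size_bytes not in blessed_sizes_bytes:
        return False
    if set(visible_devices) & set(gpfs_devices):
        return False
    return True
```

With a blessed size list of {80 * 10**9} and a GPFS device list of
{"sdx"}, an 80 GB disk passes only as long as "sdx" does not appear among
the visible devices. Refusing by default on any mismatch is the point: a
false refusal costs a manual check, a false acceptance costs data.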

In conclusion we would like to thank you, our users, for the patience you
have shown and the constructive suggestions we have received in this
lengthy process. Naturally, we also regret any damage and inconvenience
you may have suffered.

Best regards,
NSC support


