[Vagnekman-users] Current status of Vagn

Johan Raber raber at nsc.liu.se
Fri Nov 13 15:04:47 CET 2009


Dear Vagn users,

The extent of the damage to the /nobackup/vagn1 filesystem and what can be
done about it is now becoming clear. We are therefore able to give you a
coherent status report and an account of what has happened.

The data corruption on /nobackup/vagn1 was brought about by an accidental
update to the control mechanism for installing new OS images. This update
was not supposed to happen, and it had the dire consequence that the
/nobackup/vagn1 disks were left exposed to the installation procedure, which
partitioned two drives and installed an OS image on one of them. In
practice this meant that critical data blocks on the disk were overwritten,
causing likely irreparable damage even though a comparatively small area of
the disk was written to.

Our correspondence with the filesystem vendor has so far been very
discouraging regarding the prospect of rescuing the filesystem or
recovering files from it. We are at this point corresponding with the
cluster software support team lead at IBM, who is directly in touch with the
developers, so the final verdict from them is authoritative. One final
round of correspondence has now gone to IBM to assure ourselves that no
misunderstandings exist, on their part or ours, regarding the extent of the
damage, the current status and the possibilities of rescuing data.

After we have assured ourselves that no misunderstandings exist, we will
attempt to rescue data no matter what, and whatever results we obtain will
be presented to you. At the very least you will be given a list of the files
you have lost.

We at NSC have begun the process of reducing the risk of fatal
accidents of this kind, the causes of which are multifactorial, ranging
from the obvious, like insufficient safeguarding of the integrity of the
disks, to insufficient dissemination of knowledge. We deeply regret this
situation and realise that it impacts your work severely. We can, however,
only offer you our sincerest apologies and, going forward, improvements to
our routines and setups. Unfortunately we will never be able to offer any
guarantees, only levels of security.

Please also let us know, as we go forward, if you have large amounts of
highly refined data stored outside backed-up areas, so we can try to take
protective measures.

At present we are in the process of opening access to the login node so
that you may get hold of /home data again. The delicate state of the
filesystem unfortunately requires certain guidance from IBM to safely bring
up /home again on the login node without ruining any chances we have with
vagn1, and we must ask for your further patience. At the beginning of next
week we hope to have received the final word on what can be done about
vagn1, and some type of action will then be taken, at which point you will
be notified.

Best regards,
NSC support


