[Storage-users] NSC migftp (tapestorage) back in service

Peter Kjellstrom cap at nsc.liu.se
Tue Feb 19 17:39:06 CET 2008


Short version for the impatient: migftp now works again on the login nodes of 
tornado, dunder and blixt, with no known bugs. If you have ever used the '-c' 
option with migftp put or a similar command, please read on.


The service stop for the NSC migftp/tape storage lasted a lot longer than 
planned. This was because a user discovered failed (corrupted) file 
transfers. Since this involved data integrity, we had no choice but to leave 
the system down until it had been properly investigated.

The investigation found that using the '-c' (continue) option with migftp 
put or mput sometimes resulted in only part of a file being transferred 
correctly. The investigation did not find any cases where a normal put/mput 
(without '-c') corrupted data.

We do not know how common use of the '-c' option has been in the past. If you 
know that you have used it, please try to verify those past transfers or 
contact support for help.
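One way to verify a past transfer, if you still have the original files, is to fetch the copies back from tape storage into a separate directory and compare checksums. The sketch below is a hypothetical helper, not an NSC tool: the function name `verify_copies` and the two-directory layout are assumptions, the retrieval step with migftp is left to you, and it only handles flat directories of regular files.

```shell
#!/bin/sh
# Hypothetical helper (not an NSC tool): compare copies fetched back from
# tape storage against the originals. Assumes both directories are flat
# and contain files with the same names.
verify_copies() {
    orig_dir=$1
    fetched_dir=$2
    sums=$(mktemp) || return 1
    # Checksum every file in the original directory, using relative names
    (cd "$orig_dir" && md5sum -- *) > "$sums" || { rm -f "$sums"; return 1; }
    # Check the fetched copies; md5sum -c exits non-zero on any mismatch
    # or missing file
    (cd "$fetched_dir" && md5sum -c "$sums")
    status=$?
    rm -f "$sums"
    return $status
}

# Example (directory names are placeholders):
#   verify_copies ./original ./fetched
```

A non-zero exit status (and a FAILED line in the output) means at least one fetched copy differs from its original.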

The solution now implemented is to remove this option entirely. The migftp 
now available on our systems does not accept a '-c' option and has been 
verified to transfer data correctly.

During the investigation (and afterwards) it would have been very helpful to 
have checksums for user data. NSC therefore recommends that users create 
checksums for their datasets/datafiles as soon as possible after creating 
them. It is not complicated and allows both system administrators and users 
to verify the integrity of the data. Here is a small example of how checksums 
can be generated and verified:

 $ ls -l
 -rw-r--r-- 1 cap cap 2097152 2008-02-19 17:29 datafile_1.gz
 -rw-r--r-- 1 cap cap 1048576 2008-02-19 17:29 datafile_2.gz
 $ md5sum *.gz > MD5SUMS
 $ echo "bad data" > datafile_1.gz
 $ md5sum -c MD5SUMS
 datafile_1.gz: FAILED
 datafile_2.gz: OK
 md5sum: WARNING: 1 of 2 computed checksums did NOT match
 $          

Explanation: we start with two datafiles and generate checksums for them. We 
then overwrite (corrupt) one of the files, and finally ask md5sum to verify 
all files against the stored checksums. The output clearly shows that 
datafile_1.gz is now damaged.
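The example above checksums the files in a single directory. For a dataset spread over subdirectories, a minimal sketch along the same lines uses find to collect every file. The function names `make_sums` and `check_sums`, and the choice to keep one MD5SUMS file at the top of the dataset, are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helpers: one MD5SUMS file at the top of a dataset directory
# (an assumed layout) covering every file in all subdirectories.
make_sums() {
    # The redirection creates MD5SUMS before find runs, so exclude it
    (cd "$1" && find . -type f ! -name MD5SUMS -exec md5sum {} + > MD5SUMS)
}

check_sums() {
    # Paths in MD5SUMS are relative (./sub/file), so verify from the top
    (cd "$1" && md5sum -c MD5SUMS)
}

# Example (path is a placeholder):
#   make_sums ~/mydataset     # right after the data is produced
#   check_sums ~/mydataset    # e.g. after fetching the data back
```

Keeping MD5SUMS inside the dataset means it is stored and transferred together with the data it describes.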


NSC apologises for the inconvenience this stop has caused and asks affected 
users to contact support.
 Peter K and the NSC Support-team

-- 
------------------------------------------------------------
  Peter Kjellström               | E-mail: cap at nsc.liu.se
  National Supercomputer Centre  |
  Sweden                         | http://www.nsc.liu.se