[Dunder-users] lustre

Per Lundqvist perl at nsc.liu.se
Fri Mar 23 17:19:45 CET 2007


Important information regarding the Lustre filesystems on Dunder & 
Tornado:

 . As users you could _almost_ live without this information, except
   when a Lustre filesystem gets nearly full (as is the case now for
   /nobackup/rossby10). It is also good to have a basic understanding of
   the difference between a Lustre filesystem and an NFS filesystem.

Current Lustre filesystems on Dunder and Tornado:
/nobackup/misu2
/nobackup/rossby9
/nobackup/rossby10
/nobackup/smhid4

Current NFS filesystems:
/home + the other /nobackup/* filesystems

Lustre differs from NFS in that it is a parallel filesystem. The 
filesystem is distributed among several file servers - in comparison to 
NFS, where 1 filesystem is exported from only 1 file server. In our Lustre 
configuration we have 1 server for metadata and >=1 servers for data for 
all Lustre filesystems.

For example, /nobackup/rossby10 has the following configuration:
  . 1 Metadata server (MDS)
  . 4 Data servers (OST)

When new files are created on one of our Lustre filesystems, each file 
will by default be stored on only 1 OST. The filesystem will also try to 
distribute the files in a round-robin fashion, trying to even out the disk 
usage over the OST:s.

The layout of a Lustre filesystem may be seen with:

  lfs df

which, for /nobackup/rossby10, gives:
UUID                 1K-blocks      Used Available  Use% Mounted on
rossby10_mds_UUID     54421096   3674780  50746316     6 /nobackup/rossby10[MDT:0]
rossby10_ost1_UUID   2403041708 2266059088 136982620    94 /nobackup/rossby10[OST:0]
rossby10_ost2_UUID   2403041708 2254670720 148370988    93 /nobackup/rossby10[OST:1]
rossby10_ost3_UUID   2403041708 2364664728  38376980    98 /nobackup/rossby10[OST:2]
rossby10_ost4_UUID   2403041708 2288188812 114852896    95 /nobackup/rossby10[OST:3]

(i.e. it displays df information, in KiB, for all involved file servers).
The corresponding df gives:

Filesystem           1K-blocks      Used Available Use% Mounted on
mds2:/rossby10_mds/rossby10_client
                     9612166832 8650260656 473584524  95% /nobackup/rossby10

Note the different disk usage on the OSTs.

df claims that the total space available is 451.6 GiB, but the size of 
the largest possible file is always limited by the OST with the least 
space available - in this case 36.6 GiB (OST:2). Trying to create a file 
larger than about 36 GiB on /nobackup/rossby10 might fail.
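
To make the arithmetic above concrete, here is a small Python sketch that
parses the lfs df output quoted earlier and reports the OST with the least
free space. It is not one of the NSC tools: the function name is made up
for illustration, and it assumes the column layout shown in this mail.

```python
# Sample copied from the lfs df output shown in this mail.
SAMPLE = """\
UUID                 1K-blocks      Used Available  Use% Mounted on
rossby10_mds_UUID     54421096   3674780  50746316     6 /nobackup/rossby10[MDT:0]
rossby10_ost1_UUID   2403041708 2266059088 136982620    94 /nobackup/rossby10[OST:0]
rossby10_ost2_UUID   2403041708 2254670720 148370988    93 /nobackup/rossby10[OST:1]
rossby10_ost3_UUID   2403041708 2364664728  38376980    98 /nobackup/rossby10[OST:2]
rossby10_ost4_UUID   2403041708 2288188812 114852896    95 /nobackup/rossby10[OST:3]
"""

KIB_PER_GIB = 1024 * 1024


def smallest_ost_free_kib(lfs_df_output):
    """Return (uuid, available_kib) for the OST with the least free space.

    Hypothetical helper; assumes six whitespace-separated columns per
    data row, with the mount column tagged [OST:n] for data servers.
    """
    osts = []
    for line in lfs_df_output.splitlines():
        fields = line.split()
        # Skip the header and the metadata server; keep only OST rows.
        if len(fields) == 6 and "[OST:" in fields[5]:
            osts.append((fields[0], int(fields[3])))
    if not osts:
        raise ValueError("no OST lines found in input")
    return min(osts, key=lambda item: item[1])


uuid, free_kib = smallest_ost_free_kib(SAMPLE)
print(f"{uuid}: {free_kib / KIB_PER_GIB:.1f} GiB free")
```

For the sample above this picks rossby10_ost3_UUID, the OST behind the
36.6 GiB figure.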

This obviously becomes a problem when the filesystem is nearly full, since 
you might believe that there is more space available than there really 
is. Your jobs might be terminated when they unexpectedly run out of disk 
space.

   Please note: almost every filesystem performs badly in some way or 
   another when it is nearly full. We strongly recommend that you either 
   free up some space (remove files or copy them to NSC storage) or use 
   another filesystem when this is the case.

---

It is possible for you to change the default behaviour for file 
distribution in Lustre. As an alternative to distributing whole files over 
OSTs you may spread each individual file over all OSTs (stripe). This 
affects both performance and reliability in the unlikely case of a failure 
of 1 OST.

  . We recommend that you leave the striping behaviour as default, except 
    for large files (>1GiB). Large files will perform better when striped 
    and will not create such a large imbalance between the OSTs.
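
As a rough illustration of why striping a large file reduces the imbalance,
the Python sketch below compares how much space one file takes on each OST
when it is stored whole versus striped. This is a simplified model, not how
Lustre is implemented: it assumes the data is divided evenly over all OSTs
(real striping works in fixed-size chunks), and the function name is made
up for this example.

```python
def per_ost_usage_gib(file_size_gib, ost_count, striped):
    """Return the space (GiB) one file occupies on each OST.

    Simplified, hypothetical model: an unstriped file lands entirely on
    one OST (index 0 here), a striped file is divided evenly over all.
    """
    if ost_count < 1:
        raise ValueError("need at least one OST")
    if striped:
        return [file_size_gib / ost_count] * ost_count
    return [float(file_size_gib)] + [0.0] * (ost_count - 1)


# A 100 GiB file on a filesystem with 4 OSTs, as on /nobackup/rossby10.
whole = per_ost_usage_gib(100, 4, striped=False)
spread = per_ost_usage_gib(100, 4, striped=True)
print(whole)   # [100.0, 0.0, 0.0, 0.0]
print(spread)  # [25.0, 25.0, 25.0, 25.0]
```

In this model the unstriped file fills one OST by 100 GiB, while the
striped file adds 25 GiB to each of the four.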

We have created a couple of wrapper scripts for this. 

  Please note: Striping behaviour is set on directories. Only files and 
  directories created within this directory and after the striping 
  behaviour has changed will be affected.

I.e. it is not possible to change the distribution of an already existing 
file. For example, consider the following existing file structure:

      /nobackup/rossby10/perl/dir1/
      /nobackup/rossby10/perl/dir1/dir2
      /nobackup/rossby10/perl/dir1/dir2/file3
      /nobackup/rossby10/perl/dir1/dir2/file4
      /nobackup/rossby10/perl/dir1/file1
      /nobackup/rossby10/perl/dir1/file2

  . Changing the striping behaviour on /nobackup/rossby10/perl/dir1/ will
    not affect the striping behaviour of the directory dir2 or of the 
    files file1-file4.

  . A new directory dir3 created in dir1 afterward will have the new 
    striping behaviour, as will any files/directories created within 
    dir1 and dir3. Any files/directories created within dir2 will have 
    the old striping behaviour.
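
The rules above can be mimicked with a toy Python model. It only
illustrates the inheritance behaviour described in this mail and does not
talk to Lustre at all; the class and the names file5, file6 and file7 are
invented for the example.

```python
class Dir:
    """Toy directory that carries a striping default for new entries."""

    def __init__(self, name, striped=False):
        self.name = name
        self.striped = striped  # default applied to entries created later
        self.files = {}         # file name -> striped? (fixed at creation)
        self.subdirs = {}

    def mkdir(self, name):
        # A new directory copies the parent's setting at creation time.
        child = Dir(name, self.striped)
        self.subdirs[name] = child
        return child

    def create_file(self, name):
        # A file's distribution is decided once, when it is created.
        self.files[name] = self.striped

    def set_striped(self, striped):
        # Existing files and subdirectories are left untouched.
        self.striped = striped


dir1 = Dir("dir1")
dir2 = dir1.mkdir("dir2")
dir1.create_file("file1")
dir2.create_file("file3")

dir1.set_striped(True)      # change the behaviour on dir1 only

dir3 = dir1.mkdir("dir3")   # created afterward: gets the new behaviour
dir1.create_file("file5")
dir2.create_file("file6")   # dir2 keeps the old behaviour
dir3.create_file("file7")

print(dir1.files)  # {'file1': False, 'file5': True}
print(dir2.files)  # {'file3': False, 'file6': False}
print(dir3.files)  # {'file7': True}
```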
 
The wrapper scripts are called:

   lustregetstripeinfo            # get current striping information
                                  # for a file/directory

   lustresetdefault               # set file distribution to default
                                  # (i.e. not striped)

   lustresetnotstriped            # set file distribution to not striped

   lustresetstriped               # set file distribution to striped
                                  # among all OSTs

Example:

   $ lustregetstripeinfo /nobackup/rossby10/perl/dir1/
   /nobackup/rossby10/perl/dir1/: is striped

   $ lustresetnotstriped /nobackup/rossby10/perl/dir1/
   Striping properties updated for directory: /nobackup/rossby10/perl/dir1/

   $ lustregetstripeinfo /nobackup/rossby10/perl/dir1/
   /nobackup/rossby10/perl/dir1/: is not striped

   $ echo test > /nobackup/rossby10/perl/dir1/file3

   $ lustregetstripeinfo /nobackup/rossby10/perl/dir1/file3
   /nobackup/rossby10/perl/dir1/file3: is not striped

(this information will also be available in the corresponding user guides)

/Per

-- 
Per Lundqvist

National Supercomputer Centre
Linköping University, Sweden

http://www.nsc.liu.se

