[Gimle-users] Gimle login node crashes, Wednesday status update

Kent Engström kent at nsc.liu.se
Wed Feb 2 17:39:02 CET 2011


kent at nsc.liu.se (Kent Engström) writes:
> Dear Gimle Users,
>
> please bear with me for this rather long status update on the login node
> crash problem.
...

> We will try to find out if there are parameters in the Lustre
> clients/servers and the kernel we can adjust, but until we find that,
> we must suggest some other ways forward.

We got some interesting information yesterday, when a node crashed
in connection with this problem. Using that information to focus our
attention, we have found that a Lustre feature called statahead
(which is there to speed things up when one stats a lot of files in a
directory) might be buggy.

When testing a simple "ls -l" in a directory with 100000 files in it,
we saw no difference in speed with and without statahead, so it does not
appear to be vital for our performance.
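
For anyone who wants to repeat this kind of comparison, here is a minimal
Python sketch. It is not the test we ran (that was a plain "ls -l"); it
only approximates it by listing a directory and calling lstat on every
entry. The function names make_files and time_stat_all are made up for
this illustration.

```python
import os
import time


def make_files(directory, n):
    """Create n empty files named file000000, file000001, ... in directory."""
    for i in range(n):
        with open(os.path.join(directory, f"file{i:06d}"), "w"):
            pass


def time_stat_all(directory):
    """List directory and lstat every entry, roughly what 'ls -l' has to do.

    Returns (number of entries, elapsed seconds).
    """
    start = time.perf_counter()
    count = 0
    for name in os.listdir(directory):
        os.lstat(os.path.join(directory, name))
        count += 1
    return count, time.perf_counter() - start
```

Run time_stat_all on the same directory once with statahead enabled and
once with it disabled. Client-side caching can easily mask the difference,
so the numbers are only meaningful on a freshly mounted or otherwise cold
client.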

We have now disabled the statahead feature on the Gimle login node and
the compute nodes.
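
As an illustration only, the sketch below shows one way such a setting
could be changed on a Lustre client. It assumes the tunable is exposed as
a file named statahead_max under /proc/fs/lustre/llite/<mount>/, and that
writing 0 to it turns statahead off; check this against the documentation
for your Lustre version. The function name set_statahead_max is made up
for this example, and writing to the real files requires root.

```python
import glob

# Assumed location of the per-mount tunable; it may differ between
# Lustre versions.
DEFAULT_PATTERN = "/proc/fs/lustre/llite/*/statahead_max"


def set_statahead_max(value, pattern=DEFAULT_PATTERN):
    """Write value to every file matching pattern.

    Returns the list of paths that were written, in sorted order.
    """
    changed = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "w") as f:
            f.write(f"{value}\n")
        changed.append(path)
    return changed
```

Calling set_statahead_max(0) on a client would then cover every mounted
Lustre file system on that node. A value written this way would not be
expected to survive a remount or reboot.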

If we are lucky, we have solved the login node crash problem. If not,
we'll continue looking.

And yes, we still recommend that you limit yourself to a reasonable
number of files per directory.
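
To check your own directory tree, something like the following Python
sketch can be used. The limit of 10000 files is an arbitrary example
value, not an official threshold, and the function name
crowded_directories is made up for this illustration.

```python
import os


def crowded_directories(root, limit=10000):
    """Return (path, file count) for each directory under root that
    directly contains more than limit files, largest first."""
    result = []
    for dirpath, dirnames, filenames in os.walk(root):
        if len(filenames) > limit:
            result.append((dirpath, len(filenames)))
    return sorted(result, key=lambda item: item[1], reverse=True)
```

For example, crowded_directories(os.path.expanduser("~")) lists the
offending directories in your home directory. Note that the scan itself
has to list every directory, so it is best run once and not repeatedly.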


Sincerely,
-- 
Kent Engström, National Supercomputer Centre
kent at nsc.liu.se, +46 13 28 4444

