[Neolith-users] Neolith downtime tomorrow

Pär Andersson paran at nsc.liu.se
Mon Oct 27 14:12:43 CET 2008


Dear users of Neolith,

Neolith will be unavailable tomorrow (Tuesday) from 10:00. We apologize for
the short notice.

The login node will probably be available the whole day, but we can't 
guarantee this.

> We have on a number of occasions experienced unexpected InfiniBand
> switch reboots in the Neolith cluster. The frequency at which these
> reboot events have occurred is very low, but at every such event a
> handful of MPI applications have been disrupted.
>
> Working together with an engineering team from the switch vendor, we are
> gathering information to find the root cause of the problem, and in
> this process we believe that during the coming week there could be an
> elevated risk of switch reboots.

During the last week there have not been any InfiniBand switch reboots. 

To move forward with resolving this problem, we need to install new debug
switch firmware provided by the vendor. This and some other routine
maintenance will be performed tomorrow.

Best regards,
The Neolith Team

