[Bi-users] Update 2 - Re: Problems with smhid16 and rossby23 at the moment

Fredrik Nyström freny at nsc.liu.se
Fri Mar 6 17:09:13 CET 2020


On 2020-03-06 16:12, Fredrik Nyström wrote:
> On 2020-03-06 14:16, Fredrik Nyström wrote:
>> On 2020-03-06 11:00, Fredrik Nyström wrote:
>>> Dear Bi Users,
>>>
>>> we are having problems with 4 out of 6 servers for smhid16 and rossby23.
>>>
>>> Servers will be rebooted shortly...
>>
>> Servers have been rebooted and access to smhid16 and rossby23 has been
>> restored.
>>
>> Recovery after reboot completed at 13:27 CET. Jobs using smhid16 and 
>> rossby23 before this time may have been affected.
> 
> Dear Bi Users,
> 
> We have been forced to reboot a server for smhid16 and rossby23 again.
> We think smhid16 is the file system that is overloaded in some way.
> 
> If you have started to run more or different jobs against smhid16 during
> the last day or so, please consider refraining for the time being,
> ideally until after the planned upgrade on 11-12 March.
> 
> If the file systems go down again during the weekend, we cannot promise
> to bring them up promptly, and we also have limited time for troubleshooting
> on Monday and Tuesday next week, as we need to prepare for the upgrade.

In the same minute that I sent the last email, the metadata server for
smhid16 and rossby23 crashed.

Recovery completed at 16:55 CET.


Kind Regards,
-- 
Fredrik Nyström, National Supercomputer Centre
freny at nsc.liu.se

