[Tetralith-users] Urgent reboot of Tetralith login node(s)

Mats Kronberg kronberg at nsc.liu.se
Thu Dec 14 10:53:19 CET 2023


Dear Tetralith users,

Unfortunately, we need to reboot the login node tetralith2 (a.k.a.
tetralith.nsc.liu.se, tetralith-el9.nsc.liu.se) this afternoon at
13:00. Estimated downtime: 30 minutes.

It is also possible that we will need to reboot tetralith1 (a.k.a.
tetralith-el7.nsc.liu.se) sometime this afternoon or tomorrow. If this
becomes necessary, we will warn logged-in users as far in advance as
possible. Estimated downtime: 10-60 minutes.

Apologies for the short notice, but both of these problems are likely to
get worse over time and we want to ensure problem-free running over the
weekend.



Technical details for those interested:

The tetralith2 problem is caused by a bug in the storage software. It
makes some directories inaccessible (this has only happened to a few
users, but for them it is a major problem) and also prevents some other
operations, such as taking snapshots and deleting old project
directories. We are working with the software vendor to find a permanent
fix. The reboot should clear the current problems, and we will also
downgrade the software to a version we believe is more stable.

The tetralith1 problem is caused by ThinLinc not releasing licenses when
user sessions end. This means that we are slowly running out of ThinLinc
licenses, which will eventually prevent new ThinLinc logins to
tetralith1. We are still discussing this with the software vendor, but
so far it looks like we might need to shut down tetralith1 to clear the
license database.



-- 
Mats Kronberg
National Supercomputer Centre (NSC)

