[Neolith-users] Yesterday's service stop
Mattias Slabanja
slabanja at nsc.liu.se
Fri May 9 17:16:49 CEST 2008
Dear Neolith users.
The service stop went according to plan, but took somewhat longer than
initially anticipated. We were back in production yesterday evening,
when we opened the cluster to queued jobs.
Summary of changes to the system
* The firmware of the central Ethernet switch was upgraded to address a
stability issue. All other switches in the cluster also had their
firmware upgraded to the latest bug-fix version. This instability has on
an earlier occasion affected the availability of the /nobackup file
system.
* Switches were reconfigured to improve manageability.
* The compute node BMC firmware was upgraded to address an issue related
to the CMOS battery.
* The compute node system ROM was upgraded, since the new BMC firmware
depends on it.
* Due to an unfortunate side effect of the system ROM upgrade, which
affects the memory bandwidth of the 32 GiB nodes, the job scheduler is
for the time being configured not to mix 32 GiB and 16 GiB nodes within
any job reservation. Hence, a job will now always run either on 32 GiB
nodes only or on 16 GiB nodes only.
Regards,
Neolith Admin