[Vagnekman-users] Re: Information about Ekman service windows during the next two to three weeks

lama at pdc.kth.se lama at pdc.kth.se
Tue Aug 25 00:08:36 CEST 2009


Hello,

an update as of 2009-08-24:

We have finished the first day of the 'bulk-cpu-replacement.'

Roughly a quarter of the CPUs were replaced today, and we
have just allowed jobs to start again.

We will not yet let any of the nodes with CPUs replaced today
go back into production. Those nodes have not been tested yet.

There are 254 nodes online. These got new CPUs as reported
before, and you have been running on them for quite some time.

As at least 3/4 of the work remains, and there is always
the risk of making a mistake, jobs could get damaged.

In case you experience an unexpected job loss, please let us know.

Also, since the machine is much smaller right now but your jobs
are not, you will experience a different job flow. Small (thin)
jobs will block large (wide) jobs, and vice versa, more often.

regards,
lars/pdc-staff.
- - - 
> an update as of 2009-08-21:
>
> We have received enough, if not all, of the replacement CPUs.
> We will go ahead with the replacement on Monday morning (2009-08-24).
>
> regards,
> lars/pdc-staff.
> - - - 
>> an update as of 2009-08-20:
>>
>> The trucks with CPUs will arrive tomorrow Friday, 2009-08-21.
>>
>> Replacement work can start by Monday morning, 2009-08-24.
>>
>> The new scheduling block is set to Monday morning (09:00).
>>
>> $
>> $ spstatus           # or spq
>> [..]
>> ---- System Actualities ----
>>
>> Note: All reserved between 2009-08-24/09:00:00 and 2009-08-25/09:00:00 (24h)
>> $
>> $
>>
>> When we consider the 'CPU replacement assembly lines' to be working
>> smoothly and safely enough, we will re-enable parts of the system where
>> CPUs have already been replaced. Please consider the 24-hour duration
>> indicated above an early approximation.
>>
>> regards,
>> lars/pdc-staff.
>> - - - 
>>   [..]
>>
>>> There are unfortunately further delays in the delivery of CPUs.
>>>
>>> The start of the upgrade is postponed until Thursday morning.
>>> This is reflected in the output of, e.g., spstatus or spq:
>>>
>>> ekman$
>>> ekman$ spstatus
>>> [..]
>>> ---- System Actualities ----
>>>
>>> Note: All reserved between 2009-08-17/09:00:00 and 2009-08-17/12:00:00 (3h)
>>> Note: All reserved between 2009-08-20/09:00:00 and 2009-08-21/09:00:00 (24h)
>>> ekman$
>>>
>>> No jobs are allowed to execute during the above reservations. The
>>> above window(s) are not static. Once the upgrade work is flowing,
>>> we will try to make a subset of the system available for jobs.
>>>
>>> We are sorry for the inconvenience.
>>>
>>> regards,
>>> lars/pdc-staff.
>>>
>>>   [..]
>>>
>>>>> Due to delays in the delivery of the CPUs, the upgrade schedule will be
>>>>> revised as follows:
>>>>> All replacements will be carried out between 17/8 and 21/8. We will
>>>>> make an effort to keep as many nodes as we can available during the
>>>>> upgrade, but there may be as few as about 230 nodes available during
>>>>> that period (these 230 nodes were upgraded last week).
>>>>> Also, please note that from now until all nodes have been updated,
>>>>> Ekman will contain nodes with two different CPUs. We expect the
>>>>> performance difference to be very small, but you should still be aware
>>>>> that there may be some. If you want to know the CPU version on the
>>>>> node you are running on, you can always run:
>>>>> grep "model name" /proc/cpuinfo | sort --unique
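
A possible variation on the command above, as a minimal sketch: it counts how
many logical CPUs report each model name, which may be handy in a job script
to record which CPU type a run landed on. The function name cpu_models and
its optional file argument are made up for illustration and are not part of
any Ekman tooling.

```shell
#!/bin/sh
# cpu_models [FILE]: print each distinct "model name" line together with
# the number of logical CPUs reporting it. FILE defaults to /proc/cpuinfo
# (Linux only); the argument exists only to make the function easy to try
# on a saved copy of the file.
cpu_models() {
    grep "model name" "${1:-/proc/cpuinfo}" | sort | uniq -c
}

# Run on the current node if the file is readable.
if [ -r /proc/cpuinfo ]; then
    cpu_models
fi
```

On a node with a single CPU type this prints one line; the leading number is
the count of logical CPUs seen by the kernel on that node.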