[Vagnekman-users] update: forwarded flashnews - lost file-server
lars malinowsky
lama at pdc.kth.se
Fri Nov 20 10:50:54 CET 2009
Hello again,
2009-11-20 at 10:40
The AFS server has now been salvaged, and we are gradually resuming
batch operations.
As mentioned earlier, the direct impact on batch jobs is that they
could not change state (since ~23:00 last night);
otherwise they should be unaffected.
The impact for users/jobs with files/applications on the broken
server is that access to those has typically failed with messages
similar to 'connection timed out'.
regards,
lars/pdc-staff.
- - -
> Hello,
>
> forwarding this out of the flash-news at www.pdc.kth.se
>
> - - -
> 2009-11-20 at 06:05
> We seem to have problems with at least one AFS server.
> Home directories, applications, and also batch-job processing
> are affected.
> - - -
>
> In addition to some home directories and applications, important
> parts of the data for all batch jobs, node reservations, et cetera
> for ekman live on this file server.
>
> The impact is that no changes to jobs/nodes can be made;
> your reserved nodes/running jobs will stay reserved/running.
>
> Once the server is back on-line, they will show up again. However,
> any attempted changes (e.g. jobs finishing and nodes being released)
> during the outage will have failed.
>
> Should the server fail to come on-line, we have backups on tape.
>
> Sorry for the inconvenience,
>
> regards,
> lars/pdc-staff.