[Tornado-users] tornado: job failure

Miina Manninen katri.manninen at helsinki.fi
Thu Jul 3 15:18:14 CEST 2008


Dear Per,

I was using Tornado while studying at Stockholm University. I'm no longer
there and do not use Tornado anymore, so could you please delete my user
account and remove me from this mailing list? If I remember correctly, my
user name is kama_x or something like that. Thank you.

Sincerely,
Miina Manninen



Quoting Per Lundqvist <perl at nsc.liu.se>:

> Dear Tornado users, yesterday afternoon, at around 1500 CEST, the system
> node on Tornado crashed. We got it up and running pretty quickly, but we
> failed to notice that jobs had trouble starting to run after this
> incident (jobs were hanging, producing no output).
> 
> This problem was caused by the license daemon on all the compute nodes
> bailing out when they couldn't get in contact with the license server on
> the system node.
> 
> It was remedied by restarting the license daemon on the compute nodes (at
> approx 11:15 today). Unfortunately this seems to have caused hanging
> nodes to abort with an error message like:
> 
>    --- mpimon --- n1: Error when receiving message  ---
>    --- mpimon --- Contact license at scali.com to request or check a license
>    --- ---
>    Jul  2 11:08:44: (mpimon at n1)(21763) Mutable error: subMonitor-1 exits
>    --- before allFinished is set
> 
> Please check the status of your recent jobs, and resubmit if necessary.
> 
> 
> /Per
> 
> -- 
> Per Lundqvist
> 
> National Supercomputer Centre
> Linköping University, Sweden
> 
> http://www.nsc.liu.se

