[Tornado-users] Tornado is now available again
Johan Raber
raber at nsc.liu.se
Mon May 4 19:25:19 CEST 2009
Dear Tornado users,
The unfortunate outage of Tornado's services is now over. Feel free to use
them again, and please report any problems as soon as you see them.
The loss of services was due to the system node crashing, which in turn was
most likely caused by a change (prompted by security concerns) in the routines
for syncing vital configuration files within the cluster. A probable bug in a
component used for the syncing caused forked-off processes to accumulate,
eventually consuming all available memory. A work-around has been
implemented to avoid this problem.
Unfortunately, some submitted jobs became casualties of this incident. As
reported by qstat -n, they were:
105388.torn sm_aulle rossbyq run_ecocliflake_ 32548 12 -- -- 144:0 R --
   n104/1+n104/0+n103/1+n103/0+n102/1+n102/0+n101/1+n101/0+n100/1+n100/0+n99/1
   +n99/0+n98/1+n98/0+n97/1+n97/0+n96/1+n96/0+n95/1+n95/0+n94/1+n94/0+n93/1
   +n93/0
105483.torn sm_stran rossbyq runA2_1_run0 577 15 -- -- 144:0 R 00:00
   n48/1+n48/0+n47/1+n47/0+n34/1+n34/0+n33/1+n33/0+n32/1+n32/0+n31/1+n31/0
   +n30/1+n30/0+n29/1+n29/0+n28/1+n28/0+n27/1+n27/0+n26/1+n26/0+n25/1+n25/0
   +n12/1+n12/0+n11/1+n11/0+n10/1+n10/0
105529.torn sm_jenbr rossbyq b30.004.1n 10950 16 -- -- 75:00 R --
   n107/1+n107/0+n106/1+n106/0+n87/1+n87/0+n86/1+n86/0+n82/1+n82/0+n68/1+n68/0
   +n67/1+n67/0+n66/1+n66/0+n65/1+n65/0+n64/1+n64/0+n46/1+n46/0+n45/1+n45/0
   +n44/1+n44/0+n43/1+n43/0+n42/1+n42/0+n41/1+n41/0
105542.torn x_sema misuq run_MC 4665 4 -- -- 100:0 R 00:00
   n20/1+n20/0+n19/1+n19/0+n18/1+n18/0+n17/1+n17/0
105598.torn x_maxbe misuq LGM_ORCA05_LIM2 2156 8 -- -- 70:00 R --
   n119/1+n119/0+n118/1+n118/0+n117/1+n117/0+n116/1+n116/0+n115/1+n115/0+n114/1
   +n114/0+n113/1+n113/0+n112/1+n112/0
105600.torn x_maxbe misuq LGM_ORCA05_LIM2 4290 8 -- -- 40:00 R 00:00
   n111/1+n111/0+n110/1+n110/0+n109/1+n109/0+n108/1+n108/0+n105/1+n105/0+n92/1
   +n92/0+n91/1+n91/0+n90/1+n90/0
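If you want to check which nodes one of the listed jobs was running on, the
host lists can be read mechanically. The following is only a minimal sketch in
Python, not an NSC-provided tool: the function name nodes_from_exec_host is
hypothetical, and it assumes that each entry has the form node/slot, joined by
"+" and possibly wrapped over several lines as in the listing in this message.

```python
def nodes_from_exec_host(exec_host):
    """Map node name -> number of slots, from a 'n20/1+n20/0+...' string.

    Assumes (hypothetically) that every entry looks like 'node/slot' and
    that entries are joined by '+', possibly wrapped across lines.
    """
    # Drop all whitespace so that wrapped continuation lines are rejoined.
    joined = "".join(exec_host.split())
    slots = {}
    for entry in joined.split("+"):
        if not entry:
            continue  # tolerate a stray leading or trailing '+'
        node, _, _slot = entry.partition("/")
        slots[node] = slots.get(node, 0) + 1
    return slots


# Host list of job 105542.torn, copied from this message.
example = "n20/1+n20/0+n19/1+n19/0+n18/1+n18/0+n17/1+n17/0"
for node, count in nodes_from_exec_host(example).items():
    print(node, count)
```

For the example job this lists four nodes (n20, n19, n18, n17) with two slots
each, which matches the node count of 4 shown in its qstat line.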
We regret any inconvenience or problems this has caused you, and thank you for
swiftly notifying us of the problems as they occurred.
Regards,
NSC support
--
Johan Raber, PhD
Systems expert -- distributed computing
National Supercomputer Center, Linköping University
SE-581 83 LINKÖPING, SWEDEN