[Triolith-users] Scheduling problem on Triolith (2015-08-10)

Marvin Lie marvin at nsc.liu.se
Mon Aug 10 09:18:41 CEST 2015


Dear All,

I took a look at our scheduler status around 08:15.
More than 1000 nodes were idle, but many jobs were stuck pending with 
the reason 'Resources'.
I confirmed that no large reservations were in place.
There was also no high-priority wide job requesting 1000 nodes and 
blocking the queue.
That did not look right to me.
I then restarted our scheduler, Slurm, twice, at 08:21 and 08:22, but 
the problem was still there.
Something had left Slurm unable to schedule jobs.
That left me no choice but to start the scheduler with a clean state at 
08:38.
Unfortunately, this caused many running jobs to fail and cleared the 
pending jobs.
I apologize for the inconvenience and ask for your cooperation in 
resubmitting your jobs.
We will investigate this issue further with our scheduler expert, who 
is still on vacation.


Regards,
Marvin - NSC