There are a number of factors at play when the scheduler is deciding when to run your job. This page attempts to summarise the key factors.
There is also a talk here that summarises how Slurm works.
First and foremost, there need to be resources available, matching the specification of your job, for your job to be able to run. If Viking is very busy, these resources are unlikely to be immediately available, so your job will be placed in a queue.
Once your job has entered the queue for resources, it is assigned a priority. This is calculated as the sum of several priority factors:
Priority Factors Used On Viking:
- Age - a function of how long your job has been queueing, 0 - 1000 over 7 days of queueing on Viking
- FairShare - a value determined by how much you have recently used Viking, 0 - 1000 proportional to recent usage, effect of recent usage halved after 7 days
- Job Size - a function of how many nodes/cores requested, small priority given to larger jobs on Viking, preventing large job starvation
- TRES - a function of how many cores / how much memory has been requested, 0 - 1000 (cores), 0 - 2000 (memory) on Viking
Priority Factors Not Used On Viking:
- Assoc - a value specific to the user account that submitted the job, 0 on Viking
- Nice - user controlled job priority factor, unavailable on Viking
- Partition - a factor associated with the partition your job was submitted to, set to 1000 for every partition on Viking, so it has no effect on relative priority
- Quality of Service (QOS) - a factor associated with the QoS that your job was submitted with, not applied on Viking
- Site - an administrator controlled priority factor, 0 on Viking
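Putting the factors together, a job's priority is simply the sum of the active factor values. A simplified sketch with made-up numbers (not a real job's figures, and the unused factors are omitted because they contribute nothing on Viking):

```shell
# Hypothetical factor values for a job, as they might be reported by sprio:
age=500        # queued for roughly 3.5 days
fairshare=800  # light recent usage
jobsize=10     # small job
tres=70        # combined CPU + memory TRES contribution

# Job priority is the sum of the active factors
priority=$((age + fairshare + jobsize + tres))
echo "$priority"   # prints 1380
```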
Age
When your job begins queueing, this factor will be 0. Over the course of 7 days of queueing, this factor will increase to 1000. This balances the priority of jobs that have been waiting to run against that of users who have never, or not recently, used Viking.
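Concretely, the Age factor ramps up linearly with queueing time and caps at the maximum. A minimal sketch, assuming a straight linear ramp from 0 to 1000 over the 7 days:

```shell
# Age factor after d days of queueing: linear from 0 to 1000 over 7 days,
# capped at 1000. awk handles the floating-point arithmetic.
age_factor() {
  awk -v d="$1" 'BEGIN { a = d / 7 * 1000; if (a > 1000) a = 1000; print a }'
}

age_factor 0    # prints 0
age_factor 3.5  # prints 500
age_factor 14   # prints 1000 (capped)
```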
FairShare
When you first start using Viking, this factor will be 500. As you use Viking, your usage is 'billed', and your FairShare value decreases in proportion to your usage, driving this factor down. Over time, this value will recover! The effect of a job on your FairShare value is halved after 7 days.
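The 7-day halving means a past job's contribution to your recent usage decays with a half-life of one week. A rough sketch of that decay (illustrative only, not Slurm's exact FairShare formula):

```shell
# Remaining "weight" of a job's billed usage d days after it ran,
# assuming a simple 7-day half-life.
effective_usage() {
  awk -v u="$1" -v d="$2" 'BEGIN { print u * 2 ^ (-d / 7) }'
}

effective_usage 1000 0    # prints 1000
effective_usage 1000 7    # prints 500
effective_usage 1000 14   # prints 250
```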
TRES
More cores means a higher CPU TRES priority; more memory means a higher Mem TRES priority. But of course, you will be billed for the extra usage, which will drive your FairShare down!
Checking Job Priority
You can see how these priorities have been calculated for your job using the `sprio` utility, e.g.
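For example (the job ID below is a made-up placeholder, and the exact columns shown depend on the cluster's configuration):

```shell
# Show the priority breakdown for a specific job (12345 is a placeholder ID):
sprio -j 12345

# Show the breakdown for all of your own pending jobs:
sprio -u "$USER"
```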
A nice feature of the Viking scheduler is that, where possible, it will run your job even if it has a lower priority than other people's jobs! Provided the execution of your job won't disturb the planned start times of the jobs ahead of you in the queue, the scheduler will slot your job in early - this is known as backfill. So, don't worry if you're running a lot of jobs and think you're going to end up having to wait behind every other Viking user to get your jobs serviced - there's a good chance that the scheduler will be able to run your jobs ahead of the worst-case schedule.
Rule of Thumb
A rule of thumb I like to use is to expect to queue for as long as I have requested a job to run for, e.g. if I have requested a single core and the default amount of memory in the `nodes` partition for two hours, I expect to wait two hours for that job to run. Often, that job will run much sooner than that! Of course, this rule of thumb changes with resource requirements - the trickier it is going to be for the scheduler to find resources for my job, the longer I expect to queue. I would expect the scheduler to be able to squeeze in a single core, default memory, two hour runtime fairly quickly, as that amount of resources is likely to be lying around in between scheduling the other jobs on Viking. I would not expect the scheduler to be able to easily find e.g. a block of 500 cores, so I would temper my expectations accordingly.
Tips For (Potentially) Quicker Scheduling
- Request resources for the time that you need them - on Viking, the scheduler is likely to have a better time finding shorter blocks of resources
- Request the amount of resources that your jobs will actually use - over-requesting means that the scheduler has to find bigger blocks of resources for your jobs, reducing the chance of backfill. We put information about your jobs' resource usage in your job output logs, which can be used to tune your job scripts. Additionally, you will be billed for resources that you don't use, so your FairShare will suffer!
- Do as much as you can with the resources you are allocated - a Slurm job script specifies resources that you would like, and tasks to run on those resources. You are not limited to doing one thing! For example, you can combine your preprocessing, analysis, and post-processing into a single job, rather than running all three stages as three separate jobs, each at the mercy of the queue.
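As a sketch, the three stages above might be combined into a single job script like this (the partition name matches the `nodes` partition mentioned earlier; the stage scripts and resource figures are hypothetical placeholders):

```shell
#!/usr/bin/env bash
#SBATCH --partition=nodes
#SBATCH --ntasks=1
#SBATCH --mem=4G            # assumed memory requirement
#SBATCH --time=02:00:00     # assumed total time for all three stages

# All three stages run in one allocation, so the job queues only once.
# These script names are placeholders for your own commands.
./preprocess.sh
./analyse.sh
./postprocess.sh
```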