Physnodes - 5) Simple Matlab, Stata and Gaussian jobs


Matlab

Please note that running Matlab on the physnodes is not the same as running Matlab via a GUI on your desktop. There is no GUI when running a Matlab script. Instead, you write a job script which tells the grid engine how to run your Matlab program.

Please do not run the Matlab GUI on the login nodes. They are not powerful enough to run Matlab, and you will significantly affect performance for other users and yourself.

To use Matlab, the Matlab compiler, or standalone executables created with the Matlab compiler, you must first load the Matlab and compiler runtime modules via the commands:

Loading the Matlab and Compiler Runtime modules
-bash-4.1$ module load Matlab/R2015a
-bash-4.1$ module load MatlabCompilerRuntime/R2015a

Important note: Licence limits

The University holds a limited number of Matlab licenses in the form of campus-wide floating licenses. An unlimited number of licenses for individual toolboxes is also available. Each Matlab job will consume one general Matlab license, plus one license for each toolbox used. To avoid exhausting the finite number of licenses, you must reserve one Matlab license (-l matlab=1) in your job script. This stops your jobs from starting and then failing when no licenses are available.

Useful Matlab command line options

Option               Description
-nosplash            Starts MATLAB but does not display the splash screen during startup.
-nodesktop           Does not start the MATLAB desktop. Uses the current terminal for commands. The Java virtual machine will be started.
-nodisplay           Does not display any X commands. The MATLAB desktop will not be started.
-r "command"         Starts MATLAB and executes the specified MATLAB command. Include the command in double quotation marks ("command"). If command is the name of a MATLAB function or script, do not specify the file extension. To separate multiple statements, use semicolons or commas.
-logfile <filename>  Starts MATLAB and makes a copy of any output to the Command Window in the file <filename>. The output includes all crash reports.
-singleCompThread    Limits MATLAB to a single computational thread. By default, MATLAB uses the multithreading capabilities of the computer on which it is running.
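
As a rough illustration of how the options above combine, the following sketch builds a batch-mode command line and only runs it if Matlab is available. The script name "myscript" is a hypothetical placeholder, and appending "exit" to the -r command is an assumption about how you want the session to end.

```shell
#!/bin/bash
# Build a batch-mode Matlab command line from the options listed above.
# "myscript" is a hypothetical name: it stands for myscript.m in the current directory.
script_name="myscript"
matlab_cmd=(matlab -nosplash -nodesktop -nodisplay -singleCompThread
            -logfile "${script_name}.log"
            -r "${script_name}; exit")

# Show the exact command, with shell quoting, before running anything.
printf '%q ' "${matlab_cmd[@]}"
echo

# Only run it when Matlab is on the PATH (i.e. the Matlab module is loaded).
if command -v matlab >/dev/null 2>&1; then
    "${matlab_cmd[@]}"
fi
```

Keeping the command in an array means the quoted -r argument is passed to Matlab as a single word.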


Running a single copy of your Matlab program

twotimestable.m
for x = 1:10
fprintf('%d => %d\n', x, 2 * x);
end
twotimestable.job
#$ -cwd -V
#$ -l h_rt=0:05:00
#$ -l h_vmem=4G
#$ -l matlab=1
#$ -o logs
#$ -e logs
echo "I am running a Matlab script using ${NSLOTS} slot(s) on `hostname`"
matlab -nosplash -nodesktop -nodisplay <twotimestable.m >outputs/twotimestable.out
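
The job script above writes to the "logs" and "outputs" directories, so they need to exist before the job starts. A minimal sketch of a submission step, assuming you run it from the directory that contains twotimestable.job:

```shell
#!/bin/bash
# Create the directories that the job script writes to.
# mkdir -p does nothing if they already exist.
mkdir -p logs outputs

# Submit only if the grid engine client is available on this machine.
if command -v qsub >/dev/null 2>&1; then
    qsub twotimestable.job
else
    echo "qsub not found: run this on a physnodes login node" >&2
fi
```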
 
 




Job Run
-bash-4.1$ qsub twotimestable.job
Your job 431619 ("twotimestable.job") has been submitted
-bash-4.1$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
 431619 0.00000 twotimesta abs4         qw    09/14/2014 10:52:28                                    1
-bash-4.1$ qstat
-bash-4.1$ more logs/twotimestable.job.*
::::::::::::::
logs/twotimestable.job.e431619
::::::::::::::
::::::::::::::
logs/twotimestable.job.o431619
::::::::::::::
I am running a Matlab script using 1 slot(s) on rnode5
-bash-4.1$ more outputs/twotimestable.out
                            < M A T L A B (R) >
                  Copyright 1984-2014 The MathWorks, Inc.
                    R2014a (8.3.0.532) 64-bit (glnxa64)
                             February 11, 2014

To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
>> 1 => 2
2 => 4
3 => 6
4 => 8
5 => 10
6 => 12
7 => 14
8 => 16
9 => 18
10 => 20
>>


Running multiple Matlab programs

timestable.m
function timestable (n)
    x = str2num(n)
    fprintf('%d times table\n', x)
    for y = 1:12
        fprintf('%d x %d => %d\n', y, x, y * x);
    end
timestable.job
#$ -cwd -V
#$ -l h_rt=0:05:00
#$ -l h_vmem=4G
#$ -l matlab=1
#$ -o logs
#$ -e logs
#$ -t 1-20
echo I am running a Matlab script timestable ${SGE_TASK_ID} on `hostname`
matlab -nosplash -nodesktop -nodisplay -r "timestable ${SGE_TASK_ID}" >outputs/timestable.${SGE_TASK_ID}
Running multiple Matlab programs
-bash-4.1$ qsub timestable.job
Your job-array 431623.1-20:1 ("timestable.job") has been submitted
-bash-4.1$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
 431623 0.00000 timestable abs4         qw    09/14/2014 11:24:51                                    1 1-20:1
-bash-4.1$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
 431623 0.55500 timestable abs4         r     09/14/2014 11:24:56 its-long@rnode1.york.ac.uk         1 1
 431623 0.55500 timestable abs4         r     09/14/2014 11:24:56 its-day@rnode16.york.ac.uk         1 2
 431623 0.55500 timestable abs4         r     09/14/2014 11:24:56 test@rnode5.york.ac.uk             1 3
 431623 0.55500 timestable abs4         r     09/14/2014 11:24:56 its-day@rnode21.york.ac.uk         1 4
 431623 0.55500 timestable abs4         r     09/14/2014 11:24:56 its-day@rnode11.york.ac.uk         1 5
 431623 0.55500 timestable abs4         r     09/14/2014 11:24:56 its-day@rnode15.york.ac.uk         1 6
 431623 0.55500 timestable abs4         r     09/14/2014 11:24:56 its-long@rnode0.york.ac.uk         1 7
 431623 0.55500 timestable abs4         r     09/14/2014 11:24:56 its-day@rnode7.york.ac.uk          1 8
 431623 0.55500 timestable abs4         r     09/14/2014 11:24:56 its-day@rnode18.york.ac.uk         1 9
 431623 0.55500 timestable abs4         r     09/14/2014 11:24:56 its-long@rnode2.york.ac.uk         1 10
 431623 0.55500 timestable abs4         r     09/14/2014 11:24:56 its-day@rnode9.york.ac.uk          1 11
 431623 0.55500 timestable abs4         r     09/14/2014 11:24:56 its-day@rnode14.york.ac.uk         1 12
 431623 0.55500 timestable abs4         r     09/14/2014 11:24:56 its-day@rnode17.york.ac.uk         1 13
 431623 0.55500 timestable abs4         r     09/14/2014 11:24:56 its-day@rnode20.york.ac.uk         1 14
 431623 0.55500 timestable abs4         r     09/14/2014 11:24:56 its-long@rnode4.york.ac.uk         1 15
 431623 0.55500 timestable abs4         r     09/14/2014 11:24:56 its-day@rnode19.york.ac.uk         1 16
 431623 0.55500 timestable abs4         r     09/14/2014 11:24:56 its-day@rnode16.york.ac.uk         1 17
 431623 0.55500 timestable abs4         r     09/14/2014 11:24:56 its-day@rnode21.york.ac.uk         1 18
 431623 0.55500 timestable abs4         r     09/14/2014 11:24:56 its-day@rnode15.york.ac.uk         1 19
 431623 0.55500 timestable abs4         r     09/14/2014 11:24:56 its-day@rnode11.york.ac.uk         1 20
-bash-4.1$ qstat
-bash-4.1$ ls outputs
timestable.1   timestable.13  timestable.17  timestable.20  timestable.6
timestable.10  timestable.14  timestable.18  timestable.3   timestable.7
timestable.11  timestable.15  timestable.19  timestable.4   timestable.8
timestable.12  timestable.16  timestable.2   timestable.5   timestable.9
-bash-4.1$ more outputs/timestable.19
                            < M A T L A B (R) >
                  Copyright 1984-2014 The MathWorks, Inc.
                    R2014a (8.3.0.532) 64-bit (glnxa64)
                             February 11, 2014

To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

x =
    19
19 times table
1 x 19 => 19
2 x 19 => 38
3 x 19 => 57
4 x 19 => 76
5 x 19 => 95
6 x 19 => 114
7 x 19 => 133
8 x 19 => 152
9 x 19 => 171
10 x 19 => 190
11 x 19 => 209
12 x 19 => 228
>>
-bash-4.1$ ls logs
timestable.job.e431622.1   timestable.job.e431623.16  timestable.job.o431622.4
timestable.job.e431622.10  timestable.job.e431623.17  timestable.job.o431622.5
timestable.job.e431622.11  timestable.job.e431623.18  timestable.job.o431622.6
timestable.job.e431622.12  timestable.job.e431623.19  timestable.job.o431622.7
timestable.job.e431622.13  timestable.job.e431623.2   timestable.job.o431622.8
timestable.job.e431622.14  timestable.job.e431623.20  timestable.job.o431622.9
timestable.job.e431622.15  timestable.job.e431623.3   timestable.job.o431623.1
timestable.job.e431622.16  timestable.job.e431623.4   timestable.job.o431623.10
timestable.job.e431622.17  timestable.job.e431623.5   timestable.job.o431623.11
timestable.job.e431622.18  timestable.job.e431623.6   timestable.job.o431623.12
timestable.job.e431622.19  timestable.job.e431623.7   timestable.job.o431623.13
timestable.job.e431622.2   timestable.job.e431623.8   timestable.job.o431623.14
timestable.job.e431622.20  timestable.job.e431623.9   timestable.job.o431623.15
timestable.job.e431622.3   timestable.job.o431622.1   timestable.job.o431623.16
timestable.job.e431622.4   timestable.job.o431622.10  timestable.job.o431623.17
timestable.job.e431622.5   timestable.job.o431622.11  timestable.job.o431623.18
timestable.job.e431622.6   timestable.job.o431622.12  timestable.job.o431623.19
timestable.job.e431622.7   timestable.job.o431622.13  timestable.job.o431623.2
timestable.job.e431622.8   timestable.job.o431622.14  timestable.job.o431623.20
timestable.job.e431622.9   timestable.job.o431622.15  timestable.job.o431623.3
timestable.job.e431623.1   timestable.job.o431622.16  timestable.job.o431623.4
timestable.job.e431623.10  timestable.job.o431622.17  timestable.job.o431623.5
timestable.job.e431623.11  timestable.job.o431622.18  timestable.job.o431623.6
timestable.job.e431623.12  timestable.job.o431622.19  timestable.job.o431623.7
timestable.job.e431623.13  timestable.job.o431622.2   timestable.job.o431623.8
timestable.job.e431623.14  timestable.job.o431622.20  timestable.job.o431623.9
timestable.job.e431623.15  timestable.job.o431622.3
-bash-4.1$
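
With 20 tasks it is easy to overlook one that produced no output. The following sketch, which is not part of the original example, checks that every task of the array job left a non-empty output file. The function name check_outputs is an illustrative choice.

```shell
#!/bin/bash
# Report which array-task output files are missing or empty.
# Usage: check_outputs <prefix> <first-task> <last-task>
check_outputs() {
    local prefix=$1 first=$2 last=$3 missing=0 task
    for ((task = first; task <= last; task++)); do
        # -s is true only if the file exists and has a size greater than zero.
        if [ ! -s "${prefix}.${task}" ]; then
            echo "missing or empty: ${prefix}.${task}"
            missing=$((missing + 1))
        fi
    done
    echo "${missing} of $((last - first + 1)) task outputs missing"
    [ "${missing}" -eq 0 ]
}

# Check the 20 output files written by timestable.job.
check_outputs outputs/timestable 1 20 || true
```

The function returns a non-zero status when anything is missing, so it can also be used in an if statement.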

The Matlab Compiler

The Matlab compiler is used to create a standalone Matlab application. The compiler will compile your .m code into an executable, which should run much faster than running a .m script or function. In addition, you can run the compiled code anywhere without worrying about license restrictions.

For a full description of how to use the matlab compiler, please refer to Getting Started with MATLAB Compiler. The rest of this section gives a brief guide to creating and running standalone Matlab applications on the physnodes.

Checking if your program can be compiled

Not all Matlab code can be compiled. The following resources will assist you in deciding if your code can be compiled:

Ineligible toolboxes

Matlab features supported by the compiler

If your Matlab scripts cannot be compiled, you will have to run them in batch mode directly on the cluster.

Code preparation

Matlab script files cannot be compiled directly. Instead, they must be converted into a prepackaged Matlab function. Normally, this is simply a case of wrapping the main section of code within a function. See the Scripts and Functions section of the Matlab Compiler Getting Started guide for more details.

Example Matlab Script as a Function
-bash-4.1$ more twotimestable.m 
for x = 1:10
    fprintf('%d => %d\n', x, 2 * x);
end

-bash-4.1$ more twotimestablefunc.m 
function twotimestable
  for x = 1:10
    fprintf('%d => %d\n', x, 2 * x);
  end

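The function above takes no parameters. Where a function does take one, such as the variable 'n' in the timestable function shown earlier, it is the parameter passed in to the program from the command line. It arrives as a string, which is why timestable converts it with str2num.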

Compiling and running the application

The Matlab compiler tool is mcc. It can be invoked from within matlab itself, or on the command line:

Compiling a Matlab function
-bash-4.1$ more twotimestablefunc.m 
function twotimestable
  for x = 1:10
    fprintf('%d => %d\n', x, 2 * x);
  end
-bash-4.1$ mcc -m twotimestablefunc.m 
-bash-4.1$ ./twotimestablefunc 
1 => 2
2 => 4
3 => 6
4 => 8
5 => 10
6 => 12
7 => 14
8 => 16
9 => 18
10 => 20
-bash-4.1$

Compilation will result in a number of files. The standalone executable will be named after the first script file used in the compilation. Note that a standalone wrapper script will also be created, using the file name prefixed by run_. This file can be ignored when using the physnodes: its purpose is to set up the environment variables for the Matlab Compiler Runtime, which is already done by the modules environment.

Matlab standalone applications require the MatlabCompilerRuntime module to be loaded before the batch job script is submitted. This only has to be done once per session. Once the module is loaded, the standalone application can be run from the job script just like any other executable.
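
Before submitting, it can be worth confirming that the runtime module really is loaded in the current session. The sketch below assumes the modules environment records loaded modules in the LOADEDMODULES variable as a colon-separated list, which is the usual behaviour of Environment Modules. The function name runtime_loaded is illustrative.

```shell
#!/bin/bash
# Return success if a MatlabCompilerRuntime module appears in LOADEDMODULES.
# Assumption: LOADEDMODULES is a colon-separated list such as
# "Matlab/R2015a:MatlabCompilerRuntime/R2015a".
runtime_loaded() {
    case ":${LOADEDMODULES:-}:" in
        *:MatlabCompilerRuntime/*) return 0 ;;
        *) return 1 ;;
    esac
}

if runtime_loaded; then
    echo "MatlabCompilerRuntime module is loaded"
else
    echo "Run 'module load MatlabCompilerRuntime/R2015a' before qsub" >&2
fi
```

Because the job scripts use the -V option, the environment set up by the module at submission time is passed on to the job.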

Running your compiled Matlab program on the cluster

Running a single instance of your Matlab program

This method is useful where you wish to run your program for many hours on a resource with more memory than, say, your desktop.

Matlab Grid Engine Job Script
-bash-4.1$ more twotimestable.job 
#$ -cwd -V
#$ -l h_rt=0:05:00
#$ -l h_vmem=4G
#$ -o logs
#$ -e logs
echo I am running a Matlab executable using ${NSLOTS} slot(s) on `hostname`
./twotimestablefunc
-bash-4.1$ qsub twotimestable.job 
Your job 11359 ("twotimestable.job") has been submitted
-bash-4.1$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
  11347 0.55500 queue_test abs4         qw    06/30/2014 16:24:37                                    1        
  11348 0.55500 queue_test abs4         qw    06/30/2014 16:24:41                                    1        
  11349 0.55500 queue_test abs4         qw    06/30/2014 16:36:10                                    1        
  11359 0.00000 twotimesta abs4         qw    07/01/2014 14:53:22                                    1        
-bash-4.1$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
  11359 0.55500 twotimesta abs4         r     07/01/2014 14:53:33 its-cluster@rnode0.york.ac.uk      1        
  11347 0.55500 queue_test abs4         qw    06/30/2014 16:24:37                                    1        
  11348 0.55500 queue_test abs4         qw    06/30/2014 16:24:41                                    1        
  11349 0.55500 queue_test abs4         qw    06/30/2014 16:36:10                                    1        
-bash-4.1$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
  11347 0.55500 queue_test abs4         qw    06/30/2014 16:24:37                                    1        
  11348 0.55500 queue_test abs4         qw    06/30/2014 16:24:41                                    1        
  11349 0.55500 queue_test abs4         qw    06/30/2014 16:36:10                                    1   
-bash-4.1$ more logs/twotimestable.job.e11359 
tset: standard error: Inappropriate ioctl for device

-bash-4.1$ more logs/twotimestable.job.o11359 
I am running a Matlab executable using 1 slot(s) on rnode0
1 => 2
2 => 4
3 => 6
4 => 8
5 => 10
6 => 12
7 => 14
8 => 16
9 => 18
10 => 20
-bash-4.1$ 

Memory allocation

Your Matlab job will be terminated if it cannot get enough memory, so please allocate memory liberally. As a rule of thumb, if your normal computer has n GB of memory, you should allocate 2 x n GB of memory in your job script. This is because the nodes on the cluster do not swap.
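
The rule of thumb can be turned into a job script directive with a little arithmetic. This is only a sketch: the function name vmem_directive is illustrative, and 8 GB is an example desktop size.

```shell
#!/bin/bash
# Print an h_vmem directive requesting twice the given desktop memory (in GB).
vmem_directive() {
    local desktop_gb=$1
    echo "#\$ -l h_vmem=$((desktop_gb * 2))G"
}

# A desktop with 8 GB of memory gives a request of 16G.
vmem_directive 8
```

For the 8 GB example this prints "#$ -l h_vmem=16G", which can be pasted into the job script in place of the 4G used in the examples here.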

Running multiple copies of your program on multiple nodes

The following example shows how to run 20 copies of your program.

Matlab program to calculate times tables
-bash-4.1$ more timestable.m
function timestable (n)
    x = str2num(n)
    fprintf('%d times table\n', x)
    for y = 1:10
        fprintf('%d x %d => %d\n', y, x, y * x);
    end
-bash-4.1$ mcc -m timestable.m
-bash-4.1$ more timestable.job 
#$ -cwd -V
#$ -l h_rt=0:05:00
#$ -l h_vmem=4G
#$ -o logs
#$ -e logs
#$ -t 1-20
echo I am running a Matlab executable timestable ${SGE_TASK_ID} on `hostname`
./timestable ${SGE_TASK_ID}
-bash-4.1$ qsub timestable.job 
Your job-array 11360.1-20:1 ("timestable.job") has been submitted
-bash-4.1$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
  11347 0.55500 queue_test abs4         qw    06/30/2014 16:24:37                                    1        
  11348 0.55500 queue_test abs4         qw    06/30/2014 16:24:41                                    1        
  11349 0.55500 queue_test abs4         qw    06/30/2014 16:36:10                                    1        
  11360 0.00000 timestable abs4         qw    07/01/2014 15:59:41                                    1 1-20:1
-bash-4.1$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
  11360 0.55500 timestable abs4         r     07/01/2014 15:59:48 its-cluster@rnode2.york.ac.uk      1 5
  11360 0.55500 timestable abs4         r     07/01/2014 15:59:48 its-cluster@rnode4.york.ac.uk      1 6
  11360 0.55500 timestable abs4         r     07/01/2014 15:59:48 its-cluster@rnode3.york.ac.uk      1 7
  11360 0.55500 timestable abs4         r     07/01/2014 15:59:48 its-cluster@rnode1.york.ac.uk      1 8
  11360 0.55500 timestable abs4         r     07/01/2014 15:59:48 its-cluster@rnode3.york.ac.uk      1 9
  11360 0.55500 timestable abs4         r     07/01/2014 15:59:48 its-cluster@rnode4.york.ac.uk      1 10
  11360 0.55500 timestable abs4         r     07/01/2014 15:59:48 its-cluster@rnode2.york.ac.uk      1 11
  11360 0.55500 timestable abs4         r     07/01/2014 15:59:48 its-cluster@rnode5.york.ac.uk      1 12
  11360 0.55500 timestable abs4         r     07/01/2014 15:59:48 its-cluster@rnode1.york.ac.uk      1 14
  11360 0.55500 timestable abs4         r     07/01/2014 15:59:48 its-cluster@rnode5.york.ac.uk      1 16
  11360 0.55500 timestable abs4         r     07/01/2014 15:59:48 its-cluster@rnode2.york.ac.uk      1 17
  11360 0.55500 timestable abs4         r     07/01/2014 15:59:48 its-cluster@rnode4.york.ac.uk      1 18
  11360 0.55500 timestable abs4         r     07/01/2014 15:59:48 its-cluster@rnode3.york.ac.uk      1 19
  11360 0.55500 timestable abs4         r     07/01/2014 15:59:48 its-cluster@rnode1.york.ac.uk      1 20
  11347 0.55500 queue_test abs4         qw    06/30/2014 16:24:37                                    1        
  11348 0.55500 queue_test abs4         qw    06/30/2014 16:24:41                                    1        
  11349 0.55500 queue_test abs4         qw    06/30/2014 16:36:10                                    1        
-bash-4.1$ more logs/timestable.job.o11360.15 
I am running a Matlab executable timestable 15 on rnode0
x =
    15
15 times table
1 x 15 => 15
2 x 15 => 30
3 x 15 => 45
4 x 15 => 60
5 x 15 => 75
6 x 15 => 90
7 x 15 => 105
8 x 15 => 120
9 x 15 => 135
10 x 15 => 150
-bash-4.1$ 

Parallel Computing Toolbox

THIS SECTION IS NOT COMPLETE AND REQUIRES SOME MORE WORK

If you are using the Parallel Computing Toolbox, you have to explicitly create a local worker pool by using the "parpool" command. The number of workers you create should be equal to the number of slots you request in your job script. For example, to create 5 local workers, do this in your "master" script:

parpool(5);
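
One way to keep the number of workers in step with the number of slots is to take it from the NSLOTS variable in the job script instead of hard-coding it. The following is an untested sketch under several assumptions: the "smp" parallel environment is the one used for the Stata/MP example on this page, "master_script" is a hypothetical name for your own script, and the pool is created on the -r command line so the script itself does not call parpool.

```shell
#!/bin/bash
#$ -cwd -V
#$ -l h_rt=0:30:00
#$ -l matlab=1
#$ -pe smp 5
#$ -o logs
#$ -e logs

# NSLOTS is set by the grid engine; fall back to 1 when run outside a job.
workers=${NSLOTS:-1}

# "master_script" is a hypothetical placeholder for your own Matlab script.
matlab_statement="parpool(${workers}); master_script; exit"
echo "Matlab will run: ${matlab_statement}"

# Only start Matlab when it is available (i.e. the Matlab module is loaded).
if command -v matlab >/dev/null 2>&1; then
    matlab -nosplash -nodesktop -nodisplay -r "${matlab_statement}"
fi
```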

Caution when using system() and unix() commands

Matlab users whose scripts call system() or unix() should take care when submitting batch jobs. If the unix command to be called does not already use standard input redirection, the command should redirect standard input from the special device /dev/null. For example, a simple command such as:

unix('gzip myfile')

should instead be written as:

unix('gzip myfile < /dev/null')

Many unix commands, when they encounter unusual circumstances, will prompt the user for input, but only if they believe they are being used in an interactive context. When not used in an interactive context, most of these tools will take a default safe option without prompting, and then exit. A bug in Matlab fools these commands into believing they are being used in an interactive context. In a batch job, however, no user input is possible, and so a call which requires user input will cause the job to hang indefinitely.

In the above gzip example, if a file matching the gzipped version's name already exists, gzip will ask an interactive user whether they wish to overwrite the file. The default action for a non-interactive session is for gzip to choose not to overwrite the file and exit.
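
Many tools decide whether they are interactive by checking if standard input is a terminal. The shell sketch below illustrates that kind of check and shows why redirecting from /dev/null helps. It is an illustration of the general idea, not the exact test used by gzip or by Matlab.

```shell
#!/bin/bash
# Report whether standard input is attached to a terminal.
is_interactive() {
    if [ -t 0 ]; then
        echo "interactive"
    else
        echo "non-interactive"
    fi
}

# With standard input redirected from /dev/null the check always fails,
# so a tool using it takes its non-interactive default instead of prompting.
is_interactive < /dev/null
```

Because /dev/null is never a terminal, the final line prints "non-interactive" wherever it is run.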

Stata

The physnodes currently host a 70-user licence for Stata, which includes both regular and special editions. To enable access to the current default version, use the following module command:

module load Stata/13

Regular Stata can then be invoked with the stata command, Stata Special Edition with stata-se, and the multi-core version with stata-mp.

The following example runs a Stata script called 'stata-script.do' on five data files named "data.1", "data.2", "data.3", "data.4" and "data.5" in the "data" directory. Output is written to the files "slog.1" to "slog.5" in the "stata-log" directory.

Stata job file
#$ -cwd -V
#$ -l h_rt=0:30:00
#$ -o logs
#$ -e logs
#$ -N Stata_job
#$ -t 1-5
echo `date`: executing Stata job ${SGE_TASK_ID}
stata -b do stata-script.do data/data.${SGE_TASK_ID} stata-log/slog.${SGE_TASK_ID}

Within the Stata "do" script, we use the following to read from individual data files and write unique Stata log files.

Specifying data and log files
log using `2', replace
use `1', clear
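
To see how the two arguments line up with the task number, the following sketch prints the command that each of the five tasks would run, without starting Stata. The function name stata_command_for_task is illustrative.

```shell
#!/bin/bash
# Print the Stata command that a given array task would run.
# The first argument after the do-file becomes macro 1 (the data file)
# and the second becomes macro 2 (the log file) inside the do script.
stata_command_for_task() {
    local task=$1
    echo "stata -b do stata-script.do data/data.${task} stata-log/slog.${task}"
}

for task in 1 2 3 4 5; do
    stata_command_for_task "${task}"
done
```

For task 3, for example, the do script reads data/data.3 and logs to stata-log/slog.3.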

Stata/MP

The following script file requests 16 cores and 64GB of RAM for a Stata/MP job:

Stata/MP Job
#$ -cwd -V
#$ -l h_rt=1:00:00
#$ -o logs
#$ -e logs
#$ -N Stata_MP_job
#$ -pe smp 16
echo `date`: executing Stata job
stata-mp -b do stata-mp-script.do data/data.file stata-log/slog.stata

Gaussian Jobs

Gaussian License

Before using Gaussian you must sign the license form and request access.

The following job script runs a simple Gaussian job on the cluster:

Gaussian Job Script
#$ -V -cwd
#$ -l h_rt=01:59:59
#$ -l h_vmem=8G
#$ -l tmpfree=600G
g09 < ptcl6.master > ptcl6.test.gen.td-30.log

The tmpfree directive is required for Gaussian jobs that use large amounts of disc space in /tmp. Running a number of jobs on the same host can cause the disc to fill up and the jobs will fail. The amount of space required can vary based on the work being done.
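
If you have an interactive session on a node, you can get a rough idea of how much space is free in /tmp with df. The sketch below is only an illustration: it reports the node it is run on, which is not necessarily the node your job will use, and the function name free_gb is hypothetical.

```shell
#!/bin/bash
# Print the space currently available in a directory's filesystem, in whole GB.
free_gb() {
    local dir=${1:-/tmp}
    # df -P prints one POSIX-format line per filesystem; -k reports 1024-byte blocks.
    # Column 4 of the second line is the available space.
    df -Pk "${dir}" | awk 'NR == 2 { print int($4 / 1048576) }'
}

echo "Free space in /tmp: $(free_gb /tmp) GB"
```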