I. Shell initialization files:
Your jobs may not run properly if your start-up files (i.e. .cshrc, .login or .profile) contain commands that attempt to set up the terminal. Guard any such commands with a check for the PBS_ENVIRONMENT variable: if it is defined, skip your terminal initialization. Here is an example of how to do this (in your .login file):
...
if (! $?PBS_ENVIRONMENT) then
    (do any terminal setup here, or anything that writes to stdout)
endif
...
Also please note that if you use csh for your shell you will receive a warning message at the head of your standard output file stating
II. How to request nodes:
Within a PBS directive (or from a qsub command line) use the parameter
'-l nodes=node_spec[+node_spec]'
Where node_spec = number | property[:property] | number:property[:property]
Currently available properties are:
nova - request nova nodes
star - request star nodes
vega - request vega nodes
So, for example, to request 2 nova nodes the node_spec would be '2:nova'. If you don't care what kind of nodes you get, omit the 'property' clause.
#PBS -l nodes=3:nova+4
This requests 3 nova nodes and 4 nodes of any type. Remember that each node has 2 processors, so if you only need 3 or 4 processors you only need to request 2 nodes:
#PBS -l nodes=2:nova
Additionally, PBS v1.2 includes support for a new property, processors per node (ppn). You use it in the following fashion:
#PBS -l nodes=node_spec:ppn=x
where node_spec follows the format outlined above and x indicates how many processors per node you want for EACH of the nodes defined in the node_spec. For example:
#PBS -l nodes=5:star:ppn=2+4:ppn=2+1:ppn=1
This asks for 5 star nodes with 2 processors free, any 4 other nodes with 2 processors free, and 1 node with at least one processor free.
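The two-processors-per-node arithmetic mentioned earlier in this section can be sketched as a small shell helper. This is only an illustration: the function name nodes_for_procs is hypothetical, and the default of 2 processors per node is taken from this guide and may not hold on other clusters.

```shell
#!/bin/sh
# Hypothetical helper: number of nodes to request for a given processor
# count, assuming 2 processors per node unless a second argument says otherwise.
nodes_for_procs() {
    procs=$1
    per_node=${2:-2}
    # Integer ceiling of procs / per_node
    echo $(( (procs + per_node - 1) / per_node ))
}

# Build a directive for a job that needs 4 processors on nova nodes
echo "#PBS -l nodes=$(nodes_for_procs 4):nova"
```

Rounding up matters for odd counts: 3 processors still need 2 nodes, leaving one processor unused.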
III. MPI commandline
Launch the MPI command line from the shell as normal, and add the -pbs
parameter to the argument list. Omit arguments such as -machinefile and
-nolocal, since PBS will provide the process with a machinefile (if you
do not omit them they will be ignored). Use -np as normal; mpirun will
figure out how many nodes you need.
Example mpirun calls:
mpirun -pbs -np 2 a.out
mpirun -pbs -np 4 -nodetype vega a.out
The -nodetype argument requests a node type in the same way as '-l nodes', described above. Currently available -nodetype arguments are 'star' and 'vega'.
IV. Job exit status
>>>>IF YOU DO NOT HAVE A .logout FILE, PLEASE DISREGARD THE FOLLOWING<<<<
Job exit status can be misreported by PBS if you use csh. This is because PBS processes your .logout file after the qsub script terminates. To ensure the correct exit value is returned, add the following line as the first line in your .logout:
set EXITVAL = $status
And add the following as the last line in your .logout:
exit $EXITVAL
If you don't make these modifications it can affect inter-job dependencies (i.e. if you queue one job to begin after successful or unsuccessful completion of another job).
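The idea behind those two .logout lines, saving the exit status before any later command can overwrite it, can be illustrated in Bourne shell syntax. This is a sketch only, not a replacement for the csh lines: the function name run_then_cleanup is hypothetical, and the `true` call merely stands in for whatever commands a .logout file might run.

```shell
#!/bin/sh
# Hypothetical illustration of preserving an exit status across cleanup.
run_then_cleanup() {
    "$@"                 # run the job command
    exitval=$?           # save its status at once (csh: set EXITVAL = $status)
    true                 # stand-in for cleanup commands, which reset $? to 0
    return "$exitval"    # hand back the saved status (csh: exit $EXITVAL)
}

# A failing command still reports failure to the caller
run_then_cleanup false || echo "job failed with status $?"
```

Without the saved value, the caller would see the status of the last cleanup command rather than that of the job.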
V. Delivery of output files
Stdout and stderr of your job will be returned to you under the following
names:
jobname.o.jobID for output and jobname.e.jobID for error
where jobname is the user-specified job name and jobID is the PBS-assigned job ID. These files will be placed in the directory from which you executed qsub.
It is important to note that if you redirect your stdout and stderr from your qsub script, these files should be empty (with the possible exception of a warning message at the head of the stdout file if you use csh, as outlined above).
You will need to read the following section on File Stage-in/out to ensure you receive these files properly.
Delivery of output files can fail under the following circumstances:
VI. File stage-in/out
If your process requires input files or creates files as output, you have two options for dealing with this. If the files are in your home directory, proceed as normal (since the home directories are mounted on each node). Otherwise (for example, if you are using the tmp directory), you will need to use file staging by adding one or both of the following arguments to your qsub call (these can also be placed inside your qsub script):
-W stagein=node_file@starzero:starzero_file
AND/OR
-W stageout=node_file@starzero:starzero_file
where:
-W stagein=/tmp@starzero:$home/mydir
will create a new directory called /tmp/mydir and copy all files and subdirectories of $home/mydir there. All staged-in files will be deleted when the job terminates. All staged-out files will be deleted after the stage-out process succeeds. If stage-out fails for some reason you will receive e-mail from the PBS system and the files will remain on the remote server.
IT IS EXTREMELY important that you not use any wildcards (i.e. * or ?) in your stage in/out directives since they will not be expanded on the execution node.
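One way to guard against this is to check a staging specification for wildcard characters before passing it to qsub. The following is a minimal sketch under stated assumptions: the function name stage_spec_ok is hypothetical and not part of PBS, and it only looks for the * and ? characters named above.

```shell
#!/bin/sh
# Hypothetical pre-submission check: succeed only if a stage-in/out
# specification contains neither * nor ?.
stage_spec_ok() {
    case $1 in
        *[*?]*) return 1 ;;   # a wildcard character is present
        *)      return 0 ;;
    esac
}

# Example specification from this section; single quotes keep $home literal
spec='/tmp@starzero:$home/mydir'
if stage_spec_ok "$spec"; then
    echo "ok: $spec"
else
    echo "wildcard found in: $spec" >&2
fi
```

A submission wrapper could call such a check on each -W stagein or -W stageout value and refuse to submit when it fails.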