PBS User Info

 
 

I. Shell initialization files:

Your jobs may not run properly if your start-up files (i.e. .cshrc, .login or .profile) contain commands that attempt to set up the terminal. Skip any such commands when the PBS_ENVIRONMENT variable is defined, since that means the shell is running under PBS. Here is an example of how to do this (in your .login file):

        ...
        if (! $?PBS_ENVIRONMENT) then
                # do any terminal setup here, or anything that writes to stdout
        endif
        ...
Also, please note that if you use csh as your shell, you will receive a warning message at the head of your standard output file stating
"Warning: no access to tty,..."
This message can be safely ignored, as it has no impact on PBS. You may observe this problem on sub9 at the moment, due to the version of csh used there. Galaxy does not display this problem.
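
For users whose start-up file is .profile (an sh-compatible shell), the same guard might look like the following. This is only a sketch: in_pbs_job and setup_terminal are hypothetical names introduced for illustration, and the body of setup_terminal is a placeholder for whatever terminal commands you actually use.

```shell
# Succeeds when PBS_ENVIRONMENT is defined (the sh counterpart of $?PBS_ENVIRONMENT).
in_pbs_job() {
    [ -n "${PBS_ENVIRONMENT+set}" ]
}

# Hypothetical placeholder: put terminal setup here, along with
# anything else that writes to stdout.
setup_terminal() {
    :
}

# Only touch the terminal when the shell was NOT started by PBS.
if ! in_pbs_job; then
    setup_terminal
fi
```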

II. How to request nodes:

Within a PBS directive (or on the qsub command line) use the parameter
            '-l nodes=node_spec[+node_spec]'
where node_spec = number | property[:property] | number:property[:property]
 

Currently available properties are:

      nova - request nova nodes
      star - request star nodes
      vega - request vega nodes

So for example: to request 2 nova nodes the node_spec would be

   '2:nova'.
If you don't care what kind of nodes you get, omit the 'property' clause. Remember that if you request a lot of nodes and there are not enough available, the job will sit in the queue. A directive that requests nodes both with and without a property (from inside a script) would look like this:
#PBS -l nodes=3:nova+4
This requests 3 nova nodes and 4 nodes of any type. Remember that each node has 2 processors, so if you only need 3 or 4 processors you only need to request 2 nodes:
#PBS -l nodes=2:nova
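
The node count above is a ceiling division of the processor count by the processors per node. A small sketch of that arithmetic follows; nodes_for_procs is a hypothetical helper, and the default of 2 processors per node is taken from the statement above.

```shell
# Hypothetical helper: how many nodes are needed for a given processor count.
# Usage: nodes_for_procs PROCS [PROCS_PER_NODE]
nodes_for_procs() {
    procs=$1
    procs_per_node=${2:-2}   # assumption: 2 processors per node
    # Integer ceiling division.
    echo $(( (procs + procs_per_node - 1) / procs_per_node ))
}

nodes_for_procs 3   # prints 2
nodes_for_procs 4   # prints 2
nodes_for_procs 5   # prints 3
```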
Additionally, PBS v1.2 includes support for a new property, processors per node (ppn). You use it in the following fashion:
#PBS -l nodes=node_spec:ppn=x
Where node_spec follows the format outlined above. x indicates how many processors per node you want for EACH of the nodes defined in the node_spec. For example:
#PBS -l nodes=5:star:ppn=2+4:ppn=2+1:ppn=1
This asks for 5 star nodes with 2 processors free, any 4 other nodes with 2 processors free, and 1 node with at least one processor free.
In general, this will not be of concern to users, because the MPI commandline support (see below) works out the node request for you. More information and additional examples are available in the PBS documentation: see 'man qsub' for qsub scripts in general, and 'man pbs_resources_linux' for '-l nodes' and other attributes you can assign to your jobs.
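
To show where such a directive sits in practice, here is a minimal job script sketch. The job name example_job is hypothetical, and the script assumes that PBS sets PBS_O_WORKDIR to the directory qsub was run from; check 'man qsub' on your system before relying on either.

```shell
#!/bin/sh
# Minimal job script sketch; example_job is a hypothetical job name.
#PBS -N example_job
#PBS -l nodes=2:nova

# Assumption: PBS_O_WORKDIR holds the directory qsub was run from.
# Fall back to the current directory when it is not set.
cd "${PBS_O_WORKDIR:-.}" || exit 1

# Record which node the script started on.
node_name=$(uname -n)
echo "Job started on $node_name"
```

Saved as, say, myjob.sh, a script like this would be submitted with 'qsub myjob.sh'. The #PBS lines are ordinary comments to the shell, so the script can also be run directly for a quick check.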

III. MPI commandline

Launch the MPI commandline from the shell as normal, but add the -pbs parameter to the argument list. Omit arguments such as -machinefile and -nolocal, since PBS will provide the process with a machinefile (if you do not omit them, they will be ignored). Use -np as normal; mpirun will figure out how many nodes you need.
Example mpirun calls:
mpirun -pbs -np 2 a.out
mpirun -pbs -np 4 -nodetype vega a.out

The -nodetype argument requests a node type in the same way as '-l nodes', described above. Currently available -nodetype arguments are 'star' and 'vega'.

IV. Job exit status

>>>>IF YOU DO NOT HAVE A .logout FILE, PLEASE DISREGARD THE FOLLOWING<<<<
Job exit status can be misreported by PBS if you use csh. This is because PBS processes your .logout file after the qsub script terminates, so the status of the last command in .logout can replace that of your job. To ensure the correct exit value is returned, you can add the following line as the first line in your .logout:
set EXITVAL = $status
And add the following as the last line in your .logout:
exit $EXITVAL
If you don't make these modifications, inter-job dependencies can be affected (i.e. if you queue one job to begin after the successful or unsuccessful completion of another job).
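
The two .logout lines work by saving the job's status before any later command can overwrite it. The same save-and-restore pattern is sketched below in sh, purely as an illustration; run_with_cleanup is a hypothetical function, and the csh lines above remain what belongs in .logout.

```shell
# Run a command, then a cleanup step, and return the command's own status.
run_with_cleanup() {
    "$@"
    exitval=$?          # save the status first, like 'set EXITVAL = $status'
    true                # stand-in for cleanup commands; this resets $? to 0
    return "$exitval"   # hand back the saved status, like 'exit $EXITVAL'
}

run_with_cleanup false || echo "status: $?"   # prints "status: 1"
```

Without the save and restore, the function would report the status of the cleanup step (0) rather than that of the command it ran.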

V. Delivery of output files
Stdout and Stderr of your job will be returned to you under the following names:

      jobname.o.jobID for output and 
      jobname.e.jobID for error
where jobname is the user-specified job name and jobID is the PBS-assigned job ID. These files will be placed in the directory from which you ran qsub.

It is important to note that if you redirect your stdout and stderr from your qsub script, these files should be empty (with the possible exception of a warning message at the head of the stdout file if you use csh, as outlined above).

You will need to read the following section on File Stage-in/out to ensure you receive these files properly.

Delivery of output files can fail under the following circumstances:

  • If your .cshrc file outputs any characters to stdout such as an echo, this can cause a failure. See (I.) - Shell initialization files for details on how to avoid this.
  • If you don't submit your job from a directory where you have write access, the PBS system may be unable to return your stdout and stderr files to you. This is due to the fact that PBS initializes the process under your name and attempts to return these files to the directory where it was spawned from. To avoid this, always run the qsub command from your home directory (or a subdirectory thereof).
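
A quick check before submitting can catch the second case. The sketch below uses a hypothetical helper, can_receive_output, which only tests that the directory exists and is writable by you; it does not contact PBS.

```shell
# Hypothetical pre-submission check: succeed only if the given directory
# (default: the current one) exists and is writable by the current user.
can_receive_output() {
    dir=${1:-.}
    [ -d "$dir" ] && [ -w "$dir" ]
}

if can_receive_output; then
    echo "OK to run qsub from $(pwd)"
else
    echo "No write access to $(pwd); submit from your home directory instead" >&2
fi
```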

VI. File stage-in/out

    If your process requires input files or creates files as output, you have two options for dealing with this. If the files are in your home directory, proceed as normal (since the home directories are mounted on each node). Otherwise (for example, if you are using the tmp directory), you will need to use file staging by adding one or both of the following arguments to your qsub call (they can also be placed inside your qsub script):

            -W stagein=node_file@starzero:starzero_file AND/OR
            -W stageout=node_file@starzero:starzero_file
    where node_file is the path on the execution node and starzero_file is the path on starzero.

    PLEASE NOTE that it is possible to stage in/out an entire directory. If you wish to do this, please ensure that both node_file and starzero_file reference a directory rather than a file. If there is a mismatch, the stage in/out will fail. Also, when staging in an entire directory, the starzero_file (actually a directory) will be created as a new subdirectory of node_file. So for example:
    -W stagein=/tmp@starzero:$home/mydir
    will create a new directory called /tmp/mydir and copy all files and subdirectories of $home/mydir there. All staged-in files will be deleted when the job terminates. All staged-out files will be deleted after the stage-out process succeeds. If stage-out fails for some reason, you will receive e-mail from the PBS system and the files will remain on the remote server.

    IT IS EXTREMELY important that you not use any wildcards (i.e. * or ?) in your stage in/out directives since they will not be expanded on the execution node.
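
One way to work around the wildcard restriction is to let your own shell expand the pattern before qsub sees it, and pass the resulting explicit list. The sketch below assumes that the stagein attribute accepts a comma-separated list of entries (check 'man qsub' to confirm this for your PBS version) and that the files are visible from the host where you run qsub. stagein_list is a hypothetical helper, and it does not handle file names containing commas or spaces.

```shell
# Hypothetical helper: build a comma-separated stagein list from explicit files.
# Usage: stagein_list NODE_DIR HOST FILE...
stagein_list() {
    node_dir=$1
    host=$2
    shift 2
    list=""
    for f in "$@"; do
        # One entry per file: node_file@host:host_file
        entry="$node_dir/$(basename "$f")@$host:$f"
        list="${list:+$list,}$entry"
    done
    echo "$list"
}

stagein_list /tmp starzero /home/me/a.dat /home/me/b.dat
# prints /tmp/a.dat@starzero:/home/me/a.dat,/tmp/b.dat@starzero:/home/me/b.dat
```

In use, the pattern would be expanded on the submission side, for example: qsub -W stagein="$(stagein_list /tmp starzero "$HOME"/data/*.dat)" myjob.sh, where the data directory and myjob.sh are placeholders.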