OpenPBS Commands

The simulation cluster uses the OpenPBS job scheduler to run simulations. Jobs submitted to the queue named "StdQ" are executed as soon as the requested resources are available. The following commands are useful for submitting jobs, requesting resources for jobs, and checking the status of submitted jobs.

First, set up your environment using the following commands. A table of compute nodes and their statuses should appear on the terminal. If an error occurs, please contact the GAs.

. /etc/profile
module load local pbspro
pbsnodes -ajSL
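
The node table can also be searched from the shell, which is handy before requesting a particular compute server. The following is a minimal sketch, not part of OpenPBS: node_listed is a hypothetical helper name, and it assumes the node name appears in the first column of the "pbsnodes -ajSL" output.

```shell
# Hypothetical helper: succeed if the given name appears in the first
# column of the node table (compared case-insensitively).
node_listed() {
    pbsnodes -ajSL | awk -v h="$1" '
        tolower($1) == tolower(h) { found = 1 }
        END { exit !found }
    '
}

# Usage (replace <NodeName> with the server you intend to request):
#   node_listed <NodeName> && echo "node is listed"
```

The helper only reports whether the name is listed; it does not check whether the node is free.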

To get a terminal on a compute node for running simulations, submit an interactive job with the qsub command. Two switches are used in this example. The first, -I, makes the job interactive: the user is given a terminal for interacting with the compute node. The second, -X, enables X11 forwarding for the job, which is needed if the interactive job requires access to a graphical user interface (GUI).

qsub -I -X

To check on the jobs submitted to the queue, run the qstat command.
This command returns the list of jobs submitted to the queue and their statuses. The common statuses are R=running, H=held, Q=queued and E=exiting.

qstat

When a job is in the "held" or "queued" state, there may be errors preventing it from running. To check, take note of the job ID (the number associated with the job) and run the qstat command with the -f switch.

Replace <JobID> with the actual job ID number. In the returned output, the "comment" field should show any errors preventing the job from running. This can happen when the requested resources (e.g., number of CPUs, amount of memory, specific machines) are not available. Sometimes the problem lies with the script being run or with the software; in that case, check the log file to resolve it.

qstat -f <JobID>
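
Since the full output of "qstat -f" can be long, it may help to show only the comment. The following is a minimal sketch under that assumption; job_comment is a hypothetical helper name, and it assumes the attribute is printed on a line containing "comment =".

```shell
# Hypothetical helper: print only the comment line(s) of a job's
# full status. Long comments may wrap onto a following line, which
# this simple filter does not capture.
job_comment() {
    qstat -f "$1" | grep -i 'comment ='
}

# Usage: job_comment <JobID>
```

If nothing is printed, the job may have no comment set, and the full "qstat -f" output should be read instead.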

When you need to kill a submitted job, whether due to insufficient resources or for any other reason, take note of the job ID and run the qdel command.

Replace <JobID> with the actual number corresponding to your job. Note that it may take some time for the job to be fully killed. Make sure the job is fully killed before submitting a new job to the queue.
Importantly, if a submitted job requests a specific compute server and the name of that server does not appear in the list returned by "pbsnodes -ajSL", the job will be queued indefinitely. (As a courtesy to every user, please kill all such jobs that you have submitted.)

qdel <JobID>
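
Because a killed job can linger for a while, one way to confirm it is gone is to poll qstat until it stops reporting the job. The following is a minimal sketch; wait_for_job_exit is a hypothetical helper name, and it assumes qstat exits with a non-zero status once the job has left the queue.

```shell
# Hypothetical helper: block until qstat no longer reports the job.
# Assumption: qstat returns non-zero for a job that is no longer queued.
wait_for_job_exit() {
    job_id="$1"
    interval="${2:-5}"   # seconds between checks (default 5)
    while qstat "$job_id" >/dev/null 2>&1; do
        sleep "$interval"
    done
    echo "Job $job_id is no longer in the queue"
}

# Usage: qdel <JobID> && wait_for_job_exit <JobID>
```

A new job can then be submitted once the helper returns.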

To request specific resources, run the qsub command with the -l switch (one for every resource requested). The available resources and their names are described in the table.

qsub -l <ResourceName>=<ResourceRequested>

Resource Name	Description
ncpus	Number of virtual CPUs. The default is 1.
mem	Amount of RAM. The default is 1GB.
walltime	Maximum amount of runtime. The default is 11:00:00, which is 11 hours. The scheduler assumes the job is dead if the simulation exceeds this time, and will kill jobs exceeding their specified walltime.
cudamem	Amount of GPU RAM on CUDA devices (i.e., NVidia GPUs). The default is 0.
rocmmem	Amount of GPU RAM on ROCm devices (i.e., AMD Instinct GPUs). The default is 0.
host	Specific compute server (by name). By default, any compute server that has the other requested resources available will be selected. A list of compute servers is obtained by running the "pbsnodes -ajSL" command. This resource is case-insensitive.
mpiprocs	If running an MPI program, this resource specifies the number of MPI processes that will be started on the compute nodes. This should be less than or equal to the ncpus requested for the job.
select	Number of compute nodes. This is only used for running MPI programs.
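
As an illustration of combining several resources, the sketch below assembles an interactive qsub command with one -l switch per resource. The values (4 CPUs, 8 GB of RAM, 2 hours) are arbitrary assumptions chosen for the example, not recommendations for this cluster.

```shell
# Illustrative values only; adjust them to what the simulation needs.
ncpus=4
mem=8gb
walltime=02:00:00

# One -l switch per requested resource, using names from the table.
cmd="qsub -I -l ncpus=${ncpus} -l mem=${mem} -l walltime=${walltime}"

# Print the command for review before running it at the prompt.
echo "$cmd"
```

Printing the command first makes it easy to double-check the request against the table before it is submitted.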