Job submissions

Basic commands

Below are some quick and useful commands for job management:

Specifying the hardware resources you need is not mandatory, but it is strongly recommended, since the default resource request is 1 hour on 1 CPU core.
  • submit a job script:
$ qsub <jobscript>
  • run an interactive session:
$ qsub -I -A <credit account>
  • submit a new job (scripted or interactive) no matter where you are in the current filesystem:
$ qsub -d `/shared/scripts/getWorkdir.sh` [-W depend=afterok:$PBS_JOBID] <jobscript>

The getWorkdir.sh script determines the correct current directory wherever you are on the file system. This is particularly useful in two cases: to submit a new job from a job in progress that runs in the local /scratch directory (the afterok directive ensures that the current job completes successfully before the new one starts), or to start an interactive session directly from anywhere.
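
As a rough sketch of the pattern above, the dependency option can be added only when the command runs inside a job, that is, when $PBS_JOBID is set. The function name submit_followup and its two arguments are hypothetical. Only the qsub options already shown above are used.

```shell
#!/bin/sh
# submit_followup SCRIPT WORKDIR   (hypothetical helper, not a cluster tool)
# Inside a running job, the new job waits for the current one to end
# successfully (afterok). Outside a job, it is submitted without dependency.
submit_followup() {
    script=$1
    workdir=$2
    if [ -n "${PBS_JOBID:-}" ]; then
        qsub -d "$workdir" -W "depend=afterok:${PBS_JOBID}" "$script"
    else
        qsub -d "$workdir" "$script"
    fi
}
```

For example, submit_followup next.pbs "$(/shared/scripts/getWorkdir.sh)" would submit a (hypothetical) next.pbs script from the directory reported by getWorkdir.sh.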

  • stop/kill a job:
$ qdel <jobid>
  • get information about your scheduled job:
$ checkjob -vv <jobid>
  • graphically view the allocated resources:
$ pbstop
  • get usage statistics for your group for a given date:
$ gusage -s <date (yyyymmdd)> -h -p <credit account>
  • list the usage for a given date:
$ gstatement -s <date (yyyymmdd)> -h -p <credit account>

NB: the --summarize option provides a summary.

  • get your remaining allocated time:
$ mybalance -h
  • display free resources in text format:
$ showbf
  • get the estimated start date for job execution:
$ showstart <jobid>
  • list the queued jobs:
$ showq
  • get queue info and stats:
$ qstat -q

Resources request

Here are some ways to request computational resources:

  • a simple sequential job for 2 hours:
$ qsub -l walltime=2:00:00 <script>
  • an OpenMP job (on a single node) for 1 day and 8 cores:
$ qsub -l walltime=24:00:00 -l nodes=1:ppn=8 <script>
  • a parallel job with 10 nodes and 12 cores each:
$ qsub -l nodes=10:ppn=12 <script>
You should also pay attention to the amount of memory your jobs actually consume: it should not exceed the share of a node's memory that corresponds to the proportion of the node's cores you requested.
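
The memory rule above can be read as simple proportional arithmetic. The sketch below assumes that reading. The function name mem_share_mb and the node sizes in the usage note are hypothetical, so check the actual node characteristics before relying on the numbers.

```shell
#!/bin/sh
# mem_share_mb NODE_MEM_MB NODE_CORES PPN   (hypothetical helper)
# Prints the memory budget in MB for a job that requests PPN cores on a
# node with NODE_CORES cores and NODE_MEM_MB of memory, assuming memory
# is shared in proportion to the requested cores.
mem_share_mb() {
    node_mem_mb=$1
    node_cores=$2
    ppn=$3
    echo $(( node_mem_mb * ppn / node_cores ))
}
```

With these assumptions, a job using ppn=8 on a 16-core node with 64000 MB of memory should stay under the value printed by mem_share_mb 64000 16 8, that is, half of the node's memory.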

Advanced usage

On the cluster's compute nodes, you can do much more than submit a single script: for example, you can submit job arrays or chain jobs.

Job Arrays

If you have to submit several identical jobs without having to drive every submission yourself, you can use a Torque feature called Job Arrays.

Submitting job arrays is also good cluster practice. Since you are not alone in using the cluster's computing resources, you must avoid overconsuming them and leave some resources available…

Submitting job arrays

You can submit a job array by simply typing:

$ qsub -t x-y

where x and y are the array bounds; but you can also provide a comma-separated list:

$ qsub -t x,y,z

You can also limit the number of tasks that run at once by appending “%N” to the list or bounds:

$ qsub -t 1-100%10

You can use the Torque variable $PBS_ARRAYID to select input or output files.
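
Before submitting an array, it can help to check that every task will find its input file. The following is a minimal sketch, not a Torque tool: the function names are hypothetical, the data<ID>.in naming is taken from the example script on this page, and the seq command is assumed to be available.

```shell
#!/bin/sh
# expand_array_spec SPEC: print one task ID per line for a "-t" style
# specification such as "1-4", "1,5,7" or "1-100%10".
# The optional %N limit does not change the set of IDs, so it is dropped.
expand_array_spec() {
    spec=${1%%\%*}
    old_ifs=$IFS
    IFS=,
    for part in $spec; do
        case $part in
            *-*) seq "${part%-*}" "${part#*-}" ;;
            *)   echo "$part" ;;
        esac
    done
    IFS=$old_ifs
}

# missing_inputs SPEC: print the data<ID>.in files that do not exist in
# the current directory for the given array specification.
missing_inputs() {
    for id in $(expand_array_spec "$1"); do
        [ -e "data${id}.in" ] || echo "data${id}.in"
    done
}
```

Running missing_inputs 1,5,7 in the submission directory prints nothing when data1.in, data5.in and data7.in are all present.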

Script example that uses the array feature to run 3 identical jobs:

arrayjob.pbs
#!/bin/sh
#PBS -o testArray.out
#PBS -e testArray.err
#PBS -l nodes=1:ppn=3
#PBS -t 1,5,7
 
# Job information
echo "Job Id: $PBS_JOBID"
echo "List of nodes allocated for job (PBS_NODEFILE):" $(sort -u "$PBS_NODEFILE")
 
# Job part: each task reads and writes the files matching its array index
cd "$PBS_O_WORKDIR"
RUN=./simulation.x
 
$RUN "data${PBS_ARRAYID}.in" "data${PBS_ARRAYID}.out"
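
When the inputs are not numbered files, another common way to use $PBS_ARRAYID is to pick one line of a parameter file per task. The sketch below assumes a hypothetical file with one parameter set per line. The name task_params is also hypothetical.

```shell
#!/bin/sh
# task_params FILE [ID]: print line number ID of FILE.
# ID defaults to $PBS_ARRAYID, or to 1 when run outside a job array.
task_params() {
    file=$1
    id=${2:-${PBS_ARRAYID:-1}}
    case $id in
        ''|*[!0-9]*) echo "invalid task ID: $id" >&2; return 1 ;;
    esac
    sed -n "${id}p" "$file"
}
```

Inside the job script, a call such as task_params params.txt would then return the parameters of the current task.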

Viewing job array information

To view information about the tasks of a running job array, pass the -t switch to the qstat command, as in:

$ qstat -t 1,5,7

Deleting job arrays and tasks

To delete some tasks, use the following command format:

$ qdel -t 1,5,7 12345[]

Note: make sure to include the [] brackets after the job ID.

Job chains and Dependencies

Quite often, a single simulation requires multiple long runs that must be processed in sequence. One method for creating such a sequence of batch jobs is to have each job script execute the “qsub” command to submit its successor. I strongly discourage such recursive scripts since, for some jobs, you can chain them with dependencies instead.

In PBS, you can use the “qsub -W depend=…” option to create dependencies between jobs:

$ qsub -W depend=afterok:<job-ID> <new QSUB script>

Option                Description
afterok:<job-ID>      Job is scheduled if the job <job-ID> completes without errors.
afternotok:<job-ID>   Job is scheduled if the job <job-ID> exited with errors.
afterany:<job-ID>     Job is scheduled if the job <job-ID> exits, with or without errors.

Here is an example script that chains 3 jobs:

job_chaining.sh
#!/bin/bash
# Each qsub call prints the ID of the new job; the next job depends on it.
 
FIRST=$(qsub job1.pbs)
echo "$FIRST"
SECOND=$(qsub -W depend=afterany:"$FIRST" job2.pbs)
echo "$SECOND"
THIRD=$(qsub -W depend=afterany:"$SECOND" job3.pbs)
echo "$THIRD"
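
The same idea can be generalized to any number of scripts with a loop. This is only a sketch: chain_jobs is a hypothetical function name, and it relies on qsub printing the new job ID on standard output, as the script above does.

```shell
#!/bin/sh
# chain_jobs DEP_TYPE SCRIPT...
# Submits the scripts in order; each job depends on the previous one.
# DEP_TYPE is one of the options listed above (afterok, afternotok, afterany).
# Prints the ID of each submitted job and stops at the first failed qsub.
chain_jobs() {
    dep_type=$1
    shift
    prev=""
    for script in "$@"; do
        if [ -z "$prev" ]; then
            prev=$(qsub "$script") || return 1
        else
            prev=$(qsub -W "depend=${dep_type}:${prev}" "$script") || return 1
        fi
        echo "$prev"
    done
}
```

For example, chain_jobs afterok job1.pbs job2.pbs job3.pbs submits the three scripts so that each one starts only if the previous one completed without errors.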

Calculation quota statistics

At any time, you can check the evolution of your calculation quota usage and its history, half-year by half-year:

$ usagestats-<sn|amd|intel> admin_project
admin_project (2016-01-01 -> 2016-06-30): 0.0%
admin_project (2016-07-01 -> 2017-01-01): 0.0%
admin_project (2017-01-01 -> 2017-07-01): 0.0%
admin_project (2017-07-01 -> 2018-01-01): 0.0%
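
If you want to spot the periods where usage was high, the output can be filtered with a small helper. The sketch below assumes the output format shown above, with the usage percentage as the last field. The function name over_quota and the threshold are hypothetical.

```shell
#!/bin/sh
# over_quota THRESHOLD: read usagestats-style lines on standard input and
# print those whose percentage (last field) is above THRESHOLD.
over_quota() {
    awk -v limit="$1" '
        NF {
            pct = $NF              # last field, for example "42.5%"
            sub(/%$/, "", pct)     # drop the trailing percent sign
            if (pct + 0 > limit + 0) print
        }'
}
```

For example, usagestats-intel admin_project | over_quota 80 would list only the half-years in which more than 80% was used.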

Node targeting

In some very specific cases, you may need to target the nodes on which your jobs will run.

To do so:

$ qsub -l nodes=<node-name>[:<options>][+<node-name>[:<options>]]
Use this kind of option with care to avoid side effects. For example, your request never takes priority over the job scheduler's current allocation of the specified nodes and, depending on the queue, you may have to wait a very long time before your jobs start.
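
When several nodes are targeted, the resource string can be assembled from a list of names. The following sketch applies the same ppn option to every node. The function name nodes_spec and the node names in the usage note are hypothetical.

```shell
#!/bin/sh
# nodes_spec PPN NODE...: join node names into a "nodes=" resource string
# of the form nodes=<node-name>:ppn=<PPN>+<node-name>:ppn=<PPN>...
nodes_spec() {
    ppn=$1
    shift
    spec=""
    for node in "$@"; do
        # Prefix with "+" only when spec already holds a node.
        spec="${spec:+$spec+}${node}:ppn=${ppn}"
    done
    echo "nodes=${spec}"
}
```

For example, qsub -l "$(nodes_spec 4 node01 node02)" <script> would request 4 cores on each of the two named nodes.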
cluster-lbt/jobs_submission.txt · Last modified: 2021/05/31 10:30 by 127.0.0.1
