===== Job submissions =====
==== Basic commands ====
Below are some quick and useful commands for job management:
<note warning>While it is not mandatory to specify the [[cluster-lbt:jobs_submission#resources_request|needed hardware resources]], doing so is strongly recommended, since the default resources request is 1 hour on 1 CPU core.</note>
  * submit a job script:
<cli prompt="$">
$ qsub <jobscript>
</cli>
  * run an interactive session:
<cli>
$ qsub -I -A <credit account>
</cli>
  * submit a new job (scripted or interactive) no matter where you are in the filesystem:
<cli> $ qsub -d `/shared/scripts/getWorkdir.sh` [-W depend=afterok:$PBS_JOBID] <jobscript></cli>
The //getWorkdir.sh// script determines the correct current directory wherever you are on the file system. This is particularly useful when you want to submit a new job from a job in progress running in the local /scratch directory (with the //afterok// directive to make sure the current job completes successfully before the new one runs), or to start an interactive session directly from anywhere.
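The actual content of ///shared/scripts/getWorkdir.sh// is not shown here, but the idea can be sketched as follows (a hypothetical, minimal version, assuming it simply prints the directory the new job should run in):

```bash
#!/bin/bash
# Hypothetical sketch of what a script like getWorkdir.sh may do:
# print the PBS submission directory when it is defined (i.e. when
# called from inside a running job), otherwise the current directory.
get_workdir() {
    if [ -n "$PBS_O_WORKDIR" ]; then
        echo "$PBS_O_WORKDIR"
    else
        pwd
    fi
}
```

Backticks around the script in the ''qsub -d'' call substitute its output, so the new job starts in that directory.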
  * stop/kill a job:
<cli>
$ qdel <jobid>
</cli>
  * get information about your scheduled job:
<cli>
$ checkjob -vv <jobid>
</cli>
  * graphically view the allocated resources:
<cli>
$ pbstop
</cli>
  * get usage statistics for your group since a given date:
<cli>
$ gusage -s <date (yyyymmdd)> -h -p <credit account>
</cli>
  * list the usage records since a given date:
<cli>
$ gstatement -s <date (yyyymmdd)> -h -p <credit account>
</cli>
**NB:** the ''--summarize'' option provides a summary
  * get your remaining allocated time:
<cli>
$ mybalance -h
</cli>
  * display the free resources in text format:
<cli>
$ showbf
</cli>
  * get the estimated start date for job execution:
<cli>
$ showstart <jobid>
</cli>
  * list the queued jobs:
<cli>
$ showq
</cli>
  * get queue info and stats:
<cli>
$ qstat -q
</cli>
  
==== Resources request ====
Here are some ways to request computational resources:
  * a simple sequential job for 2 hours:
<cli>
$ qsub -l walltime=2:00:00 <script>
</cli>
  * an OpenMP job (on a single node) for 1 day with 8 cores:
<cli>
$ qsub -l walltime=24:00:00 -l nodes=1:ppn=8 <script>
</cli>
  * a parallel job with 10 nodes and 12 cores each:
<cli>
$ qsub -l nodes=10:ppn=12 <script>
</cli>
<note warning>You should also pay attention to the amount of memory actually consumed by your jobs: it should not exceed the proportion of the node's cores that you requested.</note>
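As a sketch of how a memory request can be combined with the directives above, a job script header might look like this (the ''pmem'' value, Torque's per-process memory limit, is an assumption; adjust it to your nodes' actual RAM per core):

```bash
#!/bin/bash
# Sketch of a job script combining walltime, core, and memory requests.
# pmem is the per-process (per-core) memory limit; 2gb is a placeholder.
#PBS -l walltime=24:00:00
#PBS -l nodes=1:ppn=8
#PBS -l pmem=2gb

# Fall back to the current directory when run outside of PBS.
cd "${PBS_O_WORKDIR:-$PWD}"
export OMP_NUM_THREADS=8
echo "Starting OpenMP run with $OMP_NUM_THREADS threads"
```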
==== Advanced usage ====
On the computing nodes cluster you can do much more than submit a single script, such as submitting job arrays or chaining jobs.

=== Job Arrays ===
If you have to submit several identical jobs without driving every submission by hand, you can use a Torque feature called //Job Arrays//.
<note>Submitting job arrays is also good cluster practice. Since you are not alone in using the cluster computing resources, you should avoid overconsuming them and leave some resources available...</note>
== Submitting job array ==
You can submit a job array simply by typing:
<cli>
$ qsub -t x-y <jobscript>
</cli>
where x and y are the array bounds; but you can also provide a comma-separated list:
<cli>
$ qsub -t x,y,z <jobscript>
</cli>

You can also limit the number of tasks that run at once by suffixing the list/bounds with "%N":
<cli>
$ qsub -t 1-100%10 <jobscript>
</cli>

You can use the //$PBS_ARRAYID// Torque variable to parameterize your input or output files.

A script example that uses the array feature to run 3 identical jobs:
<code bash arrayjob.pbs>
#!/bin/sh
#PBS -o testArray.out
#PBS -e testArray.err
#PBS -l nodes=1:ppn=3
#PBS -t 1,5,7

# Job information
echo "Job Id:" $PBS_JOBID
echo "List of nodes allocated for job (PBS_NODEFILE): " `sort $PBS_NODEFILE | uniq`

# job part
cd $PBS_O_WORKDIR
RUN=./simulation.x

$RUN  data${PBS_ARRAYID}.in  data${PBS_ARRAYID}.out
</code>
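Another common pattern (an illustration, not a feature specific to this cluster) is to store one parameter set per line in a text file and let each array task read its own line via //$PBS_ARRAYID//:

```bash
#!/bin/bash
# Hypothetical helper: each array task reads line $PBS_ARRAYID of a
# parameter file and passes that line to the simulation.
nth_param() {
    # $1 = line number, $2 = parameter file
    sed -n "${1}p" "$2"
}

# Inside a job script you would use it as:
#   PARAMS=$(nth_param "$PBS_ARRAYID" params.txt)
#   ./simulation.x $PARAMS
```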

== Viewing job array information ==
To view information about the tasks of a running job array, pass the -t switch to the qstat command, as in:
<cli>
$ qstat -t 1,5,7
</cli>

== Deleting job arrays and tasks ==
To delete some tasks, use the following command format:
<cli>
$ qdel -t 1,5,7 12345[]
</cli>
**Note:** Make sure to use the [] brackets after the job-id.

=== Job chains and Dependencies ===
Quite often, a single simulation requires multiple long runs which must be processed in sequence. One method for creating a sequence of batch jobs is to have each job execute the "qsub" command to submit its successor. However, such recursive scripts are strongly discouraged, since jobs can instead be chained with dependencies.

In PBS, you can use the "qsub -W depend=..." option to create dependencies between jobs:
<cli>
$ qsub -W depend=afterok:<job-ID> <new QSUB script>
</cli>
<WRAP center>
^ option  ^ description  ^
| afterok:<job-ID>  | The job is scheduled if job <job-ID> exits without errors (successful completion).  |
| afternotok:<job-ID>  | The job is scheduled if job <job-ID> exited with errors.  |
| afterany:<job-ID>  | The job is scheduled when job <job-ID> exits, with or without errors.  |
</WRAP>
Here is an example script showing how to chain 3 jobs:
<code bash job_chaining.sh>
#!/bin/bash

FIRST=$(qsub job1.pbs)
echo $FIRST
SECOND=$(qsub -W depend=afterany:$FIRST job2.pbs)
echo $SECOND
THIRD=$(qsub -W depend=afterany:$SECOND job3.pbs)
echo $THIRD
</code>
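The chaining above works because qsub prints the job identifier of the submitted job, typically in a form like ''12345.master'' (the server suffix here is hypothetical and depends on your Torque configuration). If a tool needs only the numeric part, it can be stripped with shell parameter expansion:

```bash
#!/bin/bash
# Strip the server suffix from a Torque job id such as "12345.master",
# keeping only the numeric part before the first dot.
job_number() {
    echo "${1%%.*}"
}
```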
=== Calculation quota statistics ===
At any time, you can check the evolution of your calculation quota usage and its history, half-year by half-year:
<cli>
$ usagestats-<sn|amd|intel> admin_project
admin_project (2016-01-01 -> 2016-06-30): 0.0%
admin_project (2016-07-01 -> 2017-01-01): 0.0%
admin_project (2017-01-01 -> 2017-07-01): 0.0%
admin_project (2017-07-01 -> 2018-01-01): 0.0%
</cli>

=== Node targeting ===
In some very specific cases, you may need to target the nodes on which your jobs will run.

To do so:
<cli>
$ qsub -l nodes=<node-name>[:<options>][+<node-name>[:<options>]]
</cli>
<note>Use this kind of option with care to avoid side effects. For example, you never have priority over the job scheduler's current node allocation table for the specified nodes, and, depending on the queue, you may have to wait a very long time before your jobs start.</note>
cluster-lbt/jobs_submission.txt · Last modified: 2021/05/31 10:30 by 127.0.0.1
