Cluster usage rules

The first rule to know and follow when using the LBT computational clusters is that it is strictly forbidden to run any kind of script/job/daemon on the master nodes. Anyone caught infringing this rule will be removed from the list of authorized users.

Each group has its own quotas, which depend on its financial investment, for both computing and storage capacity; these quotas may differ from one cluster to another.

Below are the respective six-month quotas for Hades and Baal:

  • on Hades:
Group name        | Project account     | CPU time   | Archive quotas | Workdir quotas
amyloid_team      | amyloid_project     | 0h         | 7TB - 140kf    | 200GB - 16kf
baaden_team       | baaden_project      | 0h         | 18TB - 360kf   | 26.8TB - 2144kf
biou_team*        | biou_project        | 0h         | 7TB - 140kf    | 7TB - 560kf
derreumaux_team   | derreumaux_project  | 0h         | 5TB - 100kf    | 3.4TB - 272kf
lafontaine_team*  | lafontaine_project  | 153792h**  | 2TB - 40kf     | 2TB - 160kf
sacquin_team      | sacquin_project     | 0h         | 13TB - 260kf   | 200GB - 16kf
simlab_team       | simlab_project      | 1528416h   | 4TB - 80kf     | 1TB - 80kf
sterpone_team     | sterpone_project    | 0h         | 35TB - 700kf   | 9.4TB - 752kf
stirnemann_team   | stirnemann_project  | 0h         | 25TB - 500kf   | 200GB - 16kf
robert_team       | robert_project      | 0h         | 3TB - 60kf     | /
  • on Baal:
Group name        | Project account     | CPU time   | Archive quotas | Workdir quotas
amyloid_team      | amyloid_project     | 0h         | 7TB - 140kf    | 500GB - 40kf
baaden_team       | baaden_project      | 0h         | 18TB - 360kf   | 2.5TB - 200kf
biou_team*        | biou_project        | 0h         | 7TB - 140kf    | 7TB - 560kf
derreumaux_team   | derreumaux_project  | 0h         | 5TB - 100kf    | 500GB - 40kf
lafontaine_team*  | lafontaine_project  | 0h         | 2TB - 40kf     | 2TB - 160kf
sacquin_team      | sacquin_project     | 0h         | 13TB - 260kf   | 500GB - 50kf
simlab_team       | simlab_project      | 909144h    | 4TB - 80kf     | 1TB - 80kf
sterpone_team     | sterpone_project    | 0h         | 35TB - 700kf   | 2.5TB - 200kf
stirnemann_team   | stirnemann_project  | 1633824h   | 25TB - 500kf   | 25TB - 2Mf
robert_team       | robert_project      | 162504h    | 3TB - 60kf     | 3TB - 240kf

*: the archive and workdir volumes dedicated to the biou_team and lafontaine_team UNIX groups are located on the same server (on two distinct block devices) and are shared by all clusters; they are therefore neither distributed nor replicated.
**: temporary allocation granted by over-booking.

On all clusters, depending on your laboratory affiliation (LBT or elsewhere) or membership agreement, you may charge your jobs to the simlab_project CPU account (and/or others).

As you can see, there is no quota on the /scratch directory of any compute node, because you must clean up all your temporary directories at the end of each job. If you do not, an automatic script will do it for you. Keep this in mind and synchronize your produced data back to your workdir before your job ends (a minimal sketch is given below).
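
For example, the final step of a job could look like the following minimal sketch, where $SCRATCHDIR and $WORKDIR are placeholder variables standing for the actual paths used by your job (they are not defined by the cluster itself):

  # End of job: copy produced data back to the workdir, then clean /scratch
  rsync -a "$SCRATCHDIR/" "$WORKDIR/results/"
  rm -rf "$SCRATCHDIR"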

A script runs every hour to check quota status. If you exceed any of your disk quotas, your PI gets a one-day reprieve to resolve the problem (plus one more day in quarantine); failing that, your group will lose access and your session(s) will be killed until a solution is found.
You can monitor your storage consumption by reading the output of this script (/shared/cluster_help/*.quota).
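
Assuming these reports are plain-text files listing one entry per UNIX group (an assumption about their layout, not a documented guarantee), you could filter out your own group's lines with something like:

  # Show the quota report lines mentioning your primary UNIX group
  grep -i "$(id -gn)" /shared/cluster_help/*.quota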

Keep in mind that the homedir you land in is in fact a bind mount into your own workdir space.
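
You can check this yourself from a login session, for example with:

  # The homedir should show up as a (bind) mount backed by your workdir space
  findmnt --target "$HOME"
  df -h "$HOME"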

Note that if you have completely exhausted your CPU time allocation (or if your project has no dedicated CPU time allocation) and you are an LBT laboratory member, you are allowed to use the simlab_project allocation.
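
How a job is charged to a given project account depends on the batch scheduler, which this page does not name; purely as an illustration, the usual directives look like the lines below (check the locally provided submission-script template for the correct syntax):

  # PBS/Torque-style accounting directive (illustrative only)
  #PBS -A simlab_project
  # SLURM-style equivalent (illustrative only)
  #SBATCH --account=simlab_project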

In addition to these disk space and CPU time allocations, be aware that the clusters (which do not use the same hardware) provide different queues:

  • Hades:
Queue name  | Number of nodes | Number of cores | Maximum Walltime
complete    | 21 - 29         | 126 - 348       | 04:00:00
large       | 8 - 20          | 48 - 240        | 12:00:00
medium      | 4 - 7           | 24 - 84         | 24:00:00
small*      | 1 - 3           | 6 - 36          | 48:00:00
small_24h*  | 1 - 2           | 6 - 24          | 24:00:00

*: the main difference between the “small” and “small_24h” queues is that the small queue can only use up to half of the cluster nodes; there is no such limitation with small_24h.

  • Baal*:
Queue name         | Number of nodes | Number of cores | Maximum Walltime
monop              | 1               | 1 - 4           | 144:00:00
cryo_em            | 1               | 16              | 168:00:00
gpu_16c (default)  | ½ / 1           | 8 / 16          | 09:00:00 / 06:00:00
gpu_40c (default)  | ½ / 1 / 2       | 20 / 40 / 80    | 16:00:00 / 12:00:00 / 08:00:00

*: this cluster is mainly dedicated to GPU computing. That is why the “gpu*” queues are the default ones: any job submitted without a queue specification will run on GPU nodes (not on the monop one).
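
Since the “gpu*” queues are the default on Baal, a queue such as monop has to be requested explicitly. The exact syntax depends on the local batch scheduler; purely as an illustration (my_job.sh is a placeholder script name):

  # PBS/Torque-style submission to an explicit queue (illustrative only)
  qsub -q monop my_job.sh
  # SLURM-style equivalent, where queues are called partitions (illustrative only)
  sbatch --partition=monop my_job.sh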

nohup, disown and similar commands that would keep your job alive on the clusters are prohibited.

Last, but not least: if you run a job on a single node, you MUST use the /scratch volume for all your temporary files (i.e. do not read/write directly on /workdir during the computation). The easiest way to do this is to use the submission script provided here.
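
If you cannot use the provided script, the pattern it implements is roughly the following sketch; all variable names, paths, and the computation command are illustrative placeholders, not the contents of the actual script:

  #!/bin/bash
  # Illustrative single-node job body: stage to /scratch, compute, copy back, clean up
  WORKDIR="$PWD"                        # submission directory on the workdir volume
  SCRATCHDIR="/scratch/$USER/job_$$"    # per-job scratch area (placeholder path)

  mkdir -p "$SCRATCHDIR"
  cp -r "$WORKDIR/input" "$SCRATCHDIR/"          # stage input files onto the node
  cd "$SCRATCHDIR"

  my_simulation_command input/                   # placeholder for the real computation

  rsync -a "$SCRATCHDIR/" "$WORKDIR/output/"     # copy everything back to the workdir
  cd "$WORKDIR"
  rm -rf "$SCRATCHDIR"                           # leave the node's /scratch clean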
