Cluster usage rules

The first rule of the LBT computational clusters: it is strictly forbidden to run any kind of script/job/daemon on the master nodes. Anyone caught infringing this rule will be removed from the list of authorized users.

Depending on its financial investment, each group has its own quotas for both computing and storage capacity, and these may differ from one cluster to another.

The default quota is 750GB / 60kf per user on /workdir. Below are the respective per-semester quotas for Baal:

| Group name | Project account | Dedicated CPU time | Archive quota | Workdir quota** |
|---|---|---|---|---|
| amyloid_team | amyloid_project | 0h | 7TB - 140kf | default only |
| baaden_team | baaden_project | 0h | 18TB - 360kf | default only |
| barraud_team* | barraud_project | 0h | no quota | default only |
| biou_team* | biou_project | 0h | 7TB - 140kf | 15TB - 1200kf |
| boel_team* | boel_project | 0h | no quota | default only |
| bioinfo_team* | bioinfo_project | 0h | 1TB - 20kf | default only |
| derreumaux_team | derreumaux_project | 0h | 5TB - 100kf | default only |
| lafontaine_team* | bioinfo_project | 0h | 1TB - 20kf | default only |
| meyer_team* | meyer_project | 0h | no quota | default only |
| sacquin_team | sacquin_project | 175680h | 13TB - 260kf | default only |
| simlab_team | simlab_project | 1497672h | 5TB - 100kf | default only |
| sterpone_team | sterpone_project | 0h | 35TB - 700kf | default only |
| stirnemann_team | stirnemann_project | 1633824h | 25TB - 500kf | 25TB - 2000kf** |
| robert_team | robert_project | 338184h | 3TB - 60kf | 3TB - 240kf** |
| vallon_team* | vallon_project | 0h | no quota | default only |
| ej_team | ej_project | 676368h | no quota | 8.5TB - 680kf** |

*: the archive volumes for these IBPC UNIX groups are located on a dedicated server; they are neither distributed nor replicated.
**: some group leaders purchased their own storage space, so, in all fairness, they get their purchased quota plus the default quota per group member. For LBT members, the Baal workdir quotas above are therefore approximate, since they depend on the current number of group members (permanent staff, PhD students, post-docs, trainees, etc.).

On all clusters, depending on your laboratory assignment (LBT or elsewhere) or membership agreement, you may use the simlab_project CPU accounting (and/or others) to run your jobs.

As you may have noticed, there is no quota on the /scratch[-dfs] directories of the compute nodes, because you must clean up all temporary directories at the end of each job. If you do not, an automatic script will do it for you. Keep this in mind and synchronize your produced data back (e.g. to /workdir) before your job ends.

A script runs every hour to check quota status. If you exceed any disk quota, your PI gets a one-day reprieve to fix the problem (plus one more day in quarantine); failing that, your group will be excluded from the access policy and your session(s) killed until a solution is found.
You can monitor your storage consumption by reading that script's output files (/shared/cluster_help/*.quota).
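For example, you can inspect these reports from a login session. A minimal sketch, assuming one report file per group (the per-group file naming is hypothetical; adapt it to what you actually find under /shared/cluster_help):

```shell
# List the hourly quota reports mentioned above
ls /shared/cluster_help/*.quota 2>/dev/null || true

# Show the report for your primary UNIX group
# (one-file-per-group naming is an assumption)
GROUP=$(id -gn)
cat "/shared/cluster_help/${GROUP}.quota" 2>/dev/null || echo "no report found for ${GROUP}"
```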

Keep in mind that the homedir you land in is in fact a bind mount into your own workdir space.

Note that if you have completely exhausted your CPU time allocation (or your project has no dedicated CPU time allocation) and you are a member of the LBT laboratory, you are allowed to use the simlab_project allocation.

In addition to these disk space and CPU time allocations, be aware that the clusters (which do not all use the same hardware) provide different queues:

| Queue name | Available nodes | Number of nodes | Number of cores | Maximum walltime |
|---|---|---|---|---|
| monop | 2 | 1 | 1 - 4 | 144:00:00 |
| gpu_40c_(h/1/2)n* | 25 | ½ / 1 / 2 | 20 / 40 / 80 | 16:00:00 / 12:00:00 / 8:00:00 |
| nogpu_40c_(h/1)n | 25 | ½ / 1 | 20 / 40 | 16:00:00 / 12:00:00 |
| alphafold_(h/1)n | 3 | ½ / 1 | 8 / 16 | 168:00:00 |
| alphafold2_(h/1)n | 5 | ½ / 1 | 24 / 48 | 168:00:00 |

*: this cluster is mainly dedicated to GPU computing. That is why gpu_40c_1n is the default queue: any job submitted without a queue specification will run on the GPU 40-core nodes (not on the others).

You should also pay attention to the amount of memory actually consumed by your jobs: it should not exceed the fraction of the node's cores that you requested.
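The arithmetic behind this rule can be sketched as follows, assuming a hypothetical 40-core node with 192 GB of RAM (substitute the real figures for the node type you use):

```shell
# Hypothetical node characteristics -- replace with the real values
NODE_CORES=40
NODE_MEM_GB=192

# Cores requested by the job
JOB_CORES=10

# Fair memory share = node memory * (requested cores / node cores)
MAX_MEM_GB=$(( NODE_MEM_GB * JOB_CORES / NODE_CORES ))
echo "A ${JOB_CORES}-core job should stay under ${MAX_MEM_GB} GB of RAM"
```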

Last, but not least: if your job runs on a single node, you MUST use the /scratch volume for all temporary files (i.e. do not read/write directly on /workdir during the computation). If your job runs on multiple nodes, favor the /scratch-dfs volume.
The easiest way to do this is to use the submission script provided here.
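The site-provided submission script already automates this pattern; purely for illustration, here is a minimal sketch of what it does. The PBS-style directives and the results/ destination are assumptions (adapt them to the scheduler actually used), and a temporary-directory fallback is used so the sketch can run outside the cluster:

```shell
#!/bin/bash
# Illustrative single-node job skeleton. The scheduler directives below
# are PBS-style ASSUMPTIONS -- adjust to the cluster's real scheduler.
#PBS -q gpu_40c_1n
#PBS -l walltime=12:00:00

# Job-private directory on the node-local /scratch volume; here we fall
# back to a temporary directory so the sketch runs anywhere.
SCRATCH_DIR="${SCRATCH_DIR:-$(mktemp -d)}"
WORKDIR="${WORKDIR:-$PWD}"

cd "$SCRATCH_DIR" || exit 1

# ... run the actual computation here, reading/writing in $SCRATCH_DIR ...
echo "result" > output.dat

# Synchronize the produced data back before the job ends (rsync -a is
# the usual tool; plain cp is used here so the sketch is self-contained),
# then clean up the scratch directory as the rules require.
mkdir -p "$WORKDIR/results"
cp -a "$SCRATCH_DIR/." "$WORKDIR/results/"
cd "$WORKDIR" && rm -rf "$SCRATCH_DIR"
```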