===== Computation resources architecture =====
Below is the architecture model of our computation and storage resources.
{{ :lbt-cr:cluster-lbt.jpg?direct&500 |HPC architecture}}
All LBT computing resources (HPC, storage and IS clusters) are designed and built in-house, based on open-source software and solutions.
Currently the cluster room hosts one HPC cluster (a.k.a. supercomputer), called Baal (Lucifer being the standby login server), which is composed as follows:
* 2x Intel 24-core nodes for post-processing jobs
* 3x Intel 16-core nodes with dual Nvidia GTX-1080Ti GPUs (nodes 61 to 63)
* 11x Intel 40-core nodes with dual Nvidia GTX-1080Ti GPUs (nodes 64 to 73)
* 1x Intel 40-core node with dual Nvidia RTX-4070 GPUs (nodes 74 to 75)
* 1x Intel 40-core node with dual Nvidia RTX-2080Ti GPUs (node 76)
* 3x Intel 40-core nodes with dual Nvidia RTX-3080 GPUs (nodes 77 to 79)
* 1x Intel 40-core node with dual Nvidia RTX-3080Ti GPUs (node 80)
* 1x Intel 48-core node with dual Nvidia RTX-A6000 GPUs (node 81)
* 4x Intel 40-core nodes with dual Nvidia RTX-A5000 GPUs (nodes 82 to 85)
* note: the first 2 clusters (Lucifer and Hades) have been completely dismantled.
Since computing is of little use without storage capacity, the following storage volumes are also available (a short sketch for checking their remaining space follows this list):
* **/workdir**: ~146TB (useful capacity)
* **/archive**: ~127TB* (useful capacity, without data compression or deduplication), replicated every week to another server with the same storage capacity
* **/archive/ibpc_team**: ~43.5TB* (useful capacity, without data compression or deduplication), dedicated to non-LBT members
* **/scratch**: node-local space for temporary computing files (mainly used for single-node jobs); the first 3 GPU nodes and the post-processing nodes have around 12TB each, all other GPU nodes have 1.6 to 2TB of SSD each, and every remaining node has around 200GB
* **/scratch-dfs**: ~142TB (useful capacity) for temporary computing files in parallelized (multi-node) jobs
* (*) possibly more, once ZFS compression and deduplication are taken into account.
Some other disk spaces are not listed above because they are kept in reserve.
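As a quick, unofficial illustration, the short Python sketch below reports the remaining space on the shared volumes listed above using only the standard library; the mount points are taken from the list above, so adjust them if your node does not see all of these volumes.
<code python>
#!/usr/bin/env python3
"""Illustrative sketch: report the remaining space on the shared LBT volumes."""
import shutil

# Mount points taken from the list above; not every host mounts all of them.
VOLUMES = ["/workdir", "/archive", "/scratch", "/scratch-dfs"]

for path in VOLUMES:
    try:
        usage = shutil.disk_usage(path)
    except OSError:
        print(f"{path:<13} not mounted on this host")
        continue
    free_tb = usage.free / 1e12
    total_tb = usage.total / 1e12
    print(f"{path:<13} {free_tb:6.1f} TB free of {total_tb:6.1f} TB")
</code>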
Except for the **/archive/ibpc_team** and **/scratch** volumes, all the volumes mentioned above are **replicated and/or distributed** on a storage cluster currently composed of 11 servers.
The **/scratch** volume on each node serves only to store temporary computing files. Since it is cleaned every night (old job directories and all malformed directories are deleted), do not try to chain jobs through a single, dedicated scratch directory.
The archive volume (**/archive**) is not available on the computing nodes.
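As an illustration of the intended workflow, here is a minimal Python sketch (not an LBT-provided tool): it creates a fresh per-job directory under **/scratch**, runs the computation there, and copies the results back to **/workdir** before the nightly cleanup; archiving to **/archive** can then be done from the login server. The ''JOB_ID'' variable, the result path and the placeholder command are hypothetical and must be adapted to your own job.
<code python>
#!/usr/bin/env python3
"""Illustrative per-job scratch workflow (paths and commands are placeholders)."""
import os
import shutil
import subprocess
import tempfile

JOB_ID = os.environ.get("JOB_ID", "interactive")  # placeholder: use your scheduler's job id
# Hypothetical destination on the shared /workdir volume.
RESULT_DIR = os.path.join("/workdir", os.environ.get("USER", "unknown"), "results", JOB_ID)

# 1. Create a fresh, job-specific directory under the node-local /scratch.
scratch_dir = tempfile.mkdtemp(prefix=f"job-{JOB_ID}-", dir="/scratch")

try:
    # 2. Run the computation inside the scratch directory
    #    (placeholder command: replace with your actual simulation).
    subprocess.run(["bash", "-c", "echo 'fake results' > out.dat"],
                   cwd=scratch_dir, check=True)

    # 3. Copy the results back to /workdir before the nightly /scratch cleanup.
    os.makedirs(RESULT_DIR, exist_ok=True)
    shutil.copy2(os.path.join(scratch_dir, "out.dat"), RESULT_DIR)
finally:
    # 4. Clean up the scratch directory instead of relying on the nightly purge.
    shutil.rmtree(scratch_dir, ignore_errors=True)
</code>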
To get the best possible performance, we chose a high-throughput, low-latency network technology: **InfiniBand QDR** (40Gb/s).
Below is the year-by-year evolution of computing performance and storage capacity:
{{:lbt-cr:lbt-cpu-end2023.png?200|CPU computing power evolution}}.{{:lbt-cr:lbt-gpgpu-end2023.png?200|GPU-accelerated computing power evolution}}.{{:lbt-cr:lbt-storage-end2023.png?200|Storage evolution}}