===== Computation resources architecture =====

Below is the architecture model of our computation and storage resources.

{{ :lbt-cr:cluster-lbt.jpg?direct&500 |HPC architecture}}

All LBT computing resources (HPC, storage and IS clusters) are designed and built in-house, based on open-source software and solutions.

Currently the cluster room hosts one HPC cluster (a.k.a. supercomputer), called Baal (Lucifer being the standby login server), composed as follows:

  * 2x Intel 24-core nodes for post-processing jobs
  * 3x Intel 16-core nodes with dual Nvidia GTX-1080Ti GPUs (nodes 61 to 63)
  * 11x Intel 40-core nodes with dual Nvidia GTX-1080Ti GPUs (nodes 64 to 73)
  * 1x Intel 40-core node with dual Nvidia RTX-4070 GPUs (nodes 74 to 75)
  * 1x Intel 40-core node with dual Nvidia RTX-2080Ti GPUs (node 76)
  * 3x Intel 40-core nodes with dual Nvidia RTX-3080 GPUs (nodes 77 to 79)
  * 1x Intel 40-core node with dual Nvidia RTX-3080Ti GPUs (node 80)
  * 1x Intel 48-core node with dual Nvidia RTX-A6000 GPUs (node 81)
  * 4x Intel 40-core nodes with dual Nvidia RTX-A5000 GPUs (nodes 82 to 85)

The first two clusters (Lucifer and Hades) have been completely dismantled.

Because computing is of little use without storage capacity, the following storage volumes are also available:

  * **/workdir**: ~146TB (usable capacity)
  * **/archive**: ~127TB* (usable capacity, without data compression or deduplication), replicated every week to another server with the same storage capacity
  * **/archive/ibpc_team**: ~43.5TB* (usable capacity, without data compression or deduplication), dedicated to non-LBT members
  * **/scratch**: around 200GB on every node for temporary computing files (mainly used for single-node jobs), except the first 3 GPU nodes and the post-processing nodes, which have around 12TB each, and all other GPU nodes, which have 1.6 to 2TB of SSD each
  * **/scratch-dfs**: ~142TB (usable capacity) for temporary computing files in parallelized-job contexts

* possibly more thanks to ZFS compression and deduplication.

Some other disk space is not listed above because it is kept in reserve.

Except for the **/archive/ibpc_team** and **/scratch** volumes, all the above-mentioned volumes are **replicated and/or distributed** on a storage cluster currently composed of 11 servers.

The **/scratch** volume on each node serves only to store temporary computing files. Because it is cleared every night (old job directories and all malformed directories are deleted), you should not try to chain jobs through a single, dedicated scratch directory; see the sketch at the end of this section. The archive volume (**/archive**) is not available on the computing nodes.

In order to get the best possible performance, we chose a high-throughput, low-latency network technology: **InfiniBand QDR** (40Gb/s).

Below is the year-by-year evolution of computing performance and storage:

{{:lbt-cr:lbt-cpu-end2023.png?200|CPU computing power evolution}} {{:lbt-cr:lbt-gpgpu-end2023.png?200|GPU-accelerated computing power evolution}} {{:lbt-cr:lbt-storage-end2023.png?200|Storage evolution}}
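
To make the **/scratch** policy concrete, here is a minimal Python sketch of the recommended pattern: one dedicated scratch directory per job, with results copied back to **/workdir** before the job ends, since anything left in **/scratch** may be removed by the nightly purge. The ''SLURM_JOB_ID'' variable and the ''/workdir/<user>/results'' destination are illustrative assumptions only and may not match the scheduler or directory layout actually used on Baal.

<code python>
#!/usr/bin/env python3
"""Sketch of per-job /scratch usage (illustrative paths, not official)."""
import os
import shutil
import getpass
from pathlib import Path

# One dedicated directory per job: /scratch is purged every night,
# so nothing left here should be needed by a later job.
job_id = os.environ.get("SLURM_JOB_ID", str(os.getpid()))  # assumption: SLURM-like scheduler
scratch_dir = Path("/scratch") / f"{getpass.getuser()}_{job_id}"
scratch_dir.mkdir(parents=True, exist_ok=True)

try:
    # ... run the computation using scratch_dir as its temporary workspace ...
    (scratch_dir / "output.dat").write_text("example result\n")

    # Copy everything worth keeping to the replicated /workdir volume
    # *before* the job ends; do not expect to find it in /scratch later.
    dest = Path("/workdir") / getpass.getuser() / "results" / job_id  # hypothetical layout
    shutil.copytree(scratch_dir, dest, dirs_exist_ok=True)
finally:
    # Tidy up the per-job scratch directory; the nightly purge would
    # remove it anyway, but cleaning up avoids clutter on the node.
    shutil.rmtree(scratch_dir, ignore_errors=True)
</code>

The same per-job-directory idea applies to **/scratch-dfs** for parallelized jobs; only the destination of the final copy changes.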