Below is the architecture model of our computation and storage resources.
All LBT computing resources (HPC, storage and IS clusters) are designed and built in-house, based on open-source software and solutions.
Currently, the cluster room hosts one* HPC cluster (a.k.a. supercomputer), called Baal (with Lucifer as the standby login server), composed as follows:
2x Intel 24-core nodes for post-processing jobs
3x Intel 16-core nodes with dual Nvidia GTX-1080Ti GPUs (nodes 61 to 63)
10x Intel 40-core nodes with dual Nvidia GTX-1080Ti GPUs (nodes 64 to 73)
2x Intel 40-core nodes with dual Nvidia RTX-4070 GPUs (nodes 74 to 75)
1x Intel 40-core node with dual Nvidia RTX-2080Ti GPUs (node 76)
3x Intel 40-core nodes with dual Nvidia RTX-3080 GPUs (nodes 77 to 79)
1x Intel 40-core node with dual Nvidia RTX-3080Ti GPUs (node 80)
1x Intel 48-core node with dual Nvidia RTX-A6000 GPUs (node 81)
4x Intel 40-core nodes with dual Nvidia RTX-A5000 GPUs (nodes 82 to 85)
* The first two clusters (Lucifer and Hades) have been completely dismantled.
Since computing power is of little use without storage capacity, the following storage volumes
are also available:
/workdir: ~146TB (usable capacity)
/archive: ~127TB* (usable capacity, without data compression or deduplication), replicated every week to another server with the same storage capacity
/archive/ibpc_team: ~43.5TB* (usable capacity, without data compression or deduplication), dedicated to non-LBT members
/scratch: node-local space for temporary computing files (mainly used for single-node jobs). The first 3 GPU nodes and the post-processing nodes have around 12TB each, all other GPU nodes have 1.6 to 2TB of SSD each, and every remaining node has around 200GB.
/scratch-dfs: ~142TB (usable capacity) for temporary computing files in parallelized (multi-node) jobs
* possibly more, thanks to ZFS compression and deduplication.
Other disk volumes are not listed above because they are kept in reserve.
Except for the /archive/ibpc_team and /scratch volumes, all of the volumes mentioned above are replicated and/or distributed on a storage cluster currently composed of 11 servers.
The /scratch volume on each node is only for storing temporary computing files. Because it is cleaned every night (old job directories and any malformed directories are deleted), you should not try to chain jobs through a single, dedicated scratch directory.
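For illustration only, here is a minimal Python sketch of this per-job pattern: create a dedicated scratch directory, stage inputs from /workdir, compute locally, copy the results back, and clean up before the job ends. The scheduler variable (SLURM_JOB_ID), the paths and the solver name are assumptions made for the example, not part of the cluster configuration described here.

    import getpass
    import os
    import shutil
    import subprocess

    # Assumption: a SLURM-style scheduler exports SLURM_JOB_ID; fall back to the PID otherwise.
    job_id = os.environ.get("SLURM_JOB_ID", str(os.getpid()))
    workdir = "/workdir/my_project"                      # hypothetical project directory
    scratch = f"/scratch/{getpass.getuser()}_{job_id}"   # one dedicated directory per job

    os.makedirs(scratch, exist_ok=True)
    try:
        # Stage input files from /workdir onto the node-local scratch space.
        shutil.copy(os.path.join(workdir, "input.dat"), scratch)

        # Run the computation inside the scratch directory ("./my_solver" is a placeholder).
        subprocess.run(["./my_solver", "input.dat"], cwd=scratch, check=True)

        # Copy results back to /workdir before the job ends: /scratch is purged
        # every night and is not shared between nodes.
        shutil.copy(os.path.join(scratch, "output.dat"), workdir)
    finally:
        # Remove the per-job directory so the nightly cleanup finds nothing stale.
        shutil.rmtree(scratch, ignore_errors=True)

Jobs that span several nodes or need to share intermediate files should use /scratch-dfs instead of the node-local /scratch.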
The archive volume (/archive) is not available on the computing nodes.
To achieve the best possible performance, we chose a high-throughput, low-latency network technology: InfiniBand QDR (40Gb/s).
Below is the year-by-year evolution of computing performance and storage capacity: