Computation resources architecture

Below is the architecture model of our computation and storage resources.

[Figure: HPC architecture]

The cluster room currently hosts 3 HPC clusters (a.k.a. supercomputers), called Hades, Lucifer and Baal, composed as follows:

  • Lucifer (896 cores and 896GB RAM):
    • 28x AMD 32-core nodes for parallel jobs

The Lucifer HPC cluster is being dismantled.

  • Hades (348 cores and 1.2TB RAM):
    • 29x Intel 12-core nodes for parallel jobs
  • Baal (616 CPU cores, ~115k GPU cores and ~2TB RAM):
    • 1x Intel 8-core node for test jobs (20 minutes max)
    • 2x Intel 24-core nodes for post-processing jobs
    • 3x Intel 16-core nodes with dual NVIDIA GPUs (dedicated to GPU jobs)
    • 13x Intel 40-core nodes with dual NVIDIA GPUs (dedicated to GPU jobs)
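
As a quick sanity check, the per-node figures in the list above can be summed and compared with the stated totals. This is a minimal sketch: the dictionary layout and the function names are illustrative choices, and only the numbers come from this page.

```python
# Node inventory copied from the list above: (node_count, cores_per_node).
# "stated" is the total announced next to each cluster name.
CLUSTERS = {
    "Lucifer": {"stated": 896, "nodes": [(28, 32)]},
    "Hades": {"stated": 348, "nodes": [(29, 12)]},
    "Baal": {"stated": 616, "nodes": [(1, 8), (2, 24), (3, 16), (13, 40)]},
}


def total_cores(nodes):
    """Sum node_count * cores_per_node over all node groups."""
    return sum(count * cores for count, cores in nodes)


def core_report(clusters=CLUSTERS):
    """Return {cluster_name: (computed_total, stated_total)}."""
    return {
        name: (total_cores(info["nodes"]), info["stated"])
        for name, info in clusters.items()
    }


if __name__ == "__main__":
    for name, (computed, stated) in core_report().items():
        print(f"{name}: computed {computed} cores, stated {stated} cores")
```

For Lucifer and Hades the computed and stated totals match. For Baal the listed node groups add up to 624 cores, 8 more than the stated 616. One possible reading, which is an assumption and not confirmed by this page, is that the 8-core test node is left out of the stated total.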

Since computing is of little use without storage capacity, the following storage volumes are also available:

  • /home: ~25TB (useful capacity) replicated in real time
  • /workdir: ~49TB (useful capacity) for each of the Hades and Lucifer clusters, ~146TB for Baal
  • /workdir/ibpc_team: ~19TB (useful capacity) dedicated to non-LBT members
  • /archive: ~112TB (useful capacity) replicated every week
  • /archive/ibpc_team: ~33TB (useful capacity) dedicated to non-LBT members
  • /scratch: around 200GB on each node for temporary computing files (mainly used for single-node jobs), except for the first 3 GPU nodes and the post-processing nodes, which have around 12TB each, and all other GPU nodes, which have 1.6TB of SSD each

Except for the *ibpc_team and /scratch volumes, all of the above volumes are replicated and/or distributed across a storage cluster currently composed of 11 servers.

The /scratch volume on each node is only meant to store temporary computing files. Because it is cleared every night (directories of old jobs are deleted), you should not rely on it to chain jobs.

Only /workdir/* and /scratch are available on computing nodes.
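
Since /scratch is cleared nightly and is local to each node, a job that uses it would typically copy its results to /workdir before it ends. The sketch below illustrates that pattern. The helper name `run_in_scratch`, its arguments and the directory layout are hypothetical, it assumes /workdir is where results should be kept between jobs, and it does not depend on any particular job scheduler.

```python
import shutil
import tempfile
from pathlib import Path


def run_in_scratch(compute, job_name, scratch_root="/scratch", workdir_root="/workdir"):
    """Run compute(tmp_dir) in a private scratch directory, then save its files.

    Hypothetical helper: the name, arguments and directory layout are
    illustrative and are not defined by the cluster itself.
    """
    dest = Path(workdir_root) / job_name
    dest.mkdir(parents=True, exist_ok=True)

    # Private temporary directory on the node-local scratch volume.
    tmp_dir = Path(tempfile.mkdtemp(prefix=f"{job_name}_", dir=scratch_root))
    try:
        compute(tmp_dir)
        # Copy everything the job produced to the destination directory
        # before leaving, since scratch content does not survive the night.
        for item in tmp_dir.iterdir():
            target = dest / item.name
            if item.is_dir():
                shutil.copytree(item, target, dirs_exist_ok=True)
            else:
                shutil.copy2(item, target)
    finally:
        # Clean up the scratch directory whether or not the job succeeded.
        shutil.rmtree(tmp_dir, ignore_errors=True)
    return dest
```

A call such as `run_in_scratch(my_step, "step1")`, where `my_step` writes its output files into the directory it receives, leaves the results under /workdir/step1, from which a later job can read them. Note that `dirs_exist_ok` requires Python 3.8 or later.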

To get the best possible performance, we chose a high-throughput, low-latency network technology: InfiniBand QDR (40Gb/s).
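
To give an order of magnitude, the sketch below estimates a theoretical lower bound on transfer time over such a link. It assumes a 4x QDR link, whose 40Gb/s signaling rate uses 8b/10b encoding and therefore carries at most 32Gb/s of data. The function name is hypothetical, and protocol overhead, storage speed and network contention are ignored, so real transfers will be slower.

```python
def qdr_transfer_seconds(size_bytes, signaling_gbps=40.0, encoding_efficiency=8 / 10):
    """Theoretical minimum time (seconds) to move size_bytes over the link.

    signaling_gbps is the raw signaling rate in Gb/s; encoding_efficiency
    accounts for 8b/10b encoding (8 data bits per 10 transmitted bits).
    """
    data_bits_per_second = signaling_gbps * 1e9 * encoding_efficiency
    return size_bytes * 8 / data_bits_per_second


if __name__ == "__main__":
    one_terabyte = 1e12  # bytes, decimal TB
    print(f"1 TB: about {qdr_transfer_seconds(one_terabyte):.0f} s at best")
```

Under these assumptions, moving 1TB takes about 250 seconds at the very least.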

Below is the year-by-year evolution of computing performance and storage:

[Figures: CPU computing power evolution; GPU-accelerated computing power evolution; Storage evolution]

cluster-lbt/architecture.txt · Last modified: 2019/12/23 17:48 by admin