HACC-mlcluster

This page gives a high-level overview of the machine learning cluster (HACC-mlcluster), which is part of the Heterogeneous Accelerated Compute Cluster (HACC) program. The program was formerly known as the Xilinx Adaptive Compute Clusters (XACC).

This cluster offers different resources from the first UIUC HACC cluster. The first HACC cluster focuses entirely on FPGAs and offers access to several powerful Alveo boards such as the U250 and U280. The HACC-mlcluster instead offers access to several AMD MI210 GPUs as well as AMD-Xilinx FPGAs, namely the U55C and VCK5000.

The cluster consists of 1 Head Node and 3 Worker Nodes. Features:

  • 1x Head Node with many CPU cores.
    • Runs the Kubernetes controller and the Kueue scheduler.
    • Provides resources for users to develop software and hardware for deployment on our GPUs and FPGAs.
    • Hosts 14 TB of NFS storage that our users can use for development and ML training.
  • 3x Worker Nodes
    • 128 CPU cores and 512 GB of RAM on each machine
    • 12 MI210 GPUs in total (4 per node)
    • 4x U55C and 3x VCK5000 FPGA boards in total
  • 100 Gbps Networking
    • A 100 Gbps top-of-the-rack switch
    • Each node is equipped with 100 Gbps NICs, which enable inter-node throughput approaching 100 Gbps.
    • All FPGAs have at least 1x 100 Gbps port connected to the switch
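
Because jobs on this cluster go through Kubernetes and Kueue, a GPU workload is typically described as a Kubernetes Job that names a Kueue queue. The sketch below builds such a manifest in Python and prints it as JSON. It is illustrative only and rests on several assumptions that this page does not confirm: the queue name `user-queue`, the job name, the container image `rocm/pytorch:latest`, and the use of the `amd.com/gpu` resource name (the name exposed by AMD's Kubernetes device plugin) are all hypothetical for this cluster. Check the cluster's own documentation for the real values.

```python
import json


def build_gpu_job(name, image, command, gpus=1, queue="user-queue"):
    """Return a Kubernetes Job manifest (as a dict) labeled for a Kueue queue.

    All default values are placeholders, not this cluster's actual settings.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": name,
            # Kueue picks up Jobs through this label; the queue name is assumed.
            "labels": {"kueue.x-k8s.io/queue-name": queue},
        },
        "spec": {
            # Created suspended so that Kueue decides when the Job may start.
            "suspend": True,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "main",
                            "image": image,
                            "command": command,
                            # Assumed resource name for AMD GPUs.
                            "resources": {"limits": {"amd.com/gpu": gpus}},
                        }
                    ],
                }
            },
        },
    }


# Hypothetical smoke test: request one GPU and list the visible devices.
manifest = build_gpu_job(
    name="mi210-smoke-test",
    image="rocm/pytorch:latest",
    command=["rocm-smi"],
    gpus=1,
)
print(json.dumps(manifest, indent=2))
```

Since `kubectl` accepts JSON as well as YAML, the printed manifest could be saved to a file and submitted with `kubectl apply -f`, assuming you have credentials for the cluster.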

Tabulated Resources Summary

Node Name         | IP Address | CPU Cores | RAM    | Accelerators
Development VM    | 10.0.69.26 | 90        | 216 GB | -
License Server VM | 10.0.69.25 | -         | -      | -
Worker Node 1     | 10.0.69.21 | 128       | 512 GB | 4x MI210, 1x VCK5000, 1x U55C
Worker Node 2     | 10.0.69.22 | 128       | 512 GB | 4x MI210, 1x VCK5000, 1x U55C
Worker Node 3     | 10.0.69.23 | 128       | 512 GB | 4x MI210, 1x VCK5000, 2x U55C
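
As a quick cross-check of the summary, the worker-node rows can be transcribed into a small Python structure and totaled. This is only a convenience sketch: the dictionary layout and the function name are made up for illustration, and the numbers are copied from the table above.

```python
# Worker-node inventory transcribed from the resources summary table.
WORKER_NODES = {
    "Worker Node 1": {
        "cpu_cores": 128,
        "ram_gb": 512,
        "accelerators": {"MI210": 4, "VCK5000": 1, "U55C": 1},
    },
    "Worker Node 2": {
        "cpu_cores": 128,
        "ram_gb": 512,
        "accelerators": {"MI210": 4, "VCK5000": 1, "U55C": 1},
    },
    "Worker Node 3": {
        "cpu_cores": 128,
        "ram_gb": 512,
        "accelerators": {"MI210": 4, "VCK5000": 1, "U55C": 2},
    },
}


def total_accelerators(nodes):
    """Sum accelerator counts by model across all nodes."""
    totals = {}
    for spec in nodes.values():
        for model, count in spec["accelerators"].items():
            totals[model] = totals.get(model, 0) + count
    return totals


totals = total_accelerators(WORKER_NODES)
total_cores = sum(spec["cpu_cores"] for spec in WORKER_NODES.values())
total_ram_gb = sum(spec["ram_gb"] for spec in WORKER_NODES.values())

print(totals)        # {'MI210': 12, 'VCK5000': 3, 'U55C': 4}
print(total_cores)   # 384
print(total_ram_gb)  # 1536
```

The MI210 total matches the 12 GPUs listed in the feature summary.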