This webpage provides a high-level overview of the machine learning cluster (mlcluster, full name HACC-mlcluster), which is part of the Heterogeneous Accelerated Compute Cluster (HACC) program. The HACC program was formerly known as the Xilinx Adaptive Compute Clusters (XACC).
This cluster offers different resources than the first UIUC HACC cluster. The first HACC cluster is focused entirely on FPGAs and offers access to several powerful Alveo boards such as the U250 and U280. The HACC-mlcluster instead offers access to AMD MI210 GPUs and AMD-Xilinx FPGAs, namely the U55C and VCK5000.
The cluster is composed of 1 head node and 3 worker (compute) nodes. Features:
- 1x Head Node with many CPU cores
  - Manages the Kubernetes controller and the Kueue scheduler.
  - Resources are available for users to develop software and hardware for deployment on our GPUs and FPGAs.
  - Hosts 14 TB of NFS storage that users can use for development and ML training.
- 3x Worker Nodes
  - 128 CPU cores and 512 GB of RAM on each machine
  - 12 MI210 GPUs in total (4 per node)
  - 3 VCK5000 and 4 U55C FPGA boards in total
- 100 Gbps Networking
  - A 100 Gbps top-of-rack switch
  - Each node is equipped with 100 Gbps NICs, which enable inter-node throughput approaching 100 Gbps.
  - Every FPGA has at least one 100 Gbps port connected to the switch.
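
As a rough illustration of what the 100 Gbps links mean in practice, the sketch below estimates a lower bound on the time needed to move data between nodes. It assumes the full line rate, ignores protocol and storage overhead, and uses decimal units (1 TB = 10^12 bytes). The function name and the example sizes are hypothetical, so real transfers will likely be slower.

```python
def ideal_transfer_seconds(size_bytes: float, link_gbps: float = 100.0) -> float:
    """Best-case transfer time at full line rate (no protocol or disk overhead)."""
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9)


if __name__ == "__main__":
    one_tb = 1e12        # hypothetical 1 TB dataset
    nfs_volume = 14e12   # the full 14 TB NFS volume mentioned above
    print(f"1 TB:  {ideal_transfer_seconds(one_tb):.0f} s")
    print(f"14 TB: {ideal_transfer_seconds(nfs_volume) / 60:.1f} min")
```

Under these assumptions, 1 TB takes about 80 seconds and the full 14 TB volume takes a little under 19 minutes. Treat both as optimistic bounds, not measured results.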
Tabulated Resources Summary
| Node Name | IP Address | CPU Cores | RAM | Accelerators |
| --- | --- | --- | --- | --- |
| Development VM | 10.0.69.26 | 90 | 216 GB | – |
| License Server VM | 10.0.69.25 | – | – | – |
| Worker Node 1 | 10.0.69.21 | 128 | 512 GB | 4x MI210, VCK5000, U55C |
| Worker Node 2 | 10.0.69.22 | 128 | 512 GB | 4x MI210, VCK5000, U55C |
| Worker Node 3 | 10.0.69.23 | 128 | 512 GB | 4x MI210, VCK5000, 2x U55C |
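
The worker-node rows of the table can be cross-checked against the totals quoted in the feature list. The sketch below encodes those rows as plain data and sums them. The `WorkerNode` class and `total_accelerators` helper are hypothetical names used only for illustration and are not part of any cluster tooling.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class WorkerNode:
    name: str
    cpu_cores: int
    ram_gb: int
    accelerators: dict  # accelerator model -> count on this node


# Values copied from the worker-node rows of the table above.
WORKERS = [
    WorkerNode("Worker Node 1", 128, 512, {"MI210": 4, "VCK5000": 1, "U55C": 1}),
    WorkerNode("Worker Node 2", 128, 512, {"MI210": 4, "VCK5000": 1, "U55C": 1}),
    WorkerNode("Worker Node 3", 128, 512, {"MI210": 4, "VCK5000": 1, "U55C": 2}),
]


def total_accelerators(nodes):
    """Sum accelerator counts per model across all nodes."""
    totals = Counter()
    for node in nodes:
        totals.update(node.accelerators)
    return totals


if __name__ == "__main__":
    print(dict(total_accelerators(WORKERS)))
    print("CPU cores:", sum(n.cpu_cores for n in WORKERS))
    print("RAM (GB):", sum(n.ram_gb for n in WORKERS))
```

Summing the rows gives 12 MI210 GPUs, 3 VCK5000 boards, and 4 U55C boards, along with 384 CPU cores and 1536 GB of RAM across the three worker nodes.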