HACC-mlcluster Tools & Storage

This page gives more detailed information on the software, tools, and storage available on HACC-mlcluster.

Operating System & Kubernetes

  • All machines (and containers) run Ubuntu 20.04 LTS
  • Resources are virtualized via Kubernetes containers/pods
  • Jobs are queued and scheduled using Kueue (a Kubernetes-native job queueing system)
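
To illustrate how a job reaches Kueue, the following is a minimal sketch that builds a standard Kubernetes batch/v1 Job manifest carrying the kueue.x-k8s.io/queue-name label, which Kueue uses to associate a job with a queue. The queue name "user-queue", the job name, the command, and the resource requests are hypothetical placeholders, not values confirmed for HACC-mlcluster; ask the administrators for the actual queue name and resource names.

```python
import json


def make_kueue_job(name, image, command, queue_name, cpu="1", memory="1Gi"):
    """Build a batch/v1 Job manifest that Kueue can admit from a LocalQueue.

    All argument values are supplied by the caller; nothing here is
    specific to HACC-mlcluster.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            # generateName lets Kubernetes append a unique suffix.
            "generateName": f"{name}-",
            # Kueue reads this label to find the queue for the job.
            "labels": {"kueue.x-k8s.io/queue-name": queue_name},
        },
        "spec": {
            # The job starts suspended; Kueue unsuspends it on admission.
            "suspend": True,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "command": command,
                            "resources": {
                                "requests": {"cpu": cpu, "memory": memory}
                            },
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # "user-queue" is a hypothetical queue name used only for illustration.
    job = make_kueue_job("hello", "ubuntu:20.04", ["echo", "hello"], "user-queue")
    print(json.dumps(job, indent=2))
```

The printed JSON can be saved to a file and submitted with kubectl create -f (create rather than apply, because the manifest uses generateName).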

FPGA-Related Software

  • Vitis/Vivado 2022.2 on the development VM
    • Including support for HLS
  • XRT 2.14.384 (a special XRT version that supports the VCK5000 board) on the worker nodes
  • FPGA platform versions:
    • U55C: xilinx_u55c_gen3x16_xdma_3_202210_1
    • VCK5000: xilinx_vck5000_gen4x8_qdma_2_202220_1
  • Note: We do not support the Vivado flow on our FPGAs; you must use Vitis-based flows.
  • Note: We do not grant root access or install custom operating systems.
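
Because only Vitis-based flows are supported, kernel builds go through the v++ compiler with one of the platform names listed above. The sketch below assembles a v++ kernel compile command for a given board. The board keys ("u55c", "vck5000"), the kernel name, and the source file name are hypothetical choices for illustration; the platform strings are the ones listed on this page.

```python
import shlex

# Platform names as listed on this page; the dictionary keys are
# illustrative short names, not official identifiers.
PLATFORMS = {
    "u55c": "xilinx_u55c_gen3x16_xdma_3_202210_1",
    "vck5000": "xilinx_vck5000_gen4x8_qdma_2_202220_1",
}


def vpp_compile_command(board, kernel_name, source, target="hw"):
    """Return the argument list for compiling one kernel to an .xo file."""
    key = board.lower()
    if key not in PLATFORMS:
        raise ValueError(f"unknown board {board!r}; expected one of {sorted(PLATFORMS)}")
    if target not in ("sw_emu", "hw_emu", "hw"):
        raise ValueError(f"unknown v++ target {target!r}")
    return [
        "v++",
        "-c",                      # compile step (source -> .xo)
        "-t", target,              # build target
        "--platform", PLATFORMS[key],
        "-k", kernel_name,         # name of the kernel function
        "-o", f"{kernel_name}.xo",
        source,
    ]


if __name__ == "__main__":
    # "vadd" and "vadd.cpp" are hypothetical kernel and file names.
    print(shlex.join(vpp_compile_command("u55c", "vadd", "vadd.cpp")))
```

Run the resulting command on the development VM, where Vitis is installed. A separate v++ link step is still needed to produce the final binary for the board.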

GPU-Related Software

  • ROCm v0.0.60000-91~20.04
  • Python 3.8.10
  • PyTorch
  • venv (Python virtual environments, via python3 -m venv)
  • We support flows that provide you with a GPU and a Jupyter notebook for easy development and debugging. See the tutorial here.
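
As a quick sanity check inside a GPU job or notebook, the sketch below reports whether PyTorch can see a GPU. It assumes the installed PyTorch is a ROCm build; such builds expose AMD GPUs through the torch.cuda API and set torch.version.hip. The function name gpu_report is a hypothetical helper, and the import is guarded so the script also runs where PyTorch is absent.

```python
def gpu_report():
    """Summarize PyTorch and GPU visibility as a plain dictionary."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch_installed": False, "gpu_available": False, "devices": []}

    available = bool(torch.cuda.is_available())
    devices = []
    if available:
        # On ROCm builds, AMD GPUs are enumerated through torch.cuda.
        devices = [
            torch.cuda.get_device_name(i)
            for i in range(torch.cuda.device_count())
        ]
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None on non-ROCm builds; a version string on ROCm builds.
        "hip_version": getattr(torch.version, "hip", None),
        "gpu_available": available,
        "devices": devices,
    }


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If gpu_available is False inside a job that requested a GPU, check that the job actually received a GPU allocation before debugging the Python environment.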

Storage

  • User home directories are hosted on a 14 TB NFS server.
    • We do not provide data backups or any guarantees about our storage. We are not responsible for any data lost as a result of cluster downtime, hardware failures, etc.
  • Additional storage can be allocated as scratch upon request.
    • Scratch space is ephemeral: it is deleted when your job finishes.
  • We limit the amount of storage per user.
    • We allocate 20 GB of storage per user by default.
    • If you need additional storage, please let us know and we will consider granting your request on a case-by-case basis.
    • If you need access to common datasets, please let us know; we may consider hosting them on shared storage that is available to multiple users.
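
To keep an eye on the default allocation, the sketch below sums file sizes under a directory and compares the total with 20 GB. It assumes the limit is measured in decimal gigabytes and counts apparent file sizes, which may differ from how the cluster actually accounts for usage; the helper names are hypothetical.

```python
import os

# 20 GB default allocation from this page; decimal GB is an assumption.
DEFAULT_QUOTA_BYTES = 20 * 10**9


def directory_usage_bytes(root):
    """Return the total apparent size of regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Skip symlinks so linked data is not counted.
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; ignore it.
                pass
    return total


def quota_fraction(used_bytes, quota_bytes=DEFAULT_QUOTA_BYTES):
    """Return used space as a fraction of the quota."""
    return used_bytes / quota_bytes


if __name__ == "__main__":
    home = os.path.expanduser("~")
    used = directory_usage_bytes(home)
    print(f"{home}: {used / 10**9:.2f} GB used "
          f"({quota_fraction(used):.1%} of the default allocation)")
```

Walking a large home directory over NFS can be slow, so run this occasionally rather than in a tight loop.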

If there are additional tools you need, please let us know. We will try to accommodate any reasonable requests. However, we reserve the right to decline any tooling requests.