HACC Networked FPGA Guide

The FPGAs in Compute Nodes 2 and 3 have both QSFP cages connected directly to the 100 Gb/s Top-of-Rack (TOR) switch. Some of these FPGAs are dedicated to particular users; the rest are available to all HACC users through SLURM queues. For example, u280-networked-short, u280-networked-medium, and u280-networked-long are the SLURM queues for launching jobs on a VM with a network-attached U280. Please contact a system administrator if you would like access to one of these queues. Like other SLURM-accessible FPGAs, these FPGAs are restricted to flows compatible with Xilinx-provided shells. At the time of writing, the two most popular shell-based networking frameworks are EasyNet and VNX, which provide easy-to-use TCP and UDP stacks, respectively. Note that frameworks such as these may restrict the type and version of the shell on the device, so check those requirements before choosing a framework.

HACC uses UIUC Engineering IT’s DNS and DHCP services to aid in networking. Through our IT department, each interface of each network-attached device has a dedicated IP address, hostname, and MAC address. Although the FPGA programming frameworks allow any MAC, IP, and gateway to be programmed into the device, we require users to program the correct addresses in order to minimize problems on the UIUC network. To help with this, we provide a file at /mnt/shared/alveo_addresses.json that maps each FPGA serial number to the MAC address, IP address, hostname, and gateway IP address of each interface on that FPGA. Users should read this file from their code so that the correct addressing information is programmed into the card.
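The lookup described above can be sketched in a few lines of Python. This is only an illustration: the layout of alveo_addresses.json is not documented in this guide, so the sketch assumes the top level is a JSON object keyed by serial number. The function name lookup_addresses is hypothetical and is not part of vnx_utils.py.

```python
import json

def lookup_addresses(serial, path="/mnt/shared/alveo_addresses.json"):
    """Return whatever entry the address file stores for one FPGA serial number."""
    with open(path) as f:
        table = json.load(f)
    # Assumption: the top level is an object keyed by serial number.
    # Inspect the real file and adjust if its layout differs.
    if serial not in table:
        raise KeyError(f"serial {serial!r} not found in {path}")
    return table[serial]
```

The returned entry would then supply the MAC, IP, and gateway values handed to the networking framework. Raising an error for an unknown serial number, rather than falling back to default addresses, keeps a misidentified card from being programmed with addresses that belong to another device.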

We provide a simple example that does this in /mnt/shared/uiuc_xacc_vnx_example/. The directory contains a Python driver script called vnx_test.py, a .xclbin file that programs both interfaces of the FPGA, and the VNX APIs defined in vnx_utils.py. vnx_test.py runs xbutil dump and parses the output to determine the serial number of the FPGA. It then looks up the correct MAC, IP, and gateway for the device in alveo_addresses.json and programs them into the device. After the script finishes, you should be able to ping both interfaces on the card to confirm that the network interfaces are working.
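For readers writing their own driver, the serial-number step and the final ping check might look roughly like the sketch below. It is not the code in vnx_test.py. It assumes that the xbutil dump output is JSON and that the serial number sits somewhere under a key named serial_number; that key name is a guess and should be checked against the real output. The helper names find_key, serial_from_dump, and ping_ok are hypothetical, and ping_ok assumes a Linux ping that accepts -c and -W.

```python
import json
import subprocess

def find_key(node, key):
    """Depth-first search of nested dicts/lists for the first value under `key`."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None

def serial_from_dump(dump_text, key="serial_number"):
    """Extract the serial number from dump text; `key` is an assumed name."""
    return find_key(json.loads(dump_text), key)

def ping_ok(ip, count=1, timeout_s=2):
    """Return True if `ip` answers a ping (Linux ping flags assumed)."""
    cmd = ["ping", "-c", str(count), "-W", str(timeout_s), ip]
    return subprocess.run(cmd, capture_output=True).returncode == 0
```

A driver would pass the captured text of xbutil dump to serial_from_dump, and after programming the card would call ping_ok once for each interface address.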

Please note that when a session ends, the design remains loaded on the FPGA, so it will continue to generate network traffic until the FPGA is reprogrammed. A simple way to stop this is to run xbutil validate, which loads a new design that does not use the network interfaces.
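If the session is driven from a Python script, this cleanup can be made its last step. The sketch below is one possible way to do that and is not part of the provided example. The function name reset_fpga and its runner parameter are hypothetical; the command itself is the xbutil validate invocation named above.

```python
import subprocess

def reset_fpga(command=("xbutil", "validate"), runner=subprocess.run):
    """Run the validation command so that the networked design is replaced.

    Returns True if the command exits with status 0.
    """
    # `runner` is injectable so the function can be exercised without an FPGA.
    result = runner(list(command), capture_output=True, text=True)
    return result.returncode == 0
```

Calling reset_fpga() in a finally block around the main experiment would let the cleanup run even if the experiment raises an exception.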