HACC-mlcluster Submitting a Job

To submit a job, use the “mlcluster_job_submission” program located in /nfs/shared. All jobs are launched in a single container with access to all requested resources. The program takes the following arguments:

--help (-h): Shows the help menu.

--info (-i): Lists the pools of resources you can request. For each pool it reports the name, the number of CPUs, GPUs, and FPGAs, the amount of memory, and the maximum time a job can run.

--flavor (-f) FLAVOR: (required str) Specifies which flavor (pool of resources) to request for your job. The flavor names can be found with the --info option.

--script (-s) SCRIPT: (required str for non-interactive jobs) The path to the script that runs your job. This must be a bash script, and all output files from your program must be written to the directory containing the script or to one of its subdirectories, or they will be lost.

--name (-n): (required str) A name for the job you want to run. The name may contain only lowercase alphanumeric characters and dashes “-”; whitespace and underscores are not allowed.

--cpu COUNT: (required int) The number of CPUs to request.

--gpu COUNT: (required int) The number of GPUs (AMD MI-210) to request.

--mem AMOUNT: (required int) The amount of memory in GB to request.

--u55 COUNT: (required int) The number of U55C FPGAs to request.

--vck COUNT: (required int) The number of VCK-5000 FPGAs to request.

--time HRS: (required float) The maximum uptime (in hours) for the container.

--max: (optional) Ignores the other resource arguments and requests the maximum amount of resources for the flavor. NOTE: Such a job will likely take much longer to be scheduled, as only one machine is capable of supplying all of these resources.
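The naming rule for --name can be checked locally before submitting. The following is a minimal sketch, not part of mlcluster_job_submission: the function name is_valid_job_name is hypothetical, and rejecting an empty name is an assumption, since the rule above only lists the allowed characters.

```shell
# Hypothetical helper: succeeds only if the name consists solely of
# lowercase letters, digits, and dashes.
# Assumption: an empty name is rejected as well.
is_valid_job_name() {
    case "$1" in
        # Empty string, or any character outside the allowed set.
        # The letters are spelled out so the check does not depend on locale.
        "" | *[!abcdefghijklmnopqrstuvwxyz0123456789-]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Example: check two candidate names and report the result for each.
for name in job-name Job_Name; do
    if is_valid_job_name "$name"; then
        echo "$name: ok"
    else
        echo "$name: rejected"
    fi
done
```

Running a check like this first avoids waiting on a submission that would be refused because of its name.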

Example:

  • Non-Interactive:
    • /nfs/shared/mlcluster_job_submission -f flavorName -s ~/test_files/script.sh -n job-name --cpu 4 --gpu 1 --u55 1 --vck 0 --mem 8 --time 0.5
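Because output files must end up alongside the job script or in a subdirectory of it, a script such as the script.sh used above might resolve its own location and write everything relative to it. The following is only a sketch under that reading of the rule: the results directory, the run.log file, and the echo standing in for the real workload are all placeholders.

```shell
#!/bin/bash
# Hypothetical job script; directory and file names are placeholders.

# Resolve the directory that contains this script, so outputs land there
# regardless of the working directory the job starts in.
script_dir="$(cd -- "$(dirname -- "$0")" >/dev/null 2>&1 && pwd)"

# Keep all outputs in a subdirectory of the script's directory.
out_dir="$script_dir/results"
mkdir -p "$out_dir"

# Stand-in for the real workload: write a result file under out_dir.
echo "job finished at $(date)" > "$out_dir/run.log"
```

Using paths relative to the script's own directory, rather than the current working directory, keeps the outputs in a known place even if the container starts the script from somewhere else.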