Using the AMD compilers on Ookami

Ookami users can take advantage of the AMD Optimizing C/C++ Compiler (AOCC) software suite, which includes a set of compilers and debuggers tuned and optimized for the AMD EPYC architecture.

While the AMD compilers should work on any Ookami x86_64 node, we recommend using them on the fj-epyc node, as it has the architecture for which AOCC is optimized. Therefore, users should first either:

A) start an interactive Slurm job and request the milan-64core partition


C) Alternatively, if no interactive session is desired, users may simply write and submit a Slurm job submission script to compile the code.
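
As an illustration of option A, an interactive session on the milan-64core partition could be requested along the following lines. This is only a sketch: the 30-minute time limit and the 64 tasks are assumed values, and the check for srun is there so the snippet fails gracefully when run away from a Slurm login node.

```shell
# Assumed values; adjust the partition, task count, and time limit as needed
PARTITION=milan-64core
WALLTIME=00:30:00

if command -v srun >/dev/null 2>&1; then
    # request one node and open an interactive shell on it
    srun -p "$PARTITION" -N 1 -n 64 --time="$WALLTIME" --pty bash
else
    echo "srun not found; run this from an Ookami login node"
fi
```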

Once on an appropriate node, load the following module:

module load aocc/3.0.0

This will add the clang, clang++, and flang executables (among others) to your $PATH.
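
To confirm that the module has taken effect, a quick check like the one below can be run. It is a minimal sketch that only looks up the three executables named above in $PATH and reports where each one resolves.

```shell
# Report where each AOCC compiler driver resolves in $PATH
checked=0
for tool in clang clang++ flang; do
    if tool_path=$(command -v "$tool"); then
        echo "$tool -> $tool_path"
    else
        echo "$tool not found in PATH (is the aocc module loaded?)"
    fi
    checked=$((checked + 1))
done
```

If the paths printed do not point into the AOCC installation, another compiler module may be taking precedence.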

Here, we will use an example matrix multiplication code to demonstrate the use of the clang++ compiler. Because this code compiles without issue and does not require any interactive troubleshooting, we can write a Slurm script to compile and run the code:

#!/usr/bin/env bash

#SBATCH --job-name=amd_example
#SBATCH --output=amd_example.log
#SBATCH --ntasks-per-node=64
#SBATCH --nodes=1
#SBATCH --time=05:00
#SBATCH -p milan-64core

# unload any modules currently loaded
module purge

# load the AOCC module
module load aocc/3.0.0

# copy the sample C++ code to the working directory
cp /lustre/projects/global/samples/ARM-sample/mm.cpp $SLURM_SUBMIT_DIR

# compile the code using the AMD clang++ compiler
clang++ mm.cpp -o mm

# run the code with dimensions 1000 x 1000 x 1000
./mm 1000 1000 1000

Let's call this script "amd-example.slurm" and submit it with sbatch:

sbatch amd-example.slurm
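
While the job is queued or running, its state can be checked with squeue, and the log can be printed once it exists. The following is a small sketch; the LOG variable is just a convenience name introduced here and matches the --output setting in the script.

```shell
LOG=amd_example.log   # same file name as --output in the Slurm script

# list your queued and running jobs, if Slurm is available on this machine
if command -v squeue >/dev/null 2>&1; then
    squeue -u "$USER"
fi

# print the log if the job has already written it
if [ -f "$LOG" ]; then
    cat "$LOG"
else
    echo "$LOG not written yet"
fi
```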

Once the job has run, you should see something similar to the following in the job's log file ("amd_example.log"), indicating that the matrix multiplication code has compiled and run successfully:

Set up of matrices took: 0.020 seconds
Performing multiply
Naive multiply took: 3.312 seconds
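
The script above compiles mm.cpp without any optimization flags. As a possible next step, the compile line could be changed to enable optimization and target the processor directly. This sketch assumes that the milan-64core nodes are Zen 3 (EPYC Milan) processors and that this AOCC version accepts -march=znver3; the output name mm_opt is a hypothetical choice to keep the original binary intact.

```shell
# Assumed flags: -O3 enables aggressive optimization, -march=znver3 targets Zen 3
CXXFLAGS="-O3 -march=znver3"

if command -v clang++ >/dev/null 2>&1 && [ -f mm.cpp ]; then
    # $CXXFLAGS is left unquoted on purpose so it splits into separate flags
    clang++ $CXXFLAGS mm.cpp -o mm_opt && ./mm_opt 1000 1000 1000
else
    echo "clang++ or mm.cpp not found; load aocc/3.0.0 and copy mm.cpp first"
fi
```

Comparing the "Naive multiply took" line from both builds gives a rough idea of how much the flags help for this code.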