Running a Weak Scaling Test in Parthenon
- Here we present how to perform a weak scaling test on a Power9 architecture with two Volta GPUs per node.
- We use the advection test with slightly modified input parameters for performance. AMR is turned on with three levels of refinement (see the note after this list).
- The following procedure was tested on Power9 nodes on Darwin by Jonah Miller; however, the same procedure should hold more generically.
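For reference, the AMR configuration is controlled by input parameters that can also be overridden on the command line, in the same key=value form used in the run commands below. A sketch (parameter names assumed from Parthenon's standard `<parthenon/mesh>` block; verify against `parthinput.advection`):

parthenon/mesh/refinement=adaptive parthenon/mesh/numlevel=3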
To Build
- Note that, depending on your system, you may need to disable HDF5 with `-DPARTHENON_DISABLE_HDF5=ON` (see the example after the build commands).
module purge
module load cmake gcc/7.4.0 cuda/10.2 openmpi/p9/4.0.1-gcc_7.4.0 anaconda/Anaconda3.2019.10
git clone git@github.com:parthenon-hpc-lab/parthenon.git --recursive
cd parthenon
git checkout develop && git pull
git submodule update --init --recursive
mkdir -p bin && cd bin
cmake -DCMAKE_BUILD_TYPE=Release -DKokkos_ENABLE_OPENMP=True -DKokkos_ARCH_POWER9=True -DKokkos_ENABLE_CUDA=True -DKokkos_ARCH_VOLTA70=True -DCMAKE_CXX_COMPILER=${PWD}/../external/Kokkos/bin/nvcc_wrapper ..
make -j
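If you do need to build without HDF5 (see the note above), the configure line is the same with the disable flag added, for example:

cmake -DCMAKE_BUILD_TYPE=Release -DPARTHENON_DISABLE_HDF5=ON -DKokkos_ENABLE_OPENMP=True -DKokkos_ARCH_POWER9=True -DKokkos_ENABLE_CUDA=True -DKokkos_ARCH_VOLTA70=True -DCMAKE_CXX_COMPILER=${PWD}/../external/Kokkos/bin/nvcc_wrapper ..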
To Run
- Note that if you disabled HDF5 in the previous step, you must open the `parthinput.advection` file and comment out all blocks beginning with `<parthenon/output*>`; a sed shortcut is sketched below.
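A possible shortcut for commenting out the output blocks (a sketch only: it assumes `#` starts a comment in the input file and that each output block is terminated by a blank line; inspect the result before running):

sed -i.bak '/<parthenon\/output/,/^$/ s/^/#/' ../example/advection/parthinput.advection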
- Place the following in your job script. 
N=1
mpirun -np ${N} ./example/advection/advection-example -i ../example/advection/parthinput.advection parthenon/time/nlim=10 parthenon/mesh/nx1=64 parthenon/mesh/nx2=64 parthenon/mesh/nx3=64 parthenon/meshblock/nx1=32 parthenon/meshblock/nx2=32 parthenon/meshblock/nx3=32 | tee ${N}.out
N=2
mpirun -np ${N} ./example/advection/advection-example -i ../example/advection/parthinput.advection parthenon/time/nlim=10 parthenon/mesh/nx1=128 parthenon/mesh/nx2=64 parthenon/mesh/nx3=64 parthenon/meshblock/nx1=32 parthenon/meshblock/nx2=32 parthenon/meshblock/nx3=32 | tee ${N}.out
N=4
mpirun -np ${N} ./example/advection/advection-example -i ../example/advection/parthinput.advection parthenon/time/nlim=10 parthenon/mesh/nx1=128 parthenon/mesh/nx2=128 parthenon/mesh/nx3=64 parthenon/meshblock/nx1=32 parthenon/meshblock/nx2=32 parthenon/meshblock/nx3=32 | tee ${N}.out
N=8
mpirun -np ${N} ./example/advection/advection-example -i ../example/advection/parthinput.advection parthenon/time/nlim=10 parthenon/mesh/nx1=128 parthenon/mesh/nx2=128 parthenon/mesh/nx3=128 parthenon/meshblock/nx1=32 parthenon/meshblock/nx2=32 parthenon/meshblock/nx3=32 | tee ${N}.out
N=16
mpirun -np ${N} ./example/advection/advection-example -i ../example/advection/parthinput.advection parthenon/time/nlim=10 parthenon/mesh/nx1=256 parthenon/mesh/nx2=128 parthenon/mesh/nx3=128 parthenon/meshblock/nx1=32 parthenon/meshblock/nx2=32 parthenon/meshblock/nx3=32 | tee ${N}.out
# and so on
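The same sweep can also be scripted. Here is a sketch (assuming bash) that reproduces the pattern above by doubling the smallest mesh dimension at each step, which keeps the workload fixed at eight 32^3 meshblocks per rank:

nx1=64; nx2=64; nx3=64
for N in 1 2 4 8 16; do
  mpirun -np ${N} ./example/advection/advection-example -i ../example/advection/parthinput.advection parthenon/time/nlim=10 parthenon/mesh/nx1=${nx1} parthenon/mesh/nx2=${nx2} parthenon/mesh/nx3=${nx3} parthenon/meshblock/nx1=32 parthenon/meshblock/nx2=32 parthenon/meshblock/nx3=32 | tee ${N}.out
  # double the smallest mesh dimension (preferring nx1, then nx2, on ties)
  if [ ${nx1} -le ${nx2} ] && [ ${nx1} -le ${nx3} ]; then nx1=$((2*nx1))
  elif [ ${nx2} -le ${nx3} ]; then nx2=$((2*nx2))
  else nx3=$((2*nx3)); fi
done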
To Get the Timing Data
- We use the built-in instrumentation in Parthenon, which prints a line of the form `zone-cycles/cpu_second = <value>` at the end of each run; the `tee` commands above captured this in the `${N}.out` files.
filename=timings.dat
printf "# nprocs\tzone-cycles/cpu-second\n" > ${filename}
# make sure the upper bound on n is log2 of the largest N you ran (here 16, so 4)
for n in {0..4}; do
  np=$((2**n))
  echo ${np} $(grep "zone-cycles/cpu_second = " ${np}.out | cut -d "=" -f 2) >> ${filename}
done
You can now load the timings in your favorite plotting program.
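For a quick sanity check, a one-liner also works if gnuplot happens to be available (a convenience sketch, not part of the test itself):

gnuplot -p -e 'set logscale x 2; set xlabel "nprocs"; set ylabel "zone-cycles/cpu-second"; plot "timings.dat" using 1:2 with linespoints'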