Running a Weak Scaling Test in Parthenon
Here we present how to perform a weak scaling test on a Power9 architecture with two Volta GPUs per node.
We use the advection example with input parameters slightly modified for performance. AMR is enabled with three levels of refinement.
The following procedure was tested on Power9 nodes on Darwin by Jonah Miller; however, the same procedure should generalize to other machines.
To Build
Note that, depending on your system, you may need to disable HDF5 by passing -DPARTHENON_DISABLE_HDF5=ON to cmake.
module purge
module load cmake gcc/7.4.0 cuda/10.2 openmpi/p9/4.0.1-gcc_7.4.0 anaconda/Anaconda3.2019.10
git clone git@github.com:parthenon-hpc-lab/parthenon.git --recursive
cd parthenon
git checkout develop && git pull
git submodule update --init --recursive
mkdir -p bin && cd bin
cmake -DCMAKE_BUILD_TYPE=Release -DKokkos_ENABLE_OPENMP=True -DKokkos_ARCH_POWER9=True -DKokkos_ENABLE_CUDA=True -DKokkos_ARCH_VOLTA70=True -DCMAKE_CXX_COMPILER=${PWD}/../external/Kokkos/bin/nvcc_wrapper ..
make -j
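As a quick sanity check that the build succeeded, confirm the advection driver exists where the run commands below expect it (path taken from the build-tree layout above):
ls example/advection/advection-example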
To Run
Note that if you disabled HDF5 in the previous step, you must open the parthinput.advection file and comment out all blocks beginning with <parthenon/output*>.
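For example, a commented-out output block might look like the following sketch (the field names are illustrative; match whatever actually appears in your copy of parthinput.advection):
# <parthenon/output0>
# file_type = hdf5
# dt = 0.5
# variables = advected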
Place the following in your job script.
N=1
mpirun -np ${N} ./example/advection/advection-example -i ../example/advection/parthinput.advection parthenon/time/nlim=10 parthenon/mesh/nx1=64 parthenon/mesh/nx2=64 parthenon/mesh/nx3=64 parthenon/meshblock/nx1=32 parthenon/meshblock/nx2=32 parthenon/meshblock/nx3=32 | tee ${N}.out
N=2
mpirun -np ${N} ./example/advection/advection-example -i ../example/advection/parthinput.advection parthenon/time/nlim=10 parthenon/mesh/nx1=128 parthenon/mesh/nx2=64 parthenon/mesh/nx3=64 parthenon/meshblock/nx1=32 parthenon/meshblock/nx2=32 parthenon/meshblock/nx3=32 | tee ${N}.out
N=4
mpirun -np ${N} ./example/advection/advection-example -i ../example/advection/parthinput.advection parthenon/time/nlim=10 parthenon/mesh/nx1=128 parthenon/mesh/nx2=128 parthenon/mesh/nx3=64 parthenon/meshblock/nx1=32 parthenon/meshblock/nx2=32 parthenon/meshblock/nx3=32 | tee ${N}.out
N=8
mpirun -np ${N} ./example/advection/advection-example -i ../example/advection/parthinput.advection parthenon/time/nlim=10 parthenon/mesh/nx1=128 parthenon/mesh/nx2=128 parthenon/mesh/nx3=128 parthenon/meshblock/nx1=32 parthenon/meshblock/nx2=32 parthenon/meshblock/nx3=32 | tee ${N}.out
N=16
mpirun -np ${N} ./example/advection/advection-example -i ../example/advection/parthinput.advection parthenon/time/nlim=10 parthenon/mesh/nx1=256 parthenon/mesh/nx2=128 parthenon/mesh/nx3=128 parthenon/meshblock/nx1=32 parthenon/meshblock/nx2=32 parthenon/meshblock/nx3=32 | tee ${N}.out
# and so on
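Equivalently, the whole sequence can be generated with a loop that doubles one mesh dimension per step, cycling through nx1, nx2, nx3. This is a minimal bash sketch reproducing the same run sequence as above; adjust the loop bound to log2 of your largest rank count.
# Start from the N=1 mesh and double one dimension after each run.
nx1=64; nx2=64; nx3=64
dim=1
for n in {0..4}; do
  N=$((2**n))
  mpirun -np ${N} ./example/advection/advection-example -i ../example/advection/parthinput.advection parthenon/time/nlim=10 parthenon/mesh/nx1=${nx1} parthenon/mesh/nx2=${nx2} parthenon/mesh/nx3=${nx3} parthenon/meshblock/nx1=32 parthenon/meshblock/nx2=32 parthenon/meshblock/nx3=32 | tee ${N}.out
  # Double the next mesh dimension for the following run.
  case ${dim} in
    1) nx1=$((2*nx1));;
    2) nx2=$((2*nx2));;
    3) nx3=$((2*nx3));;
  esac
  dim=$((dim % 3 + 1))
done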
To Get the Timing Data
We use the built-in instrumentation inside Parthenon, which reports zone-cycles/cpu_second in each run's output. The snippet below collects these figures from the .out files into a single table.
filename=timings.dat
printf "# nprocs\tzone-cycles/cpu_second\n" > ${filename}
# make sure upper bound on this is log2(Nprocs max)
for n in {0..4}; do echo $((2**n)) $(grep "zone-cycles/cpu_second = " $((2**n)).out | cut -d "=" -f 2) >> ${filename}; done
You can now load the timings in your favorite plotting program.
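For instance, assuming gnuplot is available, a quick look at the scaling curve can be produced directly from the shell (gnuplot skips the # header line in the data file automatically):
gnuplot -e "set terminal png; set output 'weak_scaling.png'; set logscale x 2; set xlabel 'nprocs'; set ylabel 'zone-cycles/cpu_second'; plot 'timings.dat' using 1:2 with linespoints title 'advection'"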