Task-based reductions
Many codes require the ability to do global reductions. In a task-based environment where each rank may be executing multiple tasks lists operating on independent sub-domains, orchestrating these reductions turns out to be nontrivial. Here, we document a Parthenon-ic way of expressing a global reduction. The basic strategy follows:
Initialize a variable that will capture the local (just a given MPI rank’s value) reduced value.
Launch a task in each list which updates the local value. For example, if the reduction is a sum, each task will add its contribution to this shared variable. Since task launching is not threaded, there is no concern over race conditions.
Mark the task which accumulates the local reduction using the
TaskRegionmember functionAddRegionalDependencies. This will ensure that tasks that require a complete local reduction will not launch until that local value is available.One task list on each rank launches a non-blocking reduction operation.
All task lists launch a task which checks the status of the reduction, returning
TaskStatus::completeonce the value of the global reduction is set.
To facilitate this pattern, parthenon provides an AllReduce struct,
described below. Examples of the pattern above and the usage of
AllReduce are provided
here.
AllReduce
AllReduce is a struct templated on the type of value that needs to
be reduced (e.g. int, Real, std::vector<Real>, etc.). It
manages the storage in a member variable val which is of the type
provided as a template argument. val must be appropriately
initialized by the user. The functionality in AllReduce (described
above) is exposed through two member functions, StartReduce and
CheckReduce. StartReduce requires a single argument which is the
MPI reduction operator (e.g. MPI_SUM, MPI_MAX, etc.). Both of
these tasks are non-blocking (i.e. they call MPI_Iallreduce and
MPI_Test).
Reduce
Same as AllReduce except MPI_Ireduce is called and the root rank
of the reduction must be provided in StartReduce