Parallelism
The loop wrappers documented here abstract the Kokkos::parallel_* launches. They simplify the use of Kokkos execution policies for multidimensional loops through a common interface based on loop pattern tags. Additionally, the parthenon::seq_for wrapper uses a similar interface to perform multidimensional sequential loops.
An example of usage can be found in the unit test
Parthenon | Kokkos |
---|---|
Parallel launches are passed a string label, a set of inclusive loop bounds, a functor, and any extra arguments needed for parallel reductions or scans. Optionally, a loop pattern tag and an execution space may be provided. When the loop pattern tag is omitted, DEFAULT_LOOP_PATTERN is used.
```cpp
parthenon::par_for(
    loop_pattern_tag, exec_space, PARTHENON_AUTO_LABEL, ks, ke, js, je, is, ie,
    KOKKOS_LAMBDA(const int k, const int j, const int i) {
      data(k, j, i) += 1.;
    });
```
Parameter | Description |
---|---|
loop_pattern_tag | Determines the execution policy. See table below. |
exec_space | Kokkos execution space |
loop bounds | Inclusive start/end pairs for the multidimensional loop. Supported types are |
functor | Defines the body of the parallel loop. See Kokkos programming guide for more. |
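To make the inclusive-bounds convention concrete, here is a minimal sequential sketch in plain C++. The names `inclusive_extent`, `seq_for_3d_sketch`, and `count_visits` are hypothetical stand-ins written for this illustration; they are not Parthenon's wrappers and do not reproduce their signatures.

```cpp
#include <cassert>

// Hypothetical helper: number of indices in the inclusive range [start, end].
inline int inclusive_extent(int start, int end) {
  // end == start - 1 denotes an empty range; anything smaller is treated as a bug.
  assert(end >= start - 1);
  return end - start + 1;
}

// Hypothetical stand-in for a 3D sequential loop wrapper (NOT Parthenon's API).
// Both ends of each start/end pair are visited, hence the `<=` comparisons.
template <typename Function>
void seq_for_3d_sketch(int ks, int ke, int js, int je, int is, int ie,
                       Function body) {
  for (int k = ks; k <= ke; ++k) {
    for (int j = js; j <= je; ++j) {
      for (int i = is; i <= ie; ++i) {
        body(k, j, i);
      }
    }
  }
}

// Counts how many times the loop body runs for the given inclusive bounds.
inline int count_visits(int ks, int ke, int js, int je, int is, int ie) {
  int visits = 0;
  seq_for_3d_sketch(ks, ke, js, je, is, ie,
                    [&visits](int, int, int) { ++visits; });
  return visits;
}
```

With bounds `0..1`, `0..2`, and `0..3` the body runs 2 x 3 x 4 times, because each end index is part of the loop.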
Tag | Execution Policy |
---|---|
 | Flattens all of the loops into a single |
 | Maps to two C-style loops. The innermost gets decorated with a |
 | Maps all the loop bounds onto a |
 | Maps onto a hierarchical parallel loop. The |
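As an illustration of what flattening a multidimensional loop involves, the sketch below collapses three inclusive ranges into one flat index and recovers `(k, j, i)` from it. It is written in plain sequential C++ under the assumption that the flat index runs fastest over `i`; `Index3`, `unflatten`, and `flat_for_3d_sketch` are hypothetical names, not the code Parthenon uses.

```cpp
#include <cassert>

// Hypothetical index triple used only in this sketch.
struct Index3 {
  int k, j, i;
};

// Recover (k, j, i) from a flat index over inclusive bounds.
// Assumption: i varies fastest, then j, then k.
inline Index3 unflatten(int idx, int ks, int js, int je, int is, int ie) {
  const int nj = je - js + 1;
  const int ni = ie - is + 1;
  assert(idx >= 0 && nj > 0 && ni > 0);
  const int k = idx / (nj * ni);
  const int rem = idx % (nj * ni);
  const int j = rem / ni;
  const int i = rem % ni;
  // Shift back from zero-based offsets to the requested start indices.
  return {k + ks, j + js, i + is};
}

// Runs the body once per point, using a single flat loop instead of three
// nested ones. A parallel backend would distribute `idx` across threads.
template <typename Function>
void flat_for_3d_sketch(int ks, int ke, int js, int je, int is, int ie,
                        Function body) {
  const int total = (ke - ks + 1) * (je - js + 1) * (ie - is + 1);
  for (int idx = 0; idx < total; ++idx) {
    const Index3 p = unflatten(idx, ks, js, je, is, ie);
    body(p.k, p.j, p.i);
  }
}
```

The division and modulo needed to recover the indices are the price of exposing the whole iteration space as one range.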
CMake Options
PAR_LOOP_LAYOUT controls the DEFAULT_LOOP_PATTERN macro.
PAR_LOOP_LAYOUT | Pattern Tag |
---|---|
“MANUAL1D_LOOP” | loop_pattern_flatrange_tag |
“SIMDFOR_LOOP” | loop_pattern_simdfor_tag |
“MDRANGE_LOOP” | loop_pattern_mdrange_tag |
“TP_TTR_LOOP” | loop_pattern_tpttr_tag |
“TP_TVR_LOOP” | loop_pattern_tptvr_tag |
“TPTTRTVR_LOOP” | loop_pattern_tpttrtvr_tag |
Adding New Loop Patterns
All of the par_for* overloads are processed by the par_dispatch_impl struct, which determines the types of the loop pattern, functor, functor arguments, loop bounds, and any extra arguments needed for scans or reductions. The struct implements overloads of the par_dispatch_impl::dispatch_impl method that are tagged using the PatternTag enum to specialize the LoopPatternTag struct. New loop patterns need to extend this enum and provide an additional overload.
The loop pattern requested through parthenon::par_for and the related wrappers may conflict with the kind of launch, for example when a DEFAULT_LOOP_PATTERN of loop_pattern_simdfor_tag is used in a par_reduce. For this reason the DispatchType type trait provides the DispatchType::GetPatternTag() method, which processes the requested loop pattern, returns a PatternTag, and falls back to a sensible loop pattern if there is a conflict. In this way DEFAULT_LOOP_PATTERN can be used reliably.
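The fallback idea can be sketched as a small compile-time function. Everything below is hypothetical: the enums, the name `resolve_pattern`, and in particular the choice of a flat range as the fallback are assumptions made for illustration, not the rules implemented by DispatchType::GetPatternTag().

```cpp
#include <cassert>

// Hypothetical enums for this sketch; NOT Parthenon's PatternTag definition.
enum class PatternTagSketch { flat_range, simd_for, md_range };
enum class LaunchKind { par_for, par_reduce };

// Returns the pattern that will actually be dispatched.
// Assumption: a simd-decorated loop is not usable for a reduction, so that
// one combination is redirected. The fallback target is chosen arbitrarily.
constexpr PatternTagSketch resolve_pattern(PatternTagSketch requested,
                                           LaunchKind kind) {
  return (kind == LaunchKind::par_reduce &&
          requested == PatternTagSketch::simd_for)
             ? PatternTagSketch::flat_range
             : requested;
}

// Because the function is constexpr, the decision can be checked at compile
// time and used to select an overload without runtime cost.
static_assert(resolve_pattern(PatternTagSketch::simd_for,
                              LaunchKind::par_reduce) ==
                  PatternTagSketch::flat_range,
              "conflicting request falls back");
```

Resolving the pattern in one place means call sites can pass a default pattern without checking whether it suits a reduction or scan.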
Adding New Loop Bound Types
All of the loop bounds provided to any parallel wrapper are processed by the LoopBoundTranslator to determine the rank of the multidimensional loop and translate the start/end pairs into an array of IndexRange objects. Each bound type is processed individually, which allows loop bound types to be mixed as long as each is supported.
New types can be provided by specializing the ProcessLoopBound struct in the parthenon namespace. These structs need to provide a GetNumBounds method to count the number of start/end bounds contained in the type, as well as a GetIndexRanges method to fill the IndexRange bounds used in the parallel dispatch.
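The shape of such a specialization might look like the following self-contained sketch. All names carry a `Sketch` suffix because they are hypothetical stand-ins: the real ProcessLoopBound and IndexRange signatures are not shown in this document and may differ. `Box2DSketch` is an invented bound type that carries two start/end pairs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical inclusive range; stands in for an index range type.
struct IndexRangeSketch {
  int s;
  int e;
};

// Primary template left undefined so unsupported bound types fail to compile.
template <typename Bound>
struct ProcessLoopBoundSketch;

// A single range contributes one start/end pair.
template <>
struct ProcessLoopBoundSketch<IndexRangeSketch> {
  static constexpr std::size_t GetNumBounds() { return 1; }
  template <std::size_t N>
  static void GetIndexRanges(std::size_t first,
                             std::array<IndexRangeSketch, N> &out,
                             const IndexRangeSketch &bound) {
    assert(first + GetNumBounds() <= N);
    out[first] = bound;
  }
};

// Invented bound type holding two start/end pairs (j, then i).
struct Box2DSketch {
  IndexRangeSketch j;
  IndexRangeSketch i;
};

template <>
struct ProcessLoopBoundSketch<Box2DSketch> {
  static constexpr std::size_t GetNumBounds() { return 2; }
  template <std::size_t N>
  static void GetIndexRanges(std::size_t first,
                             std::array<IndexRangeSketch, N> &out,
                             const Box2DSketch &bound) {
    assert(first + GetNumBounds() <= N);
    out[first] = bound.j;
    out[first + 1] = bound.i;
  }
};

// Loop rank for a mix of bound types: the sum of their bound counts.
template <typename... Bounds>
struct RankOf;
template <>
struct RankOf<> {
  static constexpr std::size_t value = 0;
};
template <typename B, typename... Rest>
struct RankOf<B, Rest...> {
  static constexpr std::size_t value =
      ProcessLoopBoundSketch<B>::GetNumBounds() + RankOf<Rest...>::value;
};

// Translates one range plus one box into a rank-3 array of ranges.
inline std::array<IndexRangeSketch, 3> translate(const IndexRangeSketch &k,
                                                 const Box2DSketch &ji) {
  std::array<IndexRangeSketch, RankOf<IndexRangeSketch, Box2DSketch>::value> out{};
  std::size_t slot = 0;
  ProcessLoopBoundSketch<IndexRangeSketch>::GetIndexRanges(slot, out, k);
  slot += ProcessLoopBoundSketch<IndexRangeSketch>::GetNumBounds();
  ProcessLoopBoundSketch<Box2DSketch>::GetIndexRanges(slot, out, ji);
  return out;
}
```

Each specialization reports how many slots it fills and then writes them, so differently shaped bound types can be mixed in one call while the rank is still known at compile time.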