Sparse implementation

This file describes the implementation of sparse variables in Parthenon. Parthenon also provides the capability to use sparse fields, where “sparse” really refers to two distinct and orthogonal concepts, namely (a) a family or pool of distinct and separate variable fields that share a common base name and are distinguished by a sparse ID (sparse naming), and (b) variables that are only allocated on some but not necessarily all mesh blocks in the domain (sparse fields). In principle, both of these “sparse” concepts can be used independently, i.e.  we could make a family of dense variables that share a base name and are distinguished by sparse IDs, or we could simply have a single variable with a label without a sparse ID that is only allocated on some blocks but not others. However, in practice these two concepts are often used together. As a result, the current implementation of sparse fields in parthenon only allows fields to be sparse if they are sparsely named. This should be relaxed in the future. These two concepts are addressed in separate sections below.

Sparse Fields

For computation and memory savings, it may be desirable to explicitly evolve certain fields only on certain sub-regions of the grid. In regions of the grid where certain fields are not evolved, they are assumed to take some default value (usually zero) and storage for these fields does not need to be allocated in those sub-regions. Fields that are explicitly evolved on only certain sub-regions of the grid are termed “sparse fields.”

Some examples of where sparse fields might be desirable:

  • Properties (density, temperature, etc.) of materials that are only present in certain parts of the simulation. One does not want to allocate memory that contains all zeros and perform operations on zeros in regions where the material is not present.

  • A self-gravitating fluid in a vacuum. Often, astrophysics codes will model this by including fluid throughout the entire simulation domain but set a floor on the density and temperature of the material to create an effective “atmosphere” around the actual self-gravitating system. Regions that are set to the floor do not necessarily need to be explicitly tracked and the fluid quantities could be treated as sparse fields.

  • Material that is transitioning between non-equilibrium nuclear burning and nuclear statistical equilibrium (NSE). In the non-equilibrium regions, the abundances of all nuclear species are required to define the composition of the material while in the regions that are in NSE only one field (the electron fraction) is required to define the composition.

In Parthenon, sparse fields are implemented at the block level, meaning that a field is allocated everywhere or nowhere on each block in the simulation. The set of possible operations for a sparse field are:

  • Allocation on initialization: Sparse variables need to be allocated by hand in the ProblemGenerator in regions where they are non-zero.

  • Allocation due to local changes: For some types of sparse fields (e.g. nuclear abundances fields for material transitioning in and out of NSE), it may be desirable to allocate them based on the local state on the block (e.g. allocate abundances if the temperature is anywhere on the block below some threshold). This is an operation that needs to be defined in a task by the downstream code, since it is problem specific. Currently, Parthenon does not have this capability. It will probably be necessary to define some hooks for checking blocks for the necessity of allocation and having some internal Parthenon functions for actually performing the allocation.

  • Allocation due to neighbors: If a sparse field on a neighbor block passes ghost data that is anywhere above the allocation_threshold, the receiving block will allocate the sparse field if it is unallocated before loading the ghost data. allocation_threshold is set in the Metadata of a field. A detailed description of sparse ghost zone communication is given here.

  • Allocation due to other fields: In some instances, we may want to allocate or deallocate a sparse field when another sparse field is allocated or deallocated, not when the field itself changes state (i.e. we would like to allocate a sparse field describing a dependent variable whenever a corresponding inedendent sparse variable is allocated). For this, we introduce the notion of controlling and controlled fields. A controlling field has a list of field names associated with it and whenever allocation or deallocation of the controlling field occurs on a block, it also goes through the list of controlled fields associated with it and allocates or deallocates them. To make a controlled sparse field, a sparse pool can be created using SparsePool(base_name, metadata, controller_base_name, ...) where controller_base_name is base name of the controlling field. Fields associated with this sparse pool will only be allocated or deallocated based on the corresponding fields in the controller_base_name sparse pool. There are no checks performed that the two pools contain the same set of sparse indices.

  • Initialization after allocation: Depending on what a particular sparse field is modeling, its values on a block may need to be initialized to values that are non-zero and/or depend on the state of other fields on the block. Since this requirement is problem dependent, downstream codes using Parthenon must define tasks that perform this initialization. If no task is defined, a newly initialized field is everywhere on the block set to the default_value defined in its Metadata. The implementation of this needs to be fleshed out and more extensively tested and this description needs more detail about how to register initialization functions. Also needs to describe how to check for sub-regions of a block where uninitialized data has been passed in. In the downstream use cases so far, this has not been a necessary feature.

  • Deallocation due to local changes: Currently, the only way for a sparse field to be deallocated is if its absolute value falls below deallocation_threshold everywhere on the interior of a block (or if its controlling field is deallocated). The deallocation_threshold is set in the sparse fields metadata.

  • Deallocation due to other fields: If a field is controlled, it is deallocated whenever its controlling field is deallocated.

  • Access if allocated: Since fields are mainly accessed through packs in Parthenon based codes (a VariablePack, MeshBlockPack, or SparsePack<...>), packs need to carry around information about the allocation status of the requested fields in the pack. A “dense sparse packing” strategy is used in SparsePack<...>, where only allocated fields are included in the index space of the pack. Here, the index space means (block, field, k, j, i) and for sparse fields the number of allocated fields can change from block to block which means the index space is logically ragged for dense sparse packing. For each block, the range of indices corresponding to a particular field in the pack can be accessed on device and iterated over (if the range has positive size, a negative size for the range indicates that none of the corresponding fields are allocated). Looping over fields in these type of packs generally requires hierarchichal parallelism. Currently, VariablePack and MeshBlockPack employ a “sparse sparse packing” strategy, where all fields are included in the index space of the pack but the allocation status of (block, field) must be checked before accessing (block, field, k, j, i) since this is not guaranteed to point to valid memory. *There is “dense sparse pack” implementation of VariablePack and MeshBlockPack in the branch lroberts36/merge-sparse-with-jdolence-sparse that is being used in Riot. This should probably be brought into develop, since the “sparse sparse pack” access pattern is probably not desirable.

  • Flatten outer indices: Most common packs over MeshData produce a 5-dimensional data structure, where the slowest two moving indices are over MeshBlocks then tensor/vector components of Variables. However, it is sometimes desirable to construct a 4-dimensional data structure where the slowest moving index ranges over all variables and all blocks. So for example if we had a system with 9 blocks and 5 variables per block, the slowest moving index would be of size 45. This can be enabled SparsePack``s with the ``GetFlat series of factory functions or by passing the optional flat boolean into the constructor.

In comparison to a sparse field, a dense field only requires the operation Access.

To set the thresholds for a sparse field, after creating the ``Metadata`` object that will be used for the field, call ``Metadata::SetSparseThresholds(allocation_threshold, deallocation_threshold, default_value)``.

Turning off sparse

The sparse allocation feature can be turned off at run- or compile-time. The sparse naming feature cannot be turned off.

Run-time

Setting enable_sparse to false (default is true) in the parthenon/sparse block of the input file turns on the “fake sparse” mode. In this mode, all variables are always allocated on all blocks, just if they were all dense, and they will not be automatically deallocated. Thus the fake sparse mode produces the same results as if all variables were declared dense, but the infrastructure will still perform IsAllocated checks, so this mode does not remove the sparse infrastructure overhead, but it is useful to debug issues arising with the usage of sparse variables.

Compile-time

Turning on the CMake option PARTHENON_DISABLE_SPARSE turns on fake sparse mode (see above) and also replaces all the IsAllocated functions with essentially constexpr bool IsAllocated() const { return true; } so that they should all be optimized out and thus the sparse infrastructure overhead should be removed, which will be useful for measuring the performance impact of the sparse overhead. Note however, that there will still be some overhead due to the sparse implementation on the host. For example, the allocation status of the variables will still be part of variable pack caches and will be checked when retrieving packs from the cache. However, since fake sparse is enabled, the allocation statuses will always be all true, thus not resulting in any additional cache misses.

If sparse is compile-time disabled, this information is passed through to the regression test suite, which will adjust its comparison to gold results accordingly.

Sparse naming

Of the two sparse concepts described above, sparse naming is much simpler to implement, because it is essentially just a convenient front end to the machinery provided by the state descriptor, containers, and other parts of the Parthenon infrastructure, all of which don’t need to know anything about sparse naming. Once a family or pool of variables sharing the same base name but having different sparse IDs is added to the state descriptor, they are treated exactly like ordinary, unrelated variables that all have distinct labels. The only exception is functions that take a set of flags or labels to pull out a list of variables. These functions are aware that multiple variables can share the same base name, and it will match all those variables if the base name is given in a list of labels, furthermore, many of such functions take an optional list of sparse IDs, which can be used to restrict the variable selection to specific sparse IDs. But again, these are just front end conveniences. Once the list of variables is assembled, all the variables are treated as completely independently and unrelated, just like dense variables.

Sparse naming is implemented through the SparsePool class, which can be added to a state descriptor via AddSparsePool. A SparsePool consists of: (i) a base name, (ii) a shared Metadata instance, and (iii) a list of sparse IDs, which may be used. Note that the list of sparse IDs must be specified when the sparse pool is created and once its added to the state descriptor, that list cannot be changed. This limitation drastically simplifies the sparse naming implementation, because it means that we know the complete list of variables at the beginning and that list is always the same on all mesh blocks. The individual Variable instances that are created for each sparse ID have a label of the form <base name>_<sparse index> and the have the same metadata as the shared metadata of the pool, with two exceptions: (i) the shape of the variable can be set per sparse ID (i.e. some ID could be a scalar, another a vector of length 2, another a vector of length 12, another a rank-3 tensor, etc.), and (ii) related to the shape, the Metadata::Vector and Metadata::Tensor flags can be individually set per sparse ID as well.

The sparse ID can be any integer (positive, negative, and zero) except the smallest possible integer (std::numeric_limits<int>::min()), which is reserved to mean an invalid sparse ID. It is not allowed to add a dense variable with a label that is used as a base name for a SparsePool or vice versa.

When a sparse pool is added to the state descriptor, it simply adds a separate variable for each of its sparse IDs with the appropriate metadata and composite label (as described above). After this point, the rest of the infrastructure treats those variables like any other unrelated variables, with the following exception.When one specifies a variable label in a list of labels, for example in the PackVariable or PackVariablesAndFluxes functions, one can simply specify the base name in the list of labels, which will add all sparse variables with that base name to the resulting list of variables. Furthermore, the Pack* functions also take an optional argument to specify a list of sparse IDs. If such a list is present, then only sparse variables with an ID from that list will be added to the pack. However, when using a label to refer to a single variable, one must specify the full label (base name plus sparse ID) to refer to a particular sparse variable.

Sparse allocation and deallocation implementation

This section has not been completely updated from the original sparse implementation and is kept here as a reference for developers.

Implementing the sparse allocation capability requires deep changes in the entire infrastructure, because the entire infrastructure assumed that all variables are always allocated on all blocks. It also raises the question of how to handle the case when one block has a sparse variable allocated and its neighbor doesn’t. Under what circumstances will the neighboring block have to allocate that sparse variable and how will this be communicated? Furthermore, the use of MPI to communicate boundary and other data between blocks on different MPI ranks requires that the sending and receiving ranks both call send and receive functions for each message passed between them, which complicates the situation where two neighboring blocks don’t have the same sparse variables allocated and thus would like to communicate data for different sets of variables.

Before describing the bigger infrastructure changes to handle the boundary communication for sparse variables, here are some smaller changes that are necessary for sparse variables to work.

  • Variable tracks its allocation status and has member functions to allocate and deallocate its data (data, flux, and coarse_s).

  • A Variable now knows its dimensions and coarse dimensions. Because the ParArrayND<T> data member holding the actual variable data is not necessarily allocated (i.e., it has a size of 0), we can no longer use its size to get the dimension of the Variable, but we still need to know its dimensions when it’s unallocated, for example when adding it to a pack. Similarly, the coarse_s member used to be queried to get the coarse dimensions, but that is also not always allocated, thus Variable also directly knows its coarse dimensions.

  • Variable, MeshBlock, MeshBlockData, variable packs, and mesh block packs, all have new member functions IsAllocated to query whether a particular variable is allocated or not. Generally speaking, whenever the data or fluxes of a variable are accessed, such accesses need to be guarded with IsAllocated checks.

  • The pvars_cc_ field of the MeshRefinement class is now a std::vector<std::shared_ptr<Variable<Real>>> instead of a std::vector<std::tuple<ParArrayND<Real>, ParArrayND<Real>>>. The problem with storing (shallow) copies of the ParArrayNDs data and coarse_s is that they don’t point to the newly allocated views if a variable is initially unallocated and then gets allocated during the evolution. Storing a pointer to the Variable instance works because that one remains the same when it gets allocated.

  • The caching mechanisms for variable packs, mesh block packs, send buffers, receive (i.e., set) buffers, and restrict buffers now all include the allocation status of all the contained variables (as a std::vector<int> because it’s only used on the host). When a pack or buffers collection is requested, the allocation status of the cached entity is compared to the current allocation status of the variables and if they don’t match, the pack or buffer collection is recreated.

  • The Globals namespace contains some global sparse settings (whether sparse is enabled, allocation/deallocation thresholds, and deallocation count).

Below follows a detailed description of the main sparse allocation implementation.

Allocation status

Every Variable is either allocated or deallocated at all times. Furthermore, the Variables with the same label but corresponding to different stages (i.e., MeshBlockData instances) of the same MeshBlock are always either allocated or deallocated on all stages of the mesh block. This is enforced by the fact that the only public methods to (de)allocate a variable is through the mesh block. The MeshBlock::AllocateSparse and MeshBlock::AllocSparseID functions are meant to be used in the user code to specifically allocate a sparse variable on a given block (usually, this would be done in the problem generator). They are also used internally by the infrastructure to allocate a sparse variable on a block if it receives non-zero boundary data for that block, see Boundary exchange for details. The infrastructure can also automatically deallocate sparse variables on a block, see Deallocation.

When a Variable is allocated, its data, flux, and coarse_s fields are allocated. When the variable is deallocated, those fields are reset to ParArrayNDs of size 0.

Deallocation

There is a new task called SparseDealloc in src/interface/update.cpp taking a MeshData pointer. It is meant to be run after the update task for the last stage (of course, it does not have to be run every time step). On every block, it checks the values of all sparse variables. If the maximum absolute value is below the user-defined deallocation threshold, the variable is flagged for deallocation on that block. The variable is only actually deallocated if it has been flagged for deallocation a certain number of times in a row (if any of the values exceeds the deallocation threshold, the counter is reset to 0). That number is the deallocation count, which is also settable by the user in the input file.

Boundary exchange

Boundary communication can trigger allocation of a field on the receiving block if the communicated ghost data is above the allocation threshold. Otherwise sparse boundary communication is the same as dense boundary communication. A detailed description of the boundary communication and flux correction implementation in Parthenon is given here.

AMR and load balancing

The sparse implementation for AMR and load balancing is quite straight forward. For AMR, when we create new mesh blocks, we allocate the same variables on them as there were allocated on the old mesh blocks the new ones are created from.

For the load balancing, we need to send the allocation statuses of the variables together with their data. So we add flags at the beginning of the send/receive buffers to indicate the allocation statuses. There is one flag per variable. The rest of the buffer is unchanged and always includes space for all variables regardless whether they are allocated or not. This simplifies the implementation drastically, because all the MPI messages have the same size and the sender and receiver know what that size is without needing the know the allocation status of the other block. The remaining changes are as follows:

  • In Mesh::PrepareSendSameLevel we only fill the send buffer (using BufferUtility::PackData) if the variable is allocated, otherwise we simply skip that region of the buffer (and leave its values uninitialized, since they won’t be read) so that the data for each variable is in the same place as if all variables were allocated.

  • In Mesh::PrepareSendCoarseToFineAMR and Mesh::PrepareSendFineToCoarseAMR we do the same as above, but instead of leaving regions of the buffer belonging to unallocated variables uninitialized, we fill them with zeros (using BufferUtility::PackZero) since the target block may have the variable allocated even if the sender doesn’t (actually, I think this can only happen for fine-to-coarse and not for coarse-to-fine).

  • In Mesh::FillSameRankFineToCoarseAMR when filling in the destination data, we write zeros if the fine source block doesn’t have the variable allocated. Whereas in Mesh::FillSameRankCoarseToFineAMR we make sure the source and destination blocks have the same allocation status for each variable and we simply skip unallocated variables.

  • In all three types of Mesh::FinishRecv* functions, we read the allocation flags for all variables from the buffer, and we allocate it on the receiving block if the sending block had it allocated but it’s not yet allocated on the receiving block. We then proceed to read the buffer only if the variable is allocated on the receiving block.

Memory Footprint Reporting

With sparse variables (or particles), the memory footprint per meshblock may vary with meshblock. Each MeshBlock object keeps a running total of its memory footprint. You can get the footprint of an individual meshblock by calling:

If desired, you may also manually change the recorded memory footprint of a given meshblock with the function:

where delta is the change in memory.

Warning

Call LogMemUsage with caution! Variable and swarm allocation and de-allocation are automatically tracked.