Notes on stencils

Numba provides the @stencil decorator to represent stencil computations. This document explains how this feature is implemented in the several different modes available in Numba. Currently, calls to the stencil from non-jitted code is supported as well as calls from jitted code, either with or without the parallel=True option.

The stencil decorator

The stencil decorator itself just returns a StencilFunc object. This object encapsulates the original stencil kernel function as specified in the program and the options passed to the stencil decorator. Also of note is that after the first compilation of the stencil, the computed neighborhood of the stencil is stored in the StencilFunc object in the neighborhood attribute.

Handling the three modes

As mentioned above, Numba supports the calling of stencils from inside or outside a @jit compiled function, with or without the parallel=True option.

Outside jit context

StencilFunc overrides the __call__ method so that calls to StencilFunc objects execute the stencil:

def __call__(self, *args, **kwargs):
    result = kwargs.get('out')

    new_stencil_func = self._stencil_wrapper(result, None, *args)

    if result is None:
        return new_stencil_func.entry_point(*args)
    else:
        return new_stencil_func.entry_point(*args, result)

First, the presence of the optional out parameter is checked. If it is present then the output array is stored in result. Then, the call to _stencil_wrapper generates the stencil function given the result and argument types and finally the generated stencil function is executed and its result returned.

Jit without `parallel=True`

When constructed, a StencilFunc inserts itself into the typing context’s set of user functions and provides the _type_me callback. In this way, the standard Numba compiler is able to determine the output type and signature of a StencilFunc. Each StencilFunc maintains a cache of previously seen combinations of input argument types and keyword types. If previously seen, the StencilFunc returns the computed signature. If not previously computed, the StencilFunc computes the return type of the stencil by running the Numba compiler frontend on the stencil kernel and then performing type inference on the Numba IR (IR) to get the scalar return type of the kernel. From that, a Numpy array type is constructed whose element type matches that scalar return type.

After computing the signature of the stencil for a previously unseen combination of input and keyword types, the StencilFunc then creates the stencil function itself. StencilFunc then installs the new stencil function’s definition in the target context so that jitted code is able to call it.

Thus, in this mode, the generated stencil function is a stand-alone function called like a normal function from within jitted code.

Jit with `parallel=True`

When calling a StencilFunc from a jitted context with parallel=True, a separate stencil function as generated by Creating the stencil function is not used. Instead, parfors (Stage 5b: Perform Automatic Parallelization) are created within the current function that implements the stencil. This code again starts with the stencil kernel and does a similar kernel size computation but then rather than standard Python looping syntax, corresponding parfors are created so that the execution of the stencil will take place in parallel.

The stencil to parfor translations can also be selectively disabled by setting parallel={'stencil': False}, among other sub-options described in Stage 5b: Perform Automatic Parallelization.

Creating the stencil function

Conceptually, a stencil function is created from the user-specified stencil kernel by adding looping code around the kernel, transforming the relative kernel indices into absolute array indices based on the loop indices, and replacing the kernel’s return statement with a statement to assign the computed value into the output array.

To accomplish this transformation, first, a copy of the stencil kernel IR is created so that subsequent modifications of the IR for different stencil signatures will not effect each other.

Then, an approach similar to how GUFunc’s are created for parfors is employed. In a text buffer, a Python function is created with a unique name. The input array parameter is added to the function definition and if the out argument type is present then an out parameter is added to the stencil function definition. If the out argument is not present then first an output array is created with numpy.zeros having the same shape as the input array.

The kernel is then analyzed to compute the stencil size and the shape of the boundary (or the neighborhood stencil decorator argument is used for this purpose if present). Then, one for loop for each dimension of the input array is added to the stencil function definition. The range of each loop is controlled by the stencil kernel size previously computed so that the boundary of the output image is not modified but instead left as is. The body of the innermost for loop is a single sentinel statement that is easily recognized in the IR. A call to exec with the text buffer is used to force the stencil function into existence and an eval is used to get access to the corresponding function on which run_frontend is used to get the stencil function IR.

Various renaming and relabeling is performed on the stencil function IR and the kernel IR so that the two can be combined without conflict. The relative indices in the kernel IR (i.e., getitem calls) are replaced with expressions where the corresponding loop index variables are added to the relative indices. The return statement in the kernel IR is replaced with a setitem for the corresponding element in the output array. The stencil function IR is then scanned for the sentinel and the sentinel replaced with the modified kernel IR.

Next, compile_ir is used to compile the combined stencil function IR. The resulting compile result is cached in the StencilFunc so that other calls to the same stencil do not need to undertake this process again.

Exceptions raised

Various checks are performed during stencil compilation to make sure that user-specified options do not conflict with each other or with other runtime parameters. For example, if the user has manually specified a neighborhood to the stencil decorator, the length of that neighborhood must match the dimensionality of the input array. If this is not the case, a ValueError is raised.

If the neighborhood has not been specified then it must be inferred and a requirement to infer the kernel is that all indices are constant integers. If they are not, a ValueError is raised indicating that kernel indices may not be non-constant.

Finally, the stencil implementation detects the output array type by running Numba type inference on the stencil kernel. If the return type of this kernel does not match the type of the value passed to the cval stencil decorator option then a ValueError is raised.