Writing a reduction algorithm for a CUDA GPU can be tricky. Numba provides a
@cuda.reduce decorator that converts a simple binary operation into a reduction
kernel. An example follows:
    import numpy
    from numba import cuda

    @cuda.reduce
    def sum_reduce(a, b):
        return a + b

    A = (numpy.arange(1234, dtype=numpy.float64)) + 1
    expect = A.sum()      # numpy sum reduction
    got = sum_reduce(A)   # cuda sum reduction
    assert expect == got
Lambda functions can also be used here:
    sum_reduce = cuda.reduce(lambda a, b: a + b)
The Reduce class
The reduce decorator creates an instance of the Reduce class. Currently,
reduce is an alias to Reduce, but this behavior is not guaranteed.
Create a reduction object that reduces values using a given binary function. The binary function is compiled once and cached inside this object. Keeping this object alive will prevent re-compilation.
functor – A function implementing a binary operation for reduction. It will be compiled as a CUDA device function using cuda.jit(device=True).
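The caching behavior described above suggests building the reduction object once and reusing it for every array. The following is a minimal sketch of that pattern, not the library's own example: `max_op`, `batch_maxima`, and `HAVE_GPU` are hypothetical names introduced here, and the import guard only exists so the snippet loads on machines without Numba or a CUDA device.

```python
import numpy as np

try:
    from numba import cuda
    HAVE_GPU = cuda.is_available()
except ImportError:  # Numba is not installed
    cuda = None
    HAVE_GPU = False


def max_op(a, b):
    # Plain binary operation; Reduce compiles it as a device function.
    return a if a > b else b


def batch_maxima(batches, reducer):
    """Reduce every 1-D array in `batches` with the same reduction object."""
    # -inf is the identity for max, so all-negative data is handled too.
    return [reducer(batch, init=-np.inf) for batch in batches]


if HAVE_GPU:
    # Created once and kept alive, so later calls can reuse the compiled code.
    max_reduce = cuda.Reduce(max_op)
    batches = [np.array([-3.0, -1.0, -2.0]), np.array([4.0, 7.0, 5.0])]
    print(batch_maxima(batches, max_reduce))
```

Passing `init=-np.inf` matters here because the default `init=0` would be combined with the data and could exceed every element of an all-negative array.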
__call__(arr, size=None, res=None, init=0, stream=0)
Performs a full reduction.
arr – A host or device array.
size – Optional integer specifying the number of elements in arr to reduce. If this parameter is not specified, the entire array is reduced.
res – Optional device array into which to write the reduction result. The result is written into the first element of this array. If this parameter is specified, the reduction output is not copied from the device to the host.
init – Optional initial value for the reduction, the type of which must match arr.dtype.
stream – Optional CUDA stream in which to perform the reduction. If no stream is specified, the default stream of 0 is used.
If res is specified, None is returned. Otherwise, the result of the reduction is returned.
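The optional arguments of `__call__` can be illustrated with a short sketch. It assumes a CUDA-capable device and a small one-dimensional input; `call_options_demo` and `HAVE_GPU` are hypothetical names used only for this illustration, and the import guard lets the snippet load where Numba or a GPU is missing.

```python
import numpy as np

try:
    from numba import cuda
    HAVE_GPU = cuda.is_available()
except ImportError:  # Numba is not installed
    cuda = None
    HAVE_GPU = False


def call_options_demo():
    """Exercise size, init, res, and stream on a small sum reduction."""
    sum_reduce = cuda.reduce(lambda a, b: a + b)
    A = np.arange(1, 11, dtype=np.float64)  # 1.0, 2.0, ..., 10.0

    # size: reduce only the first four elements of A.
    first_four = sum_reduce(A, size=4)

    # init: fold an initial value of 100 into the sum.
    offset_total = sum_reduce(A, init=100)

    # res + stream: the result stays on the device and the call returns None.
    stream = cuda.stream()
    d_A = cuda.to_device(A)
    d_res = cuda.device_array(1, dtype=A.dtype)
    returned = sum_reduce(d_A, res=d_res, stream=stream)
    stream.synchronize()  # wait for the reduction before reading d_res
    total = d_res.copy_to_host()[0]

    return first_four, offset_total, returned, total


if HAVE_GPU:
    print(call_options_demo())
```

Writing into `res` is mainly useful when the result feeds another kernel, since it avoids a device-to-host transfer for every reduction.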