Supported Atomic Operations

Numba provides access to some of the atomic operations supported in CUDA. Those that are presently implemented are as follows:

class numba.cuda.atomic

Namespace for atomic operations

class add(ary, idx, val)

Perform atomic ary[idx] += val. Supported on int32, float32, and float64 operands only.

Returns the old value at the index location as if it is loaded atomically.
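To illustrate why an atomic update matters and what the returned old value looks like, here is a minimal pure-Python sketch. It is a model of the semantics only, not Numba code: `fetch_add` is a hypothetical helper, and the interleaving of the two "threads" is written out by hand rather than produced by a GPU.

```python
def fetch_add(ary, idx, val):
    """Hypothetical model of an atomic add: load, add and store act as one step."""
    old = ary[idx]
    ary[idx] = old + val
    return old


# Non-atomic update, interleaved by hand: both "threads" load before either stores.
counter = [0]
t0_loaded = counter[0]      # thread 0 loads 0
t1_loaded = counter[0]      # thread 1 loads 0
counter[0] = t0_loaded + 1  # thread 0 stores 1
counter[0] = t1_loaded + 1  # thread 1 stores 1, overwriting thread 0's update
lost_update_total = counter[0]

# Atomic-style update: each call sees the result of the previous one.
counter = [0]
old_seen = [fetch_add(counter, 0, 1) for _ in range(2)]
atomic_total = counter[0]

print(lost_update_total, atomic_total, old_seen)
```

Because each caller receives a distinct old value, the return value can also serve as a unique slot number, for example when several threads append to a shared output buffer.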

class and_(ary, idx, val)

Perform atomic ary[idx] &= val. Supported on int32, int64, uint32 and uint64 operands only.

Returns the old value at the index location as if it is loaded atomically.

class cas(ary, idx, old, val)

Conditionally assign val to the element idx of an array ary if the current value of ary[idx] matches old.

Supported on int32, int64, uint32, uint64 operands only.

Returns the old value as if it is loaded atomically.
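The conditional assignment described above can be sketched in plain Python. This is only a model of the semantics, with a hypothetical helper name `cas_model`; on the GPU the comparison and the store happen as a single indivisible step.

```python
def cas_model(ary, idx, old, val):
    """Hypothetical model of compare-and-swap on ary[idx]."""
    current = ary[idx]
    if current == old:
        ary[idx] = val
    return current  # the value seen before any store


slot = [0]
first = cas_model(slot, 0, 0, 7)   # slot holds 0, which matches old: 7 is written
second = cas_model(slot, 0, 0, 9)  # slot holds 7, no match: slot is left unchanged

print(first, second, slot)
```

A caller can tell whether its swap succeeded by comparing the returned value with the `old` value it passed in.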

class compare_and_swap(ary, old, val)

Conditionally assign val to the first element of a 1D array ary if the current value matches old.

Supported on int32, int64, uint32, uint64 operands only.

Returns the old value as if it is loaded atomically.

class dec(ary, idx, val)

Performs:

ary[idx] = (val if (ary[idx] == 0) or
            (ary[idx] > val) else ary[idx] - 1)

Supported on uint32 and uint64 operands only.

Returns the old value at the index location as if it is loaded atomically.
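The wrap-around behaviour of the expression above is easier to see when traced step by step. The following is a plain-Python model of the semantics, assuming the rule exactly as written in this entry; `dec_model` is a hypothetical helper, not part of Numba.

```python
def dec_model(ary, idx, val):
    """Hypothetical model of dec: count down, wrapping back to val after 0."""
    old = ary[idx]
    ary[idx] = val if (old == 0 or old > val) else old - 1
    return old


counter = [2]
seen = [dec_model(counter, 0, 3) for _ in range(5)]  # old values returned by each call

out_of_range = [10]
dec_model(out_of_range, 0, 3)  # a value above val is reset to val

print(seen, counter, out_of_range)
```

The counter runs 2, 1, 0 and then wraps to 3, so the stored value always stays within 0 to val once it has entered that range.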

class exch(ary, idx, val)

Perform atomic ary[idx] = val. Supported on int32, int64, uint32 and uint64 operands only.

Returns the old value at the index location as if it is loaded atomically.

class inc(ary, idx, val)

Perform atomic ary[idx] += 1 up to val, then reset to 0. Supported on uint32 and uint64 operands only.

Returns the old value at the index location as if it is loaded atomically.
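As a rough illustration of "up to val, then reset to 0", here is a plain-Python model of the semantics. It assumes the rule used by CUDA's atomicInc (reset when the stored value is greater than or equal to val); `inc_model` is a hypothetical helper, not part of Numba.

```python
def inc_model(ary, idx, val):
    """Hypothetical model of inc: count up, wrapping to 0 once val is reached."""
    old = ary[idx]
    ary[idx] = 0 if old >= val else old + 1
    return old


counter = [0]
seen = [inc_model(counter, 0, 3) for _ in range(6)]  # old values returned by each call

print(seen, counter)
```

With val = 3 the stored value cycles through 0, 1, 2, 3 and back to 0, which makes inc a natural fit for indexing into a ring buffer of val + 1 slots.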

class max(ary, idx, val)

Perform atomic ary[idx] = max(ary[idx], val).

Supported on int32, int64, uint32, uint64, float32, float64 operands only.

Returns the old value at the index location as if it is loaded atomically.

class min(ary, idx, val)

Perform atomic ary[idx] = min(ary[idx], val).

Supported on int32, int64, uint32, uint64, float32, float64 operands only.

Returns the old value at the index location as if it is loaded atomically.

class nanmax(ary, idx, val)

Perform atomic ary[idx] = max(ary[idx], val).

NOTE: NaN is treated as a missing value such that: nanmax(NaN, n) == n, nanmax(n, NaN) == n

Supported on int32, int64, uint32, uint64, float32, float64 operands only.

Returns the old value at the index location as if it is loaded atomically.

class nanmin(ary, idx, val)

Perform atomic ary[idx] = min(ary[idx], val).

NOTE: NaN is treated as a missing value, such that: nanmin(NaN, n) == n, nanmin(n, NaN) == n

Supported on int32, int64, uint32, uint64, float32, float64 operands only.

Returns the old value at the index location as if it is loaded atomically.
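The NaN handling described in the notes for nanmax and nanmin can be modelled in plain Python. This is a sketch of the stated semantics only; `nanmax_model` and `nanmin_model` are hypothetical helpers, and how the real implementation treats the case where both operands are NaN is not covered here.

```python
import math


def nanmax_model(ary, idx, val):
    """Hypothetical model of nanmax: NaN on either side is treated as missing."""
    old = ary[idx]
    if math.isnan(old) or (not math.isnan(val) and val > old):
        ary[idx] = val
    return old


def nanmin_model(ary, idx, val):
    """Hypothetical model of nanmin: NaN on either side is treated as missing."""
    old = ary[idx]
    if math.isnan(old) or (not math.isnan(val) and val < old):
        ary[idx] = val
    return old


hi = [float("nan")]
nanmax_model(hi, 0, 2.0)           # NaN in the array is replaced by 2.0
nanmax_model(hi, 0, float("nan"))  # NaN operand is ignored, 2.0 is kept
nanmax_model(hi, 0, 5.0)           # ordinary comparison, 5.0 wins

lo = [float("nan")]
nanmin_model(lo, 0, 2.0)
nanmin_model(lo, 0, float("nan"))
nanmin_model(lo, 0, -1.0)

print(hi, lo)
```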

class or_(ary, idx, val)

Perform atomic ary[idx] |= val. Supported on int32, int64, uint32 and uint64 operands only.

Returns the old value at the index location as if it is loaded atomically.

class sub(ary, idx, val)

Perform atomic ary[idx] -= val. Supported on int32, float32, and float64 operands only.

Returns the old value at the index location as if it is loaded atomically.

class xor(ary, idx, val)

Perform atomic ary[idx] ^= val. Supported on int32, int64, uint32 and uint64 operands only.

Returns the old value at the index location as if it is loaded atomically.

Example

The following code demonstrates the use of numba.cuda.atomic.max to find the maximum value in an array. This is not the most efficient way to find a maximum, but it serves as a simple example:

from numba import cuda
import numpy as np

@cuda.jit
def max_example(result, values):
    """Find the maximum value in values and store in result[0]"""
    tid = cuda.threadIdx.x
    bid = cuda.blockIdx.x
    bdim = cuda.blockDim.x
    i = (bid * bdim) + tid
    cuda.atomic.max(result, 0, values[i])


arr = np.random.rand(16384)
result = np.zeros(1, dtype=np.float64)

# 256 blocks of 64 threads: exactly one thread per element of arr
max_example[256, 64](result, arr)
print(result[0])    # Found using cuda.atomic.max
print(np.max(arr))  # Print np.max(arr) for comparison (should be equal!)

Multidimensional arrays are supported by using a tuple of ints for the index:

@cuda.jit
def max_example_3d(result, values):
    """
    Find the maximum value in values and store in result[0, 1, 2].
    Both result and values are 3d arrays.
    """
    i, j, k = cuda.grid(3)
    # Atomically store to result[0,1,2] from values[i, j, k]
    cuda.atomic.max(result, (0, 1, 2), values[i, j, k])

arr = np.random.rand(1000).reshape(10,10,10)
result = np.zeros((3, 3, 3), dtype=np.float64)
max_example_3d[(2, 2, 2), (5, 5, 5)](result, arr)
print(result[0, 1, 2], '==', np.max(arr))