Memory Management

numba.cuda.to_device(obj, stream=0, copy=True, to=None)

Allocate and transfer a numpy ndarray or structured scalar to the device.

To copy a NumPy array from host to device:

import numpy as np
from numba import cuda

ary = np.arange(10)
d_ary = cuda.to_device(ary)

To enqueue the transfer to a stream:

stream = cuda.stream()
d_ary = cuda.to_device(ary, stream=stream)

The resulting d_ary is a DeviceNDArray.

To copy from device to host:

hary = d_ary.copy_to_host()

To copy from device to host into an existing array:

ary = np.empty(shape=d_ary.shape, dtype=d_ary.dtype)
d_ary.copy_to_host(ary)

To enqueue the transfer to a stream:

hary = d_ary.copy_to_host(stream=stream)

numba.cuda.device_array(shape, dtype=np.float64, strides=None, order='C', stream=0)

Allocate an empty device ndarray. Similar to numpy.empty().

numba.cuda.device_array_like(ary, stream=0)

Call device_array() with information from the array.

numba.cuda.pinned_array(shape, dtype=np.float64, strides=None, order='C')

Allocate an ndarray with a buffer that is pinned (page-locked). Similar to np.empty().

numba.cuda.pinned_array_like(ary)

Call pinned_array() with the information from the array.

numba.cuda.mapped_array(shape, dtype=np.float64, strides=None, order='C', stream=0, portable=False, wc=False)

Allocate a mapped ndarray with a buffer that is pinned and mapped onto the device. Similar to np.empty().

Parameters
  • portable – a boolean flag to allow the allocated memory to be usable by multiple devices.

  • wc – a boolean flag to enable write-combined allocation, which is faster for the host to write and for the device to read, but much slower for the host to read.

numba.cuda.mapped_array_like(ary, stream=0, portable=False, wc=False)

Call mapped_array() with the information from the array.

numba.cuda.managed_array(shape, dtype=np.float64, strides=None, order='C', stream=0, attach_global=True)

Allocate a np.ndarray with a buffer that is managed. Similar to np.empty().

Managed memory is supported on Linux / x86 and PowerPC, and is considered experimental on Windows and Linux / AArch64.

Parameters
  • attach_global – a boolean flag indicating whether to attach globally. Global attachment implies that the memory is accessible from any stream on any device. If False, host attachment is used, and the memory is only accessible by devices with Compute Capability 6.0 and later.

numba.cuda.pinned(*arylist)

A context manager for temporarily pinning a sequence of host ndarrays.

numba.cuda.mapped(*arylist, **kws)

A context manager for temporarily mapping a sequence of host ndarrays.

Device Objects

class numba.cuda.cudadrv.devicearray.DeviceNDArray(shape, strides, dtype, stream=0, gpu_data=None)

An on-GPU array type

copy_to_device(ary, stream=0)

Copy ary to self.

If ary is CUDA device memory, perform a device-to-device transfer. Otherwise, perform a host-to-device transfer.

copy_to_host(ary=None, stream=0)

Copy self to ary, or create a new NumPy ndarray if ary is None.

If a CUDA stream is given, the transfer is made asynchronously as part of the given stream. Otherwise, the transfer is synchronous: the function returns after the copy is finished.

Always returns the host array.

Example:

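import numpy as np
from numba import cuda

@cuda.jit
def my_kernel(a):
    # Example kernel: double every element, one thread per element.
    i = cuda.grid(1)
    if i < a.size:
        a[i] *= 2

arr = np.arange(1000)
d_arr = cuda.to_device(arr)

my_kernel[100, 100](d_arr)

result_array = d_arr.copy_to_host()

is_c_contiguous()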

Return True if the array is C-contiguous.

is_f_contiguous()

Return True if the array is Fortran-contiguous.

ravel(order='C', stream=0)

Flattens a contiguous array without changing its contents, similar to numpy.ndarray.ravel(). If the array is not contiguous, raises an exception.

reshape(*newshape, **kws)

Reshape the array without changing its contents, similarly to numpy.ndarray.reshape(). Example:

d_arr = d_arr.reshape(20, 50, order='F')

split(section, stream=0)

Split the array into equal partitions of section elements each. If the array cannot be divided equally, the last partition will be smaller.

class numba.cuda.cudadrv.devicearray.DeviceRecord(dtype, stream=0, gpu_data=None)

An on-GPU record type

copy_to_device(ary, stream=0)

Copy ary to self.

If ary is CUDA device memory, perform a device-to-device transfer. Otherwise, perform a host-to-device transfer.

copy_to_host(ary=None, stream=0)

Copy self to ary, or create a new NumPy ndarray if ary is None.

If a CUDA stream is given, the transfer is made asynchronously as part of the given stream. Otherwise, the transfer is synchronous: the function returns after the copy is finished.

Always returns the host array.

Example:

import numpy as np
from numba import cuda

arr = np.arange(1000)
d_arr = cuda.to_device(arr)

# my_kernel stands for any @cuda.jit kernel that takes a single 1-D array
my_kernel[100, 100](d_arr)

result_array = d_arr.copy_to_host()

class numba.cuda.cudadrv.devicearray.MappedNDArray(shape, strides, dtype, stream=0, gpu_data=None)

A host array that uses CUDA mapped memory.

copy_to_device(ary, stream=0)

Copy ary to self.

If ary is CUDA device memory, perform a device-to-device transfer. Otherwise, perform a host-to-device transfer.

copy_to_host(ary=None, stream=0)

Copy self to ary, or create a new NumPy ndarray if ary is None.

If a CUDA stream is given, the transfer is made asynchronously as part of the given stream. Otherwise, the transfer is synchronous: the function returns after the copy is finished.

Always returns the host array.

Example:

import numpy as np
from numba import cuda

arr = np.arange(1000)
d_arr = cuda.to_device(arr)

# my_kernel stands for any @cuda.jit kernel that takes a single 1-D array
my_kernel[100, 100](d_arr)

result_array = d_arr.copy_to_host()

split(section, stream=0)

Split the array into equal partitions of section elements each. If the array cannot be divided equally, the last partition will be smaller.