Sharing CUDA Memory

Sharing between processes

Sharing between processes is implemented using the Legacy CUDA IPC API (functions whose names begin with cuIpc), and is supported only on Linux.

Export device array to another process

A device array can be shared with another process on the same machine using the CUDA IPC API. To do so, call the .get_ipc_handle() method on the device array to get an IpcArrayHandle object, which can be transferred to the other process.

DeviceNDArray.get_ipc_handle()

Returns an IpcArrayHandle object that is safe to serialize and transfer to another process to share the local allocation.

Note: this feature is only available on Linux.
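The export workflow can be sketched as follows. This is a minimal illustration rather than a reference implementation: the function names child_sum and share_with_child are hypothetical, and it assumes a Linux machine with a CUDA-capable GPU, with the handle passed to the child through the standard multiprocessing module.

```python
import multiprocessing as mp


def child_sum(ipc_handle, result_queue):
    """Runs in the destination process: open the shared array and reduce it."""
    # Entering the context calls .open(); leaving it calls .close().
    with ipc_handle as darr:
        host = darr.copy_to_host()
    result_queue.put(float(host.sum()))


def share_with_child(n=16):
    """Runs in the original process: export a device array to a child."""
    # Imported here so that importing this module does not itself need Numba.
    import numpy as np
    from numba import cuda

    # Keep darr alive until the child is done; it owns the allocation.
    darr = cuda.to_device(np.arange(n, dtype=np.float32))
    ipc_handle = darr.get_ipc_handle()

    # A process forked after CUDA is initialized cannot use CUDA,
    # so start the child with the "spawn" method instead.
    ctx = mp.get_context("spawn")
    result_queue = ctx.Queue()
    proc = ctx.Process(target=child_sum, args=(ipc_handle, result_queue))
    proc.start()
    total = result_queue.get()
    proc.join()
    return total
```

Because the child is started with "spawn", share_with_child() should be called from beneath an `if __name__ == "__main__":` guard in a script.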

class numba.cuda.cudadrv.devicearray.IpcArrayHandle(ipc_handle, array_desc)

An IPC array handle that can be serialized and transferred to another process on the same machine to share a GPU allocation.

On the destination process, use the .open() method to create a new DeviceNDArray object that shares the allocation from the original process. To release the resources, call the .close() method. After that, the destination can no longer use the shared array object. (Note: the underlying weakref to the resource is now dead.)

This object implements the context-manager interface, which calls the .open() and .close() methods automatically:

with the_ipc_array_handle as ipc_array:
    # use ipc_array here as a normal gpu array object
    some_code(ipc_array)
# ipc_array is dead at this point

close()

Closes the IPC handle to the array.

open()

Returns a new DeviceNDArray that shares the allocation from the original process. Must not be used on the original process.
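When the context-manager form does not fit, .open() and .close() can be paired by hand. The sketch below is one way to do that; the helper name copy_from_handle is hypothetical, and the try/finally is simply a precaution so the handle is closed even if the copy raises.

```python
def copy_from_handle(ipc_handle):
    """Open a received IpcArrayHandle, copy its data to the host, then close it."""
    # Must run in the destination process, not the one that created the handle.
    darr = ipc_handle.open()
    try:
        return darr.copy_to_host()
    finally:
        # After this call, darr must not be used any more.
        ipc_handle.close()
```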

Import IPC memory from another process

The following function opens an IPC handle from another process as a device array.

cuda.open_ipc_array(handle, shape, dtype, strides=None, offset=0)

A context manager that opens an IPC handle (CUipcMemHandle), represented as a sequence of bytes (e.g. bytes, tuple of int), and presents it as an array of the given shape, strides and dtype. The strides can be omitted. In that case, it is assumed to be a 1D C contiguous array.

Yields a device array.

The IPC handle is closed automatically when the context manager exits.
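A consumer built on this function might look like the sketch below. It assumes the handle is passed as the first argument, that handle_bytes holds the raw CUipcMemHandle bytes received from the exporting process (for example, from a non-Numba program that called cuIpcGetMemHandle), and that the allocation contains n_elements float32 values. The function name read_shared_floats and both parameter names are hypothetical.

```python
def read_shared_floats(handle_bytes, n_elements):
    """Copy a float32 allocation shared by another process back to the host."""
    # Imported here so that importing this module does not itself need Numba.
    from numba import cuda

    # strides is omitted, so the memory is treated as a 1D C contiguous array.
    with cuda.open_ipc_array(handle_bytes, shape=n_elements,
                             dtype="float32") as darr:
        # Copy while the handle is still open; darr is invalid after the block.
        return darr.copy_to_host()
```

The shape and dtype are not carried by the raw handle, so the two processes need to agree on them separately, for instance by sending them alongside the handle bytes.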