Compiling Python functions for use with other languages

Numba can compile Python code to PTX or LTO-IR so that Python functions can be incorporated into CUDA code written in other languages (e.g. C/C++). It is commonly used to support User-Defined Functions written in Python within the context of a library or application.

The compilation API can be used without a GPU present, as it uses no driver functions and avoids initializing CUDA in the process. It is invoked through the following function:

numba.cuda.compile(pyfunc, sig, debug=False, lineinfo=False, device=True, fastmath=False, cc=None, opt=True, abi='c', abi_info=None, output='ptx')

Compile a Python function to PTX or LTO-IR for a given set of argument types.

Parameters

pyfunc – The Python function to compile.
sig – The signature representing the function’s input and output types. If this is a tuple of argument types without a return type, the inferred return type is returned by this function. If a signature including a return type is passed, the compiled code will include a cast from the inferred return type to the specified return type, and this function will return the specified return type.
debug (bool) – Whether to include debug info in the compiled code.
lineinfo (bool) – Whether to include a line mapping from the compiled code to the source code. This is usually used with optimized code, since debug mode already includes the line mapping. Internally, debug info is still generated in the LLVM IR, but only the line mapping is kept in the final output.
device (bool) – Whether to compile a device function.
fastmath (bool) – Whether to enable fast math flags (ftz=1, prec_sqrt=0, prec_div=0, and fma=1)
cc (tuple) – Compute capability to compile for, as a tuple (MAJOR, MINOR). Defaults to (5, 0).
opt (bool) – Enable optimizations. Defaults to True.
abi (str) – The ABI for a compiled function - either "numba" or "c". Note that the Numba ABI is not considered stable. The C ABI is only supported for device functions at present.
abi_info (dict) – A dict of ABI-specific options. The "c" ABI supports one option, "abi_name", for providing the wrapper function’s name. The "numba" ABI has no options.
output (str) – Type of output to generate, either "ptx" or "ltoir".

Returns

(code, resty): The compiled code and inferred return type

Return type

tuple

If a device is available and compiled code for the compute capability of the current device is required (for example when building a JIT compilation workflow using Numba), the compile_for_current_device function can be used:

numba.cuda.compile_for_current_device(pyfunc, sig, debug=False, lineinfo=False, device=True, fastmath=False, opt=True, abi='c', abi_info=None, output='ptx'): Compile a Python function to PTX or LTO-IR for a given signature for the current device’s compute capability. This calls compile() with an appropriate cc value for the current device.

Most users should use the two functions described above; for backwards compatibility with existing use cases, the following functions are also provided:

numba.cuda.compile_ptx(pyfunc, sig, debug=False, lineinfo=False, device=False, fastmath=False, cc=None, opt=True, abi='numba', abi_info=None): Compile a Python function to PTX for a given signature. See compile(). The defaults for this function are to compile a kernel with the Numba ABI, rather than compile()’s default of compiling a device function with the C ABI.

numba.cuda.compile_ptx_for_current_device(pyfunc, sig, debug=False, lineinfo=False, device=False, fastmath=False, opt=True, abi='numba', abi_info=None): Compile a Python function to PTX for a given signature for the current device’s compute capability. See compile_ptx().

Using the C ABI

Internally, Numba uses its own ABI, which is as described in Device function ABI but without the extern "C" modifier. Calling Numba ABI device functions requires three issues to be addressed:

The name of the function will be mangled according to Numba’s ABI rules - these are based on the Itanium C++ ABI rules, but are extended beyond its specifications.
The Python return value is expected to be stored into a pointer value passed in the first argument.
The return value of the compiled function will contain a status code, instead of the return value of the function. For use of Numba-compiled functions outside of Numba, this can generally be ignored.
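As a rough analogy, the two calling conventions can be modelled in plain Python. This minimal sketch does not use Numba at all: the hypothetical `add_numba_style` writes its result through a pointer passed as the first argument and returns a status code (0 is assumed here to mean success), whereas the hypothetical `add_c_style` returns the result directly, as the C ABI does.

```python
import ctypes


def add_numba_style(retptr, x, y):
    """Model of the Numba ABI: the result is written through retptr and the
    return value is a status code (0 is assumed to mean success)."""
    retptr[0] = x + y
    return 0


def add_c_style(x, y):
    """Model of the C ABI: the result is the return value itself."""
    return x + y


# Numba-style: the caller allocates storage for the result and passes a
# pointer to it; the status code can usually be ignored.
result = ctypes.c_int32()
status = add_numba_style(ctypes.pointer(result), 2, 3)
print(status, result.value)  # 0 5

# C-style: the result is returned directly.
print(add_c_style(2, 3))  # 5
```

The extra pointer argument and the status code are exactly the two things a caller in another language would otherwise have to handle by hand.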

A simple way to address all these issues is to compile device functions with the C ABI instead. This results in the following:

The name of the device function in the compiled code can be controlled. By default it matches the name of the function in Python, so it is easy to determine. This is the function’s __name__ rather than its __qualname__, because __qualname__ encodes additional scoping information that would make the function name hard to predict and, in many cases, an illegal identifier in C.
The returned value of the Python code is placed in the return value of the compiled function.
Status codes are ignored / unreported, so they do not need to be handled.
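The difference between __name__ and __qualname__ is easy to see with a function defined inside another function, which is a common pattern when generating user-defined functions. The plain-Python sketch below assumes a simplified, hypothetical `is_c_identifier` check (letters, digits and underscores, not starting with a digit; C keywords are not rejected).

```python
import re

# Hypothetical helper: simplified C identifier check (C keywords not rejected).
_C_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_c_identifier(name):
    return _C_IDENTIFIER.fullmatch(name) is not None


def make_udf():
    # A function defined inside another function gets a scoped __qualname__.
    def add(x, y):
        return x + y
    return add


udf = make_udf()
print(udf.__name__)                       # add
print(udf.__qualname__)                   # make_udf.<locals>.add
print(is_c_identifier(udf.__name__))      # True
print(is_c_identifier(udf.__qualname__))  # False
```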

If the name of the compiled function needs to be specified, it can be controlled by passing the name in the abi_info dict, under the key 'abi_name'.

Compilation with the C ABI is the default when using the compile() and compile_for_current_device() functions. The compile_ptx() and compile_ptx_for_current_device() functions default to the Numba ABI in order to maintain compatibility with existing use cases.

C and Numba ABI examples

The following function:

def add(x, y):
    return x + y

compiled for the Numba ABI using, for example:

from numba import cuda, int32

ptx, resty = cuda.compile_ptx(add, int32(int32, int32), device=True)

results in PTX where the function prototype is:

.visible .func  (.param .b32 func_retval0) _ZN8__main__3addB2v1B94cw51cXTLSUwv1sCUt9Uw1VEw0NRRQPKzLTg4gaGKFsG2oMQGEYakJSQB1PQBk0Bynm21OiwU1a0UoLGhDpQE8oxrNQE_3dEii(
    .param .b64 _ZN8__main__3addB2v1B94cw51cXTLSUwv1sCUt9Uw1VEw0NRRQPKzLTg4gaGKFsG2oMQGEYakJSQB1PQBk0Bynm21OiwU1a0UoLGhDpQE8oxrNQE_3dEii_param_0,
    .param .b32 _ZN8__main__3addB2v1B94cw51cXTLSUwv1sCUt9Uw1VEw0NRRQPKzLTg4gaGKFsG2oMQGEYakJSQB1PQBk0Bynm21OiwU1a0UoLGhDpQE8oxrNQE_3dEii_param_1,
    .param .b32 _ZN8__main__3addB2v1B94cw51cXTLSUwv1sCUt9Uw1VEw0NRRQPKzLTg4gaGKFsG2oMQGEYakJSQB1PQBk0Bynm21OiwU1a0UoLGhDpQE8oxrNQE_3dEii_param_2
 )

Note that there are three parameters, for the pointer to the return value, x, and y. The name is mangled in a way that is hard to predict outside of Numba internals.
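Although the full name is hard to predict, its leading components follow the Itanium <length><identifier> encoding, so the module and function names can still be read back. The minimal sketch below uses a hypothetical helper, `itanium_name_prefix`, which is not part of Numba and is not a general demangler. It decodes only those leading components and stops at the first part it does not recognise, such as the B-prefixed components that follow in the name above (in the Itanium grammar, B introduces an ABI tag).

```python
def itanium_name_prefix(mangled):
    """Decode the leading <length><identifier> components of an
    Itanium-style nested name (_ZN...), stopping at the first component
    that is not a plain source name."""
    if not mangled.startswith("_ZN"):
        raise ValueError("not an Itanium nested name: " + mangled)
    parts = []
    i = 3  # skip the "_ZN" prefix
    while i < len(mangled) and mangled[i].isdigit():
        j = i
        while j < len(mangled) and mangled[j].isdigit():
            j += 1
        length = int(mangled[i:j])          # decimal length prefix
        parts.append(mangled[j:j + length])  # the identifier itself
        i = j + length
    return parts


name = ("_ZN8__main__3addB2v1B94cw51cXTLSUwv1sCUt9Uw1VEw0NRRQPKzLTg4gaGK"
        "FsG2oMQGEYakJSQB1PQBk0Bynm21OiwU1a0UoLGhDpQE8oxrNQE_3dEii")
print(itanium_name_prefix(name))  # ['__main__', 'add']
```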

Compiling for the C ABI with:

ptx, resty = cuda.compile_ptx(add, int32(int32, int32), device=True, abi="c")

instead results in the following PTX prototype:

.visible .func  (.param .b32 func_retval0) add(
    .param .b32 add_param_0,
    .param .b32 add_param_1
)

The function name matches the Python source function name, and there are exactly two parameters, for x and y. The result of the function is directly placed in the return value:

add.s32      %r3, %r2, %r1;
st.param.b32         [func_retval0+0], %r3;
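When integrating the output into another build, it can help to check which prototype the PTX actually exposes. The sketch below uses a hypothetical helper, `parse_prototypes`, built on a regular expression that only understands simple `.visible .func` declarations like the ones shown in this section. Kernels (`.entry`) and more complex declarations are not handled. Applied to the C ABI output above, it reports the plain name add and two parameters. Applied to the Numba ABI output, it reports the mangled name and three parameters.

```python
import re

# Hypothetical helper, not part of Numba: matches simple PTX device-function
# prototypes of the form ".visible .func (retval) NAME( params )".
_PROTOTYPE = re.compile(
    r"\.visible\s+\.func\s+"                   # linkage and directive
    r"(?:\([^)]*\)\s*)?"                       # optional return-value param
    r"(?P<name>[A-Za-z_$][A-Za-z0-9_$]*)\s*"   # function name
    r"\((?P<params>[^)]*)\)"                   # parameter list
)


def parse_prototypes(ptx):
    """Map each .visible .func name in `ptx` to its parameter names."""
    prototypes = {}
    for match in _PROTOTYPE.finditer(ptx):
        params = match.group("params").split(",")
        # Each entry looks like ".param .b32 NAME"; keep the last token.
        prototypes[match.group("name")] = [p.split()[-1] for p in params if p.strip()]
    return prototypes


c_abi_ptx = """
.visible .func  (.param .b32 func_retval0) add(
    .param .b32 add_param_0,
    .param .b32 add_param_1
)
"""
print(parse_prototypes(c_abi_ptx))
# {'add': ['add_param_0', 'add_param_1']}
```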

To distinguish one variant of the compiled add() function from another, the following example specifies its ABI name in the abi_info dict:

from numba import cuda, float32

ptx, resty = cuda.compile_ptx(add, float32(float32, float32), device=True,
                              abi="c", abi_info={"abi_name": "add_f32"})

This results in the PTX prototype:

.visible .func  (.param .b32 func_retval0) add_f32(
    .param .b32 add_f32_param_0,
    .param .b32 add_f32_param_1
)

which will not clash with variants compiled under other names (e.g. the int32 variant above, named add).
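When many type-specialised variants are compiled, the abi_info dicts can be generated systematically rather than written by hand. The sketch below is hypothetical: the `_SUFFIXES` table and the `abi_info_for` helper are not part of Numba, and the naming scheme (for example add_i32 for int32) is an assumption chosen only so that each variant gets a distinct name. Only the resulting {"abi_name": ...} dict corresponds to the documented abi_info option.

```python
# Hypothetical suffixes for a few Numba type names; extend as needed.
_SUFFIXES = {
    "int32": "i32",
    "int64": "i64",
    "float32": "f32",
    "float64": "f64",
}


def abi_info_for(base_name, type_name):
    """Return an abi_info dict naming one type-specialised variant."""
    return {"abi_name": f"{base_name}_{_SUFFIXES[type_name]}"}


variants = {t: abi_info_for("add", t) for t in ("int32", "float32")}
print(variants["float32"])  # {'abi_name': 'add_f32'}
print(variants["int32"])    # {'abi_name': 'add_i32'}

# Each dict could then be passed as the abi_info argument, for example:
#   cuda.compile_ptx(add, float32(float32, float32), device=True,
#                    abi="c", abi_info=variants["float32"])
```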