Version 0.59.0 (31 January 2024)

This is a major Numba release. Numba now supports Python 3.12. A summary of all noteworthy items follows.

Highlights

Python 3.12 Support

The standout feature of this release is the official support for Python 3.12 in Numba.

Please note that profiling support is temporarily disabled in this release (for Python 3.12) and several known issues have been identified during development. The Numba team is actively working on resolving them. Please refer to the respective issue pages (Numba #9289 and Numba #9291) for a list of ongoing issues and updates on progress.

(PR-#9246)

Move minimum supported Python version to 3.9.

Support for Python 3.8 has been removed; Numba’s minimum supported Python version is now Python 3.9.

(PR-#9310)

New Features

Add support for ufunc attributes and reduce

Support for ufunc.reduce and most ufunc attributes is added.
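
As a minimal sketch of what this enables, the snippet below calls np.add.reduce from inside a jitted function. The function name column_sums and the sample data are illustrative only, and the fallback decorator is an assumption added so the snippet still runs as plain NumPy where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit  # ufunc.reduce inside jitted code needs Numba >= 0.59
except ImportError:
    # Illustrative fallback only: run as plain NumPy when Numba is unavailable.
    def njit(func):
        return func


@njit
def column_sums(a):
    # Reduce along axis 0 using the ufunc's reduce method.
    return np.add.reduce(a, axis=0)


data = np.arange(6.0).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
result = column_sums(data)
```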

(PR-#9123)

Add a config variable to enable / disable the llvmlite memory manager

A config variable to force enable or disable the llvmlite memory manager is added.

(PR-#9341)

Improvements

Add TargetLibraryInfo pass to CPU LLVM pipeline.

The TargetLibraryInfo pass makes sure that the optimisations that take place during call simplification are appropriate for the target. Without it, the target is assumed to be Linux and code will be optimised to produce, for example, math symbols that do not exist on Windows. Historically this issue has been avoided through the use of Numba internal libraries carrying wrapped symbols, but doing so is potentially detrimental to performance. As a result of this change, Numba internal libraries are smaller and there is an increase in optimisation opportunity in code using the exp2 and log2 functions.

(PR-#9336)

Numba deprecation warning classes are now subclasses of builtin ones

To help users manage and suppress deprecation warnings from Numba, the NumbaDeprecationWarning and NumbaPendingDeprecationWarning classes are now subclasses of the builtin DeprecationWarning and PendingDeprecationWarning respectively. Therefore, warning filters on DeprecationWarning and PendingDeprecationWarning will apply to Numba deprecation warnings.
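
The effect can be sketched with the standard warnings module: a filter registered against the builtin DeprecationWarning also catches a NumbaDeprecationWarning. The stand-in class in the except branch is an assumption for environments without Numba, and the warning message is made up for the example.

```python
import warnings

try:
    from numba.core.errors import NumbaDeprecationWarning
except ImportError:
    # Illustrative stand-in mirroring the new class hierarchy.
    class NumbaDeprecationWarning(DeprecationWarning):
        pass


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # A filter on the builtin category is enough from Numba 0.59 onwards.
    warnings.simplefilter("ignore", category=DeprecationWarning)
    warnings.warn("example deprecation message", NumbaDeprecationWarning)

suppressed = len(caught) == 0
```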

(PR-#9347)

NumPy Support

Added support for np.indices() function.

Support is added for numpy.indices().

(PR-#9126)

Added support for np.polynomial.polynomial.Polynomial class.

Support is added for the Polynomial class from the package np.polynomial.polynomial.

(PR-#9140)

Added support for functions np.polynomial.polyutils.as_series(), as well as functions polydiv(), polyint(), polyval() from np.polynomial.polynomial.

Support is added for np.polynomial.polyutils.as_series(), np.polynomial.polynomial.polydiv(), np.polynomial.polynomial.polyint() (only the first 2 arguments), np.polynomial.polynomial.polyval() (only the first 2 arguments).

(PR-#9141)

Added support for np.unwrap() function.

Support is added for numpy.unwrap(). The axis argument is only supported when its value equals -1.
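
A small sketch of the supported usage, leaving axis at its default of -1. The helper name unwrap_phase and the test signal are illustrative, and the fallback decorator is an assumption so the snippet also runs without Numba.

```python
import numpy as np

try:
    from numba import njit  # np.unwrap in jitted code needs Numba >= 0.59
except ImportError:
    # Illustrative fallback only: run as plain NumPy when Numba is unavailable.
    def njit(func):
        return func


@njit
def unwrap_phase(p):
    # Default arguments only; axis is left at -1.
    return np.unwrap(p)


true_phase = np.linspace(0.0, 4.0 * np.pi, 9)             # steps of pi/2
wrapped = (true_phase + np.pi) % (2.0 * np.pi) - np.pi    # folded into [-pi, pi)
recovered = unwrap_phase(wrapped)
```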

(PR-#9154)

Adds support for checking if dtypes are equal.

Support is added for checking if two dtype objects are equal, for example assert X.dtype == np.dtype(np.float64).

(PR-#9249)

CUDA API Changes

Added support for compiling device functions with a C ABI

Support is added for compiling device functions with a C ABI through the compile_ptx() API, for easier interoperability with CUDA C/C++ and other languages.

(PR-#9223)

Make grid() and gridsize() use 64-bit integers

cuda.grid() and cuda.gridsize() now use 64-bit integers, so they no longer overflow when the grid contains more than 2 ** 31 threads.
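
The arithmetic behind the overflow can be illustrated without a GPU. The sketch below emulates the usual one-dimensional index computation (block index times block size plus thread index) at 32-bit and 64-bit widths using ctypes. The helper global_index and the launch configuration are hypothetical and are not part of the Numba API.

```python
import ctypes


def global_index(block_idx, block_dim, thread_idx, bits):
    """Emulate blockIdx * blockDim + threadIdx in a signed integer of the given width."""
    raw = block_idx * block_dim + thread_idx
    if bits == 32:
        return ctypes.c_int32(raw).value  # wraps around on overflow
    return ctypes.c_int64(raw).value


# Hypothetical launch: 2**22 blocks of 1024 threads, i.e. 2**32 threads in total.
block_dim = 1024
last_block = 2**22 - 1
last_thread = block_dim - 1

idx32 = global_index(last_block, block_dim, last_thread, 32)
idx64 = global_index(last_block, block_dim, last_thread, 64)
```

With 32-bit arithmetic the last thread's index wraps to a negative number, whereas the 64-bit computation yields the expected position.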

(PR-#9235)

Prevent kernels being dropped by implementing the used list

Kernels are no longer dropped when being compiled and linked using nvJitLink, because they are added to the @"llvm.used" list.

(PR-#9267)

Support for Windows CUDA 12.0 toolkit conda packages

The library paths used in CUDA toolkit 12.0 conda packages on Windows are added to the search paths used when detecting CUDA libraries.

(PR-#9279)

Performance Improvements and Changes

Improvement to IR copying speed

Improvements were made to the deepcopying of FunctionIR. In one case, the InlineInlineables pass is 3x faster.

(PR-#9245)

Bug Fixes

Dynamically Allocate Parfor Schedules

This PR fixes a stack overflow that could occur when a parallel region is executed many times in a loop. The previous code used an alloca to allocate the parfor schedule on the stack, so if many such parfors run in a loop the stack overflows. The new code makes a pair of allocation/deallocation calls into the Numba parallel runtime before and after the parallel region respectively. At the moment, these calls redirect to malloc/free, although other mechanisms such as pooling are possible and may be implemented later.

This PR also adds a warning in cases where a prange loop is not converted to a parfor. This can happen if there is exceptional control flow in the loop. The two changes are related: the original issue had a prange loop that wasn’t converted to a parfor, so all the parfors inside the body of the prange were running in parallel and adding to the stack each time.

(PR-#9048)

Support multiple outputs in a @guvectorize function

This PR fixes Numba #9058; it is now possible to call a guvectorize function with multiple outputs.

(PR-#9049)

Handling of None args fixed in PythonAPI.call.

Fixed a segfault that occurred when args=None was passed to PythonAPI.call.

(PR-#9089)

Fix propagation of literal values in PHI nodes.

Fixed a bug in the literal propagation pass where a PHI node could be wrongly replaced by a constant.

(PR-#9144)

numpy.digitize implementation behaviour aligned with numpy

The implementation of numpy.digitize is updated to behave per numpy in a wider set of cases, including where the supplied bins are not in fact monotonic.

(PR-#9169)

numpy.searchsorted and numpy.sort behaviour updates

  • numpy.searchsorted implementation updated to produce identical outputs to numpy for a wider set of use cases, including where the provided array a is in fact not properly sorted.

  • numpy.searchsorted implementation bugfix for the case where side='right' and the provided array a contains NaN(s).

  • numpy.searchsorted implementation extended to support complex inputs.

  • numpy.sort (and array.sort) implementation extended to support sorting of complex data.

(PR-#9189)

Fix SSA to consider variables where use is not dominated by the definition

An SSA problem is fixed such that a conditionally defined variable will receive a phi node showing that there is a path where the variable is undefined. This affects extension code that relies on SSA behavior.

(PR-#9242)

Fixed RecursionError in prange

A problem with certain loop patterns using prange leading to RecursionError in the compiler is fixed. An example of such a loop is shown below. The problem would cause the compiler to fall into an infinite recursive cycle trying to determine the definition of var1 and var2. The pattern involves definitions of variables within an if-else tree in which not all branches define the variables.

for i in prange(N):
    for j in inner:
        if cond1:
            var1 = ...
        elif cond2:
            var1, var2 = ...
        elif cond3:
            pass

        if cond4:
            use(var1)
            use(var2)

(PR-#9244)

Support negative axis in ufunc.reduce

Fixed a bug in ufunc.reduce to correctly handle negative axis values.

(PR-#9296)

Fix issue with parfor reductions and Python 3.12.

The parfor reduction code has certain expectations on the order of statements that it discovers; these are based on the code that previous versions of Numba generated. With Python 3.12, one assignment that used to follow the reduction operator statement, such as a binop, is now moved to its own basic block. This change reorders the set of discovered reduction nodes so that this assignment is right after the reduction operator, as it was in previous Numba versions. This only affects internal parfor reduction code and doesn’t actually change the Numba IR.

(PR-#9334)

Changes

Make test listing not invoke CPU compilation.

Numba’s test listing command python -m numba.runtests -l has historically triggered CPU target compilation due to the way in which certain test functions were declared within the test suite. The CPU target compiler is no longer invoked on test listing, and a test is added to ensure that this remains the case.

(PR-#9309)

Semantic differences due to Python 3.12 variable shadowing in comprehensions

Python 3.12 introduced a new bytecode LOAD_FAST_AND_CLEAR that is only used in comprehensions. It has dynamic semantics that Numba cannot model.

For example,

def foo():
    if False:
        x = 1
    [x for x in (1,)]
    return x  # This return uses undefined variable

The variable x is undefined at the return statement. Instead of raising an UnboundLocalError, Numba will raise a TypingError at compile time if an undefined variable is used.

However, Numba cannot always detect undefined variables.

For example,

def foo(a):
    [x for x in (0,)]
    if a:
        x = 3 + a
    x += 10
    return x

Calling foo(0) returns 10 instead of raising UnboundLocalError. This is because Numba does not track variable liveness at runtime. The return value is 0 + 10 since Numba zero-initializes undefined variables.
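
For comparison, the interpreter's own behaviour for the second example can be checked in plain Python, with no Numba involved. The variable names outcome and with_truthy_arg are illustrative only.

```python
def foo(a):
    [x for x in (0,)]
    if a:
        x = 3 + a
    x += 10
    return x


# In the interpreter, x is unbound when a is falsy, so the in-place add fails.
try:
    foo(0)
except UnboundLocalError:
    outcome = "UnboundLocalError"
else:
    outcome = "returned"

# When the branch is taken, x is bound before use and the call succeeds.
with_truthy_arg = foo(2)
```

Code that relies on catching UnboundLocalError in this pattern therefore behaves differently once the function is compiled by Numba.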

(PR-#9315)

Refactor and remove legacy APIs/testing internals.

A number of internally used functions have been removed to aid with general maintenance by reducing the number of ways in which it is possible to invoke compilation, specifically:

  • numba.core.compiler.compile_isolated is removed.

  • numba.tests.support.TestCase::run_nullary_func is removed.

  • numba.tests.support.CompilationCache is removed.

Additionally, the concept of “nested context” is removed from numba.core.registry.CPUTarget along with the implementation details. Maintainers of target extensions (those using the API in numba.core.target_extension to extend Numba support to custom/synthetic hardware) should note that the same can be deleted from target extension implementations of numba.core.descriptor.TargetDescriptor if it is present. That is, the nested_context method and associated implementation details can just be removed from the custom target’s TargetDescriptor.

Further, a bug in the typing of record arrays was discovered during the refactoring. Two record types that differed only in their mutability could alias; this has now been fixed.

(PR-#9330)

Deprecations

Explicitly setting NUMBA_CAPTURED_ERRORS=old_style will raise deprecation warnings

As per the deprecation schedule of old-style error-capturing, explicitly setting NUMBA_CAPTURED_ERRORS=old_style will raise deprecation warnings. This release is the last to use “old_style” as the default. Details are documented at https://numba.readthedocs.io/en/0.58.1/reference/deprecation.html#deprecation-of-old-style-numba-captured-errors
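
One way to opt in to the replacement behaviour ahead of the default change is to set the environment variable before Numba is imported. This is a sketch that assumes the new_style value described in the linked deprecation notice; setting it in the shell instead of in Python works equally well.

```python
import os

# Assumption: "new_style" is the replacement value named in the deprecation notice.
# The variable must be set before numba is imported for it to take effect.
os.environ["NUMBA_CAPTURED_ERRORS"] = "new_style"

selected_style = os.environ["NUMBA_CAPTURED_ERRORS"]
```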

(PR-#9346)

Expired Deprecations

Object mode fall-back support has been removed.

As per the deprecation schedule for Numba 0.59.0, support for “object mode fall-back” is removed from all Numba jit-family decorators. Further, the default for the nopython keyword argument has been changed to True, which means that all Numba jit-family decorated functions will now compile in nopython mode by default.
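
In practice, a bare @jit now behaves like @jit(nopython=True). The sketch below uses an illustrative function named add. The fallback decorator is an assumption so the snippet runs where Numba is not installed, in which case no compilation takes place.

```python
try:
    from numba import jit
except ImportError:
    # Illustrative fallback only: leave the function uncompiled without Numba.
    def jit(func):
        return func


@jit  # from Numba 0.59, this compiles in nopython mode by default
def add(a, b):
    return a + b


total = add(2, 3)
```

Code that previously relied on object mode fall-back will now fail to compile rather than silently running in object mode.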

(PR-#9352)

Removal of deprecated API @numba.generated_jit.

As per the deprecation schedule for 0.59.0, support for @numba.generated_jit has been removed. Use of @numba.extending.overload and the high-level extension API is recommended as a replacement.

(PR-#9353)

Pull-Requests: