The Threading Layers
This section is about the Numba threading layer, the library that is used
internally to perform the parallel execution that occurs through the use of
the parallel targets for CPUs, namely:
- The use of the parallel=True kwarg in @jit and @njit.
- The use of the target='parallel' kwarg in @vectorize and @guvectorize.
Note
If a code base does not use the threading or multiprocessing modules (or any
other sort of parallelism), the defaults for the threading layer that ship
with Numba will work well and no further action is required!
Which threading layers are available?
There are three threading layers available and they are named as follows:
- tbb: a threading layer backed by Intel TBB.
- omp: a threading layer backed by OpenMP.
- workqueue: a simple built-in work-sharing task scheduler.
In practice, the only threading layer guaranteed to be present is workqueue.
The omp layer requires the presence of a suitable OpenMP runtime library.
The tbb layer requires the presence of Intel’s TBB libraries, which can be
obtained via the conda command:
$ conda install tbb
If you installed Numba with pip, TBB can be enabled by running:
$ pip install tbb
Note
The default manner in which Numba searches for and loads a threading layer is tolerant of missing libraries, incompatible runtimes etc.
Setting the threading layer
The threading layer is set via the environment variable NUMBA_THREADING_LAYER
or through assignment to numba.config.THREADING_LAYER. If the programmatic
approach to setting the threading layer is used, the assignment must occur
logically before any Numba based compilation for a parallel target has
occurred. There are two approaches to choosing a threading layer: the first is
by selecting a threading layer that is safe under various forms of parallel
execution, the second is through explicit selection via the threading layer
name (e.g. tbb).
Setting the threading layer selection priority
By default the threading layers are searched in the order 'tbb', 'omp', then
'workqueue'. To change this search order whilst maintaining the selection of a
threading layer based on availability, the environment variable
NUMBA_THREADING_LAYER_PRIORITY can be used.
Note that it can also be set via numba.config.THREADING_LAYER_PRIORITY. As
with numba.config.THREADING_LAYER, the assignment must occur logically before
any Numba based compilation for a parallel target has occurred.
For example, to instruct Numba to choose omp first if available, then tbb and
so on, set the environment variable as
NUMBA_THREADING_LAYER_PRIORITY="omp tbb workqueue".
Or programmatically,
numba.config.THREADING_LAYER_PRIORITY = ["omp", "tbb", "workqueue"].
Selecting a threading layer for safe parallel execution
Parallel execution is fundamentally derived from core Python libraries in four forms (the first three also apply to code using parallel execution via other means!):
- threads from the threading module.
- spawning processes from the multiprocessing module via spawn (default on Windows, only available in Python 3.4+ on Unix).
- forking processes from the multiprocessing module via fork (default on Unix).
- forking processes from the multiprocessing module through the use of a forkserver (only available in Python 3 on Unix). Essentially a new process is spawned and then forks are made from this new process on request.
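Which of these process start methods is available, and which one is the default, varies with the platform and Python version, so it can be worth querying at runtime. The sketch below uses only the standard library multiprocessing module.

```python
import multiprocessing as mp

# Start methods supported on this platform, e.g. fork, spawn, forkserver.
available = mp.get_all_start_methods()

# The method in effect; note that calling this fixes the default for the process.
default = mp.get_start_method()

print("available:", available)
print("default:", default)

# A context selects one start method without altering the global setting.
ctx = mp.get_context("spawn")
```

Knowing the start method in use indicates which of the safety options described in this section is relevant.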
Any library in use with these forms of parallelism must exhibit safe behaviour under the given paradigm. As a result, the threading layer selection methods are designed to provide a way to choose a threading layer library that is safe for a given paradigm in an easy, cross platform and environment tolerant manner. The options that can be supplied to the setting mechanisms are as follows:
- default provides no specific safety guarantee and is the default.
- safe is both fork and thread safe; this requires the tbb package (Intel TBB libraries) to be installed.
- forksafe provides a fork safe library.
- threadsafe provides a thread safe library.
To discover the threading layer that was selected, the function
numba.threading_layer() may be called after parallel execution. For example,
on a Linux machine with no TBB installed:
from numba import config, njit, threading_layer
import numpy as np

# set the threading layer before any parallel target compilation
config.THREADING_LAYER = 'threadsafe'

@njit(parallel=True)
def foo(a, b):
    return a + b

x = np.arange(10.)
y = x.copy()

# this will force the compilation of the function, select a threading layer
# and then execute in parallel
foo(x, y)

# demonstrate the threading layer chosen
print("Threading layer chosen: %s" % threading_layer())
which produces:
Threading layer chosen: omp
and this makes sense as GNU OpenMP, as present on Linux, is thread safe.
Selecting a named threading layer
Advanced users may wish to select a specific threading layer for their use case. This is done by directly supplying the threading layer name to the setting mechanisms. The options and requirements are as follows:
Threading Layer Name | Platform | Requirements
---|---|---
tbb | All | The tbb package ($ conda install tbb)
omp | Linux | GNU OpenMP libraries (very likely this will already exist)
omp | Windows | MS OpenMP libraries (very likely this will already exist)
omp | OSX | The intel-openmp package
workqueue | All | None
Should the threading layer not load correctly, Numba will detect this and
provide a hint about how to resolve the problem. It should also be noted that
the Numba diagnostic command numba -s has a section
__Threading Layer Information__ that reports on the availability of threading
layers in the current environment.
Extra notes
The threading layers have fairly complex interactions with CPython internals and system level libraries. Some additional things to note:
- The installation of Intel’s TBB libraries vastly widens the options available in the threading layer selection process.
- On Linux, the omp threading layer is not fork safe due to the GNU OpenMP runtime library (libgomp) not being fork safe. If a fork occurs in a program that is using the omp threading layer, a detection mechanism is present that will try to gracefully terminate the forked child and print an error message to STDERR.
- On systems with the fork(2) system call available, if the TBB backed threading layer is in use and a fork call is made from a thread other than the thread that launched TBB (typically the main thread), then this results in undefined behaviour and a warning will be displayed on STDERR. As spawn is essentially fork followed by exec, it is safe to spawn from a non-main thread, but as this cannot be differentiated from just a fork call the warning message will still be displayed.
- On OSX, the intel-openmp package is required to enable the OpenMP based threading layer.
Setting the Number of Threads
The number of threads used by Numba is based on the number of CPU cores
available (see numba.config.NUMBA_DEFAULT_NUM_THREADS), but it can be
overridden with the NUMBA_NUM_THREADS environment variable.
The total number of threads that Numba launches is in the variable
numba.config.NUMBA_NUM_THREADS.
For some use cases, it may be desirable to set the number of threads to a lower value, so that Numba can be used with higher level parallelism.
The number of threads can be set dynamically at runtime using
numba.set_num_threads(). Note that set_num_threads() only allows setting the
number of threads to a value no greater than NUMBA_NUM_THREADS. Numba always
launches numba.config.NUMBA_NUM_THREADS threads, but set_num_threads() causes
it to mask out unused threads so they aren’t used in computations.
The current number of threads used by Numba can be accessed with
numba.get_num_threads(). Both functions work inside of a jitted function.
Example of Limiting the Number of Threads
In this example, suppose the machine we are running on has 8 cores (so
numba.config.NUMBA_NUM_THREADS would be 8). Suppose we want to run some code
with @njit(parallel=True), but we also want to run our code concurrently in 4
different processes. With the default number of threads, each Python process
would run 8 threads, for a total of 4*8 = 32 threads, which oversubscribes our
8 cores. We should rather limit each process to 2 threads, so that the total
will be 4*2 = 8, which matches our number of physical cores.
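The arithmetic generalises to other core and process counts. The helper below is a hypothetical illustration in plain Python, not a Numba function; it assumes the cores should be divided evenly and that each process needs at least one thread.

```python
def threads_per_process(cores, processes):
    # Divide the cores evenly, never going below one thread per process.
    return max(1, cores // processes)

per_process = threads_per_process(8, 4)
total = 4 * per_process
print(per_process, "threads per process,", total, "threads in total")
```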
There are two ways to do this. One is to set the NUMBA_NUM_THREADS environment
variable to 2.
$ NUMBA_NUM_THREADS=2 python ourcode.py
However, there are two downsides to this approach:
- NUMBA_NUM_THREADS must be set before Numba is imported, and ideally before Python is launched. As soon as Numba is imported the environment variable is read and that number of threads is locked in as the number of threads Numba launches.
- If we want to later increase the number of threads used by the process, we cannot. NUMBA_NUM_THREADS sets the maximum number of threads that are launched for a process. Calling set_num_threads() with a value greater than numba.config.NUMBA_NUM_THREADS results in an error.
The advantage of this approach is that we can do it from outside of the process without changing the code.
Another approach is to use the numba.set_num_threads() function in our code:

from numba import njit, set_num_threads

@njit(parallel=True)
def func():
    # placeholder for the parallel code
    ...

set_num_threads(2)
func()
If we call set_num_threads(2) before executing our parallel code, it has the
same effect as calling the process with NUMBA_NUM_THREADS=2, in that the
parallel code will only execute on 2 threads. However, we can later call
set_num_threads(8) to increase the number of threads back to the default size.
And we do not have to worry about setting it before Numba gets imported. It
only needs to be called before the parallel function is run.
Getting a Thread ID
In some cases it may be beneficial to have access to a unique identifier for
the current thread that is executing a parallel region in Numba. For that
purpose, Numba provides the numba.get_thread_id() function. This function is
the analogue of OpenMP’s function omp_get_thread_num and returns an integer
between 0 (inclusive) and the number of configured threads as described above
(exclusive).
API Reference
- numba.config.NUMBA_NUM_THREADS
The total (maximum) number of threads launched by Numba.
Defaults to numba.config.NUMBA_DEFAULT_NUM_THREADS, but can be overridden with the NUMBA_NUM_THREADS environment variable.
- numba.config.NUMBA_DEFAULT_NUM_THREADS
The number of usable CPU cores on the system (as determined by len(os.sched_getaffinity(0)), if supported by the OS, or multiprocessing.cpu_count() if not). This is the default value for numba.config.NUMBA_NUM_THREADS unless the NUMBA_NUM_THREADS environment variable is set.
- numba.set_num_threads(n)
Set the number of threads to use for parallel execution.
By default, all numba.config.NUMBA_NUM_THREADS threads are used.
This functionality works by masking out threads that are not used. Therefore, the number of threads n must be less than or equal to NUMBA_NUM_THREADS, the total number of threads that are launched. See its documentation for more details.
This function can be used inside of a jitted function.
- Parameters
- n: The number of threads. Must be between 1 and NUMBA_NUM_THREADS.
- numba.get_num_threads()
Get the number of threads used for parallel execution.
By default (if set_num_threads() is never called), all numba.config.NUMBA_NUM_THREADS threads are used.
This number is less than or equal to the total number of threads that are launched, numba.config.NUMBA_NUM_THREADS.
This function can be used inside of a jitted function.
- Returns
- The number of threads.
- numba.get_thread_id()
Returns a unique ID for each thread in the range 0 (inclusive) to get_num_threads() (exclusive).
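The core-count rule described for numba.config.NUMBA_DEFAULT_NUM_THREADS can be approximated with the standard library alone. This is a sketch of the rule as stated in this reference, not Numba's actual implementation, and usable_cpu_count is a hypothetical name.

```python
import multiprocessing
import os

def usable_cpu_count():
    # Prefer the CPU affinity mask of the current process where the OS supports it.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Otherwise fall back to the total number of CPUs.
    return multiprocessing.cpu_count()

print("usable CPU cores:", usable_cpu_count())
```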