Notes on sys.monitoring

Note

This documentation was written at the advent of Python 3.12. Future versions of Python may behave differently. It is however hoped that most of the concepts herein will remain relevant.

Python 3.12 introduced a new monitoring system under sys.monitoring. This system lets users monitor a selection of events that may be of interest for e.g. performance profiling or debugging purposes. Event monitoring is set “per tool”, so that multiple tools can run at the same time. For each tool, events can be monitored globally per thread or locally per code object (or a mixture of both). For each tool-event combination a callback can be registered that will be called when the event occurs. The callbacks are just regular functions and can do most of the things supported by Python; they can also return a special value to tell the monitoring system to disable triggering future events for the current code location.
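
For orientation, here is a minimal sketch of the Python API described above (CPython 3.12+); the tool name “demo” and the function work are purely illustrative:

    import sys

    mon = sys.monitoring
    TOOL_ID = mon.PROFILER_ID  # any unclaimed tool ID in the range 0-5 would do

    # Claim a tool ID and give the tool a name.
    mon.use_tool_id(TOOL_ID, "demo")

    # Callback for PY_START events: receives the code object and the bytecode
    # offset at which the event was issued.
    def on_py_start(code, instruction_offset):
        print(f"PY_START: {code.co_name} at offset {instruction_offset}")
        # Returning mon.DISABLE here would suppress further PY_START events
        # for this code location (for this tool only).

    mon.register_callback(TOOL_ID, mon.events.PY_START, on_py_start)

    # Request PY_START globally for this tool.
    mon.set_events(TOOL_ID, mon.events.PY_START)

    def work(x):
        return x + 1

    work(1)  # the interpreter issues PY_START, which invokes on_py_start

    # Tidy up: stop monitoring (0 == no events) and release the tool ID.
    mon.set_events(TOOL_ID, 0)
    mon.free_tool_id(TOOL_ID)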

What does this mean for Numba?

When the interpreter “encounters” a monitoring event (in fact the interpreter itself issues the event), it triggers any callbacks that are associated with that event across all tools that have registered monitoring for said event. In the case of Numba there are problems…

Numba has made it so that there’s no Python interpreter involved in the execution of a function: the function is compiled and its execution path exists only in machine code. To get to the machine code from the interpreter, the Numba dispatcher is invoked; this is the last place in the stack where (in nopython mode) the Python interpreter is readily available. The dispatcher is also, in some sense, part of the execution of the function: without the dispatcher the call to the machine code cannot easily happen from user space. As a result, the monitoring types and event types that Numba can support are somewhat limited, as there is so little interpreter involvement in execution!

Looking at the monitoring types in turn. Local monitoring is requested by setting monitoring on a code object. In practice this instructs the interpreter to augment the bytecode at runtime by switching certain opcodes for “instrumented” opcodes. These instrumented opcodes go via a special path in the interpreter loop whereby they issue an “event” in association with a particular instruction at a particular offset. For example, a RETURN opcode might be replaced by an INSTRUMENTED_RETURN, and a PY_RETURN event would be issued when the instrumented instruction is interpreted; the event and the offset at which it occurred are then forwarded to the monitoring system. Unfortunately this presents an issue for Numba: there is no interpreter involved in execution and so events will not be emitted. It does seem like it would be possible to handle a few types of event, such as PY_START and PY_RETURN, by analysing the code object at dispatch time. However, it’s possible for a user to de-instrument the code object and/or dynamically disable monitoring at a particular code location whilst executing; emulating these semantics would be prohibitively challenging and would likely require constant interaction with the interpreter. As a result, Numba does not support local event monitoring: the compiled function will still execute correctly if it has been set, it just has no effect on sys.monitoring.
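
For contrast, here is a sketch of how local monitoring is requested on a plain Python function (the function f is illustrative); per the above, requesting the same on the code object of a Numba-compiled function has no effect on the machine code that actually runs:

    import sys

    mon = sys.monitoring
    TOOL_ID = mon.PROFILER_ID
    mon.use_tool_id(TOOL_ID, "local_demo")

    def f(x):
        return x * 2

    # Callback for PY_RETURN events: receives the code object, the bytecode
    # offset of the return instruction and the value being returned.
    def on_py_return(code, instruction_offset, retval):
        print(f"PY_RETURN: {code.co_name} -> {retval} at offset {instruction_offset}")

    mon.register_callback(TOOL_ID, mon.events.PY_RETURN, on_py_return)

    # Local monitoring: instruments only f.__code__, swapping in "instrumented"
    # opcodes for the requested events in that code object.
    mon.set_local_events(TOOL_ID, f.__code__, mon.events.PY_RETURN)

    f(3)  # the instrumented return instruction issues PY_RETURN

    # Tidy up: clear the local events and release the tool ID.
    mon.set_local_events(TOOL_ID, f.__code__, 0)
    mon.free_tool_id(TOOL_ID)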

Considering per-thread global monitoring: this manifests as the user setting some global state on the interpreter for a given thread. This state can be accessed via the sys.monitoring Python API; it’s also accessible via CPython internals. This kind of monitoring is a little more amenable to working with Numba, as there’s no code object involved and state mutation during execution can only occur via object mode calls.
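
This per-tool global state can be inspected from Python; conceptually, this is the kind of state the Numba dispatcher has to consult (sketch only):

    import sys

    mon = sys.monitoring

    # List the tools that have claimed an ID and the global events each has
    # requested (the value is a bitmask of sys.monitoring.events flags).
    for tool_id in range(6):
        name = mon.get_tool(tool_id)  # None if the ID is unclaimed
        if name is not None:
            print(tool_id, name, mon.get_events(tool_id))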

What does Numba do in practice?

As there’s no Python or C API to issue events (the concept is heavily linked to the interpreter itself), Numba has to look for tool-event combinations at appropriate locations in the dispatch sequence and then manually call the associated callbacks (essentially doing what the interpreter does when it issues an event). In the case of the Numba dispatcher, only a few events are relevant and only four are supported, namely:

  • sys.monitoring.events.PY_START (Python function starting).

  • sys.monitoring.events.PY_RETURN (Python function returning).

  • sys.monitoring.events.RAISE (Python function raised an exception).

  • sys.monitoring.events.PY_UNWIND (Python function exiting during exception unwinding).

These events don’t really exist in the machine code, but they would exist had the interpreter interpreted the equivalent bytecode. The dispatcher therefore checks for monitoring of PY_START just before control is transferred to the machine code and calls any associated callbacks. The same is done for PY_RETURN just after control is transferred back to the dispatcher from the machine code. This behaviour essentially emulates the interpreter executing bytecode and lets tools such as cProfile “see” the Numba-compiled function as part of standard interpreted execution. In the case of an exception being raised in the machine code, the associated error state is handled just after control is transferred back to the dispatcher; at this point RAISE and PY_UNWIND event monitoring is checked and registered callbacks are invoked.
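
As an illustrative sketch of the above (assuming a Numba version with this support installed; the function add and the tool name are hypothetical), a tool that has registered for these events will observe the compiled function via the dispatcher:

    import sys
    from numba import njit

    mon = sys.monitoring
    TOOL_ID = mon.PROFILER_ID
    mon.use_tool_id(TOOL_ID, "numba_demo")

    @njit
    def add(a, b):
        return a + b

    add(1, 2)  # compile first, so that compilation itself is not monitored

    def on_start(code, instruction_offset):
        if code.co_name == "add":
            print("PY_START:", code.co_name)

    def on_return(code, instruction_offset, retval):
        if code.co_name == "add":
            print("PY_RETURN:", code.co_name, "->", retval)

    mon.register_callback(TOOL_ID, mon.events.PY_START, on_start)
    mon.register_callback(TOOL_ID, mon.events.PY_RETURN, on_return)
    mon.set_events(TOOL_ID, mon.events.PY_START | mon.events.PY_RETURN)

    # The dispatcher checks for PY_START monitoring just before transferring
    # control to the machine code, and for PY_RETURN just after it returns.
    add(10, 20)

    mon.set_events(TOOL_ID, 0)
    mon.free_tool_id(TOOL_ID)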

A note on offsets. The callback functions often take an “offset” argument, which is the bytecode offset at which the event triggering the callback was encountered. In the case of PY_START this seems to be associated with the offset of the RESUME bytecode. In the case of PY_RETURN it is associated with the offset of one of the RETURN bytecodes; in general this is only known at runtime as there could be multiple return paths. As a result, Numba elects to simply set all offsets to zero. It may eventually be possible to do some analysis and transfer the appropriate runtime information to the dispatcher from the machine code; however, at present the effort required to do this vastly outweighs the gain.
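
For reference, the callback shapes for the four supported events each carry this offset argument (as documented for sys.monitoring); when the event is emitted by the Numba dispatcher the offset is reported as zero:

    from types import CodeType

    # Callback shapes for the four events Numba supports.  When the Numba
    # dispatcher invokes them, instruction_offset is always 0.

    def py_start(code: CodeType, instruction_offset: int): ...

    def py_return(code: CodeType, instruction_offset: int, retval: object): ...

    def raise_(code: CodeType, instruction_offset: int, exception: BaseException): ...

    def py_unwind(code: CodeType, instruction_offset: int, exception: BaseException): ...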