4.1. Interpreter Structure
Importance for Jython
In Jython, support for multiple interpreters and threads is critical for certain uses.
An application may create more than one interpreter and multiple platform threads may use them. Jython has to provide clear semantics for multiple interpreters: initialisation, the scope of options, and the extent to which state is shared.
In a Java application server, applications are separated by class loaders. There may be multiple instantiations of objects we normally think of as unique: Class objects themselves and fields declared static in them. (A single class definition found by different class loaders yields distinct Class objects.) And yet, there is not complete isolation: loaders are hierarchical, and definitions found by the loaders towards the root (basically, those from the Java system libraries) create shared definitions. Jython would not normally be one of these, but it could be.
Threads, platform threads owned by the application server itself, may visit the objects of multiple applications.
Jython has several times corrected its implementation of the life-cycle of interpreter structures, in the face of difficulties with threads and interpreters encountered in more complex cases. (These are not well supported by tests.) It is a source of bugs and confusion for users and maintainers alike.
Difficulty of following CPython
In CPython, support for multiple interpreters (sub-interpreters) has been present in the C API for a long time. Support for multiple threads is also well established, at least where the platform thread is created from CPython itself. The notion of a call-back, a platform thread entering the runtime from outside (a thread pool, say), is also supported, but here the C API is hedged with warnings that suggest this will not mix properly with the use of multiple interpreters. In spite of much helpful refactoring and re-writing between CPython 3.5 (when the present very slow project began) and CPython 3.8 (current at the time of this writing), that warning is still necessary.
It seems there may be weaknesses here in just the place Jython needs to find a strong foundation. In that case, the architecture of CPython cannot simply be followed uncritically in a Java implementation. Nor, given experience, can we take it for granted that Jython 2.7 has it perfectly right.
As a prelude to implementing statements in sub-project (see Simple Statements and Assignment), we attempted to get a proper structure for the interpreter: the per-thread state, the working state of the interpreter to which the threads belong, and the relationship these things have to platform-level threads (Thread in Java). Also, we began to consider execution frames, and ‘live’ objects, implemented in Python then loosed into the wider application.
This continues to evolve in
This complex subject deserves an architectural section of its own, and here it is.
4.1.2. Runtime, Thread and Interpreter (CPython)
CPython defines four structures, that Java would consider classes, in the vicinity of the core interpreter:
PyInterpreterState holds state shared between threads, including the module list and built-in objects.
PyThreadState holds per-thread state, most notably the linked list of frames that forms the Python stack.
PyFrameObject is a Python object (type frame) that provides the execution context for running a PyCode, the local variables and value stack.
PyCodeObject is an immutable Python object (type code) holding compiled code (CPython byte code) and information that may be deduced statically from the source.
In addition, _PyRuntimeState is a statically allocated singleton that consolidates global data from across the CPython executable, and holds the interpreters (of which one is the “main” interpreter).
An examination of the code allows us to draw the class diagram below. Not all the relationships are shown, but enough to support the discussion.
The choice of data structures in this part of CPython (and Jython) is shot through with the idea of multiple threads, and on exploring the CPython code, one quickly encounters Python’s Infamous GIL (Global Interpreter Lock), a feature we don’t want to reproduce in Java.
In many places where a function is called, CPython does not pass interpreter context as an argument. Instead, the CPython runtime provides a function that reads a global pointer to the current thread state. A thread takes the GIL through a call that also installs its own thread state in that global. The loop in ceval.c simulates concurrency by creating an occasion for that swap between successive byte codes. This makes the operation of most byte codes atomic. Very little hardware concurrency is possible.
The frame stack, and all other state that should be used at a particular moment, flow from identifying the correct thread state. PyThreadState also points to the PyInterpreterState that owns it, and so we have the correct module state for the code executing on the stack. _PyInterpreterState_Get() produces its answer by first finding the current thread from the GIL.
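The navigation just described (lock, then current thread state, then interpreter) can be mimicked in a toy Java model. This is only a sketch to fix ideas, not CPython code: the names GilModel, ThreadStateModel and InterpreterStateModel are hypothetical, and a ReentrantLock stands in for the GIL.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for PyInterpreterState: state shared between threads. */
class InterpreterStateModel {
    final String name;
    InterpreterStateModel(String name) { this.name = name; }
}

/** Hypothetical stand-in for PyThreadState: it knows the interpreter that owns it. */
class ThreadStateModel {
    final InterpreterStateModel interp;
    ThreadStateModel(InterpreterStateModel interp) { this.interp = interp; }
}

/** One global lock plus one global "current thread state" pointer. */
class GilModel {
    private final ReentrantLock gil = new ReentrantLock();
    private ThreadStateModel current; // only touched while holding the lock

    /** Take the lock and install the caller's thread state. */
    void acquire(ThreadStateModel ts) {
        gil.lock();
        current = ts;
    }

    /** Clear the current thread state and give up the lock. */
    void release() {
        current = null;
        gil.unlock();
    }

    /** The interpreter is not passed in: it is found through the thread state. */
    InterpreterStateModel currentInterpreter() {
        if (!gil.isHeldByCurrentThread() || current == null) {
            throw new IllegalStateException("no current thread state");
        }
        return current.interp;
    }
}

class GilModelDemo {
    public static void main(String[] args) {
        GilModel runtime = new GilModel();
        InterpreterStateModel i1 = new InterpreterStateModel("i1");
        runtime.acquire(new ThreadStateModel(i1));
        System.out.println(runtime.currentInterpreter().name);
        runtime.release();
    }
}
```

In this model, a platform thread that wants to work in a second interpreter has to install a different thread state, which is the step that becomes awkward once there is more than one lock.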
When it comes to threads, the CPython C-API is not wholly consistent, see Initialization Bugs and Caveats. Recent work to expose sub-interpreters at the Python level in PEP 554 has clarified the definition and use of these structures. But it remains necessary to caution users against mixing sub-interpreters with the sort of manipulation of the GIL necessary to deal with Non-Python created threads.
The direction of development in this part of CPython is towards
one GIL per interpreter (in
so that interpreters are able to execute concurrently.
Interpreters do not share objects: each has its own memory pool from which that interpreter’s objects are allocated. As a result, threads in different interpreters may safely increment and decrement reference counts, since each interpreter’s lock protects its own objects from errors of concurrent modification.
In fact, interpreters do not share objects by design, but it is not possible to prevent an application or extension from doing so. In the simplest case, an application may create two interpreters, i1 and i2, get a result r (a Python object) from i1, and call a function in i2 with that as argument. Two problems now arise:
When code in i2 calls a method on r, it will execute code written in the expectation that the import context in which i1’s methods were defined will be present.
When i2 performs operations on r that could lead to its destruction, or the destruction of members of it (suppose r contained a list and i2 were to delete an item), the wrong memory allocator might be called, or reference counts updated in a race with a thread in i1. The current thread only holds the lock in i2.
Notice that the first of these is a question about the meaning of the Python language, while the second is an issue for the implementation of CPython.
Sub-interpreters for Concurrency (Two Conjectures)
PEP 554 exposes the current C API (with its single GIL) for use from Python. It does not introduce a concurrency mechanism per se: that requires changes to the runtime. In the perception of many, however, the value of the PEP is in exposing for use an API that subsequently will support concurrency through sub-interpreters.
The proposal is to have one LIL (Local Interpreter Lock) per interpreter. It would serialise threads competing in a single interpreter, except in the special cases where the LIL is explicitly released, just as now (e.g. during slow I/O). How might the runtime structures change to accommodate concurrent interpreters? It is possible to speculate as follows.
A platform thread must have a thread state in each interpreter where it handles objects (we think). Unless a platform thread is confined to one interpreter, there is a problem here: a platform thread in need of a reference to its current thread state must find it in the LIL of the current interpreter. Previously the interpreter was found through the thread state, using the universal GIL. How does a platform thread first establish the current interpreter?
Another possibility is to map a given platform thread to the same thread state, whichever interpreter it appears in. One may then quickly find the correct thread state (as a thread-local variable perhaps). This changes how stacks and tracebacks work, but because the relationship to interpreter is many-to-many, it does not alter the fundamental problem of finding the right one. This is a question about Python, unrelated to the GIL.
A number of difficult cases may be devised, involving threads and interpreters, where it is not clear from current documentation or code how CPython would deal with the circumstances. We must make this answer well-defined in Jython, despite the inherent multiplicity of objects.
4.1.3. Use cases
We will catalogue several patterns in which interpreters and threads might be used. The idea is to test our architectural ideas in theory first, in a series of use cases. We may then prove the implementation by constructing test cases around them. The first are somewhat trivial, for completeness only.
Using Python Directly
An application obtains an interpreter and gives it work to do. It may be called to run a script (Python file) or fed commands like a REPL. Objects the application obtains as return values, or from the namespace against which commands execute, will generally be Python objects, with behaviours defined in Python code.
The Jython 2 main program is a particular case, and we’ll need that or something similar in an implementation of Python 3.
Ensure invocation is trivially easy.
Try to ensure well-known examples (Jython Book) still work.
Is automatic initialisation of the runtime a bad idea?
We may not want a global, static interpreter instance, hanging around indefinitely.
But the interpreter must exist as long as the objects it created.
We do not (we think) want PyObject to have its Jython 2 interface in Java.
Using Python under JSR-223
As previously, an application obtains an interpreter and gives it work to do. Possibilities are mostly as in Using Python Directly, except that the usage is defined by JSR-223.
The use of an interpreter via JSR-223 is not really different once the application begins making direct use of the objects it gets back.
Using Python Twice Directly
An application obtains two interpreters using the mechanisms in Using Python Directly, or by JSR-223. It takes an object defined in one interpreter and calls a method on it in the second. For variety, suppose the application shares the objects from the first interpreter by sharing a dictionary as the namespace of both.
A single thread is valid in two interpreters simultaneously.
A dictionary object is created before any interpreter. Does it have a current interpreter? (Some built-ins like dict may be guaranteed not to need import context.)
At the point foo is used in the second interpreter, the current interpreter must be
If the (platform) thread has a thread state in each interpreter, there will be two (disconnected) stacks.
Other considerations as in Using Python Directly.
Python behind the Library
A Java application uses a Java library. The implementor of that library chose to use Python. This is not visible in the API, but objects handled through their Java API get their behaviour in Python.
A second interpreter is also in use somehow, and is going to manipulate objects from the library. (For definiteness, assume the application uses this one directly.) The Python implementation of the objects from the library will not be apparent to the second Python interpreter.
more needed: use the thing from Python/Jython. Suppose thing has a method that takes an argument that was produced by a second interpreter?
A single thread is valid in two interpreters simultaneously.
The library is hiding the Python nature. An exception raised in pyThing should be caught in thing and a library-specific exception raised.
Even a library-specific exception could embed the PyException as cause, dragging a Python traceback.
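The exception handling at the library boundary might look like the following. This is a minimal sketch under stated assumptions: PyException here is a locally defined stand-in (not Jython's class), and Thing, ThingException and frob are hypothetical names for the library API.

```java
/** Stand-in for a Python exception type (hypothetical, for illustration only). */
class PyException extends RuntimeException {
    PyException(String message) { super(message); }
}

/** The library's own exception: callers need not know Python is involved. */
class ThingException extends RuntimeException {
    ThingException(String message, Throwable cause) { super(message, cause); }
}

/** Java facade whose behaviour is (notionally) implemented in Python. */
class Thing {
    private final Runnable pyThing; // stands in for the Python-implemented object

    Thing(Runnable pyThing) { this.pyThing = pyThing; }

    void frob() {
        try {
            pyThing.run();
        } catch (PyException pe) {
            // Translate at the API boundary. Passing pe as the cause keeps the
            // diagnostic detail, but also exposes the Python origin to callers.
            throw new ThingException("frob failed", pe);
        }
    }
}

class ThingDemo {
    public static void main(String[] args) {
        Thing thing = new Thing(() -> { throw new PyException("ValueError: bad input"); });
        try {
            thing.frob();
        } catch (ThingException e) {
            System.out.println(e.getMessage() + ", cause: " + e.getCause().getMessage());
        }
    }
}
```

Whether to pass the original exception as the cause is a design choice for the library: omitting it hides the Python nature completely, at the cost of a less informative stack trace.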
Concurrency between Interpreters
Not yet elaborated. Start a second thread in
accessing the same objects. Whose fault when it breaks?
The user application runs in a Java application server (like Apache Tomcat) in which user applications are not processes but segregated by class loader, and threads are re-used.
Not yet elaborated.
Thread local data and class values created in one application may still be present for other applications.
Class values attached to persistent classes are not disposed of.
Approaches designed to ensure objects are not retained (e.g. use of weak references) may result in discarding state when it is still wanted.
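The first of these hazards is easy to reproduce with the standard library alone. The sketch below uses a single-thread executor as a stand-in for a server's thread pool; STATE and the "application A/B" labels are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Shows thread-local state left by one task being seen by the next on a pooled thread. */
class ThreadReuseDemo {

    // Stands in for per-thread runtime state (hypothetical; not a Jython field).
    static final ThreadLocal<String> STATE = new ThreadLocal<>();

    /** Run two tasks on one pooled thread; return what the second task observes. */
    static String observeLeak() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // "Application A" stores state and never removes it.
            Runnable store = () -> STATE.set("left by application A");
            pool.submit(store).get();
            // "Application B" later runs on the same platform thread.
            Callable<String> read = () -> STATE.get();
            return pool.submit(read).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(observeLeak());
    }
}
```

The second task sees the value stored by the first, because the thread-local is keyed by the platform thread and not by the application that happened to be running on it.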
4.1.4. Proposed Model for Jython [untested]
In this model, we propose a different arrangement of the critical data structures. In particular, we abandon the idea that a thread belongs to an interpreter. Although possibly controversial, this may solve problems latent in the CPython model, that make it unable to address some of the use cases.
Critical Structures Revisited
We have implemented this model in the
At the time of this writing,
we have not tested it with multiple threads and interpreters.
The notable differences from the CPython model are:
The relationship of Thread (platform thread) to ThreadState is one-to-one and navigable both ways (for Threads known to Python).
ThreadState is not associated with a unique “owning” Interpreter.
ThreadState is associated with multiple Interpreters through the frames in its stack (if the stack is not empty).
The run-time system Py references a “main” Interpreter.
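These relationships can be written down as a few small classes. The following is a sketch under our own assumptions, not the project's implementation: the fields, the interpreters() helper and the use of a Deque for the stack are all hypothetical, and the thread-local link from Thread to ThreadState is left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;

/** Import context only (details omitted in this sketch). */
class Interpreter {
    final String name;
    Interpreter(String name) { this.name = name; }
}

/** An execution frame remembers the interpreter it runs for. */
class Frame {
    final Interpreter interpreter;
    Frame(Interpreter interpreter) { this.interpreter = interpreter; }
}

/** Per-thread state: one per platform thread, with no owning interpreter. */
class ThreadState {
    final Thread thread;                           // navigation ThreadState -> Thread
    final Deque<Frame> stack = new ArrayDeque<>(); // first element is the top frame

    ThreadState(Thread thread) { this.thread = thread; }

    /** Interpreters this thread is associated with, through the frames in its stack. */
    Set<Interpreter> interpreters() {
        Set<Interpreter> found = new LinkedHashSet<>();
        for (Frame f : stack) {
            found.add(f.interpreter);
        }
        return found;
    }
}

class StructuresDemo {
    public static void main(String[] args) {
        Interpreter i1 = new Interpreter("i1");
        Interpreter i2 = new Interpreter("i2");
        ThreadState ts = new ThreadState(Thread.currentThread());
        ts.stack.push(new Frame(i1)); // code from i1 is running ...
        ts.stack.push(new Frame(i2)); // ... and calls into code from i2
        System.out.println(ts.interpreters().size());
    }
}
```

Note that nothing in ThreadState names an interpreter directly: the association exists only while frames are on the stack.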
Interpreter does not own objects [untested]
The hypothesis is that we can implement Python, let Java do the object lifecycle management, and not need either to confine objects to one Interpreter or label them all with an owner.
It is an observation, rather than a hypothesis, that Java manages the life-cycle of our objects: we do not have to count references and no memory allocator is therefore attached to an interpreter.
Interpreter is responsible only for “import context”: the imported module list, import library, module path, and certain short-cuts to built-ins (all to do with modules).
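An interpreter reduced to import context might be as small as this. It is a sketch only: the PyModule class, the modules and path fields, and importModule (which merely creates an empty module rather than loading anything) are hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** A module in this sketch: just a name and a namespace. */
class PyModule {
    final String name;
    final Map<String, Object> dict = new HashMap<>();
    PyModule(String name) { this.name = name; }
}

/** Holds only import context: no allocator and no record of objects it "owns". */
class Interpreter {
    /** Modules already imported into this interpreter. */
    final Map<String, PyModule> modules = new HashMap<>();
    /** Where this interpreter would search for modules. */
    final List<String> path = new ArrayList<>();

    /** Return the cached module or create it (a stand-in for real loading). */
    PyModule importModule(String name) {
        return modules.computeIfAbsent(name, PyModule::new);
    }
}

class ImportContextDemo {
    public static void main(String[] args) {
        Interpreter a = new Interpreter();
        Interpreter b = new Interpreter();
        // The same name imported into two interpreters gives two module objects.
        System.out.println(a.importModule("m") != b.importModule("m"));
        // An ordinary object can be placed in both namespaces: Java keeps it
        // alive while reachable, and neither interpreter is its owner.
        Object shared = new ArrayList<String>();
        a.importModule("m").dict.put("x", shared);
        b.importModule("m").dict.put("x", shared);
        System.out.println(a.importModule("m").dict.get("x") == b.importModule("m").dict.get("x"));
    }
}
```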
The Python-level API for interpreters (PEP 554) provides no means to share objects. However, in our use cases (for example Using Python Twice Directly), we found that sharing was difficult to avoid via the Java/C API, and we needed to be able to navigate from a Python object to the import context for which it was written. That would be satisfied if all objects referenced an owning interpreter.
Our hypothesis is that not all types of object require such a reference. We have some hypotheses about which types do require one.
frame references a particular interpreter [untested]
Any code that imports a module must import it to a particular interpreter, in order that it should access the correct import mechanism and list of already imported modules. We create a Python frame for each execution of code compiled from Python. (Note that we don’t create a new frame for Java/C functions.) We therefore hypothesise that a frame should hold a reference to one Interpreter. It need not hold it directly, if one of the attributes it already has, such as the globals or built-in dictionaries, can be guaranteed to hold it.
We do not need the interpreter reference to access an object in a module. Code that already holds a reference resulting from import only needs that reference (typically a global variable of the same name as the module).
PyFrame is ephemeral, so the next question has to be where the information comes from.
A frame may be the result of:
Module import, when executing the module body.
REPL, JSR-223 or explicit interpreter use.
Function or method execution.
This seems to mean that either a callable object involved in creating a frame should designate the correct interpreter, or that the interpreter is the one in the current frame.
Thread always has a ThreadState
From within any platform thread (Thread), according to our model, we may navigate to the corresponding ThreadState. We expect to implement this as a thread-local variable in the runtime. (We’d say “global” here, were it not for the possibility of creating instances of the “global” runtime, under different class loaders.)
Contrary to the hypothesis, there is no guarantee at all that an arbitrary platform thread has been assigned a ThreadState by the Python run-time system. Rather, we mean that at the point we need it, the run-time system will find or make a ThreadState. This is a standard pattern with a thread-local variable. The hypothesis is that this is useful.
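The find-or-make pattern can be sketched with java.lang.ThreadLocal. This is an illustration under our own assumptions, not the project's code: the ThreadState shown has only a back-reference to its Thread, and the accessor name get() is hypothetical.

```java
/** Per-thread Python state in this sketch; the single field is illustrative only. */
class ThreadState {
    /** The platform thread this state belongs to (navigation back to Thread). */
    final Thread thread;

    private ThreadState(Thread thread) { this.thread = thread; }

    // Created lazily, once per platform thread, on that thread's first call to get().
    private static final ThreadLocal<ThreadState> CURRENT =
            ThreadLocal.withInitial(() -> new ThreadState(Thread.currentThread()));

    /** Find or make the ThreadState of the calling platform thread. */
    static ThreadState get() { return CURRENT.get(); }
}

class ThreadStateDemo {
    /** Fetch the ThreadState seen by a freshly started platform thread. */
    static ThreadState stateOfNewThread() {
        ThreadState[] seen = new ThreadState[1];
        Thread t = new Thread(() -> seen[0] = ThreadState.get());
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        ThreadState mine = ThreadState.get();
        System.out.println(mine == ThreadState.get());   // same thread, same state
        System.out.println(mine != stateOfNewThread());  // other thread, other state
    }
}
```

Because CURRENT is a static field, a runtime loaded again under a different class loader gets its own thread-local, and so its own ThreadState for the same platform thread.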
The top frame designates the current interpreter [untested]
This is also more of a definition, one that we hypothesise is useful (least surprising) to accept.
If the stack of a ThreadState is not empty, the Interpreter designated by the top frame is “current”. Actions in which the interpreter is not explicit should use that one if an interpreter is needed.
If the stack is empty, arguably no interpreter is “current”, or a default “main” interpreter could be considered current. This will be one that was invoked when creating the run-time system.
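As a rule, this definition is short enough to state in code. The sketch below takes the second option for the empty stack (fall back to a “main” interpreter); that choice, and all the names used, are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class Interpreter {
    final String name;
    Interpreter(String name) { this.name = name; }
}

class Frame {
    final Interpreter interpreter;
    Frame(Interpreter interpreter) { this.interpreter = interpreter; }
}

class ThreadState {
    private final Deque<Frame> stack = new ArrayDeque<>();
    private final Interpreter main; // fallback, chosen by the run-time system

    ThreadState(Interpreter main) { this.main = main; }

    void push(Frame f) { stack.push(f); }

    void pop() { stack.pop(); }

    /** The top frame's interpreter, or the "main" interpreter on an empty stack. */
    Interpreter currentInterpreter() {
        Frame top = stack.peek(); // null when the stack is empty
        return top != null ? top.interpreter : main;
    }
}

class CurrentInterpreterDemo {
    public static void main(String[] args) {
        Interpreter main = new Interpreter("main");
        Interpreter i2 = new Interpreter("i2");
        ThreadState ts = new ThreadState(main);
        System.out.println(ts.currentInterpreter().name); // stack empty
        ts.push(new Frame(i2));
        System.out.println(ts.currentInterpreter().name); // frame of i2 on top
    }
}
```

The alternative reading, that no interpreter is current on an empty stack, would have currentInterpreter() fail or return an empty result instead.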
A callable designates its defining interpreter [untested]
Our hypothesis is that, in order to preserve the import context prepared by an application programmer, the interpreter current at the point of definition of a callable object is the one that should provide context for running its code. If the callable object does not create a frame, it may be excused the responsibility.
As examples, consider
PyFunction results from execution of a
This creates an object into which is bound
a reference to the globals of the defining module,
and if it is a nested definition, a closure referencing non-local variables.
This is true even within a class definition.
tp_call slot of the object created
will produce a
frame against which the compiled body (
This frame needs a reference to the defining interpreter
to give it the import context the programmer intended.
PyCFunctionObject in CPython)
is a lightweight object that does not carry globals and a closure,
and does not generally create a
PyFrame when executed.
It therefore does not need an interpreter to give it import context.
Of course, it could create a
exec is a case in point,
and for that example at least,
the current interpreter (in the top frame of the thread) is appropriate.
If somehow calling an equivalent of
exec on an empty stack,
the onus really should be on the caller to designate an interpreter.
In saying that a callable (with Python body) should designate an interpreter, we have not insisted this be an attribute of every such object. For those that bind globals from the defining module, a satisfactory solution is for the module global namespace, or the __builtins__ guaranteed to be amongst those globals, to designate the defining interpreter by a reserved name.
Note that a user-defined callable (defining __call__) has thereby bound the context of that definition.
type is a callable.
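The reserved-name idea can be tried out in miniature. In this sketch a function binds the globals of its defining module, and each new frame finds its interpreter through those globals. Everything here is hypothetical: the key "__interpreter__", the newGlobals helper, and the use of a java.util.function.Function in place of compiled code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class Interpreter {
    final String name;
    Interpreter(String name) { this.name = name; }

    /** Create a module namespace that designates this interpreter. */
    Map<String, Object> newGlobals() {
        Map<String, Object> globals = new HashMap<>();
        globals.put(PyFunction.INTERPRETER_KEY, this);
        return globals;
    }
}

class Frame {
    final Map<String, Object> globals;
    final Interpreter interpreter;

    Frame(Map<String, Object> globals) {
        this.globals = globals;
        // The frame finds its interpreter through an attribute it already has.
        this.interpreter = (Interpreter) globals.get(PyFunction.INTERPRETER_KEY);
    }
}

class PyFunction {
    /** Reserved name in the globals (hypothetical). */
    static final String INTERPRETER_KEY = "__interpreter__";

    private final Map<String, Object> globals;  // bound at definition
    private final Function<Frame, Object> body; // stands in for compiled code

    PyFunction(Map<String, Object> globals, Function<Frame, Object> body) {
        this.globals = globals;
        this.body = body;
    }

    /** Run the body in a new frame built on the defining module's globals. */
    Object call(Deque<Frame> stack) {
        Frame frame = new Frame(globals);
        stack.push(frame);
        try {
            return body.apply(frame);
        } finally {
            stack.pop();
        }
    }
}

class CallableDemo {
    /** Define a function in i1, then call it while a frame of i2 is on top. */
    static String run() {
        Interpreter i1 = new Interpreter("i1");
        Interpreter i2 = new Interpreter("i2");
        PyFunction f = new PyFunction(i1.newGlobals(), frame -> frame.interpreter.name);
        Deque<Frame> stack = new ArrayDeque<>();
        stack.push(new Frame(i2.newGlobals()));
        return (String) f.call(stack);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "i1"
    }
}
```

The body runs with i1 as its interpreter even though the caller's frame belongs to i2, which is the behaviour the hypothesis asks for.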
PyModule designates its defining interpreter [dubious]
It may be convenient to implement the previous idea by ensuring that each module instance remembers the interpreter that loaded it. The current interpreter would then be within easy reach of function implementations in Java in that module.
Functions, methods and types have a __module__ attribute already. Unfortunately, this attribute is at best just a str, and worse, writable with an arbitrary object.