7.9. Descriptors¶
Descriptors lie behind many kinds of attribute access in Python. They are central to the realisation of the Python Data Model. Behaviour we take for granted in Python depends upon this mechanism – the behaviour of classes defined in Python and those built-in.
This section covers the several types of built-in descriptor, based on a study of the CPython 3.8 source, and depicting in UML a proposed Java implementation.
Note
Re-visit 3.11 source code and correct details where different. Elements visible to Python should not have changed.
Note
The description here assumes we shall follow the CPython implementation, and reproduce in Java classes the several data structures that express the descriptor and slot concept.
While this is necessary for the elements visible as Python objects, supporting structures in the C implementation are complicated and depend on C language features to which Java has alternatives. Some of this complexity may stem from constraints of backwards compatibility in the C API that would not apply here. (We will use the Python data model as the acid test.)
To begin with, we draw our pictures from the CPython implementation, but this document is allowed to evolve in search of Java ways.
7.9.1. Introduction¶
The definitive introduction to descriptors in Python is the Descriptor HowTo Guide by Raymond Hettinger. It is fairly precise about the semantics (just what we need), but only scratches the surface of potential applications (not our interest here).
Upon a thorough investigation, one finds there are a lot of built-in descriptor types, each with their supporting relationships and structures. The descriptors are visible to code (if you know where to look), so a Python implementation must faithfully reproduce this family of types. Each descriptor and each supporting object in CPython has been crafted individually, and while some of the supporting data structures are shared, beyond the descriptor protocol itself, patterns are difficult to discern. Names are confusingly similar.
We shall cover each type of built-in descriptor separately, proposing implementation ideas. When it comes to implementing them in Java, we have some discretion about the supporting types that are not visible to code, to provide the expected behaviour another way. On the whole, we find we always need roughly the same object, but the Java version could work differently.
As the design here evolves away from its CPython starting point,
we shall be looking for a design that may be efficiently implemented
using MethodHandle
s.
We expect that this will support the use, ultimately,
of generated CallSite
s.
Notation for Sequence Diagrams¶
In the UML sequence diagrams that follow, calls (via the type slots) to special methods on Python objects are shown in a simplified form that, for brevity, makes the object itself the target.
A call to __neg__
on a Python int
may be shown simply as:
The real sequence of events is more like:
Even this is a simplification as it is difficult to represent the full detail at all.
A call to __call__
itself deserves a note too.
The signature of __call__
is, of course, (O, TUPLE, DICT)O
.
We shall generally elide the argument processing implicit in a Python call,
and show this as __call__(x, y)
directed to the target
when we mean type(obj).__call__(obj, *(x, y), {})
.
For example:
The real sequence of events in a classic call is more like:
This also is a simplification when it comes to certain steps.
7.9.2. Descriptors and __getattribute__
¶
Descriptors are mostly found and invoked from within
the slot function __getattribute__
.
This is called implicitly whenever attribute access (o.name
) is needed.
It may be invoked through the built-in function getattr
,
and in CPython both lead to PyObject_GetAttr
in object.c
.
The __getattribute__
special function may be redefined by a type,
but there are just two significant versions in a Python implementation:
the one defined for
object
(PyBaseObject
), that applies when looking up attributes on almost any instance object, andthe one defined for
type
(PyType
), that applies when looking up attributes on a type object itself.
The distinction may be illustrated for the str
type:
>>> str.replace # Access through type.__getattribute__
<method 'replace' of 'str' objects>
>>> type(str.replace)
<class 'method_descriptor'>
>>> s = 'hello'
>>> s.replace # Access through inherited object.__getattribute__
<built-in method replace of str object at 0x0000017B0EA82D30>
>>> type(s.replace)
<class 'builtin_function_or_method'>
>>> str.replace.__get__(s)
<built-in method replace of str object at 0x0000017B0EA82D30>
Notice that in the last step,
we obtain the same value from str.replace.__get__(s)
,
a descriptor access on the type followed by binding to s
,
as previously by attribute access s.replace
on the instance.
Classes may redefine __getattribute__
for their own instances,
to take full control of attribute access
(see Customizing attribute access),
but this is not common outside major framework libraries.
A good introduction to this part of attribute access is given by Brett Cannon in Unravelling Attribute Access. Brett only discusses attribute access on instances.
7.9.3. Methods in Python (PyFunction
)¶
A Python function that is defined in the body of a class definition,
becomes a function
object in the name space
that is passed into type creation.
In our implementation in Java,
this function is an instance of PyFunction
.
(In CPython,
the object is PyFunctionObject
defined in funcobject.h
.)
During type creation,
this PyFunction
is transferred to the dictionary of the PyType
.
A PyFunction
is a descriptor because it defines __get__
.
During attribute look-up, __getattribute__
recognises
the dictionary entry on the type as a descriptor,
and calls __get__
.
The object returned from PyFunction.__get__
is a PyMethod
,
an object that binds the original PyFunction
to a self
object.
(In CPython,
the object is a PyMethodObject
and is defined in classobject.h
.)
A PyMethod
is also a descriptor because it defines __get__
,
but PyMethod.__get__
ignores its arguments and returns this
,
the descriptor itself.
The binding behaviour (__get__
) of a function
is worth illustrating with sequence diagrams.
Suppose we have defined (pointlessly):
class C(str):
def foo(self, x, y):
print(f"foo called on '{self}' x+y = {(r:=x+y)}")
return r
c = C('hello')
>>> c.foo(2, 3)
foo called on 'hello' x+y = 5
5
We shall examine what happens when we call foo
.
Calling a Python Method on an Object¶
In the simple call c.foo(2, 3)
,
the first step is the attribute access c.foo
.
Under the circumstances depicted,
in which the target object is not a type,
this is handled by the __getattribute__
slot in the C
type,
that is a MethodHandle
to PyBaseObject.__getattribute__
.
We show this as a direct call from a notional program prog
,
when in reality the interpreter and the abstract object API are engaged.
As noted in Notation for Sequence Diagrams,
for simplicity,
we show slot function calls as if directed to the affected object itself.
PyBaseObject.__getattribute__
looks in the dictionary
of the type of the target object (the type C
).
The attribute access becomes a call to PyMethodDescr.__get__
,
that is the equivalent of C.__dict__['foo'].__get__(c, C)
.
This has to return something that may be called with the given arguments.
That “something” is here a PyMethod
,
in which the self
field is assigned the object c
.
(In CPython, it is a PyMethodObject
,
in which the __self__
attribute is set to the target object,
whereas when representing a function,
__self__
would be None
or the module.)
We can see that calling the PyMethod
leads effectively to C.foo(c, 2, 3)
.
This, by the way, should also work if used directly in Python.
Calling a Python Method through a Type¶
When a method call is made explicitly through the type,
for example C.foo(c, 2, 3)
,
most of the same code is involved as in the previous example.
The exception is that
the __getattribute__
slot in the C
object (a type),
is a MethodHandle
to PyType.__getattribute__
.
Its behaviour differs from that of PyBaseObject.__getattribute__
.
In these circumstances, PyType.__getattribute__
looks in the dictionary of the target object,
not that of the target’s type.
But the target object is the type C
.
This means, of course, that the same dictionary is consulted
as for a call on the instance object,
so the same descriptor is found.
The difference is in the call to __get__
that follows immediately.
PyType.__getattribute__
goes on to make a different call to __get__
,
to turn the access into C.__dict__['foo'].__get__(None, C)
.
Thus PyFunction.__get__
knows something different is required
from the call we saw previously.
Notice that the object returned to prog
from the attribute access
is the descriptor (the function f
) itself.
It is a behaviour of ((PyFunction) f).__get__(obj, type)
that when obj==null
it simply returns f
.
The call that follows is directly on this descriptor,
which is the reason for this behaviour.
7.9.4. The @classmethod Decorator (PyClassMethod
)¶
A PyClassMethod
is a descriptor because it defines __get__
.
An instance may be created to wrap any object.
Usually it is applied as a decorator to a function defined in Python.
The wrapped object is referenced by the field callable
,
which is exposed as the __func__
member.
The constructor of a Python classmethod
(the __init__
in fact)
implements the decorator @classmethod
.
In that context, it is the constructed PyClassMethod
that is inserted in the dictionary of the type under the function’s name.
A PyClassMethod
is not itself callable,
rather PyClassMethod.__get__(obj, type)
returns a PyMethod
that binds type
as the first argument of the callable,
or if type
is not given, then binds the type of obj
.
This is likely to be meaningful only if the object is callable.
If the object to which the decorator was applied is not in fact callable,
the error is raised from the invocation of that PyMethod
,
and not before.
The descriptor of a class method in a built-in class
does not involve PyClassMethod
,
but a special descriptor for built-in types.
(See Built-in Class Methods (PyClassMethodDescr).)
7.9.5. The @staticmethod Decorator (PyStaticMethod
)¶
A PyStaticMethod
is a descriptor because it defines __get__
.
As with PyClassMethod
(see The @classmethod Decorator (PyClassMethod)),
an instance may be created to wrap any object.
Often it is applied as a decorator to a function defined in Python,
but it is also applied automatically to methods of built-in classes
(defined in Java)
when they are identified as static.
(See Built-in Static Methods (PyStaticMethod).)
The wrapped object appears as the __func__
member.
The constructor of a Python staticmethod
(the __init__
in fact)
is the decorator @staticmethod
seen in Python class definitions.
In that context, the constructed PyStaticMethod
is the object in the dictionary of the type under the function’s name,
and the callable
(exposed as __func__
) is the function decorated.
A PyStaticMethod
is not itself callable,
rather PyStaticMethod.__get__
returns the associated callable,
ignoring its arguments.
(There is no real binding to do.)
If the object to which staticmethod
was applied is not in fact callable,
the error is raised from the attempted call and not before.
7.9.6. Descriptors for Built-in Types¶
Our first description of attribute access
involved the descriptor behaviour of a familiar object (a function
).
Many descriptors are built-in types, specifically created for the purpose.
In CPython, these are mostly defined in descrobject.h
and implemented in descrobject.c
.
They share much of their mechanism,
although they are not sub-classes one of another in Python.
This sub-section provides a structural overview of those descriptors translated to Java, before we launch into the detail of each. In proposing a Java design, we are constrained to match the visible behaviour of the descriptors in CPython, less so the supporting classes.
We will also follow CPython implementation details, at least provisionally. In Java, this gives rise to a fair few related classes.
Our names for these are not quite the same as CPython’s,
having elided the suffix Object
from the class names,
prefix Py
where it is not a Python object,
and the prefixes like d_
and ml_
from member names.
Fields that in C are a function pointer
may become MethodHandle
in Java,
although lambda functions and sub-classing may be chosen;
those that are a kind of “offset” could become VarHandle
;
those that are int
selectors may become enum
or EnumSet
.
7.9.7. Members (PyMemberDescr
)¶
During type creation from a Java class definition,
a PyMemberDescr
that appears in the dictionary of the PyType
is the result of an @Member
annotation applied to a field.
The PyMemberDescr
descriptor is based on a VarHandle
designating the field.
The __get__
, __set__
and __delete__
of a PyMemberDescr
must get, set or delete the field in instances of the defining type
through this VarHandle
.
Setting will be disallowed if the member is annotated as read-only,
and if setting is allowed, deletion may still be impossible for the type,
e.g. for an int
.
When disallowed, both raise an AttributeError
.
In CPython,
a PyMemberDef
has to be created for each member to be exposed,
specifying the type and offset of the member in an instance,
and placed in a short, static table referenced from the type object.
In Java, we need no intermediary:
we may make the descriptor directly,
using information available by reflection,
and from the identifying @Member
annotation,
optional properties or further annotations:
class ObjectWithMembers implements PyObject {
// ...
@Member
@DocString("The i property")
int i;
@Member("text")
String t;
@Member(readonly = true)
int i2;
@Member(readonly = true, value = "text2")
String t2;
@Member
PyObject obj;
@Member
PyUnicode strhex;
// ...
}
We express through Java sub-classes,
the specific implementation of get()
, set()
and delete()
,
appropriate to the implementation type.
These do not result in distinct Python types.
This is possible because only a fixed repertoire of implementation types
is supported.
(In CPython, the types are defined as constants in structmember.h
,
and the set/get functions contain a big case statement.
The API is used exclusively by member descriptors.)
Type, Deletion, None
and null
¶
Reference values in Java may be null
.
When we implement an attribute by a Java reference type,
we may use this possibility to augment its natural range with
either None
or “not defined”, but not both.
This possibility is not available to an attribute implemented as a primitive,
since primitive values have no equivalent of null
.
The only writeable reference amongst CPython member types is object
(signified by T_OBJECT
in the PyMemberDef.type
field).
Strings (STRING_T
) are only supported for reading,
although the possibility of null
is still accommodated in the code.
The historic norm is for get
to interpret null
internally
as None
externally
but for set
to accept None
and store it in the attribute.
The asymmetry is only a problem to the object containing the field,
which may (confusingly) distinguish null
and None
internally
that appear the same to Python.
In CPython it is acceptable to supply a null
value
to set
an attribute,
but in our Java implementation we shall spell that delete
.
It is not an error to get
an object
member
that has been deleted but remains visible as None
,
or to delete it repeatedly.
An addition to the set of CPython member types (T_OBJECT_EX
),
defines a variant for the object
member
in which a null
value signifies a deleted (or undefined) attribute.
After deleting this type of attribute,
get
and delete
raise AttributeError
,
but a set
(non-null value, even None
) re-creates it.
Descriptor type |
Current value |
|
|
|
---|---|---|---|---|
|
|
|
type error |
type error |
|
|
|
readonly |
readonly |
|
|
|
readonly |
readonly |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
attribute error |
|
attribute error |
The normal behaviour for a member that is a PyObject
will be equivalent to CPython T_OBJECT
.
We will provide T_OBJECT_EX
behaviour for any supported reference type,
through the flag DataDescriptor.OPTIONAL
and annotation property Member.optional
.
When optional == true
, the attribute may be deleted:
a subsequent attempt to get
it (or delete
it again) is an error.
When optional == false
, delete
is still a valid operation,
but sets it to null
, which appears through get
as None
.
We must also address the question of whether a field may have
a sub-type T
of PyObject
.
In that case, it would not be possible to set
a value contrary to
the declared Java type of the field.
Actions to get
and delete
such a field no new problem,
except that it would be odd for it to show as None
(when deleted)
if None
could not be assigned.
When it comes to assigning None
to a member,
we choose to have consistency between external and internal states.
a subsequent attempt to get
it (or delete
it again) is an error.
When optional == false
,
assigning None
is the same as deletion
(results in null
internally and None
externally).
When optional == true
,
assigning None
is valid if it is a legitimate value for the member.
Field type |
Optional |
Current value |
|
|
|
---|---|---|---|---|---|
|
|
|
|
type error |
type error |
|
|
|
|
|
|
|
|
|
|
||
|
|
|
|
|
|
|
|
|
|
||
|
|
|
|
|
|
|
|
|
|
||
|
|
|
|
type error |
|
|
attribute error |
type error |
attribute error |
||
|
|
|
|
|
|
|
attribute error |
|
attribute error |
||
|
|
|
|
type error |
|
|
attribute error |
type error |
attribute error |
This is consistent with CPython behaviour,
whilst allowing for more assignable reference types (if we want them).
The type errors arise because a String
(or T
a proper sub-type of PyObject
)
cannot be assigned the value None
.
Note
Strongly-typed members present a difficulty to set
operations.
Given a Python argument,
the conversion may be obvious (to a human),
but the PyMemberDescr
will not know it
unless specifically sub-classed in advance.
We cannot expect to do this for more than a few types.
The situation is even more fraught where we accept multiple implementations of a Python type. Read-only members are fine, however.
PyGetSetDescr
(next section) gives us the means we need
to master writable attributes strongly typed in Java.
7.9.8. Attributes (PyGetSetDescr
)¶
During type creation,
a PyGetSetDescr
that defines an attribute
in the dictionary of the PyType
,
is created from annotations that identify
the get, set and delete methods associated with the attribute by name.
The statically-allocated PyGetSetDef
in CPython,
that ends up as part of the PyGetSetDescr
,
has a counterpart GetSetSpec
in the Java implementation.
However, that exists only while the TypeExposer
finds the annotated methods
for each attribute in the implementation of a type.
The PyGetSetDescr
finally holds all the necessary information directly.
The java.lang.reflect.Method
objects in our GetSetSpec
become MethodHandle
s at that point
and the ephemeral GetSetSpec
is discarded.
Unlike those of a PyMemberDescr
,
the get
, set
and delete
operations of a PyGetSetDescr
are provided explicitly by the class containing the attribute.
These methods must at least convert between
the internal representation of the attribute
and one acceptable as a Python object.
However, the scheme offers an unlimited range of possibilities
for computing or transforming the stored value to expose it,
while PyMemberDescr
is limited to actual fields
with a type among those predefined.
The choice to use MethodHandle
here opens the possibility,
in a later implementation,
of making the particular get
, set
or delete
method
the target of an attribute-access CallSite
,
guarded on the type of the object in which the attribute named is sought.
The __get__
, __set__
and __delete__
of a PyGetSetDescr
,
get, set and delete the attribute in instances of the defining type,
by invoking the corresponding handle.
The implementation is free
to make any interpretation it needs of those actions.
The attribute may be made read-only by simply not specifying a setter,
and indelible by not specifying a deleter.
When not defined,
any of these handles defaults to a method that throws Slot.EmptyException
,
which the surface methods __get__
, __set__
and __delete__
catch,
and convert to a AttributeError
or a TypeError
with an informative message.
(In the CPython code base,
setting or deleting an attribute that is read-only,
produces an AttributeError
.
Deleting an attribute that is writable,
but where deletion is not allowed,
is hand-crafted per attribute,
detecting the value NULL
in the setter method.
Objects mostly produce TypeError
, but some AttributeError
,
and in pyexpat.c
it is RuntimeError
.
This unwelcome variety remains available to us by defining a deleter.)
CPython adds a void* closure
member to its PyGetSetDef
,
and provides it as an extra parameter when calling
the get
and set
methods of the PyGetSetDef
.
There are no uses of this in the CPython code base,
and we do not replicate it.
The closure must be information available when the PyGetSetDef
is created,
that is, statically and independent of the object instance.
If an operation needs additional information,
preserving context from that point,
it could be bound into the defining method.
7.9.9. Built-in Methods (PyMethodDescr
)¶
During type creation,
a PyMethodDescr
that appears in the dictionary of the PyType
,
is created from a MethodDef
specified by the class.
A MethodDef
(compare PyMethodDef
in CPython),
represents a method defined in Java,
that is to be exposed as the method of a Python object
.
In the following model,
we provisionally reproduce the CPython approach,
in which a MethodDef
has several characteristics,
expressed through a set of flags.
It seems likely that an approach using sub-classes of PyMethodDescr
will supersede this before the first implementation.
In CPython,
PyMethodDef
s occur in short, statically-defined tables,
each entry defining a method (class, static or instance) or a function.
We have already used MethodDef
to represent Java functions exposed from modules
(see Java MethodDef and PyJavaFunction).
There,
we added annotations to the functions to be exposed,
that were processed to create a MethodDef[]
table.
Each MethodDef
led to a PyJavaFunction
in the dictionary of each instance of the module.
This is not quite what we need for the methods of a class.
Here we can use the annotation idea again,
but each MethodDef
should lead to a PyMethodDescr
in the dictionary of the type.
As with every other attribute access mediated by a descriptor,
a reference to the method via an object invokes __get__
.
Rather than getting a value or object reference from the target object,
as in a data descriptor,
PyMethodDescr.__get__
returns a callable object
binding the method definition and the target object.
Thus in 'hello'.replace('ell', 'ipp')
,
the call to __get__
returns the object 'hello'.replace
,
and then that is called with the arguments ('ell', 'ipp')
.
We represent this binding object by an instance of the PyJavaFunction
class.
(In CPython the object created is another use of PyCFunctionObject
.)
Observe that PyJavaFunction
is frequently an ephemeral object,
existing only until the call can be made.
A PyMethodDescr
is itself a callable object,
invoking the method it describes as if it were the defining function.
In str.replace('hello', 'ell', 'ipp')
,
__get__
returns the object str.replace
,
which is the descriptor itself,
and then that is called with the arguments ('hello', 'ell', 'ipp')
.
No binding PyJavaFunction
is necessary in this case.
Optimising Method Calls¶
CPython has dedicated support in the compiler and byte code
(the opcodes LOAD_METHOD
and CALL_METHOD
)
that effectively converts s.replace('ell', 'ipp')
into str.replace(s, 'ell', 'ipp')
,
avoiding creation of the bound object.
The possibility can be identified at compile time,
but the translation can only be done at run-time,
when the type of s
is known and str.replace
proves to be a method.
The CPython PyMethodDescr
supports the vector call protocol,
where tp_vectorcall_offset
in the type
references the vectorcall
field in the instance.
The PyMethodDescr
in CPython sets this field to a C function pointer
that refers to one of several fixed wrapper functions,
the choice being made according to the characteristics in the PyMethodDef
.
The wrapper function always has the vector call signature,
and internally supplies these arguments
(in the right number and arrangement, with necessary checks made)
to a call to meth
in the attached PyMethodDef
.
The vector call protocol is not especially well suited
to a Java implementation.
IFor Java, we need an optimisation similar to LOAD_METHOD
-CALL_METHOD
,
but applied to Java call sites.
In an implementation based on interpreting CPython byte code,
those opcodes must be supported,
but perhaps only in their fall back forms,
or in a form preferentially optimised for Java call sites.
Calling a Built-in Method on an Object¶
The binding behaviour (__get__
) of a method descriptor
is more complex than that of a data descriptor.
It is worth illustrating it with sequence diagrams.
We will ignore the LOAD_METHOD
-CALL_METHOD
optimisation for now.
As noted in Notation for Sequence Diagrams,
for simplicity,
we show slot function calls as if directed to the affected object itself.
In the simple call 'hello'.replace('ell', 'ipp')
,
the first step is the attribute access 'hello'.replace
.
Under the circumstances depicted,
in which the target object is not a type,
this is handled by the __getattribute__
slot in the str
type,
that is a MethodHandle
to PyBaseObject.__getattribute__
.
PyBaseObject.__getattribute__
looks in the dictionary
of the type of the target object (the type str
).
The attribute access becomes a call to PyMethodDescr.__get__
,
that is the equivalent of str.__dict__['replace'].__get__('hello', str)
.
This has to return something that may be called with the given arguments.
That “something” is here a PyJavaFunction
,
in which the self
field is assigned the string 'hello'
.
(In CPython, it is a PyCFunctionObject
.)
We can see that calling the PyJavaFunction
leads effectively to str.replace('hello', 'ell', 'ipp')
.
This, by the way, should also work if used directly in Python.
We look at that next,
as it provides insight into
why PyMethodDescr
must be callable.
Calling a Built-in Method through a Type¶
When a method call is made explicitly through the type,
for example str.replace('hello', 'ell', 'ipp')
,
most of the same code is involved as in the previous example.
The exception is that
the __getattribute__
slot in the str
object (a type),
is a MethodHandle
to PyType.__getattribute__
.
The situation is like the one we examined in
Calling a Python Method through a Type.
PyType.__getattribute__
looks in the dictionary of the target object itself (the type str
),
to turn the access into str.__dict__['replace'].__get__(None, str)
.
This is a different call to PyMethodDescr.__get__
from previously,
and returns to prog
the descriptor itself.
It is a behaviour of ((PyMethodDescr) descr).__get__(obj, type)
that when obj==null
it simply returns descr.
The call that follows is directly on this descriptor,
which is why PyMethodDescr
must be callable.
7.9.10. Special Methods (PyWrapperDescr
)¶
The PyWrapperDescr
is one visible part of the mechanism that
allows a slot function defined in Java to be called from Python
using its special method name.
For example,
any type whose implementation class T
defines:
PyObject __neg__() { ... }
will define the special method __neg__
,
precisely because its dictionary contains an entry for "__neg__"
.
The entry is an instance of PyWrapperDescr
that knows how to call the Java method when executing Python.
Special Method Creation in CPython¶
In CPython,
a type defined in C, writes its slot function implementations as pointers,
into an instance of PyTypeObject
during its initialisation.
Slots the type does not define,
but may inherit,
are initially null in the type object
During type creation,
CPython examines the slots in the type,
guided by the (single, global) table of slotdef
entries.
The slotdefs[]
table in typeobject.c
is an array of these,
built by a set of clever macros.
The entries in slotdefs[]
are ordered by ascending slot offset
in the (heap) type object.
We note, but do not resolve in the present discussion, the issues that:
Some offsets are repeated in successive entries, representing slots to which more than one special method contributes, e.g.
__add__
and_radd__
contribute tonb_add
.Some special method names occur more than once, representing special methods that define more than one slot, e.g.
__add__
contributes tonb_add
and may definesq_concat
.
After establishing the type hierarchy, it will be apparent which slots are eligible to be inherited. A slot defined by the new type object (a non-null type slot) results in a new descriptor for that special method. One that is not defined initially may be filled from an inherited descriptor.
Slots are filled not only from inherited PyWrapperDescr
s,
but from any inherited descriptor of the right name.
A PyWrapperDescr
provides the slot value directly.
For any other descriptor type,
the slotdef
of matching name provides a pointer
compatible with that slot
(using the C pointer in field function
),
that will call the descriptor.
The functions that do so have names matching slot_*
and are found in typeobject.c
.
This makes it possible to define a special method in Python
that becomes the meaning of an operator through filling that type slot,
e.g. where __neg__
defined in Python
provides the meaning of unary -
through filling nb_neg
.
Special Method Creation in Java¶
In the Java implementation,
a PyWrapperDescr
that appears in the dictionary of the PyType
,
is created by a member of the Slot
object (and enum
),
and information particular to the implementing class.
This enum
replaces the slotdef
table in CPython.
We differ from CPython slightly in that
a PyWrapperDescr
is created because a Java method exists
with the right name and signature,
and then the slot in the PyType
is filled as a result.
(CPython does it the other way round:
fills the type slot, which causes creation of a wrapper.)
The PyWrapperDescr
contains a reference to the target Python type,
the method name, and a handle to the implementation compatible with the slot.
A major difference from CPython is that members Slot
,
and special method names defined in the Python data model,
are in one-to-one correspondence.
The member Slot.op_<name>
contains the information necessary
to create a descriptor, type slot and wrapper for __<name>__
,
exclusively and completely.
Note
We resolved the competition for slots and names that CPython has,
in favour of a one-to-one alignment of slots and special method names,
in the evo4
experiments.
Not every slot has been implemented,
but enough of the hard cases to provide confidence.
Jython 2 also does roughly this without violating the data model,
naming its implementation methods after the special functions.
CPython Structures¶
Although we diverge a little from CPython, as a reference we look at how calling a slot wrapper works in CPython.
In CPython, each slotdef
(also known as struct wrapperbase
)
identifies its slot in the type object by an offset
field,
and the special function by its name (like "__sub__"
).
It contains two pointers to wrapper functions
helpfully named wrapper
and function
,
and some other fields that need not concern us just now.
The wrapper functions are necessarily independent of the target object type.
function
is for synthesising a type slot
from a special method defined in Python,
while wrapper
does the job at hand:
calling a special method defined in C from Python.
We can think of slotdef
as an association class
between notional classes TypeSlot
and SpecialFunction
.
Neither of the associated classes exists in CPython as a struct
:
a TypeSlot
is known only by its offset,
while a SpecialFunction
is simply present as a name.
The association of special functions and slots is many-to-many,
because special functions may compete for slots
and contribute to more than one.
(This is the complication we believe we can avoid.)
When the PyWrapperDescrObject d
(for __sub__
, say) is called,
or a binding wrapperobject
is called
that it produced by __get__
,
the d_wrapped
field of d
provides the required implementation.
Although the descriptor holds this function pointer,
d->d_wrapped
may not be used directly to satisfy the call.
Only the slotdef
understands how to call it properly:
the number, type and arrangement of the arguments.
This “understanding” is expressed in code.
The wrapper
field in the related slotdef
points to a function with a standard signature (self, args, wrapped)
.
A PyWrapperDescrObject
is able to call that
without specific knowledge of the slot type,
because its signature is always the same.
In the signature, wrapped
is the particular slot implementation
the PyWrapperDescrObject
holds.
(In fact, there are two standard signatures: the other with keywords,
and a flag in the slotdef
decides how to cast wrapper
.)
All these wrapper functions are found in typeobject.c
,
and have names matching "wrap_*"
.
Each such wrapper has a body that knows how to arrange
these arguments in a pattern that suits the wrapped slot.
In the case of __sub__
,
the slotdef
represents an unreflected binary operation,
so wrapper
was set by static initialisation to a generic method
called wrap_binaryfunc_l
.
This function has a body equivalent to return (*wrapped)(self, args[0])
.
In the companion function wrap_binaryfunc_r
,
for wrapping reflected operations like __rsub__
,
it is return (*wrapped)(args[0], self)
,
that is, with the arguments reversed.
All this involves heavy use of pointers and offsets into structures, and casts. Surely Java can offer us a less brutal way?
A Java Equivalent¶
The Slot
is somewhat the counterpart of MethodDef
,
as described in Built-in Methods (PyMethodDescr).
There,
we proposed that an annotation would identify each function to be exposed.
The annotations were processed to create a table of definitions,
then descriptors from that table.
Here, the set of possible method names and signatures
is fixed in advance by the Python data model,
which means Slot
can be just an enum
.
The generation of descriptors for any given class
may be driven by enumerating Slot
and creating a descriptor for each method with the corresponding name.
The name of the descriptor is the name of
the special function specified by the Slot
.
The Slot
holds all the information needed
to create the appropriate PyWrapperDescr
except for the handle to the implementation.
That will be supplied during type creation
and held by the PyWrapperDescr
as MethodHandle wrapped
.
PyWrapperDescr
is directly callable (as in int.__sub__(42, 10)
)
and also supports a __get__
that returns a PyMethodWrapper
binding an instance (as in (42).__sub__(10)
).
The method handle PyWrapperDescr.wrapped
conforms to the signature provided in the Slot
of the same name.
However, PyWrapperDescr.__call__
and PyMethodWrapper.__call__
require the classic call arguments (args, kwargs)
.
(CPython’s PyMethodWrapper
does not support the vector call.
We could, but invocation does not seem likely to be critical to performance.)
In both calls,
positional arguments must be extracted from the tuple
and made Java arguments to wrapped.invokeExact
,
in a pattern characteristic of the broad type of the slot
(binary operation, reflected binary operation, attribute access, call, etc.).
This data movement is in PyWrapperDescr.callWrapped()
for both.
We express the specific invocation pattern
by overriding PyWrapperDescr.callWrapped()
in a sub-class
specific to the Signature
of the attached slot.
When a special method defined in Java is being exposed during type creation,
PyType
calls Slot.makeSlotWrapper
to obtain a descriptor.
This method forms a handle for the method implementation,
then calls Signature.makeSlotWrapper
.
Each Signature
in the enum
is a factory for the appropriate
sub-class of PyWrapperDescr
,
specialising callWrapped()
for the invocation pattern.
Both a direct call on the descriptor,
and a call to a PyMethodWrapper
depend on callWrapped()
,
and so this programs the calling behaviour entirely.
Wherever the descriptor is inherited by a Python sub-class,
the corresponding type slot will be set to wrapped
.
Most uses of the operation represented by a special function
already know the signature expected,
and go via the handle cached in the type object.
However,
in order to gain an understanding of the descriptor,
it is worth looking at how it can be be used in calls.
Calling a Special Method through its Descriptor¶
A PyWrapperDescr
is a callable object,
invoking the method it describes as if it were the defining function.
In int.__sub__(42, 10)
,
__get__
returns the object int.__sub__
,
which is the descriptor itself,
and that is then called with the arguments (42, 10)
:
>>> (d := int.__sub__)
<slot wrapper '__sub__' of 'int' objects>
>>> type(d)
<class 'wrapper_descriptor'>
>>> d(42, 10)
32
The objects that participate in the interaction are these:
We will show the classic call,
explicitly making the target
descriptor, args
tuple and
kwargs=null
visible.
The action during int.__sub__(42, 10)
runs like this:
Note that although we show d
simply as a PyWrapperDescr
,
it is actually an instance of a specialised subclass,
requested by the Slot
and constructed by the Signature
.
Calling a Special Method as a Bound Method¶
As with every other attribute access mediated by a descriptor,
a reference to the method via an object invokes __get__
.
This returns a callable object
binding the method definition and the target object.
Thus in (42).__sub__(10)
,
__get__
returns the object (42).__sub__
,
and then that is called with the argument 10
.
We represent this binding object by an instance of the PyMethodWrapper
,
that references the target instance self
, and the descriptor.
The action proceeds as follows:
7.9.11. Built-in Class Methods (PyClassMethodDescr
)¶
During type creation,
a PyClassMethodDescr
that appears in the dictionary of the PyType
,
is created from a MethodDef
specified by the class.
A MethodDef
signals that a PyClassMethodDescr
should be created,
rather than another type of method descriptor,
by its particular type.
(In CPython it uses the METH_CLASS
flag,
but in the diagram we propose using a sub-class of MethodDef
.)
PyClassMethodDescr
should not be confused with
the decorator described in The @classmethod Decorator (PyClassMethod).
This is another case where we should consider using a MethodHandle
as the basis of a Java implementation,
looking forward to possible use in a CallSite
.
In the source code,
we can use annotation again to identify a Java method as a class method
in the implementation of an exposed type.
Each MethodDef.Class
thus created should lead to a PyClassMethodDescr
in the dictionary of the type.
Unless __getattribute__
has been customised,
attribute access responds identically through types and their instances.
A reference to the method via an object invokes __get__
, as before.
However, the logic differs slightly from PyMethodDescr.__get__
,
in that when calling the method in the associated MethodDef
,
the type
argument is made the target, not the obj
argument.
This is different from PyMethodDescr
where d.__get__(null, type)
returns d
and only d.__get__(obj, type)
produces a bound method.
In PyClassMethodDescr
both return a PyJavaFunction
,
in which the target self
is the type.
(If the obj
argument is given but not the type
argument,
then type(obj)
is used.)
The target is not necessarily the class that defined the descriptor, but must be a Python sub-class of it.
In int.from_bytes(b'abcde', 'little')
,
__get__
returns the object int.from_bytes
,
which is a method bound to the type int
,
and then that is called with the arguments (b'abcde', 'little')
:
>>> (m := int.from_bytes)
<built-in method from_bytes of type object at 0x00007FFA58368D10>
>>> m.__self__
<class 'int'>
>>> m(b'abcde', 'little')
435475931745
To reach the raw descriptor, we must access it directly from the dictionary of the type:
>>> type(d := int.__dict__['from_bytes'])
<class 'classmethod_descriptor'>
>>> d.__get__(42)
<built-in method from_bytes of type object at 0x00007FFA58368D10>
>>> d.__get__(None, int)(b'abcde', 'little')
435475931745
>>> d(int, b'abcde', 'little')
435475931745
A direct call to the descriptor requires
that the type be supplied in the call.
No binding PyJavaFunction
is produced in this case.
Calling a Class Method normally in Python¶
We shall examine the binding behaviour (__get__
)
of a class method descriptor with sequence diagrams.
As already noted, it does not much matter whether the target is the type
or an instance of the type, so we choose the former.
In the simple call int.from_bytes(b'abcde', 'little')
,
the first step is the attribute access int.from_bytes
.
Under the circumstances depicted, this is handled by
the __getattribute__
slot in the int
object (a type).
That is a MethodHandle
to PyType.__getattribute__
.
As we saw with plain PyMethodDescr
,
the equivalent in types,
PyType.__getattribute__
looks in the dictionary of the target object itself (the type int
).
It finds a descriptor (a PyClassMethodDescr
),
on which to call __get__(None, int)
.
This has to return a PyJavaFunction
,
in which the self
field is assigned the type int
.
We can see that calling the PyJavaFunction
leads effectively to:
int.__dict__['from_bytes'].__get__(None, int)(b'abcde', 'little')
The optimisation in CPython
that avoids creation of an ephemeral PyJavaFunction
,
and which we referred to as we discussed PyMethodDescr
,
operates also for class method calls.
We’ll examine the sequence that produces next.
Calling a Class Method Descriptor in Python¶
Suppose a method call is made explicitly through a PyClassMethodDescr
,
for example int.__dict__['from_bytes'](b'abcde', 'little')
.
This is effectively what happens when
a class method call is optimised by CPython.
(It is difficult to imagine encountering this in user code.)
We can see how the direct call on the descriptor still lands in the implementation object.
7.9.12. Built-in Static Methods (PyStaticMethod
)¶
A PyStaticMethod
is a descriptor because it defines __get__
.
It does not, however, descend in Java from Descriptor
.
(In CPython the C struct staticmethod
does not have the C struct PyDescrObject
as a preamble.)
An instance of PyStaticMethod
is a wrapper
that may be applied to any object,
including a built-in function or a function defined in Python,
although this is likely to be meaningful only if the object is callable.
The wrapped object appears as the __func__
member
(callable
, internally).
When a PyStaticMethod
is derived from a MethodDef
during the creation of a type,
a PyJavaFunction
is first created from the MethodDef
.
Then the PyStaticMethod
is created to wrap it, and
is entered in the dictionary of the type.
A MethodDef
signals that a PyStaticMethod
should be created,
rather than another type of method descriptor,
by its specific type.
(In CPython it uses the METH_STATIC
flag,
but in the diagram we propose using a sub-class of MethodDef
.)
The constructor of a Python staticmethod
(the __init__
in fact)
also implements the decorator @staticmethod
seen in Python class definitions.
In that context, it is once more the constructed PyStaticMethod
that is inserted into the dictionary of the type,
but the callable (__func__
) is the function decorated.
A PyStaticMethod
is not itself callable,
but PyStaticMethod.__get__
returns the associated callable inside it.
(At least one argument is required to __get__
, but it will be ignored:
there is no real binding to do.)