3.3. Type and Arithmetic Operations

Code fragments in this section are taken from rt2/src/main/java/.../vsj2/evo2 and rt2/src/test/java/.../vsj2/evo2/PyByteCode2.java in the project source.

In this section we will extend the interpreter to cover a selection of unary and binary operations. Central to the way this is implemented in CPython is that every Python object contains a pointer to a PyTypeObject, and this determines its class, that is, how the operations on it are implemented.

We shall follow this design closely to obtain the same semantics, in particular, using Java MethodHandles as an equivalent to C pointer to function.

3.3.1. Representing a Python Class

In CPython, objects are C structs and all operations on them are supplied by a PyTypeObject. Every Python object points to the PyTypeObject of its Python type. The PyTypeObject contains pointers to functions supporting the operations that the Python interpreter might perform on an object. In many cases, the supporting function essentially implements a particular opcode, apart from the stack effects.

In order to emulate this, we must be able to get, from any Python object in our implementation, a type in which we find a definition of each supporting function (or from which we find the function doesn’t apply to that type of object). To this end, we define a class PyType, and require every PyObject be able to produce a reference to it.

interface PyObject {
    PyType getType();
}
class PyType implements PyObject {
    // ...
    final String name;
    // ...
    @Override
    public String toString() { return "<class '" + name + "'>"; }
}

In CPython, a PyTypeObject includes a table of pointers (called “slots”), one for each supporting function. A slot may contain NULL, meaning that that function is not defined for the type. The table is multi-level, in that some of the fields in a PyTypeObject contain pointers to sub-tables of more slots, with the possibility that that the sub-table pointer is NULL, meaning none of the slots in that table apply. For example, there is a sub-table for numerical operations, and another for operations applicable to sequences, and the pointers to these are non-null only if the type is numerical, or a sequence, respectively.

Let’s take a look at how this is used.

3.3.2. A Unary Operation UNARY_NEGATIVE

We start with a unary operation, as the logic is more complicated for binary ones.

When the CPython eval-loop encounters a UNARY_NEGATIVE opcode, it calls PyNumber_Negative on the value from the top of the stack. The definition is found in abstract.c and looks like this:

PyObject *
PyNumber_Negative(PyObject *o)
{
    PyNumberMethods *m;

    if (o == NULL) {
        return null_error();
    }

    m = o->ob_type->tp_as_number;
    if (m && m->nb_negative)
        return (*m->nb_negative)(o);

    return type_error("bad operand type for unary -: '%.200s'", o);
}

Here, CPython picks up the object type (o->ob_type) and heads for the numeric sub-table tp_as_number. If the table pointer is not null and the slot is not null, it invokes the function it finds there. null_error() raises a Python SystemError, since we should never encounter a null object pointer here, and type_error() raises the TypeError that tells you you’ve asked for an impossible operation. Both return NULL. (Raising an exception in the C API occurs by returning NULL, and leaving values set in the CPython thread state, that the eval-loop then picks up. Our design will aim to make use of Java exceptions directly.)

We might ask whether we could not in-line PyNumber_Negative in eval(), but a survey of the code base shows that it is also used elsewhere, wherever a negative is needed that adapts to the specific object type. The companion functions PyNumber_Invert, PyNumber_Add, PyNumber_Multiply, and so on, are even more widely re-used, so we take the hint and create a utility class Number to hold them.

The syntax of method handle invocation is not quite as neat as in C, but in other ways our Java version can be more succinct:

class Number extends Abstract {

    /** Python {@code -v} */
    static PyObject negative(PyObject v) throws Throwable {
        try {
            MethodHandle mh = v.getType().number.negative;
            return (PyObject) mh.invokeExact(v);
        } catch (Slot.EmptyException e) {
            throw operandError("-", v);
        }
    }

    /** Create a {@code TypeError} for the named unary op. */
    static PyException operandError(String op, PyObject v) {
        return new TypeError("bad operand type for unary %s: '%.200s'",
                op, v.getType().getName());
    }
    // ...
}

The key part to understand is v.getType().number.negative. Here we go to the PyType object of v and navigate to its definition of the negative slot, which would be called nb_negative in CPython.

There are no tests for null because we do not use null to signify an empty slot, but a special method handle UNARY_EMPTY. (As the name gives away, there must be a special handle per slot type to match the signature.) This handle leads to a method that throws EmptyException. Likewise, the reference number (which would be tp_as_number in CPython) is never null, but points to a table where every slot throws that exception.

We do not mind that throwing exceptions may be a little slow, since it mostly only happens under error conditions. If nothing is thrown, try ... catch is essentially free. Other exceptions (or arbitrary Throwables), we let propagate to the caller, as this method does not know how to handle them. Instead, we catch them in the eval() loop of our CPythonFrame.

The handle for negative in the type of v, if it is not UNARY_EMPTY, points to a method in the implementation class. Consider the case where v is a float. The implementation class is PyFloat, and the method will be this one:

class PyFloat implements PyObject {
    // ...
    static PyObject neg(PyObject v) {
        try {
            double a = ((PyFloat) v).value;
            return new PyFloat(-a);
        } catch (ClassCastException cce) {
            return Py.NotImplemented;
        }
    }
}

There is an interesting difference from the CPython version, which has the signature float_neg(PyFloatObject *v). The way we choose a PyType object guarantees that v will be a Python float, but the way we fill slots does not allow us (as CPython is allowed) arbitrarily to cast the function signature. Instead, we make the cast here and catch the exception. Since it never happens (unless there is a bug in the implementation) perhaps we should raise an internal error, or simply let the NPE propagate (with no try...catch at all).

3.3.3. A Binary Operation BINARY_ADD

For this specimen binary operation, the wrapper is also like that in CPython:

class Number {
    // ...
    /** Python {@code v+w} */
    static PyObject add(PyObject v, PyObject w) throws Throwable {
        try {
            PyObject r = binary_op1(v, w, Slot.NB.add);
            if (r != Py.NotImplemented)
                return r;
        } catch (Slot.EmptyException e) {}
        throw operandError("+", v, w);
    }
    // ...
}

We do not (yet) deal with the addition of sequences, meaning concatenation.

Note the function binary_op1, named identically to its Python counterpart, contains the special logic that Python applies to binary operations. Like CPython’s, it may return Py.NotImplemented if neither object knows how to implement the operation. Unlike CPython’s, it may also throw EmptyException if it invokes an “empty” slot. These Py.NotImplemented may be the actual return value of an implementation in Python of __add__ or __radd__. These have exactly the same significance here, and in either case, we drop through to call typeError().

Our equivalent of CPython binop_op1 in abstract.c is made somewhat simpler by this strategy and an absence of null tests:

class Number {
    // ...
    private static PyObject binary_op1(PyObject v, PyObject w,
            Slot.NB binop) throws Slot.EmptyException, Throwable {
        PyType vtype = v.getType();
        PyType wtype = w.getType();

        MethodHandle slotv = binop.getSlot(vtype);
        MethodHandle slotw;

        if (wtype == vtype || (slotw = binop.getSlot(wtype)) == slotv)
            // Both types give the same result
            return (PyObject) slotv.invokeExact(v, w);

        else if (!wtype.isSubTypeOf(vtype)) {
            // Ask left (if not empty) then right.
            if (slotv != BINARY_EMPTY) {
                PyObject r = (PyObject) slotv.invokeExact(v, w);
                if (r != Py.NotImplemented)
                    return r;
            }
            return (PyObject) slotw.invokeExact(v, w);

        } else {
            // Right is sub-class: ask first (if not empty).
            if (slotw != BINARY_EMPTY) {
                PyObject r = (PyObject) slotw.invokeExact(v, w);
                if (r != Py.NotImplemented)
                    return r;
            }
            return (PyObject) slotv.invokeExact(v, w);
        }
    }

    private static final MethodHandle BINARY_EMPTY =
            Slot.Signature.BINARY.empty;
    // ...
}

In cases where we may have to let both objects answer, we check the first slot to see if it is empty, rather than letting it throw and having to catch it to try the other slot. (Note the occurrence here of BINARY_EMPTY.) In other places, however, we do not test for an empty slot, since throwing the EmptyException is a satisfactory ending.

We do not at present implement Python sub-classing, but the test is there (returning false) so we can exhibit the logic.

The argument Slot.NB binop may be puzzling. It is actually a specially-crafted Java enum that is able to look up a method handle in a PyType. More on this next.

3.3.4. How we Fill the Slots

This is quite complicated.

CPython’s typeobject.c is around eight thousand lines long, and this compactness (!) is obtained by extensive use of C macros, to generate both tabular data and entire function definitions. We get something similar using inheritance and the Java enum.

Our equivalent of the CPython PyNumberMethods is within PyType:

class PyType {
    // ...
    /** Tabulates the number methods (slots) of a particular type. */
    static class NumberMethods {
        MethodHandle negative = Slot.NB.negative.empty;
        MethodHandle add = Slot.NB.add.empty;
        MethodHandle subtract = Slot.NB.subtract.empty;
        MethodHandle multiply = Slot.NB.multiply.empty;
        //...
    }
}

The members are the slots and construction sets them all “empty” (throwing EmptyException). There will be such a class for each tp_as_* sub-table in a CPython PyTypeObject. The field names are identical to CPython’s without the prefix nb_, but if we follow CPython, the method names to which they map are (sometimes) not the same.

We need succinct ways to refer to the slots, to define their signatures, and to specify the methods that they call. We do this through some specially-crafted Java enums, with appropriate behaviour. We have already seen Slot.NB in action, but the way we create the whole family interesting.

We begin by defining the allowable signatures for methods that fill slots. (Compare these with the typedefs in CPython Include/object.h: UNARY is unaryfunc, SQ_ASSIGN is ssizeobjargproc, etc.. The constructor arguments are the same as in a call to Java MethodType.methodType(), and from them we create both a MethodType type, and a MethodHandle empty conforming to that type, that throws EmptyException.

class Slot {

    private static final MethodHandles.Lookup LOOKUP =
            MethodHandles.lookup();

    static class EmptyException extends Exception {}

    private static final Class<PyObject> O = PyObject.class;
    private static final Class<?> I = int.class;
    private static final Class<?> B = boolean.class;
    private static final Class<?> V = void.class;
    private static final Class<Opcode.PyCmp> CMP = Opcode.PyCmp.class;
    // ...

    /**
     * An enumeration of the acceptable signatures for slots in
     * {@code PyType.*Methods} tables.
     */
    enum Signature {
        UNARY(O, O),
        BINARY(O, O, O),
        TERNARY(O, O, O, O),
        PREDICATE(B, O),
        LEN(I, O),
        RICHCMP(O, O, O, CMP),
        SQ_INDEX(O, O, I),
        SQ_ASSIGN(V, O, I, O),
        MP_ASSIGN(V, O, O, O);

        final MethodType type;
        final MethodHandle empty;

        Signature(Class<?> returnType, Class<?>... ptype) { /* ... */ }
    }
    // ...
}

The next stage is to create an enum for each sub-table of slots, and the slots in the PyType itself, using these Signature constants to type the slots:

class Slot {
    // ...
    enum NB implements Any {

        negative(Signature.UNARY, "neg"),
        add(Signature.BINARY),
        subtract(Signature.BINARY, "sub"),
        multiply(Signature.BINARY, "mul"),

        final String methodName;
        final MethodType type;
        final MethodHandle empty;
        final VarHandle slotHandle;

        NB(Signature signature, String methodName) {
            this.methodName = methodName == null ? name() : methodName;
            this.type = signature.type;
            this.empty = signature.empty;
            this.slotHandle = EnumUtil.slotHandle(this);
        }

        NB(Signature signature) { this(signature, null); }

        // ...

        @Override
        public boolean isDefinedFor(PyType t) {
            return (MethodHandle) slotHandle.get(t.number) != empty;
        }

        @Override
        public MethodHandle findInClass(Class<?> c) {
            return EnumUtil.findInClass(this, c);
        }
    }

    interface Any {
        Group group();
        String name();
        String getMethodName();
        MethodType getType();
        MethodHandle getEmpty();
        boolean isDefinedFor(PyType t);
        MethodHandle findInClass(Class<?> c);
        MethodHandle getSlot(PyType t);
        void setSlot(PyType t, MethodHandle mh);
    }

    private static class EnumUtil {

        static VarHandle slotHandle(Any slot) {
            Class<?> methodsClass = slot.group().methodsClass;
            try {
                // The field *Methods has the same name as the enum
                return LOOKUP.findVarHandle(methodsClass, slot.name(),
                        MethodHandle.class);
            } catch (NoSuchFieldException | IllegalAccessException e) {
                // ...
            }
        }

        static MethodHandle findInClass(Any slot, Class<?> c) {
            try {
                // The method has the same name in every implementation
                return LOOKUP.findStatic(c, slot.getMethodName(),
                        slot.getType());
            } catch (NoSuchMethodException | IllegalAccessException e) {
                return slot.getEmpty();
            }
        }
    }
    // ...
}

Notice that enum NB implements an interface Any that specifies its behaviour. It also makes use of a helper class to supply behaviour common between the enumerations NB, SQ, etc..

The enum NB does much more than designate a slot as it would in C. The object is a getter/setter for slots in types, and this works whichever sub-table the slot is in, thanks to the use of java.lang.invoke.VarHandle slotHandle. The field (slot) a particular enum member gets and sets has the same name as the enumeration constant itself. So where t is the PyType target, Slot.NB.negative.setSlot sets t.number.negative, while Slot.TP.str.setSlot sets t.str.

Note

The use of slot names as method names has the side effect that the same method may satisfy two slots. This happens in the case of MP.length and SQ.length. If a type defines static int length(PyObject) it will fill both t.mapping.length and t.sequence.length. In the CPython code base, there are several types that define one but not the other slot. Every type that defines both, defines them withthe same function. If necessary, we can give the slots distinct names (universally).

In the member declarations we get to specify the name of the method (in some class implementing the type) that will be placed in the slot, and the signature that method should have. findInClass is the method that will go looking for it, supported by EnumUtil.findInClass. The Signature implies both the MethodType of the required implementation and the particular “empty” handle that should fill the slot otherwise.

Since the enum NB knows what “empty” looks like for this slot, we can ask it to test for that in isDefinedFor(PyType t). We can test a retrieved MethodHandle directly, as in binary_op1 above, but the constant we use has to be the right kind of “empty” for the slot.

Finally, since the several Slot.XX enumerations all implement Any, it is practicable to use one, or work through lists of them, without caring which sub-table any particular slot is in. When necessary, the group method will reveal that.

We can use this apparatus in the construction of a PyType like so:

class PyType implements PyObject {

    static final PyType TYPE = new PyType("type", PyType.class);

    @Override
    public PyType getType() { return TYPE; }
    final String name;
    private final Class<? extends PyObject> implClass;

    // Method suites for standard abstract types.
    final NumberMethods number;
    final SequenceMethods sequence;
    final MappingMethods mapping;

    // Methods to implement standard operations.
    MethodHandle hash;
    MethodHandle repr;
    MethodHandle str;

    PyType(String name, Class<? extends PyObject> implClass) {
        this.name = name;
        this.implClass = implClass;

        // Initialise slots to implement standard operations.
        hash = Slot.TP.hash.findInClass(implClass);
        repr = Slot.TP.repr.findInClass(implClass);
        str = Slot.TP.str.findInClass(implClass);

        // If immutable, could use NumberMethods.EMPTY, etc.
        (number = new NumberMethods()).fillFromClass(implClass);
        (sequence = new SequenceMethods()).fillFromClass(implClass);
        (mapping = new MappingMethods()).fillFromClass(implClass);
     }
    // ...
}

The method PyType.NumberMethods.fillFromClass (SequenceMethods and MappingMethods are essentially the same):

class PyType implements PyObject {
    // ...
    static class NumberMethods {

        MethodHandle negative = Slot.NB.negative.empty;
        MethodHandle add = Slot.NB.add.empty;
        MethodHandle subtract = Slot.NB.subtract.empty;
        MethodHandle multiply = Slot.NB.multiply.empty;
        // ...

        void fillFromClass(Class<? extends PyObject> c) {
            for (Slot.NB s : Slot.NB.values()) {
                MethodHandle mh = s.findInClass(c);
                if (mh != s.empty) { s.setSlot(this, mh); }
            }
        }
    }
}

This sets all the slots by reflection on the implementation class c.

There are opportunities for optimisation, spotting when a type does not define any slots in a particular sub-table, and using a shared constant. Some care will be required over whether and when a type actually allows slots to be redefined. CPython makes this distinction between built-in types and “heap types”, but where a type is allocated is not really the issue. Appropriate visibility of mutators and validity checks will be needed. For now, all our types admit modification of their slots.