Generators

I initially thought of implementing generators using either cl-cont or Paul Graham’s continuation-passing macros, but I no longer think this strategy is necessary or appropriate. Using continuations to implement generators is unnecessary because generators do not need to store the entire call chain. Using continuations is inappropriate because both implementations transform code into continuation-passing style, which can cause problems with implementations that don’t properly optimize away tail calls.1

Before describing a different implementation of Python generators, I’ll attempt to describe the main problems we need to solve:

  1. Generators need to store the values of local variables when yield causes them to suspend.
  2. Generators must be able to resume execution at an arbitrary point in the code body. This requirement includes jumping into loops and if branches
  3. Generators need to store intermediate calculation values when yield is used as a sub-expression
  4. Generators must support the two coroutine methods, throw and send.
  5. Generators must allow yielding from with and try statements. Generators must ensure proper release of resources when garbage collected.

1. Local variables

The first problem is similar to the upward funarg problem in implementing closures. In fact, since Python’s local variables are statically scoped2, we can “lift” these variables out of the generator scope and simply let Lisp handle the “local” variables for us. For example, say we wanted to implement the following Python generator:

def fib_gen():
    fib = 1
    prev = 0
    n = 0
    while True:
        fib, prev = fib + prev, fib
        n += 1
        yield (fib, n)
        # Yes, we're skipping the zero-th Fibonacci number

We could implement a similar concept in Lisp using let over lambda as follows:

(defun fib-gen ()
  (let ((fib 1) (prev 0) (n 0))
    (lambda ()
      (psetq fib (+ fib prev)
             prev fib)
      (incf n)
      (values fib n))))

Instead of a generator with a next() method, our lisp code returns a closure, but its behavior is otherwise identical.

2. Resuming execution

Since we’re only implementing generators for our Python DSL, we only need to worry about a few control-flow statements. With the exception of try/finally and the with statement, all the statements (if, for, while, break) can be compiled into a tagbody and go forms (yes, I’m going there…). We can therefore wrap the generator body in a tagbody and expand the control-flow statements into appropriate tags and go calls. By appropriately tagging yield statements, we can simply go to the right place in the generator body when we need to resume.

Let’s translate another Python generator into a Lisp closure:

def xrange(end):
    i = 0
    while i < end:
        yield i
        i += 1

In Lisp, this could be:

(defun xrange (end)
  (let ((i 0) (status 'not-started))
    (lambda ()
      (block gen
        (tagbody
         (case status
           ('not-started nil)
           ('started (go yield-restart))
           ('ended (go while-end)))
         (setq status 'started)
         while-start
         (unless (< i end) (go while-end))
         (return-from gen i)
         yield-restart
         (incf i)
         (go while-start)
         while-end
         (setq status 'ended)
         (error "generator ended"))))))

Though the tagbody results in ugly, difficult to read code, it gives us the requisite flexibility to jump nearly anywhere in the generator body. By extending status and the case statement as necessary, we can support multiple yield expressions in the body of a generator.

3. Intermediate calculations

tagbody doesn’t quite let us jump anywhere in the body of a generator. For example, we can’t pause the evaluation of arguments to a function before calling the function. E.g., in f(expr(), (yield)), the generator must store the value of expr() before suspending, and must use the value to call f when it resumes.

The technique to deal with this case is to convert the intermediate calculations into explicit (gensym-ed) variables, inserting the label between the hidden assignment and the function call.3 Let’s see an example:

def update_items(owner):
    for item in get_items_from_db(owner):
        update(item.get_name(),
               (yield item),  # Allow caller to modify item
               now())

In Lisp this generator could be

(defun update-items (owner)
  (let (item temp-1 temp-2 (status 'not-started))
    (lambda (val)
      (block gen
        (tagbody
         (case status
           ('not-started nil)
           ('started (go yield-restart))
           ('ended (go end)))
         (setq status 'started)
         (setq temp-1 (get-items-from-db owner))
         for-begin
         (handler-case
          (setq item (funcall temp-1))  ; item = next(temp_1)
          (error () (go for-end)))
         (setq temp-2 (get-name item))
         (return-from gen item)
         yield-restart
         (update temp-2 val (now))  ; val has the value sent in
         (go for-begin)
         for-end
         (setq status 'ended)
         end
         (error "Generator finished"))))))

4. Throw and send

Our last example implemented a send method. A full generator implementation could use a message passing mechanism to signal the closure whether the call was a next, send, or throw and pass in the value or exception, if any.

5. Release of resources

Edit: The original version of this post glossed over the try, finally, and with implementations. I will expand this section over the next couple of days to address this non-trivial point.

Unlike for and while, try...finally (and its sibling, with), cannot be emptied out into an enclosing tagbody, since in Lisp they ultimately need to be wrapped in an unwind-protect.

The Python documentation specifies that the __del__ method of the generator calls its close method, ensuring that any finally clauses and with blocks get resolved properly. Implementing the close method as a wrapper around throw is straightforward; the difficulty lies in ensuring the __del__ method gets called. This difficulty extends beyond just generators to any object that defines custom finalization code and will be addressed separately.

Implementation

I am still working on the translation tool to make all these changes automatically. One key problem is expanding a single form (with the AST for a loop) into multiple forms right in the tagbody. I’ll post the code and a discussion of some of these issues when they’re resolved.

Footnotes

  1. I think most implementations support this optimization. See this post by Marc Simpson for more details. Nonetheless, it seems dangerous to depend on such a feature for proper support.

  2. By static I mean that local variables are determined at compile time rather than run time, and result in the use of the LOAD_FAST opcode.

  3. Lambda: The Ultimate Imperative, by Steele and Sussman, has a good discussion of evaluation order (see the endnote “evalorder”). Technically, we only need to temporarily store expressions that don’t evaluate trivially, though in this case not all variables evaluate trivially (e.g., module-level variables can change values between yielding and resuming).

The Initial Spec

As I started to implement some of the nodes in our Python DSL, I realized there were a few elementary operations like attribute access and assignment, exception raising, function calling and object creation that I kept needing to perform at the Python level. Since these operations are core to the language, they are highly interdependent, and we can’t just implement them sequentially. For example, to implement function calling, we need to be able to raise exceptions for invalid arguments; to raise exceptions we need to be able to create them; and to create them we need to be able to call their class.

To get around this circularity, we will first specify the interface that these operations will obey and then we can attack these in any order we like.

Attribute access

Attribute access will be provided through a single function:

(defun getattr (obj &rest attrs))

We allow multiple attributes to be given in case of nested attribute access. Thus o.attr in Python translates to (getattr o attr) and o.attr1.attr2 translates into (getattr o attr1 attr2).

getattr will define a generalized variable so that assignment and deletion work transparently (they way it does in Python).

Assignment

Contrary to Lisp, Python variables default to the innermost lexical scope, unless they are declared nonlocal or global. Managing this difference will be the responsibility of our assign macro:

(defmacro assign (target value))

Since Python has it’s own generalized variables (limited to attribute and subscript access), assign will play a role similar to setf in Lisp. Note that for simplicity assign will not destructure target if it is a Python tuple or list, and will instead raise an error. If it becomes necessary, we can provide a py-destructuring-bind macro on top of assign.

New functions

In Python, we can create functions using either the def statement or the lambda expression. Since def creates function bindings in the local scope (contrary to Lisp’s defun), def can be thought of as a multi-expression lambda together with an assignment. It will be convenient for us to separate the function creation from the function assignment (for decorators, binding methods, and possibly more). Our primary means of creating Python functions will be using pylambda:

(defmacro pylambda ((&optional args starargs kwargs starkwargs)
                    &body body))

The slightly strange signature is designed to allow replicating Python’s signature unambiguously.

(pylambda ((x y (z 3))
           args
           (kw1 (kw2 2))
           kwargs))

is equivalent to

lambda x, y, z=3, *args, kw1, kw2=2, **kwargs: None

Unlike lambda in Python, however, pylambda is not restricted to a single Python expression. Thus def can be translated into a pylambda combined with an assignment, which we wrap as:

(defmacro pydef (name
                 (&optional (&rest args) starargs (&rest kwargs) starkwargs)
                 &body body))

Calling callables

Until we decide on the exact structure of Python objects in general, and functions in particular, it will be good to have an easily available hook we can use to wrap/unwrap objects, rearrange arguments, etc. To provide this hook to our program, we’ll use a function to call our Python callable objects:

(defun pycall (func &optional args starargs kwargs starkwargs))

pycall is equivalent to the pseudo-Python:

func(args[0], args[1], ... args[n],
     *starargs,
     kw.keys[0]=kw.vals[0], kw.keys[1]=kw.vals[1], ... kw.keys[m]=kw.vals[m],
     **starkwargs)

In other words, positional arguments are passed as a (Lisp) list in args and keyword arguments are in kwargs (as an alist), while starargs and starkwargs handle the * and ** arguments.

Exceptions

For the core language features, we’ll only need to be able to raise built-in exceptions. For now we’ll only define a function for doing so easily, leaving the more general raise statement for later.

(defun raise-builtin (class-name &key args kwargs))

raise-builtin accepts a class name, given as a (case-sensitive) keyword (e.g., :|TypeError|), a (Lisp) list of positional arguments, and an alist of keyword arguments. The arguments are passed to the class constructor to create the instance that will be raised.

In order to handle exceptions, we’ll draw a similar distinction between builtin and user exceptions. The following macro lets us handle builtin exceptions:

(defmacro py-builtin-try (try-body &rest except-finally-clauses))

except-finally-clauses must have the form (exception (&optional var) body), where exception is a keyword that could be given to raise-builtin, or t to indicate a finally clause. var is the symbol that will be bound to the exception instance in the execution of body. There can only be zero or one finally clauses, and if there is one, it must be the last clause in the macro call. Inside an exception handler body, (raise) may be used to re-raise the exception in the same context.

New Python classes

Creating new Python classes does not require any new code, since we can use the three argument form of type to create the class and assign to place it in the local namespace. It will be convenient, however, to provide a shortcut that takes Lisp objects for its arguments:

(defmacro pyclass (name (&rest bases) &optional dict))

pyclass creates a new Python class named name with base-classes given by bases (a Lisp list). If the optional argument dict (a hash-table or alist) is given, it will fill the class namespace. This function is roughly equivalent to name = type(name, bases, dict)

First draft of the Data Model

Function calls

I think the most efficient way of handling function calls in Python will be to translate them into function calls in Lisp. The main difficulties will be:

  • Argument parsing This PEP describes how Python parses function calls and binds arguments to their values. I think with some care, much of this functionality can be replicated using macros.
  • Scope This PEP describes how Python creates scope. Since Python uses lexical scoping, we can translate these rules efficiently into Lisp lets (though likely using setq more often than in traditional Lisp code). Importantly, we can determine at compile-time which names are bound in which blocks.
  • Returning values Translated code will likely use Lisp’s return-from operator much more than idiomatic Lisp, and will similarly have to take care to return None when the Python function doesn’t provide a return value

Exception handling

Python’s try: ... except:... blocks should be readily translatable into signals and handlers in Lisp. Since we’re implementing Python functions as Lisp functions, in principle the Python stack will be (a subsegment of) the Lisp stack. However, to allow introspection and keep track of the Python-specific stack frames, it will likely be necessary to keep a parallel stack, which will likely take the form of a dynamic variable, *current-frame*. We will also be able to create traceback objects from this parallel stack.

Classes and types

Python classes will be translated into CLOS classes. While I don’t expect this translation will be an easy task, I think CLOS + MOP is sufficiently flexible to allow an efficient translation to be generated. Initially though, classes will be implemented as CLOS objects (of a particular a Lisp class py-class), and instances will similarly be implemented as CLOS instances (of py-object super-class of py-class). Class features (attribute resolution, metaclasses, etc.) would be implemented manually (i.e., outside the MOP scheme). Practically speaking, a Python class statement would be translated into a (make-instance 'py-class) Lisp form, with some initializing of the class.

Longer-term, a more efficient translation would likely implement Python classes as different CLOS classes, using MOP machinery to ensure attribute resolution, metaclasses, etc. work the same way as in Python. I.e., translating a Python class statement into a Lisp defclass form.

Built-in types

Many of the built-in types that get used a lot in Python code will be implemented with the Lisp built-in types, and thus will not inherit from py-object. We’ll therefore have to special-case some of the syntax-related functions (like getattr) to deal with these objects, but it will make it easier for Lisp code to call Python code. I think that added simplicity for client code is worth the complexity in the translation code.

Numeric types

Python numbers have easy analogues in Lisp types:

Python type Lisp type
int integer
float double-float
complex complex

Strings, bytes, bytearray, memoryview

Similar to what we did when defining the Python DSL, we’ll represent bytes and bytearray in our runtime as vectors, only for efficiency we’ll use the type (unsigned-byte 8). We’ll have to manually impose the immutability of bytes.

Python str objects are a little tricky to implement since Common Lisp does not require supporting any characters beyond the 96 standard characters. To use native functions wherever possible, we’ll likely have to use a structure similar to the one CPython uses. Strings whose characters all fall within the implementation’s definition of character are then represented with native strings, and all other strings are represented with vectors of type (unsigned-byte x) for x in {8, 16, 32}.

As for memoryview, the whole issue of the C API is something I want to punt on for now, but if the need arose I could see implementing this class for the built-in types.

List

Per the docs Python’s lists are implemented as variable-length arrays, so we will do the same. We’ll leave the “cleverness” behind the resizing of the array to Lisp for now though.

Tuple

Tuples will also be arrays, with the immutability imposed by our code.

Dict

Per the same docs, Python’s dicts are implemented as hash tables, which we will also do. In the future, it might make sense to mimic Python’s implementation of the hashing algorithms and hash-table access.

dict_keys, dict_items, dict_values

These will likely be full-blown objects to allow for their “live” state.

Generators

I think the most logical way to implement these is as wrappers to continuations. I’ll have to explore the different continuations libraries (or go look at Paul Graham’s On Lisp for his “fake” implementation).

File objects

These will be wrappers around Lisp file streams, with Python’s open getting translated into an appropriate call to Lisp’s open.

Other

Sets and frozensets

Initially, as Python did, these could be hashtables with dummy values. I imagine there are small-enough implementations of these out there that including (via copy-paste-adapt) one in py-lisp would not be too difficult.

Range

These will be full-blown objects

A Python DSL

Creating a DSL to express Python code in Lisp

As I stated in the first post, the first step in our workflow will be to translate Python code into an abstract syntax tree (AST), using Python itself, and then dump the tree into a Lisp form.

The Python docs aren’t great, but Green Tree Snakes has a great reference with a list of all the nodes in Python’s abstract grammar and a description of what they’re for.

Another way of obtaining a list of all the node types is as follows:

>>> import ast
>>> import inspect
>>> for name, c in inspect.getmembers(ast, inspect.isclass):
>>>     if issubclass(c, ast.AST):
>>>         print(name)
Add
And
Assert
# <snip>
unaryop
withitem

Dumping the AST to a Lisp string

There are a lot of nodes, so at this stage rather than manually decide how to represent every node, we’ll dump the AST tree into generic s-expression with minimal Python-side processing.

The first thing we’ll want to define is a target format. Since we want to do minimal Python processing, we’ll use simple cons trees: Nodes will follow the pattern (node-name sub-node-1 sub-node-2 ... sub-node-n) and their literal attributes will be represented by Lisp literal values.

For example, we’ll want to say

translate(ast.parse("def f(arg1):\n  pass"))

and have it return a string containing the following form:

(|py-Module|
 ((|py-FunctionDef| "f"
                    (|py-arguments| ((|py-arg| "arg1" |None|))
                                    |None|
                                    ()
                                    ()
                                    |None|
                                    ())
                    ((|py-Pass|))
                    ()
                    |None|)))

(of course, we won’t expect the translator to pretty print the form :)

We can use a simple recursive function to handle most cases:

def translate(node_or_literal):
    """Convert an AST node or Python literal into a Lisp form (recursively)."""
    if isinstance(node_or_literal, ast.AST):
        symbol = "|py-%s|" % node_or_literal.__class__.__name__
        if not node_or_literal._fields:  # this is a leaf node
            return "(%s)" % symbol
        args = " ".join(translate(sub_nl)
                        for _, sub_nl in ast.iter_fields(node_or_literal))
        return "(%s %s)" % (symbol, args)
    return translate_literal(node_or_literal)

We’re punting on literal translation for a minute to momentarily savor this quick success.

Translating AST literals into Lisp

So far we’ve handled half of the translation we’d described; now we are going to translate Python literals. Per the official docs, the Python abstract grammar has six builtin types which can end up becoming literals in the AST:

Grammar type AST attribute Lisp type
identifier str simple-vector
int int integer
string str simple-vector
bytes bytes simple-vector
object Used for ast.Num, where it’s used to store a number number
singleton Used for ast.NameConstant, holds True, False, or None t, nil, or |None|
* There’s an implied seventh type list, which is indicated in the grammar by the use of * list

Of all the attributes possible, only None is not built into Lisp as a literal; we can leave it as a symbol for now and define it as an object later on. We can represent bytes as literal vectors of fixnums using the #() notation.

It’s worth noting that these choices for literals don’t have to correspond to our run-time (or even compiled) representations for objects. We can make forms like |Bytes| or |Str| generate a better represenation in Lisp, and keep our Python code simple for now.

Unicode and Python strings

It may seem strange to not use Lisp strings to represent Python str objects. Although Lisp has a string type, unicode support across implementations is not great (and not guaranteed by the standard), so we need a simple, portable represenation for the translated source.

One such representation for identifier and string literals is to encode them as big-endian UTF-32 and then represent them like we represent bytes. Because the grammar doesn’t contain any places that accept more than one of string, identifier, or bytes, we’ll be able to convert the #() vector into its run-time represenation once we decide what that is and how to implement our DSL.

Representing floats properly

The Python data model explicitly states that numbers.Real leave you “at the mercy of the underlying machine architecture for the accepted range and handling of overflow.” The float class (which is a different class altogether), states that they “are usually implemented using double in C.”

Unfortunately, the Common Lisp spec does not specify standard float sizes (though it does state minimums), so we are not necessarily able to represent the same set of floats in Python and Lisp. The best we can do is output float literals using repr, which ensures that float(repr(x)) == x for all floats x other than inf, -inf, and NaN.

Another issue we’ll have to deal with later is representing the IEEE 754 special values inf, -inf, NaN, which Common Lisp does not require, and properly handling overflow calculations. Since Python doesn’t have inf and nan literals, we don’t have to worry about these problems (yet).

Our translation code for literals can therefore be specified as:

def translate_literal(literal):
    "Translate a Python literal into a Lisp literal."
    if isinstance(literal, str):
        return translate_literal(literal.encode("utf-32-be"))
    elif isinstance(literal, bytes):
        return "#(%s)" % " ".join(map(str, literal))
    elif literal is None:
        return "|None|"
    elif isinstance(literal, bool):
        return "t" if literal else "nil"
    elif isinstance(literal, list):
        return "(%s)" % " ".join(map(translate, literal))
    elif isinstance(literal, complex):
        return "#C(%r %r)" % (literal.real, literal.imag)
    else:  # Should be an integer or float
        return repr(literal)

Where to go from here?

Our implementation of the translation code has determined a first version of our DSL. From here, we’ll begin implementing macros and functions so our translated code can run! I expect there will be some node types we’ll go back and special-case in Python, where different forms may be easier to work with in Lisp. Of course, we’ll also have to design our (first version of the) data model to fully implement many of the nodes here.

Python is not Common Lisp

Project aim

Simply put, I want to bring the vast number of Python libraries to the Common Lisp world. I want to give a better answer to this Stack Overflow question.

The process I envision would operate as follows:

  1. Python community develops awesome library pyX to do X.
  2. Avid Lisper downloads pyX and runs the source through a Python script that generates Lisp code.
  3. Said Lisper writes a thin wrapper around the generated code, packages the code as library CLX and puts it up on Quicklisp or similar.
  4. A different (possibly less avid) Lisper without access to a Python interpreter and needing to do X downloads CLX along with its dependency pylisp. She happily peforms X in Lisp, no Python necessary.

Why this project?

In my experience, Python has many more and better libraries than Common Lisp. I have many times wanted to work on certain projects that I think would be well suited for CL, only to discover that there’s no robust (for example) PDF-parsing library.

Why this structure?

The way I see it, there are fundamentally three ways of bridging the Python-Lisp gap:

1. Embed Common Lisp in Python

The example I first came across is Peter Norvig’s, but the closest thing I’ve found is Hy, which is more of a Python dialect written in S-expressions (and freaking awesome, too!).

I suspect that doing this properly would be a massive undertaking, amounting to a full re-write of CL in Python.

2. Embed Python in Lisp

There is cl-python, which targets compatibility with Python 2.7, but it is unfortunately no longer maintained. This seems to me like the closest anyone has come to bridging the two languages. It installs with quicklisp, but the test suite does not pass on my machine. (There are a bunch of “Expected test failure for … did not occur.”; I may dig into the source to investigate what is going on.) At any rate, this project seems more ambitious than my own, since it provides a means of calling Lisp from Python as well.

This is my favored approach. The Python ast has relatively few nodes, most of which should be relatively straightforward to implement. I expect that classes may require some MOP trickery to get right and function calls/proper scoping of variables may require special care. Some of the introspection machinery in particular seems difficult to reproduce. The oft-pointed to similarity between Python and Lisp is, I think, quite-apparent at this level.

3. Provide an FFI

I’m not sure, but I think burgled-batteries is the top choice here, though it’s been 3 years since the last commit. They list Pyffi (appears to only support Python up to 2.5), cl-python, and Python-on-lisp (dead since 06).

I think this is a tremendously elegant solution which basically reduces your requirements down to a DLL. Despite having written several Python C-extensions, I still find its C API a bit daunting.

Project plan

  1. Initially, I will rely on a Python interpreter to translate Python source, into an AST, and then into an S-expression representation of the language. (Perhaps longer-term, the project may warrant having its own lexer/parser to generate ASTs.) I expect to create a Python script to perform this rough translation first.
  2. The first real order of business will be to implement all the AST nodes as Lisp in a require-able runtime. Many of the nodes will be straightforward to write as macros, but I expect that ClassDef will require some careful work with MOP and Assign/Delete/Name will require some overhead to track scope. Some of the other nodes I’ll likely just punt to the next step. (E.g., Import)
  3. The next step will be creating the built-in types and functions (including the aforementioned __import__) that most Python programs rely on.
  4. Once we can run basic (import-less) programs, we’ll need to re-implement the part of the standard library that does not have a pure-Python equivalent. I imagine some of these will take longer (e.g. _socket) than others (e.g. math).
  5. Figure out how to use C modules

Throughout all of this, I hope to learn all the nooks and crannies of Python and understand their design decisions with the language, while increasing my familiarity with CL.