30 Nov 2014
I initially thought of implementing generators using either
cl-cont
or
Paul Graham’s continuation-passing macros,
but I no longer think this strategy is necessary or appropriate. Using
continuations to implement generators is unnecessary because
generators do not need to store the entire call chain. Using
continuations is inappropriate because both implementations transform
code into continuation-passing style, which can cause problems with
implementations that don’t properly optimize away tail calls.
Before describing a different implementation of Python generators,
I’ll attempt to describe the main problems we need to solve:
- Generators need to store the values of local variables when
yield
causes them to suspend.
- Generators must be able to resume execution at an arbitrary point
in the code body. This requirement includes jumping into loops and
if branches
- Generators need to store intermediate calculation values when
yield
is used as a sub-expression
- Generators must support the two coroutine methods,
throw
and
send
.
- Generators must allow yielding from
with
and try
statements. Generators must ensure proper release of resources when
garbage collected.
1. Local variables
The first problem is similar to the upward funarg problem in
implementing closures. In fact, since Python’s local variables are
statically scoped, we can “lift” these variables out of
the generator scope and simply let Lisp handle the “local” variables
for us. For example, say we wanted to implement the following Python
generator:
def fib_gen():
fib = 1
prev = 0
n = 0
while True:
fib, prev = fib + prev, fib
n += 1
yield (fib, n)
# Yes, we're skipping the zero-th Fibonacci number
We could implement a similar concept in Lisp using
let over lambda as follows:
(defun fib-gen ()
(let ((fib 1) (prev 0) (n 0))
(lambda ()
(psetq fib (+ fib prev)
prev fib)
(incf n)
(values fib n))))
Instead of a generator with a next()
method, our lisp code returns a
closure, but its behavior is otherwise identical.
2. Resuming execution
Since we’re only implementing generators for our Python DSL, we only
need to worry about a few control-flow statements. With
the exception of try
/finally
and the with
statement, all the
statements (if
, for
, while
, break
) can be compiled into a
tagbody
and go
forms (yes, I’m going there…). We can therefore
wrap the generator body in a tagbody
and expand the control-flow
statements into appropriate tags and go
calls. By appropriately
tagging yield
statements, we can simply go
to the right place in
the generator body when we need to resume.
Let’s translate another Python generator into a Lisp closure:
def xrange(end):
i = 0
while i < end:
yield i
i += 1
In Lisp, this could be:
(defun xrange (end)
(let ((i 0) (status 'not-started))
(lambda ()
(block gen
(tagbody
(case status
('not-started nil)
('started (go yield-restart))
('ended (go while-end)))
(setq status 'started)
while-start
(unless (< i end) (go while-end))
(return-from gen i)
yield-restart
(incf i)
(go while-start)
while-end
(setq status 'ended)
(error "generator ended"))))))
Though the tagbody results in ugly, difficult to read code, it gives
us the requisite flexibility to jump nearly anywhere in the generator
body. By extending status
and the case statement as necessary, we
can support multiple yield
expressions in the body of a generator.
tagbody
doesn’t quite let us jump anywhere in the body of a
generator. For example, we can’t pause the evaluation of arguments to
a function before calling the function. E.g., in f(expr(),
(yield))
, the generator must store the value of expr()
before
suspending, and must use the value to call f
when it resumes.
The technique to deal with this case is to convert the intermediate
calculations into explicit (gensym-ed) variables, inserting the label
between the hidden assignment and the function call. Let’s
see an example:
def update_items(owner):
for item in get_items_from_db(owner):
update(item.get_name(),
(yield item), # Allow caller to modify item
now())
In Lisp this generator could be
(defun update-items (owner)
(let (item temp-1 temp-2 (status 'not-started))
(lambda (val)
(block gen
(tagbody
(case status
('not-started nil)
('started (go yield-restart))
('ended (go end)))
(setq status 'started)
(setq temp-1 (get-items-from-db owner))
for-begin
(handler-case
(setq item (funcall temp-1)) ; item = next(temp_1)
(error () (go for-end)))
(setq temp-2 (get-name item))
(return-from gen item)
yield-restart
(update temp-2 val (now)) ; val has the value sent in
(go for-begin)
for-end
(setq status 'ended)
end
(error "Generator finished"))))))
4. Throw and send
Our last example implemented a send
method. A full generator
implementation could use a message passing mechanism to signal the
closure whether the call was a next
, send
, or throw
and pass in
the value or exception, if any.
5. Release of resources
Edit: The original version of this post glossed over the try
,
finally
, and with
implementations. I will expand this section over
the next couple of days to address this non-trivial point.
Unlike for
and while
, try...finally
(and its sibling, with
),
cannot be emptied out into an enclosing tagbody
, since in Lisp they
ultimately need to be wrapped in an unwind-protect
.
The Python documentation specifies that the __del__
method of the
generator calls its close
method, ensuring that any finally
clauses and with
blocks get resolved properly. Implementing the
close
method as a wrapper around throw
is straightforward; the
difficulty lies in ensuring the __del__
method gets called. This
difficulty extends beyond just generators to any object that defines
custom finalization code and will be addressed separately.
Implementation
I am still working on the translation tool to make all these changes
automatically. One key problem is expanding a single form (with the
AST for a loop) into multiple forms right in the tagbody. I’ll post
the code and a discussion of some of these issues when they’re resolved.
12 Nov 2014
As I started to implement some of the nodes in our Python DSL, I
realized there were a few elementary operations like attribute access
and assignment, exception raising, function calling and object
creation that I kept needing to perform at the Python level. Since
these operations are core to the language, they are highly
interdependent, and we can’t just implement them sequentially. For
example, to implement function calling, we need to be able to raise
exceptions for invalid arguments; to raise exceptions we need to be
able to create them; and to create them we need to be able to call
their class.
To get around this circularity, we will first specify the interface
that these operations will obey and then we can attack these in any
order we like.
Attribute access
Attribute access will be provided through a single function:
(defun getattr (obj &rest attrs))
We allow multiple attributes to be given in case of nested attribute
access. Thus o.attr
in Python translates to (getattr o attr)
and
o.attr1.attr2
translates into (getattr o attr1 attr2)
.
getattr
will define a generalized variable so that assignment and
deletion work transparently (they way it does in Python).
Assignment
Contrary to Lisp, Python variables default to the innermost lexical
scope, unless they are declared nonlocal
or global
. Managing this
difference will be the responsibility of our assign
macro:
(defmacro assign (target value))
Since Python has it’s own generalized variables (limited to attribute
and subscript access), assign will play a role similar to setf
in
Lisp. Note that for simplicity assign
will not destructure target
if it is a Python tuple or list, and will instead raise an error. If
it becomes necessary, we can provide a py-destructuring-bind
macro
on top of assign.
New functions
In Python, we can create functions using either the def
statement or
the lambda
expression. Since def
creates function bindings in the
local scope (contrary to Lisp’s defun
), def
can be thought of as a
multi-expression lambda
together with an assignment. It will be
convenient for us to separate the function creation from the function
assignment (for decorators, binding methods, and possibly more). Our
primary means of creating Python functions will be using
pylambda
:
(defmacro pylambda ((&optional args starargs kwargs starkwargs)
&body body))
The slightly strange signature is designed to allow replicating
Python’s signature unambiguously.
(pylambda ((x y (z 3))
args
(kw1 (kw2 2))
kwargs))
is equivalent to
lambda x, y, z=3, *args, kw1, kw2=2, **kwargs: None
Unlike lambda
in Python, however, pylambda
is not restricted to a
single Python expression. Thus def
can be translated into a
pylambda
combined with an assignment, which we wrap as:
(defmacro pydef (name
(&optional (&rest args) starargs (&rest kwargs) starkwargs)
&body body))
Calling callables
Until we decide on the exact structure of Python objects in general,
and functions in particular, it will be good to have an easily
available hook we can use to wrap/unwrap objects, rearrange arguments,
etc. To provide this hook to our program, we’ll use a function to call
our Python callable objects:
(defun pycall (func &optional args starargs kwargs starkwargs))
pycall
is equivalent to the pseudo-Python:
func(args[0], args[1], ... args[n],
*starargs,
kw.keys[0]=kw.vals[0], kw.keys[1]=kw.vals[1], ... kw.keys[m]=kw.vals[m],
**starkwargs)
In other words, positional arguments are passed as a (Lisp) list in
args
and keyword arguments are in kwargs
(as an alist), while
starargs
and starkwargs
handle the *
and **
arguments.
Exceptions
For the core language features, we’ll only need to be able to raise
built-in exceptions. For now we’ll only define a function for doing so
easily, leaving the more general raise
statement for later.
(defun raise-builtin (class-name &key args kwargs))
raise-builtin
accepts a class name, given as a (case-sensitive)
keyword (e.g., :|TypeError|
), a (Lisp) list of positional arguments,
and an alist of keyword arguments. The arguments are passed to the
class constructor to create the instance that will be raised.
In order to handle exceptions, we’ll draw a similar distinction
between builtin and user exceptions. The following macro lets us
handle builtin exceptions:
(defmacro py-builtin-try (try-body &rest except-finally-clauses))
except-finally-clauses
must have the form (exception (&optional
var) body)
, where exception
is a keyword that could be given to
raise-builtin
, or t
to indicate a finally clause. var
is the
symbol that will be bound to the exception instance in the execution
of body
. There can only be zero or one finally clauses, and if there
is one, it must be the last clause in the macro call. Inside an
exception handler body, (raise)
may be used to re-raise the
exception in the same context.
New Python classes
Creating new Python classes does not require any new code, since we
can use the three argument form of type
to create the class and
assign
to place it in the local namespace. It will be convenient,
however, to provide a shortcut that takes Lisp objects for its
arguments:
(defmacro pyclass (name (&rest bases) &optional dict))
pyclass
creates a new Python class named name
with base-classes
given by bases
(a Lisp list). If the optional argument dict
(a
hash-table
or alist
) is given, it will fill the class
namespace. This function is roughly equivalent to name = type(name,
bases, dict)
28 Oct 2014
Function calls
I think the most efficient way of handling function calls in Python
will be to translate them into function calls in Lisp. The main
difficulties will be:
- Argument parsing
This PEP describes
how Python parses function calls and binds arguments to their
values. I think with some care, much of this functionality can be
replicated using macros.
- Scope This PEP
describes how Python creates scope. Since Python uses lexical
scoping, we can translate these rules efficiently into Lisp
let
s
(though likely using setq
more often than in traditional Lisp
code). Importantly, we can determine at compile-time which names are
bound in which blocks.
- Returning values Translated code will likely use Lisp’s
return-from
operator much more than idiomatic Lisp, and will
similarly have to take care to return None
when the Python
function doesn’t provide a return value
Exception handling
Python’s try: ... except:...
blocks should be readily translatable
into signals and handlers in Lisp. Since we’re implementing Python
functions as Lisp functions, in principle the Python stack will be (a
subsegment of) the Lisp stack. However, to allow introspection and
keep track of the Python-specific
stack frames,
it will likely be necessary to keep a parallel stack, which will
likely take the form of a dynamic variable, *current-frame*
. We will
also be able to create traceback
objects from this parallel stack.
Classes and types
Python classes will be translated into CLOS classes. While I don’t
expect this translation will be an easy task, I think CLOS + MOP is
sufficiently flexible to allow an efficient translation to be
generated. Initially though, classes will be implemented as CLOS
objects (of a particular a Lisp class py-class
), and instances will
similarly be implemented as CLOS instances (of py-object
super-class
of py-class
). Class features (attribute resolution, metaclasses,
etc.) would be implemented manually (i.e., outside the MOP
scheme). Practically speaking, a Python class
statement would be
translated into a (make-instance 'py-class)
Lisp form, with some
initializing of the class.
Longer-term, a more efficient translation would likely implement
Python classes as different CLOS classes, using MOP machinery to
ensure attribute resolution, metaclasses, etc. work the same way as in
Python. I.e., translating a Python class
statement into a Lisp
defclass
form.
Built-in types
Many of the built-in types that get used a lot in Python code will be
implemented with the Lisp built-in types, and thus will not inherit
from py-object
. We’ll therefore have to special-case some of the
syntax-related functions (like getattr
) to deal with these objects,
but it will make it easier for Lisp code to call Python code. I think
that added simplicity for client code is worth the complexity in the
translation code.
Numeric types
Python numbers have easy analogues in Lisp types:
Python type |
Lisp type |
int |
integer |
float |
double-float |
complex |
complex |
Strings, bytes, bytearray, memoryview
Similar to what we did when defining the Python DSL, we’ll represent
bytes
and bytearray
in our runtime as vectors, only for
efficiency we’ll use the type (unsigned-byte 8)
. We’ll have to
manually impose the immutability of bytes
.
Python str
objects are a little tricky to implement since Common
Lisp does not require supporting any characters beyond the
96 standard characters. To
use native functions wherever possible, we’ll likely have to use a
structure similar to
the one CPython uses.
Strings whose characters all fall within the implementation’s
definition of character
are then represented with native strings,
and all other strings are represented with vectors of type
(unsigned-byte x)
for x
in {8, 16, 32}
.
As for memoryview
, the whole issue of the C API is something I want
to punt on for now, but if the need arose I could see implementing
this class for the built-in types.
List
Per
the docs
Python’s list
s are implemented as variable-length arrays, so we will
do the same. We’ll leave the “cleverness” behind the resizing of the
array to Lisp for now though.
Tuple
Tuples will also be arrays, with the immutability imposed by our code.
Dict
Per
the same docs,
Python’s dict
s are implemented as hash tables, which we will also
do. In the future, it might make sense to mimic Python’s
implementation of the hashing algorithms and hash-table access.
dict_keys, dict_items, dict_values
These will likely be full-blown objects to allow for their “live” state.
Generators
I think the most logical way to implement these is as wrappers to
continuations. I’ll have to explore the different continuations
libraries (or go look at
Paul Graham’s On Lisp for
his “fake” implementation).
File objects
These will be wrappers around Lisp file streams, with Python’s open
getting translated into an appropriate call to Lisp’s open
.
Other
Sets and frozensets
Initially, as Python did, these could be hashtables with dummy
values. I imagine there are small-enough implementations of these out
there that including (via copy-paste-adapt) one in py-lisp
would not
be too difficult.
Range
These will be full-blown objects
24 Oct 2014
Creating a DSL to express Python code in Lisp
As I stated in the first post, the first step in our workflow will be
to translate Python code into an abstract syntax tree (AST), using
Python itself, and then dump the tree into a Lisp form.
The Python docs aren’t
great, but
Green Tree Snakes
has a great reference with a list of all the nodes in Python’s
abstract grammar and a description of what they’re for.
Another way of obtaining a list of all the node types is as follows:
>>> import ast
>>> import inspect
>>> for name, c in inspect.getmembers(ast, inspect.isclass):
>>> if issubclass(c, ast.AST):
>>> print(name)
Add
And
Assert
# <snip>
unaryop
withitem
Dumping the AST to a Lisp string
There are a lot of nodes, so at this stage rather than manually decide
how to represent every node, we’ll dump the AST tree into generic
s-expression with minimal Python-side processing.
The first thing we’ll want to define is a target format. Since we want
to do minimal Python processing, we’ll use simple cons trees: Nodes
will follow the pattern (node-name sub-node-1 sub-node-2
... sub-node-n)
and their literal attributes will be represented by
Lisp literal values.
For example, we’ll want to say
translate(ast.parse("def f(arg1):\n pass"))
and have it return a string containing the following form:
(|py-Module|
((|py-FunctionDef| "f"
(|py-arguments| ((|py-arg| "arg1" |None|))
|None|
()
()
|None|
())
((|py-Pass|))
()
|None|)))
(of course, we won’t expect the translator to pretty print the form :)
We can use a simple recursive function to handle most cases:
def translate(node_or_literal):
"""Convert an AST node or Python literal into a Lisp form (recursively)."""
if isinstance(node_or_literal, ast.AST):
symbol = "|py-%s|" % node_or_literal.__class__.__name__
if not node_or_literal._fields: # this is a leaf node
return "(%s)" % symbol
args = " ".join(translate(sub_nl)
for _, sub_nl in ast.iter_fields(node_or_literal))
return "(%s %s)" % (symbol, args)
return translate_literal(node_or_literal)
We’re punting on literal translation for a minute to momentarily savor
this quick success.
Translating AST literals into Lisp
So far we’ve handled half of the translation we’d described; now we
are going to translate Python literals. Per the official docs, the
Python abstract grammar has six builtin types which can end up
becoming literals in the AST:
Grammar type |
AST attribute |
Lisp type |
identifier |
str |
simple-vector |
int |
int |
integer |
string |
str |
simple-vector |
bytes |
bytes |
simple-vector |
object |
Used for ast.Num , where it’s used to store a number |
number |
singleton |
Used for ast.NameConstant , holds True , False , or None |
t , nil , or |None| |
* |
There’s an implied seventh type list , which is indicated in the grammar by the use of * |
list |
Of all the attributes possible, only None
is not built into Lisp as
a literal; we can leave it as a symbol for now and define it as an
object later on. We can represent bytes
as literal vectors of
fixnum
s using the #()
notation.
It’s worth noting that these choices for literals don’t have to
correspond to our run-time (or even compiled) representations for
objects. We can make forms like |Bytes|
or |Str|
generate a better
represenation in Lisp, and keep our Python code simple for now.
Unicode and Python strings
It may seem strange to not use Lisp string
s to represent Python
str
objects. Although Lisp has a string
type, unicode support
across implementations is not great (and not guaranteed by the
standard), so we need a simple, portable represenation for the
translated source.
One such representation for identifier
and string
literals is to
encode them as big-endian UTF-32
and then represent them like we
represent bytes
. Because the grammar doesn’t contain any places that
accept more than one of string
, identifier
, or bytes
, we’ll be
able to convert the #()
vector into its run-time represenation once
we decide what that is and how to implement our DSL.
Representing floats properly
The Python data model
explicitly states that numbers.Real
leave you “at the mercy of the
underlying machine architecture for the accepted range and handling of
overflow.” The float
class (which is a different class altogether),
states
that they “are usually implemented using double
in C.”
Unfortunately,
the Common Lisp spec
does not specify standard float sizes (though it does state
minimums), so we are not necessarily able to represent the same set of
floats in Python and Lisp. The best we can do is output float literals
using repr
, which ensures that float(repr(x)) == x
for all floats
x
other than inf
, -inf
, and NaN
.
Another issue we’ll have to deal with later is representing the IEEE
754
special values inf
, -inf
, NaN
, which Common Lisp does not
require, and properly handling overflow calculations. Since Python
doesn’t have inf
and nan
literals,
we don’t have to worry about these problems (yet).
Our translation code for literals can therefore be specified as:
def translate_literal(literal):
"Translate a Python literal into a Lisp literal."
if isinstance(literal, str):
return translate_literal(literal.encode("utf-32-be"))
elif isinstance(literal, bytes):
return "#(%s)" % " ".join(map(str, literal))
elif literal is None:
return "|None|"
elif isinstance(literal, bool):
return "t" if literal else "nil"
elif isinstance(literal, list):
return "(%s)" % " ".join(map(translate, literal))
elif isinstance(literal, complex):
return "#C(%r %r)" % (literal.real, literal.imag)
else: # Should be an integer or float
return repr(literal)
Where to go from here?
Our implementation of the translation code has determined a first
version of our DSL. From here, we’ll begin implementing macros and
functions so our translated code can run! I expect there will be some
node types we’ll go back and special-case in Python, where different
forms may be easier to work with in Lisp. Of course, we’ll also have
to design our (first version of the) data model to fully implement
many of the nodes here.
21 Oct 2014
Project aim
Simply put, I want to bring the vast number of Python libraries to the
Common Lisp world. I want to give a better answer to
this Stack Overflow question.
The process I envision would operate as follows:
- Python community develops awesome library
pyX
to do X
.
- Avid Lisper downloads
pyX
and runs the source through a Python
script that generates Lisp code.
- Said Lisper writes a thin wrapper around the generated code,
packages the code as library
CLX
and puts it up on Quicklisp or
similar.
- A different (possibly less avid) Lisper without access to a Python
interpreter and needing to do
X
downloads CLX
along with its
dependency pylisp
. She happily peforms X
in Lisp, no Python
necessary.
Why this project?
In my experience, Python has many more and better libraries than
Common Lisp. I have many times wanted to work on certain projects that
I think would be well suited for CL, only to discover that there’s no
robust (for example) PDF-parsing library.
Why this structure?
The way I see it, there are fundamentally three ways of bridging the
Python-Lisp gap:
1. Embed Common Lisp in Python
The example I first came across is
Peter Norvig’s, but the closest thing
I’ve found is Hy
, which is more of
a Python dialect written in S-expressions (and freaking awesome,
too!).
I suspect that doing this properly would be a massive undertaking,
amounting to a full re-write of CL in Python.
2. Embed Python in Lisp
There is cl-python
, which
targets compatibility with Python 2.7, but it is unfortunately no
longer maintained. This seems to me like the closest anyone has come
to bridging the two languages. It installs with quicklisp, but the
test suite does not pass on my machine. (There are a bunch of
“Expected test failure for … did not occur.”; I may dig into the
source to investigate what is going on.) At any rate, this project
seems more ambitious than my own, since it provides a means of calling
Lisp from Python as well.
This is my favored approach. The Python ast
has relatively few
nodes, most of which should be relatively straightforward to
implement. I expect that classes may require some MOP trickery to get
right and function calls/proper scoping of variables may require
special care. Some of the introspection machinery in particular seems
difficult to reproduce. The oft-pointed to
similarity between Python and Lisp
is, I think, quite-apparent at this level.
3. Provide an FFI
I’m not sure, but I think
burgled-batteries
is the top choice here, though it’s been 3 years since the last
commit. They list Pyffi
(appears to
only support Python up to 2.5), cl-python
, and
Python-on-lisp
(dead since 06).
I think this is a tremendously elegant solution which basically
reduces your requirements down to a DLL. Despite having written
several Python C-extensions, I still find its C API a bit daunting.
Project plan
- Initially, I will rely on a Python interpreter to translate Python
source, into an
AST
, and
then into an S-expression representation of the language. (Perhaps
longer-term, the project may warrant having its own lexer/parser to
generate ASTs.) I expect to create a Python script to perform this
rough translation first.
- The first real order of business will be to implement all the AST
nodes as Lisp in a
require
-able runtime. Many of the nodes will be
straightforward to write as macros, but I expect that ClassDef
will require some careful work with MOP and
Assign
/Delete
/Name
will require some overhead to track
scope. Some of the other nodes I’ll likely just punt to the next
step. (E.g., Import
)
- The next step will be creating the
built-in types and functions
(including the aforementioned
__import__
) that most Python
programs rely on.
- Once we can run basic (
import
-less) programs, we’ll need to
re-implement the part of the standard library that does not have a
pure-Python equivalent. I imagine some of these will take longer
(e.g. _socket
) than others (e.g. math
).
- Figure out how to use C modules
Throughout all of this, I hope to learn all the nooks and crannies
of Python and understand their design decisions with the language,
while increasing my familiarity with CL.