Generators
30 Nov 2014I initially thought of implementing generators using either
cl-cont
or
Paul Graham’s continuation-passing macros,
but I no longer think this strategy is necessary or appropriate. Using
continuations to implement generators is unnecessary because
generators do not need to store the entire call chain. Using
continuations is inappropriate because both implementations transform
code into continuation-passing style, which can cause problems with
implementations that don’t properly optimize away tail calls.1
Before describing a different implementation of Python generators, I’ll attempt to describe the main problems we need to solve:
- Generators need to store the values of local variables when
yield
causes them to suspend. - Generators must be able to resume execution at an arbitrary point in the code body. This requirement includes jumping into loops and if branches
- Generators need to store intermediate calculation values when
yield
is used as a sub-expression - Generators must support the two coroutine methods,
throw
andsend
. - Generators must allow yielding from
with
andtry
statements. Generators must ensure proper release of resources when garbage collected.
1. Local variables
The first problem is similar to the upward funarg problem in implementing closures. In fact, since Python’s local variables are statically scoped2, we can “lift” these variables out of the generator scope and simply let Lisp handle the “local” variables for us. For example, say we wanted to implement the following Python generator:
def fib_gen():
fib = 1
prev = 0
n = 0
while True:
fib, prev = fib + prev, fib
n += 1
yield (fib, n)
# Yes, we're skipping the zero-th Fibonacci number
We could implement a similar concept in Lisp using let over lambda as follows:
(defun fib-gen ()
(let ((fib 1) (prev 0) (n 0))
(lambda ()
(psetq fib (+ fib prev)
prev fib)
(incf n)
(values fib n))))
Instead of a generator with a next()
method, our lisp code returns a
closure, but its behavior is otherwise identical.
2. Resuming execution
Since we’re only implementing generators for our Python DSL, we only
need to worry about a few control-flow statements. With
the exception of try
/finally
and the with
statement, all the
statements (if
, for
, while
, break
) can be compiled into a
tagbody
and go
forms (yes, I’m going there…). We can therefore
wrap the generator body in a tagbody
and expand the control-flow
statements into appropriate tags and go
calls. By appropriately
tagging yield
statements, we can simply go
to the right place in
the generator body when we need to resume.
Let’s translate another Python generator into a Lisp closure:
def xrange(end):
i = 0
while i < end:
yield i
i += 1
In Lisp, this could be:
(defun xrange (end)
(let ((i 0) (status 'not-started))
(lambda ()
(block gen
(tagbody
(case status
('not-started nil)
('started (go yield-restart))
('ended (go while-end)))
(setq status 'started)
while-start
(unless (< i end) (go while-end))
(return-from gen i)
yield-restart
(incf i)
(go while-start)
while-end
(setq status 'ended)
(error "generator ended"))))))
Though the tagbody results in ugly, difficult to read code, it gives
us the requisite flexibility to jump nearly anywhere in the generator
body. By extending status
and the case statement as necessary, we
can support multiple yield
expressions in the body of a generator.
3. Intermediate calculations
tagbody
doesn’t quite let us jump anywhere in the body of a
generator. For example, we can’t pause the evaluation of arguments to
a function before calling the function. E.g., in f(expr(),
(yield))
, the generator must store the value of expr()
before
suspending, and must use the value to call f
when it resumes.
The technique to deal with this case is to convert the intermediate calculations into explicit (gensym-ed) variables, inserting the label between the hidden assignment and the function call.3 Let’s see an example:
def update_items(owner):
for item in get_items_from_db(owner):
update(item.get_name(),
(yield item), # Allow caller to modify item
now())
In Lisp this generator could be
(defun update-items (owner)
(let (item temp-1 temp-2 (status 'not-started))
(lambda (val)
(block gen
(tagbody
(case status
('not-started nil)
('started (go yield-restart))
('ended (go end)))
(setq status 'started)
(setq temp-1 (get-items-from-db owner))
for-begin
(handler-case
(setq item (funcall temp-1)) ; item = next(temp_1)
(error () (go for-end)))
(setq temp-2 (get-name item))
(return-from gen item)
yield-restart
(update temp-2 val (now)) ; val has the value sent in
(go for-begin)
for-end
(setq status 'ended)
end
(error "Generator finished"))))))
4. Throw and send
Our last example implemented a send
method. A full generator
implementation could use a message passing mechanism to signal the
closure whether the call was a next
, send
, or throw
and pass in
the value or exception, if any.
5. Release of resources
Edit: The original version of this post glossed over the
try
,finally
, andwith
implementations. I will expand this section over the next couple of days to address this non-trivial point.
Unlike for
and while
, try...finally
(and its sibling, with
),
cannot be emptied out into an enclosing tagbody
, since in Lisp they
ultimately need to be wrapped in an unwind-protect
.
The Python documentation specifies that the __del__
method of the
generator calls its close
method, ensuring that any finally
clauses and with
blocks get resolved properly. Implementing the
close
method as a wrapper around throw
is straightforward; the
difficulty lies in ensuring the __del__
method gets called. This
difficulty extends beyond just generators to any object that defines
custom finalization code and will be addressed separately.
Implementation
I am still working on the translation tool to make all these changes automatically. One key problem is expanding a single form (with the AST for a loop) into multiple forms right in the tagbody. I’ll post the code and a discussion of some of these issues when they’re resolved.
Footnotes
-
I think most implementations support this optimization. See this post by Marc Simpson for more details. Nonetheless, it seems dangerous to depend on such a feature for proper support. ↩
-
By static I mean that local variables are determined at compile time rather than run time, and result in the use of the
LOAD_FAST
opcode. ↩ -
Lambda: The Ultimate Imperative, by Steele and Sussman, has a good discussion of evaluation order (see the endnote “evalorder”). Technically, we only need to temporarily store expressions that don’t evaluate trivially, though in this case not all variables evaluate trivially (e.g., module-level variables can change values between yielding and resuming). ↩