Asyncio Implementation Overview
===============================

I've been learning the new-in-Python-3.4 :mod:`asyncio` module recently, since I want to employ it in a project. I started reading the docs, and after reading a bit about the ``EventLoop`` I clicked through to the chapter on coroutines, ``Future`` and ``Task``. And got rather confused. After poking around for a while, reading other articles on asyncio, talking to one of the developers, and looking through the source code, I'm pretty sure I've figured out how it works, and what roles coroutines, ``Future`` and ``Task`` play.

Someone who knows asyncio who reviewed this article briefly commented that it was way too long and complex, and that the concepts really should be simple. I think if I were aiming to explain how to *use* :mod:`asyncio` I wouldn't have written an article this long (indeed, some of the ones I read were quite short). But what I wanted, as an experienced Python programmer new to both :mod:`asyncio` in particular and async programming in general, was an explanation of how it *worked*, and what role these various classes actually play in making it work. I did not find any articles that explained things at this level (that doesn't mean they don't exist, I just didn't find one), so I wrote one in order to solidify my understanding. And, indeed, I don't feel my understanding was complete until I *finished* writing the article.

So, on to my (hopefully correct) explanation of how :mod:`asyncio` works.

The most fundamental building block of asyncio is the concept of the :class:`~asyncio.Future`. This is similar to :class:`concurrent.futures.Future`, but adapted so that it works with the second most fundamental component of asyncio, the ``EventLoop`` [1]_. Conceptually a ``Future`` object is really very simple. It is a holder for (eventually) a result or exception, and also for a list of callbacks to be called when it is "done" (that is, when there is a result, an exception, or the ``Future`` has been canceled).

Conceptually, the ``EventLoop`` is also very simple: each time through the loop, it calls any callbacks in the list of "ready" callbacks (the ``call_soon`` list), and then uses a :mod:`selector <selectors>` to wait either for the next pending IO operation to complete or for the time of the next scheduled task to arrive, at which point it adds the callback that will handle the event to the ``call_soon`` list and starts a new loop iteration.

An asyncio program can be written in "callback" style using just these two components: ``Future`` objects are used for signalling, by attaching callbacks that will be scheduled for execution by the ``EventLoop`` when the ``Future``'s ``set_result`` method is called (or some other call is made that marks the ``Future`` as "done"). Other callbacks are scheduled with the ``EventLoop`` to handle IO events and to run scheduled tasks, and when these callbacks run they call the appropriate methods on the appropriate ``Future``\ s to mark them as "done" and therefore trigger the ``Future``'s callbacks to run.

The power of asyncio programming, however, comes from two additional components: :ref:`coroutines <coroutine>` and :class:`~asyncio.Task`\ s. These two components tie ``Future``\ s and the ``EventLoop`` into a system that allows one to write procedural-looking code that, under the hood, is async code.

Note: the following discussion simplifies certain advanced details of how coroutines work (and that I currently don't understand :) in order to make the fundamental mechanisms clearer.
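Before getting to coroutines and ``Task``\ s, though, here is a minimal, hedged sketch of the "callback" style just described, using only an ``EventLoop`` and a ``Future``. The names ``produce_result`` and ``report`` and the 0.1 second delay are invented for illustration::

    import asyncio

    def produce_result(fut):
        # Pretend some IO has completed: mark the Future as "done",
        # which is what triggers the Future's "done" callbacks.
        fut.set_result(42)

    def report(fut):
        # Scheduled (via call_soon) once the Future is marked "done".
        print("got result:", fut.result())
        loop.stop()

    loop = asyncio.get_event_loop()
    fut = asyncio.Future(loop=loop)

    # When fut becomes "done", report() is added to the call_soon list.
    fut.add_done_callback(report)

    # Schedule the timed callback that will eventually complete the Future.
    loop.call_later(0.1, produce_result, fut)

    loop.run_forever()
    loop.close()

Nothing here "waits" for a result; we only arrange for the right functions to be scheduled when the ``Future`` becomes "done".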
At its core, a coroutine is a Python generator function that uses only [2]_ ``yield from``. When writing code using asyncio, instead of calling a function using normal Python function call syntax and obtaining a result::

    res = normal_function()

you use ``yield from``::

    res = yield from async_function()

In the above snippet, ``async_function`` is a function that returns either a ``Future`` or a ``coroutine``.

A ``Task`` is, itself, a ``Future``, and it wraps a ``coroutine`` (or another ``Future``, but there's no reason to do that). When a ``Task`` is created, it adds a callback to the ``EventLoop``'s ``call_soon`` queue that starts the iteration of the ``coroutine`` it is wrapping. That is, it arranges to call :func:`next` on the ``coroutine``.

That call to ``next`` has one of three valid outcomes: a ``Future``, a :exc:`StopIteration` exception with a value, or some other exception. If it is an exception, the ``Task`` schedules a ``call_soon`` callback with the ``EventLoop`` that, on the next pass through the loop, will ``throw`` the exception into the ``coroutine``. This means that the exception will be raised at the point where the (innermost) ``yield from`` call was made. If the result is a ``Future``, the ``Task`` schedules a callback on the ``Future`` to call the ``Task`` when the ``Future`` has completed. When some other thread of control eventually causes the ``Future`` to move to the "done" state, the ``Future`` will schedule that callback to run. That callback in turn will schedule another ``call_soon`` callback that will call ``next`` on the ``coroutine``. If the result is a ``StopIteration`` exception, the ``Task`` sets the value associated with the exception (which will be what the wrapped ``coroutine`` specified in its :keyword:`return` statement) as its result via ``set_result`` (remember, the ``Task`` is a ``Future``).

All ``coroutine``\ s make calls to other ``coroutine``\ s and ``Future``\ s using ``yield from``. What ``yield from`` does is iterate over the object passed to it, yielding each result in turn. If we call ``yield from`` on a generator, and that generator in turn calls ``yield from``, the values from the inner iterator are yielded as values from the outer ``yield from``. Since coroutines only call ``yield from`` on other coroutines or on ``Future``\ s, this means that when a ``Task`` callback calls :func:`next` on the ``coroutine`` it wraps, what it gets back is a ``Future``; it then schedules a callback on that ``Future`` and control returns to the ``EventLoop``. Control thus returns to the ``EventLoop`` after *each* iteration of the *innermost* iterator in the ``coroutine`` call chain, no matter how deeply nested in a chain of ``yield from``\ s that ``Future`` was.

When a ``Future`` completes, it schedules the callback provided by the ``Task`` that wraps the ``coroutine`` at the top of the chain of ``yield from``\ s that resulted in ``yield from`` being called on that ``Future``, and then, when it is next iterated, it executes a :keyword:`return` statement, returning the value that was set on the ``Future`` via ``set_result``. The ``Future``-scheduled callback (provided by the ``Task`` that wraps the top level ``coroutine``) schedules another callback that will make another call to ``next`` on the ``coroutine``.
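Before following the chain back up, it may help to see this machinery in a tiny, hedged example, written in the pre-``async``/``await`` generator style this article is about (it uses ``asyncio.coroutine``, available from Python 3.4 until its removal in 3.11). ``wait_for_answer``, the value ``21`` and the 0.1 second delay are invented for illustration::

    import asyncio

    @asyncio.coroutine
    def wait_for_answer(fut):
        # 'yield from fut' suspends this coroutine; the Task wrapping it
        # attaches a callback to fut and control returns to the EventLoop.
        value = yield from fut
        # We resume here on a later pass through the loop, after
        # fut.set_result() has run.
        return value * 2

    loop = asyncio.get_event_loop()
    fut = asyncio.Future(loop=loop)

    # Wrapping the coroutine in a Task schedules the first call to next().
    task = loop.create_task(wait_for_answer(fut))

    # Some other callback eventually marks the Future as "done".
    loop.call_later(0.1, fut.set_result, 21)

    # The Task is itself a Future; its result is the coroutine's return value.
    print(loop.run_until_complete(task))   # prints 42
    loop.close()

Here the ``Task`` created by ``create_task`` drives ``wait_for_answer``, and the ``call_later`` callback plays the part of the "other thread of control" that completes the ``Future``. Now, back to that second call to ``next``.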
That second call to ``next`` causes all of the ``yield from``\ s in the chain to request the next value, which for the innermost ``yield from`` will cause the ``coroutine`` that executed it to obtain the value returned by the ``Future``, and that ``coroutine`` will continue execution with the value in hand. When that lowest level ``coroutine`` itself reaches its end and returns a value, the ``yield from`` that called it returns that value, and the next higher ``coroutine``, the one that executed that ``yield from``, continues execution with the value in hand. And so on, until the top level ``coroutine`` completes and returns the value that becomes the value of the ``Future`` that is the ``Task``.

To summarise at a slightly higher level, the overall flow in an asyncio program is that we execute procedural-style code, and every time we get to a ``yield from`` the execution of that procedural code is suspended. This may go on for several levels of ``yield from`` calls, but eventually a ``Future`` will be yielded and make its way back up to the ``Task``, and we will start a new pass through the ``EventLoop``. The ``EventLoop`` will then run any ``call_soon`` callbacks. When all ``call_soon`` callbacks have run, the ``EventLoop`` uses a :mod:`selector <selectors>` to wait for the next IO event or for the next callback that was scheduled to run at a specific time. Those IO or timed events provide values that get set on certain ``Future`` objects, which triggers the scheduling of ``call_soon`` callbacks, which in turn causes the ``coroutine``\ s that were waiting on those ``Future``\ s to have ``next`` called on them (via ``call_soon``) and thus get another chance to run. This continues until all ``Future``\ s are complete, including the ``Task`` or ``Task``\ s that the main ``EventLoop`` is waiting for (or until the ``EventLoop`` is explicitly shut down).

From the point of view of the ``coroutine``, this looks like procedural code: the ``coroutine`` (using ``yield from``) calls a subroutine, gets back a value, and continues on with its computations. When you write the ``coroutine`` you don't (for the most part) have to worry about the fact that an uncertain amount of time will elapse between the ``yield from`` call and the acquisition of the result. You do, of course, have to be cognizant of the potential for deadlocks and the mutation of shared data by other ``coroutine``\ s, just as you would in any programming involving multitasking. However, in async code, you do *not* have to worry about *simultaneous* modification of shared data: the other code can *only* execute when you call ``yield from`` [3]_.

And there you have it. Using this "one cool trick" (``yield from``) we can write async code as if it were procedural code. A small end-to-end sketch of this appears after the footnotes.

.. [1] In fact, the asyncio ``EventLoop`` is pluggable. There are many different event loops that can be used, including third-party loops such as Twisted. It is the concept of the ``EventLoop`` that is fundamental.

.. [2] A ``coroutine`` can also use a bare ``yield`` statement, which will yield control to the ``EventLoop`` but schedule the next iteration of the ``coroutine`` via ``call_soon``. This is a way to cooperatively yield control during what might otherwise be a cycle-stealing long computation.

.. [3] Unless you are using the threading support to handle blocking calls.
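To close, here is a complete end-to-end sketch of the chain described above, again in the pre-``async``/``await`` generator style (``asyncio.coroutine`` exists from Python 3.4 up to its removal in 3.11). The function names and the sleep duration are invented for illustration::

    import asyncio

    @asyncio.coroutine
    def inner():
        # asyncio.sleep() is itself a coroutine; the Future it waits on
        # travels up the yield from chain to the Task, and the whole
        # chain is suspended here until the timer fires.
        yield from asyncio.sleep(0.1)
        return "inner result"

    @asyncio.coroutine
    def outer():
        # Looks like a procedural call, but it suspends outer() until
        # inner() (and the Future it is waiting on) has completed.
        value = yield from inner()
        return "outer saw: " + value

    loop = asyncio.get_event_loop()
    # run_until_complete wraps outer() in a Task and runs the EventLoop
    # until that Task (which is a Future) is "done".
    print(loop.run_until_complete(outer()))   # "outer saw: inner result"
    loop.close()

``outer`` and ``inner`` read like plain procedural code; underneath, the single ``Task`` is suspended at the ``Future`` inside ``asyncio.sleep`` and resumed by the ``EventLoop``, exactly as described above.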