Asyncio Implementation Overview
===============================

I've been learning the new-in-Python-3.4 :mod:`asyncio` module recently, since I want to employ it in a project. I started reading the docs, and after reading a bit about the ``EventLoop`` I clicked through to the chapter on coroutines, ``Future`` and ``Task``. And got rather confused. After poking around for a while, reading other articles on asyncio, talking to one of the developers, and looking through the source code, I'm pretty sure I've figured out how it works, and what roles coroutines, ``Future`` and ``Task`` play.

Someone who knows asyncio who reviewed this article briefly commented that it was way too long and complex, and that the concepts really should be simple. I think if I were aiming to explain how to *use* :mod:`asyncio` I wouldn't have written an article this long (indeed, some of the ones I read were quite short). But what I wanted, as an experienced Python programmer new to both :mod:`asyncio` in particular and async programming in general, was an explanation of how it *worked*, and what role these various classes actually play in making it work. I did not find any articles that explained things at this level (that doesn't mean they don't exist, I just didn't find one), so I wrote one in order to solidify my understanding. And, indeed, I don't feel my understanding was complete until I *finished* writing the article.

So, on to my (hopefully correct) explanation of how :mod:`asyncio` works.

The most fundamental building block of asyncio is the concept of the :class:`~asyncio.Future`. This is similar to :class:`concurrent.futures.Future`, but adapted so that it works with the second most fundamental component of asyncio, the ``EventLoop`` [1]_. Conceptually a ``Future`` object is really very simple. It is a holder for (eventually) a result or exception, and also for a list of callbacks to be called when it is "done" (that is, when there is a result, an exception, or the ``Future`` has been canceled).

Conceptually, the ``EventLoop`` is also very simple: each time through the loop, it calls any callbacks in the list of "ready" callbacks (the ``call_soon`` list), and then uses a :mod:`selector <selectors>` to wait either for the next pending IO operation to complete or for the time of the next scheduled task to arrive, at which point it adds the callback that will handle the event to the ``call_soon`` list and starts a new loop iteration.

An asyncio program can be written in "callback" style using just these two components: ``Future`` objects are used for signalling, by attaching callbacks that will be scheduled for execution by the ``EventLoop`` when the ``Future``'s ``set_result`` method is called (or some other call is made that marks the ``Future`` as "done"). Other callbacks are scheduled with the ``EventLoop`` to handle IO events and to run scheduled tasks, and when these callbacks run they call the appropriate methods on the appropriate ``Future``\ s to mark them as "done" and therefore trigger the ``Future``'s callbacks to run.

The power of asyncio programming, however, comes from two additional components: :ref:`coroutines <coroutine>` and :class:`~asyncio.Task`\ s. These two components tie ``Future``\ s and the ``EventLoop`` into a system that allows one to write procedural-looking code that, under the hood, is async code.

Note: the following discussion simplifies certain advanced details of how coroutines work (and that I currently don't understand :) in order to make the fundamental mechanisms clearer.
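Before getting to coroutines and ``Task``\ s, though, here is a minimal, hedged sketch of the "callback" style just described, using only an ``EventLoop`` and a ``Future``. The names ``produce_result`` and ``report`` and the 0.1 second delay are invented for illustration::

    import asyncio

    def produce_result(fut):
        # Pretend some IO has completed: mark the Future as "done",
        # which is what triggers the Future's "done" callbacks.
        fut.set_result(42)

    def report(fut):
        # Scheduled (via call_soon) once the Future is marked "done".
        print("got result:", fut.result())
        loop.stop()

    loop = asyncio.get_event_loop()
    fut = asyncio.Future(loop=loop)

    # When fut becomes "done", report() is added to the call_soon list.
    fut.add_done_callback(report)

    # Schedule the timed callback that will eventually complete the Future.
    loop.call_later(0.1, produce_result, fut)

    loop.run_forever()
    loop.close()

Nothing here "waits" for a result; we only arrange for the right functions to be scheduled when the ``Future`` becomes "done".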
At its core, a coroutine is a Python generator function that uses only [2]_ ``yield from``. When writing code using asyncio, instead of calling a function using normal Python function call syntax and obtaining a result::

    res = normal_function()

you use ``yield from``::

    res = yield from async_function()

In the above snippet, ``async_function`` is a function that returns either a ``Future`` or a ``coroutine``.

A ``Task`` is, itself, a ``Future``, and it wraps a ``coroutine`` (or another ``Future``, but there's no reason to do that). When a ``Task`` is created, it adds a callback to the ``EventLoop``'s ``call_soon`` queue that starts the iteration of the ``coroutine`` it is wrapping. That is, it arranges to call :func:`next` on the ``coroutine``.

That call to ``next`` has one of three valid outcomes: a ``Future``, a :exc:`StopIteration` exception with a value, or some other exception. If it is an exception, the ``Task`` schedules a ``call_soon`` callback with the ``EventLoop`` that, on the next pass through the loop, will ``throw`` the exception into the ``coroutine``. This means that the exception will be raised at the point where the (innermost) ``yield from`` call was made. If the result is a ``Future``, the ``Task`` schedules a callback on the ``Future`` to call the ``Task`` when the ``Future`` has completed. When some other thread of control eventually causes the ``Future`` to move to the "done" state, the ``Future`` will schedule that callback to run. That callback in turn will schedule another ``call_soon`` callback that will call ``next`` on the ``coroutine``. If the result is a ``StopIteration`` exception, the ``Task`` sets the value associated with the exception (which will be what the wrapped ``coroutine`` specified in its :keyword:`return` statement) as its result via ``set_result`` (remember, the ``Task`` is a ``Future``).

All ``coroutine``\ s make calls to other ``coroutine``\ s and ``Future``\ s using ``yield from``. What ``yield from`` does is iterate over the object passed to it, yielding each result in turn. If we call ``yield from`` on a generator, and that generator in turn calls ``yield from``, the values from the inner iterator are yielded as values from the outer ``yield from``. Since coroutines only call ``yield from`` on other coroutines or on ``Future``\ s, this means that when a ``Task`` callback calls :func:`next` on the ``coroutine`` it wraps, what it gets back is a ``Future``; it then schedules a callback on that ``Future`` and control returns to the ``EventLoop``. Control thus returns to the ``EventLoop`` after *each* iteration of the *innermost* iterator in the ``coroutine`` call chain, no matter how deeply nested in a chain of ``yield from``\ s that ``Future`` was.

When a ``Future`` completes, it schedules the callback provided by the ``Task`` that wraps the ``coroutine`` at the top of the chain of ``yield from``\ s that resulted in ``yield from`` being called on that ``Future``, and then, when it is next iterated, it executes a :keyword:`return` statement, returning the value that was set on the ``Future`` via ``set_result``. The ``Future``-scheduled callback (provided by the ``Task`` that wraps the top level ``coroutine``) schedules another callback that will make another call to ``next`` on the ``coroutine``.
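Before following the chain back up, it may help to see this machinery in a tiny, hedged example, written in the pre-``async``/``await`` generator style this article is about (it uses ``asyncio.coroutine``, available from Python 3.4 until its removal in 3.11). ``wait_for_answer``, the value ``21`` and the 0.1 second delay are invented for illustration::

    import asyncio

    @asyncio.coroutine
    def wait_for_answer(fut):
        # 'yield from fut' suspends this coroutine; the Task wrapping it
        # attaches a callback to fut and control returns to the EventLoop.
        value = yield from fut
        # We resume here on a later pass through the loop, after
        # fut.set_result() has run.
        return value * 2

    loop = asyncio.get_event_loop()
    fut = asyncio.Future(loop=loop)

    # Wrapping the coroutine in a Task schedules the first call to next().
    task = loop.create_task(wait_for_answer(fut))

    # Some other callback eventually marks the Future as "done".
    loop.call_later(0.1, fut.set_result, 21)

    # The Task is itself a Future; its result is the coroutine's return value.
    print(loop.run_until_complete(task))   # prints 42
    loop.close()

Here the ``Task`` created by ``create_task`` drives ``wait_for_answer``, and the ``call_later`` callback plays the part of the "other thread of control" that completes the ``Future``. Now, back to that second call to ``next``.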
That second call to ``next`` causes all of the ``yield from``\ s in the chain to request the next value, which for the innermost ``yield from`` will cause the ``coroutine`` that executed it to obtain the value returned by the ``Future``, and that ``coroutine`` will continue execution with the value in hand. When that lowest level ``coroutine`` itself reaches its end and returns a value, the ``yield from`` that called it returns that value, and the next higher ``coroutine``, the one that executed that ``yield from``, continues execution with the value in hand. And so on, until the top level ``coroutine`` completes and returns the value that becomes the value of the ``Future`` that is the ``Task``.

To summarise at a slightly higher level, the overall flow in an asyncio program is that we execute procedural-style code, and every time we get to a ``yield from`` the execution of that procedural code is suspended. This may go on for several levels of ``yield from`` calls, but eventually a ``Future`` will be yielded and make its way back up to the ``Task``, and we will start a new pass through the ``EventLoop``. The ``EventLoop`` will then run any ``call_soon`` callbacks. When all ``call_soon`` callbacks have run, the ``EventLoop`` uses a :mod:`selector <selectors>` to wait for the next IO event or for the next callback that was scheduled to run at a specific time. Those IO or timed events provide values that get set on certain ``Future`` objects, which triggers the scheduling of ``call_soon`` callbacks, which in turn causes the ``coroutine``\ s that were waiting on those ``Future``\ s to have ``next`` called on them (via ``call_soon``) and thus get another chance to run. This continues until all ``Future``\ s are complete, including the ``Task`` or ``Task``\ s that the main ``EventLoop`` is waiting for (or until the ``EventLoop`` is explicitly shut down).

From the point of view of the ``coroutine``, this looks like procedural code: the ``coroutine`` (using ``yield from``) calls a subroutine, gets back a value, and continues on with its computations. When you write the ``coroutine`` you don't (for the most part) have to worry about the fact that an uncertain amount of time will elapse between the ``yield from`` call and the acquisition of the result. You do, of course, have to be cognizant of the potential for deadlocks and the mutation of shared data by other ``coroutine``\ s, just as you would in any programming involving multitasking. However, in async code, you do *not* have to worry about *simultaneous* modification of shared data: the other code can *only* execute when you call ``yield from`` [3]_.

And there you have it. Using this "one cool trick" (``yield from``) we can write async code as if it were procedural code. A small end-to-end sketch of this appears after the footnotes.

.. [1] In fact, the asyncio ``EventLoop`` is pluggable. There are many different event loops that can be used, including third-party loops such as Twisted. It is the concept of the ``EventLoop`` that is fundamental.

.. [2] A ``coroutine`` can also use a bare ``yield`` statement, which will yield control to the ``EventLoop`` but schedule the next iteration of the ``coroutine`` via ``call_soon``. This is a way to cooperatively yield control during what might otherwise be a cycle-stealing long computation.

.. [3] Unless you are using the threading support to handle blocking calls.
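To close, here is a complete end-to-end sketch of the chain described above, again in the pre-``async``/``await`` generator style (``asyncio.coroutine`` exists from Python 3.4 up to its removal in 3.11). The function names and the sleep duration are invented for illustration::

    import asyncio

    @asyncio.coroutine
    def inner():
        # asyncio.sleep() is itself a coroutine; the Future it waits on
        # travels up the yield from chain to the Task, and the whole
        # chain is suspended here until the timer fires.
        yield from asyncio.sleep(0.1)
        return "inner result"

    @asyncio.coroutine
    def outer():
        # Looks like a procedural call, but it suspends outer() until
        # inner() (and the Future it is waiting on) has completed.
        value = yield from inner()
        return "outer saw: " + value

    loop = asyncio.get_event_loop()
    # run_until_complete wraps outer() in a Task and runs the EventLoop
    # until that Task (which is a Future) is "done".
    print(loop.run_until_complete(outer()))   # "outer saw: inner result"
    loop.close()

``outer`` and ``inner`` read like plain procedural code; underneath, the single ``Task`` is suspended at the ``Future`` inside ``asyncio.sleep`` and resumed by the ``EventLoop``, exactly as described above.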