Aymeric Augustin July 19, 2020 14 min read

Sans I/O when rubber meets the road

If you’ve never heard of Sans I/O…

Sans I/O is a software design pattern for implementing network protocols.

An I/O-free library contains:

no network I/O;
no asynchronous control flow.

An I/O integration layer complements the library, connecting it to a network I/O framework, which usually involves asynchronous control flow.

While this may seem counter-intuitive, following this discipline is expected to yield significant benefits.

The I/O-free library is reusable because it isn’t tied to a particular I/O framework. It builds upon the lowest possible common denominator: receiving and sending sequences of bytes. It’s also composable with any other I/O-free library, for the same reason.
The I/O-free library is more amenable to high-quality software engineering practices. Of course, the I/O integration layer still has to deal with network unreliability, but the associated complexity is quarantined to that layer instead of infecting the entire library.

To make the most of this blog post, if you aren’t familiar with Sans I/O, take a look at the Sans I/O documentation page or watch Building Protocol Libraries The Right Way, an enlightening and entertaining talk by Cory Benfield at PyCon 2016.

Sans I/O tackles real problems

As the creator of websockets, even though I had fully bought into asyncio — or maybe because of that — Cory’s talk really resonated with me.

When I started websockets, I wanted to build a good implementation of the WebSocket protocol (RFC 6455) on top of asyncio¹. I had a design goal of making the API feel “asyncio-native”.

Thanks to coroutines, the core of websockets was elegant and robust. It took me a few tries to figure out the right design for managing the connection under the hood and exposing a convenient and safe API. Eventually, I converged on a design that didn’t attract too many bug reports. I was so proud that I made a diagram to show off.

Of course, websockets was totally coupled to asyncio, both for network I/O and for control flow. Guess what happened next.

Some users were happy. Other users ~~failed to appreciate the design~~ had practical problems to solve within a set of constraints where asyncio didn’t fit. They started asking for other concurrency models. I added a couple FAQ entries to clarify that asyncio was the way to go and moved on.

Then several factors gradually changed my mind:

I looked at connecting through a proxy. Ideally, this should involve nothing more than composing the WebSocket protocol implementation on top of a HTTP proxy or SOCKS proxy implementation. Unfortunately, there was no clear way to implement this composition.
Other projects started using websockets as a library, for example application servers supporting both HTTP and WebSocket connections. The integration story was quite poor. For example Sanic didn’t get new features as I added them to websockets.
There was also a discussion for integrating websockets in httpx. That didn’t go very far as I didn’t have a good API available. Eventually it was ruled out-of-scope.
I was following Nathaniel J. Smith’s work on trio as well as David Beazley’s on curio. I was interested in porting websockets to these frameworks, but again, the barrier to experimentation was very high, as websockets was completely tied to asyncio.

Such extensions of websockets’ original scope would be easier with a Sans I/O core.

Furthermore, even within the original scope, Cory put his finger on several pain points I had experienced.

Pain point #1: testing is complicated and slow

I’m a strong believer in automated testing. Continuous integration enforces 100% branch coverage in websockets. This isn’t a silver bullet but it’s a good safety net.

To get there and to stay there, I spent an incredible amount of time fighting asyncio.

Since the protocol implementation is strongly coupled with asyncio, both in terms of network I/O and asynchronous control flow, everything needs to run in an event loop. Initially test cases called self.loop.run_until_complete(...) for every interaction with websockets, which was very verbose.

Following an excellent suggestion by Chris Jerdonek, we added magic to support writing tests as coroutines. Every project that uses coroutines heavily needs something like this. The asynctest library provides an off-the-shelf solution.

This made tests less verbose but they were still slow. In this test module, every test spins up an event loop, starts a server, starts a client, connects the client to the server, then actually does something and makes assertions on what happens. This is an especially slow and wasteful way to send a few bytes and see how the protocol implementation handles them.

Also, in some cases, the most pragmatic way to trigger events in a given order is to schedule them one, two or five milliseconds into the future. The delay ensures that websockets has finished processing all previous events. That makes the test suite significantly slower than it should be. Probably there’s a way to control time in the event loop and make the test suite faster but I’m wary of introducing such a difference between tests and reality.

Any test that involves an arbitrary delay is flaky; websockets’ tests are no exception there. I could coerce most tests into submission in my development environment. I multiplied all delays by 10 in continuous integration in order to be less vulnerable to unexpected lags. For a long time, one test remained flaky and I could never figure out why. I don’t remember if the problem went away or if I removed the test.

Besides, while I was hardening the library, I was getting bug reports where I could infer which events had happened in what order, but I had a very hard time reproducing that scenario in a test. In some cases, I ended up depending on asyncio implementation details to an unreasonable degree. For example, some tests run the event loop once or twice so that it processes some callbacks, but not other callbacks, and then make assertions on the state at that point. In other cases, I had to give up and flag the code path with # pragma: no cover.

Finally, due to the protocol implementation being intertwined with I/O, tests are so complicated that it’s very hard to be sure of what they do exactly. This transport mock should give you a feeling for the extent of the damage.

Long story short, while I was quite happy with the library, I was never happy with the test suite.

Pain point #2: writing yet another HTTP implementation

A WebSocket connection starts with a HTTP handshake, which requires at least a basic HTTP/1.1 implementation, so I had to write one for websockets. Actually, I wrote two.

The first one parsed the request line or status line then asked the standard library’s MIME implementation to parse the headers. However, it still contained logic to read the correct amount of data before passing it to the MIME parser.

The second one is written from scratch based on relevant RFCs. Fortunately, WebSocket only requires a small subset of HTTP, which made this a manageable endeavor. And I’m the kind of guy who enjoys writing a HTTP parser :-)

This minimal implementation seems to work well enough in practice and fits in 200 lines of code. Even if I could rely on a third-party implementation, I think I’d keep the built-in version to avoid a large dependency, or at least make the dependency optional.

Then I started looking at bootstrapping a WebSocket connection with a HTTP/2 handshake (RFC8441). HTTP/2 is a different beast than HTTP/1.1. There’s no way I’m writing a minimal implementation.

If websockets was designed according to the Sans I/O principles, then it should be easier to switch its HTTP implementation for another HTTP/1.1 implementation, perhaps one with better performance, or even for a HTTP/2 implementation.

I have only one source of comfort in this area. In his talk, Cory points out that everyone gets line continuation in HTTP wrong at least once. I bailed out before embarrassing myself ;-)

Pain point #3: composition doesn’t work

At some point, users of websockets started asking how to use it behind a proxy.

Supporting HTTP proxies doesn’t look too hard. I have a draft pull request that works. Once again, I’m re-implementing a protocol outside of websockets’ focus because I don’t have a reasonable way to integrate someone else’s work.

Supporting SOCKS proxies is a completely different story. There, I really need to build upon a third-party implementation of the SOCKS protocol.

In theory, the transport and protocol abstractions in asyncio are supposed to provide composability. One should be able to use a lower-level protocol as the transport for a higher level protocol.

In practice, this doesn’t seem to work very well. I don’t believe many libraries successfully take advantage of this possibility. asyncio doesn’t even implement TLS as a protocol on top of a TCP transport. Instead it provides a TLS transport called _SSLProtocolTransport. If composition worked, perhaps asyncio would use it internally?

With enough effort, probably there’s a solution, but I think it would be fragile. It would require reviewing asyncio’s implementation details to make sure the behavior is correct in all cases, especially edge cases. These details can change: for example, at some point, asyncio stopped closing the transport when it received EOF; I had to undo this change in websockets.

Sans I/O is tricky to implement

For all these reasons, I’ve been working intermittently on rewriting websockets as an I/O-free core, an asyncio integration layer, and other I/O integration layers, like trio, curio, or sockets & threads.

I made little progress for almost two years :-(

After reading the Sans I/O documentation and getting excited, I failed to realize how steep the learning curve was going to be.

Porting an existing project to Sans I/O is even harder than starting a new project with Sans I/O from scratch. Not only do you need to figure out how to write the I/O-free core, but you also keep wondering about how you’re going to reconnect all the I/O-related details that you spent years getting right.

Cory’s talk shows this API:

events = handle_data(in_bytes)
out_bytes = perform_action()

This is great marketing!

It fits in a slide, it makes the concept obvious, and I was sold instantly.

However, while porting websockets to Sans I/O², I realized this API was too simplified to be practical. Since I don’t need to fit in a slide, I’m going to share what I learnt. Hopefully I can flatten the learning curve for others.

Don’t get me wrong: despite the difficulties, I still believe that Sans I/O is a very good model. I’m just managing expectations :-)

So, if you’re considering adopting Sans I/O, here’s what you should be aware of.

Lesson #1: protocols have state

As far as I understand, the main parts of a Sans I/O library are:

a state machine that maintains the state of the connection;
a parser that receives incoming data, updates the state, and produces incoming events;
a serializer that receives outgoing events, updates the state, and produces outgoing data.

If you listened to Cory’s talk, he mentions “parsers and state machines” several times. He doesn’t mention “serializers”, presumably because they’re less challenging.

Object-oriented programming is a good way to encapsulate the state. To account for the state machine, the API becomes:

connection = Connection()  # manages the state of the connection
events = connection.handle_data(in_bytes)
out_bytes = connection.perform_action()
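
To make the three parts concrete, here is a minimal sketch. It assumes a toy line-based protocol rather than WebSocket: a HELLO line opens the connection and every other line is a message. The State enum, the event tuples and send_message() (a stand-in for perform_action()) are all hypothetical names chosen for illustration, and error handling is left out on purpose.

```python
from enum import Enum, auto


class State(Enum):
    CONNECTING = auto()
    OPEN = auto()


class Connection:
    """Toy line-based protocol (not WebSocket): the peer opens with a HELLO line."""

    def __init__(self):
        self.state = State.CONNECTING  # the state machine
        self.buffer = bytearray()      # incoming bytes that aren't parsed yet

    def handle_data(self, in_bytes):
        """Parser: consume bytes, update the state, return incoming events."""
        self.buffer += in_bytes
        events = []
        while True:
            line, sep, rest = self.buffer.partition(b"\n")
            if not sep:  # no complete line yet; wait for more data
                return events
            self.buffer = rest
            if self.state is State.CONNECTING and line == b"HELLO":
                self.state = State.OPEN
                events.append(("open",))
            else:
                events.append(("message", bytes(line)))

    def send_message(self, message):
        """Serializer: check the state, return outgoing bytes."""
        if self.state is not State.OPEN:
            raise RuntimeError("connection isn't open yet")
        return message + b"\n"
```

Note that nothing here imports a networking module: the caller decides where in_bytes comes from and where the returned bytes go.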

Lesson #2: protocols may handle automatic responses

Sometimes, incoming events require a specific response. I believe this shouldn’t be exposed to users of the library, because doing so would just create busy work.

For example, when you use the socket API, you don’t have to bother sending an ACK every time you receive a SYN. All TCP protocol implementations that I know of abstract this away.

Not all authors of Sans I/O libraries agree with me. Some libraries require their users to handle automatic responses explicitly. Usually, when an incoming event requires a response, they provide a helper to generate the corresponding outgoing event.

In the WebSocket protocol, when an endpoint receives a ping frame, it must answer with a pong frame. I want to handle this automatically, so users don’t have to.

To support this possibility, the function that handles incoming data must be able to return outgoing data:

connection = Connection()
# handle_data now returns out_bytes in addition to events
events, out_bytes = connection.handle_data(in_bytes)
out_bytes = connection.perform_action()
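
As an illustration, here is a sketch of a parser that answers pings by itself. It assumes a made-up line-based protocol where a "PING x" line must be answered with a "PONG x" line; real WebSocket frames are binary and look nothing like this.

```python
class Connection:
    """Toy protocol: "PING x" lines are answered automatically with "PONG x"."""

    def __init__(self):
        self.buffer = bytearray()

    def handle_data(self, in_bytes):
        """Return (events, out_bytes); pings never show up in events."""
        self.buffer += in_bytes
        events = []
        out_bytes = bytearray()
        while True:
            line, sep, rest = self.buffer.partition(b"\n")
            if not sep:  # incomplete line: keep it buffered
                return events, bytes(out_bytes)
            self.buffer = rest
            if line.startswith(b"PING "):
                # Automatic response, invisible to the user of the library.
                out_bytes += b"PONG " + line[5:] + b"\n"
            else:
                events.append(bytes(line))
```

The user only has to remember to send out_bytes; deciding what goes in it stays inside the library.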

Lesson #3: protocols need to handle errors

The above is fine until you consider what happens when receiving incoming data or performing actions that infringe the protocol.

For perform_action, there’s an easy answer: raise an exception and leave the connection state unchanged.

For handle_data, I believe the correct answer is also to raise an exception. However, this requires rethinking the API entirely!

Assuming a stream-oriented protocol running over TCP — rather than a datagram-oriented protocol running over UDP — consider these two scenarios:

# SCENARIO 1
connection = Connection()
events, out_bytes = connection.handle_data(good_bytes)
# send out_bytes and do something with events
events, out_bytes = connection.handle_data(bad_bytes)
# exception raised!

# SCENARIO 2
connection = Connection()
events, out_bytes = connection.handle_data(good_bytes + bad_bytes)
# exception raised!

They should give the same result, but they don’t. The first one produces some events. The second one doesn’t.

There are several ways to resolve this:

While processing incoming data, store incoming events and outgoing bytes in the connection state, where they can be fetched later, even if processing fails in the meantime. handle_data() returns nothing under normal circumstances and raises an exception on invalid inputs.
If processing fails, store the exception in the connection state, where it can be fetched later. handle_data() never raises an exception. This is the exact opposite of option 1.
Send exceptions as events in the stream of incoming events instead of raising them. This puts everything into the return value.

Option 1 is my favorite because it’s the only one that makes natural use of exceptions. Of the three options, it’s the most likely to be implemented correctly.

So the API becomes:

connection = Connection()

# handle_data raises an exception on invalid inputs
connection.handle_data(in_bytes)
events = connection.events_received()
out_bytes = connection.bytes_to_send()

# perform_action raises an exception on invalid actions
out_bytes = connection.perform_action()

The worst implementation issue that could reasonably happen is fetching the incoming events but forgetting to fetch and send the outgoing data. If that’s a concern, it could be fixed by providing a single function to fetch events and outgoing bytes simultaneously.
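
Here is a sketch of option 1 that replays the two scenarios from above. It assumes a toy line-based protocol ("MSG x" produces an event, "PING x" triggers an automatic "PONG x", anything else is invalid). ProtocolError and the run() helper are hypothetical names used only for this demonstration.

```python
class ProtocolError(Exception):
    """Raised when incoming data infringes the (toy) protocol."""


class Connection:
    def __init__(self):
        self.buffer = bytearray()
        self.events = []        # incoming events, waiting to be fetched
        self.out = bytearray()  # outgoing bytes, waiting to be fetched

    def handle_data(self, in_bytes):
        """Return nothing; raise ProtocolError on invalid input."""
        self.buffer += in_bytes
        while True:
            line, sep, rest = self.buffer.partition(b"\n")
            if not sep:
                return
            self.buffer = rest
            if line.startswith(b"MSG "):
                self.events.append(bytes(line[4:]))
            elif line.startswith(b"PING "):
                self.out += b"PONG " + line[5:] + b"\n"
            else:
                # Whatever was stored before this point remains available.
                raise ProtocolError(f"invalid line: {bytes(line)!r}")

    def events_received(self):
        events, self.events = self.events, []
        return events

    def bytes_to_send(self):
        out, self.out = bytes(self.out), bytearray()
        return out


def run(chunks):
    """Feed chunks to a fresh connection; report what survives an error."""
    connection = Connection()
    error = None
    try:
        for chunk in chunks:
            connection.handle_data(chunk)
    except ProtocolError as exc:
        error = exc
    return connection.events_received(), connection.bytes_to_send(), error


good_bytes, bad_bytes = b"MSG hi\nPING 1\n", b"oops\n"
scenario_1 = run([good_bytes, bad_bytes])
scenario_2 = run([good_bytes + bad_bytes])
```

Both scenarios now yield the same events and the same outgoing bytes, however the data was chunked. A real implementation would probably also refuse further data after an error; this sketch doesn't.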

Lesson #4: protocols care about I/O

Cory mentions this difficulty in his talk. He gives flow control as an example. He says that I/O-free libraries should provide levers for these needs, which is a valid answer.

I hit this question long before I started caring about flow control. In fact, it appears as soon as you start thinking about connection termination.

Within the constraints of Sans I/O, all we can do is signal the end of the data stream. The I/O integration layer will have to deal with the rest.

For incoming data, the easiest is to add a connection.handle_eof() method. Then the parser knows that it won’t receive more data. The connection can update its state, likely by initiating the appropriate closing procedure.

For outgoing data, I’m using the empty bytestring b"" as a sentinel that signals that the connection must be closed. This required changing the return value of bytes_to_send() to a list of bytestrings.

The API now looks like:

connection = Connection()

connection.handle_data(in_bytes)
connection.handle_eof()

events = connection.events_received()
out_bytes_or_eof = connection.bytes_to_send()  # returns List[bytes]

out_bytes_or_eof = connection.perform_action()  # returns List[bytes]
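
For the other side of the fence, here is a sketch of what a blocking "sockets & threads" integration layer could do with this API. The Connection class is a hypothetical echo protocol, just enough to exercise the loop, and serve() is an illustrative name, not part of websockets.

```python
import socket


class Connection:
    """Hypothetical I/O-free echo protocol that closes when the peer does."""

    def __init__(self):
        self.out = []

    def handle_data(self, in_bytes):
        self.out.append(in_bytes)  # echo everything back

    def handle_eof(self):
        self.out.append(b"")       # sentinel: the connection must be closed

    def bytes_to_send(self):
        out, self.out = self.out, []
        return out


def serve(sock, connection):
    """Integration layer: all the I/O for one connection happens here."""
    while True:
        data = sock.recv(4096)
        if data:
            connection.handle_data(data)
        else:  # recv() returns b"" when the peer closed its side
            connection.handle_eof()
        for chunk in connection.bytes_to_send():
            if chunk == b"":
                sock.shutdown(socket.SHUT_WR)  # signal EOF to the peer
                return
            sock.sendall(chunk)
        if not data:
            return  # nothing more will arrive
```

The protocol only says "close now"; how to shut down the socket, and what to do if that fails, is entirely up to the integration layer.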

Sans I/O is low level

Once you’ve figured out the structure of the API, you’re still left with a low level starting point, due to the constraints of Sans I/O. Fortunately there are reasonable solutions to get back to a more comfortable level for writing the actual implementation.

Difficulty #1: there’s no help with network I/O

This is part of the definition of Sans I/O.

In my case, the WebSocket protocol contains:

a handshake based on the HTTP protocol, which requires reading data until a delimiter — namely, the LF character — to extract the request or status line and then each request or response header;
a data transfer phase with a TLV framing protocol, which requires repeatedly reading a predefined number of bytes.

These are two of the main ways to read data in a stream-oriented protocol, the third being reading until EOF.

If you haven’t received enough incoming data, you need to wait until you get more data. The standard solution is to use a buffer to accumulate incoming data until there’s enough to perform the read operation.

asyncio provides the StreamReader class for this purpose. In an I/O-free library, you need to bring your own stream reader.

There are two ways to implement such a stream reader:

with a bytearray: extend it when receiving data, slice it when reading data;
with a Deque[bytes]: append to it when receiving data, popleft when reading data, slice and appendleft the remaining data if the read doesn’t fall on a boundary.

I chose option 1 because it’s easier to implement and because asyncio uses it. I believe option 2 may do one less memory copy, which could be faster, especially for large messages. Perhaps I’ll benchmark it.
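
A bytearray-based stream reader covering the three kinds of reads could look like the sketch below. The method names are assumptions loosely modeled on asyncio.StreamReader, and this version simply returns None when there isn’t enough data yet, so the caller has to retry later.

```python
class StreamReader:
    """Buffer incoming data in a bytearray; reads return None until satisfiable."""

    def __init__(self):
        self.buffer = bytearray()
        self.eof = False

    def feed_data(self, data):
        self.buffer += data  # extend when receiving data

    def feed_eof(self):
        self.eof = True

    def read_line(self):
        """Read until the LF character, included."""
        n = self.buffer.find(b"\n") + 1
        if n == 0:  # find() returned -1: no LF yet
            return None
        return self._consume(n)

    def read_exact(self, n):
        """Read exactly n bytes."""
        if len(self.buffer) < n:
            return None
        return self._consume(n)

    def read_to_eof(self):
        """Read everything, once the end of the stream is known."""
        if not self.eof:
            return None
        return self._consume(len(self.buffer))

    def _consume(self, n):
        data = bytes(self.buffer[:n])  # slice when reading data
        del self.buffer[:n]
        return data
```

Deleting from the front of a bytearray avoids rebuilding the buffer by hand, but the slice still copies the data once.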

Difficulty #2: there’s no help with concurrency

This is the other part of the definition of Sans I/O.

When you send a ping frame with websockets, it returns an asyncio.Future that completes when receiving the corresponding pong frame. You can’t do that with Sans I/O because you aren’t allowed to use control flow primitives from asyncio.

Perhaps relying on concurrent.futures.Future would be acceptable here. Otherwise, the closest you can do is adding a callback argument to the function that sends a ping frame, then calling it when receiving the pong. Dropping back from coroutines to callbacks is painful. The alternative is to let the I/O integration layer take care of this, at the cost of duplicating the logic.
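
To illustrate the callback approach, here is a sketch assuming a toy line-based protocol with "PING x" and "PONG x" lines. send_ping() and pending_pings are hypothetical names. Since a callback is just a callable, the set_result method of a concurrent.futures.Future can be passed as one, which shows how both ideas could coexist.

```python
from concurrent.futures import Future


class Connection:
    """Toy protocol: call back when the pong matching a ping arrives."""

    def __init__(self):
        self.buffer = bytearray()
        self.out = bytearray()
        self.pending_pings = {}  # ping payload -> callback

    def send_ping(self, payload, callback):
        self.pending_pings[payload] = callback
        self.out += b"PING " + payload + b"\n"

    def handle_data(self, in_bytes):
        self.buffer += in_bytes
        while True:
            line, sep, rest = self.buffer.partition(b"\n")
            if not sep:
                return
            self.buffer = rest
            if line.startswith(b"PONG "):
                payload = bytes(line[5:])
                callback = self.pending_pings.pop(payload, None)
                if callback is not None:  # ignore unsolicited pongs
                    callback(payload)


connection = Connection()
seen = []
connection.send_ping(b"a", seen.append)          # plain callback
pong_b = Future()
connection.send_ping(b"b", pong_b.set_result)    # future used as a callback
connection.handle_data(b"PONG b\n")
```

An asyncio integration layer could pass the set_result method of a future created by its event loop instead, so the user-facing API can still return something awaitable.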

There are other requirements, such as timeouts, which cannot be handled at all in a Sans I/O world. These have to be pushed to the I/O integration layer.

That was the bad news. Now, the good news is that you can still write the entire protocol implementation with coroutines! Well, strictly speaking, they won’t be what Python has called coroutines since 3.5 (native coroutines aren’t allowed in Sans I/O) but their predecessors: generator-based coroutines.

The pendulum swings. The first version of websockets dates back to Python 3.3. I wrote it with yield from because generator-based coroutines were the only kind of coroutine available before Python 3.5. Then I converted it to async / await and native coroutines when I dropped support for Python < 3.6. Now I’m converting it back to yield from and generator-based coroutines!

Generator-based coroutines are a bit tricky to work with. Libraries like ohneio may help there.

The alternative to generator-based coroutines is to try parsing a message, give up if there isn’t enough data, retry when more data has arrived, give up again if there still isn’t enough data, etc. which is inefficient for large messages.

As far as I can tell, this is how wsproto works, and that’s one of the reasons why websockets isn’t built on top of wsproto. (EDIT — Nathaniel points out that wsproto only parses headers this way. Since a WebSocket header takes at most 14 bytes, in practice, the first TCP packet will always contain the whole header. Therefore, performance is still good.)

There’s a more basic reason: websockets predates wsproto by several years!

Difficulty #3: protocols care about I/O (bis)

Even after handling EOF correctly as discussed above, some protocol-level concerns still fall under the responsibility of the I/O integration layer. As a consequence, their implementation will be duplicated in each I/O integration layer, for each I/O model.

For example, when establishing a WebSocket connection, websockets follows HTTP redirects. This may require opening a connection to a different host or a different port, which isn’t possible in an I/O-free library. As a consequence, the responsibility of checking if a response is a HTTP redirect, perhaps opening a new connection, and trying a new WebSocket handshake falls on the I/O integration layer.

This restricts the Sans I/O approach to the lower level parts, primarily the parsing of incoming data. You can only escape I/O up to a certain point. I found that point to be lower than I initially hoped for.
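
The redirect example can be sketched as follows. Only the decision "is this a redirect, and to where?" can live in the I/O-free part; the retry loop has to live in each integration layer. Every name here is hypothetical, and handshake stands for whatever the integration layer does to open a connection to a URI and return the status code and headers of the response.

```python
from urllib.parse import urljoin

REDIRECT_STATUSES = {301, 302, 303, 307, 308}
MAX_REDIRECTS = 10  # arbitrary limit for this sketch


def redirect_target(uri, status, headers):
    """I/O-free part: return the URI to retry, or None if not a redirect."""
    if status in REDIRECT_STATUSES and "Location" in headers:
        return urljoin(uri, headers["Location"])
    return None


def connect(uri, handshake):
    """I/O integration layer: open connections until one isn't redirected."""
    for _ in range(MAX_REDIRECTS):
        status, headers = handshake(uri)  # opens a new connection each time
        target = redirect_target(uri, status, headers)
        if target is None:
            return uri, status
        uri = target
    raise RuntimeError("too many redirects")
```

The loop is short, but it would have to be written once per I/O model, since handshake is a blocking call here and would be awaited in an asyncio or trio integration.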

Takeaways

Even though I’m still in the middle of converting websockets to Sans I/O, I’m getting a reasonable idea of what the end result will look like and how I can get there.

I’m already convinced that Sans I/O will be a huge improvement for testing and reusability, in addition to extending use cases for websockets.

While developing a deeper understanding of stream readers, I also developed a better appreciation of what asyncio makes us take for granted.

Drawing the line between the I/O-free library and the I/O integration layer will likely be critical to the success of the Sans I/O refactoring.

Let’s see how the implementation goes. Perhaps I’ll learn enough to write another blog post. I’ll keep you posted.

Thanks for reading!

Many thanks to Cory Benfield, David Beazley, and Nathaniel J. Smith: not only does this post build upon their teaching, but they were also kind enough to review it.

¹ I wrote the first version of websockets while Guido van Rossum was writing tulip, which was later renamed to asyncio when it was merged into the standard library.
² This is still a work in progress at the time I’m writing this.

Fractal Ideas