Last year I started work on a new async networking library for Python. I’d never written any ‘async’ code before and wasn’t sure how hard it would be. As it turns out, Python’s async library is quite good, but in my opinion there’s still room for improvement. In this post I’m going to talk about async networking in Python: the features I like and the ones I don’t. Then I’ll talk about the library I created to try to improve networking in Python.
I started my async journey by looking through the examples in the online Python docs. These docs are great and remind me of the developer-friendly PHP docs. Eventually I found an example for a TCP server.
import asyncio
class ProtocolClass(asyncio.Protocol):
    # ... callback methods here
    # e.g. data_received(self, data) ...
    pass
# Inside a coroutine, with loop = asyncio.get_running_loop().
# (There is a different function to call for UDP servers that does the same thing.)
server = await loop.create_server(
    # Factory for making Protocol objects for new clients.
    lambda: ProtocolClass(),
    # Details for the listen socket ...
    '127.0.0.1',
    8888
)
What this code means is: every time a new client connects, a Protocol object is created for them. The ‘factory’ lambda function is what creates it. The class also has special callback methods that are run for various events. It’s pretty simple. Unfortunately, the moment you try to use async functions with it you’re in for a world of pain.
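To make the callback model concrete, here is a minimal sketch of a complete Protocol-based echo server using only the standard library. The names `EchoProtocol` and `echo_once` are made up for this illustration, and binding to port 0 (so the OS picks a free port) is just a convenience for the demo.

```python
import asyncio


class EchoProtocol(asyncio.Protocol):
    """Echoes every chunk of bytes back to the client that sent it."""

    def connection_made(self, transport):
        # Called once per client; keep the transport so we can reply.
        self.transport = transport

    def data_received(self, data):
        # Called whenever bytes arrive. This is a plain function,
        # so it cannot await anything directly.
        self.transport.write(data)


async def echo_once(message):
    loop = asyncio.get_running_loop()
    # Port 0 asks the OS for any free port (an assumption for this demo).
    server = await loop.create_server(EchoProtocol, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    # Talk to the server with the stream API to check that it echoes.
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(message)
    await writer.drain()
    reply = await reader.readexactly(len(message))

    writer.close()
    await writer.wait_closed()
    server.close()
    return reply
```

Calling `asyncio.run(echo_once(b"hello"))` sends the bytes through the server and returns what came back. Note that `data_received` is an ordinary method, which is why mixing it with coroutines takes extra plumbing.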
As I learned: Python does indeed have async networking features. It turns out you can use asyncio.open_connection, which gives you a stream reader and writer. So if you want to write ‘await connection() … await send … await recv’ for TCP – this code is for you.
# Open a new TCP connection.
reader, writer = await asyncio.open_connection('127.0.0.1', 8888)
# Send some data down the socket.
writer.write(b"message")
await writer.drain()
# Read some data back.
data = await reader.read(100)
# Close the socket.
writer.close()
await writer.wait_closed()
# You may be wondering what the 'async' version for UDP looks like?
# ... and the answer is: there isn't one.
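For UDP, what asyncio does offer is the callback-based `loop.create_datagram_endpoint`. The sketch below shows one way to get an awaitable receive on top of it by pushing datagrams onto an `asyncio.Queue`. The class and function names are hypothetical, the echo server exists only so the demo is self-contained, and this is not a description of how any particular library does it.

```python
import asyncio


class QueueDatagramProtocol(asyncio.DatagramProtocol):
    """Hypothetical helper: pushes each datagram onto a queue."""

    def __init__(self):
        self.inbox = asyncio.Queue()

    def datagram_received(self, data, addr):
        # Callback from the event loop; hand the datagram to awaiting code.
        self.inbox.put_nowait((data, addr))


class EchoDatagramProtocol(asyncio.DatagramProtocol):
    """Local UDP echo server used only to exercise the client."""

    def connection_made(self, transport):
        self.transport = transport

    def datagram_received(self, data, addr):
        self.transport.sendto(data, addr)


async def udp_round_trip(message, timeout=2.0):
    loop = asyncio.get_running_loop()
    # Echo server on an OS-assigned port (port 0).
    server_transport, _ = await loop.create_datagram_endpoint(
        EchoDatagramProtocol, local_addr=("127.0.0.1", 0)
    )
    server_addr = server_transport.get_extra_info("sockname")

    client_transport, client = await loop.create_datagram_endpoint(
        QueueDatagramProtocol, remote_addr=server_addr
    )
    try:
        client_transport.sendto(message)
        # UDP is unreliable, so the awaitable receive needs a timeout.
        data, _addr = await asyncio.wait_for(client.inbox.get(), timeout)
        return data
    finally:
        client_transport.close()
        server_transport.close()
```

The timeout matters: a lost datagram would otherwise leave the caller waiting forever.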
The main benefit of async/await is that the program keeps its sequential control flow while ‘blocking’ operations no longer stop the whole program. Instead, if a task needs to wait for a result, execution is handed back to the event loop, and other tasks are free to run or be resumed when they’re ready. One disadvantage of coroutines is that you have to check for changes manually. E.g. if there’s new data to be read from a stream, it’s your job to await it.
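A small sketch of that hand-off, with `asyncio.sleep` standing in for a slow network call. The `worker` and `run_workers` names are invented for the example.

```python
import asyncio


async def worker(name, delay, log):
    log.append(f"{name} start")
    # Awaiting hands control back to the event loop until the delay is over.
    await asyncio.sleep(delay)
    log.append(f"{name} done")


async def run_workers():
    log = []
    # Both workers run concurrently on the one event loop.
    await asyncio.gather(
        worker("slow", 0.2, log),
        worker("fast", 0.05, log),
    )
    return log
```

Each worker reads top to bottom like ordinary sequential code, yet the fast one finishes while the slow one is still waiting.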
From these examples – a myriad of issues stands out to me:
My idea to solve these problems is to encapsulate everything in a well-designed, consistent API. The API should allow endpoints to be used with both async functions and Protocol-style callbacks; it should not have any transport-specific code (so the same code will work with TCP and UDP); it should support network interfaces well; and IPv6 should be as easy to use as IPv4.
Here’s what I came up with.
The first problem I wanted to solve was the lack of network interface support. In network programming it’s common to see code that glosses over interface management. What makes this so appealing is the operating system supports default interfaces. So why bother choosing one? The problem is: if you write code that only uses the default interface then your program won’t be able to utilise all routes – possibly an issue for some software.
from p2pd import *
async def main():
    # Select interface and choose a route to bind to.
    # No interface name = default interface.
    i = await Interface()
    route = i.route()
    # You can also load the NAT details.
    await i.load_nat()
    print(i) # All addresses and NAT behaviour
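For comparison, the closest plain-asyncio equivalent I know of is binding the outgoing socket to one of an interface's addresses via the `local_addr` argument. The sketch below assumes you already know the address you want (here the loopback address, so the demo runs anywhere). `connect_from` and `bind_demo` are hypothetical names.

```python
import asyncio


async def connect_from(local_ip, host, port):
    # local_addr binds the outgoing socket to a chosen local address;
    # port 0 lets the OS pick the source port.
    return await asyncio.open_connection(
        host, port, local_addr=(local_ip, 0)
    )


async def bind_demo():
    async def on_client(reader, writer):
        # Wait for the client to hang up, then close our side.
        await reader.read()
        writer.close()

    server = await asyncio.start_server(on_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    reader, writer = await connect_from("127.0.0.1", "127.0.0.1", port)
    # The local end of the socket shows which address it was bound to.
    source_ip = writer.get_extra_info("sockname")[0]

    writer.close()
    await writer.wait_closed()
    server.close()
    return source_ip
```

This only covers the binding step. Discovering interfaces, their addresses, and their NAT behaviour is left entirely to the caller.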
Previously, I spoke about how the asyncio module only provides async functions for TCP. I thought this was a major limitation. So I added support for async UDP. Obviously UDP isn’t reliable so sometimes recv calls time out. But importantly – doing I/O isn’t going to block the program so the event loop will be free to work on other tasks.
# Open UDP endpoint.
await route.bind() # port=0 ...
dest = await Address("p2pd.net", 7, route)
pipe = await pipe_open(UDP, dest, route)
pipe.subscribe()
# Async networking.
await pipe.send(b"echo back this message.", dest.tup)
out = await pipe.recv(timeout=2)
print(out)
# Cleanup the endpoint.
await pipe.close()
# More info on the basics here:
# https://p2pd.readthedocs.io/en/latest/python/basics.html
A key goal I had for this library was to provide the same APIs for most use-cases, no matter if the transport is TCP or UDP, the address family is IPv4 or IPv6, or the endpoint is a server or a client. The main abstraction I use is called a ‘pipe.’ Pipes allow developers to choose what programming model to use. They support async coroutines and event-based callbacks. Callbacks can be either coroutines or regular functions.
async def msg_cb(msg, client_tup, pipe):
    await pipe.send(msg, client_tup)
# Adds a message handler before pipe creation.
# Can use callbacks or awaits.
pipe = await pipe_open(TCP, route=route, msg_cb=msg_cb)
# Alternatively you can use add_msg_cb.
pipe.add_msg_cb(msg_cb)
Some servers need to support multiple protocols and address families. For this there is the Daemon class. There’s not much to it. It simply handles creating pipes for you.
class EchoServer(Daemon):
    def __init__(self):
        super().__init__()
    async def msg_cb(self, msg, client_tup, pipe):
        await pipe.send(msg, client_tup)
async def main(route):
    await route.bind(port=12345)
    server = await EchoServer().listen_all(
        [route],
        [12345, 8080],
        [TCP, UDP],
        af=AF_ANY
    )
You’ll notice that the only way to build servers in Python is with callbacks. I think this is usually a good model. But what if you want to write an async version? That is – what if you want to await accepting a new client? Well, now you can. Simply await the pipe and it will return a regular pipe for the next client that connects to the server.
client = await pipe
await client.send(b"hello")
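The general idea of awaiting the next client can be sketched with the standard library alone: the connection callback only queues each new client, and an `accept()` coroutine pulls from that queue. `AwaitableServer` and `accept_demo` are hypothetical names, and this is an illustration of the pattern rather than P2PD's implementation.

```python
import asyncio


class AwaitableServer:
    """Hypothetical wrapper: lets callers await the next client."""

    def __init__(self):
        self._clients = asyncio.Queue()
        self._server = None

    async def start(self, host="127.0.0.1", port=0):
        self._server = await asyncio.start_server(self._on_client, host, port)
        # Return the port actually bound (useful when port=0).
        return self._server.sockets[0].getsockname()[1]

    async def _on_client(self, reader, writer):
        # asyncio still uses a callback here; we only queue the streams.
        await self._clients.put((reader, writer))

    async def accept(self):
        return await self._clients.get()

    def close(self):
        self._server.close()


async def accept_demo():
    server = AwaitableServer()
    port = await server.start()

    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    # Await the next client instead of handling it inside a callback.
    client_reader, client_writer = await server.accept()
    client_writer.write(b"hello")
    await client_writer.drain()
    greeting = await reader.readexactly(5)

    for w in (writer, client_writer):
        w.close()
        await w.wait_closed()
    server.close()
    return greeting
```

The queue decouples when a client arrives from when the program is ready to deal with it, which is what makes the sequential style possible.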
So far, my examples have been ‘push’ and ‘pull.’ However, sometimes it’s useful to be able to subscribe to certain messages. Consider UDP for a moment. In UDP, messages may arrive in any order. Therefore, it’s common to see protocols using unique IDs in response messages that mirror the IDs used in requests. Ideally, then, it should be possible to subscribe to certain patterns and await the results. That’s how async recv works in P2PD.
You subscribe using a regex pattern and await a response. If you look at the async examples earlier you may see I called subscribe(). With no arguments that means ‘subscribe to all messages.’ Any message matching a pattern will be added to that pattern’s own queue. You can then await that queue using the recv() call. It’s very flexible.
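Here is a rough sketch of the pattern-per-queue idea in plain Python. The `SubscriptionRouter` class, its method names, and the `id=7` message format are all made up for this example. P2PD's real API is described in its documentation.

```python
import asyncio
import re


class SubscriptionRouter:
    """Hypothetical sketch: one queue per subscribed regex pattern."""

    def __init__(self):
        self._subs = {}  # raw pattern -> (compiled regex, queue)

    def subscribe(self, pattern=b".*"):
        # The default pattern matches every message.
        if pattern not in self._subs:
            self._subs[pattern] = (
                re.compile(pattern, re.DOTALL),
                asyncio.Queue(),
            )

    def dispatch(self, msg):
        # Called for every incoming message, e.g. from a protocol callback.
        for regex, queue in self._subs.values():
            if regex.search(msg):
                queue.put_nowait(msg)

    async def recv(self, pattern=b".*", timeout=2.0):
        _regex, queue = self._subs[pattern]
        return await asyncio.wait_for(queue.get(), timeout)


async def subscription_demo():
    router = SubscriptionRouter()
    router.subscribe(b"^id=7 ")  # only replies to request 7
    router.subscribe()           # everything

    # Replies arrive out of order, as they might over UDP.
    for msg in (b"id=9 other reply", b"id=7 our reply"):
        router.dispatch(msg)

    matched = await router.recv(b"^id=7 ")
    first_any = await router.recv()
    return matched, first_any
```

The caller waiting on `id=7` gets its reply even though a different message arrived first, while the catch-all queue still sees messages in arrival order.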
More info on that here: https://p2pd.readthedocs.io/en/latest/python/queues.html
Having a library that works well is great and I’ve already used it to build many programs. But the reason I started this project was to make peer-to-peer connections easier.
Some of the coolest software today seems to use peer-to-peer networking. Bitcoin, BitTorrent, Skype, and any number of games all use peer-to-peer features. These services are powerful because they let their users be part of running them rather than relying on a trusted third party. The downside is they’re more complex. Routers, NATs, firewalls, and dynamic IPs all contribute to making the process difficult.
To do P2P networking right involves a mishmash of esoteric ideas.
In P2PD there are nodes that run their own TCP servers and implement one or more protocol handlers. These are the msg_cb functions listed earlier. Nodes have their own address that can be given out to connect to them. The address includes a lot of information, like what interfaces the node has, its NAT configurations, information on signalling servers, and so on. The Node object has a connect function to handle making P2P connections.
More details on peer-to-peer networking here: https://p2pd.readthedocs.io/en/latest/python/index.html
Libp2p is currently the most popular library for peer-to-peer networking. There are implementations of Libp2p in many languages and the Go version appears to be the most complete. A question I see arising is ‘how does P2PD compare to Libp2p?’ I won’t write a full essay here but here are the cliff notes.
P2PD is written in Python and targets Python version 3.6 or higher (3.5 and higher on non-Windows platforms). It supports most platforms. But what about other languages? Is it possible to use P2PD from languages that aren’t Python? What I came up with was a special REST server for doing P2P networking. The server lets you look up information on your interfaces, make peer-to-peer connections, push/pull/pub/sub, and more.
More details on that here: https://p2pd.readthedocs.io/en/latest/rest_api.html
I think it would be possible to use APE’s build of Python to have P2PD packaged for any device. You would then only have to execute a file and the library would do the rest. For software that needs more control over sockets I think it would be possible to share sockets with another process.
If you made it this far then thanks for reading! If you liked this post you can check out P2PD here:
I’m also looking for my next software engineering role. So if you need someone who ships, hit me up.