Sign Up
Hero

Protocol Types in Python 3.8

A quick introduction to the new Protocol class in Python 3.8 and how it enables structural typing

Introduction

In the previous installment, we have seen how to type annotations enable gradual typing in Python and some of the common typing patterns that Python developers use when writing code.

In this article, we're going to take a look at the new Protocol classes introduced in Python 3.8 and how it enables, even in typed contexts, structural typing, and other idiomatic patterns in Python.

What Is A Protocol

Protocol is a very generic word, both in regular language as well as in Computer Science. Most of us are probably familiar with it from hearing TCP Protocol, UDP Protocol or also HTTP Protocol. Dictionaries have also dedicated definitions for it:

A set of rules governing the exchange or the transmission of data between devices

Which indeed makes sense. All the examples listed above are communication protocols between two remote devices with a set of rules that are governing the transmission. In the case of TCP, for instance, the protocol mandates the shape of the message, the possible operations, the error policies as well as the rules for possible retransmission of a message.

On the other hand, it might be kind of weird for people to talk about protocols in a programming language. After all, there is no communication involved in a program, so what is the meaning of it?

We can answer the question by taking a look at another more generic definition that is not coupled with computer science:

The accepted or established code of procedure or behavior in any group, organization, or situation. (New Oxford Dictionary)

Let's notice the difference with the previous definition we found above:

  • No devices involved.
  • No data transmission involved.
  • No data involved, per se.

In fact, the nature of the protocol we're going to be talking about today. By removing some of the constraints of the previous definition, it is possible to reason about protocols in the context of a programming language.

Protocols are indeed not a new idea and have been around for a long time. For instance, Clojure supports protocols explicitly.

(defprotocol Shape
  (area [_])
  (perimeter [_]))
  
(defrecord Square [l])

(defrecord Circle [r]
  Shape
  (area [_] (* Math/PI (Math/pow r 2)))
  (perimeter [_] (* r Math/PI)))

(defrecord Rectangle [w h]
  Shape
  (area [_] (* w h))
  (perimeter [_] (* 2 (+ w h))))

This code is basically creating a set of functions grouped in a Protocol called Shape; every type that wants to adhere to such Protocol has to implement such methods.

Successively, we create two new types — Circle and Rectangle where we implement the area and perimeter methods that are defined in the protocol. Because of this, we can call any of the protocol methods on any of the type instance implementing it:

(def c (->Circle 10))
(area c)
; 314.1592653589793

(def r (->Rectangle 10 12))
(area r)
; 120

(def s (->Square 10))
(area s)
; Execution error (IllegalArgumentException) at user/eval147$fn$G (REPL:1).
; No implementation of method: :area of protocol: #'user/Shape found for class: user.Square

We can see that the methods are called correctly in the first two instances, but it fails on the Square object because it does not implement the Shape Protocol. Also, note that no class or inheritance is involved in this.

It is also possible to implement a protocol on a type that is already existing, even on types we do not own:

(extend-protocol Shape
  Square
  (area [s] (Math/pow (:l s) 2))
  (perimeter [s] (* 4 (:l s))))
  
(extend-protocol Shape
  java.lang.String
  (area [_] 1)
  (perimeter [_] 10))
  
(def s ->Square 10)
  
(area s)
; 100.0

(area "hello world")
; 1

In this example, we have indeed implemented the same protocol on a type that has been precedently created (Square) and then even implemented the protocol on the built-in String type. This allows us to obtain what is called "polymorphism a la carte", where classes and inheritance are not involved/required. The original type is not even aware that a protocol implementation is being attached to it.

The rationale behind the Protocols is well explained in Simple made easy or in the documentation page

Protocols In Python

Protocols in Python work a little bit differently. While the concept is the same as Clojure, Python does not need explicit protocol declaration. If a type has the methods specified in the protocol, then it implements the protocol.

For instance, the len function requires a Sized type as its argument, which is every type that implements the __len__ function:

class Team:
    def __init__(self, members):
        self.__members = members

justice_league_fav = Team(["batman", "wonder woman", "flash"])

print(len(justice_league_fav)) # TypeError: object of type 'Team' has no len()

The Team class does not implement the __len__ method, and the runtime is throwing an error. We can change the class to implement such method:

class SizedTeam:
    def __init__(self, members):
        self.__members = members

    def __len__(self):
        return len(self.__members)

justice_league_fav = SizedTeam(["batman", "wonder woman", "flash"])

print(len(justice_league_fav)) # 3

In the same way, we have seen in Clojure, no inheritance was involved, and the runtime has been able to execute the code.

Python 3 defines a number of protocols that can be implemented by our own types to be used around. The most common ones are:

  • Sized: any type implementing the __len__ method
  • Iterable: any type implementing the __iter__ method
  • Iterator: any type implementing the __iter__ and __next__ methods

The complete list, though, is available on the documentation page

Protocol Types In Python

The problem comes when we try to apply the same Protocol concept in a typed context. Suppose, for instance; we have a custom class defining a protocol describing an open/close method for IO operations:

import io

class IOResource:
    def __init__(self, uri: str):
        pass

    def open(self) -> int:
        pass

    def close(self) -> None:
        pass


class FileResource:
    def __init__(self, uri: str):
        self.uri = uri

    def open(self):
        self.file = io.FileIO(self.uri)
        return self.file.fileno()

    def close(self):
        self.file.close()


def write_resource_to_disk(r: IOResource):
    pass


write_resource_to_disk(FileResource("file.txt"))

Try to paste this code in a new file called IOResource.py and then try to run mypy on it:

mypy IOResource.py

IOResource.py:11: error: Argument 1 to "write_resource_to_disk" has incompatible type "FileResource"; expected "IOResource"

You can see we received a type error claiming (rightfully) that FileResource is not an IOResource subclass, even though our class is implemented all the required methods.

The same also happens to the builtin collection types, in case we're using Python 3.7 and running an older version of mypy (0.521, to be specific) on the following code:

from typing import Sized

def multiply_len(val: Sized) -> int:
    return 2 * len(val)

class SizedTeam:
    def __init__(self, members):
        self.__members = members

    def __len__(self):
        return len(self.__members)

multiply_len(SizedTeam(["batman", "wonder woman", "flash"]))

# team.py:13: error: Argument 1 to "multiply_len" has incompatible type "SizedTeam"; expected "Sized"

Basically, the type system is not smart enough to recognize the implicit protocol through structural typing, leaving a bunch of idiomatic Python constructs out of the game when working in typed contexts.

Fortunately speaking, the issue has been fixed in Python 3.8 (or with any mypy >= 0.521) with the introduction of Protocol classes. If we take the collection example from above and try it on Python 3.8 — it will indeed work.

We can also fix now the IOResource example to make it work in Python 3.8:

import io
from typing import Protocol

class IOResource(Protocol):
    def __init__(self, uri: str):
        pass

    def open(self) -> int:
        pass

    def close(self) -> None:
        pass


class FileResource:
    def __init__(self, uri: str):
        self.uri = uri

    def open(self):
        self.file = io.FileIO(self.uri)
        return self.file.fileno()

    def close(self):
        self.file.close()


def write_resource_to_disk(r: IOResource):
    pass


write_resource_to_disk(FileResource("file.txt"))

If we try to run mypy on this file again, we should receive no error.

The only change that was required was to make the base class inherit from a new special class called Protocol. This is a special class that enables the type system to go structural and not nominal.

Rundown of Protocols Features

Let's now consider this protocol class:

from typing import Protocol
import io

class IOResource(Protocol):
    uri: str

    def __init__(self, uri: str):
        pass

    def open(self) -> int:
        pass

    def close(self) -> None:
        pass

And let's use it to check what we can do when using Protocols.

Protocols are defined by including the special typing.Protocol class in the base class list. The annotated class does not lose any semantics of a regular abstract base class; they are just handled specially by the type checker.

All the functions and variables of a Protocol class are also protocol members, no matter the decorator on the top. This means they have to present in the target classes in order to consider them compliant with the protocol:

import io


class FileResource:
    data: str
    file: io.FileIO

    def __init__(self, uri: str):
        self.data = uri

    def open(self):
        self.file = io.FileIO(self.uri)
        return self.file.fileno()

    def close(self):
        self.file.close()



write_resource_to_disk(FileResource("file.txt")) # Argument 1 to "write_resource_to_disk" has incompatible type "FileResource"; expected "IOResource"

When running this code through mypy, the type checker will detect the class is missing the uri string property that's defined in the protocol class.

A class can also inherit directly from a Protocol class — making the relationship explicit. This does not change anything, and it is indeed not required (and discouraged):

- class FileResource:
+ class FileResource(IOResource):

A Protocol class can also aggregate other protocols if necessary:

- class IOResource(Protocol):
+ class IOResource(Sized, Protocol):

In this case, the IOResource is a protocol with the methods we have defined above and it's also a Sized, meaning the __len__ method must be implemented:

class FileResource():
    uri: str
    file: io.FileIO

+    def __len__(self):
+        return len(self.file.readall())

It is also possible to create "aggregate" protocol classes, if necessary:

from typing import Protocol
import io

class IOResource(Protocol):
    uri: str

    def __init__(self, uri: str):
        pass

    def open(self) -> int:
        pass

    def close(self) -> None:
        pass

class SizedIOResource(IOResource, Sized, Protocol):
  pass

Any class willing to adhere to the SizedIOResource protocol has to implement all the methods in IOResource and the __len__ method as well.

Protocol classes cannot be instantiated, and both a typing error and a runtime error will be thrown since they're internally abstract classes:

q = IOResource("/dev/file.txt") # Cannot instantiate abstract class 'IOResource' with abstract attributes '__len__' and 'uri'

Conclusions

Protocol classes are very useful to keep some idiomatic patterns in Python, even in typed contexts. While the type system is yet not complete, this is an important milestone to close the loop of the story.