In the previous installment, we have seen how to type annotations enable gradual typing in Python and some of the common typing patterns that Python developers use when writing code.
In this article, we're going to take a look at the new Protocol classes introduced in Python 3.8 and how it enables, even in typed contexts, structural typing, and other idiomatic patterns in Python.
What Is A Protocol
Protocol is a very generic word, both in regular language as well as in Computer Science. Most of us are probably familiar with it from hearing TCP Protocol, UDP Protocol or also HTTP Protocol. Dictionaries have also dedicated definitions for it:
A set of rules governing the exchange or the transmission of data between devices
Which indeed makes sense. All the examples listed above are communication protocols between two remote devices with a set of rules that are governing the transmission. In the case of TCP, for instance, the protocol mandates the shape of the message, the possible operations, the error policies as well as the rules for possible retransmission of a message.
On the other hand, it might be kind of weird for people to talk about protocols in a programming language. After all, there is no communication involved in a program, so what is the meaning of it?
We can answer the question by taking a look at another more generic definition that is not coupled with computer science:
The accepted or established code of procedure or behavior in any group, organization, or situation. (New Oxford Dictionary)
Let's notice the difference with the previous definition we found above:
- No devices involved.
- No data transmission involved.
- No data involved, per se.
In fact, the nature of the protocol we're going to be talking about today. By removing some of the constraints of the previous definition, it is possible to reason about protocols in the context of a programming language.
Protocols are indeed not a new idea and have been around for a long time. For instance, Clojure supports protocols explicitly.
(defprotocol Shape (area [_]) (perimeter [_])) (defrecord Square [l]) (defrecord Circle [r] Shape (area [_] (* Math/PI (Math/pow r 2))) (perimeter [_] (* r Math/PI))) (defrecord Rectangle [w h] Shape (area [_] (* w h)) (perimeter [_] (* 2 (+ w h))))
This code is basically creating a set of functions grouped in a Protocol called
Shape; every type that wants to adhere to such Protocol has to implement such methods.
Successively, we create two new types —
Rectangle where we implement the
perimeter methods that are defined in the protocol. Because of this, we can call any of the protocol methods on any of the type instance implementing it:
(def c (->Circle 10)) (area c) ; 314.1592653589793 (def r (->Rectangle 10 12)) (area r) ; 120 (def s (->Square 10)) (area s) ; Execution error (IllegalArgumentException) at user/eval147$fn$G (REPL:1). ; No implementation of method: :area of protocol: #'user/Shape found for class: user.Square
We can see that the methods are called correctly in the first two instances, but it fails on the
Square object because it does not implement the
Shape Protocol. Also, note that no class or inheritance is involved in this.
It is also possible to implement a protocol on a type that is already existing, even on types we do not own:
(extend-protocol Shape Square (area [s] (Math/pow (:l s) 2)) (perimeter [s] (* 4 (:l s)))) (extend-protocol Shape java.lang.String (area [_] 1) (perimeter [_] 10)) (def s ->Square 10) (area s) ; 100.0 (area "hello world") ; 1
In this example, we have indeed implemented the same protocol on a type that has been precedently created (
Square) and then even implemented the protocol on the built-in
String type. This allows us to obtain what is called "polymorphism a la carte", where classes and inheritance are not involved/required. The original type is not even aware that a protocol implementation is being attached to it.
Protocols In Python
Protocols in Python work a little bit differently. While the concept is the same as Clojure, Python does not need explicit protocol declaration. If a type has the methods specified in the protocol, then it implements the protocol.
For instance, the len function requires a
Sized type as its argument, which is every type that implements the
class Team: def __init__(self, members): self.__members = members justice_league_fav = Team(["batman", "wonder woman", "flash"]) print(len(justice_league_fav)) # TypeError: object of type 'Team' has no len()
Team class does not implement the
__len__ method, and the runtime is throwing an error. We can change the class to implement such method:
class SizedTeam: def __init__(self, members): self.__members = members def __len__(self): return len(self.__members) justice_league_fav = SizedTeam(["batman", "wonder woman", "flash"]) print(len(justice_league_fav)) # 3
In the same way, we have seen in Clojure, no inheritance was involved, and the runtime has been able to execute the code.
Python 3 defines a number of protocols that can be implemented by our own types to be used around. The most common ones are:
Sized: any type implementing the
Iterable: any type implementing the
Iterator: any type implementing the
The complete list, though, is available on the documentation page
Protocol Types In Python
The problem comes when we try to apply the same Protocol concept in a typed context. Suppose, for instance; we have a custom class defining a protocol describing an open/close method for IO operations:
import io class IOResource: def __init__(self, uri: str): pass def open(self) -> int: pass def close(self) -> None: pass class FileResource: def __init__(self, uri: str): self.uri = uri def open(self): self.file = io.FileIO(self.uri) return self.file.fileno() def close(self): self.file.close() def write_resource_to_disk(r: IOResource): pass write_resource_to_disk(FileResource("file.txt"))
Try to paste this code in a new file called
IOResource.py and then try to run
mypy on it:
mypy IOResource.py IOResource.py:11: error: Argument 1 to "write_resource_to_disk" has incompatible type "FileResource"; expected "IOResource"
You can see we received a type error claiming (rightfully) that
FileResource is not an
IOResource subclass, even though our class is implemented all the required methods.
The same also happens to the builtin collection types, in case we're using Python 3.7 and running an older version of mypy (0.521, to be specific) on the following code:
from typing import Sized def multiply_len(val: Sized) -> int: return 2 * len(val) class SizedTeam: def __init__(self, members): self.__members = members def __len__(self): return len(self.__members) multiply_len(SizedTeam(["batman", "wonder woman", "flash"])) # team.py:13: error: Argument 1 to "multiply_len" has incompatible type "SizedTeam"; expected "Sized"
Basically, the type system is not smart enough to recognize the implicit protocol through structural typing, leaving a bunch of idiomatic Python constructs out of the game when working in typed contexts.
Fortunately speaking, the issue has been fixed in Python 3.8 (or with any mypy >= 0.521) with the introduction of Protocol classes. If we take the collection example from above and try it on Python 3.8 — it will indeed work.
We can also fix now the
IOResource example to make it work in Python 3.8:
import io from typing import Protocol class IOResource(Protocol): def __init__(self, uri: str): pass def open(self) -> int: pass def close(self) -> None: pass class FileResource: def __init__(self, uri: str): self.uri = uri def open(self): self.file = io.FileIO(self.uri) return self.file.fileno() def close(self): self.file.close() def write_resource_to_disk(r: IOResource): pass write_resource_to_disk(FileResource("file.txt"))
If we try to run mypy on this file again, we should receive no error.
The only change that was required was to make the base class inherit from a new special class called
Protocol. This is a special class that enables the type system to go structural and not nominal.
Rundown of Protocols Features
Let's now consider this protocol class:
from typing import Protocol import io class IOResource(Protocol): uri: str def __init__(self, uri: str): pass def open(self) -> int: pass def close(self) -> None: pass
And let's use it to check what we can do when using Protocols.
Protocols are defined by including the special
typing.Protocol class in the base class list. The annotated class does not lose any semantics of a regular abstract base class; they are just handled specially by the type checker.
All the functions and variables of a Protocol class are also protocol members, no matter the decorator on the top. This means they have to present in the target classes in order to consider them compliant with the protocol:
import io class FileResource: data: str file: io.FileIO def __init__(self, uri: str): self.data = uri def open(self): self.file = io.FileIO(self.uri) return self.file.fileno() def close(self): self.file.close() write_resource_to_disk(FileResource("file.txt")) # Argument 1 to "write_resource_to_disk" has incompatible type "FileResource"; expected "IOResource"
When running this code through mypy, the type checker will detect the class is missing the
uri string property that's defined in the protocol class.
A class can also inherit directly from a Protocol class — making the relationship explicit. This does not change anything, and it is indeed not required (and discouraged):
- class FileResource: + class FileResource(IOResource):
A Protocol class can also aggregate other protocols if necessary:
- class IOResource(Protocol): + class IOResource(Sized, Protocol):
In this case, the IOResource is a protocol with the methods we have defined above and it's also a
Sized, meaning the
__len__ method must be implemented:
class FileResource(): uri: str file: io.FileIO + def __len__(self): + return len(self.file.readall())
It is also possible to create "aggregate" protocol classes, if necessary:
from typing import Protocol import io class IOResource(Protocol): uri: str def __init__(self, uri: str): pass def open(self) -> int: pass def close(self) -> None: pass class SizedIOResource(IOResource, Sized, Protocol): pass
Any class willing to adhere to the
SizedIOResource protocol has to implement all the methods in
IOResource and the
__len__ method as well.
Protocol classes cannot be instantiated, and both a typing error and a runtime error will be thrown since they're internally abstract classes:
q = IOResource("/dev/file.txt") # Cannot instantiate abstract class 'IOResource' with abstract attributes '__len__' and 'uri'
Protocol classes are very useful to keep some idiomatic patterns in Python, even in typed contexts. While the type system is yet not complete, this is an important milestone to close the loop of the story.