Reducing boilerplate with data classes

Before we dive deeper into the details of Python classes, we will take a small detour to discuss a relatively new addition to the Python language: data classes. The dataclasses module, introduced in Python 3.7, provides a decorator and helper functions that let you automatically add generated special methods to your own classes.

Consider the following example. We are building a program that does some geometric computation and want to have a class that allows us to hold information about two-dimensional vectors. We will display the data of the vectors on the screen and perform common mathematical operations, such as addition, subtraction, and equality comparison. We already know that we can use special methods to achieve that goal. We can implement our Vector class as follows:

class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        """Add two vectors using + operator"""
        return Vector(
            self.x + other.x,
            self.y + other.y,
        )

    def __sub__(self, other):
        """Subtract two vectors using - operator"""
        return Vector(
            self.x - other.x,
            self.y - other.y,
        )

    def __repr__(self):
        """Return textual representation of vector"""
        return f"<Vector: x={self.x}, y={self.y}>"

    def __eq__(self, other):
        """Compare two vectors for equality"""
        return self.x == other.x and self.y == other.y

The following interactive session shows how this class behaves when used with common operators:

>>> Vector(2, 3)
<Vector: x=2, y=3>
>>> Vector(5, 3) + Vector(1, 2)
<Vector: x=6, y=5>
>>> Vector(5, 3) - Vector(1, 2)
<Vector: x=4, y=1>
>>> Vector(1, 1) == Vector(2, 2)
False
>>> Vector(2, 2) == Vector(2, 2)
True

The preceding vector implementation is quite simple, but involves a lot of repetitive code that could be avoided. If your program uses many similar simple classes that do not require complex initialization, you'll end up writing a lot of boilerplate code just for the __init__(), __repr__(), and __eq__() methods.

With the dataclasses module, we can make our Vector class code a lot shorter:

from dataclasses import dataclass


@dataclass
class Vector:
    x: int
    y: int

    def __add__(self, other):
        """Add two vectors using + operator"""
        return Vector(
            self.x + other.x,
            self.y + other.y,
        )

    def __sub__(self, other):
        """Subtract two vectors using - operator"""
        return Vector(
            self.x - other.x,
            self.y - other.y,
        )

The dataclass class decorator reads the annotated attributes of the Vector class and automatically creates the __init__(), __repr__(), and __eq__() methods. The default equality comparison treats two instances as equal if they are of the same class and all their respective attributes are equal to each other.
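To see the generated methods at work, here is a minimal sketch that repeats a condensed version of the data class (only __add__() is kept, for brevity) and exercises it. Note that the generated __repr__() uses the Vector(x=..., y=...) form rather than the hand-written <Vector: ...> format from the first implementation.

```python
from dataclasses import dataclass


@dataclass
class Vector:
    x: int
    y: int

    def __add__(self, other):
        """Add two vectors using + operator"""
        return Vector(self.x + other.x, self.y + other.y)


# the generated __init__() accepts fields in declaration order
v = Vector(5, 3) + Vector(1, 2)

# the generated __repr__() lists the class name and all fields
print(repr(v))  # Vector(x=6, y=5)

# the generated __eq__() compares fields of instances of the same class
print(v == Vector(6, 5))  # True
print(v == (6, 5))  # False: a tuple is not a Vector
```

Unlike the hand-written __eq__() shown earlier, the generated method does not raise AttributeError when the other operand lacks x and y attributes; the comparison simply evaluates to False.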

But that's not all. Data classes offer many other useful features and can easily be made compatible with other Python protocols. Let's assume we want our Vector class instances to be immutable, so that they can be used as dictionary keys and as elements of sets. You can do this by simply adding a frozen=True argument to the dataclass decorator, as in the following example:

@dataclass(frozen=True)
class FrozenVector:
    x: int
    y: int

Instances of such a frozen data class cannot have their attributes reassigned; any attempt to do so raises the dataclasses.FrozenInstanceError exception. If the class also defines the __add__() and __sub__() methods from our earlier example, you can still add and subtract instances, because these operations simply create new objects.
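The following short sketch illustrates both effects of frozen=True: attribute assignment is rejected, and instances become usable as dictionary keys and set elements. The origin and labels names are only illustrative.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class FrozenVector:
    x: int
    y: int


origin = FrozenVector(0, 0)

# assigning to a field of a frozen instance raises an exception
try:
    origin.x = 10
except FrozenInstanceError:
    assignment_failed = True
else:
    assignment_failed = False

# frozen data classes that keep the default eq=True are hashable,
# so equal instances map to the same dictionary key
labels = {FrozenVector(0, 0): "origin"}

print(assignment_failed)  # True
print(labels[origin])  # origin
print(len({FrozenVector(1, 2), FrozenVector(1, 2)}))  # 1
```

Keep in mind that freezing only guards the instance's own fields. If a field holds a mutable object such as a list, that object can still be modified in place.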

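The final piece of useful information we will cover about data classes in this chapter is that you can define default values for specific attributes using the field() function. You can provide either a static value through the default argument or a zero-argument callable (such as the constructor of another object) through the default_factory argument, which is called to produce a fresh value for each new instance. Consider the following example: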

>>> from dataclasses import dataclass, field
>>> @dataclass
... class DataClassWithDefaults:
...     static_default: str = field(default="this is static default value")
...     factory_default: list = field(default_factory=list)
...
>>> DataClassWithDefaults()
DataClassWithDefaults(static_default='this is static default value', factory_default=[])
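Why use a factory instead of a plain default for the list? The sketch below is one way to illustrate the difference. The dataclass decorator refuses a bare list as a default value, while default_factory gives every instance its own list. The Broken and Basket classes are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass, field

# a bare mutable default is rejected when the class is created
try:
    @dataclass
    class Broken:
        items: list = []
except ValueError:
    rejected = True
else:
    rejected = False


@dataclass
class Basket:
    # the factory is called once per instance
    items: list = field(default_factory=list)


first = Basket()
second = Basket()
first.items.append("apple")

print(rejected)  # True
print(first.items)  # ['apple']
print(second.items)  # []
```

For immutable static defaults, a plain assignment such as x: int = 0 works as well; field() becomes necessary when you need default_factory or other per-field options.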

The next section discusses subclassing built-in types.