Descriptors in Python
NOTE: This article assumes basic familiarity with object-oriented programming in Python.
A descriptor is an object that hooks into attribute lookup, and optionally assignment and deletion, on instances of the class that holds it. This is a way to fully "own the dot (.) operator" for an attribute, going beyond overriding __getattr__, which is only called when normal lookup fails.
More formally, a descriptor is any object whose class defines one or more of the __get__, __set__, and __delete__ special methods. If it defines only __get__, it is called a non-data descriptor. If it defines __set__ or __delete__ (or both), it is called a data descriptor.
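As a rough illustration of these definitions, the hypothetical helper below classifies an object by checking its type for the three special methods (special methods are looked up on the type, not on the instance). It is a sketch, not a standard-library function; `inspect.isdatadescriptor` offers a similar check for the data case.

```python
def classify_descriptor(obj):
    """Hypothetical helper: classify obj using the definitions above."""
    tp = type(obj)  # Special methods are looked up on the type.
    if hasattr(tp, "__set__") or hasattr(tp, "__delete__"):
        return "data descriptor"
    if hasattr(tp, "__get__"):
        return "non-data descriptor"
    return "not a descriptor"


def some_function():
    pass


print(classify_descriptor(property(lambda self: 1)))     # data descriptor
print(classify_descriptor(some_function))                # non-data descriptor
print(classify_descriptor(staticmethod(some_function)))  # non-data descriptor
print(classify_descriptor(42))                           # not a descriptor
```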
To add a descriptor to an object, you place it as a class-level attribute in the object's class definition:
class Foo:
    descriptor = MyDescriptor(*init_args)
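The class-level placement matters. In the sketch below (Constant, Foo, answer, and other are illustrative names), a descriptor is honored when it lives on the class but is treated as an ordinary value when stored on an instance.

```python
class Constant:
    """Hypothetical non-data descriptor that always returns 42."""

    def __get__(self, obj, objtype=None):
        return 42


class Foo:
    answer = Constant()  # Class-level: the descriptor protocol applies.


foo = Foo()
print(foo.answer)  # 42

foo.other = Constant()  # Instance-level: stored as a plain value.
print(type(foo.other).__name__)  # Constant
```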
Examples
These examples are deliberately contrived, to show that descriptors can do a lot of things, even weird ones.
(NDD = Non-Data Descriptor, DD = Data Descriptor.)
1.) This non-data descriptor reads a string attribute foo from an instance and returns it reversed. It shows a common use case for descriptors: exposing an attribute whose value is computed from other attributes.
class ReversedFooNDD:
    def __get__(self, obj, objtype=None):
        # `objtype` is the class that holds the descriptor.
        # `obj` is the instance the attribute was retrieved on, or
        # `None` when it is retrieved on the class itself, as in
        # `<class>.descriptor`.
        if obj is None:
            # Class-level retrieval: return the descriptor itself.
            return self
        # Instance-level retrieval: return the reverse of `obj.foo`.
        return obj.foo[::-1]
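To see what the dot operator does here, the sketch below calls __get__ by hand. Word is a hypothetical owner class, and the descriptor is repeated so the block runs on its own. The explicit calls mirror what Python does for instance and class retrieval.

```python
class ReversedFooNDD:
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.foo[::-1]


class Word:
    # Hypothetical owner class for this demo.
    reversed_foo = ReversedFooNDD()

    def __init__(self, foo):
        self.foo = foo


w = Word("abc")
descr = Word.__dict__["reversed_foo"]  # The raw descriptor, no __get__ call.

# Instance retrieval: obj is the instance, objtype is its class.
assert w.reversed_foo == descr.__get__(w, Word) == "cba"

# Class retrieval: obj is None, so __get__ returns the descriptor itself.
assert Word.reversed_foo is descr.__get__(None, Word) is descr
```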
2.) Suppose we want to track the values assigned to an attribute on an instance, by printing each new value and appending it to a file whenever the attribute is updated. A data descriptor with __set__ can encapsulate this.
class TrackedAttrDD:
    # A descriptor class can have class attributes, and subclasses
    # can override them like in any regular class.
    write_path = "write_path.ext"

    # Recall that __get__ isn't required for a data descriptor;
    # only __set__, __delete__, or both.
    def __set__(self, obj, value):
        with open(self.write_path, "a") as f:
            print(value)
            # Convert to str so non-string values (e.g. ints) can be written.
            f.write(f"{value}\n")
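Because write_path is an ordinary class attribute, a subclass can redirect the log. The sketch below assumes a hypothetical TempTrackedAttrDD subclass that writes to a temporary file, so running it leaves the working directory untouched; Model and score are also illustrative names. The base class is repeated with a str conversion, so it can log non-string values.

```python
import os
import tempfile


class TrackedAttrDD:
    write_path = "write_path.ext"

    def __set__(self, obj, value):
        with open(self.write_path, "a") as f:
            print(value)
            f.write(f"{value}\n")


# Hypothetical subclass: only the class attribute is overridden.
class TempTrackedAttrDD(TrackedAttrDD):
    write_path = os.path.join(tempfile.mkdtemp(), "tracked.log")


class Model:
    score = TempTrackedAttrDD()


m = Model()
m.score = 1  # Routed to __set__: prints 1 and appends it to the log.
m.score = 2

with open(TempTrackedAttrDD.write_path) as f:
    history = f.read().splitlines()
print(history)  # ['1', '2']
```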
3.) Suppose we have an attribute that, when deleted (i.e. del obj.attr), should delete a file at a certain path. A data descriptor that stores the file path and defines __delete__ can encapsulate this.
import os

class DeletableFileDD:
    def __init__(self, filepath):
        # Yes, descriptor classes can have `__init__`.
        # They're still just classes.
        self.filepath = filepath

    def __delete__(self, obj):
        try:
            os.unlink(self.filepath)
            print(f"Deleted file {self.filepath}")
        except FileNotFoundError:
            pass
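The sketch below exercises __delete__ against a temporary file instead of my_path.ext (Report and output are hypothetical names). It also shows that a second del is harmless, because FileNotFoundError is swallowed.

```python
import os
import tempfile


class DeletableFileDD:
    def __init__(self, filepath):
        self.filepath = filepath

    def __delete__(self, obj):
        try:
            os.unlink(self.filepath)
            print(f"Deleted file {self.filepath}")
        except FileNotFoundError:
            pass


# Create a throwaway file so the demo doesn't touch real data.
fd, tmp_path = tempfile.mkstemp()
os.close(fd)


class Report:  # Hypothetical owner class.
    output = DeletableFileDD(tmp_path)


report = Report()
existed_before = os.path.exists(tmp_path)
del report.output  # Routed to DeletableFileDD.__delete__.
existed_after = os.path.exists(tmp_path)
del report.output  # File already gone: the error is swallowed.
print(existed_before, existed_after)  # True False
```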
Now let's place these descriptors in a class that uses them.
class UsesDescriptors:
    # Descriptors are placed at the class level.
    reversed_foo = ReversedFooNDD()
    tracked_attr = TrackedAttrDD()
    deletable_file = DeletableFileDD("my_path.ext")

    def __init__(self, foo):
        self.foo = foo

uses_descriptors = UsesDescriptors(foo="reverse_me")
uses_descriptors.reversed_foo  # Returns "em_esrever".
uses_descriptors.tracked_attr = 2  # Updates file "write_path.ext".
The major difference between data and non-data descriptors is the priority each gets during attribute lookup on instances of the class that uses them. The order is: data descriptor > instance attribute > non-data descriptor.
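A minimal sketch of this ordering, using two hypothetical descriptors (NonData and Data) and writing straight into the instance __dict__ so that Data.__set__ is bypassed:

```python
class NonData:
    def __get__(self, obj, objtype=None):
        return "from non-data descriptor"


class Data:
    def __get__(self, obj, objtype=None):
        return "from data descriptor"

    def __set__(self, obj, value):
        raise AttributeError("read-only")


class Demo:
    nd = NonData()
    d = Data()


demo = Demo()
# Write directly into the instance dictionary, bypassing __set__.
demo.__dict__["nd"] = "from instance dict"
demo.__dict__["d"] = "from instance dict"

print(demo.nd)  # from instance dict      (instance attribute beats non-data)
print(demo.d)   # from data descriptor    (data descriptor beats instance)
```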
For example, suppose the __init__ method of the user class assigns instance attributes with the same names as the descriptors above.
class UsesDescriptors:
    reversed_foo = ReversedFooNDD()
    tracked_attr = TrackedAttrDD()
    deletable_file = DeletableFileDD("my_path.ext")

    def __init__(self, foo="reverse_me"):
        self.foo = foo
        self.reversed_foo = "reversed_foo"
        self.tracked_attr = "tracked_attr"
        self.deletable_file = "deletable_file"
Now let's create a descriptor user object.
uses_descriptors = UsesDescriptors()
"tracked_attr" is printed, and "write_path.ext" is written to. Why? Recall TrackedAttrDD's __set__ implementation.
def __set__(self, obj, value):
    with open(self.write_path, "a") as f:
        print(value)
        f.write(f"{value}\n")
In UsesDescriptors's __init__, self.tracked_attr = ... does not create an instance attribute. The name still refers to the tracked_attr descriptor at the class level, and because data descriptors take priority over the instance dictionary, the assignment is routed to its __set__ method.
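One consequence worth checking: the assigned value never lands in the instance's __dict__. The sketch below uses a hypothetical Tracked descriptor that records values in a list instead of a file, so the check is easy to run.

```python
class Tracked:
    """Hypothetical data descriptor that records every assigned value."""

    def __init__(self):
        # One descriptor per class, so this history is shared by all instances.
        self.history = []

    def __set__(self, obj, value):
        self.history.append(value)


class User:
    tracked_attr = Tracked()

    def __init__(self):
        # Looks like an instance assignment, but goes to Tracked.__set__.
        self.tracked_attr = "tracked_attr"


u = User()
print(User.__dict__["tracked_attr"].history)  # ['tracked_attr']
print("tracked_attr" in vars(u))              # False
```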
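Continuing, AttributeError is raised when the interpreter reaches the self.deletable_file = ... line. The name still refers to the DeletableFileDD instance, which is a data descriptor (it defines __delete__) but has no __set__ method to handle the assignment. As a result, creating the object fails at this point.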
Suppose we remove the self.deletable_file = ... line so that the object can be created. What happens now when we retrieve reversed_foo?
print(uses_descriptors.reversed_foo)
This prints "reversed_foo", since an instance attribute takes precedence over a non-data descriptor.
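The shadowing is reversible. In the sketch below, a trimmed-down UsesDescriptors that keeps only the non-data descriptor, deleting the instance attribute makes the descriptor visible again.

```python
class ReversedFooNDD:
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.foo[::-1]


class UsesDescriptors:
    reversed_foo = ReversedFooNDD()

    def __init__(self, foo="reverse_me"):
        self.foo = foo
        self.reversed_foo = "reversed_foo"  # Lands in the instance __dict__.


uses = UsesDescriptors()
shadowed = uses.reversed_foo    # "reversed_foo": the instance dict wins.
del uses.reversed_foo           # Removes only the instance attribute.
unshadowed = uses.reversed_foo  # "em_esrever": the descriptor answers again.
print(shadowed, unshadowed)
```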
Suppose we don't want reversed_foo to be overwritable on instances. We can add a __set__ method that raises AttributeError. This turns the descriptor into a data descriptor, giving it the highest priority in the attribute lookup chain.
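class ReversedFooDD:
    def __get__(self, obj, objtype=None):
        # Same as ReversedFooNDD.__get__.
        if obj is None:
            return self
        return obj.foo[::-1]

    def __set__(self, obj, value):
        # Reject assignment on instances.
        raise AttributeError("reversed_foo is read-only")

class UsesDescriptors:
    reversed_foo = ReversedFooDD()
    tracked_attr = TrackedAttrDD()
    deletable_file = DeletableFileDD("my_path.ext")

    def __init__(self, foo="reverse_me"):
        self.foo = foo
        self.reversed_foo = "reversed_foo"
        self.tracked_attr = "tracked_attr"
        self.deletable_file = "deletable_file"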
uses_descriptors = UsesDescriptors()
On reaching the self.reversed_foo = ... line, AttributeError is raised, since reversed_foo now refers to a data descriptor (ReversedFooDD) whose __set__ method raises the error.
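The built-in property behaves the same way. A property defined with only a getter is still a data descriptor (the property type defines __set__), so assigning to it on an instance raises AttributeError. A minimal sketch, with UsesProperty as a hypothetical name:

```python
class UsesProperty:
    """Hypothetical class: a property with a getter but no setter."""

    def __init__(self, foo="reverse_me"):
        self.foo = foo

    @property
    def reversed_foo(self):
        return self.foo[::-1]


obj = UsesProperty()
print(obj.reversed_foo)  # em_esrever

rejected = False
try:
    obj.reversed_foo = "reversed_foo"
except AttributeError:
    # property.__set__ raises because no setter was defined.
    rejected = True
print(rejected)  # True
```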
Cached property example
Suppose we have a property that is expensive to compute for an object. We can cache its result when it's first computed and subsequently retrieve it from this cache.
The cached_property class below is one way to achieve this with descriptors.
class cached_property:
    def __init__(self, method):
        self.method_func = method

    def __set_name__(self, owner, name):
        # This method is called when a user class with this
        # descriptor is defined.
        #
        # `owner` is the user class object. `name` is the
        # class-level attribute name for the descriptor.
        #
        # Derive the per-instance cache attribute name from `name`.
        print(name)  # Unnecessary. Just for the sake of explanation.
        self.name = f"_{name}_cache"

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if hasattr(obj, self.name):
            return getattr(obj, self.name)
        result = self.method_func(obj)
        setattr(obj, self.name, result)
        return result

    def __set__(self, obj, value):
        raise AttributeError

    def __delete__(self, obj):
        # We only clear the cache.
        try:
            delattr(obj, self.name)
        except AttributeError:
            pass
class Foo:
    # Using the decorator syntax.
    @cached_property
    def expensive_property1(self):
        # Do expensive computation and return it.
        ...

    def expensive_property2(self):
        # Do expensive computation and return it.
        ...

    # An equivalent method of definition.
    expensive_property2 = cached_property(expensive_property2)

foo = Foo()
Note the newly introduced __set_name__ method, which any descriptor can define. When a script containing this code is run, "expensive_property1" and "expensive_property2" are printed by the print(name) line in __set_name__, because Python calls that method while creating the Foo class. These are the names assigned to the descriptors at the class level.
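A small sketch of what __set_name__ receives (Recorder and Owner are hypothetical names). It also shows that Python calls __set_name__ only while the class is being created: a descriptor attached to the class afterwards is not notified automatically.

```python
class Recorder:
    """Hypothetical descriptor that records what __set_name__ receives."""

    def __set_name__(self, owner, name):
        self.owner = owner
        self.name = name

    def __get__(self, obj, objtype=None):
        return self


class Owner:
    a = Recorder()
    b = Recorder()


print(Owner.a.name, Owner.a.owner is Owner)  # a True
print(Owner.b.name)                          # b

# Added after the class exists: __set_name__ is not called.
Owner.c = Recorder()
print(hasattr(Owner.c, "name"))              # False
```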
The first retrieval of foo.expensive_property1 computes the expensive property and stores it; subsequent retrievals come from this cache. (The same goes for foo.expensive_property2.)
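To check the caching behavior, the sketch below reuses the cached_property class (minus the print) on a hypothetical Circle class that counts how often the "expensive" body runs, and shows that del clears the cache so the next retrieval recomputes.

```python
import math


class cached_property:
    def __init__(self, method):
        self.method_func = method

    def __set_name__(self, owner, name):
        self.name = f"_{name}_cache"

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if hasattr(obj, self.name):
            return getattr(obj, self.name)
        result = self.method_func(obj)
        setattr(obj, self.name, result)
        return result

    def __set__(self, obj, value):
        raise AttributeError("read-only cached property")

    def __delete__(self, obj):
        try:
            delattr(obj, self.name)
        except AttributeError:
            pass


class Circle:  # Hypothetical example class.
    def __init__(self, radius):
        self.radius = radius
        self.calls = 0

    @cached_property
    def area(self):
        self.calls += 1  # Count how often the "expensive" body runs.
        return math.pi * self.radius ** 2


c = Circle(2)
first = c.area
second = c.area               # Served from the cache.
calls_after_two_reads = c.calls  # 1
del c.area                    # Clears the cache only.
third = c.area                # Recomputed.
calls_after_reset = c.calls   # 2
```

For real code, the standard library already provides functools.cached_property (Python 3.8+). Unlike this sketch, it is a non-data descriptor that stores the result in the instance __dict__ under the attribute's own name, so later retrievals bypass the descriptor entirely.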
Conclusion and further reading
The descriptor protocol is behind many key features of Python. Bound methods, property, staticmethod, and classmethod are all implemented with the descriptor protocol. Even Django's implementation of its Object Relational Mapping uses this protocol.
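As a quick check of this claim, the sketch below calls __get__ by hand on the objects stored in a class __dict__ (Foo and its methods are illustrative): a plain function becomes a bound method, classmethod binds to the class, and staticmethod returns the underlying function. The last line shows that property is a data descriptor while staticmethod is not.

```python
class Foo:
    def method(self):
        return "called"

    @staticmethod
    def static():
        return "static"

    @classmethod
    def klass(cls):
        return cls.__name__


foo = Foo()
raw = Foo.__dict__["method"]   # A plain function object.
bound = raw.__get__(foo, Foo)  # What foo.method does under the hood.

print(type(raw).__name__, type(bound).__name__)  # function method
print(bound())                                   # called
print(Foo.__dict__["klass"].__get__(None, Foo)())   # Foo
print(Foo.__dict__["static"].__get__(None, Foo)())  # static
print(hasattr(property, "__set__"), hasattr(staticmethod, "__set__"))  # True False
```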
For more, see the Descriptor Guide in the official Python documentation.