Descriptors in Python

NOTE: This article assumes basic familiarity with object-oriented programming in Python.

A descriptor is an object that hooks into attribute retrieval, and optionally assignment and deletion, on another object. This is a way to fully "own the dot (.) operator", going beyond what overriding __getattr__ allows.

More formally, a descriptor is any object whose class defines one or more of the __get__, __set__, and __delete__ special methods. If it defines only __get__, it is referred to as a non-data descriptor. If it defines __set__ or __delete__ (with or without __get__), it is known as a data descriptor.
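
As a rough illustration of these definitions, the sketch below classifies an object by checking its type for the special methods. The classify helper and the three toy classes are hypothetical names made up for this example; they are not part of any library.

```python
class OnlyGet:
    def __get__(self, obj, objtype=None):
        return "value"


class WithSet:
    def __set__(self, obj, value):
        pass


class WithDelete:
    def __delete__(self, obj):
        pass


def classify(candidate):
    # The special methods are looked up on the type, not the instance.
    cls = type(candidate)
    if hasattr(cls, "__set__") or hasattr(cls, "__delete__"):
        return "data descriptor"
    if hasattr(cls, "__get__"):
        return "non-data descriptor"
    return "not a descriptor"


results = {
    "OnlyGet": classify(OnlyGet()),
    "WithSet": classify(WithSet()),
    "WithDelete": classify(WithDelete()),
    "int": classify(42),
    "property": classify(property()),
    "function": classify(lambda: None),
}
```

Note that the built-in property comes out as a data descriptor and a plain function as a non-data descriptor.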

To add a descriptor to an object, you place it as a class-level attribute in the object's class definition:

class Foo:
    descriptor = MyDescriptor(*init_args)
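
The class-level placement matters. As a small sketch (Shout and Holder are hypothetical names for this example), the same kind of object is only invoked through the descriptor protocol when it is found on the class. Stored on an instance, it comes back as a plain object.

```python
class Shout:
    def __get__(self, obj, objtype=None):
        return "called __get__"


class Holder:
    # Found on the class: the descriptor protocol applies.
    at_class_level = Shout()

    def __init__(self):
        # Stored on the instance: just an ordinary attribute value.
        self.at_instance_level = Shout()


holder = Holder()
from_class_level = holder.at_class_level
from_instance_level = holder.at_instance_level
```

Here from_class_level is the string returned by __get__, while from_instance_level is the Shout object itself.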

Examples

These examples are highly contrived to show that a lot of things can be done with descriptors, even weird things.

(NDD -> Non-Data Descriptor, DD -> Data Descriptor).

1.) This non-data descriptor reads a string attribute foo on an object and returns it reversed. It shows a common use case of descriptors: they can encapsulate attributes whose values depend on processing other attributes.

class ReversedFooNDD:
    # `objtype` is the class that has the descriptor. `obj` is the
    # instance the descriptor is retrieved on, or `None` when it is
    # retrieved as `<class>.descriptor`.
    def __get__(self, obj, objtype=None):
        if obj is None:
            # Retrieved as `<class>.descriptor`: return the
            # descriptor itself.
            return self
        # Retrieved on an instance: return the reverse of `obj.foo`.
        return obj.foo[::-1]
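
To see what the interpreter passes to __get__ in each case, here is a minimal sketch with a hypothetical descriptor, EchoArgs, that simply returns its arguments. The Owner class is also made up for this example.

```python
class EchoArgs:
    def __get__(self, obj, objtype=None):
        # Return the arguments unchanged so they can be inspected.
        return (obj, objtype)


class Owner:
    echo = EchoArgs()


owner_instance = Owner()

from_class = Owner.echo              # (None, Owner)
from_instance = owner_instance.echo  # (owner_instance, Owner)
```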

2.) Suppose we want to track the values assigned to an attribute on an instance. We could do this by printing each value and appending it to a file whenever the attribute is updated. We could use a data descriptor with __set__ to encapsulate this.

class TrackedAttrDD:
    # You can store class attributes on a descriptor class.
    # They can be extended like regular classes.
    write_path = "write_path.ext"

    # Recall that __get__ isn't required for a data descriptor,
    # only __set__ or __delete__ or both.
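    def __set__(self, obj, value):
        print(value)
        with open(self.write_path, "a") as f:
            # `value` may not be a string (e.g. an int), so format
            # it before writing. One value per line.
            f.write(f"{value}\n")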

3.) Suppose we have an attribute that when deleted (i.e. del obj.attr) should delete a file at a certain file path. We could use a data descriptor that references the file path and has a __delete__ method to encapsulate this.

import os


class DeletableFileDD:
    def __init__(self, filepath):
        # Yes, descriptor classes can have `__init__`.
        # They're still just classes.
        self.filepath = filepath

    def __delete__(self, obj):
        try:
            os.unlink(self.filepath)
            print(f"Deleted file {self.filepath}")
        except FileNotFoundError:
            pass

Now let's place these descriptors in a class that uses them.

class UsesDescriptors:
    # Descriptors are placed at the class level.
    reversed_foo = ReversedFooNDD()
    tracked_attr = TrackedAttrDD()
    deletable_file = DeletableFileDD("my_path.ext")

    def __init__(self, foo):
        self.foo = foo


uses_descriptors = UsesDescriptors(foo="reverse_me")

uses_descriptors.reversed_foo  # Returns "em_esrever".
uses_descriptors.tracked_attr = 2  # Updates file "write_path.ext".
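
The usage above does not exercise deletable_file. The following is a minimal, self-contained sketch of the __set__ and __delete__ hooks working together. It makes a few assumptions that differ from the classes above: the descriptors are renamed TrackedAttr and DeletableFile, TrackedAttr takes its path as a constructor argument, and everything happens in a temporary directory so that nothing is left behind.

```python
import os
import tempfile


class TrackedAttr:
    def __init__(self, write_path):
        self.write_path = write_path

    def __set__(self, obj, value):
        # Append every assigned value to the log file, one per line.
        with open(self.write_path, "a") as f:
            f.write(f"{value}\n")


class DeletableFile:
    def __init__(self, filepath):
        self.filepath = filepath

    def __delete__(self, obj):
        try:
            os.unlink(self.filepath)
        except FileNotFoundError:
            pass


tmp_dir = tempfile.mkdtemp()
log_path = os.path.join(tmp_dir, "log.txt")


class Demo:
    tracked = TrackedAttr(log_path)
    log_file = DeletableFile(log_path)


demo = Demo()
demo.tracked = 2        # Calls TrackedAttr.__set__.
demo.tracked = "three"  # Calls it again.

with open(log_path) as f:
    logged = f.read().splitlines()

del demo.log_file       # Calls DeletableFile.__delete__.
file_exists_after_delete = os.path.exists(log_path)

os.rmdir(tmp_dir)       # Clean up the now-empty directory.
```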

The major difference between data and non-data descriptors is the priority each is given during attribute lookup on instances of the class that uses them. The priority is: data descriptor > instance attribute > non-data descriptor.
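
This ordering can be checked in isolation. In the sketch below (NonData, Data, and Demo are hypothetical names), instance attributes with the same names as the descriptors are written straight into the instance's __dict__, which bypasses any __set__ method, and then both names are looked up.

```python
class NonData:
    def __get__(self, obj, objtype=None):
        return "from non-data descriptor"


class Data:
    def __get__(self, obj, objtype=None):
        return "from data descriptor"

    def __set__(self, obj, value):
        raise AttributeError("read-only")


class Demo:
    nd = NonData()
    d = Data()


demo = Demo()

# Write to the instance dictionary directly, skipping `__set__`.
vars(demo)["nd"] = "from instance"
vars(demo)["d"] = "from instance"

nd_lookup = demo.nd  # The instance attribute wins over NonData.
d_lookup = demo.d    # Data wins over the instance attribute.
```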

For example, suppose the __init__ method of the user class assigns instance attributes whose names match the descriptors above.

class UsesDescriptors:
    reversed_foo = ReversedFooNDD()
    tracked_attr = TrackedAttrDD()
    deletable_file = DeletableFileDD("my_path.ext")

    def __init__(self, foo="reverse_me"):
        self.foo = foo
        self.reversed_foo = "reversed_foo"
        self.tracked_attr = "tracked_attr"
        self.deletable_file = "deletable_file"

Now let's create a descriptor user object.

uses_descriptors = UsesDescriptors()

This prints "tracked_attr" and appends to the file "write_path.ext". Why? Recall TrackedAttrDD's __set__ implementation.

    def __set__(self, obj, value):
        print(value)
        with open(self.write_path, "a") as f:
            # Format the value so non-string values can be written.
            f.write(f"{value}\n")

In UsesDescriptors' __init__, the assignment self.tracked_attr = ... does not create an instance attribute. It is routed to the __set__ method of the tracked_attr descriptor at the class level, since data descriptors take priority over the instance's own attributes.

Continuing, AttributeError is raised when the interpreter gets to the self.deletable_file = ... line. DeletableFileDD defines __delete__, which makes it a data descriptor, so the assignment is still routed to the descriptor, but it has no __set__ method to handle it.
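
This behavior can be reproduced on its own. The sketch below uses two hypothetical classes, OnlyDelete and Holder, and catches the error so the outcome can be inspected.

```python
class OnlyDelete:
    # Defining `__delete__` alone is enough to be a data descriptor.
    def __delete__(self, obj):
        pass


class Holder:
    attr = OnlyDelete()


holder = Holder()
error = None

try:
    # Assignment is routed to the descriptor, which has no `__set__`.
    holder.attr = 1
except AttributeError as exc:
    error = exc

# Nothing was stored on the instance.
stored_on_instance = "attr" in vars(holder)
```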

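Now suppose we remove the self.deletable_file = ... line so that the object can actually be constructed. What happens when we retrieve reversed_foo?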

print(uses_descriptors.reversed_foo)

This prints "reversed_foo" since an instance attribute takes precedence over a non-data descriptor.

Suppose we don't want attributes backed by ReversedFooNDD to be settable. We could add a __set__ method that raises an AttributeError. This converts it to a data descriptor, giving it the highest priority in the attribute retrieval chain.

class ReversedFooDD:
    def __get__(self, obj, objtype=None):
        ...  # Same as before.

    def __set__(self, obj, value):
        raise AttributeError


class UsesDescriptors:
    reversed_foo = ReversedFooDD()
    ...  # Same as before.

    def __init__(self, foo="reverse_me"):
        ...  # Same as before.
        self.reversed_foo = "reversed_foo"
        ...  # Same as before.


uses_descriptors = UsesDescriptors()

On getting to the self.reversed_foo = ... line, AttributeError is raised since reversed_foo now refers to a data descriptor (ReversedFooDD) whose __set__ method raises the error.
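
Since the snippet above elides parts with "...", here is a complete, runnable sketch of the same idea. The names ReversedFoo and Word are hypothetical, and the error message is an illustrative addition. Reading still goes through __get__, while any assignment is rejected.

```python
class ReversedFoo:
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.foo[::-1]

    def __set__(self, obj, value):
        # The message is an illustrative addition.
        raise AttributeError("reversed_foo is read-only")


class Word:
    reversed_foo = ReversedFoo()

    def __init__(self, foo):
        self.foo = foo


word = Word("reverse_me")
value_read = word.reversed_foo  # Reading works as before.

message = None
try:
    word.reversed_foo = "something else"
except AttributeError as exc:
    message = str(exc)
```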

Cached property example

Suppose we have a property that is expensive to compute for an object. We can cache its result when it's first computed and subsequently retrieve it from this cache.

The cached_property class below is a way to achieve this with descriptors.

class cached_property:
    def __init__(self, method):
        self.method_func = method

    def __set_name__(self, owner, name):
        # This method is called when a user class with this
        # descriptor is defined.
        #
        # `owner` is the user class object. `name` is the class-
        # level attribute name for the descriptor.
        #
        # Set an attribute on the descriptor instance based on name.
        print(name)  # Unnecessary. Just for the sake of explanation.
        self.name = f"_{name}_cache"

    def __get__(self, obj, objtype):
        if obj is None:
            return self
        if hasattr(obj, self.name):
            return getattr(obj, self.name)
        result = self.method_func(obj)
        setattr(obj, self.name, result)
        return result

    def __set__(self, obj, value):
        raise AttributeError

    def __delete__(self, obj):
        # We only clear the cache.
        try:
            delattr(obj, self.name)
        except AttributeError:
            pass


class Foo:
    # Using the decorator syntax.
    @cached_property
    def expensive_property1(self):
        # Do expensive computation and return it.
        ...

    def expensive_property2(self):
        # Do expensive computation and return it.
        ...

    # An equivalent method of definition.
    expensive_property2 = cached_property(expensive_property2)


foo = Foo()

Pay attention to the newly introduced __set_name__ method, which can be used in any descriptor. If a script containing this code is run, "expensive_property1" and "expensive_property2" get printed due to the print(name) line in the __set_name__ method. These are the names assigned to the descriptors placed at the class level.
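
As a minimal sketch of __set_name__ on its own (Named and Widget are hypothetical names), the hook runs once per attribute while the class body is being turned into a class. An object attached to the class afterwards does not get the call.

```python
class Named:
    def __set_name__(self, owner, name):
        # Record where this object was placed.
        self.owner = owner
        self.name = name


class Widget:
    size = Named()
    color = Named()


size_obj = vars(Widget)["size"]
color_obj = vars(Widget)["color"]

# Attached after class creation: `__set_name__` is not called.
Widget.late = Named()
late_obj = vars(Widget)["late"]
late_has_name = hasattr(late_obj, "name")
```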

The first retrieval of foo.expensive_property1 computes the expensive property and stores the result on the instance. Subsequent retrievals return the cached result. (Same for foo.expensive_property2.)
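
One way to check the caching is to count how often the method body runs. The sketch below repeats a condensed copy of the cached_property class (without the print call) so that it is self-contained. The Squares class and its calls counter are hypothetical additions for this example.

```python
class cached_property:
    # Condensed copy of the descriptor above, without the print call.
    def __init__(self, method):
        self.method_func = method

    def __set_name__(self, owner, name):
        self.name = f"_{name}_cache"

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if hasattr(obj, self.name):
            return getattr(obj, self.name)
        result = self.method_func(obj)
        setattr(obj, self.name, result)
        return result

    def __set__(self, obj, value):
        raise AttributeError

    def __delete__(self, obj):
        try:
            delattr(obj, self.name)
        except AttributeError:
            pass


class Squares:
    def __init__(self):
        self.calls = 0  # Counts how often the method body runs.

    @cached_property
    def total(self):
        self.calls += 1
        return sum(n * n for n in range(10))


squares = Squares()
first = squares.total   # Computes and caches.
second = squares.total  # Served from the cache.
calls_before_delete = squares.calls

del squares.total       # Clears the cache only.
third = squares.total   # Computed again.
calls_after_delete = squares.calls
```

The method body runs once for the first two retrievals and once more after the cache is cleared with del.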

Conclusion and further reading

The descriptor protocol is behind a lot of key implementations in Python. Bound methods, property, staticmethod, and classmethod are all implemented with the descriptor protocol. Even Django's implementation of its Object Relational Mapping uses this protocol.
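
The bound-method case can be observed directly, since plain functions are non-data descriptors. In this short sketch (Greeter is a hypothetical class), calling __get__ on the raw function by hand produces the same bound method that the dot operator does.

```python
class Greeter:
    def greet(self):
        return "hello"


greeter = Greeter()

# The class dictionary holds the plain function.
raw_function = vars(Greeter)["greet"]

# Invoke the descriptor protocol by hand.
bound_by_hand = raw_function.__get__(greeter, Greeter)

# What the dot operator gives us.
bound_by_dot = greeter.greet

greeting = bound_by_hand()
```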

See more on this in the Descriptor Guide in the official Python documentation.