Chapter 11. Descriptors

Descriptors are classes which provide access control for the attributes of other classes. Any class that implements one or more of the descriptor special methods, __get__(), __set__(), and __delete__(), is called (and can be used as) a descriptor.

The built-in property() and classmethod() functions are implemented using descriptors. The key to understanding descriptors is that although we create an instance of a descriptor in a class as a class attribute, Python accesses the descriptor through the class’s instances.
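As a minimal sketch of this lookup rule, the following descriptor simply reports the arguments that Python passes to __get__(). The Probe and Widget names are hypothetical and are not part of the chapter's example files.

```python
class Probe:
    """Hypothetical descriptor that reports how it was accessed."""

    def __get__(self, instance, owner=None):
        # instance is the object the attribute was looked up on
        # (None when the lookup is on the class); owner is the class.
        return (instance, owner)


class Widget:
    probe = Probe()    # one descriptor object, stored on the class


widget = Widget()
via_instance = widget.probe    # calls Probe.__get__(probe, widget, Widget)
via_class = Widget.probe       # calls Probe.__get__(probe, None, Widget)
```

The descriptor lives only in the class, yet each lookup through an instance hands that instance to __get__(), which is what lets one descriptor serve every object of the owning class.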

To make things clear, let’s imagine that we have a class whose instances hold some strings. We want to access the strings in the normal way, for example, as a property, but we also want to get an XML-escaped version of the strings whenever we want. One simple solution would be that whenever a string is set we immediately create an XML-escaped copy. But if we had thousands of strings and only ever read the XML version of a few of them, we would be wasting a lot of processing and memory for nothing. So we will create a descriptor that will provide XML-escaped strings on demand without storing them. We will start with the beginning of the client (owner) class, that is, the class that uses the descriptor:

class Product:

    __slots__ = ("__name", "__description", "__price")

    name_as_xml = XmlShadow("name")
    description_as_xml = XmlShadow("description")

    def __init__(self, name, description, price):
        self.__name = name
        self.description = description
        self.price = price


The only code we have not shown is that of the properties; the name is a read-only property and the description and price are readable/writable properties, all set up in the usual way. (All the code is in the XmlShadow.py file.) We have used the __slots__ variable to ensure that the class has no __dict__ and can store only the three specified private attributes; this is not related to or necessary for our use of descriptors. The name_as_xml and description_as_xml class attributes are set to be instances of the XmlShadow descriptor. Although no Product object has a name_as_xml attribute or a description_as_xml attribute, thanks to the descriptor we can write code like this (here quoting from the module’s doctests):

>>> product = Product("Chisel <3cm>", "Chisel & cap", 45.25)
>>> product.name, product.name_as_xml, product.description_as_xml
('Chisel <3cm>', 'Chisel &lt;3cm&gt;', 'Chisel &amp; cap')


This works because when we try to access, for example, the name_as_xml attribute, Python finds that the Product class has a descriptor with that name, and so uses the descriptor to get the attribute’s value. Here’s the complete code for the XmlShadow descriptor class:

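import xml.sax.saxutils     # provides the escape() function


class XmlShadow:

    def __init__(self, attribute_name):
        self.attribute_name = attribute_name

    def __get__(self, instance, owner=None):
        # Escape on demand; nothing is stored in the descriptor.
        return xml.sax.saxutils.escape(
                getattr(instance, self.attribute_name))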


When the name_as_xml and description_as_xml objects are created we pass the name of the Product class’s corresponding attribute to the XmlShadow initializer so that the descriptor knows which attribute to work on. Then, when the name_as_xml or description_as_xml attribute is looked up, Python calls the descriptor’s __get__() method. The self argument is the instance of the descriptor, the instance argument is the Product instance (i.e., the product’s self), and the owner argument is the owning class (Product in this case). We use the getattr() function to retrieve the relevant attribute from the product (in this case the relevant property), and return an XML-escaped version of it.
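To see the on-demand behavior end to end, here is a self-contained sketch. The chapter does not show the Product properties, so the ones below are an assumed, conventional implementation, and __slots__ is left out because it is unrelated to descriptors.

```python
import xml.sax.saxutils


class XmlShadow:

    def __init__(self, attribute_name):
        self.attribute_name = attribute_name

    def __get__(self, instance, owner=None):
        return xml.sax.saxutils.escape(
                getattr(instance, self.attribute_name))


class Product:

    name_as_xml = XmlShadow("name")
    description_as_xml = XmlShadow("description")

    def __init__(self, name, description, price):
        self.__name = name
        self.description = description
        self.price = price

    # Assumed property definitions (not shown in the chapter).
    @property
    def name(self):                       # read-only
        return self.__name

    @property
    def description(self):
        return self.__description

    @description.setter
    def description(self, description):
        self.__description = description

    @property
    def price(self):
        return self.__price

    @price.setter
    def price(self, price):
        self.__price = price


product = Product("Chisel <3cm>", "Chisel & cap", 45.25)
before = product.description_as_xml
product.description = "Chisel > cap"      # the escaped form is never stored...
after = product.description_as_xml        # ...so it reflects the new value
```

Because the descriptor reads the underlying property each time, the XML text always matches the current value, at the cost of escaping the string again on every access.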

If the use case were that only a small proportion of the products were accessed for their XML strings, but the strings were often long and the same ones were frequently accessed, we could use a cache. For example:

class CachedXmlShadow:

    def __init__(self, attribute_name):
        self.attribute_name = attribute_name
        self.cache = {}

    def __get__(self, instance, owner=None):
        xml_text = self.cache.get(id(instance))
        if xml_text is not None:
            return xml_text
        return self.cache.setdefault(id(instance),
                xml.sax.saxutils.escape(
                            getattr(instance, self.attribute_name)))


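We store the unique identity of the instance as the key rather than the instance itself because dictionary keys must be hashable (which IDs are), but we don’t want to impose that as a requirement on classes that use the CachedXmlShadow descriptor. The key is necessary because descriptors are created per class rather than per instance. (The dict.setdefault() method conveniently returns the value for the given key, or if no item with that key is present, creates a new item with the given key and value and returns the value.) Two caveats follow from this design: a cached value is not refreshed if the underlying attribute later changes, and an identity can be reused once its object has been garbage-collected, so the approach suits values that do not change and objects that live as long as the cache.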
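The following sketch exercises the cache with a hypothetical Note class (not from the chapter's files). It shows that a single descriptor object serves every instance, which is why the cache is keyed by identity, and that a cached value persists even after the underlying attribute changes.

```python
import xml.sax.saxutils


class CachedXmlShadow:

    def __init__(self, attribute_name):
        self.attribute_name = attribute_name
        self.cache = {}

    def __get__(self, instance, owner=None):
        xml_text = self.cache.get(id(instance))
        if xml_text is not None:
            return xml_text
        return self.cache.setdefault(id(instance),
                xml.sax.saxutils.escape(
                        getattr(instance, self.attribute_name)))


class Note:                           # hypothetical owner class
    text_as_xml = CachedXmlShadow("text")

    def __init__(self, text):
        self.text = text


first = Note("a < b")
second = Note("x & y")
first_xml = first.text_as_xml         # computed, then cached under id(first)
second_xml = second.text_as_xml       # same descriptor, different key

# Fetch the descriptor from the class __dict__ so that __get__() is not called.
shadow = Note.__dict__["text_as_xml"]
cached_ids = set(shadow.cache)

first.text = "changed"
stale_xml = first.text_as_xml         # still the value cached earlier
```

Whether the stale result matters depends on the use case; for read-only attributes such as Product's name it cannot arise.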

Having seen descriptors used to generate data without necessarily storing it, we will now look at a descriptor that can be used to store all of an object’s attribute data, with the object not needing to store anything itself. In the example, we will just use a dictionary, but in a more realistic context, the data might be stored in a file or a database. Here’s the start of a modified version of the Point class that makes use of the descriptor (from the ExternalStorage.py file):

class Point:

    __slots__ = ()
    x = ExternalStorage("x")
    y = ExternalStorage("y")

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y


By setting __slots__ to an empty tuple we ensure that the class cannot store any data attributes at all. When self.x is assigned to, Python finds that there is a descriptor with the name “x”, and so uses the descriptor’s __set__() method. Here is the complete ExternalStorage descriptor class:

class ExternalStorage:

    __slots__ = ("attribute_name",)
    __storage = {}

    def __init__(self, attribute_name):
        self.attribute_name = attribute_name

    def __set__(self, instance, value):
        self.__storage[id(instance), self.attribute_name] = value

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return self.__storage[id(instance), self.attribute_name]


Each ExternalStorage object has a single data attribute, attribute_name, which holds the name of the owner class’s data attribute. Whenever an attribute is set we store its value in the private class dictionary, __storage. Similarly, whenever an attribute is retrieved we get it from the __storage dictionary.

As with all descriptor methods, self is the instance of the descriptor object and instance is the self of the object that contains the descriptor, so here self is an ExternalStorage object and instance is a Point object.

Although __storage is a class attribute, we can access it as self.__storage (just as we can call methods using self.method()), because Python will look for it as an instance attribute, and not finding it will then look for it as a class attribute. The one (theoretical) disadvantage of this approach is that if we have a class attribute and an instance attribute with the same name, one would hide the other. (If this were really a problem we could always refer to the class attribute using the class, that is, ExternalStorage.__storage. Although hard-coding the class does not play well with subclassing in general, it doesn’t really matter for private attributes since Python name-mangles the class name into them anyway.)
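The lookup order and the hiding effect can be sketched with a small hypothetical class (Vault is not part of the chapter's files):

```python
class Vault:                           # hypothetical class
    __items = {}                       # class attribute, stored as _Vault__items

    def via_self(self):
        return self.__items            # instance lookup falls back to the class

    def via_class(self):
        return Vault.__items           # always the class attribute

    def hide(self):
        # Creates an instance attribute with the same (mangled) name.
        self.__items = {"local": True}


vault = Vault()
shared_before = vault.via_self() is vault.via_class()
vault.hide()
shared_after = vault.via_self() is vault.via_class()
```

Before hide() is called both routes reach the same dictionary; afterward self.__items finds the instance attribute first, while Vault.__items still reaches the class attribute.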

The implementation of the __get__() special method is slightly more sophisticated than before because we provide a means by which the ExternalStorage instance itself can be accessed. For example, if we have p = Point(3, 4), we can access the x-coordinate with p.x, and we can access the ExternalStorage object that holds all the xs with Point.x.
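Putting the two classes together gives a runnable sketch of both kinds of access. The variable names p, q, coordinates, and descriptor are chosen here for illustration.

```python
class ExternalStorage:

    __slots__ = ("attribute_name",)
    __storage = {}

    def __init__(self, attribute_name):
        self.attribute_name = attribute_name

    def __set__(self, instance, value):
        self.__storage[id(instance), self.attribute_name] = value

    def __get__(self, instance, owner=None):
        if instance is None:
            return self                # accessed through the class
        return self.__storage[id(instance), self.attribute_name]


class Point:

    __slots__ = ()
    x = ExternalStorage("x")
    y = ExternalStorage("y")

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y


p = Point(3, 4)
q = Point(-1, 9)
coordinates = (p.x, p.y, q.x, q.y)     # each value comes from the shared dictionary
descriptor = Point.x                   # the ExternalStorage object itself
```

The points themselves hold no data at all; every coordinate is kept in the descriptor class's dictionary under an (identity, attribute name) key.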

To complete our coverage of descriptors we will create the Property descriptor that mimics the behavior of the built-in property() function, at least for setters and getters. The code is in Property.py. Here is the complete NameAndExtension class that makes use of it:

class NameAndExtension:

    def __init__(self, name, extension):
        self.__name = name
        self.extension = extension

    @Property               # Uses the custom Property descriptor
    def name(self):
        return self.__name

    @Property               # Uses the custom Property descriptor
    def extension(self):
        return self.__extension

    @extension.setter       # Uses the custom Property descriptor
    def extension(self, extension):
        self.__extension = extension


The usage is just the same as for the built-in @property decorator and for the @propertyName.setter decorator. Here is the start of the Property descriptor’s implementation:

class Property:

    def __init__(self, getter, setter=None):
        self.__getter = getter
        self.__setter = setter
        self.__name__ = getter.__name__


The class’s initializer takes one or two functions as arguments. If it is used as a decorator, it will get just the decorated function and this becomes the getter, while the setter is set to None. We use the getter’s name as the property’s name. So for each property, we have a getter, possibly a setter, and a name.

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return self.__getter(instance)


When a property is accessed we return the result of calling the getter function, passing the instance as its first argument. At first sight, self.__getter() looks like a method call, but it is not. In fact, self.__getter is an attribute that happens to hold an object reference to the function that was passed in. So first we retrieve the attribute (self.__getter), and then we call it as a function (). Because it is called as a function rather than as a method, we must pass in the relevant self object explicitly ourselves. In the case of a descriptor, the self object (from the class that is using the descriptor) is called instance (since self is the descriptor object). The same applies to the __set__() method.
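The difference between calling a stored function and calling a method can be seen in isolation with this sketch, where Holder and describe are hypothetical names:

```python
class Holder:                          # hypothetical class

    def __init__(self, function):
        self.function = function       # a plain function stored on the instance


def describe(obj):
    return "called with a " + type(obj).__name__


holder = Holder(describe)
result = holder.function(holder)       # we must supply the argument ourselves

try:
    holder.function()                  # no self is passed automatically
except TypeError:
    implicit_self = False
else:
    implicit_self = True
```

Only functions found on the class are turned into bound methods during lookup; a function stored as an instance attribute is returned unchanged, so it receives exactly the arguments we give it.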

    def __set__(self, instance, value):
        if self.__setter is None:
            raise AttributeError("'{0}' is read-only".format(
                                 self.__name__))
        return self.__setter(instance, value)


If no setter has been specified, we raise an AttributeError; otherwise, we call the setter with the instance and the new value.

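    def setter(self, setter):
        self.__setter = setter
        return self


This method is called when the interpreter reaches, for example, @extension.setter, with the function it decorates as its setter argument. It stores the setter method it has been given (which can now be used in the __set__() method), and returns the descriptor itself. The return value matters because the decorated name (extension) is rebound to whatever the decorator returns: if we returned the setter function, the class attribute would end up as a plain function rather than the Property object, and assignments would no longer go through __set__(). The built-in property's setter() method likewise returns a property object rather than the function it was given.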
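Assembled into one piece, the descriptor and its client class can be exercised as follows. This is a sketch in which setter() returns the descriptor itself, so that the decorated name stays bound to the Property object; the variable names file and message are chosen here for illustration.

```python
class Property:

    def __init__(self, getter, setter=None):
        self.__getter = getter
        self.__setter = setter
        self.__name__ = getter.__name__

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return self.__getter(instance)

    def __set__(self, instance, value):
        if self.__setter is None:
            raise AttributeError("'{0}' is read-only".format(
                                 self.__name__))
        return self.__setter(instance, value)

    def setter(self, setter):
        self.__setter = setter
        return self          # keep the descriptor bound to the decorated name


class NameAndExtension:

    def __init__(self, name, extension):
        self.__name = name
        self.extension = extension

    @Property
    def name(self):
        return self.__name

    @Property
    def extension(self):
        return self.__extension

    @extension.setter
    def extension(self, extension):
        self.__extension = extension


file = NameAndExtension("report", "txt")
file.extension = "pdf"                 # goes through Property.__set__()

try:
    file.name = "other"                # no setter was registered for name
except AttributeError as err:
    message = str(err)
```

After the class body has run, NameAndExtension.extension is still a Property instance, so both reads and writes of extension are routed through the descriptor.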

We have now looked at three quite different uses of descriptors. Descriptors are a very powerful and flexible feature that can be used to do lots of under-the-hood work while appearing to be simple attributes in their client (owner) class.
