Advanced attribute access patterns

When many C++ and Java programmers first learn Python, they are surprised by Python's lack of a private keyword. The nearest concept is name mangling. Every time an attribute is prefixed by __, it is renamed by the interpreter on the fly:

class MyClass:
    __secret_value = 1

Accessing the __secret_value attribute by its initial name will raise an AttributeError exception:

>>> instance_of = MyClass()
>>> instance_of.__secret_value
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'MyClass' object has no attribute '__secret_value'
>>> dir(MyClass)
['_MyClass__secret_value', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__']
>>> instance_of._MyClass__secret_value
1

This feature is provided to avoid name collision under inheritance, as the attribute is renamed with the class name as a prefix. It is not a real lock, since the attribute can be accessed through its composed name. This feature could be used to protect the access of some attributes, but in practice, __ should never be used. When an attribute is not public, the convention to use is a _ prefix. This does not call any mangling algorithm, but just documents the attribute as a private element of the class and is the prevailing style.

Other mechanisms are available in Python to build the public part of the class together with the private code. The descriptors and properties that are the key features to OOP design should be used to design a clean API.

Descriptors

A descriptor lets you customize what should be done when you refer to an attribute on an object.

Descriptors are the base of a complex attribute access in Python. They are used internally to implement properties, methods, class methods, static methods, and the super type. They are classes that define how attributes of another class can be accessed. In other words, a class can delegate the management of an attribute to another one.

The descriptor classes are based on three special methods that form the descriptor protocol:

  • __set__(self, obj, type=None): This is called whenever the attribute is set. In the following examples, we will refer to this as a setter.
  • __get__(self, obj, value): This is called whenever the attribute is read (referred to as a getter).
  • __delete__(self, obj): This is called when del is invoked on the attribute.

A descriptor that implements __get__() and __set__() is called a data descriptor. If it just implements __get__(), then it is called a non-data descriptor.

Methods of this protocol are in fact called by the object's special __getattribute__() method (do not confuse it with __getattr__(), which has a different purpose) on every attribute lookup. Whenever such a lookup is performed, either by using dotted notation in the form of instance.attribute or by using the getattr(instance, 'attribute') function call, the __getattribute__() method is implicitly invoked and it looks for an attribute in the following order:

  1. It verifies if the attribute is a data descriptor on the class object of the instance.
  2. If not, it looks to see if the attribute can be found in the __dict__ of the instance object.
  3. Finally, it looks to see if the attribute is a non-data descriptor on the class object of the instance.

In other words, data descriptors take precedence over __dict__ lookup and __dict__ lookup takes precedence over non-data descriptors.

To make it more clear, here is an example from the official Python documentation that shows how descriptors work on real code:

class RevealAccess(object):
    """A data descriptor that sets and returns values
       normally and prints a message logging their access.
    """

    def __init__(self, initval=None, name='var'):
        self.val = initval
        self.name = name

    def __get__(self, obj, objtype):
        print('Retrieving', self.name)
        return self.val

    def __set__(self, obj, val):
        print('Updating', self.name)
        self.val = val


class MyClass(object):
    x = RevealAccess(10, 'var "x"')
    y = 5

And here is an example of using it in the interactive session:

>>> m = MyClass()
>>> m.x
Retrieving var "x"
10
>>> m.x = 20
Updating var "x"
>>> m.x
Retrieving var "x"
20
>>> m.y
5

The preceding example clearly shows that if a class has the data descriptor for the given attribute, then the descriptor's __get__() method is called to return the value every time the instance attribute is retrieved, and __set__() is called whenever a value is assigned to such an attribute. Although the case for the descriptor's __del__ method is not shown in the preceding example, it should be obvious now: it is called whenever an instance attribute is deleted with the del instance.attribute statement or the delattr(instance, 'attribute') call.

The difference between data and non-data descriptors is important due to the fact stated at the beginning. Python already uses the descriptor protocol to bind class functions to instances as a methods. They also power the mechanism behind the classmethod and staticmethod decorators. This is because, in fact, the function objects are non-data descriptors too:

>>> def function(): pass
>>> hasattr(function, '__get__')
True
>>> hasattr(function, '__set__')
False

And this is also true for functions created with lambda expressions:

>>> hasattr(lambda: None, '__get__')
True
>>> hasattr(lambda: None, '__set__')
False

So, without __dict__ taking precedence over non-data descriptors, we would not be able to dynamically override specific methods on already constructed instances at runtime. Fortunately, thanks to how descriptors work in Python. It is available, so developers may use a popular technique called monkey-patching to change the way how instances work without the need of subclassing.

Real-life example – lazily evaluated attributes

One example usage of descriptors may be to delay initialization of the class attribute to the moment when it is accessed from the instance. This may be useful if initialization of such attributes depends on the global application context. The other case is when such initialization is simply expensive but it is not known whether it will be used anyway when the class is imported. Such a descriptor could be implemented as follows:

class InitOnAccess:
    def __init__(self, klass, *args, **kwargs):
        self.klass = klass
        self.args = args
        self.kwargs = kwargs
        self._initialized = None

    def __get__(self, instance, owner):
        if self._initialized is None:
            print('initialized!')
            self._initialized = self.klass(*self.args, **self.kwargs)
        else:
            print('cached!')
        return self._initialized

And here is example usage:

>>> class MyClass:
...     lazily_initialized = InitOnAccess(list, "argument")
...
>>> m = MyClass()
>>> m.lazily_initialized
initialized!
['a', 'r', 'g', 'u', 'm', 'e', 'n', 't']
>>> m.lazily_initialized
cached!
['a', 'r', 'g', 'u', 'm', 'e', 'n', 't']

The official OpenGL Python library available on PyPI under the PyOpenGL name uses a similar technique to implement lazy_property that is both a decorator and a data descriptor:

class lazy_property(object):
    def __init__(self, function):
        self.fget = function

    def __get__(self, obj, cls):
        value = self.fget(obj)
        setattr(obj, self.fget.__name__, value)
        return value

Such an implementation is similar to using the property decorator (described later), but the function that is wrapped with it is executed only once and then the class attribute is replaced with a value returned by such a property. Such a technique is often useful when the developer needs to fulfill the following two requirements at the same time:

  • An object instance needs to be stored as a class attribute shared between its instances to save resources
  • This object cannot be initialized on import time because its creation process depends on some global application state/context

In the case of applications written using OpenGL, this is very often true. For example, the creation of shaders in OpenGL is expensive because it requires compilation of code written in GLSL (OpenGL Shading Language). It is reasonable to create them only once and include their definition in close proximity to classes that require them. On the other hand, shader compilation cannot be performed without having initialized the OpenGL context, so it is hard to define and compile them reliably in global module namespace at import time.

The following example shows the possible usage of the modified version of PyOpenGL's lazy_property decorator (here lazy_class_attribute) in some imaginary OpenGL-based application. The highlighted change to the original lazy_property decorator was required in order to allow sharing the attribute between different class instances:

import OpenGL.GL as gl
from OpenGL.GL import shaders


class lazy_class_attribute(object):
    def __init__(self, function):
        self.fget = function
    def __get__(self, obj, cls):
        value = self.fget(obj or cls)
        # note: storing in class object not its instance
        #       no matter if its a class-level or
        #       instance-level access
        setattr(cls, self.fget.__name__, value)
        return value


class ObjectUsingShaderProgram(object):
    # trivial pass-through vertex shader implementation
    VERTEX_CODE = """
        #version 330 core
        layout(location = 0) in vec4 vertexPosition;
        void main(){
            gl_Position =  vertexPosition;
        }
    """
    # trivial fragment shader that results in everything
    # drawn with white color
    FRAGMENT_CODE = """
        #version 330 core
        out lowp vec4 out_color;
        void main(){
            out_color = vec4(1, 1, 1, 1);
        }
    """

    @lazy_class_attribute
    def shader_program(self):
        print("compiling!")
        return shaders.compileProgram(
            shaders.compileShader(
                self.VERTEX_CODE, gl.GL_VERTEX_SHADER
            ),
            shaders.compileShader(
                self.FRAGMENT_CODE, gl.GL_FRAGMENT_SHADER
            )
        )

Like every advanced Python syntax feature, this one should also be used with caution and documented well in code. For unexperienced developers, the altered class behavior might be very confusing and unexpected because descriptors affect the very basic part of class behavior such as attribute access. Because of that, it is very important to make sure that all team members are familiar with descriptors and understand this concept well if it plays an important role in the project's codebase.

Properties

The properties provide a built-in descriptor type that knows how to link an attribute to a set of methods. A property takes four optional arguments: fget, fset, fdel, and doc. The last one can be provided to define a docstring that is linked to the attribute as if it were a method. Here is an example of a Rectangle class that can be controlled either by direct access to attributes that store two corner points or by using the width, and height properties:

class Rectangle:
    def __init__(self, x1, y1, x2, y2):
        self.x1, self.y1 = x1, y1
        self.x2, self.y2 = x2, y2

    def _width_get(self):
        return self.x2 - self.x1

    def _width_set(self, value):
        self.x2 = self.x1 + value

    def _height_get(self):
        return self.y2 - self.y1

    def _height_set(self, value):
        self.y2 = self.y1 + value

    width = property(
        _width_get, _width_set,
        doc="rectangle width measured from left"
    )
    height = property(
        _height_get, _height_set,
        doc="rectangle height measured from top"
    )

    def __repr__(self):
        return "{}({}, {}, {}, {})".format(
            self.__class__.__name__,
            self.x1, self.y1, self.x2, self.y2
        )

The example usage of such defined properties in an interactive session is as follows:

>>> rectangle = Rectangle(10, 10, 25, 34)
>>> rectangle.width, rectangle.height
(15, 24)
>>> rectangle.width = 100
>>> rectangle
Rectangle(10, 10, 110, 34)
>>> rectangle.height = 100
>>> rectangle
Rectangle(10, 10, 110, 110)
help(Rectangle)
Help on class Rectangle in module chapter3:

class Rectangle(builtins.object)
 |  Methods defined here:
 |  
 |  __init__(self, x1, y1, x2, y2)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __repr__(self)
 |      Return repr(self).
 |  
 |  --------------------------------------------------------
 |  Data descriptors defined here:
 |  (...)
 |  
 |  height
 |      rectangle height measured from top
 |  
 |  width
 |      rectangle width measured from left

The properties make it easier to write descriptors, but must be handled carefully when using inheritance over classes. The created attribute is made on the fly using the methods of the current class and will not use methods that are overridden in the derived classes.

For instance, the following example will fail to override the implementation of the fget method of the parent's class (Rectangle) width property:

>>> class MetricRectangle(Rectangle):
...     def _width_get(self):
...         return "{} meters".format(self.x2 - self.x1)
...         
>>> Rectangle(0, 0, 100, 100).width
100

In order to solve this, the whole property simply needs to be overwritten in the derived class:

>>> class MetricRectangle(Rectangle):
...     def _width_get(self):
...         return "{} meters".format(self.x2 - self.x1)
...     width = property(_width_get, Rectangle.width.fset)
...     
>>> MetricRectangle(0, 0, 100, 100).width
'100 meters'

Unfortunately, the preceding code has some maintainability issues. It can be a source of issue if the developer decides to change the parent class, but forgets about updating the property call. This is why overriding only parts of the property behavior is not advised. Instead of relying on the parent class's implementation, it is recommended to rewrite all the property methods in the derived classes, if there is need to change how they work. In most cases, this is the only option anyway, because usually the change to property setter behavior implies a change to the behavior of the getter as well.

Due to the preceding reason, the best syntax for creating properties is using property as a decorator. This will reduce the number of method signatures inside of the class and make code more readable and maintainable:

class Rectangle:
    def __init__(self, x1, y1, x2, y2):
        self.x1, self.y1 = x1, y1
        self.x2, self.y2 = x2, y2
    @property
    def width(self):
        """rectangle height measured from top"""
        return self.x2 - self.x1

    @width.setter
    def width(self, value):
        self.x2 = self.x1 + value

    @property
    def height(self):
        """rectangle height measured from top"""
        return self.y2 - self.y1
    
    @height.setter
    def height(self, value):
        self.y2 = self.y1 + value

Slots

An interesting feature that is almost never used by developers is slots. They allow you to set a static attribute list for a given class with the __slots__ attribute, and skip the creation of the __dict__ dictionary in each instance of the class. They were intended to save memory space for classes with very few attributes, since __dict__ is not created at every instance.

Besides this, they can help to design classes whose signature needs to be frozen. For instance, if you need to restrict the dynamic features of the language over a class, defining slots can help:

>>> class Frozen:
...     __slots__ = ['ice', 'cream']
...     
>>> '__dict__' in dir(Frozen)
False
>>> 'ice' in dir(Frozen)
True
>>> frozen = Frozen()
>>> frozen.ice = True
>>> frozen.cream = None
>>> frozen.icy = True
Traceback (most recent call last):
  File "<input>", line 1, in <module>
AttributeError: 'Frozen' object has no attribute 'icy'

This feature should be used carefully. When a set of available attributes is limited using __slots__, it is much harder to add something to the object dynamically. Some techniques, such as monkey-patching, will not work with instances of classes that have slots defined. Fortunately, the new attributes can be added to the derived class if it does not have its own slots defined:

>>> class Unfrozen(Frozen):
...     pass
...     
>>> unfrozen = Unfrozen()
>>> unfrozen.icy = False
>>> unfrozen.icy
False
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset