Chapter 12. Optimization – Some Powerful Techniques

Optimizing a program is not a magical process. It is done by following a simple algorithm, summarized by Stefan Schwarzer at EuroPython 2006 in his original pseudocode example:

def optimize():
    """Recommended optimization"""
    assert got_architecture_right(), "fix architecture"
    assert made_code_work(bugs=None), "fix bugs"
    while code_is_too_slow():
        wbn = find_worst_bottleneck(just_guess=False,
                                    profile=True)
        is_faster = try_to_optimize(wbn,
                                    run_unit_tests=True,
                                    new_bugs=None)
        if not is_faster:
            undo_last_code_change()

# By Stefan Schwarzer, Europython 2006

This example is probably not the neatest or clearest one, but it captures pretty much all the important aspects of an organized optimization procedure. The main things we learn from it are:

  • Optimization is an iterative process where not every iteration leads to better results
  • The main prerequisite is code that is verified to be working properly with tests
  • You should always focus on optimizing the current application bottleneck

Making your code work faster is not an easy task. In the case of abstract mathematical problems, the solution of course lies in choosing the right algorithm and the proper data structures. But in that case, it is very hard to provide generic tips and tricks that can be used in any code for solving algorithmic issues. There are, of course, some generic methodologies for designing new algorithms, and even metaheuristics that can be applied to a large variety of problems, but they are pretty much language-agnostic and thus rather out of the scope of this book.

However, some performance issues are caused only by certain code quality defects or by the application usage context. For instance, the speed of the application might be reduced by:

  • Bad usage of basic built-in types
  • Too much complexity
  • Hardware resource usage patterns that do not match the execution environment
  • Waiting too long for responses from third-party APIs or backing services
  • Doing too much in time-critical parts of the application

More often than not, solving such performance issues does not require advanced academic knowledge, only good software craftsmanship. And a big part of craftsmanship is knowing when to use the proper tools. Fortunately, there are some well-known patterns and solutions for dealing with performance problems.

In this chapter, we will discuss some popular and reusable solutions that allow you to non-algorithmically optimize your program through:

  • Reducing the complexity
  • Using architectural trade-offs
  • Caching

Reducing the complexity

Before we dig further into optimization techniques, let's define exactly what we are going to deal with. From the chapter's introduction, we know that focusing on improving application bottlenecks is critical for successful optimization. A bottleneck is a single component that severely limits the capacity of a program or computer system. An important characteristic of code with performance issues is that it usually has only a single significant bottleneck. We discussed some profiling techniques in the previous chapter, so you should already be familiar with the tools required to locate and isolate such places. If your profiling results show a few places in need of immediate improvement, you should first try to treat each of them as a separate component and optimize it independently.

Of course, if there is no explicit bottleneck but your application still performs under your expectations, then you are really in a bad position. The gains of the optimization process are proportional to the performance impact of optimized bottlenecks. Optimizing every small component that does not substantially contribute to the overall execution time or resource consumption will only give you minimal benefit for all the time spent on profiling and optimization. If your application does not seem to have real bottlenecks, there is a possibility that you have missed something. Try using different profiling strategies or tools or look at it from a different perspective (memory, I/O operations, or network throughput). If that does not help, you should really consider revising your software architecture.

But if you have successfully found a single, well-isolated component that limits your application performance, then you are really lucky. There is a high chance that, with only minimal code improvement, you will be able to substantially improve code execution time and/or resource usage. And the gain from optimization will, again, be proportional to the bottleneck size.

The first and most obvious aspect to look at when trying to improve application performance is complexity. There are many definitions of what makes a program complex and many ways to express it. Some complexity metrics provide objective information about how the code behaves, and such information can sometimes be extrapolated into performance expectations. An experienced programmer can even reliably guess how two different implementations will perform in practice, knowing their complexities and realistic execution contexts.

The two popular ways to define application complexity are:

  • Cyclomatic complexity, which is very often correlated with application performance
  • The Landau notation, also known as big O notation, which is an algorithm classification method that is very useful in objectively judging performance

From there, the optimization process may sometimes be understood as a process of reducing complexity. This section provides simple tips for this kind of work by simplifying loops. But first of all, let's learn how to measure complexity.

Cyclomatic complexity

Cyclomatic complexity is a metric developed by Thomas J. McCabe in 1976. Because of its author, it is very often called McCabe's complexity. It measures the number of linearly independent paths through the code. All decision points, such as if statements and for and while loops, are counted to come up with the measure.

The code can then be categorized as follows:

Cyclomatic complexity    What it means
---------------------    ------------------
1 to 10                  Not complex
11 to 20                 Moderately complex
21 to 50                 Really complex
More than 50             Too complex

Cyclomatic complexity is more of a code quality score than a metric that objectively judges performance. It does not replace the need for code profiling to find performance bottlenecks. Still, code with high cyclomatic complexity often tends to use rather complex algorithms that may not perform well with larger inputs.

Although cyclomatic complexity is not a reliable way to judge application performance, it has one very nice advantage. It is a source code metric, so it can be measured with proper tools. This cannot be said about the other way of expressing complexity, the big O notation. Thanks to its measurability, cyclomatic complexity may be a useful addition to profiling because it gives you more information about problematic parts of the software. Complex parts of code are the first candidates to review when considering radical code architecture redesigns.

Measuring McCabe's complexity is relatively simple in Python because it can be deduced from the abstract syntax tree of the code. Of course, you don't need to do that by yourself. A popular tool that provides cyclomatic complexity measurement for Python is flake8 (with the mccabe plugin), which was already introduced in Chapter 4, Choosing Good Names.
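To see how the score is counted, consider the following small function (a made-up example, not taken from any real project). Every decision point adds one to the base score of one, so counting by hand gives a cyclomatic complexity of 5. Running flake8 with the --max-complexity option (which activates the mccabe plugin) flags any function whose score exceeds the threshold you choose:

def classify(numbers):
    """Return a rough label for a sequence of numbers."""
    # Base complexity: 1
    if not numbers:           # +1 for the if statement
        return 'empty'
    total = 0
    for number in numbers:    # +1 for the for loop
        if number > 0:        # +1 for the nested if
            total += number
    if total > 0:             # +1 for the final if
        return 'positive'
    return 'non-positive'
    # Cyclomatic complexity: 1 + 4 = 5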

The big O notation

The most canonical method to define function complexity is the big O notation (see http://en.wikipedia.org/wiki/Big_O_notation). This metric defines how an algorithm is affected by the size of the input data. For instance, does the algorithm scale linearly with the size of the input data or quadratically?

Calculating the big O notation manually for an algorithm is the best approach to get an overview of how its performance relates to the size of the input data. Knowing the complexity of your application's components gives you the ability to detect and focus on the parts that will really slow down the code.

To measure the big O notation, all constants and low-order terms are removed in order to focus on the portion that really matters when the input data grows. The idea is to try to categorize the algorithm into one of the following categories, even if only as an approximation:

Notation      Type
--------      ----------------------------------------------
O(1)          Constant. Does not depend on the input data.
O(n)          Linear. Will grow as "n" grows.
O(n log n)    Quasi-linear.
O(n²)         Quadratic complexity.
O(n³)         Cubic complexity.
O(n!)         Factorial complexity.

For instance, we already know from Chapter 2, Syntax Best Practices – below the Class Level, that a dict lookup has an average complexity of O(1). It is considered constant, regardless of how many elements are in the dict, whereas looking through a list for a particular item is O(n).
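The difference is easy to observe empirically. Here is a minimal sketch (the container size and the number of repetitions are arbitrary choices for illustration) that uses the standard timeit module to compare membership tests in a list and a dict of the same size:

import timeit

# Build a list and a dict that hold the same one million keys.
setup = """
items = list(range(1_000_000))
mapping = {i: None for i in items}
"""

# Searching for an element near the end of the list has to scan
# almost all elements, which is the O(n) behavior.
list_time = timeit.timeit('999_999 in items', setup=setup, number=100)

# A dict lookup uses hashing, so it stays O(1) on average no matter
# how many elements are stored.
dict_time = timeit.timeit('999_999 in mapping', setup=setup, number=100)

print(f'list membership: {list_time:.4f}s')
print(f'dict membership: {dict_time:.4f}s')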

Let's take another example:

>>> def function(n):
...     for i in range(n):
...         print(i)
...

In that case, the print statement will be executed n times. Loop speed will depend on n, so its complexity expressed using the big O notation will be O(n).

If the function has conditions, the correct notation to keep is the highest one:

>>> def function(n, some_test):
...     if some_test:
...         print('something')
...     else:
...         for i in range(n):
...             print(i)
...

In this example, the function could be O(1) or O(n), depending on the value of the some_test argument. The worst case is O(n), so the whole function's complexity is O(n).

When discussing complexity expressed in the big O notation, we usually review the worst-case scenario. While this is the best method to define complexity when comparing two independent algorithms, it may not be the best approach in every practical situation. Many algorithms change their runtime performance depending on the statistical characteristics of the input data, or amortize the cost of worst-case operations with clever tricks. This is why, in many cases, it may be better to review your implementation in terms of average complexity or amortized complexity.

For example, take a look at the operation of appending a single element to an instance of Python's list type. We know that list in CPython uses an array with overallocation for its internal storage instead of a linked list. If the array is already full, appending a new element requires allocating a new array and copying all existing elements (references) to a new area in memory. If we look at this from the point of view of worst-case complexity, it is clear that the list.append() method has O(n) complexity, which is a bit expensive compared to a typical implementation of a linked list structure.

But we also know that the CPython list type implementation uses overallocation to mitigate the cost of such occasional reallocations. If we evaluate the complexity over a sequence of operations, we will see that the average (amortized) complexity of list.append() is O(1), which is actually a great result.
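You can observe the overallocation with a small sketch like the one below (the exact sizes printed are an implementation detail of the CPython version you run, so treat the output as illustrative only):

import sys

items = []
previous_size = sys.getsizeof(items)
print(f'0 elements: {previous_size} bytes')

for i in range(20):
    items.append(i)
    size = sys.getsizeof(items)
    if size != previous_size:
        # The internal buffer grew: CPython reserved room for more
        # elements than we appended, so most append() calls do not
        # need to reallocate anything.
        print(f'{len(items)} elements: {size} bytes')
        previous_size = size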

When solving problems, we often know a lot about our input data, such as its size or statistical distribution. When optimizing the application, it is always worth using every bit of that knowledge. Here, another problem with worst-case complexity shows up. The notation is intended to show the limiting behavior of a function when the input tends toward large values or infinity, rather than give a reliable performance approximation for real-life data. Asymptotic notation is great for defining the growth rate of a function, but it won't give a reliable answer to the simple question: which implementation will take less time? Worst-case complexity discards all those little details about both your implementation and your data characteristics in order to show how your program will behave asymptotically. It works for arbitrarily large inputs that you may not even need to consider.

For instance, let's assume that you have a problem to solve regarding data consisting of n independent elements. Let's also suppose that you know two different ways to solve it: program A and program B. You know that program A requires 100n² operations to finish and program B requires 5n³ operations to provide a solution. Which one would you choose? When speaking about very large inputs, program A is of course the better choice because it behaves better asymptotically: it has O(n²) complexity compared to the O(n³) complexity that characterizes program B.

But by solving the simple inequality 100n² > 5n³, which reduces to n < 20, we can find that program B takes fewer operations whenever n is less than 20. So, if we know a bit more about our input bounds, we can make slightly better decisions.
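A few lines of code make the same point numerically (the operation counts are the hypothetical ones from the example above):

# Hypothetical operation counts from the example above.
def program_a_ops(n):
    return 100 * n ** 2

def program_b_ops(n):
    return 5 * n ** 3

for n in (5, 10, 19, 20, 50, 1000):
    a, b = program_a_ops(n), program_b_ops(n)
    winner = 'A' if a < b else 'B' if b < a else 'tie'
    print(f'n={n}: A needs {a} ops, B needs {b} ops, cheaper: {winner}')

# Program B needs fewer operations for every n below 20, even though
# program A is the asymptotically better choice.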
