Simplifying

To reduce the complexity of code, the way data is stored is fundamental. You should pick your data structure carefully. This section provides a few examples on how the performance of simple code snippets can be improved by the proper datatypes for the job.

Searching in a list

Due to implementation details of the list type in Python, searching for a specific value in a list isn't a cheap operation. The complexity of the list.index() method is O(n), where n is the number of list elements. Such linear complexity is not especially bad if you don't need to perform many element index lookups, but it can have a negative performance impact if there is a need for many such operations.

If you need fast search over a list, you can try the bisect module from the Python standard library. The functions in this module are mainly designed for inserting or finding insertion indexes for given values in a way that will preserve the order of the already sorted sequence. Anyway, they can be used for efficiently finding element indexes with a bisection algorithm. Here is the recipe from the official documentation of the function that finds the element index using a binary search:

def index(a, x):
    'Locate the leftmost value exactly equal to x'
    i = bisect_left(a, x)
    if i != len(a) and a[i] == x:
        return i
    raise ValueError

Note that every function from the bisect module requires a sorted sequence in order to work. If your list is not in the correct order, then sorting it is a task with at least O(n log n) complexity. This is a worse class than O(n), so sorting the whole list for performing only a single search will definitely not pay off. However, if you need to perform a lot of index searches in a huge list that does not need to change often, then using a single sort operation bisect may be a very good trade off.

Also, if you already have a sorted list, you can insert new items into that list using bisect without needing to re-sort it.

Using a set instead of a list

When you need to build a sequence of distinct values out of a given sequence, the first algorithm that might come to your mind is:

>>> sequence = ['a', 'a', 'b', 'c', 'c', 'd']
>>> result = []
>>> for element in sequence:
...     if element not in result:
...         result.append(element)
... 
>>> result
['a', 'b', 'c', 'd']

The complexity is introduced by the lookup in the result list with the in operator that has the time complexity, O(n). It is then used in the loop, which costs O(n). So, the overall complexity is quadratic—O(n2).

Using a set type for the same work will be faster because the stored values are looked up using hashes same as in the dict type. Also, set ensures the uniqueness of elements, so we don't need to do anything more but create a new set from our sequence object. In other words, for each value in sequence, the time taken to see if it is already in the set will be constant:

>>> sequence = ['a', 'a', 'b', 'c', 'c', 'd']
>>> result = set(sequence)
>>> result
set(['a', 'c', 'b', 'd'])

This lowers the complexity to O(n), which is the complexity of the set object creation. The additional advantage is shorter and more explicit code.

Note

When you try to reduce the complexity of an algorithm, carefully consider your data structures. There are a range of built-in types, so pick the right one.

Cut the external calls, reduce the workload

A part of the complexity is introduced by calls to other functions, methods, and classes. In general, get as much of the code out of the loops as possible. This is doubly important for nested loops. Don't recalculate over and over those things that can be calculated before the loop even begins. Inner loops should be tight.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset