Finding bottlenecks

Finding bottlenecks is done by:

  • Profiling CPU usage
  • Profiling memory usage
  • Profiling network usage

Profiling CPU usage

The first source of bottlenecks is your code. The standard library provides all the tools needed to perform code profiling. They are based on a deterministic approach.

A deterministic profiler measures the time spent in each function by adding a timer at the lowest level. This introduces a bit of overhead but provides a good idea of where the time is spent. A statistical profiler, on the other hand, samples the instruction pointer and does not instrument the code. The latter is less accurate but allows running the target program at full speed.

There are two ways to profile the code:

  • Macro-profiling: This profiles the whole program while it is being used and generates statistics
  • Micro-profiling: This measures a precise part of the program by instrumenting it manually

Macro-profiling

Macro-profiling is done by running the application in a special mode where the interpreter is instrumented to collect statistics on the code usage. Python provides several tools for this:

  • profile: This is a pure Python implementation
  • cProfile: This is a C implementation that provides the same interface as that of the profile tool but has less overhead

The recommended choice for most Python programmers is cProfile due to its reduced overhead. However, if you need to extend the profiler in some way, then profile will probably be a better choice because it does not rely on C extensions.

Both tools have the same interface and usage, so we will use only one of them to show how they work. The following is a myapp.py module with a main function that we are going to test with cProfile:

import time


def medium():
    time.sleep(0.01)


def light():
    time.sleep(0.001)


def heavy():
    for i in range(100):
        light()
        medium()
        medium()
    time.sleep(2)


def main():
    for i in range(2):
        heavy()

if __name__ == '__main__':
    main()

The module can be called directly from the prompt and the results are summarized here:

$ python3 -m cProfile myapp.py
         1208 function calls in 8.243 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    0.001    0.000    8.243    4.121 myapp.py:13(heavy)
        1    0.000    0.000    8.243    8.243 myapp.py:2(<module>)
        1    0.000    0.000    8.243    8.243 myapp.py:21(main)
      400    0.001    0.000    4.026    0.010 myapp.py:5(medium)
      200    0.000    0.000    0.212    0.001 myapp.py:9(light)
        1    0.000    0.000    8.243    8.243 {built-in method exec}
      602    8.241    0.014    8.241    0.014 {built-in method sleep}
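
The command-line interface also accepts a few options worth knowing. As a minimal sketch (both flags belong to the standard cProfile module), you can sort the report by a chosen column or dump the raw statistics to a file for later analysis with pstats:

$ python3 -m cProfile -s cumulative myapp.py
$ python3 -m cProfile -o myapp.stats myapp.py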

The statistics provided are a printed view of a statistics object filled by the profiler. A manual invocation of the profiler can look like this:

>>> import cProfile
>>> from myapp import main
>>> profiler = cProfile.Profile()
>>> profiler.runcall(main)
>>> profiler.print_stats()
         1206 function calls in 8.243 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall file:lineno(function)
        2    0.001    0.000    8.243    4.121 myapp.py:13(heavy)
        1    0.000    0.000    8.243    8.243 myapp.py:21(main)
      400    0.001    0.000    4.026    0.010 myapp.py:5(medium)
      200    0.000    0.000    0.212    0.001 myapp.py:9(light)
      602    8.241    0.014    8.241    0.014 {built-in method sleep}

The statistics can also be saved in a file and then read by the pstats module. This module provides a class that knows how to handle profile files and gives a few helpers to play with them:

>>> import pstats
>>> import cProfile
>>> from myapp import main
>>> cProfile.run('main()', 'myapp.stats')
>>> stats = pstats.Stats('myapp.stats')
>>> stats.total_calls
1208
>>> stats.sort_stats('time').print_stats(3)
Mon Apr  4 21:44:36 2016    myapp.stats

         1208 function calls in 8.243 seconds

   Ordered by: internal time
   List reduced from 8 to 3 due to restriction <3>

   ncalls  tottime  percall  cumtime  percall file:lineno(function)
      602    8.241    0.014    8.241    0.014 {built-in method sleep}
      400    0.001    0.000    4.025    0.010 myapp.py:5(medium)
        2    0.001    0.000    8.243    4.121 myapp.py:13(heavy)

From there, you can browse the code by printing out the callers and callees for each function:

>>> stats.print_callees('medium')
   Ordered by: internal time
   List reduced from 8 to 1 due to restriction <'medium'>

Function           called...
                    ncalls  tottime  cumtime
myapp.py:5(medium) ->  400    4.025    4.025  {built-in method sleep}

>>> stats.print_callees('light')
   Ordered by: internal time
   List reduced from 8 to 1 due to restriction <'light'>

Function           called...
                    ncalls  tottime  cumtime
myapp.py:9(light)  ->  200    0.212    0.212  {built-in method sleep}

Being able to sort the output allows you to work on different views to find the bottlenecks. For instance, consider the following scenarios (see the sketch after this list):

  • When the number of calls is really high and takes up most of the global time, the function or method is probably running in a loop. A possible optimization is moving the call to a different scope in order to reduce the number of operations
  • When a single function takes a very long time, a cache might be a good option, if possible
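
For example, continuing the earlier interpreter session, you can switch between views that correspond to these two scenarios simply by changing the sort key (a minimal sketch using standard pstats sort keys):

stats.sort_stats('calls').print_stats(5)       # functions called most often (possible loops)
stats.sort_stats('cumulative').print_stats(5)  # functions that dominate the total run time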

Another great way to visualize bottlenecks from profiling data is to transform them into diagrams (see Figure 1). Gprof2Dot (https://github.com/jrfonseca/gprof2dot) can be used to turn profiler data into a dot graph. You can install this simple script from PyPI using pip and use it on the stats as long as Graphviz (see http://www.graphviz.org/) is installed in your environment:

$ gprof2dot.py -f pstats myapp.stats | dot -Tpng -o output.png

The advantage of gprof2dot is that it tries to be language agnostic. It is not limited to Python profile or cProfile output and can read output from multiple other profilers, such as Linux perf, xperf, gprof, Java HPROF, and many others.


Figure 1 An example of profiling overview diagram generated with gprof2dot

Macro-profiling is a good way to detect the function that has a problem, or at least its neighborhood. When you have found it, you can jump to micro-profiling.

Micro-profiling

When the slow function is found, it is sometimes necessary to do more profiling work that tests just a part of the program. This is done by manually instrumenting a part of the code in a speed test.

For instance, the cProfile module can be used from a decorator:

>>> import tempfile, os, cProfile, pstats
>>> def profile(column='time', list=5):
...     def _profile(function):
...         def __profile(*args, **kw):
...             s = tempfile.mktemp()
...             profiler = cProfile.Profile()
...             profiler.runcall(function, *args, **kw)
...             profiler.dump_stats(s)
...             p = pstats.Stats(s)
...             p.sort_stats(column).print_stats(list)
...         return __profile
...     return _profile
...
>>> from myapp import main
>>> @profile('time', 6)
... def main_profiled():
...     return main()
...
>>> main_profiled()
Mon Apr  4 22:01:01 2016    /tmp/tmpvswuovz_

         1207 function calls in 8.243 seconds

   Ordered by: internal time
   List reduced from 7 to 6 due to restriction <6>

  ncalls  tottime  percall  cumtime  percall file:lineno(function)
     602    8.241    0.014    8.241    0.014 {built-in method sleep}
     400    0.001    0.000    4.026    0.010 myapp.py:5(medium)
       2    0.001    0.000    8.243    4.121 myapp.py:13(heavy)
     200    0.000    0.000    0.213    0.001 myapp.py:9(light)
       1    0.000    0.000    8.243    8.243 myapp.py:21(main)
       1    0.000    0.000    8.243    8.243 <stdin>:1(main_profiled)


>>> from myapp import light
>>> stats = profile()(light)
>>> stats()
Mon Apr  4 22:01:57 2016    /tmp/tmpnp_zk7dl

         3 function calls in 0.001 seconds

   Ordered by: internal time

  ncalls  tottime  percall  cumtime  percall file:lineno(function)
       1    0.001    0.001    0.001    0.001 {built-in method sleep}
       1    0.000    0.000    0.001    0.001 myapp.py:9(light)

This approach allows testing parts of the application and sharpens the statistics output. But at this stage, having a list of callees is probably not interesting, as the function has already been pointed out as the one to optimize. The only interesting information is how fast the function is, so that you can then improve it.

timeit fits this need better by providing a simple way to measure the execution time of a small code snippet with the best underlying timer the host system provides (time.perf_counter() by default in Python 3):

>>> import timeit
>>> t = timeit.Timer('main()', setup='from myapp import main')
>>> t.timeit(number=5)
5.6196951866149902

The module allows you to repeat the call and is designed for trying out isolated code snippets. This is very useful outside the application context, in a prompt, for instance, but is not really handy to use within an existing application.

Note

A deterministic profiler (and any timing test) will provide results depending on what the computer is doing, and so results may vary between runs. Repeating the same test multiple times and averaging the results provides more accurate figures. Furthermore, some computers have special CPU features, such as SpeedStep, that might change the results if the computer is idling when the test is launched (see http://en.wikipedia.org/wiki/SpeedStep). So, repeating the test several times is good practice for small code snippets. There are also various caches to keep in mind, such as DNS caches or CPU caches.
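
A minimal sketch of this practice with timeit.repeat() (the timed expression here is purely illustrative) could look like this:

import timeit

results = timeit.repeat(
    '"".join(str(i) for i in range(100))',
    repeat=5,
    number=10000,
)
# Several repetitions smooth out noise caused by other processes,
# CPU frequency scaling, and warm or cold caches.
print("best:", min(results), "average:", sum(results) / len(results))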

But the results of timeit should be used with caution. It is a very good tool to objectively compare two short snippets of code but it also allows you to easily make dangerous mistakes that will lead you to confusing conclusions. Here, for example, is the comparison of two innocent snippets of code with the timeit module that could make you think that string concatenation by addition is faster than the str.join() method:

$ python3 -m timeit -s 'a = map(str, range(1000))' '"".join(a)'
1000000 loops, best of 3: 0.497 usec per loop

$ python3 -m timeit -s 'a = map(str, range(1000)); s=""' 'for i in a: s += i'
10000000 loops, best of 3: 0.0808 usec per loop

From Chapter 2, Syntax Best Practices – below the Class Level, we know that string concatenation by addition is not a good pattern. Although there are some minor CPython micro-optimizations designed exactly for that use case, it will eventually lead to quadratic run time. The problem lies in nuances about the setup argument of timeit (the -s parameter on the command line) and how map() works in Python 3. I won't discuss all the details of the problem here (a short demonstration follows the corrected measurements below). Anyway, here is the correct way to compare string concatenation by addition with the str.join() idiom under Python 3:

$ python3 -m timeit -s 'a = [str(i) for i in range(10000)]' 's="".join(a)'
10000 loops, best of 3: 128 usec per loop

$ python3 -m timeit -s 'a = [str(i) for i in range(10000)]' '
>s = ""
>for i in a:
>    s += i
>'
1000 loops, best of 3: 1.38 msec per loop
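
To see where the trap lies, remember that map() in Python 3 returns a one-shot iterator and that the setup statement is executed only once per timing run, while the timed statement is executed millions of times. After the first repetition, the remaining runs therefore operate on an already exhausted iterator:

>>> a = map(str, range(1000))
>>> len("".join(a))  # the first pass consumes the iterator
2890
>>> len("".join(a))  # every following pass joins an empty sequence
0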

Measuring Pystones

When measuring execution time, the result depends on the computer hardware. To be able to produce a universal measure, the simplest way is to benchmark the speed of a fixed sequence of code and calculate a ratio out of it. From there, the time taken by a function can be translated to a universal value that can be compared on any computer.

Note

A lot of generic benchmarking tools for the measurement of computer performance are available. Surprisingly, some of them that were created many years ago are still used today. For instance, Whetstone was created in 1972, and back then it provided a computer performance analyzer in Algol 60 (see http://en.wikipedia.org/wiki/Whetstone_%28benchmark%29). It is used to measure the Millions Of Whetstone Instructions Per Second (MWIPS). A table of results for old and modern CPUs is maintained at http://freespace.virgin.net/roy.longbottom/whetstone%20results.htm.

Python provides a benchmark utility in its test package that measures the duration of a sequence of well-chosen operations. The result is a number of pystones per second the computer is able to perform and the time used to perform the benchmark, which is generally around one second on modern hardware:

>>> from test import pystone
>>> pystone.pystones()
(1.0500000000000007, 47619.047619047589)

The rate can be used to translate a profile duration into a number of pystones:

>>> from test import pystone
>>> benchtime, pystones = pystone.pystones()
>>> def seconds_to_kpystones(seconds):
...     return (pystones * seconds) / 1000
...
>>> seconds_to_kpystones(0.03)
1.4563106796116512
>>> seconds_to_kpystones(1)
48.543689320388381
>>> seconds_to_kpystones(2)
97.087378640776762

The seconds_to_kpystones() function returns the number of kilopystones. This conversion can be included in your tests if you want to code some speed assertions.

Having pystones will allow you to use such a decorator in tests (a sketch follows) so that you can set assertions on execution times. These tests will be runnable on any computer and will allow developers to prevent speed regressions. When a part of the application has been optimized, they will be able to set its maximum execution time in tests and make sure it won't be breached by further changes. This approach is, of course, not ideal and not 100% accurate, but it is at least better than hardcoding execution time assertions in raw values expressed in seconds.
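
Such a decorator is not shown in this chapter's transcripts, so here is a minimal sketch of what it could look like. The assert_max_kpystones() name is purely illustrative, and note that the test.pystone module is an internal CPython utility that has been removed from more recent Python releases:

import time

from test import pystone

BENCHTIME, PYSTONES = pystone.pystones()


def seconds_to_kpystones(seconds):
    return (PYSTONES * seconds) / 1000


def assert_max_kpystones(max_kpystones):
    """Fail if the decorated callable exceeds the given kilopystones budget."""
    def _assert(function):
        def _wrapper(*args, **kwargs):
            start = time.time()
            result = function(*args, **kwargs)
            used = seconds_to_kpystones(time.time() - start)
            assert used <= max_kpystones, (
                "%s used %.2f kPystones (limit is %.2f)"
                % (function.__name__, used, max_kpystones)
            )
            return result
        return _wrapper
    return _assert


@assert_max_kpystones(500)
def test_heavy_operation():
    # call the code whose speed should not regress
    ...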

Profiling memory usage

Another problem you may encounter when optimizing an application is memory consumption. If a program starts to eat so much memory that the system begins to swap, there is probably a place in your application where too many objects are created or objects that you don't intend to keep are still kept alive by some unintended reference. This is often easy to detect through classical profiling because consuming enough memory to make a system swap involves a lot of CPU work that can be detected. But sometimes it is not obvious and the memory usage has to be profiled.

How Python deals with memory

Memory usage is probably the hardest thing to profile in Python when you use the CPython implementation. While languages such as C allow you to get the memory size of any element, Python will never let you know how much a given object consumes. This is due to the dynamic nature of the language, and the fact that memory management is not directly accessible to the language user.

Some raw details of memory management were already explained in Chapter 7, Python Extensions in Other Languages. We already know that CPython uses reference counting to manage object allocation. This is a deterministic algorithm that ensures object deallocation will be triggered when the reference count of an object drops to zero. Despite being deterministic, this process is not easy to track manually and to reason about in complex codebases. Also, the deallocation of objects at the reference counting level does not necessarily mean that the actual process heap memory is freed by the interpreter. Depending on CPython interpreter compilation flags, the system environment, or the runtime context, the internal memory manager layer might decide to leave some blocks of free memory for future reallocation instead of releasing them completely.

Additional micro-optimizations in CPython implementation also make it even harder to predict actual memory usage. For instance, two variables that point to the same short string or small integer value might or might not point to the same object instance in memory.
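
A quick interactive check illustrates this behavior (the exact results may differ between CPython versions and between running in a script and in the interactive prompt):

>>> a = 10
>>> b = 10
>>> a is b                       # small integers are cached by CPython
True
>>> x = ''.join(['sp', 'am'])
>>> y = ''.join(['sp', 'am'])
>>> x is y                       # equal strings built at runtime are usually distinct objects
False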

Despite being quite scary and seemingly complex, memory management in Python is very well documented (refer to https://docs.python.org/3/c-api/memory.html). Note that the micro-optimizations mentioned earlier can, in most cases, be ignored when debugging memory issues. Also, reference counting is roughly based on a simple statement: if a given object is not referenced anymore, it is removed. In other words, all local references in a function are removed after the interpreter:

  • Leaves the function
  • Makes sure the object is not being used anymore

So, objects that remain in memory are:

  • Global objects
  • Objects that are still referenced in some way
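
A minimal check with the weakref module shows this in practice: once the function returns, the only (local) reference is gone and the object is deallocated immediately:

>>> import weakref
>>> class Resource:
...     pass
...
>>> def make_resource():
...     local = Resource()
...     return weakref.ref(local)
...
>>> ref = make_resource()
>>> ref() is None   # the object was freed as soon as the function returned
True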

Be careful with the edge case of default argument values. A default value is created once, when the function is defined, and stays alive on the function object, so the same object is shared between calls and any mutation of it survives the call. This can lead to unexpected results when a mutable object is used as a default value:

>>> def my_function(argument={}):  # bad practice
...     if '1' in argument:
...         argument['1'] = 2
...     argument['3'] = 4
...     return argument
... 
>>> my_function()
{'3': 4}
>>> res = my_function()
>>> res['4'] = 'I am still alive!'
>>> print(my_function())
{'3': 4, '4': 'I am still alive!'}

That is why immutable objects (or None) should always be used as default values, like this:

>>> def my_function(argument=None):  # better practice
...     if argument is None:
...         argument = {}  # a fresh dict is created every time
...     if '1' in argument:
...         argument['1'] = 2
...     argument['3'] = 4
...     return argument
... 
>>> my_function()
{'3': 4}
>>> res = my_function()
>>> res['4'] = 'I am still alive!'
>>> print(my_function())
{'3': 4}

Reference counting in Python is handy and frees you from the obligation of manually tracking object references, and therefore you don't have to manually destroy objects. Although this introduces another problem: since developers never clean up instances explicitly, memory might grow in an uncontrolled way if they don't pay attention to the way they use their data structures.

The usual memory eaters are:

  • Caches that grow uncontrolled
  • Object factories that register instances globally and do not keep track of their usage, such as a database connector creator used on the fly every time a query is called
  • Threads that are not properly finished
  • Objects with a __del__() method that are involved in a reference cycle are also memory eaters. In older versions of Python (prior to 3.4), the garbage collector would not break the cycle since it could not be sure which object should be deleted first. Hence, you would leak memory. Using this method is a bad idea in most cases.

Unfortunately, the management of reference counts must be done manually in C extensions using Python/C API with Py_INCREF() and Py_DECREF() macros. We discussed caveats of handling reference counts and reference ownership earlier in Chapter 7, Python Extensions in Other Languages, so you should already know that it is a pretty hard topic riddled with various pitfalls. This is the reason why most memory issues are caused by C extensions that are not written properly.

Profiling memory

Before starting to hunt down memory issues in Python, you should know that the nature of memory leaks in Python is quite special. In some compiled languages such as C and C++, memory leaks are almost exclusively caused by allocated memory blocks that are no longer referenced by any pointer. If you don't have a reference to memory, you cannot release it, and this very situation is called a memory leak. In Python, there is no low-level memory management available to the user, so we deal rather with leaking references: references to objects that are not needed anymore but were not removed. This stops the interpreter from releasing resources but is not the same situation as a memory leak in C. Of course, there is always the exceptional case of C extensions, but they are a different kind of beast that needs a completely different tool chain and cannot be easily inspected from Python code.

So, memory issues in Python are mostly caused by unexpected or unplanned resource acquisition patterns. It happens very rarely that they are an effect of real bugs caused by the mishandling of memory allocation and deallocation routines. Such routines are available to the developer only in CPython when writing C extensions with the Python/C API, and you will deal with them very rarely, if ever. Thus, most so-called memory leaks in Python are caused by the overblown complexity of the software and minor interactions between its components that are really hard to track. In order to spot and locate such deficiencies in your software, you need to know what the actual memory usage of the program looks like.

Getting information about how many objects are controlled by the Python interpreter and about their real size is a bit tricky. For instance, knowing how much a given object weighs in bytes would involve crawling down all its attributes, dealing with cross-references and then summing up everything. It's a pretty difficult problem if you consider the way objects tend to refer to each other. The gc module does not provide high-level functions for this, and it would require Python to be compiled in debug mode to have a full set of information.
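
To give a feel for the problem, here is a minimal and deliberately naive sketch of such a crawl built on sys.getsizeof() and gc.get_referents(); the total_size() helper is hypothetical, it ignores many subtleties (shared interpreter-level objects, memory allocator overhead), and its result should be treated as a rough estimate only:

import gc
import sys


def total_size(obj):
    """Roughly estimate the footprint of obj and everything it references."""
    seen = set()
    to_visit = [obj]
    size = 0
    while to_visit:
        current = to_visit.pop()
        if id(current) in seen:
            continue  # count objects reachable through several paths only once
        seen.add(id(current))
        size += sys.getsizeof(current)
        to_visit.extend(gc.get_referents(current))
    return size


print(total_size({'numbers': list(range(100)), 'label': 'example'}))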

Often, programmers just ask the system about the memory usage of their application before and after a given operation has been performed. But this measure is an approximation and depends a lot on how the memory is managed at the system level. Using the top command under Linux or the Task Manager under Windows, for instance, makes it possible to detect memory problems when they are obvious. But this approach is laborious and makes it really hard to track down the faulty code block.

Fortunately, there are a few tools available to make memory snapshots and calculate the number and size of loaded objects. But let's keep in mind that Python does not release memory easily, preferring to hold on to it in case it is needed again.

For some time, one of the most popular tools used for debugging memory issues and usage in Python was Guppy-PE and its Heapy component. Unfortunately, it seems to be no longer maintained and it lacks Python 3 support. Luckily, there are some other alternatives that are Python 3 compatible to some extent.

Note that compatibility information of that kind is usually based purely on the trove classifiers used by the latest distributions of the featured packages, so it could easily change in the time after this book was written. Nevertheless, there is one package that currently supports the widest spectrum of Python versions and is also known to work flawlessly under Python 3.5: objgraph. Its API seems to be a bit clumsy and it has a very limited set of functionalities, but it works, does what it needs to do well, and is really easy to use. Memory instrumentation is not something that is added to production code permanently, so this tool does not need to be pretty. Because of its wide support of Python versions and its OS independence, we will focus only on objgraph when discussing examples of memory profiling. The other tools mentioned in this section are also exciting pieces of software, but you will need to research them by yourself.

objgraph

objgraph (refer to http://mg.pov.lt/objgraph/) is a simple tool for creating diagrams of object references that should be useful when hunting memory leaks in Python. It is available on PyPI but it is not a completely standalone tool and requires Graphviz in order to create memory usage diagrams. For developer-friendly systems like Mac OS X or Linux, you can easily obtain it using your preferred system package manager. For Windows, you need to download the Graphviz installer from the project page (refer to http://www.graphviz.org/) and install it manually.

objgraph provides multiple utilities that allow you to list and print various statistics about memory usage and object counts. An example of such utilities in use is shown in the following transcript of an interpreter session:

>>> import objgraph
>>> objgraph.show_most_common_types()
function                   1910
dict                       1003
wrapper_descriptor         989
tuple                      837
weakref                    742
method_descriptor          683
builtin_function_or_method 666
getset_descriptor          338
set                        323
member_descriptor          305
>>> objgraph.count('list')
266
>>> objgraph.typestats(objgraph.get_leaking_objects())
{'Gt': 1, 'AugLoad': 1, 'GtE': 1, 'Pow': 1, 'tuple': 2, 'AugStore': 1, 'Store': 1, 'Or': 1, 'IsNot': 1, 'RecursionError': 1, 'Div': 1, 'LShift': 1, 'Mod': 1, 'Add': 1, 'Invert': 1, 'weakref': 1, 'Not': 1, 'Sub': 1, 'In': 1, 'NotIn': 1, 'Load': 1, 'NotEq': 1, 'BitAnd': 1, 'FloorDiv': 1, 'Is': 1, 'RShift': 1, 'MatMult': 1, 'Eq': 1, 'Lt': 1, 'dict': 341, 'list': 7, 'Param': 1, 'USub': 1, 'BitOr': 1, 'BitXor': 1, 'And': 1, 'Del': 1, 'UAdd': 1, 'Mult': 1, 'LtE': 1}

As already said, objgraph allows you to create diagrams of memory usage patterns and cross-references that link all the objects in the given namespace. The most useful diagramming utilities of that library are objgraph.show_refs() and objgraph.show_backrefs(). They both accept a reference to the object being inspected and save a diagram image to a file using the Graphviz package. Examples of such graphs are presented in Figure 2 and Figure 3.

Here is the code that was used to create these diagrams:

import objgraph


def example():
    x = []
    y = [x, [x], dict(x=x)]

    objgraph.show_refs(
        (x, y),
        filename='show_refs.png',
        refcounts=True
    )
    objgraph.show_backrefs(
        (x, y),
        filename='show_backrefs.png',
        refcounts=True
    )


if __name__ == "__main__":
    example()

Figure 2 shows the diagram of all references held by the x and y objects. From top to bottom and left to right, it presents exactly four objects:

  • y = [x, [x], dict(x=x)] list instance
  • dict(x=x) dictionary instance
  • [x] list instance
  • x = [] list instance

Figure 2 An example result of the show_refs() diagram from the example() function

Figure 3 shows not only references between x and y but also all the objects that hold references to these two instances. These are so-called back references, and they are really helpful in finding objects that stop other objects from being deallocated.


Figure 3 An example result of the show_backrefs() diagram from the example() function

In order to show how objgraph may be used in practice, let's review some practical examples. As we have already noted a few times in this book, CPython has its own garbage collector that exists independently of its reference counting method. It's not used for general-purpose memory management but only to solve the problem of cyclic references. In many situations, objects may reference each other in a way that would make it impossible to remove them using simple techniques based on tracking the number of references. Here is the simplest example:

x = []
y = [x]
x.append(y)

Such a situation is visually presented in Figure 4. In the preceding case, even if all external references to the x and y objects are removed (for instance, by returning from the local scope of a function), these two objects cannot be removed because they still hold cross-references to each other. This is the situation where the Python garbage collector steps in. It can detect cyclic references to objects and trigger their deallocation if there are no other valid references to these objects outside the cycle.


Figure 4 An example diagram of cyclic references between two objects

The real problem starts when at least one of the objects in such a cycle has a custom __del__() method defined. It is a custom deallocation handler that will be called when the object's reference count finally goes to zero. It can execute arbitrary Python code, and so can also create new references to the object being finalized. This is the reason why the garbage collector prior to Python 3.4 could not break reference cycles if at least one of the objects provided a custom __del__() method implementation. PEP 442 introduced safe object finalization to Python and became a part of the standard starting from Python 3.4. Anyway, this may still be a problem for packages that worry about backwards compatibility and target a wide spectrum of Python interpreter versions. The following snippet of code shows the differences in behavior of the cyclic garbage collector in different Python versions:

import gc
import platform
import objgraph


class WithDel(list):
    """ list subclass with custom __del__ implementation """
    def __del__(self):
        pass


def main():
    x = WithDel()
    y = []
    z = []

    x.append(y)
    y.append(z)
    z.append(x)

    del x, y, z

    print("unreachable prior collection: %s" % gc.collect())
    print("unreachable after collection: %s" % len(gc.garbage))
    print("WithDel objects count:        %s" %
          objgraph.count('WithDel'))


if __name__ == "__main__":
    print("Python version: %s" % platform.python_version())
    print()
    main()

The output of the preceding code, when executed under Python 3.3, shows that the cyclic garbage collector in the older versions of Python cannot collect objects that have the __del__() method defined:

$ python3.3 with_del.py 
Python version: 3.3.5

unreachable prior collection: 3
unreachable after collection: 1
WithDel objects count:        1

With a newer version of Python, the garbage collector can safely deal with finalization of objects even if they have the __del__() method defined:

$ python3.5 with_del.py 
Python version: 3.5.1

unreachable prior collection: 3
unreachable after collection: 0
WithDel objects count:        0

Although custom finalization is no longer tricky in the latest Python releases, it still poses a problem for applications that need to work under different environments. As mentioned earlier, the objgraph.show_refs() and objgraph.show_backrefs() functions allow you to easily spot problematic class instances. For instance, we can easily modify the main() function to show all back references to the WithDel instances in order to see if we have leaking resources:

def main():
    x = WithDel()
    y = []
    z = []

    x.append(y)
    y.append(z)
    z.append(x)

    del x, y, z

    print("unreachable prior collection: %s" % gc.collect())
    print("unreachable after collection: %s" % len(gc.garbage))
    print("WithDel objects count:        %s" %
          objgraph.count('WithDel'))

    objgraph.show_backrefs(
        objgraph.by_type('WithDel'),
        filename='after-gc.png'
    )

Running the preceding example under Python 3.3 will result in a diagram (see Figure 5), which shows that gc.collect() could not succeed in removing x, y, and z object instances. Additionally, objgraph highlights all the objects that have the custom __del__() method defined to make spotting such issues easier.


Figure 5 The diagram showing an example of cyclic references that can't be picked up by the Python garbage collector prior to version 3.4

C code memory leaks

If the Python code seems perfectly fine and the memory still increases when you loop through the isolated function, the leak might be located on the C side. This happens, for instance, when a Py_DECREF call is missing.

The Python core code is pretty robust and tested for leaks. If you use packages that have C extensions, they might be a good place to look first. Because you will be dealing with code operating on a much lower level of abstraction than Python, you need to use completely different tools to resolve such memory issues.

Memory debugging is not easy in C, so before diving into extension internals make sure to properly diagnose the source of your problem. A very popular approach is to isolate a suspicious package with code similar in nature to unit tests (a minimal sketch follows the list):

  • Write a separate test for each API unit or functionality of an extension you are suspecting to leak memory
  • Perform the test in a loop for an arbitrarily long time in isolation (one test per run)
  • Observe from outside which of the tested functionalities increase memory usage over time
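
A minimal sketch of such an isolation test could look like the following; my_extension and suspected_function are placeholders for the package and call you actually suspect, and resource.getrusage() is only available on Unix-like systems (on Linux, ru_maxrss is expressed in kilobytes):

import resource

from my_extension import suspected_function  # hypothetical extension under test


def test_suspected_function_for_leaks(iterations=100000, report_every=10000):
    for i in range(1, iterations + 1):
        suspected_function()
        if i % report_every == 0:
            rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
            # Steadily growing peak memory usage while looping over a single
            # functionality points at that functionality as the likely leak.
            print("%8d calls, max RSS: %s kB" % (i, rss))


if __name__ == "__main__":
    test_suspected_function_for_leaks()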

Using such an approach, you can isolate the faulty part of the extension and this will reduce the time required later to inspect and fix its code. This process may seem burdensome because it requires a lot of additional time and coding, but it really pays off in the long run. You can always ease your work by reusing some testing tools introduced in Chapter 10, Test-Driven Development. Utilities such as tox were perhaps not designed exactly for this case, but they can at least reduce the time required to run multiple tests in isolated environments.

Hopefully, you have isolated the part of the extension that is leaking memory and can finally start actual debugging. If you're lucky, a simple manual inspection of the source code may give the desired results. In many cases, the problem is as simple as adding the missing Py_DECREF call. Nevertheless, in most cases, our work is not that simple. In such situations, you need to bring out some bigger guns. One of the notable generic tools for fighting memory leaks in compiled code that should be in every programmer's toolbelt is Valgrind. It is a whole instrumentation framework for building dynamic analysis tools. Because of this, it may not be easy to learn and master, but you should definitely know the basics.

Profiling network usage

As I said earlier, an application that communicates with third-party programs such as databases, caches, web services, or an LDAP server can be slowed down when those applications are slow. This can be tracked with a regular code profiling method on the application side. But if the third-party software works fine on its own, the culprit is probably the network.

The problem might be a misconfigured hub, a low-bandwidth network link, or even a high number of traffic collisions that make computers send the same packets several times.

Here are a few pointers to get you started. To find out what is going on, you will usually need to investigate the network traffic itself, the health and configuration of the network devices involved, and the bandwidth that is actually available between the hosts.

If you want to go further on network performance issues, you might also want to read Network Performance Open Source Toolkit, Wiley, by Richard Blum. This book exposes strategies to tune the applications that are heavily using the network and provides a tutorial to scan complex network problems.

High Performance MySQL, O'Reilly Media, by Jeremy Zawodny is also a good book to read when writing an application that uses MySQL.
