Chapter 2. Pythonic Syntax, Common Pitfalls, and Style Guide

The design and development of the Python programming language have always been in the hands of its original author, Guido van Rossum, in many cases lovingly referred to as the Benevolent Dictator For Life (BDFL). Even though van Rossum is thought to have a time machine (he has repeatedly answered feature requests with "I just implemented that last night": http://www.catb.org/jargon/html/G/Guido.html), he is still just a human and needs help with the maintenance and development of Python. To facilitate that, the Python Enhancement Proposal (PEP) process has been developed. This process allows anyone to submit a PEP with a technical specification of the feature and a rationale to defend its usefulness. After a discussion on the Python mailing lists and possibly some improvements, the BDFL will make a decision to accept or reject the proposal.

The Python style guide (PEP 8: https://www.python.org/dev/peps/pep-0008/) was once submitted as one of those PEPs, and it has been accepted and improved regularly since. It has a lot of great and widely accepted conventions as well as a few disputed ones. In particular, the maximum line length of 79 characters is a topic of many discussions, although limiting a line to 79 characters does have some merits. The style guide alone does not make code Pythonic, however. As "The Zen of Python" (PEP 20: https://www.python.org/dev/peps/pep-0020/) elegantly says: "Beautiful is better than ugly." PEP 8 defines exactly how code should be formatted, while PEP 20 is more of a philosophy and mindset.

The common pitfalls are frequent mistakes, ranging from beginner errors to advanced ones: from using mutable objects such as lists or dictionaries as default function arguments to late-binding problems in closures. An even more important issue is how to work around circular imports in a clean way.

Some of the techniques used in the examples in this chapter might be a bit too advanced for such an early chapter, but please don't worry. This chapter is about style and common pitfalls. The inner workings of the techniques used will be covered in later chapters.

We will cover the following topics in this chapter:

  • Code style (PEP 8, pyflakes, flake8, and more)
  • Common pitfalls (lists as function arguments, pass by value versus pass by reference, and inheritance behavior)

Note

The definition of Pythonic code is highly subjective and mainly reflects the opinion of this author. When working on a project, it is more important to stay consistent with the coding styles of that project than with the coding guidelines given by Python or this book.

Code style – or what is Pythonic code?

Pythonic code: when you first hear of it, you might think it is a programming paradigm, similar to object-oriented or functional programming. While some of it could be considered as such, it is actually more of a design philosophy. Python leaves you free to choose to program in an object-oriented, procedural, functional, aspect-oriented, or even logic-oriented way. These freedoms make Python a great language to write in, but as always, freedom has the drawback of requiring a lot of discipline to keep the code clean and readable. The PEP8 standard tells us how to format code, but there is more to Pythonic code than syntax alone. That is what the Pythonic philosophy (PEP20) is all about. Pythonic code is:

  • Clean
  • Simple
  • Beautiful
  • Explicit
  • Readable

Most of these sound like common sense, and I think they should be. There are cases, however, where there is not a single obvious way to do it (unless you're Dutch, of course, as you'll read later in this chapter). That is the goal of this chapter: to learn what code is beautiful and why certain decisions have been made in the Python style guide.

Note

Some programmers once asked Guido van Rossum whether Python would ever support braces. Since that day, braces have been available through a __future__ import:

>>> from __future__ import braces
  File "<stdin>", line 1
SyntaxError: not a chance

Formatting strings – printf-style or str.format?

Python has supported both printf-style (%) and str.format for a long time, so you are most likely familiar with both already.

Within this book, printf-style formatting will be used for a few reasons:

  • The most important reason is that it comes naturally to me. I have been using printf in many different programming languages for about 20 years now.
  • The printf syntax is supported in most programming languages, which makes it familiar for a lot of people.
  • While only relevant for the purposes of the examples in this book, it takes up slightly less space, requiring fewer formatting changes. Unlike monitors, books have not gotten wider over the years.

In general, most people recommend str.format these days, but it mainly comes down to preference. The printf-style is simpler, while the str.format method is more powerful.

If you wish to learn more about how printf-style formatting can be replaced with str.format (or the other way around, of course), then I recommend the PyFormat site at https://pyformat.info/.
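To make the comparison concrete, here is a minimal sketch that produces the same strings with both styles. The variable names are purely illustrative; the point is that positional and named substitutions exist in both syntaxes.

```python
# The same output produced with printf-style formatting and with str.format
name = 'spam'
count = 3
price = 1.5

printf_style = '%s: %d items at %.2f each' % (name, count, price)
format_style = '{}: {:d} items at {:.2f} each'.format(name, count, price)

# Named placeholders: printf-style takes a mapping, str.format takes keywords
named_printf = '%(name)s x%(count)d' % {'name': name, 'count': count}
named_format = '{name} x{count}'.format(name=name, count=count)

print(printf_style)  # spam: 3 items at 1.50 each
print(named_format)  # spam x3
```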

PEP20, the Zen of Python

Most of the Pythonic philosophy can be explained through PEP20. Python has a nice little Easter egg to always remind you of PEP20. Simply type import this in a Python console and you will get the PEP20 lines. To quote PEP20:

"Long time Pythoneer Tim Peters succinctly channels the BDFL's guiding principles for Python's design into 20 aphorisms, only 19 of which have been written down."

The next few paragraphs will explain the intentions of these 19 lines.

Note

The examples within the PEP20 section do not necessarily all work identically, but they do serve the same purpose. Many of the examples here are fictional and serve no purpose other than explaining the rationale of the paragraph.

For clarity, let's see the output of import this before we begin:

>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Beautiful is better than ugly

While beauty is quite subjective, there are some Python style rules to adhere to: limiting line lengths, keeping statements on separate lines, splitting imports on separate lines, and so on.

In short, instead of a somewhat complex function such as this:

def filter_modulo(items, modulo):
    output_items = []
    for i in range(len(items)):
        if items[i] % modulo:
            output_items.append(items[i])
    return output_items

Or this:

filter_modulo = lambda i, m: [i[j] for j in range(len(i))
                              if i[j] % m]

Just do the following:

def filter_modulo(items, modulo):
    for item in items:
        if item % modulo:
            yield item

Simpler, easier to read, and a bit more beautiful!

Note

These examples are not identical in results. The first two return lists whereas the last returns a generator. Generators will be discussed more thoroughly in Chapter 6, Generators and Coroutines – Infinity, One Step at a Time.
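As a small sketch of that difference (the input list is just example data), the generator version computes nothing until it is consumed; wrapping it in list() gives the same result as the list-based versions:

```python
def filter_modulo(items, modulo):
    # Yield only the items that are not evenly divisible by modulo
    for item in items:
        if item % modulo:
            yield item

numbers = [1, 2, 3, 4, 5, 6]
result = filter_modulo(numbers, 2)  # a generator; nothing is computed yet
odd_numbers = list(result)          # consume it to get a list
print(odd_numbers)    # [1, 3, 5]
print(list(result))   # [] because the generator is already exhausted
```

Note the last line: a generator can only be iterated once, which is one of the practical differences from returning a list.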

Explicit is better than implicit

Imports, arguments, and variable names are just some of the many cases where explicit code is far easier to read at the cost of a little bit more effort and/or verbosity when writing the code.

Here is an example:

from spam import *
from eggs import *

some_function()

While this saves you some typing, it becomes quite difficult to see where some_function is defined. Is it defined in spam? In eggs? Perhaps in both modules? There are editors with advanced introspection that can help you here, but why not keep it explicit so that everyone (even when simply viewing the code online) can see what it's doing?

import spam
import eggs

spam.some_function()
eggs.some_function()

The added benefit is that we can explicitly call the function from either spam or eggs here, and everyone will have a better idea what the code does.

The same goes for functions with *args and **kwargs. They can be very useful at times, but they do have the downside of making it less obvious which arguments are valid for a function:

def spam(egg, *args, **kwargs):
    processed_egg = process_egg(egg, *args, **kwargs)
    return Spam(processed_egg)

Documentation can obviously help in cases like these, and I don't disagree with the usage of *args and **kwargs in general, but it is definitely a good idea to keep at least the most common arguments explicit. Even when it requires you to repeat the arguments for a parent class, it makes the code that much clearer. When refactoring the parent class in the future, you'll know whether there are subclasses that still use some parameters.
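A minimal sketch of that compromise follows. The process_egg function, the Spam class, and the boil_minutes and salted parameters are all hypothetical, invented only for illustration: the most common argument is spelled out in the signature, while rarely used extras are still forwarded through **kwargs.

```python
# Hypothetical helpers, only for illustration
def process_egg(egg, boil_minutes=5, salted=False):
    return {'egg': egg, 'boil_minutes': boil_minutes, 'salted': salted}


class Spam:
    def __init__(self, processed_egg):
        self.processed_egg = processed_egg


# boil_minutes is explicit, so it shows up in the signature and in help();
# less common options such as salted still pass through **kwargs
def spam(egg, boil_minutes=5, **kwargs):
    processed_egg = process_egg(egg, boil_minutes=boil_minutes, **kwargs)
    return Spam(processed_egg)


result = spam('egg', boil_minutes=7, salted=True)
print(result.processed_egg)
# {'egg': 'egg', 'boil_minutes': 7, 'salted': True}
```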

Simple is better than complex

"Simple is better than complex. Complex is better than complicated."

The most important question to ask yourself when starting a new project is: how complex does it need to be?

For example, let's assume that we've written a small program and now we need to store a bit of data. What options do we have here?

  • A full database server, such as PostgreSQL or MySQL
  • A simple file system database, such as SQLite or AnyDBM
  • Flat file storage, such as CSV and TSV
  • Structured storage, such as JSON, YAML, or XML
  • Serialized Python, such as Pickle or Marshal

All of these options have their own use cases as well as advantages and disadvantages depending on the use case:

  • Are you storing a lot of data? Then full database servers and flat file storage are generally the most convenient options.
  • Should it be easily portable to different systems without any package installation? That makes anything besides full database servers convenient options.
  • Do we need to search the data? This is much easier using one of the database systems, both filesystem and full servers.
  • Are there other applications that need to be able to edit the data? That makes universal formats such as flat file storage and the structured storage convenient options, but excludes serialized Python.

Many questions! But the most important one is: how complex does it need to be? Storing data in a pickle file is something you can do in three lines, while connecting to a database (even with SQLite) will be more complicated and, in many cases, not needed:

import pickle  # Or json/yaml
with open('data.pickle', 'wb') as fh:
    pickle.dump(data, fh, pickle.HIGHEST_PROTOCOL)

Versus:

import sqlite3
connection = sqlite3.connect('database.sqlite')
cursor = connection.cursor()
cursor.execute('CREATE TABLE data (key text, value text)')
cursor.execute('''INSERT INTO data VALUES ('key', 'value')''')
connection.commit()
connection.close()

These examples are far from identical, of course, as one stores a complete data object whereas the other simply stores some key/value pairs within a SQLite database. That is not the point, however. The point is that the code is far more complex while it is actually less versatile in many cases. With proper libraries, this can be simplified, but the basic premise stays the same. Simple is better than complex and if the complexity is not needed, it's better to avoid it.
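Loading the data back is just as short. Here is a minimal round-trip sketch; the temporary directory and the example data are assumptions made only so the snippet cleans up after itself. Keep in mind that pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.

```python
import os
import pickle
import tempfile

data = {'key': 'value', 'numbers': [1, 2, 3]}

# Write into a temporary directory so nothing is left behind
with tempfile.TemporaryDirectory() as directory:
    filename = os.path.join(directory, 'data.pickle')

    # Storing: binary write mode, dump the whole object at once
    with open(filename, 'wb') as fh:
        pickle.dump(data, fh, pickle.HIGHEST_PROTOCOL)

    # Loading: binary read mode, get an equal (but new) object back
    with open(filename, 'rb') as fh:
        loaded = pickle.load(fh)

print(loaded == data)  # True
```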

Flat is better than nested

Nested code quickly becomes unreadable and hard to understand. There are no strict rules here, but generally when you have three levels of nested loops, it is time to refactor.

Just take a look at the following example, which prints a list of two-dimensional matrices. While nothing is specifically wrong here, splitting it into a few more functions might make it easier to understand the purpose and easier to test:

def print_matrices(matrices):
    for matrix in matrices:
        print('Matrix:')
        for row in matrix:
            for col in row:
                print(col, end='')
            print()
        print()

The somewhat flatter version is as follows:

def print_row(row):
    for col in row:
        print(col, end='')

def print_matrix(matrix):
    for row in matrix:
        print_row(row)
        print()

def print_matrices(matrices):
    for matrix in matrices:
        print('Matrix:')
        print_matrix(matrix)
        print()

This example might be a bit convoluted, but the idea is sound. Having deeply nested code can easily become very unreadable.

Sparse is better than dense

Whitespace is generally a good thing. Yes, it will make your files longer and your code will take more space, but it can help a lot with readability if you split your code logically:

>>> def make_eggs(a,b):'while',['technically'];print('correct');\
...     {'this':'is','highly':'unreadable'};print(1-a+b**4/2**2)
...
>>> make_eggs(1,2)
correct
4.0

While technically correct, this is not all that readable. I'm certain that it would take me some effort to find out what the code actually does and what number it would print without trying it:

>>> def make_eggs(a, b):
...     'while', ['technically']
...     print('correct')
...     {'this': 'is', 'highly': 'unreadable'}
...     print(1 - a + ((b ** 4) / (2 ** 2)))
...
>>> make_eggs(1, 2)
correct
4.0

Still, this is not the best code, but at least it's a bit more obvious what is happening in the code.

Readability counts

Shorter does not always mean easier to read:

from functools import reduce
fib=lambda n:reduce(lambda x,y:(x[0]+x[1],x[0]),[(1,1)]*(n-2))[0]

Although the short version has a certain beauty in conciseness, I personally find the following far more beautiful:

def fib():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b
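Because this generator never stops on its own, the caller decides how many values to take. A minimal sketch using itertools.islice, repeating the generator so the snippet runs on its own:

```python
import itertools


def fib():
    # Infinite generator of Fibonacci numbers: 0, 1, 1, 2, 3, ...
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# Take only the first ten values; a plain list(fib()) would never finish
first_ten = list(itertools.islice(fib(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```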

Practicality beats purity

"Special cases aren't special enough to break the rules. Although practicality beats purity."

Breaking the rules can be tempting at times, but it tends to be a slippery slope. Naturally, this applies to all rules. If your quick fix is going to break the rules, you should really try to refactor it immediately. Chances are that you won't have the time to fix it later and will regret it.

No need to go overboard though. If the solution is good enough and refactoring would be much more work, then choosing the working method might be better. Even though all of these examples pertain to imports, this guideline applies to nearly all cases.

To prevent long lines, imports can be shortened in a few ways: adding a backslash, adding parentheses, or just shortening the imports:

from spam.eggs.foo.bar import spam, eggs, extra_spam, extra_eggs, extra_stuff

This case can easily be avoided by just following PEP8 (one import per line):

from spam.eggs.foo.bar import spam
from spam.eggs.foo.bar import eggs
from spam.eggs.foo.bar import extra_spam
from spam.eggs.foo.bar import extra_eggs
from spam.eggs.foo.bar import extra_stuff

But what about really long imports?

from spam_eggs_and_some_extra_spam_stuff import my_spam_and_eggs_stuff_which_is_too_long_for_a_line

Yes… even though adding a backslash for imports is generally not recommended, there are some cases where it's still the best option:

from spam_eggs_and_some_extra_spam_stuff \
    import my_spam_and_eggs_stuff_which_is_too_long_for_a_line
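The parentheses option only helps when the list of imported names is long, since Python allows parentheses around the names but not around the module path; that is why the backslash is needed for a long module name. A minimal sketch of the parenthesized form, using os.path only so that the example actually runs:

```python
# Parentheses let the list of imported names span several lines
from os.path import (
    basename,
    dirname,
    join,
)

path = join('spam', 'eggs', 'bacon.txt')
print(dirname(path), basename(path))
```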

Errors should never pass silently

"Errors should never pass silently. Unless explicitly silenced."

To paraphrase Jamie Zawinski: Some people, when confronted with an error, think "I know, I'll use a try/except/pass block." Now they have two problems.

Catching bare or overly broad exceptions is already a bad idea. Silently discarding them as well will make you (or someone else working on the code) wonder for ages what is happening:

try:
    value = int(user_input)
except:
    pass

If you really need to catch all errors, be very explicit about it:

import logging

try:
    value = int(user_input)
except Exception as e:
    logging.warning('Unexpected exception %r', e)

Or even better, catch it specifically and add a sane default:

try:
    value = int(user_input)
except ValueError:
    value = 0

The problem is actually even more complicated. What about a try block containing several statements that could each raise the same exception? For example, consider the following code block:

try:
    value = int(user_input)
    value = do_some_processing(value)
    value = do_some_other_processing(value)
except ValueError:
    value = 0

If ValueError is raised, which line is causing it? Is it int(user_input), do_some_processing(value), or do_some_other_processing(value)? Because the error is silently caught, there is no way to tell during normal execution, and this can be quite dangerous. If the processing functions change at some point and start raising ValueError themselves, those errors will silently turn into a value of 0. So, unless it was actually intended to behave like that, use this instead:

try:
    value = int(user_input)
except ValueError:
    value = 0
else:
    value = do_some_processing(value)
    value = do_some_other_processing(value)
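A minimal runnable sketch of that difference, where do_some_processing is a hypothetical function that can itself raise ValueError: only a failing int() conversion falls back to 0, while an error from the processing step propagates instead of being hidden.

```python
def do_some_processing(value):
    # Hypothetical processing step that can raise its own ValueError
    if value < 0:
        raise ValueError('negative values are not supported')
    return value * 2


def parse(user_input):
    try:
        value = int(user_input)
    except ValueError:
        # Only reached when the int() conversion fails
        value = 0
    else:
        # Errors raised here are not caught by the except clause above
        value = do_some_processing(value)
    return value


print(parse('21'))    # 42
print(parse('spam'))  # 0
try:
    parse('-1')
except ValueError as exc:
    print('Processing failed: %s' % exc)
```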

In the face of ambiguity, refuse the temptation to guess

While guesses will work in many cases, they can bite you if you're not careful. As already demonstrated in the "explicit is better than implicit" paragraph, with a few from ... import * statements, you cannot always be certain which module is providing you the variable you were expecting.

Avoiding ambiguity removes the need to guess, and clear, unambiguous code generates fewer bugs. A common place where ambiguity creeps in is function calls. Take, for example, the following two function calls:

spam(1, 2, 3, 4, 5)
spam(spam=1, eggs=2, a=3, b=4, c=5)

They could be the same, but they might also not be. It's impossible to say without seeing the function. If the function were implemented in the following way, the results would be vastly different between the two:

def spam(a=0, b=0, c=0, d=0, e=0, spam=1, eggs=2):
    pass
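To see how different the two calls really are, here is a sketch of the same signature that simply returns its arguments instead of passing:

```python
def spam(a=0, b=0, c=0, d=0, e=0, spam=1, eggs=2):
    # Return every parameter so the two calls can be compared
    return dict(a=a, b=b, c=c, d=d, e=e, spam=spam, eggs=eggs)


positional = spam(1, 2, 3, 4, 5)
keyword = spam(spam=1, eggs=2, a=3, b=4, c=5)

print(positional)
# {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'spam': 1, 'eggs': 2}
print(keyword)
# {'a': 3, 'b': 4, 'c': 5, 'd': 0, 'e': 0, 'spam': 1, 'eggs': 2}
```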

I'm not saying you should use keyword arguments in all cases, but if there are many arguments involved and/or hard-to-identify parameters (such as numbers), it would be a good idea. Instead of using keyword arguments, you can choose logical variable names to pass the arguments as well, as long as the meaning is clearly conveyed from the code.

For example, the following is a similar call that uses custom variable names to convey the intent:

a = 3
b = 4
c = 5
spam(a, b, c)

One obvious way to do it

"There should be one—and preferably only one—obvious way to do it. Although that way may not be obvious at first unless you're Dutch."

In general, after thinking about a difficult problem for a while, you will find that there is one solution that is clearly preferable over the alternatives. Sometimes, however, no option clearly stands out, and that is when it helps to be Dutch. The joke here is that Guido van Rossum, the BDFL and original author of Python, is Dutch (as is yours truly).

Now is better than never

"Now is better than never. Although never is often better than *right* now."

It's better to fix a problem right now than push it into the future. There are cases, however, where fixing it right away is not an option. In those cases, a good alternative can be to mark a function as deprecated instead so that there is no chance of accidentally forgetting the problem:

import warnings
warnings.warn('Something deprecated', DeprecationWarning)
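To attach such a warning to a specific function, one option is a small decorator. The deprecated helper below is a hypothetical sketch, not a standard library API: every call emits a DeprecationWarning pointing at the caller and then runs the original function.

```python
import functools
import warnings


def deprecated(func):
    # Hypothetical helper: wrap func so every call emits a DeprecationWarning
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            '%s is deprecated' % func.__name__,
            DeprecationWarning,
            stacklevel=2,  # report the caller's line, not this wrapper
        )
        return func(*args, **kwargs)
    return wrapper


@deprecated
def old_spam():
    return 'spam'


# DeprecationWarning is ignored by default in many contexts,
# so record it explicitly to show that it is emitted
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    result = old_spam()

print(result, caught[0].category.__name__, caught[0].message)
```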

Hard to explain, easy to explain

"If the implementation is hard to explain, it's a bad idea. If the implementation is easy to explain, it may be a good idea."

As always, keep things as simple as you can. While complicated code can be nice to test with, it is more prone to bugs. The simpler you can keep things, the better.

Namespaces are one honking great idea

"Namespaces are one honking great idea—let's do more of those!"

Namespaces can make code a lot clearer to use. Naming them properly makes it even better. For example, what does the following line of code do?

load(fh)

Not too clear, right?

How about the version with the namespace?

pickle.load(fh)

And now we do understand.

To give an example of a namespace, the full length of which renders it impractical to use, we will take a look at the User class in Django. Within the Django framework, the User class is stored in django.contrib.auth.models.User. Many projects use the object in the following way:

from django.contrib.auth.models import User
# Use it as: User

While this is fairly clear, it might make someone think that the User class is local to the current module. Doing the following instead lets people know that it is in a different module:

from django.contrib.auth import models
# Use it as: models.User

This quickly clashes with other models' imports though, so personally I would recommend the following instead:

from django.contrib.auth import models as auth_models
# Use it as auth_models.User

Here is another alternative:

import django.contrib.auth.models as auth_models
# Use it as auth_models.User

Conclusion

Now we should have some idea of what the Pythonic ideology is about. Creating code that is:

  • Beautiful
  • Readable
  • Unambiguous
  • Explicit enough
  • Not completely void of whitespace

So let's move on to some more examples of how to create beautiful, readable, and simple code using the Python style guide.

Explaining PEP8

The previous paragraphs have already shown a lot of examples using PEP20 as a reference, but there are a few other important guidelines to note as well. The PEP8 style guide specifies the standard Python coding conventions. Simply following the PEP8 standard doesn't make your code Pythonic though, but it is most certainly a good start. Which style you use is really not that much of a concern as long as you are consistent. The only thing worse than not using a proper style guide is being inconsistent with it.

Duck typing

Duck typing is a method of handling variables by behavior. To quote Alex Martelli (one of my Python heroes, also nicknamed the MartelliBot by many):

"Don't check whether it IS-a duck: check whether it QUACKS-like-a duck, WALKS-like-a duck, etc, etc, depending on exactly what subset of duck-like behaviour you need to play your language-games with. If the argument fails this specific-ducklyhood-subset-test, then you can shrug, ask 'why a duck?'"

In many cases, when people make a comparison such as if spam != '':, they are actually just looking for anything that is considered a true value. While you can compare the value to the string value '', you generally don't have to make it so specific. In many cases, simply doing if spam: is more than enough and actually functions better.
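A minimal sketch of why the truthiness check is often the better fit: a comparison against '' only rejects the empty string, while if spam: treats None, 0, and empty containers as "no value" too. The describe function is just an illustrative name.

```python
def describe(spam):
    # Truthiness check: None, '', 0, [] and {} all count as "empty"
    if spam:
        return 'has a value'
    return 'empty'


# None is not equal to '', so `if spam != '':` would let None through
print(None != '')  # True

for value in ('eggs', '', None, 0, []):
    print(repr(value), describe(value))
```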

For example, the following line of code uses the value of timestamp to generate a filename:

filename = '%s.csv' % timestamp

Because it is named timestamp, one might be tempted to check whether it is actually a date or datetime object, like this:

import datetime
if isinstance(timestamp, (datetime.date, datetime.datetime)):
    filename = '%s.csv' % timestamp
else:
    raise TypeError(
        'Timestamp %r should be date(time) object, got %s'
        % (timestamp, type(timestamp))) 

While this is not inherently wrong, comparing types is considered a bad practice in Python, as there is oftentimes no need for it. In Python, duck typing is preferred instead. Just try converting it to a string and don't care what it actually is. To illustrate how little difference this can make for the end result, see the following code:

import datetime
timestamp = datetime.date(2000, 10, 5)
filename = '%s.csv' % timestamp
print('Filename from date: %s' % filename)

timestamp = '2000-10-05'
filename = '%s.csv' % timestamp
print('Filename from str: %s' % filename)

As you might expect, the result is identical:

Filename from date: 2000-10-05.csv
Filename from str: 2000-10-05.csv

The same goes for converting a number to a float or an integer; instead of enforcing a certain type, just require certain features. Need something that can pass as a number? Just try to convert to int or float. Need a file object? Why not just check whether there is a read method with hasattr?

So, don't do this:

if isinstance(value, int):

Instead, just use the following:

value = int(value)

And instead of this:

import io

if isinstance(fh, io.IOBase):

Simply use the following line:

if hasattr(fh, 'read'):
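As a sketch of what that buys you, the read_data function below (a hypothetical name) accepts anything with a read method, including an object that has nothing to do with io.IOBase:

```python
import io


def read_data(fh):
    # Accept anything that can be read from, regardless of its type
    if not hasattr(fh, 'read'):
        raise TypeError('expected a file-like object, got %r' % type(fh))
    return fh.read()


class FakeFile:
    # Hypothetical class: not an io.IOBase subclass, but it "quacks" like a file
    def read(self):
        return 'spam'


print(read_data(io.StringIO('eggs')))     # eggs
print(read_data(FakeFile()))              # spam
print(isinstance(FakeFile(), io.IOBase))  # False, yet read_data works
```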

Differences between value and identity comparisons

There are several methods of comparing objects in Python: the standard greater than and less than, equal and unequal. But there are actually a few more, and one of them is a bit special: the identity comparison operator. Instead of using if spam == eggs, you use if spam is eggs. The big difference is that one compares the value and the other compares the identity. This sounds a little vague, but it's actually fairly simple. At least within the CPython implementation, the memory address is being compared, which makes it one of the lightest lookups you can get. Whereas a value comparison needs to make sure that the types are comparable and perhaps check the sub-values, the identity check just checks whether the unique identifier is the same.

Note

If you've ever written Java, you should be familiar with this principle. In Java, a regular string comparison (spam == eggs) will use the identity instead of the value. To compare the value, you need to use spam.equals(eggs) to get the correct results.

Look at this example:

a = 200 + 56
b = 256
c = 200 + 57
d = 257

print('%r == %r: %r' % (a, b, a == b))
print('%r is %r: %r' % (a, b, a is b))
print('%r == %r: %r' % (c, d, c == d))
print('%r is %r: %r' % (c, d, c is d))

While the values are the same, the identities are not always. The result from this code in an interactive CPython session is as follows:

256 == 256: True
256 is 256: True
257 == 257: True
257 is 257: False

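The catch is that CPython keeps an internal array of integer objects for all integers between -5 and 256; that's why it works for 256 but not for 257. This is an implementation detail: when the same code runs as a single script, CPython may reuse one constant object for both 257 values, so the last line can print True there.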

You might wonder why anyone would ever want to use is instead of ==. There are multiple valid answers; depending on the case, one is correct and the other isn't. But performance can also be a very important consideration. The basic guideline is that when comparing Python singletons such as True, False, and None, always compare using is.

As for the performance consideration, think of the following example:

spam = list(range(1000000))
eggs = list(range(1000000))

When doing spam == eggs, this will compare every item in both lists to each other, so effectively it is doing 1,000,000 comparisons internally. Compare this with only one simple identity check when using spam is eggs.

To look at what Python is actually doing internally with the is operator, you can use the id function. The check if spam is eggs gives the same result as if id(spam) == id(eggs); internally, CPython simply compares the two object pointers.
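A short sketch with two equal but separate lists (example data only) shows the difference between value and identity, and how id lines up with is:

```python
spam = [1, 2, 3]
eggs = [1, 2, 3]
bacon = spam  # a second name for the same list object

print(spam == eggs)          # True: equal values
print(spam is eggs)          # False: two distinct objects
print(id(spam) == id(eggs))  # False, matching `spam is eggs`
print(spam is bacon)         # True: the very same object

value = None
print(value is None)  # True: singletons are compared with `is`
```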

Loops

Coming from other languages, one might be tempted to use for loops or even while loops to process the items of a list, tuple, str, and so on. While valid, it is more complex than needed. For example, consider this code:

i = 0
while i < len(my_list):
    item = my_list[i]
    do_something(i, item)
    i += 1

Instead you can do the following:

for i, item in enumerate(my_list):
    do_something(i, item)

While this can be written even shorter, it's generally not recommended, as it does not improve readability:

[do_something(i, item) for i, item in enumerate(my_list)]

The last option might be clear to some but not all. Personally, I prefer to limit the usage of list comprehensions, dict comprehensions, and map and filter calls to cases where the result is actually being stored.

For example:

spam_items = [x for x in items if x.startswith('spam_')]

But still, only if it doesn't hurt the readability of the code.
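A small sketch of why the result matters (do_something is a hypothetical function called only for its side effect): a comprehension used purely for side effects still builds a list, in this case a list of None values that nobody needs.

```python
def do_something(i, item):
    # Hypothetical function, called only for its side effect
    print(i, item)


my_list = ['spam', 'eggs']

# Used for side effects, the comprehension still builds a throwaway list
results = [do_something(i, item) for i, item in enumerate(my_list)]
print(results)  # [None, None]

# A plain loop expresses the same intent without the extra list
for i, item in enumerate(my_list):
    do_something(i, item)

# A comprehension fits when the result is actually stored and used
spam_items = [x for x in ['spam_a', 'eggs_b', 'spam_c'] if x.startswith('spam_')]
print(spam_items)  # ['spam_a', 'spam_c']
```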

Consider this bit of code:

eggs = [is_egg(item) or create_egg(item) for item in list_of_items
        if item and hasattr(item, 'egg_property')
        and isinstance(item, Egg)]

Instead of putting everything in the list comprehension, why not split it into a few functions?

def to_egg(item):
    return is_egg(item) or create_egg(item)

def can_be_egg(item):
    has_egg_property = hasattr(item, 'egg_property')
    is_egg_instance = isinstance(item, Egg)
    return item and has_egg_property and is_egg_instance

eggs = [to_egg(item) for item in list_of_items if can_be_egg(item)]

While this code is a bit longer, I would personally argue that it's more readable this way.

Maximum line length

Many Python programmers think 79 characters is too constricting and just keep the lines longer. While I am not going to argue for 79 characters specifically, setting a low and fixed limit such as 79 or 99 is a good idea. While monitors get wider and wider, limiting your lines can still help a lot with readability and it allows you to put multiple files next to each other. It's a regular occurrence for me to have four Python files opened next to each other. If the line width were more than 79 characters, that simply wouldn't fit.

PEP8 prefers implied line continuation inside parentheses, brackets, and braces for long lines, but it allows backslashes where that is not possible, such as long with statements. While I agree that backslashes are preferable over long lines, I still think they should be avoided if possible. Here's an example from PEP8:

with open('/path/to/some/file/you/want/to/read') as file_1, \
        open('/path/to/some/file/being/written', 'w') as file_2:
    file_2.write(file_1.read())

Instead of using backslashes, I would reformat it like this:

filename_1 = '/path/to/some/file/you/want/to/read'
filename_2 = '/path/to/some/file/being/written'
with open(filename_1) as file_1, open(filename_2, 'w') as file_2:
    file_2.write(file_1.read())

Or perhaps the following:

filename_1 = '/path/to/some/file/you/want/to/read'
filename_2 = '/path/to/some/file/being/written'
with open(filename_1) as file_1:
    with open(filename_2, 'w') as file_2:
        file_2.write(file_1.read())

Not always an option, of course, but it's a good consideration to keep the code short and readable. It actually gives a bonus of adding more information to the code. If, instead of filename_1, you use a name that conveys the goal of the filename, it immediately becomes clearer what you are trying to do.

Verifying code quality, pep8, pyflakes, and more

There are many tools for checking code quality in Python. The simplest ones, such as pep8, only validate a set of simple PEP8 rules. The more advanced ones, such as pylint, perform advanced introspection to detect potential bugs in otherwise working code. A large portion of what pylint offers is a bit over the top for many projects, but still interesting to look at.

flake8

The flake8 tool combines pep8, pyflakes, and McCabe to set up a quality standard for code. The flake8 tool is one of the most important packages for maintaining code quality in my packages. All the packages that I maintain have a 100% flake8 compliance requirement. It does not promise readable code, but at least it requires a certain level of consistency, which is very important when working on a project with multiple programmers.

Pep8

One of the simplest tools used to check the quality of Python code is the pep8 package (since renamed to pycodestyle). It doesn't check everything that is in the PEP 8 standard, but it goes a long way, and it is still updated regularly to add new checks. Some of the most important things checked by pep8 are as follows:

  • Indentation: while Python does not care how many spaces you use to indent, inconsistent indentation does not help the readability of your code
  • Missing whitespace, such as spam=123
  • Too much whitespace, such as def eggs(spam = 123):
  • Too many or too few blank lines
  • Too long lines
  • Syntax and indentation errors
  • Incorrect and/or superfluous comparisons (not spam in eggs instead of spam not in eggs, not spam is eggs instead of spam is not eggs, comparisons such as spam == True, and type comparisons instead of isinstance)
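To make the comparison checks in the last item concrete, here is a small sketch of the flagged forms (as comments) next to their preferred replacements. The variable names are illustrative, and the error codes are the ones reported by recent pycodestyle/flake8 versions; older pep8 releases may differ slightly.

```python
values = [1, 2, 3]
flag = True

# Flagged (E713): not 4 in values
missing = 4 not in values

# Flagged (E714): not flag is None
has_flag = flag is not None

# Flagged (E712): flag == True
flag_set = bool(flag)  # rely on truthiness, or use "is True" if required

# Flagged (E721): type(values) == list
is_list = isinstance(values, list)  # isinstance also accepts subclasses
```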

The conclusion is that the pep8 tool helps a lot with testing whitespace and some of the more common styling issues, but it is still fairly limited.

pyflakes

This is where pyflakes comes in. pyflakes is a bit more intelligent than pep8 and warns you about style issues such as:

  • Unused imports
  • Wildcard imports (from module import *)
  • Incorrect __future__ imports (after other imports)

But more importantly, it warns about potential bugs, such as the following:

  • Redefinitions of names that were imported
  • Usage of undefined variables
  • Referencing variables before assignment
  • Duplicate argument names
  • Unused local variables
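pyflakes finds these problems by analyzing the parsed source rather than running it. The following toy sketch shows the idea for unused imports only; find_unused_imports is a hypothetical helper, far simpler than pyflakes itself (for example, a name used only inside a string or in __all__ would be reported as unused).

```python
import ast


def find_unused_imports(source):
    """Return imported names that are never referenced (toy check)."""
    tree = ast.parse(source)  # only parses; the code is never executed
    imported = set()
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name == '*':
                    continue  # wildcard imports cannot be tracked this way
                # "import os.path" binds the name "os"
                imported.add(alias.asname or alias.name.split('.')[0])
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return sorted(imported - used)


sample = """
import os
import sys
from collections import OrderedDict

print(sys.argv)
"""
print(find_unused_imports(sample))  # ['OrderedDict', 'os']
```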

The last part of PEP 8, its naming conventions, is covered by the pep8-naming package. It makes sure that your names stay close to the standard dictated by PEP 8:

  • Class names in CapWords
  • Function, variable, and argument names all in lowercase
  • Constants in full uppercase, and treated as constants
  • The first argument of instance methods and class methods as self and cls, respectively
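A short illustration of names that satisfy these conventions; MAX_RETRIES and SpamRepository are hypothetical names used only for this example.

```python
MAX_RETRIES = 3  # constant: full uppercase, never reassigned


class SpamRepository:  # class name in CapWords
    default_size = 10

    def __init__(self, size):  # instance method: first argument is self
        self.size = size

    @classmethod
    def from_default(cls):  # class method: first argument is cls
        return cls(cls.default_size)

    def retry_count(self, extra_retries=0):  # lowercase with underscores
        return MAX_RETRIES + extra_retries
```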

McCabe

Lastly, there is the McCabe complexity checker. It measures the cyclomatic complexity of your code by walking the Abstract Syntax Tree (AST): roughly, the number of independent paths through each function, where every branch, such as an if statement or a for loop, adds one. It warns you if your code is more complex than a preconfigured threshold. Generally, you will use McCabe through flake8, but a manual call is possible as well. Using the following code:

def spam():
    pass


def eggs(matrix):
    for x in matrix:
        for y in x:
            for z in y:
                print(z, end='')
            print()
        print()

McCabe will give us the following output:

# pip install mccabe
...
# python -m mccabe cabe_test.py
1:1: 'spam' 1
5:1: 'eggs' 4

The threshold is configurable, of course (with flake8, through the --max-complexity option), and 10 is the commonly recommended maximum. The number McCabe returns grows with every branch in a function, so nested loops and long chains of if/elif statements drive it up quickly. If your function reaches 10, it might be time to refactor the code.
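To see where the numbers 1 and 4 come from, here is a toy approximation: start at 1 for each function and add 1 for every branching node in its AST. This is a simplification, not the algorithm of the mccabe package (which builds a control flow graph), so results can differ for more complex code; for instance, branches in a nested function are also counted toward the outer one here.

```python
import ast

# Simplification: each of these nodes adds one extra path through a function.
BRANCH_NODES = (ast.If, ast.IfExp, ast.For, ast.AsyncFor, ast.While,
                ast.ExceptHandler)


def complexity(function_node):
    """Approximate cyclomatic complexity: 1 + number of branch points."""
    return 1 + sum(isinstance(node, BRANCH_NODES)
                   for node in ast.walk(function_node))


def report(source):
    """Map each function name in source to its approximate complexity."""
    tree = ast.parse(source)
    return {node.name: complexity(node)
            for node in ast.walk(tree)
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))}


chapter_example = """
def spam():
    pass


def eggs(matrix):
    for x in matrix:
        for y in x:
            for z in y:
                print(z, end='')
            print()
        print()
"""
print(report(chapter_example))  # {'spam': 1, 'eggs': 4}
```

The three nested for loops in eggs each add one path, which matches the 4 reported by McCabe above.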

flake8

All of this combined is flake8: a tool that runs these checkers and outputs a single report. Some of the warnings generated by flake8 might not fit your taste, so every one of the checks can be disabled, both per file and for the entire project if needed. For example, I personally disable W391, which warns about blank lines at the end of a file, for all my projects. I find those blank lines useful while working on code, since I can easily jump to the end of the file and start writing code instead of having to append a few lines first.
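In practice, a sketch of the finer-grained options, assuming a reasonably recent flake8: a "# noqa: CODE" comment silences one check on one line, and a "# flake8: noqa" line makes flake8 skip the whole file (so use that sparingly). For the entire project, the codes go in an ignore (or extend-ignore) option in a [flake8] section of setup.cfg, tox.ini, or .flake8. The module below is hypothetical.

```python
# OrderedDict is re-exported for other modules, so the "imported but
# unused" warning (F401) is silenced for this single line only.
from collections import OrderedDict  # noqa: F401


def spam(a, b, c):
    return a, b + c
```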

In general, before committing your code and/or putting it online, just run flake8 from your source directory to check everything recursively.

Here is a demonstration with some poorly formatted code:

def spam(a,b,c):
    print(a,b+c)

def eggs():
    pass

It results in the following:

# pip install flake8
...
# flake8 flake8_test.py
flake8_test.py:1:11: E231 missing whitespace after ','
flake8_test.py:1:13: E231 missing whitespace after ','
flake8_test.py:2:12: E231 missing whitespace after ','
flake8_test.py:2:14: E226 missing whitespace around arithmetic operator
flake8_test.py:4:1: E302 expected 2 blank lines, found 1
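For comparison, here is the same code with the reported issues fixed: spaces after the commas, spaces around the + operator, and two blank lines between the top-level functions. flake8 should no longer report anything for it.

```python
def spam(a, b, c):
    print(a, b + c)


def eggs():
    pass
```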

Pylint

pylint is a far more advanced, and in some cases better, code quality checker. The power of pylint does come with a few drawbacks, however. Whereas flake8 is a really fast, light, and safe quality check, pylint has far more advanced introspection and is much slower for this reason. In addition, pylint will most likely give you a large number of warnings, many of which are irrelevant or even wrong. This could be seen as a flaw in pylint, but it's actually more of a limitation of passive code analysis. Tools such as pychecker actually load and execute your code. In many cases, this is safe, but there are cases where it is not. Just think of what could happen when executing a command that deletes files.
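The safety of passive analysis comes from the fact that parsing code does not run it. A minimal sketch using the standard ast module (ast.unparse needs Python 3.9 or later): the source below would stop the interpreter if executed, yet parsing and inspecting it is harmless. The variable names are illustrative only.

```python
import ast

# If this source were executed, it would immediately exit the interpreter.
dangerous_source = """
import sys
sys.exit('this only happens when the code is executed')
"""

# Parsing builds a syntax tree without running anything, which is what
# static checkers such as pyflakes (and therefore flake8) rely on.
tree = ast.parse(dangerous_source)

called = [ast.unparse(node.func) for node in ast.walk(tree)
          if isinstance(node, ast.Call)]
print(called)  # ['sys.exit']
```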

While I have nothing against pylint, in general I find that most important problems are handled by flake8, and others can easily be avoided with some proper coding standards. It can be a very useful tool if configured correctly, but without configuration, it is very verbose.
