© J. Burton Browning and Marty Alchin 2019
J. Burton Browning and Marty Alchin, Pro Python 3, https://doi.org/10.1007/978-1-4842-4385-5_8

8. Documentation

J. Burton Browning (1) and Marty Alchin (2)
(1) Oak Island, NC, USA
(2) Agoura Hills, CA, USA

Documentation is arguably the most difficult part of any project. Code tends to come fairly easily to programmers, but documentation requires a different set of skills because the audience is strictly human. The magnitude of the differences can vary greatly between projects and audiences. Sometimes all that’s necessary is some example code, whereas other topics can fill entire books and still have plenty left to cover.

The language of documentation is very different from that of code, so it can be difficult to excel at both. This causes many programmers to take the path of least resistance, opting for tools that automatically generate some minimal form of documentation from the code itself, so that the extra work is kept to a minimum. Although that can seem sufficient, such tools can only do so much because they’re limited by what the code alone can tell them. Javadoc for Java and Epydoc for Python are examples of such tools.

This chapter will show the tools available to help describe your code and its features for human understanding. There are several options available, some of which go alongside the code itself, while others accompany it on the outside. These can be used individually or in combination to form a full set of documentation for any project. How much of each is necessary will differ based on the needs of each application, but each has its place.

Each section in this chapter will highlight how to document your code with the available tools, along with the benefits and drawbacks of each approach. The most important thing to remember about documentation, however, is that it’s all about presenting what people need to know about your application and how to use it. You must always consider how your code works and what your users will need to know to interact with it. Only then can you pick the approaches that are best for your needs.

Proper Naming

The simplest form of documentation is to properly name the various aspects of your code. With very few exceptions, every single class, function, and variable is given a name when it’s defined. Because these names are already required, all it takes is a little extra thought to make sure they’re accurate and easy to understand. To illustrate how valuable this can be, take a look at a function signature with vague, generic names and see if you can guess what it does:

def action(var1, var2):

Given some code inside the body of the function, you might be able to get a good idea of its purpose, but the signature itself does nothing to help. In fact, the only reason the code in the body would be more helpful is that it would typically use more standardized features available from elsewhere. For instance, loops and slicing are easily recognizable, as are methods from commonly used objects, such as a string’s format() method. These are just clues to help make an educated guess, however; the naming should make it obvious:

def find_words(text, word):

Just picking some more descriptive names makes the purpose of the function and its arguments much clearer. As a rule of thumb, classes and variables should be named as singular nouns, such as Book, Person, Restaurant, index, and first_name. Functions, by contrast, should be given verbs as names, such as find(), insert(), and process_user().

PEP 8,¹ also included as an appendix in this book, offers more specific guidelines for naming various types of objects; see its “Naming Conventions” section for details. Once you get inside a block of code, however, things aren’t always as easy to follow, so comments can help clarify.

Comments

In classes and functions that are very long or complex, the name alone is often not sufficient to convey all the things the code is doing. Variable names can certainly help, but that usually only explains what the code does; it’s typically more useful to explain why the code does what it does. Both of these can be addressed by placing comments in your code.

Comments are one of the most basic forms of documentation a programmer can use, yet they’re also among the most powerful. Comments are placed directly alongside the rest of your code, where it’s easiest to write and is often most helpful. Comments offer a convenient way to make small notes where they’re most relevant, which can help make complex code easier to understand later on.

Python’s comments are separated from code by the # symbol. All of the text that follows that symbol, up to the end of the line, is treated as a comment. This allows comments either to take up a whole line or to attach to the end of a line of code. Unlike some other languages, Python doesn’t have a true syntax for multiline comments. The closest substitute is a triple-quoted string, a tip Guido van Rossum tweeted in 2011; placed at the top of a function, class, or module, such a string becomes a docstring. (Docstrings are discussed in more detail shortly.) Formally, though, each line of a longer comment must be preceded by its own # symbol. Both approaches are shown here:

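def foo():  # example of a triple-quoted string used as a comment
    """This triple-quoted string spans
    several lines, so it works like a
    multiline comment (it is also foo's docstring)."""

x = 1
print(x)  # shows the value of x
foo()  # does nothing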
# This function doesn't really do anything useful. It's only here to show
# how multi-line comments work in Python. Notice how each line has to have
# a separate # to indicate that it's a comment.
def example():
    pass

As with naming conventions, PEP 8, the Python Style Guide, has a lot to say about how comments should be formatted. See its “Comments” section for details.

Perhaps the biggest limitation of comments is that they’re only available when viewing the source file directly. Because comments are discarded when the code is compiled, they aren’t stored on the resulting modules, classes, or functions, so there’s no way to inspect them on those objects at runtime. For that, we turn to docstrings.

Docstrings

In the previous section, as well as in Chapters 3 and 4, we referred briefly to docstrings and how they’re used in code. A docstring is a string placed at the beginning of a module, function, or class. Rather than assigning it to a variable, you simply leave the string as its own statement; as long as it’s the first statement in the code block, Python will interpret it as a docstring:

def find_words(text, word):
    """
    Locate all instances of a word in a given piece of text.
    Return a list of indexes where the words were found.
    If no instances of the word were found, return an empty list.
    text -- a block of text to search
    word -- an individual word to search for
    """

This information could be presented in a set of comments, but there’s one major advantage of using docstrings instead: Python makes them available in code. In keeping with the spirit of transparency, docstrings can be accessed at runtime through the __doc__ attribute of modules, classes, and functions. Perhaps the most obvious benefit this brings is that the various automatic documentation generators get a lot more information to work with. Better yet, that information is written specifically for humans, which can greatly improve the quality of the final output.
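To see the runtime side of this, the following sketch gives find_words() a hypothetical body (it assumes “indexes” means word positions, which the original docstring leaves open) and then reads the docstring back. It also shows that an ordinary # comment inside the function never reaches __doc__. inspect.getdoc() is the standard-library helper that returns the same docstring with its indentation cleaned up.

```python
import inspect


def find_words(text, word):
    """
    Locate all instances of a word in a given piece of text.
    Return a list of indexes where the words were found.
    If no instances of the word were found, return an empty list.
    text -- a block of text to search
    word -- an individual word to search for
    """
    # This comment is discarded by the compiler; only the docstring is kept.
    # Hypothetical implementation: indexes are 0-based word positions.
    return [i for i, w in enumerate(text.split()) if w == word]


print(find_words.__doc__)          # the raw string, indentation included
print(inspect.getdoc(find_words))  # the same text with indentation cleaned up
```

The same text is what help(find_words) displays in the interactive interpreter.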

Exactly how it’s written, however, is entirely up to you. Aside from where docstrings can be placed in your code, Python makes no assumptions or requirements about the format or structure of their contents. PEP 257,² also included as an appendix, offers a number of recommendations, but the final decision is left up to you. The goal is to help people understand how to use your code, however, so there are a few particulars that everyone should follow.

Describe What the Function Does

As simple as it sounds, it can sometimes be difficult to step back from how the code works and simply describe what it does. For most functions, you should be able to sum up that purpose in one sentence, preferably on a single line. Common examples are “add an item to the collection” and “cache an object for later use.” The details of how the code achieves that goal are best left out of the docstring.

Explain the Arguments

Argument names are typically limited to one or two words. This works well as a simple reminder of their purpose, but more information is usually needed to understand their purposes in the first place. This is particularly important for optional arguments, which often help control how the function works. Even if the argument names are self-explanatory, including a brief description helps maintain consistency across your documentation.

Don’t Forget the Return Value

Any time a function returns a value, the nature of that value should be documented. It should include the return value’s type as well as any relevant details about how the object will be formed. For example, find_words() returns a list, but that list contains the indexes where the word was found rather than the words themselves, so that behavior is documented.

Also, if the return value can differ based on the input or on other conditions the function works with, document each possible form. For example, a function to retrieve an object by name might be given a name that doesn’t match any existing objects. In that case, it’s important to document whether the function will create a new object or raise an exception.

Include Any Expected Exceptions

Every piece of code contains opportunities for exceptions to be raised. Sometimes those exceptions are actually part of the code’s expected functionality, such as when looking for an object by a name that doesn’t match anything. In these cases, the exceptions should be documented right alongside the return values. These explicit exceptions are frequently caught by the code that calls your function, so it’s necessary to indicate which ones will be raised, as well as the circumstances under which they’ll be raised.

Documentation Outside the Code

One thing you’ll notice about the recommendations in the previous section is that they aren’t specific to docstrings. You should also document your application outside of the code, and that documentation needs to include all the same details. What makes this external documentation different is how the information is presented, and it will also include additional information not covered inside the code itself.

This general class of documentation can cover a wide variety of topics, many of which wouldn’t make any sense inside the code. After all, someone who’s reading your code is likely to have something already in mind to look for. They’ll be looking for more information about a specific module, class, or function that they already know how to find. Other users will have a broader range of needs, from installation and tutorials to more topical references that show how to combine multiple features toward a certain goal.

Installation and Configuration

Before anyone can use your software, they will need to obtain it and get it working. This almost goes without saying, but not quite. There are a number of issues that users need to tackle before they can use your code, and you need to make sure that those issues are addressed as thoroughly as possible.

Obtaining the code is the first step. However you choose to distribute your code, you’ll need to make sure your users know how to get it. Sometimes it will be a simple one-line command, but in other cases it may require first obtaining other applications such as version control software to get the latest code without waiting for a release. Chapter 10 will describe some of the more common ways to distribute your code, along with what your choices will mean for the users who need to retrieve it.

Tutorials

After getting an application, many users want to immediately get an idea of how to use it. Everybody appreciates immediate gratification, so you can use their first experience with your software as an opportunity to accomplish something quickly. Tutorials are a great way to walk your users through the most common features of your application.

A tutorial can often showcase the greatest strengths of an application, so it can also be your first chance to convince someone to try it out in the first place. This is particularly true with libraries and frameworks, which are designed to be integrated into other code rather than be used independently. If your audience can get a quick feel for how your approach can help them work with their own code, it will make a lasting impression.

Reference Documents

Once your users have a good idea of how your application can help them and have gotten a bit of experience under their belts, their needs change again. At this point they no longer need to be convinced to use your software, and they’re ready to move beyond learning how to use it. Now they need reminders of how all the features work, how those features work together, and how they can integrate with the tasks they’re really trying to perform.

Different readers will look for different forms of reference documentation. Some may prefer method-level details such as arguments and return values, like those contained in docstrings, whereas others may get more out of a broader overview, written in plain language. Some readers, like you, even enjoy a physical book, easy to pick up and flip through at a moment’s notice.

With all of these different preferences, it’s unlikely that you’ll be able to write reference documentation in a way that will suit all tastes. As the author, it’s your job to determine what type of documentation best suits your application. Look to your own preferences for the type of documentation you like to read most, as that’s likely to be in the same spirit as the software you create. Just write the way you’d like to read. The users who like your documentation are likely to be the very same ones who will like your software.

Note

One important thing to remember is that you may not need reference documentation at all. For very simple applications, a tutorial alone may be enough to illustrate and explain all the available features.

Documentation Utilities

Some of the most challenging aspects of documentation have nothing to do with your application or how you plan to write about it. Beyond those concerns, tasks such as formatting, referencing, and presenting documentation can consume quite a bit of time and energy. The more documents you need to write, the harder these tasks become. The third-party docutils package³ provides a comprehensive set of tools to make this process more manageable.

The crown jewel of the docutils package is reStructuredText, more often referred to as ReST or simply RST. reStructuredText is a markup language designed for writing technical documents, taking what its developers call a What You See Is What You Mean (WYSIWYM) approach. This is in contrast with the more traditional What You See Is What You Get (WYSIWYG) approach, where editing is based on the visual layout and formatting of the document.

In WYSIWYM, the goal is to indicate the structure and intentions of the document, without regard to exactly how it will be presented. Much like HTML, separating content from its presentation allows you to focus on what’s really important about your documentation and leave the details of visual style for later. reStructuredText uses a more text-friendly approach than HTML, however, so that even unformatted documents are easily readable.

Readability Counts

In keeping with Python philosophy, reStructuredText focuses on readability at all times, even before the document is converted to its final output format. The markup that gives a document its structure is designed to be easy to understand, remember, and write.

Formatting

The most basic unit of any type of document is the paragraph, so reStructuredText makes them the easiest to work with. All you need to do is write a block of text with each line of text starting immediately below the one before it. The number of lines and the length of each line are irrelevant, as long as there are no completely blank lines between any lines of text in a given paragraph.

Blank lines are reserved for separating paragraphs from each other and from other types of content. This forms a simple way to distinguish one paragraph from another. You can use multiple blank lines if you’d like, but only one is required. Indenting a paragraph indicates a quoted passage from another document, which will typically also be indented in the output. To illustrate, here are a couple of simple paragraphs written for reStructuredText:

The reStructuredText format is very simple when it comes down to it. It's all
about readability and flexibility. Common needs, such as paragraphs and inline
formatting, are simple to write, read and maintain. More complex features are
possible, and they use a simple, standardized syntax.

After all, the Zen of Python says:

    Simple is better than complex.
    Complex is better than complicated.

Most application documentation will also include blocks of code along with regular text. This is particularly useful for tutorials, in which a block of code can be built up in pieces, with explanations in between. To mark a block of code, end the preceding paragraph with a double colon, then add a blank line followed by the indented block. The paragraph will be rendered ending in a single colon, and the indented text will be formatted as code:

The reStructuredText format is very simple when it comes down to it. It's all
about readability and flexibility. Common needs, such as paragraphs and inline
formatting, are simple to write, read and maintain. More complex features are
possible, and they use a simple, standardized syntax.

After all, the Zen of Python says::

    Simple is better than complex.
    Complex is better than complicated.

Note

You’ll notice that the example shown here isn’t actually code. The double-colon format technically distinguishes a block of text as preformatted. This prevents the reStructuredText parser from doing any additional processing on that block. Therefore, even though it’s most useful for including code in your documentation, it can be used for anything that already has its own formatting that should remain intact.

Inside an individual paragraph, you can also format text in all the ways you’d expect. Rather than directly marking things for italics or bold, this formatting requires the use of additional punctuation before and after the text you’d like to format. Surrounding a word or phrase with asterisks marks it as emphasized, which will typically render in italics. Using an extra pair of asterisks beyond that will indicate strong emphasis, often rendering as bold.
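If you’d like to see how docutils applies these rules, the following sketch renders a small sample to an HTML fragment. It assumes the third-party docutils package is installed; SOURCE and render_html() are illustrative names, while docutils.core.publish_parts() is the docutils function that converts reStructuredText and returns the output split into named parts.

```python
# Requires the third-party docutils package (pip install docutils).
from docutils.core import publish_parts

# Two paragraphs with inline emphasis, then a literal block
# introduced by a trailing double colon and a blank line.
SOURCE = """\
Common needs, such as *paragraphs* and **inline formatting**,
are simple to write, read and maintain.

After all, the Zen of Python says::

    Simple is better than complex.
    Complex is better than complicated.
"""


def render_html(rst_text):
    """Convert reStructuredText to an HTML fragment (body content only)."""
    parts = publish_parts(source=rst_text, writer_name="html")
    return parts["fragment"]


if __name__ == "__main__":
    print(render_html(SOURCE))
```

In the output, the emphasized words are wrapped in em and strong tags, the double colon shrinks to a single colon, and the indented lines land in a pre element, untouched by further processing.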

Links

When working with large amounts of documentation, one of the most important features you can offer is linking multiple documents together. The reStructuredText format offers several different ways to link to additional information, whether footnotes, other sections in the same document, or completely different documents. The simplest form of link you can include is a URL, which will be converted into a link when rendering the document. Other types of links require a bit more formatting.

Links take the form of an underscore following the text that should be used as the link. The target of the link is specified differently, depending on where that target is located. In the most common case, in which a document links to some external web page, the link target is placed in what might appear to be its own paragraph, with a structure that tells the parser that it’s a link instead of an actual paragraph:

This paragraph shows the basics of how a link is formed in reStructuredText.
You can find additional information in the official documentation_.

.. _documentation: http://docutils.sf.net/docs/

This will cause the word “documentation” to be used as the link itself, referencing the target given on the bottom line. You’ll usually need to use more than one word for the text of a link, but this doesn’t provide a way to specify how much text should be included. To do that, you’ll need to enclose the text in backticks (`). The underscore then goes outside the enclosure, immediately following the second backtick:

This paragraph shows the basics of how a link is formed in reStructuredText.
You can find additional information in the `official documentation`_.

.. _official documentation: http://docutils.sf.net/docs/

In this case, the link target is specified immediately below the paragraph where the link should be placed. This particular case can be simplified a bit by creating an anonymous link, which no longer requires rewriting the link text underneath. In order to distinguish it from a regular link, you’ll need to use two underscores after the link text instead of just one. Then, the link target is specified with only two underscores at the beginning of the line:

This paragraph shows the basics of how a link is formed in reStructuredText.
You can find additional information in the `official documentation`__.

__ http://docutils.sf.net/docs/
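As a quick check that the named and anonymous forms produce the same result, the following sketch renders both with docutils and prints the HTML. It assumes docutils is installed; NAMED, ANONYMOUS, and render_fragment() are illustrative names.

```python
# Requires the third-party docutils package (pip install docutils).
from docutils.core import publish_parts

# Named link: the target line repeats the link text.
NAMED = """\
You can find additional information in the `official documentation`_.

.. _official documentation: http://docutils.sf.net/docs/
"""

# Anonymous link: two underscores, and the target line starts with "__".
ANONYMOUS = """\
You can find additional information in the `official documentation`__.

__ http://docutils.sf.net/docs/
"""


def render_fragment(rst_text):
    """Render reStructuredText and return just the HTML body fragment."""
    return publish_parts(source=rst_text, writer_name="html")["fragment"]


for label, source in (("named", NAMED), ("anonymous", ANONYMOUS)):
    print(label, render_fragment(source))
```

Both versions yield a link whose text is “official documentation” and whose target is the docutils URL; the target lines themselves don’t appear in the rendered page.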

Readability Counts

There’s also another way to specify external links that’s even more space-efficient: place the link target directly alongside the link text, inside the paragraph itself. Links formatted this way still use backticks to set the link apart from the rest of the text, but the link target goes inside the backticks as well, enclosed in angle brackets after the link text. Two underscores are used here so that it is parsed as an anonymous link, for example: `Pro Python <http://propython.com/>`__.

The problem with this approach is that having the URL inside the paragraph can be very distracting when reading the source of the document, even though the target will be hidden in the final output. Furthermore, named link targets can all be placed at the end of the document, so they don’t even have to interrupt the flow from one paragraph to another.

Rather than referencing external documents, you can also include footnotes to be placed at the end of the same document or in an attached bibliography. Defining this type of link works much like standard links except that the link text is set apart by square brackets. Between the brackets, the text can either be just a number or a small piece of text, which will be used to reference the related information elsewhere.

Then, at the end of the document, the referenced information can be included in a format similar to named link targets. Rather than using an underscore to signify it, the reference text from earlier in the document is enclosed in square brackets again. After that, simply write the related text in the paragraph. This can be used for references to traditional publications, such as books, as well as for minor additions to further clarify the main text:

The reStructuredText format isn't part of Python itself, but it's popular enough
that even published books [1]_ reference it as an integral part of the Python
development process.

.. [1] Alchin, Marty. *Pro Python*. Apress, 2010.

In addition to these options, docutils allows reStructuredText to be expanded to provide other features. One application that provides some additional features is Sphinx.

Sphinx

The base features provided by reStructuredText are designed to work with individual documents. Even though it’s easy to reference other documents, those references must be explicitly included in each document. If you write a complex application that requires multiple documents, each one will need to know the full structure of all the documents in order to reference them.

Sphinx⁴ is an application that attempts to address that problem by working with the documents as a whole collection. In this way it’s somewhat similar to other, more popular automated systems such as Javadoc and Doxygen, but Sphinx is designed to get its content from dedicated files rather than directly from the code itself. It can also include content based on code, but the main goal is to write documentation on its own.

By managing references across documents more effectively, Sphinx can generate an entire documentation package at once. This can be a web site full of linked HTML documents or even a single PDF document that includes all the documents as individual sections. In addition, Sphinx offers a variety of styling options, with many already supplied by a growing community.
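To give a feel for how a Sphinx project is set up, here is a minimal sketch of the conf.py file Sphinx reads from a documentation directory. The project name and author are placeholders, and a file generated by sphinx-quickstart will contain more settings. sphinx.ext.autodoc is the Sphinx extension that pulls docstrings into the generated pages, and alabaster is Sphinx’s default HTML theme.

```python
# conf.py: a minimal Sphinx configuration sketch.
# The project and author values are placeholders for a hypothetical project.

project = "Example Project"
author = "Your Name"

# autodoc lets the reStructuredText sources pull in docstrings from code.
extensions = ["sphinx.ext.autodoc"]

# alabaster ships with Sphinx as its default HTML theme.
html_theme = "alabaster"
```

With this file and an index.rst in a directory named docs (both assumptions for this sketch), running sphinx-build -b html docs docs/_build/html should produce the linked HTML site.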

Exciting Python Extensions: NumPy

As the main NumPy site puts it, NumPy is “the fundamental package for scientific computing with Python,” and it offers a great deal of power to a Python programmer.

NumPy is a cornerstone of scientific computing and data manipulation in Python, and if you need standard arrays in Python, NumPy is the ticket. It is often used together with SciPy and is one of the core packages of the SciPy ecosystem. Python’s built-in types don’t include the kind of standard array found in many other languages; by “standard” we mean an array that holds elements of a single type (e.g., all integers or all characters). NumPy comes to the rescue, and it does much more besides. Let’s try a few of its interesting features. First, you will need to install it.

Install NumPy

If you are using Windows, type the following at an elevated command prompt (on macOS or Linux, use a terminal):
pip install numpy

It should respond that it installed correctly or that it was already installed.
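A quick way to confirm the installation is to import NumPy and print its version. The np alias is the usual community convention, not a requirement.

```python
# Confirm that NumPy can be imported and see which version is installed.
import numpy as np

print("NumPy version:", np.__version__)
```

If the pip command itself isn’t found, python -m pip install numpy runs pip through the Python interpreter instead.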

Using NumPy

First, standard (non-Python) arrays are handy things. Python offers lists, dictionaries, and tuples; they are powerful, yet sometimes an old-fashioned array is just the thing to solve a problem. A NumPy array is like one you might use in C++ or other languages in that all of its elements have the same type (each is an int, a float, a character, and so on). Its size is also fixed when it is created; to make it larger, you create a new array rather than growing the existing one. It is also worth noting that a NumPy array of numbers generally uses less memory than the same data stored in a list.
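A short sketch makes these properties concrete: every element shares one data type (dtype), a value assigned into an integer array is converted to that type, and numpy.append() returns a new array instead of resizing the original. The variable names are illustrative.

```python
import numpy as np

numbers = np.array([1, 2, 3, 4, 5])

# Every element shares one dtype; the exact integer width
# (for example int64) depends on the platform and NumPy version.
print(numbers.dtype, numbers.itemsize, numbers.size)

# Assigning a float into an integer array converts it to the array's type.
numbers[0] = 9.7
print(numbers)  # the 9.7 is truncated to 9

# The size is fixed: np.append() builds and returns a new, larger array
# and leaves the original untouched.
bigger = np.append(numbers, 6)
print(numbers.size, bigger.size)
```

The silent truncation is one reason to choose an array’s dtype deliberately when precision matters.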

Python’s array-like structures and standard arrays each have their own benefits. So if you need a standard array, NumPy makes it easy to create one.

Try the following:
# Create a one-dimensional numeric array from a list
import numpy as np

my_list = [1, 2, 3, 4, 5]
array1 = np.array(my_list)
# Print the array and its type
print(array1)
print(type(array1))

In the preceding example, each item in the list is treated as a numeric value. However, if you change one value in the list to a string, NumPy converts every element to a string, so the entire array becomes a string (character) array:

# Create a one-dimensional string array from a list with mixed types
import numpy as np

my_list = [1, 2, 3, 'a', 5]
array1 = np.array(my_list)
# Print the array, its type, and its element type (a Unicode string dtype)
print(array1)
print(type(array1))
print(array1.dtype)

A string array like this would not work well for math operations on its values. The next example goes back to a numeric array so that it can add one to each value:

# Add one to each value
import numpy as np

my_list = [1, 2, 3, 4, 5]
array1 = np.array(my_list)
# Print the array
print(array1)
print('With one added to each: ')
for item in array1:
    print(item + 1)
# The same result in one vectorized step, without a Python loop
print(array1 + 1)

Because every element in the array is numeric, we can add one to each and display the result. If you want to specify the array’s type explicitly, as you would in a language such as C++, pass the dtype argument:

# Create a one-dimensional array from a list of floating-point values
# and make it an explicitly typed float array
import numpy as np

my_list = [1.1, 2.1, 3.1, 4.1, 5.1]
array1 = np.array(my_list, dtype="float")
# Print the array
print(array1)

You can also convert an array from one type to another with the astype() method, as in array1.astype('int'); other valid data types include bool, str, and float. astype() returns a new array and leaves the original unchanged. You can also convert the array back to a list with array1.tolist().
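The conversions just described look like this in practice. This is a small sketch using the same float array as the previous listing; the variable names are illustrative.

```python
import numpy as np

array1 = np.array([1.1, 2.1, 3.1, 4.1, 5.1], dtype="float")

as_ints = array1.astype("int")   # float -> int truncates toward zero
as_text = array1.astype("str")   # each value becomes a string
back_to_list = array1.tolist()   # a plain Python list of floats

print(as_ints)  # [1 2 3 4 5]
print(as_text, as_text.dtype)
print(back_to_list, type(back_to_list))
print(array1)   # the original array is unchanged
```

tolist() is handy when you need to pass results to code that expects ordinary Python lists, such as the json module.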

Working With NumPy Arrays

You can index a NumPy array the same way you index other Python sequences. The next example extracts one element and then tests a condition against every element in the array, producing an array of True and False values:

# Create a one-dimensional array from a list
# and make it a float array
import numpy as np

my_list = [1.1, 2.1, 3.1, 4.1, 5.1]
array1 = np.array(my_list, dtype="float")
# Print the array
print(array1)
print("Second element of the array:")
print(array1[1])
print("Is each element > 2?")
print(array1 > 2)
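The comparison above produces an array of booleans. A natural next step, beyond what the listing shows, is to use that array as a mask to pull out only the matching elements. Slicing and negative indexes also work just as they do for lists. The variable names are illustrative.

```python
import numpy as np

array1 = np.array([1.1, 2.1, 3.1, 4.1, 5.1], dtype="float")

mask = array1 > 2   # element-wise comparison -> array of booleans
print(mask)         # [False  True  True  True  True]
print(array1[mask]) # only the elements where mask is True
print(array1[1:4])  # slicing works as it does for lists
print(array1[-1])   # negative indexes count from the end
```

Masking replaces the usual “loop and test each item” pattern with a single expression that NumPy evaluates over the whole array.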

Statistical Measures

NumPy has some statistical functions built in, such as the standard min, max, and mean. For random numbers (such as for random participant selection in a study), the random module built into NumPy is comparable to the facilities in C++’s random library. NumPy’s random numbers are not suitable for cryptographic work, however; Python’s standard secrets module is designed for that purpose. Use a numeric array to try it out:

# NumPy statistics functions
import numpy as np

my_list = [1, 2, 7, 4, 5]
array1 = np.array(my_list, dtype="int")
print('Minimum:> ', array1.min())
print('Max:> ', array1.max())
print('Mean of all values:> ', array1.mean())
# These numbers are all pseudo-random; seeding the generator makes the
# sequence repeat identically from run to run.
# np.random.seed(100)  # uncomment for a reproducible sequence
# randint() excludes its upper bound, so 101 is needed to include 100
print('Random int between 1 and 100:> ', np.random.randint(1, 101))
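Newer NumPy releases (1.17 and later) also provide a Generator object created with numpy.random.default_rng(), which the NumPy documentation recommends for new code. The following sketch uses it for the same tasks, plus drawing distinct participants without replacement; the seed value and variable names are illustrative.

```python
import numpy as np

# Passing a seed makes the generator's sequence reproducible.
rng = np.random.default_rng(100)

scores = np.array([1, 2, 7, 4, 5])
print("Std. deviation:", scores.std())  # population std (ddof=0)
print("Median:", np.median(scores))

# integers() excludes the upper bound, so use 101 to include 100.
print("Random int between 1 and 100:", rng.integers(1, 101))

# Pick 3 distinct "participants" numbered 0-9, without replacement.
print("Selected:", rng.choice(10, size=3, replace=False))
```

Keeping the generator in a variable, rather than relying on NumPy’s global random state, makes it easy to pass a seeded rng into the functions that need it.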

Taking It With You

The tools shown here are only a foundation for documenting your code. The real work of documentation requires taking a step back from the code itself so that you can see your application the way your users and other developers would see it. Keeping that in mind, it’s often useful to read documentation for other similar applications. That will give you a good idea of what your users are used to seeing, the types of questions they need answered, and how to distinguish your application as a superior alternative to the existing options.

On the other end of the spectrum, you can also help your users by taking a very close look at your code. Putting your code under the tightest scrutiny will allow you to write tests. The next chapter will show how tests can verify that your application works the way it should and that your documentation stays as accurate as possible.
