Chapter 4

Cookbook

image

4.1 Constraints

  • No long jumps.
  • Complexity of control flow tamed by dividing the large problem into smaller units using procedural abstraction. Procedures are pieces of functionality that may take input, but that don't necessarily produce output that is relevant for the problem.
  • Procedures may share state in the form of global variables.
  • The larger problem is solved by applying the procedures, one after the other, that change, or add to, the shared state.

4.2 A Program in this Style

  1 #!/usr/bin/env python
  2 import sys, string
  3
  4 # The shared mutable data
  5 data = []
  6 words = []
  7 word_freqs = []
  8
  9 #
 10 # The procedures
 11 #
 12 def read_file(path_to_file):
 13 """
 14 Takes a path to a file and assigns the entire
 15 contents of the file to the global variable data
 16 """
 17 global data
 18 with open(path_to_file) as f:
 19    data = data + list(f.read())
 20
 21 def filter_chars_and_normalize():
 22 """
 23 Replaces all nonalphanumeric chars in data with white space
 24 """
 25 global data
 26 for i in range(len(data)):
 27    if not data[i].isalnum():
 28        data[i] = ' '
 29    else:
 30        data[i] = data[i].lower()
 31
 32 def scan():
 33 """
 34 Scans data for words, filling the global variable words
 35 """
 36 global data
 37 global words
 38 data_str = ''.join(data)
 39 words = words + data_str.split()
 40
 41 def remove_stop_words():
 42 global words
 43 with open('../stop_words.txt') as f:
 44    stop_words = f.read().split(',')
 45 # add single-letter words
 46 stop_words.extend(list(string.ascii_lowercase))
 47 indexes = []
 48 for i in range(len(words)):
 49    if words[i] in stop_words:
 50        indexes.append(i)
 51 for i in reversed(indexes):
 52    words.pop(i)
 53
 54 def frequencies():
 55 """
 56 Creates a list of pairs associating
 57 words with frequencies
 58 """
 59 global words
 60 global word_freqs
 61 for w in words:
 62    keys = [wd[0] for wd in word_freqs]
 63    if w in keys:
 64        word_freqs[keys.index(w)][1] += 1
 65    else:
 66        word_freqs.append([w, 1])
 67
 68 def sort():
 69 """
 70 Sorts word_freqs by frequency
 71 """
 72 global word_freqs
 73 word_freqs.sort(lambda x, y: cmp(y[1], x[1]))
 74
 75
 76 #
 77 # The main function
 78 #
 79 read_file(sys.argv[1])
 80 filter_chars_and_normalize()
 81 scan()
 82 remove_stop_words()
 83 frequencies()
 84 sort()
 85
 86 for tf in word_freqs[0:25]:
 87 print tf[0], ' - ', tf[1]

4.3 Commentary

IN THIS STYLE, the larger problem is subdivided into subunits, aka procedures, each doing one thing. It is common in this style for the procedures to share data among themselves, as a means to achieve the final goal. Furthermore, the state changes may depend on previous values of the variables. The procedures are said to have side effects on this data. The computation proceeds with one procedure processing some data in the pool and preparing data for the next procedure. Procedures don't return data, as such, they just act on the shared data.

The example program is implemented as follows. A pool of shared data is declared (lines #5–7): the first variable, data, holds the contents of the input file; the second variable, words, holds the words that are extracted from the data; finally, the third variable, word_freqs, holds the word-frequency pairs. The three variables are all initialized to the empty list. This data is shared by a collection of procedures (lines #12–75), each doing a specific task:

  • read_file(path to file) (lines #12–19) takes a path to a file and joins the entire contents of that file with the current value of the global variable data.
  • filter_chars_and_normalize() (lines #21–30) replaces all nonalphanumeric characters in data with white space. The replacement is done in place.
  • scan() scans the data (lines #32–39) for words by using the built-in function split, and it adds them to the global variable words.
  • remove_stop_words() (lines #41–52) first loads the list of stop words from a file and appends it with single-letter words (lines #44–48). Then it traverses the words list and removes all the stop words from it. It does so by first storing the indexes of where the stop words are in the words list, and then using the built-in function pop to remove those words from the words list.
  • frequencies() (lines #54–66) traverses the words list and creates a list of pairs associating words with frequencies.
  • sort() (lines #68–73) sorts the contents of the variable word_freqs by decreasing order of frequency. It does so by using the built-in function sort, which can take an anonymous function of 2 arguments, which, in this case, is the second element (index 1) of each pair in the word_freqs list.

The main program is from line #79 until the end. This piece of program text is where the cookbook nature of this style is most visible. Given that the larger problem is neatly divided into smaller subproblems, each addressed by a named procedure, the main program consists of issuing a sequence of commands corresponding to each of those procedures, similar to what one does when following an elaborate cooking recipe. In turn, each of those procedures changes the state of the shared variables, just like our following a cooking recipe changes the state of the ingredients.

A consequence of changing state over time (i.e. mutable state) is that procedures may not be idempotent. That is, calling a procedure twice may result in completely different states of the world, and completely different outputs for the program. For example, if we call the procedure read_file(path_to_file) twice we end up with duplicate data in the data variable because of the cumulative nature of the assignment in line #19. An idempotent function or procedure is one that can be called multiple times yielding exactly the same observable effects as calling it just once. The lack of idempotency is seen by many as a source of programming errors.

4.4 This Style in Systems Design

In programming, this style is well suited for computational tasks that accumulate external data over time and whose behavior depends on that data. For example, for user interactions where the user is prompted for pieces of input at different points in time, maybe changing that input later, and where the outputs of the program depend on all data that the user has entered, holding on to state and changing it over time is a natural fit.

One issue that will be visible in later chapters is the granularity with which state is shared. In the example program, the variables are global, and they are shared by the entire collection of procedures. Global variables have long been considered a bad idea in all but the shortest programs. Many other styles discussed in this book use procedures that share variables in much smaller scopes. In fact, a lot of interesting stylistic work has been done over the years in order to limit side effects in specific manners.

At the systems scale, cookbook architectural styles are widely used in practice, with components sharing and changing external state such as that stored in databases.

4.5 Historical Notes

During the 1960s, more and larger programs were being developed that challenged the programming technologies of the time. One of the main challenges had to do with programs being understood by other people. Programming languages were becoming increasingly more featureful, without them necessary letting go of older constructs. The same program could be expressed in many different ways. In the late 1960s, a debate started about which features of the programming languages were "good" and which ones were "bad" [for the purpose of program understanding]. This debate, led, in part, by Dijkstra, advocated restraint in using some features considered harmful, such as long jumps (GOTO), and instead called for the use of higher-level iterative constructs (e.g. while loops), procedures and appropriate modularization of code. Not everyone agreed with Dijkstra's position, but his arguments prevailed. This gave rise to the era of structured programming – the style that is illustrated in this chapter, and that emerged in opposition to unstructured, or monolithic, programming as seen in the previous chapter.

4.6 Further Reading

Dijkstra, E. (1970). Notes on Structured Programming. Available from http://www.cs.utexas.edu/users/EWD/ewd02xx/EWD249.PDF
Synopsis: Dijkstra was one of the most vocal advocates of structured programming. These notes lay out some of Dijkstra's thoughts on programming in general. A classic.

Wulf, W. and Shaw, M. (1973). Global variable considered harmful. SIGPLAN Notices 8(2): 28–34.
Synopsis: More opinions on structuring programs. This paper, as the title says, argues against global variables: not just structured programming, but well-structured programming.

4.7 Glossary

Idempotence: A function, or procedure, is idempotent when multiple applications of it produce exactly the same observable effects as applying it just once.

Mutable variable: A mutable variable is one in which its assigned value can change over time.

Procedure: A procedure is a subroutine of a program. It may or may not receive input parameters, and it may or may not return a value.

Side effect: Side effect is a change in an observable part of a program. Side effects include writing to files or the screen, reading input, changing the value of observable variables, raising an exception, etc. Programs interact with the outside world via side effects.

4.8 Exercises

4.1 Another language. Implement the example program in another language, but preserve the style.

4.2 Scope 'em. Modify the example program so that there are no global variables but the imperative style is still the dominant style and the procedures are essentially the same.

4.3 Double trouble. In the example program, which procedures are idempotent and which ones aren't?

4.4 Details matter. Modify the example program as little as possible but make it so that all procedures become idempotent.

4.5 A different program. Write a different program, also in the Cookbook style, that does exactly the same thing as the example program but with different procedures.

4.6 A different task. Write one of the tasks proposed in the Prologue using the Cookbook style.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset