Chapter 4
1 #!/usr/bin/env python 2 import sys, string 3 4 # The shared mutable data 5 data = [] 6 words = [] 7 word_freqs = [] 8 9 # 10 # The procedures 11 # 12 def read_file(path_to_file): 13 """ 14 Takes a path to a file and assigns the entire 15 contents of the file to the global variable data 16 """ 17 global data 18 with open(path_to_file) as f: 19 data = data + list(f.read()) 20 21 def filter_chars_and_normalize(): 22 """ 23 Replaces all nonalphanumeric chars in data with white space 24 """ 25 global data 26 for i in range(len(data)): 27 if not data[i].isalnum(): 28 data[i] = ' ' 29 else: 30 data[i] = data[i].lower() 31 32 def scan(): 33 """ 34 Scans data for words, filling the global variable words 35 """ 36 global data 37 global words 38 data_str = ''.join(data) 39 words = words + data_str.split() 40 41 def remove_stop_words(): 42 global words 43 with open('../stop_words.txt') as f: 44 stop_words = f.read().split(',') 45 # add single-letter words 46 stop_words.extend(list(string.ascii_lowercase)) 47 indexes = [] 48 for i in range(len(words)): 49 if words[i] in stop_words: 50 indexes.append(i) 51 for i in reversed(indexes): 52 words.pop(i) 53 54 def frequencies(): 55 """ 56 Creates a list of pairs associating 57 words with frequencies 58 """ 59 global words 60 global word_freqs 61 for w in words: 62 keys = [wd[0] for wd in word_freqs] 63 if w in keys: 64 word_freqs[keys.index(w)][1] += 1 65 else: 66 word_freqs.append([w, 1]) 67 68 def sort(): 69 """ 70 Sorts word_freqs by frequency 71 """ 72 global word_freqs 73 word_freqs.sort(lambda x, y: cmp(y[1], x[1])) 74 75 76 # 77 # The main function 78 # 79 read_file(sys.argv[1]) 80 filter_chars_and_normalize() 81 scan() 82 remove_stop_words() 83 frequencies() 84 sort() 85 86 for tf in word_freqs[0:25]: 87 print tf[0], ' - ', tf[1]
IN THIS STYLE, the larger problem is subdivided into subunits, aka procedures, each doing one thing. It is common in this style for the procedures to share data among themselves, as a means to achieve the final goal. Furthermore, the state changes may depend on previous values of the variables. The procedures are said to have side effects on this data. The computation proceeds with one procedure processing some data in the pool and preparing data for the next procedure. Procedures don't return data, as such, they just act on the shared data.
The example program is implemented as follows. A pool of shared data is declared (lines #5–7): the first variable, data, holds the contents of the input file; the second variable, words, holds the words that are extracted from the data; finally, the third variable, word_freqs, holds the word-frequency pairs. The three variables are all initialized to the empty list. This data is shared by a collection of procedures (lines #12–75), each doing a specific task:
The main program is from line #79 until the end. This piece of program text is where the cookbook nature of this style is most visible. Given that the larger problem is neatly divided into smaller subproblems, each addressed by a named procedure, the main program consists of issuing a sequence of commands corresponding to each of those procedures, similar to what one does when following an elaborate cooking recipe. In turn, each of those procedures changes the state of the shared variables, just like our following a cooking recipe changes the state of the ingredients.
A consequence of changing state over time (i.e. mutable state) is that procedures may not be idempotent. That is, calling a procedure twice may result in completely different states of the world, and completely different outputs for the program. For example, if we call the procedure read_file(path_to_file) twice we end up with duplicate data in the data variable because of the cumulative nature of the assignment in line #19. An idempotent function or procedure is one that can be called multiple times yielding exactly the same observable effects as calling it just once. The lack of idempotency is seen by many as a source of programming errors.
In programming, this style is well suited for computational tasks that accumulate external data over time and whose behavior depends on that data. For example, for user interactions where the user is prompted for pieces of input at different points in time, maybe changing that input later, and where the outputs of the program depend on all data that the user has entered, holding on to state and changing it over time is a natural fit.
One issue that will be visible in later chapters is the granularity with which state is shared. In the example program, the variables are global, and they are shared by the entire collection of procedures. Global variables have long been considered a bad idea in all but the shortest programs. Many other styles discussed in this book use procedures that share variables in much smaller scopes. In fact, a lot of interesting stylistic work has been done over the years in order to limit side effects in specific manners.
At the systems scale, cookbook architectural styles are widely used in practice, with components sharing and changing external state such as that stored in databases.
During the 1960s, more and larger programs were being developed that challenged the programming technologies of the time. One of the main challenges had to do with programs being understood by other people. Programming languages were becoming increasingly more featureful, without them necessary letting go of older constructs. The same program could be expressed in many different ways. In the late 1960s, a debate started about which features of the programming languages were "good" and which ones were "bad" [for the purpose of program understanding]. This debate, led, in part, by Dijkstra, advocated restraint in using some features considered harmful, such as long jumps (GOTO), and instead called for the use of higher-level iterative constructs (e.g. while loops), procedures and appropriate modularization of code. Not everyone agreed with Dijkstra's position, but his arguments prevailed. This gave rise to the era of structured programming – the style that is illustrated in this chapter, and that emerged in opposition to unstructured, or monolithic, programming as seen in the previous chapter.
Dijkstra, E. (1970). Notes on Structured Programming. Available from http://www.cs.utexas.edu/users/EWD/ewd02xx/EWD249.PDF
Synopsis: Dijkstra was one of the most vocal advocates of structured programming. These notes lay out some of Dijkstra's thoughts on programming in general. A classic.
Wulf, W. and Shaw, M. (1973). Global variable considered harmful. SIGPLAN Notices 8(2): 28–34.
Synopsis: More opinions on structuring programs. This paper, as the title says, argues against global variables: not just structured programming, but well-structured programming.
Idempotence: A function, or procedure, is idempotent when multiple applications of it produce exactly the same observable effects as applying it just once.
Mutable variable: A mutable variable is one in which its assigned value can change over time.
Procedure: A procedure is a subroutine of a program. It may or may not receive input parameters, and it may or may not return a value.
Side effect: Side effect is a change in an observable part of a program. Side effects include writing to files or the screen, reading input, changing the value of observable variables, raising an exception, etc. Programs interact with the outside world via side effects.
4.1 Another language. Implement the example program in another language, but preserve the style.
4.2 Scope 'em. Modify the example program so that there are no global variables but the imperative style is still the dominant style and the procedures are essentially the same.
4.3 Double trouble. In the example program, which procedures are idempotent and which ones aren't?
4.4 Details matter. Modify the example program as little as possible but make it so that all procedures become idempotent.
4.5 A different program. Write a different program, also in the Cookbook style, that does exactly the same thing as the example program but with different procedures.
4.6 A different task. Write one of the tasks proposed in the Prologue using the Cookbook style.