Chapter 8. Case Study: Word Play

This chapter is intended to let you practice and consolidate the knowledge you have acquired so far, rather than introducing new concepts. To help you gain experience with programming, we will cover a case study that involves solving word puzzles by searching for words that have certain properties. For example, we’ll find the longest palindromes in English and search for words whose letters appear in alphabetical order. And I will present another program development plan: reduction to a previously solved problem.

Reading from and Writing to Files

For the exercises in this chapter, we will need our programs to read text from files. In many programming languages, this often means that we need a statement to open a file, then a statement or group of statements to read the file’s content, and finally a statement to close the file (although this last operation may be performed automatically in some circumstances).

We are interested here in text files that are usually made of lines separated by logical newline characters; depending on your operating system, such logical newline characters consist of either one (Linux, Mac) or two (Windows) physical characters (bytes).

The Perl built-in function open takes the path and name of the file as a parameter and returns a file handle (IO::Handle object) which you can use to read the file (or to write to it):

my $fh = open("path/to/myfile.txt", :r);
my $data = $fh.slurp-rest;
$fh.close;

The :r option is the file mode (read). $fh is a common name for a file handle. The file object provides methods for reading, such as slurp-rest, which returns the full content of the file from the current position to the end (and the entire content of the file if we’ve just opened it).

This is the traditional way of opening and reading files in most languages.

However, Perl’s IO role (in simple terms, a role is a collection of related methods) offers simpler methods which can open a file and read it all in one single instruction (i.e., without having to first open a file handle and then close it):

my $text = slurp "path/to/myfile.txt";
# or:
my $text = "path/to/myfile.txt".IO.slurp;

slurp takes care of opening and closing the file for you.

We can also read the file line by line, which is very practical if each line contains a logical entity such as a record, and is especially useful for very large files that might not fit into memory:

for 'path/to/hugefile.txt'.IO.lines -> $line {
    # Do something with $line
}

By default, the .lines method will remove the trailing newline characters from each line, so that you don’t have to worry about them.

We haven’t studied arrays yet, but you can also read all lines of a file into an array, with each line of the file becoming an array item. For example, you can load the myfile.txt file into the @lines array:

my @lines = "myfile.txt".IO.lines;

Accessing any line can then be done with the bracket operator and an index. For example, to print the first and the seventh line:

say @lines[0];
say @lines[6];

To write data to a file, it is possible to open a file just as when wanting to read from it, except that the :w (write) option should be used:

my $fh = open("path/to/myfile.txt", :w);
$fh.say("line to be written to the file");
$fh.close;

If the file already existed, any existing content will be clobbered. So be careful when you want to open a file in write mode.

It is also possible to open the file in append mode, using the :a option. New data will then be added after the existing content.

Writing to a file can be simplified using the spurt function, which opens the file, writes the data to it, and closes it:

spurt "path/to/myfile.txt", "line to be written to the file
";

Used this way, spurt will clobber any existing content in the file. It may also be used in append mode with the :append option:

spurt "path/to/myfile.txt", "line to be added
", :append;

Reading Word Lists

For the exercises in this chapter we need a list of English words. There are lots of word lists available on the web, but one of the most suitable for our purpose is one of the word lists collected and contributed to the public domain by Grady Ward as part of the Moby lexicon project. It is a list of 113,809 official crosswords; that is, words that are considered valid in crossword puzzles and other word games. In the Moby collection, the filename is 113809of.fic; you can download a copy, with the simpler name words.txt, from http://thinkpython2.com/code/words.txt.

This file is in plain text (with each word of the list on its own line), so you can open it with a text editor, but you can also read it from Perl. Let’s do so in the interactive mode (with the REPL):

> my $fh = open("words.txt", :r);
IO::Handle<words.txt>(opened, at octet 0)
> my $line = get $fh;
aa
> say "<<$line>>";
<<aa>>

The get function reads one line from the file handle.

The first word in this particular list is “aa” (a kind of lava).

Printing the $line variable between angle brackets within a string shows us that the get function removed implicitly the trailing newline characters, in this case a (carriage return and newline) character combination, since this file was apparently prepared under Windows.

The file handle keeps track of what has been read from the file and what it should read next, so if you call get again, you obtain the next line:

> my $line = get $fh;
aah

The next word is “aah,” which is a perfectly legitimate word, so stop looking at me like that.

This is good and fine if we want to explore the first few lines of the words.txt file, but we are not going to read the 113,809 lines of the file this way.

We need a loop to do it for us. We could insert the above get instruction into a while loop, but earlier we discussed an easier and more efficient way of doing this with a for loop and the IO.lines method, without the hassle of having to open or close the file:

for 'words.txt'.IO.lines -> $line {
    say $line;
}

This code reads the file line by line, and prints each line to the screen. We’ll soon do more interesting things than just displaying the lines on the screen.

Exercises

This case study consists mainly of exercises and solutions to them within the body of the chapter because solving the exercises is the main teaching material of this chapter. Therefore, solutions to these exercises are in the following sections of this chapter, not in the appendix. You should at least attempt each one before you read the solutions.

Exercise 8-1.

Write a program that reads words.txt and prints only the words with more than 20 characters.

Exercise 8-2.

In 1939 Ernest Vincent Wright published a 50,000-word novel called Gadsby that does not contain the letter “e”. Since “e” is the most common letter in English, that’s not easy to do. In fact, it is difficult to construct a solitary thought without using that most common letter. Such a writing in which one letter is avoided is sometimes called a lipogram.

Write a subroutine called has-no-e that returns True if the given word doesn’t have the letter “e” in it.

Modify your program from the previous exercise to print only the words that have no “e” and compute the percentage of the words in the list that have no “e”.

(The word dictionary we are using, words.txt, is entirely in lowercase letters, so you don’t need to worry about any uppercase “E”.)

Exercise 8-3.

Write a subroutine named avoids that takes a word and a string of forbidden letters, and that returns True if the word doesn’t use any of the forbidden letters.

Next, modify your program to prompt the user to enter a string of forbidden letters and then print the number of words that don’t contain any of them. Can you find a combination of five forbidden letters that excludes the smallest number of words?

Exercise 8-4.

Write a subroutine named uses-only that takes a word and a string of letters, and that returns True if the word contains only letters in the list. Can you make a sentence using only the letters acefhlo? Other than “Hoe alfalfa?”

Exercise 8-5.

Write a subroutine named uses-all that takes a word and a string of required letters, and returns True if the word uses all the required letters at least once. How many words are there that use all the vowels aeiou? How about aeiouy?

Exercise 8-6.

Write a function called is_abecedarian that returns True if the letters in a word appear in alphabetical order (double letters are ok). How many abecedarian words are there?

Debugging

Testing programs is hard. The functions in this chapter are relatively easy to test because you can check the results by hand. Even so, it is somewhere between difficult and impossible to choose a set of words that test for all possible errors.

Taking has_no_e as an example, there are two obvious cases to check: words that have an “e” should return False, and words that don’t should return True. You should have no trouble coming up with one of each.

Within each case, there are some less obvious subcases. Among the words that have an “e”, you should test words with an “e” at the beginning, the end, and somewhere in the middle. You should test long words, short words, and very short words, like an empty string. The empty string is an example of a special case, which is one of the nonobvious cases where errors often lurk.

In addition to the test cases you generate, you can also test your program with a word list like words.txt. By scanning the output, you might be able to catch errors, but be careful: you might catch one kind of error (words that should not be included, but are) and not another (words that should be included, but aren’t).

In general, testing can help you find bugs, but it is not easy to generate a good set of test cases, and even if you do, you can’t be sure your program is correct.

According to a legendary computer scientist:

Program testing can be used to show the presence of bugs, but never to show their absence!

— Edsger W. Dijkstra

Glossary

file object

A value that represents an open file.

reduction to a previously solved problem

A way of solving a problem by expressing it as an instance of a previously solved problem.

special case

A test case that is atypical or nonobvious (and less likely to be handled correctly). The expressions edge case and corner case convey more or less the same idea.

Exercises

Exercise 8-7.

This question is based on a Puzzler that was broadcast on the radio program Car Talk:

Give me a word with three consecutive double letters. I’ll give you a couple of words that almost qualify, but don’t. For example, the word committee, c-o-m-m-i-t-t-e-e. It would be great except for the “i” that sneaks in there. Or Mississippi: M-i-s-s-i-s-s-i-p-p-i. If you could take out those i’s it would work. But there is a word that has three consecutive pairs of letters and to the best of my knowledge this may be the only word. Of course there are probably 500 more but I can only think of one. What is the word?

Write a program to find it.

Solution: “Exercise 8-7: Consecutive Double Letters”.

Exercise 8-8.

Here’s another Car Talk Puzzler):

I was driving on the highway the other day and I happened to notice my odometer. Like most odometers, it shows six digits, in whole miles only. So, if my car had 300,000 miles, for example, I’d see 3-0-0-0-0-0.

Now, what I saw that day was very interesting. I noticed that the last 4 digits were palindromic; that is, they read the same forward as backward. For example, 5-4-4-5 is a palindrome, so my odometer could have read 3-1-5-4-4-5.

One mile later, the last 5 numbers were palindromic. For example, it could have read 3-6-5-4-5-6. One mile after that, the middle 4 out of 6 numbers were palindromic. And you ready for this? One mile later, all 6 were palindromic!

The question is, what was on the odometer when I first looked?

Write a program that tests all the six-digit numbers and prints any numbers that satisfy these requirements.

Solution: “Exercise 8-8: Palindromes in Odometers”.

Exercise 8-9.

Here’s another Car Talk Puzzler you can solve with a search:

Recently I had a visit with my mom and we realized that the two digits that make up my age when reversed resulted in her age. For example, if she’s 73, I’m 37. We wondered how often this has happened over the years but we got sidetracked with other topics and we never came up with an answer.

When I got home I figured out that the digits of our ages have been reversible six times so far. I also figured out that if we’re lucky it would happen again in a few years, and if we’re really lucky it would happen one more time after that. In other words, it would have happened 8 times over all. So the question is, how old am I now?

Write a Perl program that searches for solutions to this Puzzler. Hint: you might find the string formatting method sprintf useful.

Solution: “Exercise 8-9: Palindromes in Ages”.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset