Now that you know how to do basic math using Tcl, you’re ready to learn how to perform a wide variety of string operations. Tcl has a rich set of commands and functionality for manipulating strings, an unsurprising fact when you consider that Tcl is a string-based programming language. Everything in Tcl is a string, even numbers. This characteristic of the language sometimes takes beginners by surprise because certain operators behave differently, depending on the context in which they are used, which can lead to unexpected results. If I’ve done my job properly, though, you’ll be able to recognize and avoid these gotchas. In this chapter, you will spend some quality time with the string command, which is the primary Tcl command for working with strings. The final section continues the discussion of Tcl control structures I started in the previous chapter by introducing two looping commands, while and for.
To play this chapter’s game, you provide a word that meets specific criteria, such as an adjective, a verb ending in -ing, or a noun, to create what we called Mad Libs when I was growing up. The script takes the words and parts of speech that you provide and plugs them into a story. The result is a silly or nonsense story that is also (hopefully) amusing or at least mildly entertaining. To start the game, execute the script mad_lib.tcl in this chapter’s code directory. Here are the results of one execution:
$ ./mad_lib.tcl
Enter a verb ending in -ing: swimming
Enter a adjective: enormous
Enter a mythical creature: unicorn
Enter a piece of furniture: coffee table
Enter a noun: sink
Enter a past tense verb: yanked
Enter a noun: shovel
Enter a number: 10
One day while I was swimming in my living room, a enormous unicorn fell through the
roof. It jumped on the coffee table and knocked over the sink. Then it ran into the
dining room and yanked a shovel. After 10 minutes of chasing it through the house I
finally caught it and put it outside. It quickly flew away.
Okay, nothing is blowing up, and you’re probably not rolling on the floor laughing. Nonetheless, mad_lib.tcl shows you how to do the following programming tasks:
Repeat a body of Tcl code multiple times.
Find characters in strings.
Find substrings in strings.
Replace one substring with another.
Incorporate user input into your script.
A significant portion of Tcl programming, indeed, of almost any programming, is reading, writing, and manipulating string-based data. This chapter introduces you to a substantial portion of Tcl’s string-handling capabilities. There is too much to stuff into one chapter, though, so I’ve saved more advanced string-handling functionality for later chapters.
The command you will use most often to work with strings is the aptly named string command. As of Tcl 8.4, the string command has 21 options that define all of the operations you can perform with it. The general form of the string command is:
string option arg ?arg? ...
Each option accepts at least one argument, arg, but most take more. For convenience and completeness, Table 4.1 lists each of string’s options and gives a short description of the option’s purpose.
Table 4.1. string Options
Option | Description |
---|---|
bytelength | Returns the number of bytes required to store a string in memory. |
compare | Compares two strings lexicographically, returning -1, 0, or 1. |
equal | Tests two strings for equality, returning 1 if the strings are identical, 0 if they are not. |
first | Returns the index of the first occurrence of a substring. |
index | Returns the character that appears at a specified location in a string. |
is | Tests whether a string is a member of a given character class. |
last | Returns the index of the last occurrence of a substring. |
length | Returns the length of a string. |
map | Replaces substrings with new values based on key-value pairs. |
match | Tests a string for matches against a pattern using shell-style globbing. |
range | Returns a substring specified by start and end index values. |
repeat | Returns a string repeated a specified number of times. |
replace | Removes a specified substring or replaces a specified substring with another. |
tolower | Converts a string to all lowercase characters. |
toupper | Converts a string to all uppercase characters. |
totitle | Converts the first character of a string to uppercase and the rest to lowercase. |
trim | Removes leading and trailing characters that belong to a specified set of characters (white space by default). |
trimleft | Removes leading characters that belong to a specified set of characters (white space by default). |
trimright | Removes trailing characters that belong to a specified set of characters (white space by default). |
wordend | Returns the index of the character just after the end of the word containing a specified character. |
wordstart | Returns the index of the first character of the word containing a specified character. |
Table 4.1 should give you a good sense of the breadth of Tcl’s string-handling capabilities. I’ll show each option’s syntax diagram and describe each of the options in the following sections. To structure the discussion, I’ve arranged the options into three broad groups based on their function: options for comparing strings, options for getting information about strings, and options for modifying strings.
Comparing one string to another is a common programming task. Typically, you want to see if one string is the same as another (or not), such as when validating a user name or password. Another frequent need is testing a string to see if it contains a given character or sequence of characters. For example, you might want to make sure that user input, say, the number of players in a game, contains only numbers and no letters. The string command has three options for comparing strings: compare, equal, and match. In addition, you can use the operators eq, ne, ==, !=, <, <=, >, and >=.
Kurt’s First Rule for Comparing Strings: Use compare, equal, eq, and ne to compare strings. String comparisons almost always occur in an if, while, or expr command. However, using the comparison operators (==, !=, <, and >) for string comparisons is inefficient because of the way that Tcl parses expressions. As you learned in Chapter 3, the expr command has its own expression evaluator, which performs its own round of substitutions separate from those of the main interpreter. Recall also that the if command (and the while command that you’ll see at the end of this chapter) uses the same engine as expr. When the expression parser encounters one of these comparison operators, it first tries to interpret both operands as numbers and compares them numerically if it can. Only when that fails does it fall back to comparing them as strings. The compare and equal options (and the eq and ne operators) skip these numeric conversions because they are designed for use with strings.
The following example, rule.tcl in this chapter’s code directory, illustrates the point:
set hexVal "0xF"
set intVal "15"
# Use compare, equal, eq, and ne to compare strings
if {$hexVal == $intVal} {
    puts "$hexVal equals $intVal"
} else {
    puts "$hexVal does not equal $intVal"
}
if {$hexVal eq $intVal} {
    puts "$hexVal equals $intVal"
} else {
    puts "$hexVal does not equal $intVal"
}
If you execute this program, you’ll see this odd result:
$ ./rule.tcl
0xF equals 15
0xF does not equal 15
Since when is “0xF” the same as “15”? The first if statement compares the variables hexVal and intVal using the == operator. Because both values look like numbers, the expression evaluator converts them to integers, 15 in both cases, and finds them equal. If you intended to compare two strings, you would expect this comparison to evaluate to false. Putting quotes around the values when you set the variables doesn’t help, because in Tcl quotes only group words and don’t mark a value as a string. The second if command uses eq, a synonym for the string equal command you’ll see shortly. eq prevents the expression evaluator from performing the numeric conversion and compares the two variables’ values as strings, even though the if command’s expression contains no quotes.
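The numeric conversion also affects ordering comparisons, not just equality. The following sketch is not one of the chapter’s scripts, and the variable names are made up for illustration. It compares the values 10 and 9 with < inside expr and with string compare. It also checks that ne, like eq, treats 0xF and 15 as different strings.

```tcl
set a "10"
set b "9"
# Both values look like integers, so < compares them numerically.
# 10 < 9 is false, so expr returns 0.
set numeric [expr {$a < $b}]
# string compare looks only at characters: "1" sorts before "9",
# so "10" is lexicographically less than "9" and the result is -1.
set lexical [string compare $a $b]
# ne never converts its operands to numbers, so "0xF" and "15" differ.
set different [expr {"0xF" ne "15"}]
puts "< inside expr: $numeric"
puts "string compare: $lexical"
puts "0xF ne 15: $different"
```

The same pair of values therefore sorts one way as numbers and the other way as strings, which is exactly why it pays to say which kind of comparison you mean.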
The compare option compares two strings lexicographically, that is, character by character, to determine whether one string is less than, equal to, or greater than the other. Its syntax is:
string compare ?-nocase? ?-length N? string1 string2
string1 and string2 are the strings to compare. By default, the comparison is case-sensitive, so if you want a case-insensitive comparison, specify the -nocase option. To limit the comparison to the first N characters, where N is an integer, specify -length N. compare works much like C’s strcmp() and strncmp() functions, except that it always returns exactly -1 if string1 is lexicographically less than string2, 1 if string1 is lexicographically greater than string2, and 0 if the two strings are equal. Because 0 means the strings are equal, an equality test written with compare negates the result with !. The following script (compare.tcl in this chapter’s code directory) illustrates how compare works:
puts -nonewline "Enter player name: "
flush stdout
gets stdin playerName
# Test for strict equality (case-sensitive)
if {![string compare $playerName "Bubba"]} {
    puts "\"$playerName\" is in use."
    puts -nonewline "Please select another name: "
    flush stdout
    gets stdin playerName
}
puts "\"$playerName\" successfully registered."
Notice in the last line how I use \" to make the name entered appear in quotes in the output. The backslash tells Tcl that the quotation mark is part of the string rather than the end of it. It’s a little ugly to write and to look at, but that’s how you have to do it. Executing the script, you might see the following results:
$ ./compare.tcl
Enter player name: Bubba
"Bubba" is in use.
Please select another name: Kurt
"Kurt" successfully registered.
$ ./compare.tcl
Enter player name: BUBBA
"BUBBA" successfully registered.
Entering the name BUBBA foils the point of the code, which is to make sure that the player name Bubba doesn’t get used twice in the same game. This is when the -nocase argument comes in handy, because it disables case-sensitivity when comparing two strings (see compare_nocase.tcl in this chapter’s code directory):
puts -nonewline "Play again (Y/N): "
flush stdout
gets stdin choice
# Case-insensitive comparison
if {![string compare -nocase $choice "y"]} {
    puts "Excellent! Starting next level."
} else {
    puts "Quitters never win. Exiting."
}
compare_nocase.tcl’s output should resemble the following:
$ ./compare_nocase.tcl
Play again (Y/N): y
Excellent! Starting next level.
$ ./compare_nocase.tcl
Play again (Y/N): Y
Excellent! Starting next level.
This script shows how you can make a script slightly more tolerant of sloppy typing by using string compare’s -nocase argument. Whether the user types “y” or “Y,” the game will continue. If the user types anything else, such as “n” or “N,” the script insults the user and exits. Modifying compare.tcl to ignore case is left as an exercise for the reader.
The -length N argument enables you to limit the comparison to the first N characters of the strings being compared. If N is negative, the -length argument is ignored. I have a hard time imagining a situation in which you would deliberately pass a negative N, but it can happen when N comes from a variable whose value might be negative.
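Here is a short sketch, not from the chapter’s code directory, that shows compare’s three possible return values along with the effect of -nocase and -length. The sample words are arbitrary.

```tcl
# compare returns -1, 0, or 1
set r1 [string compare "apple" "banana"]   ;# -1: "a" sorts before "b"
set r2 [string compare "pear" "pear"]      ;# 0: identical strings
set r3 [string compare "zebra" "apple"]    ;# 1: "z" sorts after "a"
# Uppercase letters sort before lowercase ones, so "Apple" < "apple"
set r4 [string compare "Apple" "apple"]
set r5 [string compare -nocase "Apple" "apple"]
# Only the first 3 characters ("alp" and "alp") are compared
set r6 [string compare -length 3 "alphabet" "alpine"]
# A negative length is ignored, so the full strings are compared ("h" < "i")
set r7 [string compare -length -1 "alphabet" "alpine"]
puts "$r1 $r2 $r3 $r4 $r5 $r6 $r7"
```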
The equal option is almost identical to the compare option (the syntax is identical). The difference between the two is that equal compares strings for strict equality, returning 1 (true) if the strings are exactly identical or 0 (false) if they are not. compare, you will recall, evaluates whether two strings are lexicographically less than, equal to, or greater than one another. The following example, equal.tcl in this chapter’s code directory, rewrites compare.tcl to use equal:
puts -nonewline "Enter player name: "
flush stdout
gets stdin playerName
# Test for strict equality (case-sensitive)
if {[string equal $playerName "Bubba"]} {
    puts "\"$playerName\" is in use."
    puts -nonewline "Please select another name: "
    flush stdout
    gets stdin playerName
}
puts "\"$playerName\" successfully registered."
Like I said, compare and equal have the same syntax; the only difference is the nature of the comparison. In practice, you will most often use the equal option, because it is rare that you need to determine whether one string is less than or greater than another.
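equal accepts the same -nocase and -length options as compare. Here is a quick sketch using arbitrary sample names:

```tcl
set e1 [string equal "Bubba" "Bubba"]              ;# 1: identical
set e2 [string equal "Bubba" "BUBBA"]              ;# 0: case differs
set e3 [string equal -nocase "Bubba" "BUBBA"]      ;# 1: case ignored
set e4 [string equal -length 3 "Bubba" "Bubbles"]  ;# 1: only "Bub" is compared
puts "$e1 $e2 $e3 $e4"
```

Because equal returns 1 for a match, you can use its result directly as an if condition, with no ! needed.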
The eq operator is a synonym for string equal. It exists to make tests for string equality easier to read and write, and to make such statements look more like other logical operations. For example, string equal requires the awkward-looking expressions in the previous examples, such as [string equal $playerName "Bubba"]. The eq operator lets you write the more natural expression $playerName eq "Bubba". Thus, equal.tcl becomes eq.tcl:
puts -nonewline "Enter player name: "
flush stdout
gets stdin playerName
# Test for strict equality (case-sensitive)
if {$playerName eq "Bubba"} {
    puts "\"$playerName\" is in use."
    puts -nonewline "Please select another name: "
    flush stdout
    gets stdin playerName
}
puts "\"$playerName\" successfully registered."
Using eq instead of string equal makes the if command much easier to scan and understand, in my opinion. Notice that square brackets weren’t necessary in this case. eq is an operator, not a command, and the expression evaluator substitutes the value of $playerName itself, so the comparison works. The double quotes around Bubba, on the other hand, are required. If you omit them, the expression parser doesn’t know what to do with the bare word Bubba and reports a syntax error.
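eq has a companion operator, ne, which tests for inequality. Neither operator accepts a -nocase option. One way to get a case-insensitive test is to lowercase the value first with string tolower, which appears later in this chapter. In this sketch, the hardcoded playerName stands in for input read with gets:

```tcl
set playerName "BUBBA"   ;# stands in for a value read with gets
# eq and ne are case-sensitive
if {$playerName ne "Bubba"} {
    puts "\"$playerName\" is not exactly \"Bubba\""
}
# Lowercase the input first to get a case-insensitive test
if {[string tolower $playerName] eq "bubba"} {
    puts "\"$playerName\" is in use."
}
```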
The match option compares a string to a pattern and returns 1 if the string matches the pattern and 0 otherwise. The complete syntax is:
string match ?-nocase? pattern string
Where equal tests for simple equivalence between two strings, match introduces the ability to test for equivalence between pattern and string. As usual, string can be either a literal string or a string variable. Likewise, pattern can be a literal string or a variable. The difference is that pattern can contain the wildcard characters * and ?. * represents a sequence of zero or more characters and ? represents any one character. The UNIX geeks among you will recognize pattern as a glob.
Consider the pattern alpha*, which is the literal string alpha followed by any sequence of zero or more characters. The following list shows a few matching and nonmatching strings:
alphabet: matches
Alphanumeric: doesn’t match (uppercase A)
alpha male: matches
alpha: matches (* matches a sequence of zero or more characters)
alpaca: doesn’t match
lambda nalpha: doesn’t match (the string must begin with alpha; * matches only what follows it)
Similarly, given the pattern ga?e, the strings game, gate, and gale match the pattern while the strings gayle, glare, and regale do not. match.tcl demonstrates matches using * and ?.
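match.tcl itself isn’t reproduced here, so the following is only a sketch of the same idea. It tests each word from the two lists above against alpha* and ga?e and prints 1 or 0. The sketch uses foreach, a looping command you haven’t met yet, which runs its body once for each element of a list. Braces group a multiword candidate such as alpha male into a single element.

```tcl
# Test each candidate against alpha*
foreach candidate {alphabet Alphanumeric {alpha male} alpha alpaca {lambda nalpha}} {
    puts "alpha* vs \"$candidate\": [string match alpha* $candidate]"
}
# Test each candidate against ga?e (exactly one character between "ga" and "e")
foreach candidate {game gate gale gayle glare regale} {
    puts "ga?e vs \"$candidate\": [string match ga?e $candidate]"
}
```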
In addition to * and ?, you can specify a pattern element that consists of a set of characters using the form [chars], where chars is a list of characters. It matches any single character in chars. chars can be specified using the format x-y to indicate a range of consecutive Unicode characters. For example, to see if a one-character string variable input is an uppercase character, one (inefficient) way to write the test is:
if {[string match {[A-Z]} $input]} {
    # do something
} else {
    # do something else
}
Notice that the expression [A-Z] is enclosed in braces. If you don’t use the braces, the interpreter will attempt to execute a command named A-Z and substitute the result into the string match command. You probably don’t have a command named A-Z (Tcl certainly doesn’t), so the script fails with an error. The braces prevent this substitution.
match Characters
If you need to match one of the wildcard characters or the right or left bracket, escape it with a backslash (thus, \*, \?, \[, \]).
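Here is a sketch of escaped wildcards. The patterns are wrapped in braces so that the backslashes reach string match untouched. Inside double quotes, Tcl’s own backslash substitution would strip a lone backslash before string match ever saw it, so you would have to double the backslash, as the last test shows.

```tcl
# Braces keep the backslashes intact so string match sees them
set star   [string match {*\**} "5*3"]    ;# 1: contains a literal *
set nostar [string match {*\**} "53"]     ;# 0: no * in the string
set quest  [string match {*\?} "Why?"]    ;# 1: ends with a literal ?
set brack  [string match {\[*\]} {[ok]}]  ;# 1: wrapped in literal brackets
# Inside double quotes, Tcl strips a lone backslash, so double it
set quoted [string match "*\\?" "Why?"]   ;# 1: same test as quest
puts "$star $nostar $quest $brack $quoted"
```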
Pattern matching using string match is useful when you need to compare a string to a value that can vary in a regular or systematic way. For example, if you store player scores in files named name.scr, where name is each player’s name, you could use the expression string match "*.scr" $fileName to recognize score files. Another way to use string match is to test whether or not a given string contains characters that might be forbidden. For example, to make sure that player names do not contain uppercase letters, you might write the following bit of code (see no_caps.tcl in this chapter’s code directory):
if {[string match {*[A-Z]*} $playerName]} {
    puts "Your player name cannot contain uppercase letters"
}
The pattern *[A-Z]* matches zero or more characters followed by any single uppercase character followed by zero or more characters. This pattern will match any string that contains a capital letter, regardless of where in the string it occurs.
string’s match option gives you a powerful and easy-to-use tool to identify matches that aren’t exact. As you gain experience with Tcl, the situations in which pattern matching is an appropriate solution will become clear.
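The two uses described above might look like this in practice. The file names and player names are hypothetical sample data, and foreach simply repeats the test for each item in the list:

```tcl
# Only names ending in .scr count as score files
foreach fileName {kurt.scr bubba.scr notes.txt kurt.scr.bak} {
    if {[string match "*.scr" $fileName]} {
        puts "$fileName is a score file"
    } else {
        puts "$fileName is not a score file"
    }
}
# The no_caps.tcl test applied to two sample names
foreach playerName {bubba Bubba} {
    if {[string match {*[A-Z]*} $playerName]} {
        puts "$playerName contains an uppercase letter"
    }
}
```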
Although comparing strings to one another is a useful thing to be able to do, it is also one of the least interesting things to do. The string options you learn in this section let you find out more about a string, such as how long it is, what character is present at a particular location in the string, where a substring first or last appears, and what kind of characters the string contains.
string bytelength string
string length string
The bytelength option returns the length of string in bytes, whereas the length option returns the length of the string in characters. A string’s byte length might not be the same as its number of characters. As you might remember, Tcl uses Unicode, and its internal UTF-8 representation can take up to three bytes to represent a character. In this book and in most of your work with Tcl, you will almost always want to use string length string, because the situations in which you need to know a string’s length in actual bytes are uncommon. For completeness’ sake, however, length.tcl shows the use of both:
set phrase "®"
puts "Length in bytes of phrase: [string bytelength $phrase]"
puts "Length in characters of phrase: [string length $phrase]"
The output shows you the difference between the length and bytelength options:
$ ./length.tcl
Length in bytes of phrase: 2
Length in characters of phrase: 1
As you can see, the phrase, which consists of the single character ®, is only 1 character long, but it requires 2 bytes to store.
If you want to find out what character is at a given position in a string, use the string index command. Its complete syntax is:
string index string n
This command returns the character located at position, or index, n of string. Index values are 0-based (counted from 0). For example, given the string “dice,” the command string index "dice" 0 returns d and string index "dice" 3 returns e (see index.tcl):
set str "dice"
puts "The character at index 0 of dice is '[string index $str 0]'"
puts "The character at index 1 of dice is '[string index $str 1]'"
puts "The character at index 2 of dice is '[string index $str 2]'"
puts "The character at index 3 of dice is '[string index $str 3]'"
The output of this script should look just like the following:
$ ./index.tcl
The character at index 0 of dice is 'd'
The character at index 1 of dice is 'i'
The character at index 2 of dice is 'c'
The character at index 3 of dice is 'e'
You can specify the index value n using an integer, the word end, or the expression end-int, where int is an integer. If n is less than 0 or greater than or equal to the length of the string, string index returns the empty string. That’s right. Unlike in many programming languages, referring to an invalid string index in Tcl does not generate an error. The end-int syntax for specifying an index makes it trivial to iterate over a string in reverse (that is, to perform an operation on a string starting from its last character and ending at its first). You don’t know how to loop over a string in this way (yet! See “Iterative Loops: The for Command” later in this chapter), but trust me, it’s a common operation, so you’ll appreciate having a brain-dead easy syntax for doing it.
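A sketch extending index.tcl (not part of the chapter’s code directory) shows the end and end-int forms. It also confirms that out-of-range indices quietly return the empty string:

```tcl
set str "dice"
puts "Index end of dice is '[string index $str end]'"      ;# 'e'
puts "Index end-1 of dice is '[string index $str end-1]'"  ;# 'c'
# Out-of-range indices return the empty string, not an error
puts "Index 4 of dice is '[string index $str 4]'"          ;# ''
puts "Index -1 of dice is '[string index $str -1]'"        ;# ''
```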
The first and last options make it possible to find the index value of the first and last occurrences, respectively, of a substring in a string. Their complete syntax is:
string first substr str ?start?
string last substr str ?end?
string first searches for the first occurrence of the substring substr in the string str and returns the index of the first letter of substr. string last, similarly, returns the index of the first letter of the last occurrence of substr in str. If the specified substr is not found, both options return -1. By default, string first’s search starts at index 0 of str; if you specify start, the search will start at that index rather than at index 0. string last’s optional argument, end, lets you specify the ending index of the search, meaning that it will only look for substr between index 0 and the index specified by end.
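substr.tcl uses only the required arguments. This separate sketch, on a made-up string with ? at indices 1, 3, and 5, shows what the optional start and end arguments do:

```tcl
set s "a?b?c?d"   ;# ? at indices 1, 3, and 5
set f1 [string first "?" $s]     ;# 1: first ?
set f2 [string first "?" $s 2]   ;# 3: search starts at index 2
set l1 [string last "?" $s]      ;# 5: last ?
set l2 [string last "?" $s 4]    ;# 3: characters after index 4 are ignored
set nf [string first "x" $s]     ;# -1: not found
puts "$f1 $f2 $l1 $l2 $nf"
```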
substr.tcl in this chapter’s code directory illustrates how to use string first and string last. The example is short because it is incomplete. I’m going to build on it in the next two sections.
# Original sentence
set old "He was ?verbing? his wife's hair."
set start [string first "?" $old]
set end [string last "?" $old]
puts "start = $start"
puts "end = $end"
This script might serve as the start of a routine for performing a search-and-replace operation. The first step is to search for some text. The assumption in this example is that the text you want to replace is surrounded by ? characters. I use string first and string last to find the index position of the ? characters and then display those indices:
$ ./substr.tcl
start = 7
end = 15
Remember that index values are zero-based, so ? appears at positions 7 and 15, not 8 and 16 as you might expect. If you were writing a search-and-replace procedure, your next step would be to replace the “found” text with something new, which is precisely what the string replace command does.
The range option returns a range of characters, that is, a substring, specified by start and end index values:
string range str start end
string range returns the substring of the string str that begins at index start and ends at index end, inclusive.
If you’re thinking that the start and end arguments look an awful lot like the return values from string first and string last, you’d be spot on. In fact, this is a good example of how you’d use Tcl’s command nesting. range.tcl builds on substr.tcl from the previous section to extract a ?-delimited substring from another string:
# Original sentence
set old "He was ?verbing? his wife's hair."
# Get the starting and end points
set start [string first "?" $old]
set end [string last "?" $old]
# Extract the substring
set substr [string range $old $start $end]
puts "substring is $substr"
The output is what you’d expect, the ?-delimited substring:
$ ./range.tcl
substring is ?verbing?
If you want to use Tcl’s ability to nest commands, you could rewrite this script as shown in the following example (range_nested.tcl in this chapter’s code directory):
# Original sentence
set old "He was ?verbing? his wife's hair."
# Extract the substring
set substr [string range $old [string first "?" $old] [string last "?" $old]]
puts "substring is $substr"
The output is identical to the previous example. You can decide for yourself which model you prefer: the sequential method that limits nested commands (illustrated in range.tcl) or the more, um, “Tcl-ish” method that relies upon and takes advantage of command nesting (illustrated in range_nested.tcl). Tcl beginners often find code written in the sequential style easier to read, but using nested commands results in more idiomatic Tcl. Indeed, the more experienced you become with Tcl, the more natural nested commands will feel.
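range.tcl returns the placeholder along with its ? delimiters. The following small variation is a sketch, not one of the chapter’s scripts. It steps one character inward from each delimiter to extract just the word, and it shows that string range accepts end and end-int just as string index does:

```tcl
set old "He was ?verbing? his wife's hair."
set start [string first "?" $old]
set end [string last "?" $old]
# Step one character inward from each ? to drop the delimiters
set word [string range $old [expr {$start + 1}] [expr {$end - 1}]]
puts "word is $word"                    ;# verbing
# end and end-int work here too, just as with string index
set tail [string range "dice" 1 end]    ;# ice
set head [string range "dice" 0 end-1]  ;# dic
puts "$tail $head"
```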
The string replace command completes the search-and-replace set of commands you’ve been exploring in the last few sections. Its complete syntax is:
string replace str start end ?newstr?
This command removes the substring between and including the indices start and end from the string specified by str. If you include the optional argument newstr, the removed text will be replaced with the string specified by newstr. replace.tcl in this chapter’s code directory illustrates replacing text using string replace.
# Source sentence
set old "He was ?verbing? his wife's hair with a ?noun?."
puts "Old sentence: $old"
# Find this
set verb "?verbing?"
# Replace with this
set newVerb "washing"
# Get the verb's starting and ending positions
set start [string first "?" $old]
set end [string first "?" $old [expr {$start + 1}]]
# Replace and display
puts "New sentence: [string replace $old $start $end $newVerb]"
This script replaces the string ?verbing? with the string washing. Notice in the fourth block of code that I use string first twice. Why? Because the sentence contains a second ?-delimited placeholder, ?noun?, string last would return the index of the ? that closes ?noun? rather than the one that closes ?verbing?. Using string first with the optional start argument lets me reset the starting point of the search. The command set start [string first "?" $old] finds the index of the first ?. The nested expr command, [expr {$start + 1}], sets the starting point of the next search to the character that follows the first ?. This adjustment is necessary because the optional start argument for string first (remember, the syntax is string first substr str ?start?) begins the search at start, including the character at that index. If I hadn’t incremented the starting index, the second string first command would have returned the position of the first ? instead of the second one.
The last command actually performs the replacement and displays the result. Here’s the output of this script:
$ ./replace.tcl
Old sentence: He was ?verbing? his wife's hair with a ?noun?.
New sentence: He was washing his wife's hair with a ?noun?.
I’ll leave replacing ?noun? with something else as an exercise for you. As a hint, you can simplify the code if you save the modified sentence produced in replace.tcl.
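Here are two more sketches of string replace, using made-up strings. The first replaces a single character by giving the same index for start and end. The second omits newstr entirely, which simply deletes the characters from start through end:

```tcl
# Replace index 0 through index 0 with "m"
set mice [string replace "dice" 0 0 "m"]     ;# mice
# Omitting newstr deletes the characters from start through end
set old "He was ?verbing? his wife's hair."
set start [string first "?" $old]
set end [string first "?" $old [expr {$start + 1}]]
set deleted [string replace $old $start $end]
puts $mice
puts $deleted   ;# He was  his wife's hair.
```

Note the two spaces in the second result: only the placeholder is removed, so the spaces on either side of it remain.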
The is option, that is, the string is command, enables you to test whether or not a given string belongs to a character class. A character class is a named group of characters that serves as a shorthand notation for the range operator, [charlist], introduced earlier in the chapter. For example, using the range operator, you specify all lowercase characters as [a-z]. The corresponding character class is lower.
In addition to serving as a shorthand notation, character classes are more general than sets specified using the range operator because character classes are defined over the Unicode character set. At this book’s beginning level, the fact that character classes are Unicode-aware won’t make a lot of difference. However, if you write a runaway hit game using Tcl and Tk and it gets translated to, say, Tamil, you’ll be happy to know that at least the code that uses character classes rather than hand-coded character ranges will work as intended and with no modifications.
The syntax for string is is:
string is class ?-strict? ?-failindex varname? str
class can be any of the classes listed in Table 4.2 and str is the string to test. If str is a member of class, string is returns 1; otherwise, it returns 0. The empty string, “”, is regarded as a member of all character classes unless you specify the -strict option, in which case the empty string is a member of no character class. If a string isn’t a member of a given character class, you can specify -failindex varname to have Tcl store, in the variable varname, the index at which str fails the test. Before you see an example, review the list of possible character classes, shown in Table 4.2.
Table 4.2. Tcl Character Classes
Class | Description |
---|---|
alnum | Any Unicode alphabetic character or digit. |
alpha | Any Unicode alphabetic character. |
ascii | Any character in the ASCII character set (7-bit characters). |
boolean | Any of the forms used for Boolean values. |
control | Any Unicode control character. |
digit | Any Unicode digit. |
double | Any of the forms used to represent double values. |
false | Any of the forms used for Boolean values that evaluate to false. |
graph | Any Unicode printing character, except a space. |
integer | Any of the forms used to represent integer values. |
lower | Any lowercase Unicode alphabetic character. |
print | Any Unicode printing character, including space. |
space | Any Unicode space character. |
true | Any of the forms used for Boolean values that evaluate to true. |
upper | Any uppercase Unicode alphabetic character. |
wordchar | Any Unicode word character, that is, any alphanumeric character or connector punctuation such as an underscore. |
xdigit | Any hexadecimal digit character. |
As you can see from this table, there’s a character class for almost every need you might have. A notable exception is octal digits (that is, digits in the base-8 number system). You can see the string is command at work in the following example, which tests the character ® for membership in each of the classes listed in Table 4.2:
proc TestClass {str class} {
    if {[string is $class $str]} {
        set msg "$str is in class '$class'"
    } else {
        set msg "$str is not in class '$class'"
    }
    puts $msg
}
set symbol "®"
TestClass $symbol alnum
TestClass $symbol alpha
TestClass $symbol ascii
TestClass $symbol boolean
TestClass $symbol control
TestClass $symbol digit
TestClass $symbol double
TestClass $symbol false
TestClass $symbol graph
TestClass $symbol integer
TestClass $symbol lower
TestClass $symbol print
TestClass $symbol space
TestClass $symbol true
TestClass $symbol upper
TestClass $symbol wordchar
TestClass $symbol xdigit
In is.tcl, I use a procedure named TestClass to perform the actual test, passing it the string I want to test and the name of the character class against which I want to test it. Using the TestClass procedure makes writing the rest of the script a lot easier, because the balance of the script is a bunch of calls to TestClass, one for each class that interests me. The output of this script should resemble the following:
$ ./is.tcl
® is not in class 'alnum'
® is not in class 'alpha'
® is not in class 'ascii'
® is not in class 'boolean'
® is not in class 'control'
® is not in class 'digit'
® is not in class 'double'
® is not in class 'false'
® is in class 'graph'
® is not in class 'integer'
® is not in class 'lower'
® is in class 'print'
® is not in class 'space'
® is not in class 'true'
® is not in class 'upper'
® is not in class 'wordchar'
® is not in class 'xdigit'
As you can see, the character ® is a member of the graph and print classes and not a member of the others.
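is.tcl doesn’t exercise -strict or -failindex. The following sketch does. The variable names are arbitrary, and the last part shows one plausible use: checking that the number of players a user typed is really an integer (the hardcoded value stands in for user input).

```tcl
set empty ""
set loose  [string is integer $empty]          ;# 1: empty string passes
set strict [string is integer -strict $empty]  ;# 0: -strict rejects it
# -failindex stores the index where the test failed in the named variable
set ok [string is digit -failindex pos "12a4"] ;# 0, and pos becomes 2
puts "$loose $strict $ok $pos"
# A typical use: validating the number of players typed by a user
set players "4"
if {[string is integer -strict $players]} {
    puts "Starting a game for $players players"
}
```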
While it’s very interesting and even useful to know if a character is a member of a given character class or where in a string a substring appears, it’s even more useful to know how to slice, dice, and julienne strings.
The simplest string modification is likely repeating a string. Thus, we have the aptly named string repeat command:
string repeat str count
string repeat returns the string str repeated count times. It is much easier to write, for example:
puts [string repeat "*" 50]
than it is to write:
puts "**************************************************"
Both commands print 50 asterisks, but guess which one is easier to type?
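string repeat becomes more useful when count is computed rather than typed. In this sketch, the rule under a title is built to match the title’s length. The title text is arbitrary.

```tcl
set title "Mad Libs"
# Build a rule exactly as long as the title
set rule [string repeat "=" [string length $title]]
puts $rule
puts $title
puts $rule
# str can be longer than one character, and a count of 0 gives ""
set pair [string repeat "ab" 3]   ;# ababab
set none [string repeat "ab" 0]   ;# empty string
```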
Another frequently used operation is modifying the case of a string. Tcl’s string command supports three options for doing so: changing a string to all uppercase (using string toupper), changing a string to all lowercase (using string tolower), and changing a string to sentence case (using the inaccurately named string totitle). Each of the three options shares a common syntax:
string toupper str ?start? ?end?
string tolower str ?start? ?end?
string totitle str ?start? ?end?
In each case, the string specified by str will be returned with its characters modified as appropriate for the option requested. By default, the entire string is modified. start and end are index values (an integer, end, or end-int, just as with string index) that specify alternative starting and stopping points. If you specify start, the modification begins at that index; if you also specify end, the modification stops at (and includes) that index. If you specify start but not end, only the single character at index start is modified.
For example, given the deliberately perverse string “yOuR gUeSs MuSt Be BeTwEeN 1 aNd 20: ”, case.tcl in this chapter’s code directory shows how toupper, tolower, and totitle modify it:
set str "yOuR gUeSs MuSt Be BeTwEeN 1 aNd 20: "
puts "toupper: [string toupper $str]"
puts "tolower: [string tolower $str]"
puts "totitle: [string totitle $str]"
When you execute the script, the output darn well better look like the following:
$ ./case.tcl
toupper: YOUR GUESS MUST BE BETWEEN 1 AND 20:
tolower: your guess must be between 1 and 20:
totitle: Your guess must be between 1 and 20:
As I wrote, the totitle option seems misnamed because it doesn’t render what I consider “title case,” capitalizing the first letter of each word. Rather, it capitalizes the first letter of the target string and lowercases the rest. However, it’s named totitle, so that’s what we have to use. You’re free to write your own ToTitle command if you want, of course.
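If you do want to write your own, here is one possible sketch of a ToTitle command. The name ToTitle and its behavior (capitalize the first letter of each word, where a word starts after any white space, and lowercase everything else) are illustrative assumptions, not a built-in Tcl feature. It uses proc, which defines a new command and hasn’t been covered yet, plus string index (the character at a given position), string is space, and a for loop.

```tcl
# Hypothetical ToTitle: capitalize the first letter of every word.
proc ToTitle {str} {
    set result ""
    # The very first character starts a word.
    set capNext 1
    for {set i 0} {$i < [string length $str]} {incr i} {
        set ch [string index $str $i]
        if {$capNext} {
            append result [string toupper $ch]
        } else {
            append result [string tolower $ch]
        }
        # The character after any white space starts a new word.
        set capNext [string is space $ch]
    }
    return $result
}

puts [ToTitle "yOuR gUeSs MuSt Be BeTwEeN 1 aNd 20: "]
```

With the sample string from case.tcl, this should print Your Guess Must Be Between 1 And 20: (with the trailing space preserved).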
Trimming strings refers to deleting unwanted characters from the beginning or end of strings. Tcl’s string trimming commands, string trimleft, string trimright, and string trim, are usually used to remove unwanted white space from the beginning or end of user input (those darn users will type anything!). The syntax of these commands is:
string trimleft str ?chars?
string trimright str ?chars?
string trim str ?chars?
str is the string to trim. By default, white space (spaces, tabs, newlines, and carriage returns) will be removed. If specified, chars defines a set of one or more characters that should be removed from str. As their names suggest, trimleft returns str with characters removed from the left end; trimright returns str with characters removed from the right end; and trim returns str with characters removed from both ends. Trimming stops at the first character that isn’t in chars, so if str doesn’t begin or end with any of the characters listed in chars, str will be returned unmolested.
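A few trimming calls, as a sketch; the sample strings are made up for illustration. Notice that chars is treated as a set of characters rather than as a substring, and that only the ends of str are affected: a matching character in the middle is left alone.

```tcl
# By default, trim strips white space from both ends.
set input "   Kurt Wall \t\n"
puts "\[[string trim $input]\]"          ;# [Kurt Wall]

# chars is a set of characters, not a substring.
puts [string trimleft "000123" "0"]      ;# 123
puts [string trimright "Hello!!!" "!"]   ;# Hello
puts [string trim "xyHixy" "yx"]         ;# Hi

# Only the ends are trimmed; the interior b survives.
puts [string trim "abc" "b"]             ;# abc
```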
String operations are nondestructive in that they do not modify their string arguments. All of the string operations discussed in this chapter return a new string that reflects the changes performed; the original or source string is left alone. This feature is a direct result of Tcl’s programming model (grouping and command substitution) and enables you to use the results of string operations without worrying about your source data being modified in some inscrutable fashion. It also means that you must explicitly use the set
command to assign the results of string operations to variables if you want to keep those results for later use.
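A minimal illustration of that point, using string trim on a made-up name: calling the command by itself leaves the variable untouched, and only assigning the result with set keeps the trimmed value.

```tcl
set name "  Kurt Wall  "

# The result of string trim is returned and then discarded.
string trim $name
puts "\[$name\]"            ;# [  Kurt Wall  ]

# To keep the trimmed value, assign it with set.
set name [string trim $name]
puts "\[$name\]"            ;# [Kurt Wall]
```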
Trimming strings is uncomplicated, so I won’t discuss it further here. Nevertheless, the script trim.tcl in this chapter’s code directory demonstrates the usage of all three string-trimming options.
Up to now, if you wanted to add text to a string variable, you would use the set command:
set label "Player Name:"
set label "$label Kurt Wall"
puts $label
This approach works, but it is not the most efficient way to build up a long string. The easy, efficient way is to use the append command. For example, assuming label doesn’t exist yet, the previous two set commands are equivalent to the following command:
append label "Player Name: " "Kurt Wall"
append
’s syntax is:
append var value ?...?
append tacks each value onto the end of the variable specified by var. If var doesn’t exist, its value will be the concatenation of each value specified. Unlike the various string commands discussed in this chapter, append modifies the value of var. It also returns the modified string. append is more efficient than repeated set commands because it typically extends the variable’s existing string in place, whereas a command such as set label "$label Kurt Wall" builds a brand-new string and copies the entire old value into it every time. I’ll prove this to you in the next section when you learn how to use the for command to write an iterative loop.
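A sketch of the append behaviors just described: it modifies the variable, returns the new value, and creates the variable if it doesn’t exist yet. The unset -nocomplain line is there only to guarantee that the variable fresh starts out undefined.

```tcl
set greeting "Hello"

# append changes the variable and also returns the new value.
set result [append greeting ", " "world"]
puts $greeting   ;# Hello, world
puts $result     ;# Hello, world

# If the variable does not exist yet, append creates it.
unset -nocomplain fresh
append fresh "a" "b" "c"
puts $fresh      ;# abc
```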
In the previous chapter, I introduced the notion of control structures, which allow you to write scripts that do more than execute sequentially from the first to the last line of the script. In particular, I showed you how to use the conditional execution command, if
. In addition to conditional execution, Tcl also supports a number of commands for looping, or executing the same command or set of commands multiple times. I’ll cover two of them in this chapter, while
and for
. The while
command creates a loop that executes as long as, or while, a Boolean test expression evaluates to true. When the test expression evaluates to false, control exits the loop and continues with the command immediately following the while
command. The for
command creates an iterative loop, that is, a loop that executes a fixed number of times and then terminates (again, with control passing to the command immediately following the for
command).
Loops that use while
are sometimes referred to as indeterminate loops because you don’t know how many times they will execute, only that they will (hopefully) eventually terminate when their test condition evaluates to false. The syntax of the while
command is:
while {test} {body}
test
is a Boolean expression (an expression that has a Boolean result). When the loop starts, test
is evaluated; if it is true, the command or commands in body
execute. Otherwise, body
is skipped and execution resumes with the command immediately following the while
command. After each pass through body
, test
is re-evaluated. If test
is still true, body
will execute; otherwise, the loop terminates and execution resumes with the command immediately following the while
command.
Strictly speaking, the braces I used in the syntax diagram aren’t required. However, you will almost always need to enclose test in braces to protect its condition from premature substitution. Without braces, the Tcl interpreter substitutes the variables and commands in test once, before the while command ever sees it, so while keeps checking the same fixed value. The likely result is either an infinite loop (a loop that never terminates) or a loop that never executes at all. With braces, while itself re-evaluates the condition on every pass. I suggest enclosing the body of the while loop in braces as well. Until you are much more confident of your ability to predict how substitution and grouping will behave, enclosing the body command(s) in braces will result in fewer surprises and unpleasant side effects.
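The following sketch contrasts a braced test with a quoted one. In the quoted version, $i is replaced by 0 before while runs, so the test is always the constant expression 0 < 3 and the loop would never end on its own. The sketch uses break, a command not covered in this chapter, purely as a safety valve that stops the loop after 10 passes.

```tcl
# Braced test: while re-reads $i on every pass.
set i 0
while {$i < 3} {
    incr i
}
set bracedResult $i
puts "braced:   i = $bracedResult"

# Quoted test: $i is substituted once, so while sees the fixed text 0 < 3.
set i 0
set passes 0
while "$i < 3" {
    incr i
    incr passes
    # Safety valve: without it, this loop would never terminate.
    if {$passes >= 10} {
        break
    }
}
puts "unbraced: i = $i after $passes passes"
```

The braced loop stops with i at 3; the quoted loop keeps going until the safety valve fires, leaving i at 10.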
The following script, while.tcl in this chapter’s code directory, offers a useful illustration of how while
loops work:
set lineCnt 0
set charCnt 0
while {[gets stdin line] >= 0} {
    incr lineCnt
    incr charCnt [string length $line]
}
puts "Read $lineCnt lines"
puts "Read $charCnt characters"
This simple script reads input typed at the keyboard (or redirected from a file). Each time it reads a line, it increments the variable lineCnt by 1 and the variable charCnt by the number of characters in the line. When it encounters EOF (end-of-file), it drops out of the loop and displays the number of lines and number of characters read.
$ ./while.tcl < while.tcl
Read 13 lines
Read 229 characters
Recall that gets returns -1 when it reaches EOF. That means that the test condition [gets stdin line] >= 0 evaluates to true as long as gets reads a line of input (even an empty line, for which gets returns 0). When gets sees EOF in the input stream, it returns -1, the test condition evaluates to false, and the loop terminates.
The Linux- or UNIX-using reader (and the obsessive-compulsive reader who counts everything) will notice that while.tcl actually has 242 characters:
$ wc -c while.tcl
242 while.tcl
So, why does while.tcl say that it only has 229? Because gets
discards the newline. Accordingly, if you think that newlines should also be counted, change the last line of while.tcl to the following:
puts "Read [expr {$charCnt + $lineCnt}] characters"
I cheated by introducing a command you haven’t seen yet, incr. Briefly, incr increments (hence the name) the value of a variable. incr’s virtue is that it is easier to write than set someVar [expr {$someVar + $someValue}]. incr’s syntax is simple:
incr var ?unit?
By default, incr increments var, which must hold an integer value, by 1. If you specify unit, which must also be an integer (or a command substitution that produces one, such as [string length $line] in while.tcl), unit will be added to var. Note that incr does not evaluate arithmetic expressions itself, so incr x 1+1 is an error. Yes, unit can be a negative integer, which has the effect of decrementing var. No, there isn’t a separate decr command for decrementing a variable, although you could certainly write one if you have a rage for order and symmetry.
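A sketch of incr with the default, a positive, a negative, and a command-substituted unit, followed by one possible decr for the symmetry-minded. The decr command is hypothetical (it is not part of Tcl), and it relies on proc and upvar, which this chapter hasn’t covered; upvar lets the procedure modify a variable that belongs to its caller.

```tcl
set score 10
incr score                          ;# 11
incr score 5                        ;# 16
incr score -3                       ;# 13

# A command substitution is fine as long as it yields an integer.
incr score [string length "abc"]    ;# 16

# Hypothetical decr: subtract amount (default 1) from the named variable.
proc decr {varName {amount 1}} {
    # Link the caller's variable to the local name var.
    upvar 1 $varName var
    return [incr var [expr {-$amount}]]
}

decr score                          ;# 15
decr score 5                        ;# 10
puts $score
```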
The for
command enables you to execute one or more commands a fixed number of times, or iterations. Hence, for
loops are often referred to as iterative loops. Its syntax is:
for {start} {test} {next} {body}
Again, the braces shown in the syntax diagram aren’t required, but I recommend using them to preserve your sanity. start is a command (strictly, a script) that initializes a loop counter, the variable that controls how many times the loop executes. test is a Boolean expression that controls whether or not the command(s) in body will be executed, usually by testing the loop counter against the terminating condition, the value at which the loop exits. next is a command (again, strictly a script) that updates the loop counter, typically by incrementing it.
When a for loop starts, the command in start is executed, which sets the initial value of the loop counter. Then the expression in test is evaluated. test usually includes the loop counter, but it doesn’t have to. If test evaluates to false, the body of the for loop is skipped, and execution resumes with the command immediately following the for command. Otherwise, the command(s) in body are executed. The next command is executed after the last command in body. next increments, decrements, or otherwise modifies the loop counter so that the for loop eventually terminates. After next executes, the test condition is evaluated again. If test evaluates to false, the loop terminates and control passes to the command immediately following the for command. If test evaluates to true, body is executed, followed by next. Wash. Rinse. Repeat.
Confused? The following script (for.tcl in this chapter’s code directory) should help:
for {set i 1} {$i <= 10} {incr i} {
    puts "Loop counter: $i"
}
This script increments the value of a loop counter variable, i
, and displays that value. In terms of the syntax diagram I showed at the beginning of this section:
The start command is set i 1.
The test condition is $i <= 10.
The next command is incr i, which increments the value of i by 1 on each pass through the loop.
The body command is puts "Loop counter: $i".
The body of the loop executes for each value of i
that is less than or equal to 10. The runtime behavior should be unsurprising:
$ ./for.tcl
Loop counter: 1
Loop counter: 2
Loop counter: 3
Loop counter: 4
Loop counter: 5
Loop counter: 6
Loop counter: 7
Loop counter: 8
Loop counter: 9
Loop counter: 10
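To see the order in which for runs its four arguments, it can help to write the same loop with while. This sketch should print the same ten lines as for.tcl. The two forms are not identical in every respect (for example, they treat continue differently), but the order of start, test, body, and next is the same.

```tcl
# The same loop as for.tcl, written out with while.
set i 1                          ;# start
while {$i <= 10} {
    puts "Loop counter: $i"      ;# body
    incr i                       ;# next
}
# After the final pass, the test sees i = 11 and fails.
puts "Final value of i: $i"
```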
You’ll use for
loops quite a bit in your scripts because for
is an easy, natural way to create loops that need to execute a fixed number of times. You’ll learn yet another looping construct, foreach
, in the next chapter.
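The append discussion earlier in this chapter promised evidence that append beats repeated set commands. One rough way to check, now that you have for, is Tcl’s built-in time command, which runs a script and reports how long it took in microseconds per iteration. This is only a sketch: the 5,000-character length is arbitrary, and the actual timings depend on your machine and Tcl version, so no particular numbers are claimed here. On typical systems, the append version should come out well ahead.

```tcl
# Build the same 5000-character string two ways and time each.
set n 5000

set viaSet ""
set setTime [time {
    for {set i 0} {$i < $n} {incr i} {
        # Copies the whole existing string on every pass.
        set viaSet "$viaSet*"
    }
}]

set viaAppend ""
set appendTime [time {
    for {set i 0} {$i < $n} {incr i} {
        # Extends the existing string.
        append viaAppend "*"
    }
}]

puts "set:    $setTime"
puts "append: $appendTime"
```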
As you’ll see in the “Looking at the Code” section, mad_lib.tcl doesn’t use all of the commands you learned in this chapter. It does illustrate key commands and gives you a fertile base for further experimentation.
#!/usr/bin/tclsh
# mad_lib.tcl
# Demonstrate string manipulation

# Block 1
# The source sentence
set line "One day while I was ?verb ending in -ing? in my living room, "
append line "a ?adjective? ?mythical creature? fell through the roof. "
append line "It jumped on the ?piece of furniture? and knocked over the "
append line "?noun?. Then it ran into the dining room and ?past tense verb? "
append line "a ?noun?. After ?number? minutes of chasing it through the "
append line "house I finally caught it and put it outside. It quickly "
append line "flew away."

# Block 2
while {[string first "?" $line] != -1} {
    # Block 2a
    # Find the next ??-enclosed word or phrase
    set start [string first "?" $line]
    set end [string first "?" $line [expr {$start + 1}]]

    # Block 2b
    # Extract the text between the ??
    set prompt [string range $line [expr {$start + 1}] [expr {$end - 1}]]

    # Block 2c
    # Display the prompt and get the user's input
    puts -nonewline "Enter a $prompt: "
    flush stdout
    gets stdin input

    # Block 2d
    # Update the sentence
    set line [string replace $line $start $end $input]
}

# Block 3
# Print the completed mad lib
puts $line
The code in Block 1 just sets up the sentence that the rest of the script will be modifying. What I’ve done is delimit text I want to replace with ? characters. This makes it easy to find the text and replace it with the input provided by the user. The other salient point in this block of code is that I’m following my own advice and using append
rather than set
to build up the string. Block 3 is nothing new; it just displays the completed mad lib.
Block 2, which I’ve subdivided into Blocks 2a through 2d, is where the real work gets done. The test in the while
loop provides the terminating condition. Recall that string first
returns -1 if it doesn’t find a specified substring. In this case, once I’ve replaced all the ?-delimited text, there will be no more ? characters for string first "?" $line
to find, so the command will return -1, the test condition will evaluate to false, and control will drop out of the loop and display the completed silly sentences.
The first step is to find text enclosed in the delimiters, which is handled by Block 2a. I use the same string first technique that I described in replace.tcl earlier in the chapter. Once I’ve found the starting and ending points, which include the delimiters, I save them in the aptly named start and end variables because I’m going to need these values several times.
Block 2b extracts the text, without the ? delimiters, which gives me a ready-made prompt to display to the user. To drop the leading ?, I start the substring one character past the opening delimiter. Similarly, to get rid of the trailing ?, I end it one character before the closing delimiter. As I’ve suggested before, Tcl’s ability (even affinity) for nested commands makes this kind of operation easy to express in code, albeit potentially hard to read for Tcl neophytes. However, once you’ve become familiar with this particular idiom, it will become a natural way to write code.
Block 2c uses the prompt extracted in Block 2b to ask the user to enter a particular word or phrase. The technique for reading user input is the same one I introduced in the previous chapter, so it should look familiar. Whatever the user types gets stored in the variable named input
.
In Block 2d, finally, I use the string replace
command to replace the ?-delimited text with the word the user typed. At this point, control returns to the top of the while
loop, the test condition is evaluated again, and, if it’s true, control reenters the loop body. If the test condition is false, control passes to Block 3, and I reveal the completed silly story.
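To make one pass through Block 2 concrete, here is a sketch that runs Blocks 2a, 2b, and 2d once on a short made-up sentence, with the user’s answer hard-coded as purple instead of being read from stdin.

```tcl
set line "I saw a ?color? cat."

# Block 2a: locate the opening and closing delimiters.
set start [string first "?" $line]                       ;# 8
set end [string first "?" $line [expr {$start + 1}]]     ;# 14

# Block 2b: the prompt is everything strictly between them.
set prompt [string range $line [expr {$start + 1}] [expr {$end - 1}]]

# Block 2d: replace the delimiters and the prompt with the answer.
set line [string replace $line $start $end "purple"]

puts $prompt   ;# color
puts $line     ;# I saw a purple cat.
```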
Here are some exercises you can try to practice what you learned in this chapter.
4.1 Modify mad_lib.tcl to use a different delimiter in the source string so that the source string can include ? characters.
4.2 Modify Block 2b of mad_lib.tcl to use another method to extract the prompt. Hint: All you’re really doing is stripping off leading and trailing characters.
4.3 Modify Block 3 of mad_lib.tcl to format the output such that words don’t break across lines. That is, make the printed mad lib fit into lines of approximately 75 characters.
Working with strings is an essential component of most Tcl programs, and Tcl is well-equipped for dealing with strings. In fact, Tcl has such a rich set of commands for dealing with strings that you might not be sure which one to use in a given situation. You can compare strings for equality and for membership in a certain character class. You can also find out how long strings are. Tcl also allows you to find where in a string a certain character or substring of characters is located and, if you need to do so, Tcl even has a command for replacing one substring with another. Miscellaneous functions, such as removing unwanted characters from the ends of a string and changing a string’s case, round out the basic string functionality. I’ll introduce additional string-handling capabilities in later chapters, but first, you’re going to learn about another Tcl strong point, lists.