7. Strings

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 7. Strings

The next major type on our built-in object tour is the Python string—an ordered collection of characters used to store and represent text-based information. We looked briefly at strings in Chapter 4. Here, we will revisit them in more depth, filling in some of the details we skipped then.

From a functional perspective, strings can be used to represent just about anything that can be encoded as text: symbols and words (e.g., your name), contents of text files loaded into memory, Internet addresses, Python programs, and so on. They can also be used to hold the absolute binary values of bytes, and multibyte Unicode text used in internationalized programs.

You may have used strings in other languages, too. Python’s strings serve the same role as character arrays in languages such as C, but they are a somewhat higher-level tool than arrays. Unlike in C, in Python, strings come with a powerful set of processing tools. Also unlike languages such as C, Python has no distinct type for individual characters; instead, you just use one-character strings.

Strictly speaking, Python strings are categorized as immutable sequences, meaning that the characters they contain have a left-to-right positional order and that they cannot be changed in-place. In fact, strings are the first representative of the larger class of objects called sequences that we will study here. Pay special attention to the sequence operations introduced in this chapter, because they will work the same on other sequence types we’ll explore later, such as lists and tuples.

Table 7-1 previews common string literals and operations we will discuss in this chapter. Empty strings are written as a pair of quotation marks (single or double) with nothing in between, and there are a variety of ways to code strings. For processing, strings support expression operations such as concatenation (combining strings), slicing (extracting sections), indexing (fetching by offset), and so on. Besides expressions, Python also provides a set of string methods that implement common string-specific tasks, as well as modules for more advanced text-processing tasks such as pattern matching. We’ll explore all of these later in the chapter.

Table 7-1. Common string literals and operations

Operation	Interpretation
`S = ''`	Empty string
`S = "spam's"`	Double quotes, same as single
`S = 's p ax00m'`	Escape sequences
`S = """..."""`	Triple-quoted block strings
`S = r' empspam'`	Raw strings
`S = b'spam'`	Byte strings in 3.0 (Chapter 36)
`S = u'spam'`	Unicode strings in 2.6 only (Chapter 36)
`S1 + S2` `S * 3`	Concatenate, repeat
`S[i]` `S[i:j]` `len(S)`	Index, slice, length
`"a %s parrot" % kind`	String formatting expression
`"a {0} parrot".format(kind)`	String formatting method in 2.6 and 3.0
`S.find('pa')` `S.rstrip()` `S.replace('pa', 'xx')` `S.split(',')` `S.isdigit()` `S.lower()` `S.endswith('spam')` `'spam'.join(strlist)` `S.encode('latin-1')`	String method calls: search, remove whitespace, replacement, split on delimiter, content test, case conversion, end test, delimiter join, Unicode encoding, etc.
`for x in S: print(x)` `'spam' in S` `[c * 2 for c in S]` `map(ord, S)`	Iteration, membership

Beyond the core set of string tools in Table 7-1, Python also supports more advanced pattern-based string processing with the standard library’s re (regular expression) module, introduced in Chapter 4, and even higher-level text processing tools such as XML parsers, discussed briefly in Chapter 36. This book’s scope, though, is focused on the fundamentals represented by Table 7-1.

To cover the basics, this chapter begins with an overview of string literal forms and string expressions, then moves on to look at more advanced tools such as string methods and formatting. Python comes with many string tools, and we won’t look at them all here; the complete story is chronicled in the Python library manual. Our goal here is to explore enough commonly used tools to give you a representative sample; methods we won’t see in action here, for example, are largely analogous to those we will.

Note

Content note: Technically speaking, this chapter tells only part of the string story in Python—the part most programmers need to know. It presents the basic str string type, which handles ASCII text and works the same regardless of which version of Python you use. That is, this chapter intentionally limits its scope to the string processing essentials that are used in most Python scripts.

From a more formal perspective, ASCII is a simple form of Unicode text. Python addresses the distinction between text and binary data by including distinct object types:

In Python 3.0 there are three string types: str is used for Unicode text (ASCII or otherwise), bytes is used for binary data (including encoded text), and bytearray is a mutable variant of bytes.
In Python 2.6, unicode strings represent wide Unicode text, and str strings handle both 8-bit text and binary data.

The bytearray type is also available as a back-port in 2.6, but not earlier, and it’s not as closely bound to binary data as it is in 3.0. Because most programmers don’t need to dig into the details of Unicode encodings or binary data formats, though, I’ve moved all such details to the Advanced Topics part of this book, in Chapter 36.

If you do need to deal with more advanced string concepts such as alternative character sets or packed binary data and files, see Chapter 36 after reading the material here. For now, we’ll focus on the basic string type and its operations. As you’ll find, the basics we’ll study here also apply directly to the more advanced string types in Python’s toolset.

String Literals

By and large, strings are fairly easy to use in Python. Perhaps the most complicated thing about them is that there are so many ways to write them in your code:

Single quotes: 'spa"m'
Double quotes: "spa'm"
Triple quotes: '''... spam ...''', """... spam ..."""
Escape sequences: "s p am"
Raw strings: r"C: ew est.spm"
Byte strings in 3.0 (see Chapter 36): b'spx01am'
Unicode strings in 2.6 only (see Chapter 36): u'eggsu0020spam'

The single- and double-quoted forms are by far the most common; the others serve specialized roles, and we’re postponing discussion of the last two advanced forms until Chapter 36. Let’s take a quick look at all the other options in turn.

Single- and Double-Quoted Strings Are the Same

Around Python strings, single and double quote characters are interchangeable. That is, string literals can be written enclosed in either two single or two double quotes—the two forms work the same and return the same type of object. For example, the following two strings are identical, once coded:

>>> 'shrubbery', "shrubbery"
('shrubbery', 'shrubbery')

The reason for supporting both is that it allows you to embed a quote character of the other variety inside a string without escaping it with a backslash. You may embed a single quote character in a string enclosed in double quote characters, and vice versa:

>>> 'knight"s', "knight's"
('knight"s', "knight's")

Incidentally, Python automatically concatenates adjacent string literals in any expression, although it is almost as simple to add a + operator between them to invoke concatenation explicitly (as we’ll see in Chapter 12, wrapping this form in parentheses also allows it to span multiple lines):

>>> title = "Meaning " 'of' " Life"        # Implicit concatenation
>>> title
'Meaning of Life'

Notice that adding commas between these strings would result in a tuple, not a string. Also notice in all of these outputs that Python prefers to print strings in single quotes, unless they embed one. You can also embed quotes by escaping them with backslashes:

>>> 'knight's', "knight"s"
("knight's", 'knight"s')

To understand why, you need to know how escapes work in general.

Escape Sequences Represent Special Bytes

The last example embedded a quote inside a string by preceding it with a backslash. This is representative of a general pattern in strings: backslashes are used to introduce special byte codings known as escape sequences.

Escape sequences let us embed byte codes in strings that cannot easily be typed on a keyboard. The character , and one or more characters following it in the string literal, are replaced with a single character in the resulting string object, which has the binary value specified by the escape sequence. For example, here is a five-character string that embeds a newline and a tab:

>>> s = 'a
b	c'

The two characters stand for a single character—the byte containing the binary value of the newline character in your character set (usually, ASCII code 10). Similarly, the sequence is replaced with the tab character. The way this string looks when printed depends on how you print it. The interactive echo shows the special characters as escapes, but print interprets them instead:

>>> s
'a
b	c'
>>> print(s)
a
b       c

To be completely sure how many bytes are in this string, use the built-in len function—it returns the actual number of bytes in a string, regardless of how it is displayed:

>>> len(s)
5

This string is five bytes long: it contains an ASCII a byte, a newline byte, an ASCII b byte, and so on. Note that the original backslash characters are not really stored with the string in memory; they are used to tell Python to store special byte values in the string. For coding such special bytes, Python recognizes a full set of escape code sequences, listed in Table 7-2.

Table 7-2. String backslash characters

Escape	Meaning
`newline`	Ignored (continuation line)
`\`	Backslash (stores one )
`'`	Single quote (stores `'`)
`"`	Double quote (stores `"`)
`a`	Bell
	Backspace
`f`	Formfeed
	Newline (linefeed)
	Carriage return
	Horizontal tab
`v`	Vertical tab
`x``hh`	Character with hex value `hh` (at most 2 digits)
`ooo`	Character with octal value `ooo` (up to 3 digits)
	Null: binary 0 character (doesn’t end string)
`N{ id }`	Unicode database ID
`u``hhhh`	Unicode 16-bit hex
`U``hhhhhhhh`	Unicode 32-bit hex^[a]
`other`	Not an escape (keeps both and `other`)
^[a]The `U``hhhh``...` escape sequence takes exactly eight hexadecimal digits (`h`); both `u` and `U` can be used only in Unicode string literals.

Some escape sequences allow you to embed absolute binary values into the bytes of a string. For instance, here’s a five-character string that embeds two binary zero bytes (coded as octal escapes of one digit):

>>> s = 'abc'
>>> s
'ax00bx00c'
>>> len(s)
5

In Python, the zero (null) byte does not terminate a string the way it typically does in C. Instead, Python keeps both the string’s length and text in memory. In fact, no character terminates a string in Python. Here’s a string that is all absolute binary escape codes—a binary 1 and 2 (coded in octal), followed by a binary 3 (coded in hexadecimal):

>>> s = '0102x03'
>>> s
'x01x02x03'
>>> len(s)
3

Notice that Python displays nonprintable characters in hex, regardless of how they were specified. You can freely combine absolute value escapes and the more symbolic escape types in Table 7-2. The following string contains the characters “spam”, a tab and newline, and an absolute zero value byte coded in hex:

>>> S = "s	p
ax00m"
>>> S
's	p
ax00m'
>>> len(S)
7
>>> print(S)
s       p
a m

This becomes more important to know when you process binary data files in Python. Because their contents are represented as strings in your scripts, it’s OK to process binary files that contain any sorts of binary byte values (more on files in Chapter 9).^[17]

Finally, as the last entry in Table 7-2 implies, if Python does not recognize the character after a as being a valid escape code, it simply keeps the backslash in the resulting string:

>>> x = "C:pycode"           # Keeps  literally
>>> x
'C:\py\code'
>>> len(x)
10

Unless you’re able to commit all of Table 7-2 to memory, though, you probably shouldn’t rely on this behavior.^[18] To code literal backslashes explicitly such that they are retained in your strings, double them up (\ is an escape for one ) or use raw strings; the next section shows how.

Raw Strings Suppress Escapes

As we’ve seen, escape sequences are handy for embedding special byte codes within strings. Sometimes, though, the special treatment of backslashes for introducing escapes can lead to trouble. It’s surprisingly common, for instance, to see Python newcomers in classes trying to open a file with a filename argument that looks something like this:

myfile = open('C:
ew	ext.dat', 'w')

thinking that they will open a file called text.dat in the directory C: ew. The problem here is that is taken to stand for a newline character, and is replaced with a tab. In effect, the call tries to open a file named C:(newline)ew(tab)ext.dat, with usually less than stellar results.

This is just the sort of thing that raw strings are useful for. If the letter r (uppercase or lowercase) appears just before the opening quote of a string, it turns off the escape mechanism. The result is that Python retains your backslashes literally, exactly as you type them. Therefore, to fix the filename problem, just remember to add the letter r on Windows:

myfile = open(r'C:
ew	ext.dat', 'w')

Alternatively, because two backslashes are really an escape sequence for one backslash, you can keep your backslashes by simply doubling them up:

myfile = open('C:\new\text.dat', 'w')

In fact, Python itself sometimes uses this doubling scheme when it prints strings with embedded backslashes:

>>> path = r'C:
ew	ext.dat'
>>> path                          # Show as Python code
'C:\new\text.dat'
>>> print(path)                   # User-friendly format
C:
ew	ext.dat
>>> len(path)                     # String length
15

As with numeric representation, the default format at the interactive prompt prints results as if they were code, and therefore escapes backslashes in the output. The print statement provides a more user-friendly format that shows that there is actually only one backslash in each spot. To verify this is the case, you can check the result of the built-in len function, which returns the number of bytes in the string, independent of display formats. If you count the characters in the print(path) output, you’ll see that there really is just 1 character per backslash, for a total of 15.

Besides directory paths on Windows, raw strings are also commonly used for regular expressions (text pattern matching, supported with the re module introduced in Chapter 4). Also note that Python scripts can usually use forward slashes in directory paths on Windows and Unix because Python tries to interpret paths portably (i.e., 'C:/new/text.dat' works when opening files, too). Raw strings are useful if you code paths using native Windows backslashes, though.

Note

Despite its role, even a raw string cannot end in a single backslash, because the backslash escapes the following quote character—you still must escape the surrounding quote character to embed it in the string. That is, r"..." is not a valid string literal—a raw string cannot end in an odd number of backslashes. If you need to end a raw string with a single backslash, you can use two and slice off the second (r'1 b c'[:-1]), tack one on manually (r'1 b c' + ''), or skip the raw string syntax and just double up the backslashes in a normal string ('1\nb\tc'). All three of these forms create the same eight-character string containing three backslashes.

Triple Quotes Code Multiline Block Strings

So far, you’ve seen single quotes, double quotes, escapes, and raw strings in action. Python also has a triple-quoted string literal format, sometimes called a block string, that is a syntactic convenience for coding multiline text data. This form begins with three quotes (of either the single or double variety), is followed by any number of lines of text, and is closed with the same triple-quote sequence that opened it. Single and double quotes embedded in the string’s text may be, but do not have to be, escaped—the string does not end until Python sees three unescaped quotes of the same kind used to start the literal. For example:

>>> mantra = """Always look
...  on the bright
... side of life."""
>>>
>>> mantra
'Always look
 on the bright
side of life.'

This string spans three lines (in some interfaces, the interactive prompt changes to ... on continuation lines; IDLE simply drops down one line). Python collects all the triple-quoted text into a single multiline string, with embedded newline characters () at the places where your code has line breaks. Notice that, as in the literal, the second line in the result has a leading space, but the third does not—what you type is truly what you get. To see the string with the newlines interpreted, print it instead of echoing:

>>> print(mantra)
Always look
 on the bright
side of life.

Triple-quoted strings are useful any time you need multiline text in your program; for example, to embed multiline error messages or HTML or XML code in your source code files. You can embed such blocks directly in your scripts without resorting to external text files or explicit concatenation and newline characters.

Triple-quoted strings are also commonly used for documentation strings, which are string literals that are taken as comments when they appear at specific points in your file (more on these later in the book). These don’t have to be triple-quoted blocks, but they usually are to allow for multiline comments.

Finally, triple-quoted strings are also sometimes used as a “horribly hackish” way to temporarily disable lines of code during development (OK, it’s not really too horrible, and it’s actually a fairly common practice). If you wish to turn off a few lines of code and run your script again, simply put three quotes above and below them, like this:

X = 1
"""
import os                            # Disable this code temporarily
print(os.getcwd())
"""
Y = 2

I said this was hackish because Python really does make a string out of the lines of code disabled this way, but this is probably not significant in terms of performance. For large sections of code, it’s also easier than manually adding hash marks before each line and later removing them. This is especially true if you are using a text editor that does not have support for editing Python code specifically. In Python, practicality often beats aesthetics .

Strings in Action

Once you’ve created a string with the literal expressions we just met, you will almost certainly want to do things with it. This section and the next two demonstrate string expressions, methods, and formatting—the first line of text-processing tools in the Python language.

Basic Operations

Let’s begin by interacting with the Python interpreter to illustrate the basic string operations listed earlier in Table 7-1. Strings can be concatenated using the + operator and repeated using the * operator:

% python
>>> len('abc')            # Length: number of items
3
>>> 'abc' + 'def'         # Concatenation: a new string
'abcdef'
>>> 'Ni!' * 4             # Repetition: like "Ni!" + "Ni!" + ...
'Ni!Ni!Ni!Ni!'

Formally, adding two string objects creates a new string object, with the contents of its operands joined. Repetition is like adding a string to itself a number of times. In both cases, Python lets you create arbitrarily sized strings; there’s no need to predeclare anything in Python, including the sizes of data structures.^[19] The len built-in function returns the length of a string (or any other object with a length).

Repetition may seem a bit obscure at first, but it comes in handy in a surprising number of contexts. For example, to print a line of 80 dashes, you can count up to 80, or let Python count for you:

>>> print('------- ...more... ---')      # 80 dashes, the hard way
>>> print('-' * 80)                      # 80 dashes, the easy way

Notice that operator overloading is at work here already: we’re using the same + and * operators that perform addition and multiplication when using numbers. Python does the correct operation because it knows the types of the objects being added and multiplied. But be careful: the rules aren’t quite as liberal as you might expect. For instance, Python doesn’t allow you to mix numbers and strings in + expressions: 'abc'+9 raises an error instead of automatically converting 9 to a string.

As shown in the last row in Table 7-1, you can also iterate over strings in loops using for statements and test membership for both characters and substrings with the in expression operator, which is essentially a search. For substrings, in is much like the str.find() method covered later in this chapter, but it returns a Boolean result instead of the substring’s position:

>>> myjob = "hacker"
>>> for c in myjob: print(c, end=' ')   # Step through items
...
h a c k e r
>>> "k" in myjob                        # Found
True
>>> "z" in myjob                        # Not found
False
>>> 'spam' in 'abcspamdef'              # Substring search, no position returned
True

The for loop assigns a variable to successive items in a sequence (here, a string) and executes one or more statements for each item. In effect, the variable c becomes a cursor stepping across the string here. We will discuss iteration tools like these and others listed in Table 7-1 in more detail later in this book (especially in Chapters 14 and 20).

Indexing and Slicing

Because strings are defined as ordered collections of characters, we can access their components by position. In Python, characters in a string are fetched by indexing—providing the numeric offset of the desired component in square brackets after the string. You get back the one-character string at the specified position.

As in the C language, Python offsets start at 0 and end at one less than the length of the string. Unlike C, however, Python also lets you fetch items from sequences such as strings using negative offsets. Technically, a negative offset is added to the length of a string to derive a positive offset. You can also think of negative offsets as counting backward from the end. The following interaction demonstrates:

>>> S = 'spam'
>>> S[0], S[−2]                         # Indexing from front or end
('s', 'a')
>>> S[1:3], S[1:], S[:−1]               # Slicing: extract a section
('pa', 'pam', 'spa')

The first line defines a four-character string and assigns it the name S. The next line indexes it in two ways: S[0] fetches the item at offset 0 from the left (the one-character string 's'), and S[−2] gets the item at offset 2 back from the end (or equivalently, at offset (4 + (–2)) from the front). Offsets and slices map to cells as shown in Figure 7-1.^[20]

Offsets and slices: positive offsets start from the left end (offset 0 is the first item), and negatives count back from the right end (offset −1 is the last item). Either kind of offset can be used to give positions in indexing and slicing operations.

Figure 7-1. Offsets and slices: positive offsets start from the left end (offset 0 is the first item), and negatives count back from the right end (offset −1 is the last item). Either kind of offset can be used to give positions in indexing and slicing operations.

The last line in the preceding example demonstrates slicing, a generalized form of indexing that returns an entire section, not a single item. Probably the best way to think of slicing is that it is a type of parsing (analyzing structure), especially when applied to strings—it allows us to extract an entire section (substring) in a single step. Slices can be used to extract columns of data, chop off leading and trailing text, and more. In fact, we’ll explore slicing in the context of text parsing later in this chapter.

The basics of slicing are straightforward. When you index a sequence object such as a string on a pair of offsets separated by a colon, Python returns a new object containing the contiguous section identified by the offset pair. The left offset is taken to be the lower bound (inclusive), and the right is the upper bound (noninclusive). That is, Python fetches all items from the lower bound up to but not including the upper bound, and returns a new object containing the fetched items. If omitted, the left and right bounds default to 0 and the length of the object you are slicing, respectively.

For instance, in the example we just saw, S[1:3] extracts the items at offsets 1 and 2: it grabs the second and third items, and stops before the fourth item at offset 3. Next, S[1:] gets all items beyond the first—the upper bound, which is not specified, defaults to the length of the string. Finally, S[:−1] fetches all but the last item—the lower bound defaults to 0, and −1 refers to the last item, noninclusive.

This may seem confusing at first glance, but indexing and slicing are simple and powerful tools to use, once you get the knack. Remember, if you’re unsure about the effects of a slice, try it out interactively. In the next chapter, you’ll see that it’s even possible to change an entire section of another object in one step by assigning to a slice (though not for immutables like strings). Here’s a summary of the details for reference:

Indexing (S[i]) fetches components at offsets:
- The first item is at offset 0.
- Negative indexes mean to count backward from the end or right.
- S[0] fetches the first item.
- S[−2] fetches the second item from the end (like S[len(S)−2]).
Slicing (S[i:j]) extracts contiguous sections of sequences:
- The upper bound is noninclusive.
- Slice boundaries default to 0 and the sequence length, if omitted.
- S[1:3] fetches items at offsets 1 up to but not including 3.
- S[1:] fetches items at offset 1 through the end (the sequence length).
- S[:3] fetches items at offset 0 up to but not including 3.
- S[:−1] fetches items at offset 0 up to but not including the last item.
- S[:] fetches items at offsets 0 through the end—this effectively performs a top-level copy of S.

The last item listed here turns out to be a very common trick: it makes a full top-level copy of a sequence object—an object with the same value, but a distinct piece of memory (you’ll find more on copies in Chapter 9). This isn’t very useful for immutable objects like strings, but it comes in handy for objects that may be changed in-place, such as lists.

In the next chapter, you’ll see that the syntax used to index by offset (square brackets) is used to index dictionaries by key as well; the operations look the same but have different interpretations.

Extended slicing: the third limit and slice objects

In Python 2.3 and later, slice expressions have support for an optional third index, used as a step (sometimes called a stride). The step is added to the index of each item extracted. The full-blown form of a slice is now X[I:J:K], which means “extract all the items in X, from offset I through J−1, by K.” The third limit, K, defaults to 1, which is why normally all items in a slice are extracted from left to right. If you specify an explicit value, however, you can use the third limit to skip items or to reverse their order.

For instance, X[1:10:2] will fetch every other item in X from offsets 1–9; that is, it will collect the items at offsets 1, 3, 5, 7, and 9. As usual, the first and second limits default to 0 and the length of the sequence, respectively, so X[::2] gets every other item from the beginning to the end of the sequence:

>>> S = 'abcdefghijklmnop'
>>> S[1:10:2]
'bdfhj'
>>> S[::2]
'acegikmo'

You can also use a negative stride. For example, the slicing expression "hello"[::−1] returns the new string "olleh"—the first two bounds default to 0 and the length of the sequence, as before, and a stride of −1 indicates that the slice should go from right to left instead of the usual left to right. The effect, therefore, is to reverse the sequence:

>>> S = 'hello'
>>> S[::−1]
'olleh'

With a negative stride, the meanings of the first two bounds are essentially reversed. That is, the slice S[5:1:−1] fetches the items from 2 to 5, in reverse order (the result contains items from offsets 5, 4, 3, and 2):

>>> S = 'abcedfg'
>>> S[5:1:−1]
'fdec'

Skipping and reversing like this are the most common use cases for three-limit slices, but see Python’s standard library manual for more details (or run a few experiments interactively). We’ll revisit three-limit slices again later in this book, in conjunction with the for loop statement.

Later in the book, we’ll also learn that slicing is equivalent to indexing with a slice object, a finding of importance to class writers seeking to support both operations:

>>> 'spam'[1:3]                        # Slicing syntax
'pa'
>>> 'spam'[slice(1, 3)]                # Slice objects
'pa'
>>> 'spam'[::-1]
'maps'
>>> 'spam'[slice(None, None, −1)]
'maps'

Throughout this book, I will include common use case sidebars (such as this one) to give you a peek at how some of the language features being introduced are typically used in real programs. Because you won’t be able to make much sense of real use cases until you’ve seen more of the Python picture, these sidebars necessarily contain many references to topics not introduced yet; at most, you should consider them previews of ways that you may find these abstract language concepts useful for common programming tasks.

For instance, you’ll see later that the argument words listed on a system command line used to launch a Python program are made available in the argv attribute of the built-in sys module:

# File echo.py
import sys
print(sys.argv)

% python echo.py −a −b −c
['echo.py', '−a', '−b', '−c']

Usually, you’re only interested in inspecting the arguments that follow the program name. This leads to a very typical application of slices: a single slice expression can be used to return all but the first item of a list. Here, sys.argv[1:] returns the desired list, ['−a', '−b', '−c']. You can then process this list without having to accommodate the program name at the front.

Slices are also often used to clean up lines read from input files. If you know that a line will have an end-of-line character at the end (a newline marker), you can get rid of it with a single expression such as line[:−1], which extracts all but the last character in the line (the lower limit defaults to 0). In both cases, slices do the job of logic that must be explicit in a lower-level language.

Note that calling the line.rstrip method is often preferred for stripping newline characters because this call leaves the line intact if it has no newline character at the end—a common case for files created with some text-editing tools. Slicing works if you’re sure the line is properly terminated.

String Conversion Tools

One of Python’s design mottos is that it refuses the temptation to guess. As a prime example, you cannot add a number and a string together in Python, even if the string looks like a number (i.e., is all digits):

>>> "42" + 1
TypeError: cannot concatenate 'str' and 'int' objects

This is by design: because + can mean both addition and concatenation, the choice of conversion would be ambiguous. So, Python treats this as an error. In Python, magic is generally omitted if it will make your life more complex.

What to do, then, if your script obtains a number as a text string from a file or user interface? The trick is that you need to employ conversion tools before you can treat a string like a number, or vice versa. For instance:

>>> int("42"), str(42)          # Convert from/to string
(42, '42')
>>> repr(42)                    # Convert to as-code string
'42'

The int function converts a string to a number, and the str function converts a number to its string representation (essentially, what it looks like when printed). The repr function (and the older backquotes expression, removed in Python 3.0) also converts an object to its string representation, but returns the object as a string of code that can be rerun to recreate the object. For strings, the result has quotes around it if displayed with a print statement:

>>> print(str('spam'), repr('spam'))
('spam', "'spam'")

See the sidebar str and repr Display Formats for more on this topic. Of these, int and str are the generally prescribed conversion techniques.

Now, although you can’t mix strings and number types around operators such as +, you can manually convert operands before that operation if needed:

>>> S = "42"
>>> I = 1
>>> S + I
TypeError: cannot concatenate 'str' and 'int' objects

>>> int(S) + I            # Force addition
43

>>> S + str(I)            # Force concatenation
'421'

Similar built-in functions handle floating-point number conversions to and from strings:

>>> str(3.1415), float("1.5")
('3.1415', 1.5)

>>> text = "1.234E-10"
>>> float(text)
1.2340000000000001e-010

Later, we’ll further study the built-in eval function; it runs a string containing Python expression code and so can convert a string to any kind of object. The functions int and float convert only to numbers, but this restriction means they are usually faster (and more secure, because they do not accept arbitrary expression code). As we saw briefly in Chapter 5, the string formatting expression also provides a way to convert numbers to strings. We’ll discuss formatting further later in this chapter.

Character code conversions

On the subject of conversions, it is also possible to convert a single character to its underlying ASCII integer code by passing it to the built-in ord function—this returns the actual binary value of the corresponding byte in memory. The chr function performs the inverse operation, taking an ASCII integer code and converting it to the corresponding character:

>>> ord('s')
115
>>> chr(115)
's'

You can use a loop to apply these functions to all characters in a string. These tools can also be used to perform a sort of string-based math. To advance to the next character, for example, convert and do the math in integer:

>>> S = '5'
>>> S = chr(ord(S) + 1)
>>> S
'6'
>>> S = chr(ord(S) + 1)
>>> S
'7'

At least for single-character strings, this provides an alternative to using the built-in int function to convert from string to integer:

>>> int('5')
5
>>> ord('5') - ord('0')
5

Such conversions can be used in conjunction with looping statements, introduced in Chapter 4 and covered in depth in the next part of this book, to convert a string of binary digits to their corresponding integer values. Each time through the loop, multiply the current value by 2 and add the next digit’s integer value:

>>> B = '1101'                 # Convert binary digits to integer with ord
>>> I = 0
>>> while B != '':
...     I = I * 2 + (ord(B[0]) - ord('0'))
...     B = B[1:]
...
>>> I
13

A left-shift operation (I << 1) would have the same effect as multiplying by 2 here. We’ll leave this change as a suggested exercise, though, both because we haven’t studied loops in detail yet and because the int and bin built-ins we met in Chapter 5 handle binary conversion tasks for us in Python 2.6 and 3.0:

>>> int('1101', 2)             # Convert binary to integer: built-in
13
>>> bin(13)                    # Convert integer to binary
'0b1101'

Given enough time, Python tends to automate most common tasks!

Changing Strings

Remember the term “immutable sequence”? The immutable part means that you can’t change a string in-place (e.g., by assigning to an index):

>>> S = 'spam'
>>> S[0] = "x"
Raises an error!

So, how do you modify text information in Python? To change a string, you need to build and assign a new string using tools such as concatenation and slicing, and then, if desired, assign the result back to the string’s original name:

>>> S = S + 'SPAM!'         # To change a string, make a new one
>>> S
'spamSPAM!'
>>> S = S[:4] + 'Burger' + S[−1]
>>> S
'spamBurger!'

The first example adds a substring at the end of S, by concatenation (really, it makes a new string and assigns it back to S, but you can think of this as “changing” the original string). The second example replaces four characters with six by slicing, indexing, and concatenating. As you’ll see in the next section, you can achieve similar effects with string method calls like replace:

>>> S = 'splot'
>>> S = S.replace('pl', 'pamal')
>>> S
'spamalot'

Like every operation that yields a new string value, string methods generate new string objects. If you want to retain those objects, you can assign them to variable names. Generating a new string object for each string change is not as inefficient as it may sound—remember, as discussed in the preceding chapter, Python automatically garbage collects (reclaims the space of) old unused string objects as you go, so newer objects reuse the space held by prior values. Python is usually more efficient than you might expect.

Finally, it’s also possible to build up new text values with string formatting expressions. Both of the following substitute objects into a string, in a sense converting the objects to strings and changing the original string according to a format specification:

>>> 'That is %d %s bird!' % (1, 'dead')           # Format expression
That is 1 dead bird!
>>> 'That is {0} {1} bird!'.format(1, 'dead')     # Format method in 2.6 and 3.0
'That is 1 dead bird!'

Despite the substitution metaphor, though, the result of formatting is a new string object, not a modified one. We’ll study formatting later in this chapter; as we’ll find, formatting turns out to be more general and useful than this example implies. Because the second of the preceding calls is provided as a method, though, let’s get a handle on string method calls before we explore formatting further.

Note

As we’ll see in Chapter 36, Python 3.0 and 2.6 introduce a new string type known as bytearray, which is mutable and so may be changed in place. bytearray objects aren’t really strings; they’re sequences of small, 8-bit integers. However, they support most of the same operations as normal strings and print as ASCII characters when displayed. As such, they provide another option for large amounts of text that must be changed frequently. In Chapter 36 we’ll also see that ord and chr handle Unicode characters, too, which might not be stored in single bytes.

String Methods

In addition to expression operators, strings provide a set of methods that implement more sophisticated text-processing tasks. Methods are simply functions that are associated with particular objects. Technically, they are attributes attached to objects that happen to reference callable functions. In Python, expressions and built-in functions may work across a range of types, but methods are generally specific to object types—string methods, for example, work only on string objects. The method sets of some types intersect in Python 3.0 (e.g., many types have a count method), but they are still more type-specific than other tools.

In finer-grained detail, functions are packages of code, and method calls combine two operations at once (an attribute fetch and a call):

Attribute fetches: An expression of the form object.attribute means “fetch the value of attribute in object.”
Call expressions: An expression of the form function(arguments) means “invoke the code of function, passing zero or more comma-separated argument objects to it, and return function’s result value.”

Putting these two together allows us to call a method of an object. The method call expression object.method(arguments) is evaluated from left to right—Python will first fetch the method of the object and then call it, passing in the arguments. If the method computes a result, it will come back as the result of the entire method-call expression.

As you’ll see throughout this part of the book, most objects have callable methods, and all are accessed using this same method-call syntax. To call an object method, as you’ll see in the following sections, you have to go through an existing object.

Table 7-3 summarizes the methods and call patterns for built-in string objects in Python 3.0; these change frequently, so be sure to check Python’s standard library manual for the most up-to-date list, or run a help call on any string interactively. Python 2.6’s string methods vary slightly; it includes a decode, for example, because of its different handling of Unicode data (something we’ll discuss in Chapter 36). In this table, S is a string object, and optional arguments are enclosed in square brackets. String methods in this table implement higher-level operations such as splitting and joining, case conversions, content tests, and substring searches and replacements.

Table 7-3. String method calls in Python 3.0

`S.capitalize()`	`S.ljust(width [, fill])`
`S.center(width [, fill])`	`S.lower()`
`S.count(sub [, start [, end]])`	`S.lstrip([chars])`
`S.encode([encoding [,errors]])`	`S.maketrans(x[, y[, z]])`
`S.endswith(suffix [, start [, end]])`	`S.partition(sep)`
`S.expandtabs([tabsize])`	`S.replace(old, new [, count])`
`S.find(sub [, start [, end]])`	`S.rfind(sub [,start [,end]])`
`S.format(fmtstr, args, *kwargs)`	`S.rindex(sub [, start [, end]])`
`S.index(sub [, start [, end]])`	`S.rjust(width [, fill])`
`S.isalnum()`	`S.rpartition(sep)`
`S.isalpha()`	`S.rsplit([sep[, maxsplit]])`
`S.isdecimal()`	`S.rstrip([chars])`
`S.isdigit()`	`S.split([sep [,maxsplit]])`
`S.isidentifier()`	`S.splitlines([keepends])`
`S.islower()`	`S.startswith(prefix [, start [, end]])`
`S.isnumeric()`	`S.strip([chars])`
`S.isprintable()`	`S.swapcase()`
`S.isspace()`	`S.title()`
`S.istitle()`	`S.translate(map)`
`S.isupper()`	`S.upper()`
`S.join(iterable)`	`S.zfill(width)`

As you can see, there are quite a few string methods, and we don’t have space to cover them all; see Python’s library manual or reference texts for all the fine points. To help you get started, though, let’s work through some code that demonstrates some of the most commonly used methods in action, and illustrates Python text-processing basics along the way.

String Method Examples: Changing Strings

As we’ve seen, because strings are immutable, they cannot be changed in-place directly. To make a new text value from an existing string, you construct a new string with operations such as slicing and concatenation. For example, to replace two characters in the middle of a string, you can use code like this:

>>> S = 'spammy'
>>> S = S[:3] + 'xx' + S[5:]
>>> S
'spaxxy'

But, if you’re really just out to replace a substring, you can use the string replace method instead:

>>> S = 'spammy'
>>> S = S.replace('mm', 'xx')
>>> S
'spaxxy'

The replace method is more general than this code implies. It takes as arguments the original substring (of any length) and the string (of any length) to replace it with, and performs a global search and replace:

>>> 'aa$bb$cc$dd'.replace('$', 'SPAM')
'aaSPAMbbSPAMccSPAMdd'

In such a role, replace can be used as a tool to implement template replacements (e.g., in form letters). Notice that this time we simply printed the result, instead of assigning it to a name—you need to assign results to names only if you want to retain them for later use.

If you need to replace one fixed-size string that can occur at any offset, you can do a replacement again, or search for the substring with the string find method and then slice:

>>> S = 'xxxxSPAMxxxxSPAMxxxx'
>>> where = S.find('SPAM')            # Search for position
>>> where                             # Occurs at offset 4
4
>>> S = S[:where] + 'EGGS' + S[(where+4):]
>>> S
'xxxxEGGSxxxxSPAMxxxx'

The find method returns the offset where the substring appears (by default, searching from the front), or −1 if it is not found. As we saw earlier, it’s a substring search operation just like the in expression, but find returns the position of a located substring.

Another option is to use replace with a third argument to limit it to a single substitution:

>>> S = 'xxxxSPAMxxxxSPAMxxxx'
>>> S.replace('SPAM', 'EGGS')         # Replace all
'xxxxEGGSxxxxEGGSxxxx'

>>> S.replace('SPAM', 'EGGS', 1)      # Replace one
'xxxxEGGSxxxxSPAMxxxx'

Notice that replace returns a new string object each time. Because strings are immutable, methods never really change the subject strings in-place, even if they are called “replace”!

The fact that concatenation operations and the replace method generate new string objects each time they are run is actually a potential downside of using them to change strings. If you have to apply many changes to a very large string, you might be able to improve your script’s performance by converting the string to an object that does support in-place changes:

>>> S = 'spammy'
>>> L = list(S)
>>> L
['s', 'p', 'a', 'm', 'm', 'y']

The built-in list function (or an object construction call) builds a new list out of the items in any sequence—in this case, “exploding” the characters of a string into a list. Once the string is in this form, you can make multiple changes to it without generating a new copy for each change:

>>> L[3] = 'x'                        # Works for lists, not strings
>>> L[4] = 'x'
>>> L
['s', 'p', 'a', 'x', 'x', 'y']

If, after your changes, you need to convert back to a string (e.g., to write to a file), use the string join method to “implode” the list back into a string:

>>> S = ''.join(L)
>>> S
'spaxxy'

The join method may look a bit backward at first sight. Because it is a method of strings (not of lists), it is called through the desired delimiter. join puts the strings in a list (or other iterable) together, with the delimiter between list items; in this case, it uses an empty string delimiter to convert from a list back to a string. More generally, any string delimiter and iterable of strings will do:

>>> 'SPAM'.join(['eggs', 'sausage', 'ham', 'toast'])
'eggsSPAMsausageSPAMhamSPAMtoast'

In fact, joining substrings all at once this way often runs much faster than concatenating them individually. Be sure to also see the earlier note about the mutable bytearray string in Python 3.0 and 2.6, described fully in Chapter 36; because it may be changed in place, it offers an alternative to this list/join combination for some kinds of text that must be changed often.

String Method Examples: Parsing Text

Another common role for string methods is as a simple form of text parsing—that is, analyzing structure and extracting substrings. To extract substrings at fixed offsets, we can employ slicing techniques:

>>> line = 'aaa bbb ccc'
>>> col1 = line[0:3]
>>> col3 = line[8:]
>>> col1
'aaa'
>>> col3
'ccc'

Here, the columns of data appear at fixed offsets and so may be sliced out of the original string. This technique passes for parsing, as long as the components of your data have fixed positions. If instead some sort of delimiter separates the data, you can pull out its components by splitting. This will work even if the data may show up at arbitrary positions within the string:

>>> line = 'aaa bbb  ccc'
>>> cols = line.split()
>>> cols
['aaa', 'bbb', 'ccc']

The string split method chops up a string into a list of substrings, around a delimiter string. We didn’t pass a delimiter in the prior example, so it defaults to whitespace—the string is split at groups of one or more spaces, tabs, and newlines, and we get back a list of the resulting substrings. In other applications, more tangible delimiters may separate the data. This example splits (and hence parses) the string at commas, a separator common in data returned by some database tools:

>>> line = 'bob,hacker,40'
>>> line.split(',')
['bob', 'hacker', '40']

Delimiters can be longer than a single character, too:

>>> line = "i'mSPAMaSPAMlumberjack"
>>> line.split("SPAM")
["i'm", 'a', 'lumberjack']

Although there are limits to the parsing potential of slicing and splitting, both run very fast and can handle basic text-extraction chores.

Other Common String Methods in Action

Other string methods have more focused roles—for example, to strip off whitespace at the end of a line of text, perform case conversions, test content, and test for a substring at the end or front:

>>> line = "The knights who say Ni!
"
>>> line.rstrip()
'The knights who say Ni!'
>>> line.upper()
'THE KNIGHTS WHO SAY NI!
'
>>> line.isalpha()
False
>>> line.endswith('Ni!
')
True
>>> line.startswith('The')
True

Alternative techniques can also sometimes be used to achieve the same results as string methods—the in membership operator can be used to test for the presence of a substring, for instance, and length and slicing operations can be used to mimic endswith:

>>> line
'The knights who say Ni!
'

>>> line.find('Ni') != −1       # Search via method call or expression
True
>>> 'Ni' in line
True

>>> sub = 'Ni!
'
>>> line.endswith(sub)          # End test via method call or slice
True
>>> line[-len(sub):] == sub
True

See also the format string formatting method described later in this chapter; it provides more advanced substitution tools that combine many operations in a single step.

Again, because there are so many methods available for strings, we won’t look at every one here. You’ll see some additional string examples later in this book, but for more details you can also turn to the Python library manual and other documentation sources, or simply experiment interactively on your own. You can also check the help(S.method) results for a method of any string object S for more hints.

Note that none of the string methods accepts patterns—for pattern-based text processing, you must use the Python re standard library module, an advanced tool that was introduced in Chapter 4 but is mostly outside the scope of this text (one further example appears at the end of Chapter 36). Because of this limitation, though, string methods may sometimes run more quickly than the re module’s tools.

The Original string Module (Gone in 3.0)

The history of Python’s string methods is somewhat convoluted. For roughly the first decade of its existence, Python provided a standard library module called string that contained functions that largely mirrored the current set of string object methods. In response to user requests, in Python 2.0 these functions were made available as methods of string objects. Because so many people had written so much code that relied on the original string module, however, it was retained for backward compatibility.

Today, you should use only string methods, not the original string module. In fact, the original module-call forms of today’s string methods have been removed completely from Python in Release 3.0. However, because you may still see the module in use in older Python code, a brief look is in order here.

The upshot of this legacy is that in Python 2.6, there technically are still two ways to invoke advanced string operations: by calling object methods, or by calling string module functions and passing in the objects as arguments. For instance, given a variable X assigned to a string object, calling an object method:

X.method(arguments)

is usually equivalent to calling the same operation through the string module (provided that you have already imported the module):

string.method(X, arguments)

Here’s an example of the method scheme in action:

>>> S = 'a+b+c+'
>>> x = S.replace('+', 'spam')
>>> x
'aspambspamcspam'

To access the same operation through the string module in Python 2.6, you need to import the module (at least once in your process) and pass in the object:

>>> import string
>>> y = string.replace(S, '+', 'spam')
>>> y
'aspambspamcspam'

Because the module approach was the standard for so long, and because strings are such a central component of most programs, you might see both call patterns in Python 2.X code you come across.

Again, though, today you should always use method calls instead of the older module calls. There are good reasons for this, besides the fact that the module calls have gone away in Release 3.0. For one thing, the module call scheme requires you to import the string module (methods do not require imports). For another, the module makes calls a few characters longer to type (when you load the module with import, that is, not using from). And, finally, the module runs more slowly than methods (the module maps most calls back to the methods and so incurs an extra call along the way).

The original string module itself, without its string method equivalents, is retained in Python 3.0 because it contains additional tools, including predefined string constants and a template object system (a relatively obscure tool omitted here—see the Python library manual for details on template objects). Unless you really want to have to change your 2.6 code to use 3.0, though, you should consider the basic string operation calls in it to be just ghosts from the past.

String Formatting Expressions

Although you can get a lot done with the string methods and sequence operations we’ve already met, Python also provides a more advanced way to combine string processing tasks—string formatting allows us to perform multiple type-specific substitutions on a string in a single step. It’s never strictly required, but it can be convenient, especially when formatting text to be displayed to a program’s users. Due to the wealth of new ideas in the Python world, string formatting is available in two flavors in Python today:

String formatting expressions: The original technique, available since Python’s inception; this is based upon the C language’s “printf” model and is used in much existing code.
String formatting method calls: A newer technique added in Python 2.6 and 3.0; this is more unique to Python and largely overlaps with string formatting expression functionality.

Since the method call flavor is new, there is some chance that one or the other of these may become deprecated over time. The expressions are more likely to be deprecated in later Python releases, though this should depend on the future practice of real Python programmers. As they are largely just variations on a theme, though, either technique is valid to use today. Since string formatting expressions are the original in this department, let’s start with them.

Python defines the % binary operator to work on strings (you may recall that this is also the remainder of division, or modulus, operator for numbers). When applied to strings, the % operator provides a simple way to format values as strings according to a format definition. In short, the % operator provides a compact way to code multiple string substitutions all at once, instead of building and concatenating parts individually.

To format strings:

On the left of the % operator, provide a format string containing one or more embedded conversion targets, each of which starts with a % (e.g., %d).
On the right of the % operator, provide the object (or objects, embedded in a tuple) that you want Python to insert into the format string on the left in place of the conversion target (or targets).

For instance, in the formatting example we saw earlier in this chapter, the integer 1 replaces the %d in the format string on the left, and the string 'dead' replaces the %s. The result is a new string that reflects these two substitutions:

>>> 'That is %d %s bird!' % (1, 'dead')           # Format expression
That is 1 dead bird!

Technically speaking, string formatting expressions are usually optional—you can generally do similar work with multiple concatenations and conversions. However, formatting allows us to combine many steps into a single operation. It’s powerful enough to warrant a few more examples:

>>> exclamation = "Ni"
>>> "The knights who say %s!" % exclamation
'The knights who say Ni!'

>>> "%d %s %d you" % (1, 'spam', 4)
'1 spam 4 you'

>>> "%s -- %s -- %s" % (42, 3.14159, [1, 2, 3])
'42 -- 3.14159 -- [1, 2, 3]'

The first example here plugs the string "Ni" into the target on the left, replacing the %s marker. In the second example, three values are inserted into the target string. Note that when you’re inserting more than one value, you need to group the values on the right in parentheses (i.e., put them in a tuple). The % formatting expression operator expects either a single item or a tuple of one or more items on its right side.

The third example again inserts three values—an integer, a floating-point object, and a list object—but notice that all of the targets on the left are %s, which stands for conversion to string. As every type of object can be converted to a string (the one used when printing), every object type works with the %s conversion code. Because of this, unless you will be doing some special formatting, %s is often the only code you need to remember for the formatting expression.

Again, keep in mind that formatting always makes a new string, rather than changing the string on the left; because strings are immutable, it must work this way. As before, assign the result to a variable name if you need to retain it.

Advanced String Formatting Expressions

For more advanced type-specific formatting, you can use any of the conversion type codes listed in Table 7-4 in formatting expressions; they appear after the % character in substitution targets. C programmers will recognize most of these because Python string formatting supports all the usual C printf format codes (but returns the result, instead of displaying it, like printf). Some of the format codes in the table provide alternative ways to format the same type; for instance, %e, %f, and %g provide alternative ways to format floating-point numbers.

Table 7-4. String formatting type codes

Code	Meaning
`s`	String (or any object’s `str(X)` string)
`r`	`s`, but uses `repr`, not `str`
`c`	Character
`d`	Decimal (integer)
`i`	Integer
`u`	Same as `d` (obsolete: no longer unsigned)
`o`	Octal integer
`x`	Hex integer
`X`	`x`, but prints uppercase
`e`	Floating-point exponent, lowercase
`E`	Same as `e`, but prints uppercase
`f`	Floating-point decimal
`F`	Floating-point decimal
`g`	Floating-point `e` or `f`
`G`	Floating-point `E` or `F`
`%`	Literal `%`

In fact, conversion targets in the format string on the expression’s left side support a variety of conversion operations with a fairly sophisticated syntax all their own. The general structure of conversion targets looks like this:

%[(name)][flags][width][.precision]typecode

The character type codes in Table 7-4 show up at the end of the target string. Between the % and the character code, you can do any of the following: provide a dictionary key; list flags that specify things like left justification (−), numeric sign (+), and zero fills (0); give a total minimum field width and the number of digits after a decimal point; and more. Both width and precision can also be coded as a * to specify that they should take their values from the next item in the input values.

Formatting target syntax is documented in full in the Python standard manuals, but to demonstrate common usage, let’s look at a few examples. This one formats integers by default, and then in a six-character field with left justification and zero padding:

>>> x = 1234
>>> res = "integers: ...%d...%−6d...%06d" % (x, x, x)
>>> res
'integers: ...1234...1234  ...001234'

The %e, %f, and %g formats display floating-point numbers in different ways, as the following interaction demonstrates (%E is the same as %e but the exponent is uppercase):

>>> x = 1.23456789
>>> x
1.2345678899999999

>>> '%e | %f | %g' % (x, x, x)
'1.234568e+00 | 1.234568 | 1.23457'

>>> '%E' % x
'1.234568E+00'

For floating-point numbers, you can achieve a variety of additional formatting effects by specifying left justification, zero padding, numeric signs, field width, and digits after the decimal point. For simpler tasks, you might get by with simply converting to strings with a format expression or the str built-in function shown earlier:

>>> '%−6.2f | %05.2f | %+06.1f' % (x, x, x)
'1.23   | 01.23 | +001.2'

>>> "%s" % x, str(x)
('1.23456789', '1.23456789')

When sizes are not known until runtime, you can have the width and precision computed by specifying them with a * in the format string to force their values to be taken from the next item in the inputs to the right of the % operator—the 4 in the tuple here gives precision:

>>> '%f, %.2f, %.*f' % (1/3.0, 1/3.0, 4, 1/3.0)
'0.333333, 0.33, 0.3333'

If you’re interested in this feature, experiment with some of these examples and operations on your own for more information.

Dictionary-Based String Formatting Expressions

String formatting also allows conversion targets on the left to refer to the keys in a dictionary on the right and fetch the corresponding values. I haven’t told you much about dictionaries yet, so here’s an example that demonstrates the basics:

>>> "%(n)d %(x)s" % {"n":1, "x":"spam"}
'1 spam'

Here, the (n) and (x) in the format string refer to keys in the dictionary literal on the right and fetch their associated values. Programs that generate text such as HTML or XML often use this technique—you can build up a dictionary of values and substitute them all at once with a single formatting expression that uses key-based references:

>>> reply = """                               # Template with substitution targets
Greetings...
Hello %(name)s!
Your age squared is %(age)s
"""
>>> values = {'name': 'Bob', 'age': 40}       # Build up values to substitute
>>> print(reply % values)                     # Perform substitutions

Greetings...
Hello Bob!
Your age squared is 40

This trick is also used in conjunction with the vars built-in function, which returns a dictionary containing all the variables that exist in the place it is called:

>>> food = 'spam'
>>> age = 40
>>> vars()
{'food': 'spam', 'age': 40, ...many more... }

When used on the right of a format operation, this allows the format string to refer to variables by name (i.e., by dictionary key):

>>> "%(age)d %(food)s" % vars()
'40 spam'

We’ll study dictionaries in more depth in Chapter 8. See also Chapter 5 for examples that convert to hexadecimal and octal number strings with the %x and %o formatting target codes.

String Formatting Method Calls

As mentioned earlier, Python 2.6 and 3.0 introduced a new way to format strings that is seen by some as a bit more Python-specific. Unlike formatting expressions, formatting method calls are not closely based upon the C language’s “printf” model, and they are more verbose and explicit in intent. On the other hand, the new technique still relies on some “printf” concepts, such as type codes and formatting specifications. Moreover, it largely overlaps with (and sometimes requires a bit more code than) formatting expressions, and it can be just as complex in advanced roles. Because of this, there is no best-use recommendation between expressions and method calls today, so most programmers would be well served by a cursory understanding of both schemes.

The Basics

In short, the new string object’s format method in 2.6 and 3.0 (and later) uses the subject string as a template and takes any number of arguments that represent values to be substituted according to the template. Within the subject string, curly braces designate substitution targets and arguments to be inserted either by position (e.g., {1}) or keyword (e.g., {food}). As we’ll learn when we study argument passing in depth in Chapter 18, arguments to functions and methods may be passed by position or keyword name, and Python’s ability to collect arbitrarily many positional and keyword arguments allows for such general method call patterns. In Python 2.6 and 3.0, for example:

>>> template = '{0}, {1} and {2}'                             # By position
>>> template.format('spam', 'ham', 'eggs')
'spam, ham and eggs'

>>> template = '{motto}, {pork} and {food}'                   # By keyword
>>> template.format(motto='spam', pork='ham', food='eggs')
'spam, ham and eggs'

>>> template = '{motto}, {0} and {food}'                      # By both
>>> template.format('ham', motto='spam', food='eggs')
'spam, ham and eggs'

Naturally, the string can also be a literal that creates a temporary string, and arbitrary object types can be substituted:

>>> '{motto}, {0} and {food}'.format(42, motto=3.14, food=[1, 2])
'3.14, 42 and [1, 2]'

Just as with the % expression and other string methods, format creates and returns a new string object, which can be printed immediately or saved for further work (recall that strings are immutable, so format really must make a new object). String formatting is not just for display:

>>> X = '{motto}, {0} and {food}'.format(42, motto=3.14, food=[1, 2])
>>> X
'3.14, 42 and [1, 2]'

>>> X.split(' and ')
['3.14, 42', '[1, 2]']

>>> Y = X.replace('and', 'but under no circumstances')
>>> Y
'3.14, 42 but under no circumstances [1, 2]'

Adding Keys, Attributes, and Offsets

Like % formatting expressions, format calls can become more complex to support more advanced usage. For instance, format strings can name object attributes and dictionary keys—as in normal Python syntax, square brackets name dictionary keys and dots denote object attributes of an item referenced by position or keyword. The first of the following examples indexes a dictionary on the key “spam” and then fetches the attribute “platform” from the already imported sys module object. The second does the same, but names the objects by keyword instead of position:

>>> import sys

>>> 'My {1[spam]} runs {0.platform}'.format(sys, {'spam': 'laptop'})
'My laptop runs win32'

>>> 'My {config[spam]} runs {sys.platform}'.format(sys=sys,
                                                  config={'spam': 'laptop'})
'My laptop runs win32'

Square brackets in format strings can name list (and other sequence) offsets to perform indexing, too, but only single positive offsets work syntactically within format strings, so this feature is not as general as you might think. As with % expressions, to name negative offsets or slices, or to use arbitrary expression results in general, you must run expressions outside the format string itself (note the use of *parts here to unpack a tuple’s items into individual arguments, as we did on Page 132 when studying fractions; more on this in Chapter 18):

>>> somelist = list('SPAM')
>>> somelist
['S', 'P', 'A', 'M']

>>> 'first={0[0]}, third={0[2]}'.format(somelist)
'first=S, third=A'

>>> 'first={0}, last={1}'.format(somelist[0], somelist[-1])   # [-1] fails in fmt
'first=S, last=M'

>>> parts = somelist[0], somelist[-1], somelist[1:3]          # [1:3] fails in fmt
>>> 'first={0}, last={1}, middle={2}'.format(*parts)
"first=S, last=M, middle=['P', 'A']"

Adding Specific Formatting

Another similarity with % expressions is that more specific layouts can be achieved by adding extra syntax in the format string. For the formatting method, we use a colon after the substitution target’s identification, followed by a format specifier that can name the field size, justification, and a specific type code. Here’s the formal structure of what can appear as a substitution target in a format string:

{fieldname!conversionflag:formatspec}

In this substitution target syntax:

fieldname is a number or keyword naming an argument, followed by optional “.name” attribute or “[index]” component references.
conversionflag can be r, s, or a to call repr, str, or ascii built-in functions on the value, respectively.
formatspec specifies how the value should be presented, including details such as field width, alignment, padding, decimal precision, and so on, and ends with an optional data type code.

The formatspec component after the colon character is formally described as follows (brackets denote optional components and are not coded literally):

[[fill]align][sign][#][0][width][.precision][typecode]

align may be <, >, =, or ^, for left alignment, right alignment, padding after a sign character, or centered alignment, respectively. The formatspec also contains nested {} format strings with field names only, to take values from the arguments list dynamically (much like the * in formatting expressions).

See Python’s library manual for more on substitution syntax and a list of the available type codes—they almost completely overlap with those used in % expressions and listed previously in Table 7-4, but the format method also allows a “b” type code used to display integers in binary format (it’s equivalent to using the bin built-in call), allows a “%” type code to display percentages, and uses only “d” for base-10 integers (not “i” or “u”).

As an example, in the following {0:10} means the first positional argument in a field 10 characters wide, {1:<10} means the second positional argument left-justified in a 10-character-wide field, and {0.platform:>10} means the platform attribute of the first argument right-justified in a 10-character-wide field (note the use of dict() to make a dictionary from keyword arguments, introduced in Chapter 4 and covered in Chapter 8):

>>> '{0:10} = {1:10}'.format('spam', 123.4567)
'spam       =    123.457'

>>> '{0:>10} = {1:<10}'.format('spam', 123.4567)
'      spam = 123.457   '

>>> '{0.platform:>10} = {1[item]:<10}'.format(sys, dict(item='laptop'))
'     win32 = laptop    '

Floating-point numbers support the same type codes and formatting specificity in formatting method calls as in % expressions. For instance, in the following {2:g} means the third argument formatted by default according to the “g” floating-point representation, {1:.2f} designates the “f” floating-point format with just 2 decimal digits, and {2:06.2f} adds a field with a width of 6 characters and zero padding on the left:

>>> '{0:e}, {1:.3e}, {2:g}'.format(3.14159, 3.14159, 3.14159)
'3.141590e+00, 3.142e+00, 3.14159'

>>> '{0:f}, {1:.2f}, {2:06.2f}'.format(3.14159, 3.14159, 3.14159)
'3.141590, 3.14, 003.14'

Hex, octal, and binary formats are supported by the format method as well. In fact, string formatting is an alternative to some of the built-in functions that format integers to a given base:

>>> '{0:X}, {1:o}, {2:b}'.format(255, 255, 255)      # Hex, octal, binary
'FF, 377, 11111111'

>>> bin(255), int('11111111', 2), 0b11111111         # Other to/from binary
('0b11111111', 255, 255)

>>> hex(255), int('FF', 16), 0xFF                    # Other to/from hex
('0xff', 255, 255)

>>> oct(255), int('377', 8), 0o377, 0377             # Other to/from octal
('0377', 255, 255, 255)                              # 0377 works in 2.6, not 3.0!

Formatting parameters can either be hardcoded in format strings or taken from the arguments list dynamically by nested format syntax, much like the star syntax in formatting expressions:

>>> '{0:.2f}'.format(1 / 3.0)                        # Parameters hardcoded
'0.33'
>>> '%.2f' % (1 / 3.0)
'0.33'

>>> '{0:.{1}f}'.format(1 / 3.0, 4)                   # Take value from arguments
'0.3333'
>>> '%.*f' % (4, 1 / 3.0)                            # Ditto for expression
'0.3333'

Finally, Python 2.6 and 3.0 also provide a new built-in format function, which can be used to format a single item. It’s a more concise alternative to the string format method, and is roughly similar to formatting a single item with the % formatting expression:

>>> '{0:.2f}'.format(1.2345)                         # String method
'1.23'
>>> format(1.2345, '.2f')                            # Built-in function
'1.23'
>>> '%.2f' % 1.2345                                  # Expression
'1.23'

Technically, the format built-in runs the subject object’s __format__ method, which the str.format method does internally for each formatted item. It’s still more verbose than the original % expression’s equivalent, though—which leads us to the next section.

Comparison to the % Formatting Expression

If you study the prior sections closely, you’ll probably notice that at least for positional references and dictionary keys, the string format method looks very much like the % formatting expression, especially in advanced use with type codes and extra formatting syntax. In fact, in common use cases formatting expressions may be easier to code than formatting method calls, especially when using the generic %s print-string substitution target:

print('%s=%s' % ('spam', 42))            # 2.X+ format expression

print('{0}={1}'.format('spam', 42))      # 3.0 (and 2.6) format method

As we’ll see in a moment, though, more complex formatting tends to be a draw in terms of complexity (difficult tasks are generally difficult, regardless of approach), and some see the formatting method as largely redundant.

On the other hand, the formatting method also offers a few potential advantages. For example, the original % expression can’t handle keywords, attribute references, and binary type codes, although dictionary key references in % format strings can often achieve similar goals. To see how the two techniques overlap, compare the following % expressions to the equivalent format method calls shown earlier:

# The basics: with % instead of format()

>>> template = '%s, %s, %s'
>>> template % ('spam', 'ham', 'eggs')                        # By position
    'spam, ham, eggs'

>>> template = '%(motto)s, %(pork)s and %(food)s'
>>> template % dict(motto='spam', pork='ham', food='eggs')    # By key
'spam, ham and eggs'

>>> '%s, %s and %s' % (3.14, 42, [1, 2])                      # Arbitrary types
'3.14, 42 and [1, 2]'


# Adding keys, attributes, and offsets

>>> 'My %(spam)s runs %(platform)s' % {'spam': 'laptop', 'platform': sys.platform}
'My laptop runs win32'

>>> 'My %(spam)s runs %(platform)s' % dict(spam='laptop', platform=sys.platform)
'My laptop runs win32'

>>> somelist = list('SPAM')
>>> parts = somelist[0], somelist[-1], somelist[1:3]
>>> 'first=%s, last=%s, middle=%s' % parts
"first=S, last=M, middle=['P', 'A']"

When more complex formatting is applied the two techniques approach parity in terms of complexity, although if you compare the following with the format method call equivalents listed earlier you’ll again find that the % expressions tend to be a bit simpler and more concise:

# Adding specific formatting

>>> '%-10s = %10s' % ('spam', 123.4567)
'spam       =   123.4567'

>>> '%10s = %-10s' % ('spam', 123.4567)
'      spam = 123.4567  '

>>> '%(plat)10s = %(item)-10s' % dict(plat=sys.platform, item='laptop')
'     win32 = laptop    '


# Floating-point numbers

>>> '%e, %.3e, %g' % (3.14159, 3.14159, 3.14159)
'3.141590e+00, 3.142e+00, 3.14159'

>>> '%f, %.2f, %06.2f' % (3.14159, 3.14159, 3.14159)
'3.141590, 3.14, 003.14'


# Hex and octal, but not binary

>>> '%x, %o' % (255, 255)
'ff, 377'

The format method has a handful of advanced features that the % expression does not, but even more involved formatting still seems to be essentially a draw in terms of complexity. For instance, the following shows the same result generated with both techniques, with field sizes and justifications and various argument reference methods:

# Hardcoded references in both

>>> import sys

>>> 'My {1[spam]:<8} runs {0.platform:>8}'.format(sys, {'spam': 'laptop'})
'My laptop   runs    win32'

>>> 'My %(spam)-8s runs %(plat)8s' % dict(spam='laptop', plat=sys.platform)
'My laptop   runs    win32'

In practice, programs are less likely to hardcode references like this than to execute code that builds up a set of substitution data ahead of time (to collect data to substitute into an HTML template all at once, for instance). When we account for common practice in examples like this, the comparison between the format method and the % expression is even more direct (as we’ll see in Chapter 18, the **data in the method call here is special syntax that unpacks a dictionary of keys and values into individual “name=value” keyword arguments so they can be referenced by name in the format string):

# Building data ahead of time in both

>>> data = dict(platform=sys.platform, spam='laptop')

>>> 'My {spam:<8} runs {platform:>8}'.format(**data)
'My laptop   runs    win32'

>>> 'My %(spam)-8s runs %(platform)8s' % data
'My laptop   runs    win32'

As usual, the Python community will have to decide whether % expressions, format method calls, or a toolset with both techniques proves better over time. Experiment with these techniques on your own to get a feel for what they offer, and be sure to see the Python 2.6 and 3.0 library manuals for more details.

Note

String format method enhancements in Python 3.1: The upcoming 3.1 release (in alpha form as this chapter was being written) will add a thousand-separator syntax for numbers, which inserts commas between three-digit groups. Add a comma before the type code to make this work, as follows:

>>> '{0:d}'.format(999999999999)
'999999999999'

>>> '{0:,d}'.format(999999999999)
'999,999,999,999'

Python 3.1 also assigns relative numbers to substitution targets automatically if they are not included explicitly, though using this extension may negate one of the main benefits of the formatting method, as the next section describes:

>>> '{:,d}'.format(999999999999)
'999,999,999,999'

>>> '{:,d} {:,d}'.format(9999999, 8888888)
'9,999,999 8,888,888'

>>> '{:,.2f}'.format(296999.2567)
'296,999.26'

This book doesn’t cover 3.1 officially, so you should take this as a preview. Python 3.1 will also address a major performance issue in 3.0 related to the speed of file input/output operations, which made 3.0 impractical for many types of programs. See the 3.1 release notes for more details. See also the formats.py comma-insertion and money-formatting function examples in Chapter 24 for a manual solution that can be imported and used prior to Python 3.1.

Why the New Format Method?

Now that I’ve gone to such lengths to compare and contrast the two formatting techniques, I need to explain why you might want to consider using the format method variant at times. In short, although the formatting method can sometimes require more code, it also:

Has a few extra features not found in the % expression
Can make substitution value references more explicit
Trades an operator for an arguably more mnemonic method name
Does not support different syntax for single and multiple substitution value cases

Although both techniques are available today and the formatting expression is still widely used, the format method might eventually subsume it. But because the choice is currently still yours to make, let’s briefly expand on some of the differences before moving on.

Extra features

The method call supports a few extras that the expression does not, such as binary type codes and (coming in Python 3.1) thousands groupings. In addition, the method call supports key and attribute references directly. As we’ve seen, though, the formatting expression can usually achieve the same effects in other ways:

>>> '{0:b}'.format((2 ** 16) −1)
'1111111111111111'

>>> '%b' % ((2 ** 16) −1)
ValueError: unsupported format character 'b' (0x62) at index 1

>>> bin((2 ** 16) −1)
'0b1111111111111111'

>>> '%s' % bin((2 ** 16) −1)[2:]
'1111111111111111'

See also the prior examples that compare dictionary-based formatting in the % expression to key and attribute references in the format method; especially in common practice, the two seem largely variations on a theme.

Explicit value references

One use case where the format method is at least debatably clearer is when there are many values to be substituted into the format string. The lister.py classes example we’ll meet in Chapter 30, for example, substitutes six items into a single string, and in this case the method’s {i} position labels seem easier to read than the expression’s %s:

'
%s<Class %s, address %s:
%s%s%s>
' % (...)               # Expression

'
{0}<Class {1}, address {2}:
{3}{4}{5}>
'.format(...)     # Method

On the other hand, using dictionary keys in % expressions can mitigate much of this difference. This is also something of a worst-case scenario for formatting complexity, and not very common in practice; more typical use cases seem largely a tossup. Moreover, in Python 3.1 (still in alpha release form as I write these words), numbering substitution values will become optional, thereby subverting this purported benefit altogether:

C:misc> C:Python31python
>>> 'The {0} side {1} {2}'.format('bright', 'of', 'life')
'The bright side of life'
>>>
>>> 'The {} side {} {}'.format('bright', 'of', 'life')        # Python 3.1+
'The bright side of life'
>>>
>>> 'The %s side %s %s' % ('bright', 'of', 'life')
'The bright side of life'

Using 3.1’s automatic relative numbering like this seems to negate a large part of the method’s advantage. Compare the effect on floating-point formatting, for example—the formatting expression is still more concise, and still seems less cluttered:

C:misc> C:Python31python
>>> '{0:f}, {1:.2f}, {2:05.2f}'.format(3.14159, 3.14159, 3.14159)
'3.141590, 3.14, 03.14'
>>>
>>> '{:f}, {:.2f}, {:06.2f}'.format(3.14159, 3.14159, 3.14159)
'3.141590, 3.14, 003.14'
>>>
>>> '%f, %.2f, %06.2f' % (3.14159, 3.14159, 3.14159)
'3.141590, 3.14, 003.14'

Method names and general arguments

Given this 3.1 auto-numbering change, the only clearly remaining potential advantages of the formatting method are that it replaces the % operator with a more mnemonic format method name and does not distinguish between single and multiple substitution values. The former may make the method appear simpler to beginners at first glance (“format” may be easier to parse than multiple “%” characters), though this is too subjective to call.

The latter difference might be more significant—with the format expression, a single value can be given by itself, but multiple values must be enclosed in a tuple:

>>> '%.2f' % 1.2345
'1.23'
>>> '%.2f %s' % (1.2345, 99)
'1.23 99'

Technically, the formatting expression accepts either a single substitution value, or a tuple of one or more items. In fact, because a single item can be given either by itself or within a tuple, a tuple to be formatted must be provided as nested tuples:

>>> '%s' % 1.23
'1.23'
>>> '%s' % (1.23,)
'1.23'
>>> '%s' % ((1.23,),)
'(1.23,)'

The formatting method, on the other hand, tightens this up by accepting general function arguments in both cases:

>>> '{0:.2f}'.format(1.2345)
'1.23'
>>> '{0:.2f} {1}'.format(1.2345, 99)
'1.23 99'

>>> '{0}'.format(1.23)
'1.23'
>>> '{0}'.format((1.23,))
'(1.23,)'

Consequently, it might be less confusing to beginners and cause fewer programming mistakes. This is still a fairly minor issue, though—if you always enclose values in a tuple and ignore the nontupled option, the expression is essentially the same as the method call here. Moreover, the method incurs an extra price in inflated code size to achieve its limited flexibility. Given that the expression has been used extensively throughout Python’s history, it’s not clear that this point justifies breaking existing code for a new tool that is so similar, as the next section argues.

Possible future deprecation?

As mentioned earlier, there is some risk that Python developers may deprecate the % expression in favor of the format method in the future. In fact, there is a note to this effect in Python 3.0’s manuals.

This has not yet occurred, of course, and both formatting techniques are fully available and reasonable to use in Python 2.6 and 3.0 (the versions of Python this book covers). Both techniques are supported in the upcoming Python 3.1 release as well, so deprecation of either seems unlikely for the foreseeable future. Moreover, because formatting expressions are used extensively in almost all existing Python code written to date, most programmers will benefit from being familiar with both techniques for many years to come.

If this deprecation ever does occur, though, you may need to recode all your % expressions as format methods, and translate those that appear in this book, in order to use a newer Python release. At the risk of editorializing here, I hope that such a change will be based upon the future common practice of actual Python programmers, not the whims of a handful of core developers—particularly given that the window for Python 3.0’s many incompatible changes is now closed. Frankly, this deprecation would seem like trading one complicated thing for another complicated thing—one that is largely equivalent to the tool it would replace! If you care about migrating to future Python releases, though, be sure to watch for developments on this front over time.

General Type Categories

Now that we’ve explored the first of Python’s collection objects, the string, let’s pause to define a few general type concepts that will apply to most of the types we look at from here on. With regard to built-in types, it turns out that operations work the same for all the types in the same category, so we’ll only need to define most of these ideas once. We’ve only examined numbers and strings so far, but because they are representative of two of the three major type categories in Python, you already know more about several other types than you might think.

Types Share Operation Sets by Categories

As you’ve learned, strings are immutable sequences: they cannot be changed in-place (the immutable part), and they are positionally ordered collections that are accessed by offset (the sequence part). Now, it so happens that all the sequences we’ll study in this part of the book respond to the same sequence operations shown in this chapter at work on strings—concatenation, indexing, iteration, and so on. More formally, there are three major type (and operation) categories in Python:

Numbers (integer, floating-point, decimal, fraction, others): Support addition, multiplication, etc.
Sequences (strings, lists, tuples): Support indexing, slicing, concatenation, etc.
Mappings (dictionaries): Support indexing by key, etc.

Sets are something of a category unto themselves (they don’t map keys to values and are not positionally ordered sequences), and we haven’t yet explored mappings on our in-depth tour (dictionaries are discussed in the next chapter). However, many of the other types we will encounter will be similar to numbers and strings. For example, for any sequence objects X and Y:

X + Y makes a new sequence object with the contents of both operands.
X * N makes a new sequence object with N copies of the sequence operand X.

In other words, these operations work the same way on any kind of sequence, including strings, lists, tuples, and some user-defined object types. The only difference is that the new result object you get back is of the same type as the operands X and Y—if you concatenate lists, you get back a new list, not a string. Indexing, slicing, and other sequence operations work the same on all sequences, too; the type of the objects being processed tells Python which flavor of the task to perform.

Mutable Types Can Be Changed In-Place

The immutable classification is an important constraint to be aware of, yet it tends to trip up new users. If an object type is immutable, you cannot change its value in-place; Python raises an error if you try. Instead, you must run code to make a new object containing the new value. The major core types in Python break down as follows:

Immutables (numbers, strings, tuples, frozensets): None of the object types in the immutable category support in-place changes, though we can always run expressions to make new objects and assign their results to variables as needed.
Mutables (lists, dictionaries, sets, bytearray): Conversely, the mutable types can always be changed in-place with operations that do not create new objects. Although such objects can be copied, in-place changes support direct modification.

Generally, immutable types give some degree of integrity by guaranteeing that an object won’t be changed by another part of a program. For a refresher on why this matters, see the discussion of shared object references in Chapter 6. To see how lists, dictionaries, and tuples participate in type categories, we need to move ahead to the next chapter.

Chapter Summary

In this chapter, we took an in-depth tour of the string object type. We learned about coding string literals, and we explored string operations, including sequence expressions, string method calls, and string formatting with both expressions and method calls. Along the way, we studied a variety of concepts in depth, such as slicing, method call syntax, and triple-quoted block strings. We also defined some core ideas common to a variety of types: sequences, for example, share an entire set of operations.

In the next chapter, we’ll continue our types tour with a look at the most general object collections in Python—lists and dictionaries. As you’ll find, much of what you’ve learned here will apply to those types as well. And as mentioned earlier, in the final part of this book we’ll return to Python’s string model to flesh out the details of Unicode text and binary data, which are of interest to some, but not all, Python programmers. Before moving on, though, here’s another chapter quiz to review the material covered here.

Test Your Knowledge: Quiz

Can the string find method be used to search a list?
Can a string slice expression be used on a list?
How would you convert a character to its ASCII integer code? How would you convert the other way, from an integer to a character?
How might you go about changing a string in Python?
Given a string S with the value "s,pa,m", name two ways to extract the two characters in the middle.
How many characters are there in the string "a bx1f00d"?
Why might you use the string module instead of string method calls?

Test Your Knowledge: Answers

No, because methods are always type-specific; that is, they only work on a single data type. Expressions like X+Y and built-in functions like len(X) are generic, though, and may work on a variety of types. In this case, for instance, the in membership expression has a similar effect as the string find, but it can be used to search both strings and lists. In Python 3.0, there is some attempt to group methods by categories (for example, the mutable sequence types list and bytearray have similar method sets), but methods are still more type-specific than other operation sets.
Yes. Unlike methods, expressions are generic and apply to many types. In this case, the slice expression is really a sequence operation—it works on any type of sequence object, including strings, lists, and tuples. The only difference is that when you slice a list, you get back a new list.
The built-in ord(S) function converts from a one-character string to an integer character code; chr(I) converts from the integer code back to a string.
Strings cannot be changed; they are immutable. However, you can achieve a similar effect by creating a new string—by concatenating, slicing, running formatting expressions, or using a method call like replace—and then assigning the result back to the original variable name.
You can slice the string using S[2:4], or split on the comma and index the string using S.split(',')[1]. Try these interactively to see for yourself.
Six. The string "a bx1f00d" contains the bytes a, newline (), b, binary 31 (a hex escape x1f), binary 0 (an octal escape 00), and d. Pass the string to the built-in len function to verify this, and print each of its character’s ord results to see the actual byte values. See Table 7-2 for more details.
You should never use the string module instead of string object method calls today—it’s deprecated, and its calls are removed completely in Python 3.0. The only reason for using the string module at all is for its other tools, such as predefined constants. You might also see it appear in what is now very old and dusty Python code.

^[17]If you need to care about binary data files, the chief distinction is that you open them in binary mode (using open mode flags with a b, such as 'rb', 'wb', and so on). In Python 3.0, binary file content is a bytes string, with an interface similar to that of normal strings; in 2.6, such content is a normal str string. See also the standard struct module introduced in Chapter 9, which can parse binary data loaded from a file, and the extended coverage of binary files and byte strings in Chapter 36.

^[18]In classes, I’ve met people who have indeed committed most or all of this table to memory; I’d probably think that was really sick, but for the fact that I’m a member of the set, too.

^[19]Unlike with C character arrays, you don’t need to allocate or manage storage arrays when using Python strings; you can simply create string objects as needed and let Python manage the underlying memory space. As discussed in Chapter 6, Python reclaims unused objects’ memory space automatically, using a reference-count garbage-collection strategy. Each object keeps track of the number of names, data structures, etc., that reference it; when the count reaches zero, Python frees the object’s space. This scheme means Python doesn’t have to stop and scan all the memory to find unused space to free (an additional garbage component also collects cyclic objects).

^[20]More mathematically minded readers (and students in my classes) sometimes detect a small asymmetry here: the leftmost item is at offset 0, but the rightmost is at offset –1. Alas, there is no such thing as a distinct –0 value in Python.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for 7. Strings

Create new playlist

Sign In

Sign Up

Chapter 7. Strings

Note

String Literals

Single- and Double-Quoted Strings Are the Same

Escape Sequences Represent Special Bytes

Raw Strings Suppress Escapes

Note

Triple Quotes Code Multiline Block Strings

Strings in Action

Basic Operations

Indexing and Slicing

Extended slicing: the third limit and slice objects

String Conversion Tools

Character code conversions

Changing Strings

Note

String Methods

String Method Examples: Changing Strings

String Method Examples: Parsing Text

Other Common String Methods in Action

The Original string Module (Gone in 3.0)

String Formatting Expressions

Advanced String Formatting Expressions

Dictionary-Based String Formatting Expressions

String Formatting Method Calls

The Basics

Adding Keys, Attributes, and Offsets

Adding Specific Formatting

Comparison to the % Formatting Expression

Note

Why the New Format Method?

Extra features

Explicit value references

Method names and general arguments

Possible future deprecation?

General Type Categories

Types Share Operation Sets by Categories

Mutable Types Can Be Changed In-Place

Chapter Summary

Test Your Knowledge: Quiz

Test Your Knowledge: Answers

Table of Contents for
7. Strings