
You’ve seen strings before, and know how to make them. You’ve also looked at how to access their individual characters by indexing and slicing. In this chapter, you see how to use them to format other values (for printing, for example), and take a quick look at the useful things you can do with string methods, such as splitting, joining, searching, and more.

Basic String Operations

All the standard sequence operations (indexing, slicing, multiplication, membership, length, minimum, and maximum) work with strings, as you saw in the previous chapter. Remember, however, that strings are immutable, so all kinds of item or slice assignments are illegal:

>>> website = 'http://www.python.org'
>>> website[-3:] = 'com'
Traceback (most recent call last):
  File "<pyshell#19>", line 1, in ?
    website[-3:] = 'com'
TypeError: object doesn't support slice assignment
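Because item and slice assignment are ruled out, the usual workaround is to build a new string from slices and bind a name to the result. A minimal sketch follows. The helper name `replace_suffix` is hypothetical, and the block avoids `print` so it runs unchanged under Python 2 or 3:

```python
def replace_suffix(s, n, new_suffix):
    # Strings are immutable, so keep everything except the last n
    # characters and concatenate the new ending. n must be positive,
    # because s[:-0] would be the empty string.
    return s[:-n] + new_suffix

website = 'http://www.python.org'
new_site = replace_suffix(website, 3, 'com')
# new_site is 'http://www.python.com'; website itself is unchanged
```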

String Formatting: The Short Version

If you are new to Python programming, chances are you won’t need all the options that are available in Python string formatting, so I’ll give you the short version here. If you are interested in the details, take a look at the section “String Formatting: The Long Version,” which follows. Otherwise, just read this and skip down to the section “String Methods.”

String formatting uses the (aptly named) string formatting operator, the percent (%) sign.

Note: As you may remember, % is also used as a modulus (remainder) operator.

To the left of the %, you place a string (the format string); to the right of it, you place the value you want to format. You can use a single value such as a string or a number, you can use a tuple of values (if you want to format more than one), or, as I discuss in the next chapter, you can use a dictionary. The most common case is the tuple:

>>> format = "Hello, %s. %s enough for ya?"
>>> values = ('world', 'Hot')
>>> print format % values
Hello, world. Hot enough for ya?

Note: If you use a list or some other sequence instead of a tuple, the sequence will be interpreted as a single value. Only tuples and dictionaries (discussed in Chapter 4) will allow you to format more than one value.
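The difference between a tuple and a list on the right of % is easy to see side by side. This short sketch uses arbitrary sample values:

```python
# A tuple supplies one value per conversion specifier.
pair = '%s and %s' % ('spam', 'eggs')      # 'spam and eggs'

# A list is treated as a single value and converted with str.
as_one = 'Value: %s' % [1, 2, 3]           # 'Value: [1, 2, 3]'

# Converting the list to a tuple spreads its items over the specifiers.
spread = '%s-%s-%s' % tuple([1, 2, 3])     # '1-2-3'
```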

The %s parts of the format string are called conversion specifiers. They mark the places where the values are to be inserted. The s means that the values should be formatted as if they were strings; if they aren’t, they’ll be converted with str. This works with most values. For a list of other specifier types, see Table 3-1 later in the chapter.

Note: To actually include a percent sign in the format string, you must write %% so Python doesn’t mistake it for the beginning of a conversion specifier.
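For instance, a progress message that mixes a conversion specifier with a literal percent sign might look like this (the message text is just an illustration):

```python
progress = 75
# '%d' inserts the integer; '%%' produces one literal '%' character.
message = 'Download %d%% complete' % progress
# message == 'Download 75% complete'
```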

If you are formatting real numbers (floats), you can use the f specifier type and supply the precision as a . (dot), followed by the number of decimals you want to keep. The format specifier always ends with a type character, so you must put the precision before that:

>>> format = "Pi with three decimals: %.3f"
>>> from math import pi
>>> print format % pi
Pi with three decimals: 3.142

TEMPLATE STRINGS

String Formatting: The Long Version

The right operand of the formatting operator may be anything; if it is either a tuple or a mapping (like a dictionary), it is given special treatment. We haven’t looked at mappings (such as dictionaries) yet, so let’s focus on tuples here. We’ll use mappings in formatting in Chapter 4, where they’re discussed in greater detail.

If the right operand is a tuple, each of its elements is formatted separately, and you need a conversion specifier for each of the values.

Note: If you write the tuple to be converted as part of the conversion expression, you must enclose it in parentheses to avoid confusing Python:

>>> '%s plus %s equals %s' % (1, 1, 2)
'1 plus 1 equals 2'
>>> '%s plus %s equals %s' % 1, 1, 2 # Lacks parentheses!
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: not enough arguments for format string
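The same tuple rule has a flip side: if the single value you want to format is itself a tuple, Python will try to spread its items over the specifiers. A small sketch of the problem and the usual fix, wrapping the value in a one-element tuple (the `point` value is arbitrary):

```python
point = (3, 4)

# A bare tuple is spread over the specifiers, so one '%s' cannot
# consume both items and Python raises a TypeError.
try:
    'Point: %s' % point
    spread_ok = True
except TypeError:
    spread_ok = False

# Wrapping the tuple in a one-element tuple makes it a single value.
wrapped = 'Point: %s' % (point,)   # 'Point: (3, 4)'
```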

A basic conversion specifier (as opposed to a full conversion specifier, which may contain a mapping key as well; see Chapter 4 for more information) consists of the items that follow. Note that the order of these items is crucial.

  • The % character: This marks the beginning of the conversion specifier.
  • Conversion flags: These are optional and may be -, indicating left alignment; +, indicating that a sign should precede the converted value; “ ” (a space character), indicating that a space should precede positive numbers; or 0, indicating that the conversion should be zero-padded.
  • The minimum field width: This is also optional and specifies that the converted string will be at least this wide. If this is an * (asterisk), the width will be read from the value tuple.
  • A . (dot) followed by the precision: This is also optional. If a real number is converted, this many decimals should be shown. If a string is converted, this number is the maximum field width. If this is an * (asterisk), the precision will be read from the value tuple.
  • The conversion type: This can be any of the types listed in Table 3-1.
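As a quick check of the ordering above, this sketch builds one specifier that uses every optional part, and then a second one that reads its width and precision from the value tuple with asterisks. The trailing `|` is only there to make the padding visible:

```python
from math import pi

# '%' | flags '-' and '+' | width 8 | precision .2 | type f
full = '%-+8.2f|' % pi             # '+3.14   |'

# Width and precision taken from the tuple, in that order.
starred = '%*.*f|' % (8, 2, pi)    # '    3.14|'
```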

Table 3-1. String Formatting Conversion Types

Conversion Type   Meaning
d, i              Signed integer decimal
o                 Signed octal
u                 Obsolete; identical to d
x                 Signed hexadecimal (lowercase)
X                 Signed hexadecimal (uppercase)
e                 Floating-point exponential format (lowercase)
E                 Floating-point exponential format (uppercase)
f, F              Floating-point decimal format
g                 Same as e if exponent is less than –4 or not less than precision; f otherwise
G                 Same as E if exponent is less than –4 or not less than precision; F otherwise
c                 Single character (accepts an integer or a single character string)
r                 String (converts any Python object using repr)
s                 String (converts any Python object using str)
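To make Table 3-1 more concrete, here is one sample conversion per type, using arbitrary values. The last four lines probe the g rule: with the default precision of 6, g uses exponential notation when the exponent is below -4 or is 6 or more, and decimal notation otherwise.

```python
samples = {
    'd': '%d' % 42,          # '42'
    'o': '%o' % 8,           # '10'
    'x': '%x' % 255,         # 'ff'
    'X': '%X' % 255,         # 'FF'
    'e': '%e' % 12345.678,   # '1.234568e+04'
    'f': '%f' % 12345.678,   # '12345.678000'
    'c': '%c' % 65,          # 'A'
    'r': '%r' % 'spam',      # "'spam'" (repr keeps the quotes)
    's': '%s' % 'spam',      # 'spam'
}

g_small_f = '%g' % 0.0001     # '0.0001'       exponent -4: decimal
g_small_e = '%g' % 0.00001    # '1e-05'        exponent -5: exponential
g_big_f = '%g' % 123456.0     # '123456'       exponent 5 < precision 6
g_big_e = '%g' % 1234567.0    # '1.23457e+06'  exponent 6 >= precision 6
```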

The following sections discuss the various elements of the conversion specifiers in more detail.

Simple Conversion

The simple conversion, with only a conversion type, is really easy to use:

>>> 'Price of eggs: $%d' % 42
'Price of eggs: $42'
>>> 'Hexadecimal price of eggs: %x' % 42
'Hexadecimal price of eggs: 2a'
>>> from math import pi
>>> 'Pi: %f...' % pi
'Pi: 3.141593...'
>>> 'Very inexact estimate of pi: %i' % pi
'Very inexact estimate of pi: 3'
>>> 'Using str: %s' % 42L
'Using str: 42'
>>> 'Using repr: %r' % 42L
'Using repr: 42L'

Width and Precision

A conversion specifier may include a field width and a precision. The width is the minimum number of characters reserved for a formatted value. The precision is (for a numeric conversion) the number of decimals that will be included in the result or (for a string conversion) the maximum number of characters the formatted value may have.

These two parameters are supplied as two integer numbers (width first, then precision), separated by a . (dot). Both are optional, but if you want to supply only the precision, you must also include the dot:

>>> '%10f' % pi       # Field width 10
'  3.141593'
>>> '%10.2f' % pi     # Field width 10, precision 2
'      3.14'
>>> '%.2f' % pi       # Precision 2
'3.14'
>>> '%.5s' % 'Guido van Rossum'
'Guido'

You can use an * (asterisk) as the width or precision (or both). In that case, the number will be read from the tuple argument:

>>> '%.*s' % (5, 'Guido van Rossum')
'Guido'
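The asterisk is most useful when the width is only known at run time. The sketch below, built around a hypothetical helper `right_align`, computes the width of the widest number and passes it through the tuple so that a column of numbers lines up on the right:

```python
def right_align(numbers):
    # Find the widest number once, then feed that width to '*'.
    width = max(len(str(n)) for n in numbers)
    return ['%*d' % (width, n) for n in numbers]

lines = right_align([7, 42, 1234])
# lines == ['   7', '  42', '1234']
```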

Signs, Alignment, and Zero-Padding

Before the width and precision numbers, you may put a “flag,” which may be either zero, plus, minus, or blank. A zero means that the number will be zero-padded:

>>> '%010.2f' % pi
'0000003.14'

It’s important to note here that the leading zero in 010 in the preceding code does not mean that the width specifier is an octal number, as it would in a normal Python number. When you use 010 as the width specifier, it means that the width should be 10 and that the number should be zero-padded, not that the width should be 8:

>>> 010
8
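Zero-padding works for integers as well, which is handy for fixed-width serial numbers. Two details worth seeing: the sign of a negative number counts toward the width, and, per the Python documentation, the - flag overrides 0, so a left-aligned value is padded with spaces. The values are arbitrary:

```python
serial = '%05d' % 42       # '00042'
negative = '%05d' % -42    # '-0042': the sign comes first
left = '%-05d|' % 42       # '42   |': '-' wins over '0'
```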

A minus sign (-) left-aligns the value:

>>> '%-10.2f' % pi
'3.14      '

As you can see, any extra space is put on the right-hand side of the number.

A blank (“ ”) means that a blank should be put in front of positive numbers. This may be useful for aligning positive and negative numbers:

>>> print ('% 5d' % 10) + '\n' + ('% 5d' % -10)
   10
  -10

Finally, a plus (+) means that a sign (either plus or minus) should precede both positive and negative numbers (again, useful for aligning):

>>> print ('%+5d' % 10) + '\n' + ('%+5d' % -10)
  +10
  -10
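With a field width, the blank and plus flags can be hard to tell apart, so this sketch compares them without one. The hypothetical helper `show` formats 10 and -10 with the same specifier and wraps each result in `|` so the padding is visible:

```python
def show(spec, values=(10, -10)):
    # Apply one specifier to each value; '|' marks the field edges.
    return ['|' + spec % v + '|' for v in values]

plain = show('%d')     # ['|10|', '|-10|']
blank = show('% d')    # ['| 10|', '|-10|']   space reserved for the sign
plus = show('%+d')     # ['|+10|', '|-10|']   sign always shown
left = show('%-5d')    # ['|10   |', '|-10  |']
```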

In the example shown in Listing 3-1, I use the asterisk width specifier to format a table of fruit prices, where the user enters the total width of the table. Because this information is supplied by the user, I can’t hard-code the field widths in my conversion specifiers. By using the asterisk, I can have the field width read from the converted tuple.

Listing 3-1. String Formatting Example

# Print a formatted price list with a given width

width = input('Please enter width: ')

price_width = 10
item_width = width - price_width

header_format = '%-*s%*s'
format       = '%-*s%*.2f'

print '=' * width

print header_format % (item_width, 'Item', price_width, 'Price')

print '-' * width

print format % (item_width, 'Apples', price_width, 0.4)
print format % (item_width, 'Pears', price_width, 0.5)
print format % (item_width, 'Cantaloupes', price_width, 1.92)
print format % (item_width, 'Dried Apricots (16 oz.)', price_width, 8)
print format % (item_width, 'Prunes (4 lbs.)', price_width, 12)

print '=' * width

The following is a sample run of the program:

Please enter width: 35
===================================
Item                          Price
-----------------------------------
Apples                         0.40
Pears                          0.50
Cantaloupes                    1.92
Dried Apricots (16 oz.)        8.00
Prunes (4 lbs.)               12.00
===================================
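If you want to reuse the layout of Listing 3-1 elsewhere, one option is to wrap it in a function that takes the width as a parameter and returns the lines instead of printing them. The following is a sketch of that idea, not part of the original listing. The name `price_list` and the `(name, price)` item format are assumptions, and names longer than the item column will push the line past the requested width:

```python
def price_list(width, items, price_width=10):
    # Same format strings as Listing 3-1; the widths come from the tuple.
    item_width = width - price_width
    header_format = '%-*s%*s'
    row_format = '%-*s%*.2f'
    lines = ['=' * width,
             header_format % (item_width, 'Item', price_width, 'Price'),
             '-' * width]
    for name, price in items:
        lines.append(row_format % (item_width, name, price_width, price))
    lines.append('=' * width)
    return lines

rows = price_list(35, [('Apples', 0.4), ('Prunes (4 lbs.)', 12)])
```

Each element of `rows` can then be printed, written to a file, or checked in a test.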

String Methods

You have already encountered list methods. Strings have a much richer set of methods, in part because strings have “inherited” many of their methods from the string module where they resided as functions in earlier versions of Python (and where you may still find them, if you feel the need).

Because there are so many string methods, only some of the most useful ones are described here. For a full reference, see Appendix B. In the description of the string methods, you will find references to other, related string methods in this chapter (marked “See also”) or in Appendix B.

BUT STRING ISN’T DEAD

find

The find method finds a substring within a larger string. It returns the leftmost index where the substring is found. If it is not found, –1 is returned:

>>> 'With a moo-moo here, and a moo-moo there'.find('moo')
7
>>> title = "Monty Python's Flying Circus"
>>> title.find('Monty')
0
>>> title.find('Python')
6
>>> title.find('Flying')
15
>>> title.find('Zirquss')
-1

In our first encounter with membership in Chapter 2, we created part of a spam filter by using the expression '$$$' in subject. We could also have used find (which would also have worked prior to Python 2.3, when in could be used only when checking for single character membership in strings):

>>> subject = '$$$ Get rich now!!! $$$'
>>> subject.find('$$$')
0

Note: The string method find does not return a Boolean value. If find returns 0, as it did here, it means that it has found the substring, at index zero.
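This matters as soon as you use find in a condition. Because 0 is a false value and -1 is a true one, testing the return value directly gets both cases backwards. A short sketch using the spam-filter subject from above:

```python
subject = '$$$ Get rich now!!! $$$'

# Wrong: a match at index 0 looks false.
naive = bool(subject.find('$$$'))        # False, despite the match

# Right: compare against -1 explicitly...
found = subject.find('$$$') != -1        # True
missing = subject.find('free') != -1     # False

# ...or, for a plain yes/no test, use the in operator.
found_in = '$$$' in subject              # True
```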

You may also supply a starting point for your search and, optionally, an ending point:

>>> subject = '$$$ Get rich now!!! $$$'
>>> subject.find('$$$')
0
>>> subject.find('$$$', 1) # Only supplying the start
20
>>> subject.find('!!!')
16
>>> subject.find('!!!', 0, 16) # Supplying start and end
-1

Note that the range specified by the start and stop values (second and third parameters) includes the first index but not the second. This is common practice in Python.
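The start argument is what lets you find every occurrence rather than just the first: after each match, search again starting one position further on. A minimal sketch, where the function name `find_all` is hypothetical and overlapping matches are included:

```python
def find_all(text, sub):
    # Collect every index where sub starts, resuming the search one
    # position after each match.
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        start = text.find(sub, start + 1)
    return positions

moos = find_all('With a moo-moo here, and a moo-moo there', 'moo')
# moos == [7, 11, 27, 31]
```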

In Appendix B: rfind, index, rindex, count, startswith, endswith.

join

A very important string method, join is the inverse of split. It is used to join the elements of a sequence:

>>> seq = [1, 2, 3, 4, 5]
>>> sep = '+'
>>> sep.join(seq) # Trying to join a list of numbers
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: sequence item 0: expected string, int found
>>> seq = ['1', '2', '3', '4', '5']
>>> sep.join(seq) # Joining a list of strings
'1+2+3+4+5'
>>> dirs = '', 'usr', 'bin', 'env'
>>> '/'.join(dirs)
'/usr/bin/env'
>>> print 'C:' + '\\'.join(dirs)
C:\usr\bin\env

As you can see, the sequence elements that are to be joined must all be strings. Note how in the last two examples I use a sequence of directories and format them according to the conventions of UNIX and DOS/Windows simply by using a different separator (and adding a drive name in the DOS version).
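To join numbers, then, convert each element to a string first. One way is a list comprehension over str; splitting the result on the same separator gives back the string pieces:

```python
numbers = [1, 2, 3, 4, 5]
# join accepts only strings, so convert each number first.
expr = '+'.join([str(n) for n in numbers])   # '1+2+3+4+5'
pieces = expr.split('+')                      # ['1', '2', '3', '4', '5']
```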

See also: split.

lower

The lower method returns a lowercase version of the string:

>>> 'Trondheim Hammer Dance'.lower()
'trondheim hammer dance'

This can be useful if you want to write code that is case insensitive—that is, code that ignores the difference between uppercase and lowercase letters. For instance, suppose you want to check whether a user name is found in a list. If your list contains the string 'gumby' and the user enters his name as 'Gumby', you won’t find it:

>>> if 'Gumby' in ['gumby', 'smith', 'jones']: print 'Found it!'
...
>>>

Of course, the same thing will happen if you have stored 'Gumby' and the user writes 'gumby', or even 'GUMBY'. A solution to this is to convert all names to lowercase both when storing and searching. The code would look something like this:

>>> name = 'Gumby'
>>> names = ['gumby', 'smith', 'jones']
>>> if name.lower() in names: print 'Found it!'
...
Found it!
>>>

See also: translate.

In Appendix B: islower, capitalize, swapcase, title, istitle, upper, isupper.

TITLE CASING

replace

The replace method returns a string where all the occurrences of one string have been replaced by another:

>>> 'This is a test'.replace('is', 'eez')
'Theez eez a test'

If you have ever used the “search and replace” feature of a word processing program, you will no doubt see the usefulness of this method.
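Note that the example above also changed the "is" inside "This". The replace method takes an optional third argument that limits how many occurrences are replaced, counting from the left:

```python
text = 'This is a test'
everywhere = text.replace('is', 'eez')      # 'Theez eez a test'
first_only = text.replace('is', 'eez', 1)   # 'Theez is a test'
# text itself is unchanged; replace always returns a new string.
```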

See also: translate.

In Appendix B: expandtabs.

split

A very important string method, split is the inverse of join, and is used to split a string into a sequence:

>>> '1+2+3+4+5'.split('+')
['1', '2', '3', '4', '5']
>>> '/usr/bin/env'.split('/')
['', 'usr', 'bin', 'env']
>>> 'Using  the  default'.split()
['Using', 'the', 'default']

Note that if no separator is supplied, the default is to split on all runs of consecutive whitespace characters (spaces, tabs, newlines, and so on).
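The difference between an explicit space separator and the default is easiest to see on input with repeated or surrounding spaces. split also accepts an optional maximum number of splits. The sample strings are arbitrary:

```python
# An explicit separator splits at every occurrence, so two adjacent
# spaces produce an empty string between them.
explicit = 'a  b'.split(' ')               # ['a', '', 'b']

# The default treats each run of whitespace as one separator and
# ignores leading and trailing whitespace.
default = '  a  b  '.split()               # ['a', 'b']

# The optional maxsplit argument limits the number of splits.
limited = 'key=value=more'.split('=', 1)   # ['key', 'value=more']
```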

See also: join.

In Appendix B: rsplit, splitlines.

strip

The strip method returns a string where whitespace on the left and right (but not internally) has been stripped (removed):

>>> '      internal whitespace is kept      '.strip()
'internal whitespace is kept'

As with lower, strip can be useful when comparing input to stored values. Let’s return to the user name example from the section on lower, and let’s say that the user inadvertently types a space after his name:

>>> names = ['gumby', 'smith', 'jones']
>>> name = 'gumby '
>>> if name in names: print 'Found it!'
...
>>> if name.strip() in names: print 'Found it!'
...
Found it!
>>>

You can also specify which characters are to be stripped, by listing them all in a string parameter:

>>> '*** SPAM * for * everyone!!! ***'.strip(' *!')
'SPAM * for * everyone'

Stripping is performed only at the ends, so the internal asterisks are not removed.
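The strip and lower techniques from the user name example combine naturally into one normalizing step, applied both when storing names and when looking them up. A sketch with hypothetical helper names `normalize` and `is_known`:

```python
def normalize(name):
    # Remove surrounding whitespace, then ignore case.
    return name.strip().lower()

stored = ['gumby', 'smith', 'jones']
names = set(normalize(n) for n in stored)

def is_known(user_input):
    return normalize(user_input) in names

# is_known('  GUMBY ') is True; is_known('Gumbie') is False
```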

In Appendix B: lstrip, rstrip.

translate

Similar to replace, translate replaces parts of a string, but unlike replace, translate works only with single characters. Its strength lies in that it can perform several replacements simultaneously, and can do so more efficiently than replace.

There are quite a few rather technical uses for this method (such as translating newline characters or other platform-dependent special characters), but let’s consider a simpler (although slightly more silly) example. Let’s say you want to translate a plain English text into one with a German accent. To do this, you must replace the character c with k, and s with z.

Before you can use translate, however, you must make a translation table. This translation table is a full listing of which characters should be replaced by which. Because this table (which is actually just a string) has 256 entries, you won’t write it out yourself. Instead, you’ll use the function maketrans from the string module.

The maketrans function takes two arguments: two strings of equal length, indicating that each character in the first string should be replaced by the character in the same position in the second string. Got that? In the case of our simple example, the code would look like the following:

>>> from string import maketrans
>>> table = maketrans('cs', 'kz')

WHAT’S IN A TRANSLATION TABLE?

Once you have this table, you can use it as an argument to the translate method, thereby translating your string:

>>> 'this is an incredible test'.translate(table)
'thiz iz an inkredible tezt'

An optional second argument can be supplied to translate, specifying letters that should be deleted. If you wanted to emulate a really fast-talking German, for instance, you could delete all the spaces:

>>> 'this is an incredible test'.translate(table, ' ')
'thizizaninkredibletezt'
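The examples above use the Python 2 API. Under Python 3, the counterpart for text strings is the static method str.maketrans, which takes the characters to delete as an optional third argument; translate itself then takes only the table. A sketch of the same German-accent example, assuming Python 3:

```python
# Python 3 only: str.maketrans instead of string.maketrans.
table = str.maketrans('cs', 'kz')
accented = 'this is an incredible test'.translate(table)
# 'thiz iz an inkredible tezt'

# The third argument lists characters to delete.
fast_table = str.maketrans('cs', 'kz', ' ')
fast = 'this is an incredible test'.translate(fast_table)
# 'thizizaninkredibletezt'
```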

See also: replace, lower.

PROBLEMS WITH NON-ENGLISH STRINGS

A Quick Summary

In this chapter, you have seen two important ways of working with strings:

String formatting: The modulo operator (%) can be used to splice values into a string that contains conversion specifiers, such as %s. You can use this to format values in many ways, including right or left justification, setting a specific field width and precision, adding a sign (plus or minus), or left-padding with zeros.

String methods: Strings have a plethora of methods. Some of them are extremely useful (such as split and join), while others are used less often (such as istitle or capitalize).

New Functions in This Chapter

Function                      Description
string.capwords(s[, sep])     Splits s with split (using sep), capitalizes the items, and joins them with sep (a single space if sep is omitted)
string.maketrans(from, to)    Makes a translation table for translate

What Now?

Lists, strings, and dictionaries are three of the most important data types in Python. You’ve seen lists and strings, so guess what’s next? In the next chapter, you see how dictionaries support not only integer indices but other kinds of keys (such as strings or tuples) as well. Dictionaries also support a few methods, although not as many as strings.

1. For more information, see Section 4.1.2, “Template strings,” of the Python Library Reference (http://python.org/doc/lib/node40.html).

2. For a more thorough description of the module, check out Section 4.1 of the Python Library Reference (http://python.org/doc/lib/module-string.html).

3. In Python 3.0, string.letters and friends will be removed. You will need to use constants like string.ascii_letters instead.
