Chapter 2. Manipulating Strings

One of the most common and important functions of the Python language is to process and manipulate large amounts of text when implementing scripts, parsing XML/HTML, and interfacing with databases. For that reason, Python includes extremely dynamic and powerful string manipulation methods.

The phrases in this chapter are intended to give you a quick start in manipulating strings with Python. Although this chapter is not comprehensive, it covers both the most commonly used functionality, such as string comparison, searching, and formatting, and some of the more powerful and dynamic functionality, such as using strings as executable code, interpolating variables in strings, and evaluating strings as Python expressions.

Comparing Strings

Example . 

if cmpStr.upper() == upperStr.upper():
    print upperStr + " Matches " + cmpStr

Comparing strings in Python is best accomplished using a simple logical operation. For example, to determine whether a string matches another string exactly, use the equality operator, ==. You can also use other comparison operators such as >= or < to determine a sort order for several strings.

Python provides several methods for string objects that help when comparing. The most commonly used are the upper() and lower() methods, which return a new string that is all upper- or lowercase, respectively.

Another useful method is capitalize(), which returns a new string with the first letter capitalized and the remaining letters lowercased. There is also a swapcase() method, which returns a new string with the case of each character inverted.

cmpStr = "abc"
upperStr = "ABC"
lowerStr = "abc"

print "Case Sensitive Compare"
if cmpStr == lowerStr:
    print lowerStr + " Matches " + cmpStr

if cmpStr == upperStr:
    print upperStr + " Matches " + cmpStr

print "\nCase In-Sensitive Compare"
if cmpStr.upper() == lowerStr.upper():
    print lowerStr + " Matches " + cmpStr

if cmpStr.upper() == upperStr.upper():
    print upperStr + " Matches " + cmpStr

comp_str.py

Case Sensitive Compare
abc Matches abc

Case In-Sensitive Compare
abc Matches abc
ABC Matches abc

Output from comp_str.py code
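
The comparisons above can be wrapped in a small helper. The following is a minimal sketch, assuming Python 3 (where print is a function, unlike the Python 2 listings in this chapter). The helper name same_ignoring_case and the sample words are hypothetical and chosen only for illustration.

```python
def same_ignoring_case(a, b):
    """Return True when a and b match after normalizing case."""
    return a.lower() == b.lower()

words = ["pear", "Banana", "apple"]

# A plain comparison orders uppercase letters before lowercase ones.
plain_order = sorted(words)

# Sorting on the lowercased form ignores case differences.
folded_order = sorted(words, key=str.lower)

capitalized = "hello World".capitalize()
swapped = "Hello World".swapcase()

print(same_ignoring_case("abc", "ABC"))
print(plain_order)
print(folded_order)
print(capitalized)
print(swapped)
```

Normalizing both sides with the same method (lower or upper) is what matters; which of the two you pick makes no difference for plain ASCII text.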

Joining Strings

Example . 

print "Words:" + word1 + word2 + word3 + word4
print "List: " + ' '.join(wordList)

Strings can be joined together using a simple add operation, formatting the strings together or using the join() method. Using either the + or += operation is the simplest method to implement and start off with. The two strings are simply appended to each other.

Formatting strings together is accomplished by defining a new string with string format codes, %s, and then adding additional strings as parameters to fill in each string format code. This can be extremely useful, especially when the strings need to be joined in a complex format.

The fastest way to join a list of strings is to use the join(wordList) method. The join method can be a little tricky at first because it is called on the separator string rather than on the list: each string in the list, starting with the first, is appended to the result in order, and the separator is inserted between adjacent items. This becomes extremely useful if you want to add spaces between the words in the list, because you simply define a string as a single space and then call the join method on that string:

word1 = "A"
word2 = "few"
word3 = "good"
word4 = "words"
wordList = ["A", "few", "more", "good", "words"]

#simple Join
print "Words:" + word1 + word2 + word3 + word4
print "List: " + ' '.join(wordList)

#Formatted String
sentence = ("First: %s %s %s %s." %
(word1,word2,word3,word4))
print sentence

#Joining a list of words
sentence = "Second:"
for word in wordList:
    sentence += " " + word
sentence += "."
print sentence

join_str.py

Words:Afewgoodwords
List: A few more good words
First: A few good words.
Second: A few more good words.

Output from join_str.py code
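
To make the role of the separator concrete, here is a short sketch, assuming Python 3. The variable names are illustrative only. It shows that the string join is called on ends up between the items, and that non-string items must be converted before joining.

```python
words = ["A", "few", "more", "good", "words"]

# The string that join() is called on is placed between the items.
spaced = " ".join(words)
csv_row = ",".join(words)

# An empty separator simply concatenates the items.
packed = "".join(words)

# join() accepts only strings, so convert other types first.
numbers = [1, 2, 3]
number_row = "-".join(str(n) for n in numbers)

print(spaced)
print(csv_row)
print(packed)
print(number_row)
```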

Splitting Strings

Example . 

print sentence.split()
print entry.split(':')
print paragraph.splitlines(1)

The split(separator) and splitlines(keeplineends) methods are provided by Python to split strings into substrings. The split method searches a string, splits it at each occurrence of the separator string, and returns the pieces as a list of strings. If no separator is specified, the split method splits the string at each run of whitespace characters (space, tab, newline, and so on).

The splitlines method splits the string at each newline character into a list of strings. This can be extremely useful when you are parsing a large amount of text. The splitlines method accepts one argument that is a Boolean true or false to determine whether the newline character should be kept.

sentence = "A Simple Sentence."

paragraph = "This is a simple paragraph.\n\
It is made up of multiple\n\
lines of text."

entry = "Name:Brad Dayley:Occupation:Software Engineer"

print sentence.split()
print entry.split(':')
print paragraph.splitlines(1)

split_str.py

['A', 'Simple', 'Sentence.']
['Name', 'Brad Dayley', 'Occupation', 'Software Engineer']
['This is a simple paragraph.\n', 'It is made up of multiple\n', 'lines of text.']

Output from split_str.py code
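
The following sketch, assuming Python 3, builds on the entry string from the listing above. It also uses the optional second argument of split, which limits the number of splits and is not covered in the text. The names record, first, and rest are hypothetical.

```python
entry = "Name:Brad Dayley:Occupation:Software Engineer"

# Split into fields, then pair alternating keys and values.
fields = entry.split(":")
record = dict(zip(fields[0::2], fields[1::2]))

# A second argument limits the number of splits performed.
first, rest = entry.split(":", 1)

text = "line one\nline two\n"
without_ends = text.splitlines()      # newline characters dropped
with_ends = text.splitlines(True)     # newline characters kept

# With no separator, a run of whitespace counts as one delimiter.
tokens = "  a \t b\n c ".split()

print(record)
print(first, "|", rest)
print(without_ends)
print(with_ends)
print(tokens)
```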

Searching Strings for Substrings

Example . 

print searchStr.find("Red")
print searchStr.rfind("Blue")
print searchStr.index("Blue")
print searchStr.index("Blue",8)

The two most common ways to search for a substring contained inside another string are the find(sub [, start [, end]]) and index(sub [, start [, end]]) methods.

Both methods return the index of the first occurrence of the substring. They differ in how they report failure: if the substring is not found, the index method raises a ValueError exception, whereas the find method returns -1. The find and index methods accept a search string as the first argument. The area of the string that is searched can be limited by specifying the optional start and/or end index. Only characters within those indexes will be searched.

Python also provides the rfind and rindex methods. These methods work in a similar manner as the find and index methods; however, they look for the right-most occurrence of the substring.

searchStr = "Red Blue Violet Green Blue Yellow Black"

print searchStr.find("Red")
print searchStr.rfind("Blue")
print searchStr.find("Blue")
print searchStr.find("Teal")
print searchStr.index("Blue")
print searchStr.index("Blue",20)
print searchStr.rindex("Blue")
print searchStr.rindex("Blue",1,18)

search_str.py

0
22
4
-1
4
22
22
4

Output from search_str.py code
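
Because find accepts a start index, it can be called repeatedly to locate every occurrence of a substring. The sketch below assumes Python 3. The helper find_all is hypothetical and is not part of the string type. The sketch also shows the different failure behavior of find and index.

```python
def find_all(text, sub):
    """Return the start index of every non-overlapping occurrence of sub."""
    if not sub:
        return []  # an empty substring would otherwise never advance
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume the search just past the match that was found.
        start = text.find(sub, start + len(sub))
    return positions

search_str = "Red Blue Violet Green Blue Yellow Black"
blue_positions = find_all(search_str, "Blue")

# index() raises ValueError where find() would return -1.
try:
    search_str.index("Teal")
    teal_found = True
except ValueError:
    teal_found = False

print(blue_positions)
print(teal_found)
```

Use find when a missing substring is an ordinary outcome that you test for, and index when a missing substring is an error that should interrupt the program.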

Search and Replace in Strings

Example . 

question2 = question.replace("swallow", 
    "European swallow")
question3 = question.replace("swallow", 
    "African swallow")

The native string type in Python provides a replace(old, new [, maxreplace]) method to replace a specific substring with new text. The replace method accepts a search string as the first argument and a replacement string as the second argument. It returns a new string in which each occurrence of the search string is replaced with the new string; the original string is left unchanged. Optionally, you can specify a maximum number of replacements to perform as the third argument.

question = ("What is the air speed velocity of "
    "an unladen swallow?")
print question
question2 = question.replace("swallow",
    "European swallow")
print question2
question3 = question.replace("swallow",
    "African swallow")
print question3

replace_str.py

What is the air speed velocity of an unladen swallow?
What is the air speed velocity of an unladen European swallow?
What is the air speed velocity of an unladen African swallow?

Output from replace_str.py code
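
The optional third argument is easiest to see with a string that contains the search text more than once. This is a minimal sketch, assuming Python 3, with sample text invented for the illustration.

```python
text = "spam, spam, spam and eggs"

# Without a limit, every occurrence is replaced.
all_replaced = text.replace("spam", "ham")

# A third argument of 1 replaces only the first occurrence.
first_only = text.replace("spam", "ham", 1)

print(all_replaced)
print(first_only)
print(text)  # the original string is not modified
```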

Searching Strings for Specific Endings/Beginnings

Example . 

if f.endswith('.py'):
    print "Python file: " + f
elif f.endswith('.txt'):
    print "Text file: " + f

The endswith(suffix [, start [, end]]) and startswith(prefix [, start [, end]]) methods provide a simple and safe way to determine whether a string ends with a specific suffix or begins with a specific prefix, respectively. The first argument is the string to compare against the end or beginning of the string. Both methods also let you limit the comparison to a specific range of the string using the optional start and/or end arguments.

Note

The endswith and startswith methods are extremely useful when parsing file lists for extensions or filenames.

import os

for f in os.listdir('C:\\txtfiles'):
    if f.endswith('.py'):
        print "Python file: " + f
    elif f.endswith('.txt'):
        print "Text file: " + f

end_str.py

Python file: comp_str.py
Python file: end_str.py
Python file: eval_str.py
Python file: join_str.py
Text file: output.txt
Python file: replace_str.py
Python file: search_str.py
Python file: split_str.py
Python file: trim_str.py
Python file: unicode_str.py
Python file: var_str.py

Output from end_str.py code
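
The same checks can be tried without touching the file system. The sketch below assumes Python 3 and uses a hypothetical helper, classify, with a hard-coded list of names. It also passes a tuple of suffixes to endswith, which tests several endings in one call, and uses the optional start argument of startswith.

```python
def classify(filename):
    """Label a file name by its extension."""
    if filename.endswith(".py"):
        return "Python file"
    # A tuple of suffixes matches if any one of them matches.
    if filename.endswith((".txt", ".log")):
        return "Text file"
    return "Other"

names = ["comp_str.py", "output.txt", "README"]
labels = [classify(name) for name in names]

# The start argument moves the comparison to an offset in the string.
has_str_at_5 = "comp_str.py".startswith("str", 5)

for name, label in zip(names, labels):
    print(label + ": " + name)
print(has_str_at_5)
```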

Trimming Strings

Example . 

str(len(badSentence.rstrip(' ')))
print badSentence.lstrip('\t')
print badParagraph.strip(' ?!\t')

A common problem when parsing text is leftover characters at the beginning or end of a string. Python provides several strip methods to remove those characters. The strip([chrs]), lstrip([chrs]), and rstrip([chrs]) methods accept a string of characters to remove as their only argument and return a new string with those characters trimmed from both ends, the start, or the end of the string, respectively. If the argument is omitted, whitespace is removed.

Note

The strip method removes the specified characters from both the beginning and end of the string. The lstrip and rstrip methods remove the characters only from the beginning or end of the string, respectively.

badSentence = "\t\tThis sentence has problems.   "

badParagraph = ("\t\tThis paragraph \nhas even "
    "more \nproblems.!?   ")

#Strip trailing spaces
print "Length = " + str(len(badSentence))
print "Without trailing spaces = " + \
    str(len(badSentence.rstrip(' ')))

#Strip tabs
print "\nBad:\n" + badSentence
print "\nFixed:\n" + badSentence.lstrip('\t')

#Strip leading and trailing characters
print "\nBad:\n" + badParagraph
print "\nFixed:\n" + badParagraph.strip(' ?!\t')

trim_str.py

Length = 32
Without trailing spaces = 29

Bad:
               This sentence has problems.

Fixed:
This sentence has problems.

Bad:
               This paragraph
has even more
problems.!?

Fixed:
This paragraph
has even more
problems.

Output from trim_str.py code
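
A point that is easy to miss is that the argument to the strip methods is treated as a set of characters, not as a prefix or suffix to match. The following sketch, assuming Python 3 and using made-up sample strings, illustrates this along with the three variants.

```python
raw = "\t\tThis sentence has problems.   "

both = raw.strip()            # no argument: trims whitespace at both ends
left_only = raw.lstrip("\t")  # trims leading tabs only
right_only = raw.rstrip(" ")  # trims trailing spaces only

# Any mix of the listed characters is removed from each end,
# stopping at the first character that is not in the set.
trimmed = "xyxHello worldyx".strip("xy")

print(repr(both))
print(repr(left_only))
print(repr(right_only))
print(repr(trimmed))
```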

Aligning/Formatting Strings

Example . 

print "Chapter " + str(x) + \
    str(chapters[x]).rjust(15,'.')
print "\nHex String: " + hexStr.upper().ljust(8,'0')
print "Chapter %d %15s" % (x,str(chapters[x]))

One of the biggest advantages of the Python language is its capability to process and manipulate strings quickly and effectively. The native string type implements the rjust(width [, fill]) and ljust(width [, fill]) methods, which return the text of a string justified to the right or left, respectively, within a field of the specified width. The optional fill argument specifies the character used to fill the space created by the justification; if it is omitted, spaces are used.

Another extremely useful part of Python’s string management is the capability to create complex string formatting on the fly by creating a format string and passing arguments to that string using the % operator. This results in a new formatted string that can be used in a string assignment, passed as an argument, or used in a print statement.

chapters = {1:5, 2:46, 3:52, 4:87, 5:90}
hexStr = "3f8"

#Right justify
print "Hex String: " + hexStr.upper().rjust(8,'0')
print
for x in chapters:
    print "Chapter " + str(x) + \
        str(chapters[x]).rjust(15,'.')


#Left justify
print "\nHex String: " + hexStr.upper().ljust(8,'0')

#String format
print
for x in chapters:
    print "Chapter %d %15s" % (x,str(chapters[x]))

format_str.py

Hex String: 000003F8

Chapter 1..............5
Chapter 2.............46
Chapter 3.............52
Chapter 4.............87
Chapter 5.............90

Hex String: 3F800000

Chapter 1               5
Chapter 2              46
Chapter 3              52
Chapter 4              87
Chapter 5              90

Output from format_str.py code
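
The sketch below, assuming Python 3, collects the justified and formatted strings into variables so that the padding can be inspected. It uses a shortened version of the chapters dictionary from the listing above; the name toc_lines is hypothetical.

```python
chapters = {1: 5, 2: 46, 3: 52}

# Right-justify each page number in a 15-character field filled with dots.
toc_lines = []
for number in sorted(chapters):
    page = str(chapters[number])
    toc_lines.append("Chapter " + str(number) + page.rjust(15, "."))

# Pad a hex string with zeros on the left or on the right.
hex_right = "3f8".upper().rjust(8, "0")
hex_left = "3f8".upper().ljust(8, "0")

# %15s right-justifies its argument in a 15-character field of spaces.
formatted = "Chapter %d %15s" % (1, "5")

for line in toc_lines:
    print(line)
print(hex_right)
print(hex_left)
print(formatted)
```

If the text is already as wide as the requested width or wider, rjust and ljust return it unchanged rather than truncating it.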

Executing Code Inside Strings

Example . 

codeStr = "for card in cards: \
    print \"Card = \" + card"
exec(codeStr)

One of the most dynamic features of Python is the capability to evaluate a string that contains code and execute the code locally. exec(str [,globals [,locals]]) executes the Python code contained in the str string. Unlike eval, it does not return a result. Local and global variables can be added to the environment used to execute the code by specifying global and/or local dictionaries containing the corresponding variable names and values.

The eval(str [,globals [,locals]]) function works in a similar manner as the exec function except that it only evaluates the string as a Python expression and returns the results.

cards = ['Ace', 'King', 'Queen', 'Jack']
codeStr = "for card in cards: \
    print \"Card = \" + card"
areaStr = "pi*(radius*radius)"

#Execute string
exec(codeStr)

#Evaluate string
print "\nArea = " + str(eval(areaStr,
    {"pi":3.14}, {"radius":5}))

eval_str.py

Card = Ace
Card = King
Card = Queen
Card = Jack

Area = 78.5

Output from eval_str.py code
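
The globals and locals dictionaries are also the way to get values back out of executed code. The following is a minimal sketch, assuming Python 3, where exec is a built-in function. The names code_str and namespace are hypothetical.

```python
# exec runs statements; names they assign end up in the namespace dict.
code_str = "total = sum(values)"
namespace = {"values": [1, 2, 3]}
exec(code_str, namespace)
total = namespace["total"]

# eval evaluates a single expression and returns its value.
area_str = "pi * (radius * radius)"
area = eval(area_str, {"pi": 3.14}, {"radius": 5})

print(total)
print(area)
```

Passing an explicit dictionary keeps the executed code from reading or changing the caller's own variables. Strings from untrusted sources should not be passed to exec or eval, because they can run arbitrary code.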

Interpolating Variables Inside Strings

Example . 

s = string.Template("Variable v = $v")
for x in values:
   print s.substitute(v=x)

Python provides the capability to interpolate variables inside strings. This functionality provides the ability to create string templates and then apply variable values to them based on the state of an existing variable.

Interpolating variables is accomplished in two steps. The first step is to create a string template by passing a string to the Template class of the string module. The string includes the formatted text and properly placed variable names preceded by the $ character.

Note

To include a $ character in your template string use a double $$ set. The $$ will be replaced with a single $ when the template is applied.

Once the template has been created, the second step is to apply variable values to the template using the substitute(m [, kwargs]) method of the Template class. The values can be supplied as a dictionary m that maps variable names to values, as keyword arguments, or both.

import string

values = [5, 3, 'blue', 'red']
s = string.Template("Variable v = $v")

for x in values:
    print s.substitute(v=x)

var_str.py

Variable v = 5
Variable v = 3
Variable v = blue
Variable v = red

Output from var_str.py code
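
The sketch below, assuming Python 3, shows both ways of supplying values and the $$ escape described in the note. It also uses safe_substitute, a Template method not covered in the text, which leaves unknown placeholders in place instead of raising an exception. The template text and names are invented for the illustration.

```python
import string

# "$$" produces a literal dollar sign; "$amount" is a placeholder.
bill = string.Template("$name owes $$$amount")
by_keyword = bill.substitute(name="Ann", amount=5)
by_mapping = bill.substitute({"name": "Bob", "amount": 7})

greeting = string.Template("Hello $first $last")

# substitute raises KeyError when a placeholder has no value.
try:
    greeting.substitute(first="Cy")
    missing_raised = False
except KeyError:
    missing_raised = True

# safe_substitute leaves the unmatched placeholder untouched.
partial = greeting.safe_substitute(first="Cy")

print(by_keyword)
print(by_mapping)
print(missing_raised)
print(partial)
```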

Converting Unicode to Local Strings

Example . 

print uniStr.encode('utf-8')
print uniStr.encode('utf-16')
print uniStr.encode('iso-8859-1')
asciiStr = asciiStr.translate(
    string.maketrans('\xF1','n'), '')
print asciiStr.encode('ascii')

The Python language provides a simple encode(encoding) method to convert unicode strings to local (byte) strings for easier processing. The encode method takes the name of an encoding, such as utf-8, utf-16, iso-8859-1, or ascii, as its argument and returns a string encoded in that format.

Strings can be converted to unicode by several different methods. One is to define the string as unicode by prefixing it with a u when assigning it to a variable. Another is to combine a unicode string with another string. The resulting string will be unicode. You can also use the decode(encoding) method to decode the string. The decode method returns a unicode form of the string.

Note

The ASCII encoding allows only for characters with code points below 128 (0 through 127). If your string includes characters that are above that range, you will need to translate those characters before encoding the string to ASCII.

import string

locStr = "El "
uniStr = u"Ni\u00F1o"

print uniStr.encode('utf-8')
print uniStr.encode('utf-16')
print uniStr.encode('iso-8859-1')

#Combine local and unicode results
#in new unicode string
newStr = locStr+uniStr
print newStr.encode('iso-8859-1')


#ascii will error because character '\xF1'
#is out of range
asciiStr = newStr.encode('iso-8859-1')
asciiStr = asciiStr.translate(
    string.maketrans('\xF1','n'), '')
print asciiStr.encode('ascii')
print newStr.encode('ascii')

unicode_str.py

Niño
ÿþN|I|ñ|o
Niño
El Niño
El Nino
Traceback (most recent call last):
  File "C:\books\python\CH2\code\unicode_str.py",
line 19, in ?
    print newStr.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode
 character u'\xf1' in position 5: ordinal not in
 range(128)

Output from unicode_str.py code
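
The listing above is written for Python 2, where str and unicode are separate types. For comparison, here is a sketch of the same steps assuming Python 3, where every str is Unicode and encode returns a bytes object. The dictionary form of translate and the "replace" error handler are Python 3 details not covered in the text.

```python
uni_str = "Ni\u00f1o"

utf8_bytes = uni_str.encode("utf-8")
latin1_bytes = uni_str.encode("iso-8859-1")

# decode turns the bytes back into a Unicode string.
round_trip = utf8_bytes.decode("utf-8")

# Encoding to ASCII fails because code point 0xF1 is above 127.
try:
    uni_str.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Translate the offending character first, then encode.
ascii_bytes = uni_str.translate({0xF1: "n"}).encode("ascii")

# Alternatively, ask encode to substitute "?" for unencodable characters.
replaced = uni_str.encode("ascii", "replace")

print(utf8_bytes)
print(latin1_bytes)
print(round_trip == uni_str)
print(ascii_ok)
print(ascii_bytes)
print(replaced)
```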
