© Magnus Lie Hetland 2017

Magnus Lie Hetland, Beginning Python, 10.1007/978-1-4842-0028-5_1

1. Instant Hacking: The Basics

Magnus Lie Hetland

(1)Trondheim, Norway

It’s time to start hacking.1 In this chapter, you learn how to take control of your computer by speaking a language it understands: Python. Nothing here is particularly difficult, so if you know the basic principles of how your computer works, you should be able to follow the examples and try them out yourself. I’ll go through the basics, starting with the excruciatingly simple, but because Python is such a powerful language, you’ll soon be able to do pretty advanced things.

To begin, you need to install Python, or verify that you already have it installed. If you’re running macOS or Linux/UNIX, open a terminal (the Terminal app on a Mac), type in python, and press Enter. You should get a welcome message, ending with the following prompt:

>>>

If you do, you can start entering Python commands immediately. Note, however, that you may have an old version of Python. If the first line starts with Python 2 rather than Python 3, you might want to install a newer version anyway, as Python 3 introduces several breaking changes.

The details of the installation process will of course vary with your OS and preferred installation mechanism, but the most straightforward approach is to visit www.python.org, where you should find a link to a download page. It is all pretty self-explanatory: just follow the link to the most recent version for your platform, be it Windows, macOS, Linux/UNIX, or something else. For Windows and Mac, you’ll download an installer that you can run to actually install Python. For Linux/UNIX, there are source code tarballs that you’ll need to compile yourself, by following the included instructions. If you’re using a package manager such as Homebrew or APT, you can use that to streamline the process.

Once you have Python installed, try to fire up the interactive interpreter. If you’re using the command line, you could simply use the python command, or perhaps python3 if you have an older version installed as well. If you’d rather use a graphical interface, you can start the IDLE app that comes with the Python installation.

The Interactive Interpreter

When you start up Python, you get a prompt similar to the following:

Python 3.5.0 (default, Dec 5 2015, 15:03:35)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

The exact appearance of the interpreter and its error messages will depend on which version you are using. This might not seem very interesting, but believe me, it is. This is your gateway to hackerdom—your first step in taking control of your computer. In more pragmatic terms, it’s an interactive Python interpreter. Just to see if it’s working, try the following:

>>> print("Hello, world!")

When you press the Enter key, the following output appears:

Hello, world!
>>>

If you are familiar with other computer languages, you may be used to terminating every line with a semicolon. There is no need to do so in Python. A line is a line, more or less. You may add a semicolon if you like, but it won’t have any effect (unless more code follows on the same line), and it is not a common thing to do.

So what happened here? The >>> thingy is the prompt. You can write something in this space, like print("Hello, world!"). If you press Enter, the Python interpreter prints out the string “Hello, world!” and you get a new prompt below that.

What if you write something completely different? Try it out:

>>> The Spanish Inquisition
SyntaxError: invalid syntax
>>>

Obviously, the interpreter didn’t understand that.2 (If you are running an interpreter other than IDLE, such as the command-line version for Linux, the error message will be slightly different.) The interpreter also indicates what’s wrong: it will emphasize the word Spanish by giving it a red background (or, in the command-line version, by using a caret, ^).

If you feel like it, play around with the interpreter some more. For some guidance, try entering the command help() at the prompt and pressing Enter. You can press F1 for help about IDLE. Otherwise, let’s press on. After all, the interpreter isn’t much fun when you don’t know what to tell it.

Algo . . . What?

Before we start programming in earnest, I’ll try to give you an idea of what computer programming is. Simply put, it’s telling a computer what to do. Computers can do a lot of things, but they aren’t very good at thinking for themselves. They really need to be spoon-fed the details. You need to feed the computer an algorithm in some language it understands. Algorithm is just a fancy word for a procedure or recipe, that is, a detailed description of how to do something. Consider the following:

SPAM with SPAM, SPAM, Eggs, and SPAM:
First, take some SPAM.
Then add some SPAM, SPAM, and eggs.
If a particularly spicy SPAM is desired, add some SPAM.
Cook until done -- Check every 10 minutes.

Not the fanciest of recipes, but its structure can be quite illuminating. It consists of a series of instructions to be followed in order. Some of the instructions may be done directly (“take some SPAM”), while some require some deliberation (“If a particularly spicy SPAM is desired”), and others must be repeated several times (“Check every 10 minutes.”)

Recipes and algorithms consist of ingredients (objects, things) and instructions (statements). In this example, SPAM and eggs are the ingredients, while the instructions consist of adding SPAM, cooking for a given length of time, and so on. Let’s start with some reasonably simple Python ingredients and see what you can do with them.

Numbers and Expressions

The interactive Python interpreter can be used as a powerful calculator. Try the following:

>>> 2 + 2

This should give you the answer 4. That wasn’t too hard. Well, what about this:

>>> 53672 + 235253
288925

Still not impressed? Admittedly, this is pretty standard stuff. (I’ll assume that you’ve used a calculator enough to know the difference between 1 + 2 * 3 and (1 + 2) * 3.) All the usual arithmetic operators work as expected. Division produces decimal numbers, called floats (or floating-point numbers).

>>> 1 / 2
0.5
>>> 1 / 1
1.0

If you’d rather discard the fractional part and do integer division, you can use a double slash.

>>> 1 // 2
0
>>> 1 // 1
1
>>> 5.0 // 2.4
2.0

In older versions of Python, ordinary division on integers used to work like this double slash. If you’re using Python 2.x, you can get proper division by adding the following statement to the beginning of your program (writing full programs is described later) or simply executing it in the interactive interpreter:

>>> from __future__ import division

Note

In case it’s not entirely clear, the future in the instruction is surrounded by two underscores on both sides: _ _future_ _.

Another alternative, if you’re running an old Python from the command line, is to supply the command-line switch -Qnew. There is a more thorough explanation of the __future__ stuff in the section “Back to the __future__” later in this chapter.

Now you’ve seen the basic arithmetic operators (addition, subtraction, multiplication, and division), but I’ve left out a close relative of integer division.

>>> 1 % 2
1

This is the remainder (modulus) operator. x % y gives the remainder of x divided by y. In other words, it’s the part that’s left over when you use integer division. That is, x % y is the same as x - ((x // y) * y).

>>> 10 // 3
3
>>> 10 % 3
1
>>> 9 // 3
3
>>> 9 % 3
0
>>> 2.75 % 0.5
0.25

Here 10 // 3 is 3 because the result is rounded down. But 3 × 3 is 9, so you get a remainder of 1. When you divide 9 by 3, the result is exactly 3, with no rounding. Therefore, the remainder is 0. This may be useful if you want to check something “every 10 minutes” as in the recipe earlier in the chapter. You can simply check whether minute % 10 is 0. (For a description on how to do this, see the sidebar “Sneak Peek: The if Statement” later in this chapter.) As you can see from the final example, the remainder operator works just fine with floats as well. It even works with negative numbers, and this can be a little confusing.

>>> 10 % 3
1
>>> 10 % -3
-2
>>> -10 % 3
2
>>> -10 % -3
-1

Looking at these examples, it might not be immediately obvious how it works. It’s probably easier to understand if you look at the companion operation of integer division.

>>> 10 // 3
3
>>> 10 // -3
-4
>>> -10 // 3
-4
>>> -10 // -3
3

Given how the division works, it’s not that hard to understand what the remainder must be. The important thing to understand about integer division is that it is rounded down, which for negative numbers is away from zero. That means -10 // 3 is rounded down to -4, not up to -3.
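The link between the two operators can be checked directly. The following is only a sketch, not part of the original examples, and the helper name check_identity is made up for illustration. It tests the identity x % y == x - ((x // y) * y) for each of the sign combinations shown above.

```python
def check_identity(x, y):
    # Hypothetical helper: True if the remainder matches the
    # definition given earlier, x - ((x // y) * y).
    return x % y == x - ((x // y) * y)

pairs = [(10, 3), (10, -3), (-10, 3), (-10, -3)]
for x, y in pairs:
    # Prints the operands, the quotient, the remainder, and the check.
    print(x, y, x // y, x % y, check_identity(x, y))
```

Because the quotient is always rounded down, the remainder ends up with the same sign as the divisor y, which is what the earlier examples show.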

The last operator we’ll look at is the exponentiation (or power) operator.

>>> 2 ** 3
8
>>> -3 ** 2
-9
>>> (-3) ** 2
9

Note that the exponentiation operator binds tighter than the negation (unary minus), so -3**2 is in fact the same as -(3**2). If you want to calculate (-3)**2, you must say so explicitly.
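A short sketch of the same point, using variable names chosen only for this illustration:

```python
# Exponentiation is applied before the unary minus,
# so the first two values are equal.
without_parens = -3 ** 2
explicit = -(3 ** 2)
squared_negative = (-3) ** 2

print(without_parens, explicit, squared_negative)  # -9 -9 9
```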

Hexadecimals, Octals, and Binary

To conclude this section, I should mention that hexadecimal, octal, and binary numbers are written like this:

>>> 0xAF
175
>>> 0o10
8
>>> 0b1011010010
722

The first digit in all of these is zero, followed by a letter (x, o, or b) that indicates the base. (If you don’t know what this is all about, you probably don’t need this quite yet. Just file it away for later use.)
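As a small sketch (the variable names are made up for this example), here is the same value written in all three notations. In Python 3, an octal number takes the prefix 0o. The built-in functions hex, oct, and bin go the other way and return strings.

```python
# The value 175 written in three bases.
from_hex = 0xAF
from_octal = 0o257
from_binary = 0b10101111

print(from_hex, from_octal, from_binary)  # 175 175 175

# hex, oct, and bin return strings that carry the matching prefix.
print(hex(175), oct(175), bin(175))  # 0xaf 0o257 0b10101111
```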

Variables

Another concept that might be familiar to you is variables. If algebra is but a distant memory, don’t worry: variables in Python are easy to understand. A variable is a name that represents (or refers to) some value. For example, you might want the name x to represent 3. To make it so, simply execute the following:

>>> x = 3

This is called an assignment. We assign the value 3 to the variable x. Another way of putting this is to say that we bind the variable x to the value (or object) 3. After you’ve assigned a value to a variable, you can use the variable in expressions.

>>> x * 2
6

Unlike some other languages, you can’t use a variable before you bind it to something. There is no “default value.”

Note

The simple story is that names, or identifiers, in Python consist of letters, digits, and underscore characters (_). They can’t begin with a digit, so Plan9 is a valid variable name, whereas 9Plan is not.3
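If you are unsure about a name, you can ask Python. The following sketch uses the string method isidentifier, which reports whether a piece of text follows the rules for names. The candidate names are chosen only for illustration.

```python
candidates = ["Plan9", "9Plan", "user_name", "user-name"]

for name in candidates:
    # True if the text could be used as a variable name.
    print(name, name.isidentifier())
```

Here Plan9 and user_name pass the test, while 9Plan (leading digit) and user-name (hyphen) do not.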

Statements

Until now we’ve been working (almost) exclusively with expressions, the ingredients of the recipe. But what about statements—the instructions?

In fact, I’ve cheated. I’ve introduced two types of statements already: the print statement and assignments. What’s the difference between a statement and an expression? You could think of it like this: an expression is something, while a statement does something. For example, 2 * 2 is 4, whereas print(2 * 2) prints 4. The two behave quite similarly, so the difference between them might not be all that clear.

>>> 2 * 2
4
>>> print(2 * 2)
4

As long as you execute this in the interactive interpreter, there’s no difference, but that is only because the interpreter always prints out the values of all expressions (using the same representation as repr; see the section “String Representations, str and repr” later in this chapter). That is not true of Python in general. Later in this chapter, you’ll see how to make programs that run without this interactive prompt; simply putting an expression such as 2 * 2 in your program won’t do anything interesting.4 Putting print(2 * 2) in there, however, will still print out 4.

Note

Actually, print is a function (more on those later in the chapter), so what I’m referring to as a print statement is simply a function call. In Python 2.x, print had a statement type of its own and didn’t use parentheses around its arguments.

The difference between statements and expressions is more obvious when dealing with assignments. Because they are not expressions, they have no values that can be printed out by the interactive interpreter.

>>> x = 3
>>>

You simply get a new prompt immediately. Something has changed, however. We now have a new variable x, which is now bound to the value 3. To some extent, this is a defining quality of statements in general: they change things. For example, assignments change variables, and print statements change how your screen looks.

Assignments are probably the most important type of statement in any programming language, although it may be difficult to grasp their importance right now. Variables may just seem like temporary “storage” (like the pots and pans of a cooking recipe), but the real power of variables is that you don’t need to know what values they hold in order to manipulate them.5

For example, you know that x * y evaluates to the product of x and y, even though you may have no knowledge of what x and y are. So, you may write programs that use variables in various ways without knowing the values they will eventually hold (or refer to) when the program is run.

Getting Input from the User

You’ve seen that you can write programs with variables without knowing their values. Of course, the interpreter must know the values eventually. So how can it be that we don’t? The interpreter knows only what we tell it, right? Not necessarily.

You may have written a program, and someone else may use it. You cannot predict what values users will supply to the program. Let’s take a look at the useful function input. (I’ll have more to say about functions in a minute.)

>>> input("The meaning of life: ")
The meaning of life: 42
'42'

What happens here is that the first line (input(...)) is executed in the interactive interpreter. It prints out the string "The meaning of life: " as a new prompt. I type 42 and press Enter. The resulting value of input is that very number (as a piece of text, or string), which is automatically printed out in the last line. Converting the strings to integers using int, we can construct a slightly more interesting example:

>>> x = input("x: ")
x: 34
>>> y = input("y: ")
y: 42
>>> print(int(x) * int(y))
1428

Here, the statements at the Python prompts (>>>) could be part of a finished program, and the values entered (34 and 42) would be supplied by some user. Your program would then print out the value 1428, which is the product of the two. And you didn’t have to know these values when you wrote the program, right?
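To see why the int conversion matters, here is a sketch that stands in for the interactive session. Instead of calling input, it passes the typed text in as ordinary strings. The function name multiply_text is an assumption made for this example.

```python
def multiply_text(x_text, y_text):
    # input always returns text, so each value is converted with int
    # before the multiplication.
    return int(x_text) * int(y_text)

print(multiply_text("34", "42"))  # 1428

# Without int, the * operator repeats the text instead of multiplying.
print("34" * 2)  # 3434
```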

Note

Getting input like this is much more useful when you save your programs in a separate file so other users can execute them. You learn how to do that later in this chapter, in the section “Saving and Executing Your Programs.”

Functions

In the “Numbers and Expressions” section, I used the exponentiation operator (**) to calculate powers. The fact is that you can use a function instead, called pow.

>>> 2 ** 3
8
>>> pow(2, 3)
8

A function is like a little program that you can use to perform a specific action. Python has a lot of functions that can do many wonderful things. In fact, you can make your own functions, too (more about that later); therefore, we often refer to standard functions such as pow as built-in functions.

Using a function as I did in the preceding example is called calling the function. You supply it with arguments (in this case, 2 and 3), and it returns a value to you. Because it returns a value, a function call is simply another type of expression, like the arithmetic expressions discussed earlier in this chapter.6 In fact, you can combine function calls and operators to create more complicated expressions (like I did with int, earlier).

>>> 10 + pow(2, 3 * 5) / 3.0
10932.666666666666

Several built-in functions can be used in numeric expressions like this. For example, abs gives the absolute value of a number, and round rounds floating-point numbers to the nearest integer.

>>> abs(-10)
10
>>> 2 // 3
0
>>> round(2 / 3)
1

Notice the difference between the last two expressions. Integer division always rounds down, whereas round rounds to the nearest integer, with ties rounded toward the even number. But what if you want to round a given number down? For example, you might know that a person is 32.9 years old, but you would like to round that down to 32 because she isn’t really 33 yet. Python has a function for this (called floor); it just isn’t available directly. As is the case with many useful functions, it is found in a module.
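The tie-breaking rule is easiest to see with values that lie exactly halfway between two integers. This is a minimal sketch with made-up variable names. Called with a single argument in Python 3, round returns an integer.

```python
nearest = round(2 / 3)

# Each of these lies exactly halfway between two integers,
# so the result is the even neighbor.
ties = [round(0.5), round(1.5), round(2.5), round(3.5)]

print(nearest, ties)  # 1 [0, 2, 2, 4]
```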

Modules

You may think of modules as extensions that can be imported into Python to expand its capabilities. You import modules with a special command called (naturally enough) import. The function mentioned in the previous section, floor, is in a module called math.

>>> import math
>>> math.floor(32.9)
32

Notice how this works: we import a module with import and then use the functions from that module by writing module.function. For this operation in particular, you could actually just convert the number into an integer, like I did earlier, with the results from input.

>>> int(32.9)
32

Note

Similar functions exist to convert to other types (for example, str and float). In fact, these aren’t really functions—they’re classes. I’ll have more to say about classes later.

The math module has several other useful functions, though. For example, the opposite of floor is ceil (short for “ceiling”), which finds the smallest integral value larger than or equal to the given number.

>>> math.ceil(32.3)
33
>>> math.ceil(32)
32
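For positive numbers, int and math.floor give the same answer, but they part ways for negative ones. The sketch below uses a negative value, chosen only to show the difference: floor rounds down, ceil rounds up, and int simply drops the fractional part.

```python
import math

value = -32.9  # illustrative value, not from the text

print(math.floor(value))  # -33
print(math.ceil(value))   # -32
print(int(value))         # -32
```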

If you are sure that you won’t import more than one function with a given name (from different modules), you might not want to write the module name each time you call the function. Then you can use a variant of the import command.

>>> from math import sqrt
>>> sqrt(9)
3.0

After using the from module import function, you can use the function without its module prefix.

Tip

You may, in fact, use variables to refer to functions (and most other things in Python). By performing the assignment foo = math.sqrt, you can start using foo to calculate square roots; for example, foo(4) yields 2.0.

cmath and Complex Numbers

The sqrt function is used to calculate the square root of a number. Let’s see what happens if we supply it with a negative number:

>>> from math import sqrt
>>> sqrt(-1)
Traceback (most recent call last):
    ...
ValueError: math domain error

or, on some platforms:

>>> sqrt(-1)
nan

Note

nan is simply a special value meaning “not a number.”

If we restrict ourselves to real numbers and their approximate implementation in the form of floats, we can’t take the square root of a negative number. The square root of a negative number is a so-called imaginary number, and numbers that are the sum of a real and an imaginary part are called complex. The Python standard library has a separate module for dealing with complex numbers.

>>> import cmath
>>> cmath.sqrt(-1)
1j

Notice that I didn’t use from ... import ... here. If I had, I would have lost my ordinary sqrt. Name clashes like these can be sneaky, so unless you really want to use the from version, you should probably stick with a plain import.
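To make the clash concrete, here is a sketch of what happens when both modules supply a sqrt through the from form. The variable names are chosen for illustration only.

```python
from math import sqrt
real_root = sqrt(9)

from cmath import sqrt  # rebinds the name sqrt to the complex version
complex_root = sqrt(9)

print(real_root)     # 3.0
print(complex_root)  # (3+0j)

# With plain imports, both functions stay available side by side.
import math
import cmath
print(math.sqrt(9), cmath.sqrt(-9))  # 3.0 3j
```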

The value 1j is an example of an imaginary number. These numbers are written with a trailing j (or J). Complex arithmetic essentially follows from defining 1j as the square root of -1. Without delving too deeply into the topic, let me just show a final example:

>>> (1 + 3j) * (9 + 4j)
(-3+31j)

As you can see, the support for complex numbers is built into the language.

Note

There is no separate type for imaginary numbers in Python. They are treated as complex numbers whose real component is zero.
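A brief sketch of these points, using the product from the earlier example. The real and imag attributes give the two components as floats.

```python
z = (1 + 3j) * (9 + 4j)

print(z)               # (-3+31j)
print(z.real, z.imag)  # -3.0 31.0

# An imaginary literal is an ordinary complex number.
print(type(1j))        # <class 'complex'>
print(1j * 1j)         # (-1+0j)
```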

Back to the __future__

It has been rumored that Guido van Rossum (Python’s creator) has a time machine—on more than one occasion when people have requested features in the language, they have found that the features were already implemented. Of course, we aren’t all allowed into this time machine, but Guido has been kind enough to build a part of it into Python, in the form of the magic module __future__. From it, we can import features that will be standard in Python in the future but that aren’t part of the language yet. You saw this in the “Numbers and Expressions” section, and you’ll be bumping into it from time to time throughout this book.

Saving and Executing Your Programs

The interactive interpreter is one of Python’s great strengths. It makes it possible to test solutions and to experiment with the language in real time. If you want to know how something works, just try it! However, everything you write in the interactive interpreter is lost when you quit. What you really want to do is write programs that both you and other people can run. In this section, you learn how to do just that.

First of all, you need a text editor, preferably one intended for programming. (If you use something like Microsoft Word, which I really don’t recommend, be sure to save your code as plain text.) If you are already using IDLE, you’re in luck. With IDLE, you can simply create a new editor window with File › New File. Another window appears, without an interactive prompt. Whew! Start by entering the following:

print("Hello, world!")

Now select File › Save to save your program (which is, in fact, a plain text file). Be sure to put it somewhere where you can find it later, and give your file any reasonable name, such as hello.py. (The .py ending is significant.)

Got that? Don’t close the window with your program in it. If you did, just open it again (File › Open). Now you can run it with Run › Run Module. (If you aren’t using IDLE, see the next section about running your programs from the command prompt.)

What happens? Hello, world! is printed in the interpreter window, which is exactly what we wanted. The interpreter prompt may be gone (depending on the version you’re using), but you can get it back by pressing Enter (in the interpreter window).

Let’s extend our script to the following:

name = input("What is your name? ")
print("Hello, " + name  + "!")

If you run this (remember to save it first), you should see the following prompt in the interpreter window:

What is your name?

Enter your name (for example, Gumby) and press Enter. You should get something like this:

Hello, Gumby!

Running Your Python Scripts from a Command Prompt

Actually, there are several ways to run your programs. First, let’s assume you have a DOS window or a UNIX shell prompt before you and that the directory containing the Python executable (called python.exe in Windows, and python in UNIX) has been put in your PATH environment variable.7 Also, let’s assume that your script from the previous section (hello.py) is in the current directory. Then you can execute your script with the following command in Windows:

C:\>python hello.py

or UNIX:

$ python hello.py

As you can see, the command is the same. Only the system prompt changes.

Making Your Scripts Behave Like Normal Programs

Sometimes you want to execute a Python program (also called a script) the same way you execute other programs (such as your web browser or text editor), rather than explicitly using the Python interpreter. In UNIX, there is a standard way of doing this: have the first line of your script begin with the character sequence #! (called pound bang or shebang) followed by the absolute path to the program that interprets the script (in our case Python). Even if you didn’t quite understand that, just put the following in the first line of your script if you want it to run easily on UNIX:

#!/usr/bin/env python

This should run the script, regardless of where the Python binary is located. If you have more than one version of Python installed, you could use a more specific executable name, such as python3, rather than simply python.

Before you can actually run your script, you must make it executable.

$ chmod a+x hello.py

Now it can be run like this (assuming that you have the current directory in your path):

$ hello.py

If this doesn’t work, try using ./hello.py instead, which will work even if the current directory (.) is not part of your execution path (which a responsible sysadmin would probably tell you it shouldn’t be).

If you like, you can rename your file and remove the py suffix to make it look more like a normal program.

What About Double-Clicking?

In Windows, the suffix (.py) is the key to making your script behave like a program. Try double-clicking the file hello.py you saved in the previous section. If Python was installed correctly, a DOS window appears with the prompt “What is your name?”8 There is one problem with running your program like this, however. Once you’ve entered your name, the program window closes before you can read the result. The window closes when the program is finished. Try changing the script by adding the following line at the end:

input("Press <enter>")

Now, after running the program and entering your name, you should have a DOS window with the following contents:

What is your name? Gumby
Hello, Gumby!
Press <enter>

Once you press the Enter key, the window closes (because the program is finished).

Comments

The hash sign (#) is a bit special in Python. When you put it in your code, everything to the right of it is ignored (which is why the Python interpreter didn’t choke on the /usr/bin/env stuff used earlier). Here is an example:

# Print the circumference of the circle:
print(2 * pi * radius)

The first line here is called a comment, which can be useful in making programs easier to understand—both for other people and for yourself when you come back to old code. It has been said that the first commandment of programmers is “Thou Shalt Comment” (although some less charitable programmers swear by the motto “If it was hard to write, it should be hard to read”). Make sure your comments say significant things and don’t simply restate what is already obvious from the code. Useless, redundant comments may be worse than none. For example, in the following, a comment isn’t really called for:

# Get the user's name:
user_name = input("What is your name?")

It’s always a good idea to make your code readable on its own as well, even without the comments. Luckily, Python is an excellent language for writing readable programs.
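The circumference fragment shown earlier won’t run on its own, because pi and radius are never defined. The sketch below fills them in, taking pi from the math module and using a radius chosen purely for illustration. The comment records something the code cannot say by itself, namely the unit.

```python
from math import pi

radius = 2.5  # in centimeters (an example value, not from the text)

circumference = 2 * pi * radius
print(circumference)
```

With a descriptive variable name like circumference, the original comment about printing the circumference is no longer needed.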

Strings

Now what was all that "Hello, " + name + "!" stuff about? The first program in this chapter was simply

print("Hello, world!")

It is customary to begin with a program like this in programming tutorials. The problem is that I haven’t really explained how it works yet. You know the basics of the print statement (I’ll have more to say about that later), but what is "Hello, world!"? It’s called a string (as in “a string of characters”). Strings are found in almost every useful, real-world Python program and have many uses. Their main use is to represent bits of text, such as the exclamation “Hello, world!”

Single-Quoted Strings and Escaping Quotes

Strings are values, just as numbers are:

>>> "Hello, world!"
'Hello, world!'

There is one thing that may be a bit surprising about this example, though: when Python printed out our string, it used single quotes, whereas we used double quotes. What’s the difference? Actually, there is no difference.

>>> 'Hello, world!'
'Hello, world!'

Here, we use single quotes, and the result is the same. So why allow both? Because in some cases it may be useful.

>>> "Let's go!"
"Let's go!"
>>> '"Hello, world!" she said'
'"Hello, world!" she said'

In the preceding code, the first string contains a single quote (or an apostrophe, as we should perhaps call it in this context), and therefore we can’t use single quotes to enclose the string. If we did, the interpreter would complain (and rightly so).

>>> 'Let's go!'
SyntaxError: invalid syntax

Here, the string is 'Let', and Python doesn’t quite know what to do with the following s (or the rest of the line, for that matter).

In the second string, we use double quotes as part of our sentence. Therefore, we have to use single quotes to enclose our string, for the same reasons as stated previously. Or, actually we don’t have to. It’s just convenient. An alternative is to use the backslash character (\) to escape the quotes in the string, like this:

>>> 'Let\'s go!'
"Let's go!"

Python understands that the middle single quote is a character in the string and not the end of the string. (Even so, Python chooses to use double quotes when printing out the string.) The same works with double quotes, as you might expect.

>>> "\"Hello, world!\" she said"
'"Hello, world!" she said'

Escaping quotes like this can be useful, and sometimes necessary. For example, what would you do without the backslash if your string contained both single and double quotes, as in the string 'Let\'s say "Hello, world!"'?
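As a quick sketch (variable names made up for this example), the escaped and unescaped spellings produce exactly the same string; the backslash is only part of how the string is written, not part of its contents.

```python
escaped_single = 'Let\'s go!'
plain_double = "Let's go!"
print(escaped_single == plain_double)  # True

# Both kinds of quotes in one string, with the apostrophe escaped.
both_quotes = 'Let\'s say "Hello, world!"'
print(both_quotes)  # Let's say "Hello, world!"
```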

Note

Tired of backslashes? As you will see later in this chapter, you can avoid most of them by using long strings and raw strings (which can be combined).

Concatenating Strings

Just to keep whipping this slightly tortured example, let me show you another way of writing the same string:

>>> "Let's say " '"Hello, world!"'
'Let\'s say "Hello, world!"'

I’ve simply written two strings, one after the other, and Python automatically concatenates them (makes them into one string). This mechanism isn’t used very often, but it can be useful at times. However, it works only when you actually write both strings at the same time, directly following one another.

>>> x = "Hello, "
>>> y = "world!"
>>> x y
SyntaxError: invalid syntax

In other words, this is just a special way of writing strings, not a general method of concatenating them. How, then, do you concatenate strings? Just like you add numbers:

>>> "Hello, " + "world!"
'Hello, world!'
>>> x = "Hello, "
>>> y = "world!"
>>> x + y
'Hello, world!'
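The greeting script from earlier in the chapter builds its message the same way. Here is a sketch that wraps it in a function so it can be tried without typing any input; the name greet is an assumption made for this example.

```python
def greet(name):
    # Joins three strings with +, as in the hello.py script.
    return "Hello, " + name + "!"

print(greet("Gumby"))  # Hello, Gumby!

# Adjacent string literals give the same result as +,
# but only when both are written out directly.
adjacent = "Hello, " "world!"
print(adjacent == "Hello, " + "world!")  # True
```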

String Representations, str and repr

Throughout these examples, you have probably noticed that all the strings printed out by Python are still quoted. That’s because it prints out the value as it might be written in Python code, not how you would like it to look for the user. If you use print, however, the result is different.

>>> "Hello, world!"
'Hello, world!'
>>> print("Hello, world!")
Hello, world!

The difference is even more obvious if we sneak in the special linefeed character code \n.

>>> "Hello,\nworld!"
'Hello,\nworld!'
>>> print("Hello,\nworld!")
Hello,
world!

Values are converted to strings through two different mechanisms. You can access both mechanisms yourself, by using the functions str and repr.9 With str, you convert a value into a string in some reasonable fashion that will probably be understood by a user, for example, converting any special character codes to the corresponding characters, where possible. If you use repr, however, you will generally get a representation of the value as a legal Python expression.

>>> print(repr("Hello,\nworld!"))
'Hello,\nworld!'
>>> print(str("Hello,\nworld!"))
Hello,
world!
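One way to see that repr produces a different string, rather than just a different display, is to compare lengths. This is a sketch with illustrative variable names.

```python
text = "Hello,\nworld!"
as_code = repr(text)

# The string itself holds 13 characters; the linefeed counts as one.
print(len(text))     # 13

# The repr adds two quotes and spells the linefeed as backslash + n.
print(len(as_code))  # 16
print(as_code)       # 'Hello,\nworld!'
```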

Long Strings, Raw Strings, and bytes

There are some useful, slightly specialized ways of writing strings. For example, there’s a custom syntax for writing strings that include newlines (long strings) or backslashes (raw strings). In Python 2, there was also a separate syntax for writing strings with special symbols of different kinds, producing objects of the unicode type. The syntax still works but is now redundant, because all strings in Python 3 are Unicode strings. Instead, a new syntax has been introduced to specify a bytes object, roughly corresponding to the old-school strings. As we shall see, these still play an important part in the handling of Unicode encodings.

Long Strings

If you want to write a really long string, one that spans several lines, you can use triple quotes instead of ordinary quotes.

print('''This is a very long string.  It continues here.
And it's not over yet.  "Hello, world!"
Still here.''')

You can also use triple double quotes, """like this""". Note that because of the distinctive enclosing quotes, both single and double quotes are allowed inside, without being backslash-escaped.
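A long string is an ordinary string once it has been created; the line breaks you type simply become linefeed characters. The following sketch, with made-up variable names, compares the two ways of writing the same value.

```python
long_form = '''first line
second line'''
short_form = 'first line\nsecond line'

print(long_form == short_form)  # True

# Both kinds of quotes can appear unescaped inside triple quotes.
quoted = """It's "quoted" freely."""
print(quoted)  # It's "quoted" freely.
```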

Tip

Ordinary strings can also span several lines. If the last character on a line is a backslash, the line break itself is “escaped” and ignored. For example:

print("Hello, \
world!")

would print out Hello, world!. The same goes for expressions and statements in general.

>>> 1 + 2 + \
    4 + 5
12
>>> print \
    ('Hello, world')
Hello, world

Raw Strings

Raw strings aren’t too picky about backslashes, which can be very useful sometimes.10 In ordinary strings, the backslash has a special role: it escapes things, letting you put things into your string that you couldn’t normally write directly. For example, as we’ve seen, a newline is written \n and can be put into a string like this:

>>> print('Hello,\nworld!')
Hello,
world!

This is normally just dandy, but in some cases, it’s not what you want. What if you wanted the string to include a backslash followed by an n? You might want to put the DOS pathname C:\nowhere into a string.

>>> path = 'C:\nowhere'
>>> path
'C:\nowhere'

This looks correct, until you print it and discover the flaw.

>>> print(path)
C:
owhere

It’s not exactly what we were after, is it? So what do we do? We can escape the backslash itself.

>>> print('C:\\nowhere')
C:\nowhere

This is just fine. But for long paths, you wind up with a lot of backslashes.

path = 'C:\\Program Files\\fnord\\foo\\bar\\baz\\frozz\\bozz'

Raw strings are useful in such cases. They don’t treat the backslash as a special character at all. Every character you put into a raw string stays the way you wrote it.

>>> print(r'C:\nowhere')
C:\nowhere
>>> print(r'C:\Program Files\fnord\foo\bar\baz\frozz\bozz')
C:\Program Files\fnord\foo\bar\baz\frozz\bozz

As you can see, raw strings are prefixed with an r. It would seem that you can put anything inside a raw string, and that is almost true. Quotes must be escaped as usual, although that means you get a backslash in your final string, too.

>>> print(r'Let\'s go!')
Let\'s go!

The one thing you can’t have in a raw string is a lone, final backslash. In other words, the last character in a raw string cannot be a backslash unless you escape it (and then the backslash you use to escape it will be part of the string, too). Given the previous example, that ought to be obvious. If the last character (before the final quote) is an unescaped backslash, Python won’t know whether or not to end the string.

>>> print(r"This is illegal\")
SyntaxError: EOL while scanning string literal

Okay, so it’s reasonable, but what if you want the last character in your raw string to be a backslash? (Perhaps it’s the end of a DOS path, for example.) Well, I’ve given you a whole bag of tricks in this section that should help you solve that problem, but basically you need to put the backslash in a separate string. A simple way of doing that is the following:

>>> print(r'C:\Program Files\foo\bar' '\\')
C:\Program Files\foo\bar\

Note that you can use both single and double quotes with raw strings. Even triple-quoted strings can be raw.
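As a quick check of the rules above, here is a small sketch comparing a raw string with its escaped equivalent. The variable names are made up for illustration; the paths are the ones used in the examples.

```python
# A raw string and an ordinary string with a doubled backslash
# describe exactly the same text.
raw_path = r'C:\nowhere'
escaped_path = 'C:\\nowhere'
print(raw_path == escaped_path)  # True
print(len(raw_path))             # 10: the backslash and the n are two characters

# Without the r prefix, \n is a single linefeed character.
plain_path = 'C:\nowhere'
print(len(plain_path))           # 9

# A trailing backslash has to come from a separate, ordinary string literal.
# Adjacent string literals are joined into a single string.
folder = r'C:\Program Files\foo\bar' '\\'
print(folder)
print(folder.endswith('\\'))     # True
```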

Unicode, bytes, and bytearray

Python strings represent text using a scheme known as Unicode. The way this works for most basic programs is pretty transparent, so if you’d like, you could skip this section for now and read up on the topic as needed. However, as string and text file handling is one of the main uses of Python code, it probably wouldn’t hurt to at least skim this section.

Abstractly, each Unicode character is represented by a so-called code point, which is simply its number in the Unicode standard. This allows you to refer to more than 120,000 characters in 129 writing systems in a way that should be recognizable by any modern software. Of course, your keyboard won’t have hundreds of thousands of keys, so there are general mechanisms for specifying Unicode characters, either by 16- or 32-bit hexadecimal literals (prefixing them with \u or \U, respectively) or by their Unicode name (using \N{name}).

>>> "\u00C6"
'Æ'
>>> "\U0001F60A"
'😊'
>>> "This is a cat: \N{Cat}"
'This is a cat: 🐈'

You can find the various code points and names by searching the Web, using a description of the character you need, or you can use a specific site such as http://unicode-table.com .
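You can also look up code points and names from within Python. The sketch below uses the built-in functions ord and chr together with the standard unicodedata module; chr and unicodedata are not covered in this chapter, so take this as a pointer rather than a full treatment.

```python
import unicodedata

ae = "\u00C6"

code_point = ord(ae)          # the code point as an integer
print(code_point)             # 198
print(hex(code_point))        # 0xc6
print(chr(0xC6) == ae)        # True: chr goes from code point to character

# Going between characters and their official Unicode names.
ae_name = unicodedata.name(ae)
print(ae_name)                # LATIN CAPITAL LETTER AE

cat = unicodedata.lookup("CAT")
print(cat == "\N{CAT}")       # True
print(hex(ord(cat)))          # 0x1f408
```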

The idea of Unicode is quite simple, but it comes with some challenges, one of which is the issue of encoding. All objects are represented in memory or on disk as a series of binary digits—zeroes and ones—grouped in chunks of eight, or bytes, and strings are no exception. In programming languages such as C, these bytes are completely out in the open. Strings are simply sequences of bytes. To interoperate with C, for example, and to write text to files or send it through network sockets, Python has two similar types, the immutable bytes and the mutable bytearray. If you wanted, you could produce a bytes object directly, instead of a string, by using the prefix b:

>>> b'Hello, world!'
b'Hello, world!'

However, a byte can hold only 256 values, quite a bit less than what the Unicode standard requires. Python bytes literals permit only the 128 characters of the ASCII standard, with the remaining 128 byte values requiring escape sequences like \xf0 for the hexadecimal value 0xf0 (that is, 240).
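To see that a bytes object really is a sequence of small integers, you can index it or turn it into a list. This is just a small sketch with a made-up value.

```python
# Two ASCII characters followed by one escaped byte value.
data = b'Hi\xf0'

print(len(data))   # 3
print(data[0])     # 72, the ASCII code for H
print(data[2])     # 240, that is, 0xf0

values = list(data)
print(values)      # [72, 105, 240]

# Going the other way: building a bytes object from integers in range(256).
rebuilt = bytes([72, 105, 240])
print(rebuilt == data)  # True
```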

It might seem the only difference here is the size of the alphabet available to us. That’s not really accurate, however. At a glance, it might seem like both ASCII and Unicode refer to a mapping between non-negative integers and characters, but there is a subtle difference: where Unicode code points are defined as integers, ASCII characters are defined both by their number and by their binary encoding. One reason this seems completely unremarkable is that the mapping between the integers 0–255 and an eight-digit binary numeral is completely standard, and there is little room to maneuver. The thing is, once we go beyond the single byte, things aren’t that simple. The direct generalization of simply representing each code point as the corresponding binary numeral may not be the way to go. Not only is there the issue of byte order, which one bumps up against even when encoding integer values, there is also the issue of wasted space: if we use the same number of bytes for encoding each code point, all text will have to accommodate the fact that you might want to include a few Anatolian hieroglyphs or a smattering of Imperial Aramaic. There is a standard for such an encoding of Unicode, which is called UTF-32 (for Unicode Transformation Format 32 bits), but if you’re mainly handling text in one of the more common languages of the Internet, for example, this is quite wasteful.

There is an absolutely brilliant alternative, however, devised in large part by computing pioneer Kenneth Thompson: UTF-8. Instead of using the full 32 bits, it uses a variable-length encoding, with fewer bytes for some scripts than others. Assuming that you’ll use these scripts more often, this will save you space overall, similar to how Morse code saves you effort by using fewer dots and dashes for the more common letters.11 In particular, the ASCII encoding is still used for single-byte encoding, retaining compatibility with older systems. However, characters outside this range use multiple bytes (up to four in the current standard; the original design allowed up to six). Let’s try to encode a string into bytes, using the ASCII, UTF-8, and UTF-32 encodings.

>>> "Hello, world!".encode("ASCII")
b'Hello, world!'
>>> "Hello, world!".encode("UTF-8")
b'Hello, world!'
>>> "Hello, world!".encode("UTF-32")
b'\xff\xfe\x00\x00H\x00\x00\x00e\x00\x00\x00l\x00\x00\x00l\x00\x00\x00o\x00\x00\x00,\x00\x00\x00 \x00\x00\x00w\x00\x00\x00o\x00\x00\x00r\x00\x00\x00l\x00\x00\x00d\x00\x00\x00!\x00\x00\x00'

As you can see, the first two are equivalent, while the last one is quite a bit longer. Here’s another example:

>>> len("How long is this?".encode("UTF-8"))
17
>>> len("How long is this?".encode("UTF-32"))
72
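The variable-length nature of UTF-8 becomes visible if you encode single characters from different parts of the Unicode range. The following sketch uses four sample characters picked for illustration (they are not from the text above) and records how many bytes each one needs.

```python
# Sample characters with increasingly large code points:
# a, æ, the euro sign, and a smiley.
samples = ["a", "\u00E6", "\u20AC", "\U0001F60A"]

utf8_lengths = []
for ch in samples:
    encoded = ch.encode("UTF-8")
    utf8_lengths.append(len(encoded))
    # Print the code point, the number of bytes, and the bytes themselves.
    print(hex(ord(ch)), len(encoded), encoded)

print(utf8_lengths)  # [1, 2, 3, 4]
```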

The difference between ASCII and UTF-8 appears once we use some slightly more exotic characters:

>>> "Hællå, wørld!".encode("ASCII")
Traceback (most recent call last):
  ...
UnicodeEncodeError: 'ascii' codec can't encode character '\xe6' in position 1: ordinal not in range(128)

The Scandinavian letters here have no encoding in ASCII. If we really need ASCII encoding (which can certainly happen), we can supply another argument to encode, telling it what to do with errors. The normal mode here is 'strict', but there are others you can use to ignore or replace the offending characters.

>>> "Hællå, wørld!".encode("ASCII", "ignore")
b'Hll, wrld!'
>>> "Hællå, wørld!".encode("ASCII", "replace")
b'H?ll?, w?rld!'
>>> "Hællå, wørld!".encode("ASCII", "backslashreplace")
b'H\\xe6ll\\xe5, w\\xf8rld!'
>>> "Hællå, wørld!".encode("ASCII", "xmlcharrefreplace")
b'H&#230;ll&#229;, w&#248;rld!'

In almost all cases, though, you’ll be better off using UTF-8, which is in fact even the default encoding.

>>> "Hællå, wørld!".encode()
b'H\xc3\xa6ll\xc3\xa5, w\xc3\xb8rld!'

This is slightly longer than for the "Hello, world!" example, whereas the UTF-32 encoding would be of exactly the same length in both cases.
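These lengths are easy to verify. The sketch below also shows where the extra four bytes in the UTF-32 results come from: the plain "UTF-32" codec prepends a byte order mark, while the codec names "UTF-32-LE" and "UTF-32-BE" (not mentioned in the text) fix the byte order explicitly and leave the mark out.

```python
plain = "Hello, world!"
nordic = "H\u00E6ll\u00E5, w\u00F8rld!"  # the Scandinavian greeting used above

utf8_plain = len(plain.encode("UTF-8"))
utf8_nordic = len(nordic.encode("UTF-8"))
utf32_plain = len(plain.encode("UTF-32"))
utf32_nordic = len(nordic.encode("UTF-32"))

print(utf8_plain, utf8_nordic)    # 13 16
print(utf32_plain, utf32_nordic)  # 56 56, that is, 4 + 13 * 4 in both cases

# With an explicit byte order there is no byte order mark,
# and the two orders differ only in where the zero bytes go.
little = "A".encode("UTF-32-LE")
big = "A".encode("UTF-32-BE")
print(little)  # b'A\x00\x00\x00'
print(big)     # b'\x00\x00\x00A'
```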

Just like strings can be encoded into bytes, bytes can be decoded into strings.

>>> b'H\xc3\xa6ll\xc3\xa5, w\xc3\xb8rld!'.decode()
'Hællå, wørld!'

As before, the default encoding is UTF-8. We can specify a different encoding, but if we use the wrong one, we’ll either get an error message or end up with a garbled string. The bytes object itself doesn’t know about encoding, so it’s your responsibility to keep track of which one you’ve used.
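Both failure modes can be reproduced with a small sketch. Latin-1 is used here only as an example of a "wrong" encoding; the variable names are made up for illustration.

```python
original = "H\u00E6ll\u00E5, w\u00F8rld!"  # the Scandinavian greeting used above

# Case 1: UTF-8 bytes decoded as Latin-1. There is no error,
# but every non-ASCII letter turns into two unrelated characters.
utf8_bytes = original.encode("UTF-8")
garbled = utf8_bytes.decode("Latin-1")
print(garbled)               # HÃ¦llÃ¥, wÃ¸rld!
print(garbled == original)   # False

# Case 2: Latin-1 bytes decoded as UTF-8. Here the bytes do not
# form valid UTF-8, so Python raises an exception instead.
latin1_bytes = original.encode("Latin-1")
decode_failed = False
try:
    latin1_bytes.decode("UTF-8")
except UnicodeDecodeError as error:
    decode_failed = True
    print("Could not decode:", error.reason)
```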

Rather than using the encode and decode methods, you might want to simply construct the bytes and str (i.e., string) objects, as follows:

>>> bytes("Hællå, wørld!", encoding="utf-8")
b'H\xc3\xa6ll\xc3\xa5, w\xc3\xb8rld!'
>>> str(b'H\xc3\xa6ll\xc3\xa5, w\xc3\xb8rld!', encoding="utf-8")
'Hællå, wørld!'

Using this approach is a bit more general and works better if you don’t know exactly the class of the string-like or bytes-like objects you’re working with—and as a general rule, you shouldn’t be too strict about that.

One of the most important uses for encoding and decoding is when storing text in files on disk. However, Python’s mechanisms for reading and writing files normally do the work for you! As long as you’re okay with having your files in UTF-8 encoding, you don’t really need to worry about it. But if you end up seeing gibberish where you expected text, perhaps the file was actually in some other encoding, and then it can be useful to know a bit about what’s going on. If you’d like to know more about Unicode in Python, check out the HOWTO on the subject.12
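As a rough sketch of what this looks like in practice, the following writes a string to a temporary file and reads it back, both as text and as raw bytes. File handling is covered later in the book, and the file name is made up. The encoding is passed explicitly here because the default used by open can depend on your platform and Python version.

```python
import os
import tempfile

text = "H\u00E6ll\u00E5, w\u00F8rld!"  # the Scandinavian greeting used above

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "greeting.txt")  # hypothetical file name

    # Writing in text mode: Python encodes the string for us.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Reading in text mode: Python decodes the bytes for us.
    with open(path, encoding="utf-8") as f:
        read_back = f.read()

    # Reading in binary mode: we get the encoded bytes as stored on disk.
    with open(path, "rb") as f:
        raw = f.read()

print(read_back == text)  # True
print(len(raw))           # 16, the length of the UTF-8 encoding
```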

Note

Your source code is also encoded, and the default there is UTF-8 as well. If you want to use some other encoding (for example, if your text editor insists on saving as something other than UTF-8), you can specify the encoding with a special comment.

# -*- coding: encoding name -*-

Replace encoding name with whatever encoding you’re using (uppercase or lowercase), such as utf-8 or, perhaps more likely, latin-1, for example.

Finally, we have bytearray, a mutable version of bytes. In a sense, it’s like a string where you can modify the characters—which you can’t do with a normal string. However, it’s really designed more to be used behind the scenes and isn’t exactly user-friendly if used as a string-alike. For example, to replace a character, you have to assign an int in the range 0…255 to it. So if you want to actually insert a character, you have to get its ordinal value, using ord.

>>> x = bytearray(b"Hello!")
>>> x[1] = ord(b"u")
>>> x
bytearray(b'Hullo!')
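A slightly longer sketch of the same idea follows, contrasting the mutable bytearray with the immutable bytes. The variable names are made up for illustration.

```python
buffer = bytearray(b"Hello!")

buffer[1] = ord("u")      # item assignment takes an int in range(256)
buffer[0:1] = b"J"        # slice assignment takes a bytes-like object
buffer.extend(b" Bye.")   # a bytearray can also grow in place
print(buffer)             # bytearray(b'Jullo! Bye.')

# Converting to text or to an immutable bytes object when done.
text = buffer.decode("ASCII")
frozen = bytes(buffer)
print(text)               # Jullo! Bye.

# A bytes object does not allow item assignment.
immutable_error = False
try:
    frozen[0] = ord("H")
except TypeError:
    immutable_error = True
print(immutable_error)    # True
```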

A Quick Summary

This chapter covered quite a bit of material. Let’s take a look at what you’ve learned before moving on.

  • Algorithms: An algorithm is a recipe telling you exactly how to perform a task. When you program a computer, you are essentially describing an algorithm in a language the computer can understand, such as Python. Such a machine-friendly description is called a program, and it mainly consists of expressions and statements.

  • Expressions: An expression is a part of a computer program that represents a value. For example, 2 + 2 is an expression, representing the value 4. Simple expressions are built from literal values (such as 2 or "Hello") by using operators (such as + or %) and functions (such as pow). More complicated expressions can be created by combining simpler expressions (e.g., (2 + 2) * (3 - 1)). Expressions may also contain variables.

  • Variables: A variable is a name that represents a value. New values may be assigned to variables through assignments such as x = 2. An assignment is a kind of statement.

  • Statements: A statement is an instruction that tells the computer to do something. That may involve changing variables (through assignments), printing things to the screen (such as print("Hello, world!")), importing modules, or doing a host of other stuff.

  • Functions: Functions in Python work just like functions in mathematics: they may take some arguments, and they return a result. (They may actually do lots of interesting stuff before returning, as you will find out when you learn to write your own functions in Chapter 6.)

  • Modules: Modules are extensions that can be imported into Python to extend its capabilities. For example, several useful mathematical functions are available in the math module.

  • Programs: You have looked at the practicalities of writing, saving, and running Python programs.

  • Strings: Strings are really simple—they are just pieces of text, with characters represented as Unicode code points. And yet there is a lot to know about them. In this chapter, you’ve seen many ways to write them, and in Chapter 3 you learn many ways of using them.

New Functions in This Chapter

Function                            Description
abs(number)                         Returns the absolute value of a number.
bytes(string, encoding[, errors])   Encodes a given string, with the specified behavior for errors.
cmath.sqrt(number)                  Returns the square root; works with negative numbers.
float(object)                       Converts a string or number to a floating-point number.
help([object])                      Offers interactive help.
input(prompt)                       Gets input from the user as a string.
int(object)                         Converts a string or number to an integer.
math.ceil(number)                   Returns the ceiling of a number as an integer.
math.floor(number)                  Returns the floor of a number as an integer.
math.sqrt(number)                   Returns the square root; doesn’t work with negative numbers.
pow(x, y[, z])                      Returns x to the power of y (modulo z).
print(object, ...)                  Prints out the arguments, separated by spaces.
repr(object)                        Returns a string representation of a value.
round(number[, ndigits])            Rounds a number to a given precision, with ties rounded to the even number.
str(object)                         Converts a value to a string. If converting from bytes, you may specify encoding and error behavior.

Arguments given in square brackets are optional.
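Here is a short sketch that exercises several of these functions in one place. The input values are arbitrary. Note that in Python 3, math.floor and math.ceil return integers when given a float.

```python
import cmath
import math

absolute = abs(-10)
floor_value = math.floor(32.9)
ceil_value = math.ceil(32.1)
print(absolute, floor_value, ceil_value)   # 10 32 33
print(type(floor_value).__name__)          # int

# Ties are rounded to the nearest even number.
rounded = (round(2.5), round(3.5))
print(rounded)                             # (2, 4)

# pow with two arguments, and with a modulus as the third.
power = pow(2, 10)
power_mod = pow(2, 10, 1000)
print(power, power_mod)                    # 1024 24

# math.sqrt handles non-negative numbers; cmath.sqrt also handles negative ones.
real_root = math.sqrt(9)
complex_root = cmath.sqrt(-1)
print(real_root, complex_root)             # 3.0 1j

# Converting strings to numbers.
total = int("42") + float("0.5")
print(total)                               # 42.5
```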

What Now?

Now that you know the basics of expressions, let’s move on to something a bit more advanced: data structures. Instead of dealing with simple values (such as numbers), you’ll see how to bunch them together in more complex structures, such as lists and dictionaries. In addition, you’ll take another close look at strings. In Chapter 5, you learn more about statements, and after that you’ll be ready to write some really nifty programs.

Footnotes

1 Hacking is not the same as cracking, which is a term describing computer crime. The two are often confused, and the usage is gradually changing. Hacking, as I’m using it here, basically means “having fun while programming.”

2 After all, no one expects the Spanish Inquisition . . .

3 The slightly less simple story is that the rules for identifier names are in part based on the Unicode standard, as documented in the Python Language Reference at https://docs.python.org/3/reference/lexical_analysis.html .

4 In case you’re wondering—yes, it does do something. It calculates the product of 2 and 2. However, the result isn’t kept anywhere or shown to the user; it has no side effects, beyond the calculation itself.

5 Note the quotes around storage. Values aren’t stored in variables—they’re stored in some murky depths of computer memory and are referred to by variables. As will become abundantly clear as you read on, more than one variable can refer to the same value.

6 Function calls can also be used as statements if you simply ignore the return value.

7 If you don’t understand this sentence, you should perhaps skip the section. You don’t really need it.

8 This behavior depends on your operating system and the installed Python interpreter. If you’ve saved the file using IDLE in macOS, for example, double-clicking the file will simply open it in the IDLE code editor.

9 Actually, str is a class, just like int. repr, however, is a function.

10 Raw strings can be especially useful when writing regular expressions. You learn more about those in Chapter 10.

11 This is an important method of compression in general, used for example in Huffman coding, a component of several modern compression tools.
