Hour 5
Data Processing with Numbers and Words

Without an in-depth knowledge of characters, strings, and numbers, you won’t be able to understand any programming language. This doesn’t mean that you must be a math whiz or an English major. You must, however, understand how languages such as Python represent and work with such data.

In this hour, you will learn more about how Python performs its mathematical calculations. In addition, you will learn how Python represents and works with strings. Along the way, you will discover functions and powerful built-in routines that perform many programming chores for you.

The highlights of this hour include the following:

Merging strings together

Understanding internal representation of strings

Mastering the programming operators

Digging into Python’s functions

Strings Revisited

Unlike many older programming languages, Python excels at supporting strings. Hardly any programming language before BASIC was created (in the 1960s) supported string data except in a rudimentary form. BASIC was created for beginners, but it offered advanced support for strings.

Even after BASIC appeared, programming languages were slow to adopt string support beyond a rudimentary level. C, one of the most used programming languages of the past 50 years, has no built-in string type. To work with strings in C (and in early versions of its successor, C++), programmers had to do some fancy footwork to get around the fact that the fundamental string data type was not supported.

Merging Strings

You can merge, or concatenate (programmers rarely use a simple word when a longer word will confuse more people), two strings together simply by placing a plus sign between them.
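As a minimal sketch of concatenation (the variable names and values here are made up for illustration), the plus sign joins two strings exactly as written, so any space you want between them must be supplied yourself:

```python
# Hypothetical example values; any two strings work the same way.
first_name = "Grace"
last_name = "Hopper"

# The plus sign joins the strings end to end; it adds no space on its own.
full_name = first_name + " " + last_name
print(full_name)  # Grace Hopper
```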

The ASCII Table

To fully understand strings, you must understand how your computer represents characters internally. A concept that will come into major play while you program in any language is the ASCII (pronounced ask-ee) table. Although the reason for the ASCII table and how to take advantage of your knowledge of the ASCII table will increase as you hone your programming skills, you should take a few moments for an introduction to this important character-based table.

Years ago, somebody wrote out the various combinations of binary 1s and 0s and assigned a unique character to each one. The table of characters was standardized and is known today as the ASCII table. ASCII stands for American Standard Code for Information Interchange. Table 5.1 shows a partial listing of the ASCII table. The original standard uses only the last 7 bits (called the 7-bit ASCII table, with 128 entries) and keeps the far-left bit off. Extended 8-bit tables use every combination from 00000000 to 11111111, for 256 entries in total. As you might surmise, a bit is a 1 or a 0. Eight bits form a byte, which holds one character in computer terminology, thanks to the ASCII table. A 7-bit ASCII table cannot represent as many different characters as an 8-bit table can.

TABLE 5.1 Every possible character has its own unique ASCII value

Character   ASCII Code   Decimal Equivalent
Space       00100000     32
0           00110000     48
1           00110001     49
2           00110010     50
3           00110011     51
9           00111001     57
?           00111111     63
A           01000001     65
B           01000010     66
C           01000011     67
a           01100001     97
b           01100010     98

Each ASCII value has a corresponding decimal number associated with it. These values are shown at the right of the 8-bit values in Table 5.1. Therefore, even though the computer represents the character ? as 00111111 (two off switches and six on switches), you can refer, through programming, to that ASCII value as 63, and your computer will know you mean 00111111. One of the advantages of high-level programming languages is that many of them let you use the easier (for people) decimal values, and the programming language converts the value to the 8-bit binary value used inside the computer.

As you can tell from the ASCII values in Table 5.1, every character in the computer, both uppercase and lowercase letters, and even the space, has its own unique ASCII value. The unique ASCII code is the only way that the computer has to differentiate characters.
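If you would like to check a few rows of Table 5.1 yourself, the short sketch below may help. It relies on Python's built-in ord() function, which is not covered in this hour, to look up the decimal value of a character, and on format() to display that value as 8 binary digits.

```python
# ord() returns the numeric code of a single character.
# format(value, "08b") shows that number as 8 binary digits.
for character in [" ", "0", "?", "A", "a"]:
    code = ord(character)
    bits = format(code, "08b")
    print(repr(character), bits, code)

# The question mark from the text: 63 in decimal, 00111111 in binary.
question_mark_bits = format(ord("?"), "08b")
print(question_mark_bits)  # 00111111
```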

Nearly every microcomputer uses the ASCII table. Large mainframe computers use a similar table called the EBCDIC (pronounced eb-se-dik) table. EBCDIC stands for Extended Binary Coded Decimal Interchange Code. The ASCII table is the fundamental storage representation of the text in the data and programs that your computer manages. A coding scheme called Unicode is now a popular alternative to ASCII. Unicode goes far beyond the 256-character limit of the ASCII or EBCDIC tables, taking into account languages such as Japanese, with its kanji, and other languages that require numerous characters. Unicode assigns a unique number to each of the many thousands of characters these languages need.

When you press the letter A on a keyboard, that A is not stored in your computer; rather, the ASCII value of the letter A is stored. As you can see from the ASCII values in Table 5.1, the letter A is represented as 01000001; this means that all but two of the eight switches are off in every byte of memory that holds a letter A.

Tip

The ASCII table is not very different from another type of coded table you may have heard of. Morse code is a table of representations for letters of the alphabet. Instead of 1s and 0s, the code uses combinations of dashes and dots to represent characters.

As Figure 5.2 shows, when you press the letter A on your keyboard, the A does not go into memory, but the ASCII value 01000001 does. The computer keeps that pattern of on and off switches in that memory location as long as the A is to remain there. As far as the user is concerned, the A is in memory as the letter A, but now you know exactly what happens.


FIGURE 5.2
The letter A is not an A after it leaves the keyboard.

As Figure 5.2 illustrates, when you print a document from your word processor, the computer prints each “character” stored in that memory location; the computer’s CPU sends the ASCII code for the letter A to the printer. Just before printing, the printer knows that it must make its output readable to people, so it looks up 01000001 in its own ASCII table and prints the A to paper. From the time the A left the keyboard until right before it printed, it was not an A at all but just a combination of eight 1s and 0s representing an A.

Performing Math with Python

Python performs mathematical calculations in the same way as most other programming languages. Therefore, when you understand the way Python calculates, you’ll understand how virtually every other computer language calculates. Table 5.2 lists the Python math operators with which you should familiarize yourself.

TABLE 5.2 Python math operators are simple

Operator Description
() Groups expressions together
** Exponent
*, /, % Multiplication, division, and modulus
+, - Addition and subtraction

The order of the operators in Table 5.2 is important. If more than one of these operators appears in an expression, Python doesn’t always calculate the values in a left-to-right order. In other words, the expression:

v = 5 + 2 * 3

stores the value 11 in v, not 21, as you might first guess. Python doesn’t perform calculations in a left-to-right order but rather in the order given in Table 5.2. Because multiplication appears before addition in the table, Python computes the 2 * 3 first, resulting in 6; it then computes the answer to 5 + 6 to get the result 11.

Tip

The order in which operators are evaluated is often called operator precedence. Just about every programming language computes expressions based on a precedence table. Different programming languages might use different operators from the ones shown in Table 5.2, although almost all of them use parentheses and the primary math operators (*, /, +, and -) in the same way as Python does.

Parentheses have the highest operator precedence. Any expression enclosed in parentheses is calculated before any other part of the expression. The statement:

v = (5 + 2) * 3

does assign the value 21 to v because the parentheses force the addition of 5 and 2 before its sum, 7, is multiplied by 3.
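The two statements above can be run side by side. This is only a small sketch; the variable names are chosen for illustration.

```python
# Without parentheses, multiplication happens before addition.
no_parentheses = 5 + 2 * 3      # 5 + 6

# Parentheses force the addition to happen first.
with_parentheses = (5 + 2) * 3  # 7 * 3

print(no_parentheses)    # 11
print(with_parentheses)  # 21
```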

Python also has an exponentiation operator, **, to raise a number to a particular power. Some programming languages do not have an exponent operator and instead rely on a built-in function (JavaScript, for example, relied on Math.pow() before it gained a ** operator of its own). If you use a ** b in a Python expression, the result is b copies of a multiplied together. For example, if you wanted to know 4 × 4 × 4, sometimes written as 4^3, you would use 4 ** 3 in Python, which would return the answer 64.

You can also raise a number to a fractional power with this operator. For example, the statement:

x = 81 ** 0.5

raises 81 to the one-half power, in effect taking the square root of 81. If this math is getting deep, have no fear; some people program in Python for years and never need to raise a number to a fractional power. But if you need to, you can thank Python for doing the heavy lifting for you.
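Here is a brief sketch of both uses of the ** operator described above, with illustrative variable names:

```python
# A whole-number power: 4 * 4 * 4
cube = 4 ** 3

# A fractional power: the one-half power is the square root.
root = 81 ** 0.5

print(cube)  # 64
print(root)  # 9.0
```

Notice that the fractional power produces a number with a decimal point, even though the answer is a whole number.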

The forward slash (/) divides one number into another. For example, the statement:

d = 3 / 2

assigns 1.5 to d.
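Table 5.2 also lists the modulus operator (%), which gives the remainder left over after a division. The sketch below shows it next to ordinary division; the numbers are arbitrary examples.

```python
# Division with / keeps the fractional part.
d = 3 / 2
print(d)  # 1.5

# Modulus with % keeps only the remainder:
# 7 divided by 3 is 2, with 1 left over.
remainder = 7 % 3
print(remainder)  # 1
```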

How Computers Really Do Math

At their lowest level, computers cannot subtract, multiply, or divide. Neither can calculators. The world’s largest and fastest supercomputers can only add—that’s it. They perform the addition at the bit level. Binary arithmetic is the only means by which any electronic digital-computing machine can perform arithmetic.

The computer makes you think it can perform all sorts of fancy calculations because it is lightning fast. The computer can only add, but it can do so very quickly.

Suppose you want a computer to add together seven 6s. If you ask the computer (through programming) to perform the calculation:

6 + 6 + 6 + 6 + 6 + 6 + 6

the computer returns the answer 42 immediately. The computer has no problem performing addition. The problems arise when you request that the computer perform another type of calculation, such as this one:

42 – 6 – 6 – 6 – 6 – 6 – 6 – 6

Because the computer can only add, it cannot do the subtraction. However (and this is where the catch comes in), the computer can negate numbers. That is, the computer can take the negative of a number. It can take the negative of 6 and represent (at the bit level) negative 6. After it has done that, it can add –6 to 42 and continue doing so seven times. In effect, the internal calculation becomes this:

42 + (–6) + (–6) + (–6) + (–6) + (–6) + (–6) + (–6)

Adding seven –6s produces the correct result, 0. In reality, the computer is not subtracting. At its bit level, the computer can convert a number to its negative through a process known as two’s complement. The two’s complement of a number is the negative of the number’s original value at the bit level. The computer has in its internal logic circuits the ability to rapidly convert a number to its two’s complement and then carry out the addition of negatives, thereby seemingly performing subtraction.

Note

Here’s another kind of two’s complement: That’s a very fine two you have there.

Because the computer can add and simulate subtraction (through successive addition of negatives), it can simulate multiplying and dividing. To multiply 6 by 7, the computer actually adds 6 together seven times and produces 42. Therefore:

6 * 7

becomes this:

6 + 6 + 6 + 6 + 6 + 6 + 6

To divide 42 by 7, the computer subtracts 7 from 42 (well, it adds the negative of 7 to 42) until it reaches 0 and counts the number of times (6) it took to reach 0, like this:

42 + (–7) + (–7) + (–7) + (–7) + (–7) + (–7)
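The idea can be sketched in Python. This is only an illustration of the repeated-addition approach described above, using made-up function names; it is not how Python's own * and / operators are implemented.

```python
def multiply_by_adding(value, times):
    """Multiply by adding value to a running total, times times."""
    total = 0
    for _ in range(times):
        total = total + value
    return total


def divide_by_adding_negatives(dividend, divisor):
    """Count how many times the negative of divisor can be added to
    dividend before it reaches 0. Assumes positive whole numbers
    that divide evenly, as in the 42 and 7 example."""
    count = 0
    while dividend > 0:
        dividend = dividend + (-divisor)
        count = count + 1
    return count


print(multiply_by_adding(6, 7))           # 42
print(divide_by_adding_negatives(42, 7))  # 6
```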

The computer represents numbers in a manner similar to characters. As Table 5.3 shows, numbers are easy to represent at the binary level. After numbers reach a certain limit (256 to be exact), the computer uses more than 1 byte to represent the number, taking as many memory locations as it needs to represent the extent of the number. The computer, after it is taught to add, subtract, multiply, and divide, can then perform any math necessary as long as a program is supplied to direct it.

TABLE 5.3 The numbers 0 through 20 can be represented by their binary equivalents

Number Binary Equivalent
0 00000000
1 00000001
2 00000010
3 00000011
4 00000100
5 00000101
6 00000110
7 00000111
8 00001000
9 00001001
10 00001010
11 00001011
12 00001100
13 00001101
14 00001110
15 00001111
16 00010000
17 00010001
18 00010010
19 00010011
20 00010100

Note

The binary numbers from 0 through 255 overlap the ASCII table values. That is, the binary representation for the letter A is 01000001, and the binary number for 65 is also 01000001. The computer knows by the context of how your programs use the memory location whether the value is the letter A or the number 65.

To see what goes on at the bit level, follow this example of what happens when you ask the computer to subtract 65 from 65. The result should be 0, and as you can see from the following steps, that is exactly the result at the binary level:

  1. Suppose you want the computer to calculate the following:

     65
    –65
  2. The binary representation for 65 is 01000001, and the two’s complement for 65 is 10111111 (which is –65 in computerese). Therefore, you are requesting that the computer perform this calculation:

      01000001
     +10111111
  3. Because a binary number cannot have the digit 2 (there are only 0s and 1s in binary), the computer carries 1 any time a calculation results in a value of 2; 1 + 1 equals 10 in binary. Although this can be confusing, you can make an analogy with decimal arithmetic. People work in a base-10 numbering system. Binary is known as base-2. In base 10, there is no single digit to represent ten; 9 + 1 is written as 10, reusing the digits 1 and 0. In the same way, the result of 1 + 1 in binary is 10, or “0 and carry 1 to the next column.” Working from the right, the first column holds 1 and 1, so you write 0 and carry 1. The next column holds 0 and 1, which would add up to 1, but you need to include the carried 1, so you are actually adding 1 + 0 + 1, which again is 10. So you place a 0 in that column and carry the 1 over to the next column. Continue this until you have added all 8 columns, as you can see in the answer below:

      01000001
     +10111111
     100000000
  4. Because the answer should fit within the same number of bits as the two original numbers (at least for this example—your computer may use more bits to represent numbers), the ninth bit is discarded, leaving the 0 result. This example shows that binary 65 plus binary negative 65 equals 0, as it should.
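The four steps above can be sketched in Python. The sketch assumes the usual recipe for forming a two's complement (flip every bit, then add 1) and limits everything to 8 bits; the names BITS_MASK and twos_complement are made up for this illustration.

```python
BITS_MASK = 0b11111111  # keeps only the lowest 8 bits of a number


def twos_complement(value):
    """Return the 8-bit two's complement: flip every bit, then add 1."""
    flipped = value ^ BITS_MASK
    return (flipped + 1) & BITS_MASK


positive = 65
negative = twos_complement(positive)

raw_sum = positive + negative  # the addition produces a ninth bit
result = raw_sum & BITS_MASK   # discard the ninth bit

print(format(positive, "08b"))  # 01000001
print(format(negative, "08b"))  # 10111111
print(format(raw_sum, "09b"))   # 100000000
print(format(result, "08b"))    # 00000000
```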

Tip

While many programming languages use the ASCII table (which you can see at http://www.AsciiTable.com/), Python uses the Unicode table (which you can see at https://en.wikipedia.org/wiki/List_of_Unicode_characters). The difference between the two is covered in the next section.

Using Unicode Characters

The ASCII and Unicode tables have a number of characters in common, specifically the first 128 characters, which correspond to the English letters, both lowercase and uppercase, as well as the 10 single digits and common characters and punctuation marks. For instance, the capital letter A is number 65. The lowercase a is 97. Since you can type letters, numbers, and some special characters on your keyboard, you rarely need the table for these. However, you cannot use the keyboard to type the Spanish Ñ or the cent sign (¢) under normal circumstances. For this reason, programmers soon expanded the ASCII table with an additional 128 characters, creating the extended ASCII table.

As mentioned earlier this hour, many programming languages allow you to produce these characters with built-in methods and functions. Python also has a built-in function, chr(), that returns these characters, but it uses Unicode instead of ASCII. What’s the difference between the two? With Unicode, you are not limited to 256 characters, so you can print significantly more characters. That means you can support more languages, as well as additional math and scientific symbols. (Again, see the Unicode character list mentioned in the preceding Tip for the possibilities.) You will probably never use the bulk of these characters, but it’s nice to have the option available.

Why talk about the ASCII table when Python uses Unicode? Because while the first 128 characters are the same (for example, lowercase a is character 97 in both), there are differences in the extended characters (for example, a lowercase n with a tilde over it is 164 in the extended ASCII table, but 164 is a currency symbol in Unicode, and the lowercase n with a tilde over it is 241 in Unicode).

Back to the chr() function. To use the function, you use this format:

chr(A number)

You would then either assign the result to a variable or print it directly. The following print() statement prints a capital N with a tilde over it, followed by a lowercase n, also with a tilde, and then the upside-down question mark that begins questions in Spanish:

print(chr(209),chr(241),chr(191))

These characters would come in handy if you decided to translate your website to Spanish. Here is the output:

Ñ ñ ¿
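The sketch below shows the other option mentioned above: assigning the result of chr() to a variable and using it inside a longer string. It also uses ord(), a built-in function not covered in this hour, to go back from a character to its number. The Spanish word is just an example.

```python
# Store the character in a variable instead of printing it directly.
enye = chr(241)

# Use it like any other string: "año" is Spanish for "year".
word = "a" + enye + "o"
print(word)

# ord() is the reverse of chr(): character in, number out.
print(ord(enye))  # 241

# The first 128 codes match the ASCII table.
print(chr(65), chr(97))  # A a
```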

The first 32 ASCII and Unicode codes (0 through 31) represent nonprinting characters. Nonprinting characters cause an action to be performed instead of producing characters.

Overview of Functions

Functions are built-in routines that manipulate numbers, strings, and output. You have already used several Python functions, including print(), input(), and chr(), but you will soon see that Python has a number of other valuable functions. You can accomplish some pretty cool goals with string functions, but before you do, you need to understand a bit about arrays, as they will help you make better use of Python’s available functions.

Tip

Other programming languages have built-in methods instead of functions. There are some technical differences between the two, but don’t worry about that at this point.

Understanding Arrays

Just as words and sentences are collections of single letters, strings are collections of characters, including words, spaces, letters, and any other special characters you choose to type or add using the chr() function discussed earlier in this hour. When you assign a string to a variable, such as:

hometown = "Salt Lake City, Utah"

Python considers this an array of characters and lets you access either the whole name or individual elements of that name. If you write a print() statement and send hometown to it:

print(hometown)

as you can probably guess, your output will be:

Salt Lake City, Utah

What is probably not obvious is what would happen if you wrote a print() statement and sent hometown[3] to it, as in this example:

print(hometown[3])

If you did this, you would get a single letter as output:

t

Now you may have figured out that the number in brackets refers to a specific letter or character in the string. (Good job if you did!) But even then, you might think, why not the letter l? After all, it is the third character in the name of the city. While that is true, Python and most other programming languages start their counting with 0 instead of 1. So the S is hometown[0], the a is hometown[1], the l is hometown[2], and the t is hometown[3]. You might be wondering what character corresponds to hometown[4]. It is the space, and the L is hometown[5]. When working with arrays, Python counts every character, including spaces. In this hometown variable, the spaces are at positions 4, 9, and 15 in the array, and the comma is at position 14.

This might seem confusing, but understanding arrays is important for almost every programming language. (It is more important with some than with others; C, for example, does not have a string data type, so to work with strings in C, you need to create an array of characters.) For now, just know that every single character in a string corresponds to a specific number and that if you’re counting characters to figure it out, you need to start with 0 and include all spaces, characters, letters, and numbers.
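To confirm the counting for yourself, you might try a sketch like the following. It uses enumerate(), a built-in function not covered in this hour, to pair each character with its position.

```python
hometown = "Salt Lake City, Utah"

print(hometown[0])        # S
print(hometown[3])        # t
print(repr(hometown[4]))  # ' ' (repr() makes the space visible)

# Walk through the string and report where the spaces and comma are.
for position, character in enumerate(hometown):
    if character == " " or character == ",":
        print(position, repr(character))
```

The loop reports positions 4, 9, 14, and 15, matching the counts given above.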

The next sections cover some of Python’s string and math functions.

String Functions

Once you’ve defined a string, you can access some built-in Python functions to alter your strings. This section does not cover all the string functions, just some of the more interesting ones. Feel free to explore online or use a Python tutorial to cover some of the others.

Making Strings All Capital or All Lowercase Letters

Two functions can be used to change the case of all the letters in a string. Any non-letter characters are just ignored. The two methods, lower() and upper(), create a new string and leave the original string untouched. Taking the previous example, if the variable hometown is equal to "Salt Lake City, Utah", the following two calls:

newtown1 = hometown.lower()
newtown2 = hometown.upper()

will result in newtown1 holding the string "salt lake city, utah" and newtown2 holding the string "SALT LAKE CITY, UTAH". The original string remains the same.

If you want to change the case of the original string permanently, just set the new string equal to the old name. For example, the statement:

hometown = hometown.upper()

permanently changes "Salt Lake City, Utah" to "SALT LAKE CITY, UTAH".

The string methods discussed in this section must always be called with a period (.) between the name of the string and the specific method.
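Putting the statements from this section together gives a short sketch you can run to watch the original string stay untouched until you reassign it:

```python
hometown = "Salt Lake City, Utah"

# lower() and upper() each build a new string.
newtown1 = hometown.lower()
newtown2 = hometown.upper()

print(newtown1)  # salt lake city, utah
print(newtown2)  # SALT LAKE CITY, UTAH
print(hometown)  # Salt Lake City, Utah (unchanged so far)

# Assigning the result back to the same name replaces the old value.
hometown = hometown.upper()
print(hometown)  # SALT LAKE CITY, UTAH
```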

Replacing Part of a String

Another valuable string function is one that lets you replace part of a string. Say the good folks of Utah decide to be more health-conscious and rename their capital Blue Lake City. You can use the replace() method to create a new string that switches words in a string. You can either make a new string, as demonstrated in the previous section, or overwrite your existing string. If hometown is still equal to "Salt Lake City, Utah", the code line:

hometown = hometown.replace("Salt", "Blue")

results in hometown now being equal to "Blue Lake City, Utah". You don’t have to replace entire words, either. For example, if hometown is set back to "Salt Lake City, Utah":

hometown = hometown.replace("lt", "lty")

results in hometown now being equal to "Salty Lake City, Utah". One additional note about replace(): it changes every instance it finds of the substring you are trying to replace. So if your hometown is "New York, New York", calling:

hometown = hometown.replace("New", "Old")

results in hometown being equal to "Old York, Old York". If you want to change New only in the city, and not in the state, you wouldn’t want to call replace() this way. These methods are powerful and useful, but you should experiment with them to ensure that you get the intended results.
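One way to experiment is shown in the sketch below. It also uses an optional third argument to replace(), which is not covered in this hour; that argument limits how many occurrences are changed, counting from the left.

```python
hometown = "New York, New York"

# With two arguments, every occurrence is replaced.
both = hometown.replace("New", "Old")
print(both)  # Old York, Old York

# With a third argument of 1, only the first occurrence is replaced.
first_only = hometown.replace("New", "Old", 1)
print(first_only)  # Old York, New York
```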

Other valuable string functions include len, which returns the number of characters in a string (so with len(hometown), if hometown is New York, New York, the function returns 18); index, which finds where a specific value occurs in a string for the first time; split, which breaks a string into several smaller strings (which is useful for breaking a sentence at spaces, so you get each individual word); and swapcase, which makes the capital letters lowercase and vice versa (for example, if you provided "Salt Lake City, Utah", the method would return "sALT lAKE cITY, uTAH" as the new string). Some of these may seem to have little value, but as you develop scripts and programs, you’ll find them more useful than you’d initially guess.
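The following sketch tries each of these on strings used elsewhere in this hour. Note that len() is called with the string inside the parentheses, whereas index(), split(), and swapcase() are called with a period after the string.

```python
hometown = "New York, New York"

# len() counts every character, including the comma and spaces.
length = len(hometown)
print(length)  # 18

# index() gives the position of the first match, counting from 0.
first_york = hometown.index("York")
print(first_york)  # 4

# split() with no argument breaks the string apart at spaces.
words = "Teach Yourself Beginning Programming".split()
print(words)  # ['Teach', 'Yourself', 'Beginning', 'Programming']

# swapcase() flips the case of every letter.
swapped = "Salt Lake City, Utah".swapcase()
print(swapped)  # sALT lAKE cITY, uTAH
```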

Numeric Functions

Several of the functions supplied by Python perform math routines, and this section introduces you to some of them. However, to run these functions, you need to first add a line of code to the programs that plan to use them:

import math

When Python sees the import statement, it looks for a module with the name included after the import. Then the collection of functions and methods in that module are available for you to use in your program. You might be wondering why you need to perform this step. Why aren’t the modules already included? It’s a way to make your programs run faster; if you are not going to use those math functions, then they shouldn’t be loaded into memory. Programmers are always looking to make their code run as fast as possible, and shaving time by not including too many built-in functions (or overhead) helps. You might not think it matters in the small programs you are writing, but when a robust program uses millions of lines of code (which some programs do), every shortcut can help.

Once you have imported the math module, you can run the functions it includes by typing math.NameOfTheFunction. One common numeric function is the math.floor() method. math.floor() returns the integer value of the number you put in the parentheses. If you put a decimal number inside the parentheses, math.floor() converts it to an integer. For example,

print(math.floor(8.93))

prints an 8 (the method’s return value) on the screen. math.floor() returns a value that is equal to or less than the argument in the parentheses. It does not round numbers up. If you would like to round off to the nearest integer, use the round() method. (This is going to seem a little confusing, but round() is called just as round; you do not call math.round().) If you were to create the following statement:

print(round(8.51))

you would get 9 as an answer.

Tip

With all math functions, you can use a variable or an expression as the function argument.

math.floor() and round() work for negative arguments as well. The following line of code:

print(math.floor(-7.6))

prints -8. This might surprise you until you learn the complete definition of math.floor(). It returns the highest integer that is less than or equal to the argument in parentheses. The highest integer less than or equal to –7.6 is –8.
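Here are the floor and round examples from this section gathered into one runnable sketch, along with one call that passes an expression as the argument. The price variable is a made-up example.

```python
import math

print(math.floor(8.93))  # 8
print(round(8.51))       # 9

# Negative arguments: floor moves toward the lower integer.
print(math.floor(-7.6))  # -8
print(round(-7.6))       # -8

# An expression works as an argument too: 19.99 * 2 is 39.98.
price = 19.99
doubled_floor = math.floor(price * 2)
print(doubled_floor)  # 39
```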

You don’t have to be an expert in math to use many of the mathematical functions available with Python. Often, even in business applications, the following method, which is not in the math module, comes in handy:

abs(numeric value)

The abs() method, which is the absolute value method, can be used in many programs. abs() returns the absolute value of its argument. The absolute value of a number is simply the positive representation of a positive or negative number. Whatever argument you pass to abs(), its positive value is returned. For example, the line of code:

print(abs(-5), abs(-5.75), abs(0), abs(5.7))

produces the following output:

5 5.75 0 5.7

The math module includes a lot of advanced math methods, including the following:

math.atan(numeric value)
math.cos(numeric value)
math.sin(numeric value)
math.tan(numeric value)
math.exp(numeric value)
math.log(numeric value)

These are probably some of the least-used functions in Python. Scientific and mathematical programmers are the ones most likely to need them. Thank goodness Python supplies these functions, so programmers don’t have to write their own versions.

The math.atan() function returns the arctangent of its argument, expressed in radians. The argument can be any number; it represents the ratio of the opposite side to the adjacent side of a right triangle (the tangent of an angle), and the result is the angle itself. If you’re familiar with trigonometry (and who isn’t, right?), you might know that the result of an arctangent calculation always falls between –π/2 and +π/2. math.cos() returns the cosine of its argument, which is an angle expressed in radians. math.sin() returns the sine of its argument, also an angle expressed in radians. math.tan() returns the tangent of its argument, again an angle expressed in radians.

Tip

If you need to pass an angle expressed in degrees to these functions, convert the angle to radians by multiplying it by (π /180). (π is approximately 3.141592654.)
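The conversion in this tip can be sketched as follows. The sketch uses math.pi, the math module's own value for π, and math.radians(), a function in the same module that performs the conversion for you; neither is covered in this hour. Because of the way computers store decimal numbers, the results are very close to, but not always exactly, the textbook values.

```python
import math

# Convert 30 degrees to radians by hand, as the tip describes.
angle_degrees = 30
angle_radians = angle_degrees * (math.pi / 180)

sine = math.sin(angle_radians)    # very close to 0.5
cosine = math.cos(angle_radians)  # very close to 0.866
print(round(sine, 4), round(cosine, 4))

# math.radians() does the same conversion.
tangent = math.tan(math.radians(45))  # very close to 1
print(round(tangent, 4))

# math.atan() goes the other way: from a ratio back to an angle.
angle_back = math.atan(1)  # very close to pi / 4
print(round(angle_back, 4))
```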

If you understand these trigonometric functions, you should have no trouble with math.exp() and math.log(). You use them the same way. If you do not understand these mathematical functions, that’s okay. Some people program in Python for years and never need them.

math.exp() returns the base of the natural logarithm (e) raised to a specified power. The argument to math.exp() can be any constant, variable, or expression up to roughly 709.78; beyond that, the result is too large for Python’s floating-point numbers, and Python raises an OverflowError. e is the mathematical constant whose value is approximately 2.718282. The following program shows some math.exp() statements:

print(math.exp(1))
print(math.exp(2))
print(math.exp(3))
print(math.exp(4))
print(math.exp(5))

Here is the output produced by these five print() statements:

2.718281828459045
7.38905609893065
20.085536923187668
54.598150033144236
148.4131591025766

Notice the first number; e raised to the first power does indeed equal itself.

math.log() returns the natural logarithm of the argument. The argument to math.log() can be any positive constant, variable, or expression. The following program line demonstrates the math.log() function in use:

print(math.log(3))

Here is the output:

1.0986122886681098
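math.exp() and math.log() undo each other, which the following sketch illustrates. It uses math.e, the math module's stored value of e, which is not covered in this hour. Expect answers that are extremely close to the whole numbers shown in the comments rather than guaranteed to match digit for digit.

```python
import math

# Raise e to the third power, then take the natural logarithm.
value = math.exp(3)
round_trip = math.log(value)
print(round_trip)  # very close to 3

# The natural logarithm of e itself is 1.
log_of_e = math.log(math.e)
print(log_of_e)  # very close to 1
```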

Summary

You now understand how Python calculations work. By utilizing the math operators and by understanding the math hierarchy, you will know how to compose your own calculations.

By understanding the ASCII and Unicode tables, you not only better understand how computers represent characters internally but also understand how to access those Unicode values by using the chr() method. Many of Python’s methods are universal; similar functions exist in many languages that you’ll learn throughout your programming career.

Q&A

Q. In Python, do I have to know all the values that I will assign to variables when I write a program?

A. Data comes from many sources. You will know some of the values that you can assign when you write your programs, but much of your program data will come from users or from data files.

Q. What kinds of data can variables hold?

A. Variables can hold many kinds of data, such as numbers, characters, and character strings. As you learn more about programming, you will see that numeric data comes in all formats and that to master a programming language well, you must also master the kinds of numeric data that are available. As you advance in your Python studies, you will see that Python can also create collections of data known as lists, tuples, and dictionaries.

Workshop

The quiz questions are provided for your further understanding.

Quiz

1. What is the result of the following expression?

(1 + 2) * 4 / 2

2. What is the result of the following expression?

(1 + (10 - (2 + 2)))

3. What is a function?

4. What is the output from the following print() statement?

print(math.floor(-5.6))

5. What is the output from the following print() statement?

print(abs(-2.1) + math.floor(45.1))

6. Write a print() call that replaces the “Beginning Programming” in “Teach Yourself Beginning Programming” with “Basic Python.”

7. Name three of the trigonometric functions mentioned in this hour.

8. What is the difference between round() and math.floor()?

9. What does the following statement print?

print(chr(65),chr(67),chr(69))

Answers

1. 6.0

2. 7

3. A function is a section of code designed to perform a specific task.

4. -6

5. 47.1

6.

BookName = 'Teach Yourself Beginning Programming'
print(BookName.replace('Beginning Programming','Basic Python'))

7. Three trigonometric functions are sin(), cos(), and tan().

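8. math.floor() returns the highest integer that is less than or equal to the number passed to it. round(), which is built in rather than part of the math module, returns the nearest integer: the integer value of the number if the decimal portion is less than .5 or the next highest integer if the decimal portion is more than .5. (When the decimal portion is exactly .5, Python 3 rounds to the nearest even integer.)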

9. ACE
