I.3 Some Fundamental Mathematical Definitions


The concepts discussed in this article occur throughout so much of modern mathematics that it would be inappropriate to discuss them in part III—they are too basic. Many later articles will assume at least some acquaintance with these concepts, so if you have not met them, then reading this article will help you to understand significantly more of the book.

1 The Main Number Systems

Almost always, the first mathematical concept that a child is exposed to is the idea of numbers, and numbers retain a central place in mathematics at all levels. However, it is not as easy as one might think to say what the word “number” means: the more mathematics one learns, the more uses of this word one comes to know, and the more sophisticated one’s concept of number becomes. This individual development parallels a historical development that took many centuries (see FROM NUMBERS TO NUMBER SYSTEMS [II.1]).

The modern view of numbers is that they are best regarded not individually but as parts of larger wholes, called number systems; the distinguishing features of number systems are the arithmetical operations—such as addition, multiplication, subtraction, division, and extraction of roots—that can be performed on them. This view of numbers is very fruitful and provides a springboard into abstract algebra. The rest of this section gives a brief description of the five main number systems.

1.1 The Natural Numbers

The natural numbers, otherwise known as the positive integers, are the numbers familiar even to young children: 1, 2, 3, 4, and so on. It is the natural numbers that we use for the very basic mathematical purpose of counting. The set of all natural numbers is usually denoted ℕ. (Some mathematicians prefer to include 0 as a natural number as well: for instance, this is the usual convention in logic and set theory. Both conventions are to be found in this book, but it should always be clear which one is being used.)

Of course, the phrase “1, 2, 3, 4, and so on” does not constitute a formal definition, but it does suggest the following basic picture of the natural numbers, one that we tend to take for granted.

(i) Given any natural number n there is another, n + 1, that comes next—known as the successor of n.

(ii) A list that starts with 1 and follows each number by its successor will include every natural number exactly once and nothing else.

This picture is encapsulated by THE PEANO AXIOMS [III.67].

Given two natural numbers m and n one can add them together or multiply them, obtaining in each case a new natural number. By contrast, subtraction and division are not always possible. If we want to give meaning to expressions such as 8 - 13 or 2/3, then we must work in a larger number system.

1.2 The Integers

The natural numbers are not the only whole numbers, since they do not include zero or negative numbers, both of which are indispensable to mathematics. One of the first reasons for introducing zero was that it is needed for the normal decimal notation of positive integers—how else could one conveniently write 1005? However, it is now thought of as much more than just a convenience, and the property that makes it significant is that it is an additive identity, which means that adding zero to any number leaves that number unchanged. And while it is not particularly interesting to do to a number something that has no effect, the property itself is interesting and distinguishes zero from all other numbers. An immediate illustration of this is that it allows us to think about negative numbers: if n is a positive integer, then the defining property of -n is that when you add it to n you get zero.

Somebody with little mathematical experience may unthinkingly assume that numbers are for counting and find negative numbers objectionable because the answer to a question beginning “How many” is never negative. However, simple counting is not the only use for numbers, and there are many situations that are naturally modeled by a number system that includes both positive and negative numbers. For example, negative numbers are sometimes used for the amount of money in a bank account, for temperature (in degrees Celsius or Fahrenheit), and for altitude compared with sea level.

The set of all integers—positive, negative, and zero—is usually denoted ℤ (for the German word “Zahlen,” meaning “numbers”). Within this system, subtraction is always possible: that is, if m and n are integers, then so is m - n.

1.3 The Rational Numbers

So far we have considered only whole numbers. If we form all possible fractions as well, then we obtain the rational numbers. The set of all rational numbers is denoted ℚ (for “quotients”).

One of the main uses of numbers besides counting is measurement, and most quantities that we measure are ones that can vary continuously, such as length, weight, temperature, and velocity. For these, whole numbers are inadequate.

A more theoretical justification for the rational numbers is that they form a number system in which division is always possible—except by zero. This fact, together with some basic properties of the arithmetical operations, means that ℚ is a field. What fields are and why they are important will be explained in more detail later (section 2.2).

1.4 The Real Numbers

A famous discovery of the ancient Greeks, often attributed, despite very inadequate evidence, to the school of PYTHAGORAS [VI.1], was that the square root of 2 is not a rational number. That is, there is no fraction p/q such that (p/q)² = 2. The Pythagorean theorem about right-angled triangles (which was probably known at least a thousand years before Pythagoras) tells us that if a square has sides of length 1, then the length of its diagonal is √2. Consequently, there are lengths that cannot be measured by rational numbers.

This argument seems to give strong practical reasons for extending our number system still further. However, such a conclusion can be resisted: after all, we cannot make any measurements with infinite precision, so in practice we round off to a certain number of decimal places, and as soon as we have done so we have presented our measurement as a rational number. (This point is discussed more fully in NUMERICAL ANALYSIS [IV.21].)

Nevertheless, the theoretical arguments for going beyond the rational numbers are irresistible. If we want to solve polynomial equations, take LOGARITHMS [III.25 §4], do trigonometry, or work with the GAUSSIAN DISTRIBUTION [III.71 §5], to give just four examples from an almost endless list, then irrational numbers will appear everywhere we look. They are not used directly for the purposes of measurement, but they are needed if we want to reason theoretically about the physical world by describing it mathematically. This necessarily involves a certain amount of idealization: it is far more convenient to say that the length of the diagonal of a unit square is √2 than it is to talk about what would be observed, and with what degree of certainty, if one tried to measure this length as accurately as possible.

The real numbers can be thought of as the set of all numbers with a finite or infinite decimal expansion. In the latter case, they are defined not directly but by a process of successive approximation. For example, the squares of the numbers 1, 1.4, 1.41, 1.414, 1.4142, 1.41421, . . . , get as close as you like to 2, if you go far enough along the sequence, which is what we mean by saying that the square root of 2 is the infinite decimal 1.41421. . . .
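
To see this process of successive approximation concretely, here is a minimal Python sketch (illustrative only): it squares the truncated decimal expansions listed above and watches the squares approach 2.

# Squares of successive decimal truncations of the square root of 2.
# Each truncation is a rational number; the squares get ever closer to 2.
approximations = ["1", "1.4", "1.41", "1.414", "1.4142", "1.41421"]

for a in approximations:
    x = float(a)
    print(f"{a:>8}  squared = {x * x:.10f}")

# The printed squares approach 2, which is what it means to say that the
# infinite decimal 1.41421... represents the square root of 2.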

The set of all real numbers is denoted ℝ. A more abstract view of ℝ is that it is an extension of the rational number system to a larger field, and in fact the only one possible in which processes of the above kind always give rise to numbers that themselves belong to ℝ.

Because real numbers are intimately connected with the idea of limits (of successive approximations), a true appreciation of the real number system depends on an understanding of mathematical analysis, which will be discussed in section 5.

1.5 The Complex Numbers

Many polynomial equations, such as the equation x² = 2, do not have rational solutions but can be solved in ℝ. However, there are many other equations that cannot be solved even in ℝ. The simplest example is the equation x² = -1, which has no real solution since the square of any real number is positive or zero. In order to get around this problem, mathematicians introduce a symbol, i, which they treat as a number, and they simply stipulate that i² is to be regarded as equal to -1. The complex number system, denoted ℂ, is the set of all numbers of the form a + bi, where a and b are real numbers. To add or multiply complex numbers, one treats i as a variable (like x, say), but any occurrences of i² are replaced by -1. Thus,

(a + bi) + (c + di) = (a + c) + (b + d)i

and

(a + bi)(c + di) = ac + bci + adi + bdi²

                             = (ac - bd) + (bc + ad)i.

There are several remarkable points to note about this definition. First, despite its apparently artificial nature, it does not lead to any inconsistency. Secondly, although complex numbers do not directly count or measure anything, they are immensely useful. Thirdly, and perhaps most surprisingly, even though the number i was introduced to help us solve just one equation, it in fact allows us to solve all polynomial equations. This is the famous FUNDAMENTAL THEOREM OF ALGEBRA [V.13].
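
As an illustration of the rules just given, here is a minimal Python sketch (illustrative only; the function names are chosen for this example) in which a complex number a + bi is stored as the pair (a, b) and i² is replaced by -1 exactly as in the formulas above.

def add(z, w):
    # (a + bi) + (c + di) = (a + c) + (b + d)i
    a, b = z
    c, d = w
    return (a + c, b + d)

def mul(z, w):
    # (a + bi)(c + di) = (ac - bd) + (bc + ad)i, using i*i = -1
    a, b = z
    c, d = w
    return (a * c - b * d, b * c + a * d)

print(add((1, 2), (3, -1)))   # (4, 1):  (1 + 2i) + (3 - i) = 4 + i
print(mul((1, 2), (3, -1)))   # (5, 5):  (1 + 2i)(3 - i) = 5 + 5i
print(mul((0, 1), (0, 1)))    # (-1, 0): i times i is -1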

One explanation for the utility of complex numbers is that they provide a concise way to talk about many aspects of geometry, via Argand diagrams. These represent complex numbers as points in the plane, the number a + bi corresponding to the point with coordinates (a, b). If r = √(a² + b²) and θ = tan⁻¹(b/a), then a = r cos θ and b = r sin θ. It turns out that multiplying a complex number z = x + yi by a + bi corresponds to the following geometrical process. First, you associate z with the point (x, y) in the plane. Next, you multiply this point by r, obtaining the point (rx, ry). Finally, you rotate this new point counterclockwise about the origin through an angle of θ. In other words, the effect on the complex plane of multiplication by a + bi is to dilate it by r and then rotate it by θ. In particular, if a² + b² = 1, then multiplying by a + bi corresponds to rotating by θ.

For this reason, polar coordinates are at least as good as Cartesian coordinates for representing complex numbers: an alternative way to write a + bi is re^(iθ), which tells us that the number has distance r from the origin and is positioned at an angle θ around from the positive part of the real axis (in a counterclockwise direction). If z = re^(iθ) with r > 0, then r is called the modulus of z, denoted by |z|, and θ is the argument of z. (Since adding 2π to θ does not change e^(iθ), it is usually understood that 0 ≤ θ < 2π, or sometimes that -π ≤ θ < π.) One final useful definition: if z = x + iy is a complex number, then its complex conjugate, written z̄, is the number x - yi. It is easy to check that zz̄ = x² + y² = |z|².
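
The geometric description of multiplication is easy to check numerically; the following sketch (illustrative only) uses Python's built-in complex arithmetic to convert a number to its modulus and argument and to confirm that multiplying by a number of modulus 1 rotates a point about the origin.

import cmath

z = 1 + 1j                          # the point (1, 1)
w = cmath.exp(1j * cmath.pi / 2)    # modulus 1, argument pi/2: a quarter turn

r, theta = cmath.polar(z)
print(r, theta)                     # about 1.41421 and 0.78540, i.e. sqrt(2) and pi/4

# Multiplying by w should rotate (1, 1) counterclockwise through 90 degrees,
# taking it (up to rounding error) to (-1, 1).
print(w * z)                        # approximately (-1 + 1j)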

2 Four Important Algebraic Structures

In the previous section it was emphasized that numbers are best thought of not as individual objects but as members of number systems. A number system consists of some objects (numbers) together with operations (such as addition and multiplication) that can be performed on those objects. As such, it is an example of an algebraic structure. However, there are many very important algebraic structures that are not number systems, and a few of them will be introduced here.

2.1 Groups

If S is a geometrical shape, then a rigid motion of S is a way of moving S in such a way that the distances between the points of S are not changed—squeezing and stretching are not allowed. A rigid motion is a symmetry of S if, after it is completed, S looks the same as it did before it moved. For example, if S is an equilateral triangle, then rotating S through 120° about its center is a symmetry; so is reflecting S about a line that passes through one of the vertices of S and the midpoint of the opposite side.

More formally, a symmetry of S is a function f from S to itself such that the distance between any two points x and y of S is the same as the distance between the transformed points f(x) and f(y).

This idea can be hugely generalized: if S is any mathematical structure, then a symmetry of S is a function from S to itself that preserves its structure. If S is a geometrical shape, then the mathematical structure that should be preserved is the distance between any two of its points. But there are many other mathematical structures that a function may be asked to preserve, most notably algebraic structures of the kind that will soon be discussed. It is fruitful to draw an analogy with the geometrical situation and regard any structure-preserving function as a sort of symmetry.

Because of its extreme generality, symmetry is an all-pervasive concept within mathematics; and wherever symmetries appear, structures known as groups follow close behind. To explain what these are and why they appear, let us return to the example of an equilateral triangle, which has, as it turns out, six possible symmetries.

Why is this? Well, let f be a symmetry of an equilateral triangle with vertices A, B, and C and suppose for convenience that this triangle has sides of length 1. Then f(A), f(B), and f(C) must be three points of the triangle and the distances between these points must all be 1. It follows that f(A), f(B), and f(C) are distinct vertices of the triangle, since the furthest apart any two points can be is 1 and this happens only when the two points are distinct vertices. So f(A), f(B), and f(C) are the vertices A, B, and C in some order. But the number of possible orders of A, B, and C is 6. It is not hard to show that, once we have chosen f(A), f(B), and f(C), the rest of what f does is completely determined. (For example, if X is the midpoint of A and C, then f(X) must be the midpoint of f(A) and f(C) since there is no other point at distance 1/2 from f(A) and f(C).)

Let us refer to these symmetries by writing down in order what happens to the vertices A, B, and C. So, for instance, the symmetry ACB is the one that leaves the vertex A fixed and exchanges B and C, which is achieved by reflecting the triangle in the line that joins A to the midpoint of B and C. There are three reflections like this: ACB, CBA, and BAC. There are also two rotations: BCA and CAB. Finally, there is the “trivial” symmetry, ABC, which leaves all points where they were originally. (The “trivial” symmetry is useful in much the same way as zero is useful for the algebra of integer addition.)

What makes these and other sets of symmetries into groups is that any two symmetries can be composed, meaning that one symmetry followed by another produces a third (since if two operations both preserve a structure then their combination clearly does too). For example, if we follow the reflection BAC by the reflection ACB, then we obtain the rotation CAB. To work this out, one can either draw a picture or use the following kind of reasoning: the first symmetry takes A to B and the second takes B to C, so the combination takes A to C, and similarly B goes to A, and C to B. Notice that the order in which we perform the symmetries matters: if we had started with the reflection ACB and then done the reflection BAC, then we would have obtained the rotation BCA. (If you try to see this by drawing a picture, it is important to think of A, B, and C as labels that stay where they are rather than moving with the triangle—they mark positions that the vertices can occupy.)

We can think of symmetries as “objects” in their own right, and of composition as an algebraic operation, a bit like addition or multiplication for numbers. The operation has the following useful properties: it is ASSOCIATIVE, the trivial symmetry is an IDENTITY ELEMENT, and every symmetry has an INVERSE [I.2 §2.4]. (For example, the inverse of a reflection is itself, since doing the same reflection twice leaves the triangle where it started.) More generally, any set with a binary operation that has these properties is called a group. It is not part of the definition of a group that the binary operation should be commutative, since, as we have just seen, if one is composing two symmetries then it often makes a difference which one goes first. However, if it is commutative then the group is called Abelian, after the Norwegian mathematician NIELS HENRIK ABEL [VI.33]. The number systems ℤ, ℚ, ℝ, and ℂ all form Abelian groups with the operation of addition, or under addition, as one usually says. If you remove zero from ℚ, ℝ, and ℂ, then they form Abelian groups under multiplication, but ℤ does not because of a lack of inverses: the reciprocal of an integer is not usually an integer. Further examples of groups will be given later in this section.

2.2 Fields

Although several number systems form groups, to regard them merely as groups is to ignore a great deal of their algebraic structure. In particular, whereas a group has just one binary operation, the standard number systems have two, namely addition and multiplication (from which further ones, such as subtraction and division, can be derived). The formal definition of a field is quite long: it is a set with two binary operations and there are several axioms that these operations must satisfy. Fortunately, there is an easy way to remember these axioms. You just write down all the basic properties you can think of that are satisfied by addition and multiplication in the number systems ℚ, ℝ, and ℂ.

These properties are as follows. Both addition and multiplication are commutative and associative, and both have identity elements (0 for addition and 1 for multiplication). Every element x has an additive inverse -x and a multiplicative inverse 1/x (except that 0 does not have a multiplicative inverse). It is the existence of these inverses that allows us to define subtraction and division: x - y means x + (-y) and x/y means x · (1/y).

That covers all the properties that addition and multiplication satisfy individually. However, a very general rule when defining mathematical structures is that if a definition splits into parts, then the definition as a whole will not be interesting unless those parts interact. Here our two parts are addition and multiplication, and the properties mentioned so far do not relate them in any way. But one final property, known as the distributive law, does this, and thereby gives fields their special character. This is the rule that tells us how to multiply out brackets: x(y+z) = xy+xz for any three numbers x, y, and z.

Having listed these properties, one may then view the whole situation abstractly by regarding the properties as axioms and saying that a field is any set with two binary operations that satisfy all those axioms. However, when one works in a field, one usually thinks of the axioms not as a list of statements but rather as a general license to do all the algebraic manipulations that one can do when talking about rational, real, and complex numbers.

Clearly, the more axioms one has, the harder it is to find a mathematical structure that satisfies them, and it is indeed the case that fields are harder to come by than groups. For this reason, the best way to understand fields is probably to concentrate on examples. In addition to ℚ, ℝ, and ℂ, one other field stands out as fundamental, namely 𝔽ₚ, which is the set of integers modulo a prime p, with addition and multiplication also defined modulo p (see MODULAR ARITHMETIC [III.58]).
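
The crucial point, that every nonzero element of 𝔽ₚ has a multiplicative inverse, can be checked directly for any particular prime; here is a small Python sketch (illustrative only) with p = 7.

p = 7   # any prime will do

# For each nonzero a in F_p, find the b with a * b = 1 (mod p).
for a in range(1, p):
    inverse = next(b for b in range(1, p) if (a * b) % p == 1)
    print(f"{a} * {inverse} = 1 (mod {p})")

# If p were not prime (say 6), this would fail: 2 * b mod 6 is always even,
# so 2 has no multiplicative inverse and the integers mod 6 do not form a field.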

What makes fields interesting, however, is not so much the existence of these basic examples as the fact that there is an important process of extension that allows one to build new fields out of old ones. The idea is to start with a field 𝔽, find a polynomial P that has no roots in 𝔽, and “adjoin” a new element to 𝔽 with the stipulation that it is a root of P. This produces an extended field 𝔽′, which consists of everything that one can produce from this root and from elements of 𝔽 using addition and multiplication.

We have already seen an important example of this process: in the field ℝ, the polynomial P(x) = x² + 1 has no root, so we adjoined the element i and let ℂ be the field of all combinations of the form a + bi.

We can apply exactly the same process to the field 𝔽₃, in which again the equation x² + 1 = 0 has no solution. If we do so, then we obtain a new field, which, like ℂ, consists of all combinations of the form a + bi, but now a and b belong to 𝔽₃. Since 𝔽₃ has three elements, this new field has nine elements. Another example is the field ℚ(√2), which consists of all numbers of the form a + b√2, where now a and b are rational numbers. A slightly more complicated example is ℚ(γ), where γ is a root of the polynomial x³ - x - 1. A typical element of this field has the form a + bγ + cγ², with a, b, and c rational. If one is doing arithmetic in ℚ(γ), then whenever γ³ appears, it can be replaced by γ + 1 (because γ³ - γ - 1 = 0), just as i² can be replaced by -1 in the complex numbers. For more on why field extensions are interesting, see the discussion of AUTOMORPHISMS in section 4.1.
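
Here is a minimal sketch (illustrative only; the names are chosen for this example) of the nine-element field just described: an element a + bi with a and b in 𝔽₃ is stored as the pair (a, b), and the arithmetic is exactly that of complex numbers except that the coordinates are reduced modulo 3.

p = 3

def add(z, w):
    return ((z[0] + w[0]) % p, (z[1] + w[1]) % p)

def mul(z, w):
    # (a + bi)(c + di) = (ac - bd) + (bc + ad)i, with coordinates mod 3
    a, b = z
    c, d = w
    return ((a * c - b * d) % p, (b * c + a * d) % p)

elements = [(a, b) for a in range(p) for b in range(p)]
print(len(elements))   # 9

# Every nonzero element has a multiplicative inverse, so this really is a field.
for z in elements:
    if z != (0, 0):
        assert any(mul(z, w) == (1, 0) for w in elements)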

A second very significant justification for introducing fields is that they can be used to form vector spaces, and it is to these that we now turn.

2.3 Vector Spaces

One of the most convenient ways to represent points in a plane that stretches out to infinity in all directions is to use Cartesian coordinates. One chooses an origin and two directions X and Y, usually at right angles to each other. Then the pair of numbers (a, b) stands for the point you reach in the plane if you go a distance a in direction X and a distance b in direction Y (where if a is a negative number such as -2, this is interpreted as going a distance +2 in the opposite direction to X, and similarly for b).

Another way of saying the same thing is this. Let x and y stand for the unit vectors in directions X and Y, respectively, so their Cartesian coordinates are (1, 0) and (0, 1). Then every point in the plane is a so-called linear combination ax + by of the basis vectors x and y. To interpret the expression ax + by, first rewrite it as a(1, 0) + b(0, 1). Then a times the unit vector (1, 0) is (a,0) and b times the unit vector (0, 1) is (0, b) and when you add (a,0) and (0, b) coordinate by coordinate you get the vector (a, b).

Here is another situation where linear combinations appear. Suppose you are presented with the differential equation (d²y/dx²) + y = 0, and happen to know (or notice) that y = sin x and y = cos x are two possible solutions. Then you can easily check that y = a sin x + b cos x is a solution for any pair of numbers a and b. That is, any linear combination of the existing solutions sin x and cos x is another solution. It turns out that all solutions are of this form, so we can regard sin x and cos x as “basis vectors” for the “space” of solutions of the differential equation.

Linear combinations occur in many, many contexts throughout mathematics. To give one more example, an arbitrary polynomial of degree 3 has the form ax³ + bx² + cx + d, which is a linear combination of the four basic polynomials 1, x, x², and x³.

A vector space is a mathematical structure in which the notion of linear combination makes sense. The objects that belong to the vector space are usually called vectors, unless we are talking about a specific example and are thinking of them as concrete objects such as polynomials or solutions of a differential equation. Slightly more formally, a vector space is a set V such that, given any two vectors υ and w (that is, elements of V) and any two real numbers a and b, we can form the linear combination aυ + bw.

Notice that this linear combination involves objects of two different kinds, the vectors υ and w and the numbers a and b. The latter are known as scalars. The operation of forming linear combinations can be broken up into two constituent parts: addition and scalar multiplication. To form the combination aυ + bw, first multiply the vectors υ and w by the scalars a and b, obtaining the vectors aυ and bw, and then add these resulting vectors to obtain the full combination aυ + bw.

These operations must obey certain natural rules. Addition of vectors must be commutative and associative, with an identity (the zero vector) and an inverse for each υ (written -υ). Scalar multiplication must obey a sort of associative law, namely that a(bυ) and (ab)υ are always equal. We also need two distributive laws: (a + b)υ = aυ + bυ and a(υ + w) = aυ + aw for any scalars a and b and any vectors υ and w.

Another context in which linear combinations arise, one that lies at the heart of the usefulness of vector spaces, is the solution of simultaneous equations. Suppose one is presented with the two equations 3x + 2y = 6 and x - y = 7. The usual way to solve such a pair of equations is to try to eliminate either x or y by adding an appropriate multiple of one of the equations to the other: that is, by taking a certain linear combination of the equations. In this case, we can eliminate y by adding twice the second equation to the first, obtaining the equation 5x = 20, which tells us that x = 4 and hence that y = -3. Why were we allowed to combine equations like this? Well, let us write L1 and R1 for the left- and right-hand sides of the first equation, and similarly L2 and R2 for the second. If, for some particular choice of x and y, it is true that L1 = R1 and L2 = R2, then clearly L1 + 2L2 = R1 + 2R2, as the two sides of this equation are merely giving different names to the same numbers.
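
The elimination step just described is easy to mechanize; the sketch below (illustrative only) stores each equation as a triple of coefficients and combines the two equations exactly as in the text.

# 3x + 2y = 6 and x - y = 7, stored as (coefficient of x, coefficient of y, right-hand side)
eq1 = (3, 2, 6)
eq2 = (1, -1, 7)

# Add twice the second equation to the first: the y-terms cancel.
combined = tuple(a + 2 * b for a, b in zip(eq1, eq2))
print(combined)               # (5, 0, 20), i.e. 5x = 20

x = combined[2] / combined[0]
y = x - 7                     # from the second equation, x - y = 7
print(x, y)                   # 4.0 and -3.0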

Given a vector space V, a basis is a collection of vectors υ1, υ2, . . . , υn with the following property: every vector in V can be written in exactly one way as a linear combination a1υ1 + a2υ2 + . . . + anυn. There are two ways in which this can fail: there may be a vector that cannot be written as a linear combination of υ1, υ2, . . . , υn or there may be a vector that can be so expressed, but in more than one way. If every vector is a linear combination then we say that the vectors υ1, υ2, . . . , υn span V, and if no vector is a linear combination in more than one way then we say that they are independent. An equivalent definition is that υ1, υ2, . . . , υn are independent if the only way of writing the zero vector as a1υ1 + a2υ2 + . . . + anυn is by taking a1 = a2 = . . . = an = 0.

The number of elements in a basis is called the dimension of V. It is not immediately obvious that there could not be two bases of different sizes, but it turns out that there cannot, so the concept of dimension makes sense. For the plane, the vectors x and y defined earlier formed a basis, so the plane, as one would hope, has dimension 2. If we were to take more than two vectors, then they would no longer be independent: for example, if we take the vectors (1, 2), (1, 3), and (3, 1), then we can write (0, 0) as the linear combination 8(1, 2) - 5(1, 3) - (3, 1). (To work this out one must solve some simultaneous equations—this is typical of calculations in vector spaces.)
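
The dependence just mentioned is easy to verify; this short sketch (illustrative only) checks that 8(1, 2) - 5(1, 3) - (3, 1) really is the zero vector.

def scale(a, v):
    return tuple(a * x for x in v)

def add(v, w):
    return tuple(x + y for x, y in zip(v, w))

combination = add(add(scale(8, (1, 2)), scale(-5, (1, 3))), scale(-1, (3, 1)))
print(combination)   # (0, 0), so the three vectors are not independent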

The most obvious n-dimensional vector space is the space of all sequences (x1, . . . , xn) of n real numbers. To add this to a sequence (y1, . . . , yn) one simply forms the sequence (x1 + y1, . . . , xn + yn) and to multiply it by a scalar c one forms the sequence (cx1, . . . , cxn). This vector space is denoted ℝⁿ. Thus, the plane with its usual coordinate system is ℝ² and three-dimensional space is ℝ³.

It is not in fact necessary for the number of vectors in a basis to be finite. A vector space that does not have a finite basis is called infinite dimensional. This is not an exotic property: many of the most important vector spaces, particularly spaces where the “vectors” are functions, are infinite dimensional.

There is one final remark to make about scalars. They were defined earlier as real numbers that one uses to make linear combinations of vectors. But it turns out that the calculations one does with scalars, in particular solving simultaneous equations, can all be done in a more general context. What matters is that they should belong to a field, so ℚ, ℝ, and ℂ can all be used as systems of scalars, as indeed can more general fields. If the scalars for a vector space V come from a field 𝔽, then one says that V is a vector space over 𝔽. This generalization is important and useful: see, for example, ALGEBRAIC NUMBERS [IV.1 §17].

2.4 Rings

Another algebraic structure that is very important is a ring. Rings are not quite as central to mathematics as groups, fields, or vector spaces, so a proper discussion of them will be deferred to RINGS, IDEALS, AND MODULES [III.81]. However, roughly speaking, a ring is an algebraic structure that has most, but not necessarily all, of the properties of a field. In particular, the requirements of the multiplicative operation are less strict. The most important relaxation is that nonzero elements of a ring are not required to have multiplicative inverses; but sometimes multiplication is not even required to be commutative. If it is, then the ring itself is said to be commutative—a typical example of a commutative ring is the set ℤ of all integers. Another is the set of all polynomials with coefficients in some field 𝔽.

3 Creating New Structures Out of Old Ones

An important first step in understanding the definition of some mathematical structure is to have a supply of examples. Without examples, a definition is dry and abstract. With them, one begins to have a feeling for the structure that its definition alone cannot usually provide.

One reason for this is that it makes it much easier to answer basic questions. If you have a general statement about structures of a given type and want to know whether it is true, then it is very helpful if you can test it in a wide range of particular cases. If it passes all the tests, then you have some evidence in favor of the statement. If you are lucky, you may even be able to see why it is true; alternatively, you may find that the statement is true for each example you try, but always for reasons that depend on particular features of the example you are examining. Then you will know that you should try to avoid these features if you want to find a counterexample. If you do find a counterexample, then the general statement is false, but it may still happen that a modification to the statement is true and useful. In that case, the counterexample will help you to find an appropriate modification.

The moral, then, is that examples are important. So how does one find them? There are two completely different approaches. One is to build them from scratch. For example, one might define a group G to be the group of all symmetries of an icosahedron. Another, which is the main topic of this section, is to take some examples that have already been constructed and build new ones out of them. For instance, the group ℤ², which consists of all pairs of integers (x, y), with addition defined by the obvious rule (x, y) + (x′, y′) = (x + x′, y + y′), is a “product” of two copies of the group ℤ. As we shall see, this notion of product is very general and can be applied in many other contexts. But first let us look at an even more basic method of finding new examples.

3.1 Substructures

As we saw earlier, the set ℂ of all complex numbers, with the operations of addition and multiplication, forms one of the most basic examples of a field. It also contains many subfields: that is, subsets that themselves form fields. Take, for example, the set ℚ(i) of all complex numbers of the form a + bi for which a and b are rational. This is a subset of ℂ and is also a field. To show this, one must prove that ℚ(i) is closed under addition, multiplication, and the taking of inverses. That is, if z and w are elements of ℚ(i), then z + w and zw must be as well, as must -z and 1/z (this last requirement applying only when z ≠ 0). Axioms such as the commutativity and associativity of addition and multiplication are then true in ℚ(i) for the simple reason that they are true in the larger set ℂ.

Even though ℚ(i) is contained in ℂ, it is a more interesting field in some important ways. But how can this be? Surely, one might think, an object cannot become more interesting when most of it is taken away. But a moment’s further thought shows that it certainly can: for example, the set of all prime numbers contains fascinating mysteries of a kind that one does not expect to encounter in the set of all positive integers. As for fields, THE FUNDAMENTAL THEOREM OF ALGEBRA [V.13] tells us that every polynomial equation has a solution in ℂ. This is very definitely not true in ℚ(i). So in ℚ(i), and in many other fields of a similar kind, we can ask which polynomial equations have solutions. This turns out to be a deep and important question that simply does not arise in the larger field ℂ.

In general, given an example X of an algebraic structure, a substructure of X is a subset Y that has relevant closure properties. For instance, groups have subgroups, vector spaces have subspaces, rings have subrings (and also IDEALS [III.81]), and so on. If the property defining the substructure Y is a sufficiently interesting one, then Y may well be significantly different from X and may therefore be a useful addition to one’s stock of examples.

This discussion has focused on algebra, but interesting substructures abound in analysis and geometry as well. For example, the plane ℝ² is not a particularly interesting set, but it has subsets, such as the MANDELBROT SET [IV.14 §2.8], to give just one example, that are still far from fully understood.

3.2 Products

Let G and H be two groups. The product group G × H has as its elements all pairs of the form (g, h) such that g belongs to G and h belongs to H. This definition shows how to build the elements of G × H out of the elements of G and the elements of H. But to define a group we need to do more: we are given binary operations on G and H and we must use them to build a binary operation on G × H. If g1 and g2 are elements of G, let us write g1g2 for the result of applying G’s binary operation to them, as is customary, and let us do the same for H. Then there is an obvious binary operation we can define on the pairs, namely

(g1, h1)(g2, h2) = (g1g2, h1h2).

That is, one applies the binary operation from G to the first coordinate and the binary operation from H to the second.
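
As a concrete instance of this construction (an illustrative sketch, not from the text), take G to be the integers modulo 2 and H the integers modulo 3, both under addition; the coordinatewise operation defined above makes the six pairs into a group.

def op(pair1, pair2):
    # apply G's operation to the first coordinates and H's operation to the second
    (g1, h1), (g2, h2) = pair1, pair2
    return ((g1 + g2) % 2, (h1 + h2) % 3)

elements = [(g, h) for g in range(2) for h in range(3)]
print(len(elements))         # 6 elements in the product group
print(op((1, 2), (1, 2)))    # (0, 1)
print(op((0, 0), (1, 2)))    # (1, 2): (0, 0) is the identity element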

One can form products of vector spaces in a very similar way. If V and W are two vector spaces, then the elements of V × W are all pairs of the form (v, w) with v in V and w in W. Addition and scalar multiplication are defined by the formulas

(v1, w1) + (v2, w2) = (v1 + v2, w1 + w2)

and

λ(v, w) = (λv, λw).

The dimension of the resulting space is the sum of the dimensions of V and W. (It is actually more usual to denote this space by V ⊕ W and call it the direct sum of V and W. Nevertheless, it is a product construction.)

It is not always possible to define product structures in this simple way. For example, if 𝔽 and 𝔽′ are two fields, we might be tempted to define a “product field” 𝔽 × 𝔽′ using the formulas

(x1, y1) + (x2, y2) = (x1 + x2, y1 + y2)

and

(x1, y1)(x2, y2) = (x1x2, y1y2).

However, this definition does not give us a field. Most of the axioms hold, including the existence of additive and multiplicative identities—they are (0, 0) and (1, 1), respectively—but the nonzero element (1, 0) does not have a multiplicative inverse, since the product of (1, 0) and (x,y) is (x, 0), which can never equal (1, 1).

Occasionally we can define more complicated binary operations that do make the set 𝔽 × 𝔽′ into a field. For instance, if 𝔽 = 𝔽′ = ℝ, then we can define addition as above but define multiplication in a less obvious way as follows:

(x1, y1)(x2, y2) = (x1x2 - y1y2, x1y2 + x2y1).

Then we obtain ℂ, the field of complex numbers, since the pair (x, y) can be identified with the complex number x + iy. However, this is not a product field in the general sense we are discussing.

Returning to groups, what we defined earlier was the direct product of G and H. However, there are other, more complicated products of groups, which can be used to give a much richer supply of examples. To illustrate this, let us consider the dihedral group D4, which is the group of all symmetries of a square, of which there are eight. If we let R stand for one of the reflections and T for a counterclockwise quarter turn, then every symmetry can be written in the form TⁱRʲ, where i is 0, 1, 2, or 3 and j is 0 or 1. (Geometrically, this says that you can produce any symmetry by either rotating through a multiple of 90° or reflecting and then rotating.)

This suggests that we might be able to regard D4 as a product of the group {I, T, T², T³}, consisting of four rotations, with the group {I, R}, consisting of the identity I and the reflection R. We could even write (Tⁱ, Rʲ) instead of TⁱRʲ. However, we have to be careful. For instance, (TR)(TR) does not equal T²R² = T² but I. The correct rule for multiplication can be deduced from the fact that RTR = T⁻¹ (which in geometrical terms is saying that if you reflect the square, rotate it counterclockwise through 90°, and reflect back, then the result is a clockwise rotation through 90°). It turns out to be

(Tⁱ, Rʲ)(Tᵏ, Rˡ) = (Tⁱ⁺⁽⁻¹⁾ʲᵏ, Rʲ⁺ˡ),

where the exponent of T is taken modulo 4 and the exponent of R modulo 2.

For example, the product of (T, R) with (T³, R) is T⁻²R², which equals T².

This is a simple example of a “semidirect product” of two groups. In general, given two groups G and H, there may be several interesting ways of defining a binary operation on the set of pairs (g, h), and therefore several potentially interesting new groups.
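
A minimal Python sketch of this particular product (illustrative only; the encoding of an element as a pair of exponents is chosen for the example) implements the multiplication rule above and reproduces the computations in the text.

# An element T^i R^j of the dihedral group D4 is stored as the pair (i, j),
# with i taken modulo 4 and j modulo 2.
def multiply(x, y):
    i, j = x
    k, l = y
    # (T^i, R^j)(T^k, R^l) = (T^(i + (-1)^j k), R^(j + l)),
    # which encodes the relation R T R = T^(-1)
    return ((i + (-1) ** j * k) % 4, (j + l) % 2)

T, R = (1, 0), (0, 1)

print(multiply((1, 1), (3, 1)))                    # (2, 0): the product of (T, R) and (T^3, R) is T^2
print(multiply(multiply(R, T), R))                 # (3, 0): R T R = T^3 = T^(-1)
print(multiply(multiply(T, R), multiply(T, R)))    # (0, 0): (TR)(TR) is the identity, not T^2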

3.3 Quotients

Let us write ℚ[x] for the set of all polynomials in the variable x with rational coefficients: that is, expressions like 2x⁴ - (3/2)x + 6. Any two such polynomials can be added, subtracted, or multiplied together and the result will be another polynomial. This makes ℚ[x] into a commutative ring, but not a field, because if you divide one polynomial by another then the result is not (necessarily) a polynomial.

We will now convert ℚ[x] into a field in what may at first seem a rather strange way: by regarding the polynomial x³ - x - 1 as “equivalent” to the zero polynomial. To put this another way, whenever a polynomial involves x³ we will allow ourselves to replace x³ by x + 1, and we will regard the new polynomial that results as equivalent to the old one. For example, writing “∼” for “is equivalent to”:

x⁵ = x³x² ∼ (x + 1)x² = x³ + x²

                        ∼ x + 1 + x² = x² + x + 1.

Notice that in this way we can convert any polynomial into one of degree at most 2, since whenever the degree is higher, you can reduce it by taking out x³ from the term of highest degree and replacing it by x + 1, just as we did above.

Notice also that whenever we do such a replacement, the difference between the old polynomial and the new one is a multiple of x³ - x - 1. For example, when we replaced x³x² by (x + 1)x² the difference was (x³ - x - 1)x². Therefore, what our process amounts to is this: two polynomials are equivalent if and only if their difference is a multiple of the polynomial x³ - x - 1.

Now the reason ℚ[x] was not a field was that nonconstant polynomials do not have multiplicative inverses. For example, it is obvious that one cannot multiply x² by a polynomial and obtain the polynomial 1. However, we can obtain a polynomial that is equivalent to 1 if we multiply by 1 + x - x². Indeed, the product of the two is

x² + x³ - x⁴ ∼ x² + x + 1 - (x + 1)x = 1.

It turns out that all polynomials that are not equivalent to zero (that is, are not multiples of x³ - x - 1) have multiplicative inverses in this generalized sense. (To find an inverse for a polynomial P one applies the generalized EUCLID ALGORITHM [III.22] to find polynomials Q and R such that PQ + R(x³ - x - 1) = 1. The reason we obtain 1 on the right-hand side is that x³ - x - 1 cannot be factorized in ℚ[x] and P is not a multiple of x³ - x - 1, so their highest common factor is 1. The inverse of P is then Q.)
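
The replacement process is simple enough to automate. In the sketch below (illustrative only; the inverse is checked rather than computed, and the Euclid algorithm step is not shown), a polynomial is stored as its list of coefficients and reduced modulo x³ - x - 1 by repeatedly replacing x³ with x + 1.

from fractions import Fraction

def reduce_mod(poly):
    # poly[k] is the coefficient of x^k; repeatedly replace x^3 by x + 1.
    poly = [Fraction(c) for c in poly]
    while len(poly) > 3:
        c = poly.pop()      # leading coefficient, of x^d with d = len(poly) after the pop
        d = len(poly)
        poly[d - 2] += c    # c * x^d is replaced by c * x^(d-2) + c * x^(d-3)
        poly[d - 3] += c
    while len(poly) > 1 and poly[-1] == 0:
        poly.pop()
    return poly

def multiply(p, q):
    result = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result

# x^2 times (1 + x - x^2) reduces to the constant polynomial 1:
print(reduce_mod(multiply([0, 0, 1], [1, 1, -1])))   # [Fraction(1, 1)]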

In what sense does this mean that we have a field? After all, the product of x² and 1 + x - x² was not 1: it was merely equivalent to 1. This is where the notion of quotients comes in. We simply decide that when two polynomials are equivalent, we will regard them as equal, and we denote the resulting mathematical structure by ℚ[x]/(x³ - x - 1). This structure turns out to be a field, and it turns out to be important as the smallest field that contains ℚ and also has a root of the polynomial X³ - X - 1. What is this root? It is simply x. This is a slightly subtle point because we are now thinking of polynomials in two different ways: as elements of ℚ[x]/(x³ - x - 1) (at least when equivalent ones are regarded as equal), and also as functions defined on ℚ[x]/(x³ - x - 1). So the polynomial X³ - X - 1 is not the zero polynomial, since for example it takes the value 5 when X = 2 and the value x⁶ - x² - 1 ∼ (x + 1)² - x² - 1 = 2x when X = x².

You may have noticed a strong similarity between the discussion of the field ℚ[x]/(x³ - x - 1) and the discussion of the field ℚ(γ) at the end of section 2.2. And indeed, this is no coincidence: they are two different ways of describing the same field. However, thinking of the field as ℚ[x]/(x³ - x - 1) brings significant advantages, as it converts questions about a mysterious set of complex numbers into more approachable questions about polynomials.

What does it mean to “regard two mathematical objects as equal” when they are not equal? A formal answer to this question uses the notion of equivalence relations and equivalence classes (discussed in THE LANGUAGE AND GRAMMAR OF MATHEMATICS [I.2 §2.3]): one says that the elements of ℚ[x]/(x³ - x - 1) are not in fact polynomials but equivalence classes of polynomials. However, to understand the notion of a quotient it is much easier to look at an example with which we are all familiar, namely the set ℚ of rational numbers. If we are trying to explain carefully what a rational number is, then we may start by saying that a typical rational number has the form a/b, where a and b are integers and b is not 0. And it is possible to define the set of rational numbers to be the set of all such expressions, with the rules

a/b + c/d = (ad + bc)/bd

and

(a/b)(c/d) = ac/bd.

However, there is one very important further remark we must make, which is that we do not regard all such expressions as different: for example, 1/2 and 2/4 are supposed to be the same rational number. So we define two expressions a/b and c/d to be equivalent if ad = bc and we regard equivalent expressions as denoting the same number. Notice that the expressions can be genuinely different, but we think of them as denoting the same object.

If we do this, then we must be careful whenever we define functions and binary operations. For example, suppose we tried to define a binary operation “⊕” on ℚ by the natural-looking formula

a/b ⊕ c/d = (a + c)/(b + d).

This definition turns out to have a very serious flaw. To see why, let us apply it to the fractions 1/2 and 1/3. Then it gives us the answer 2/5. Now let us replace 1/2 by the equivalent fraction 2/4 and apply the formula again. This time it gives us the answer 3/7, which is different. Thus, although the formula defines a perfectly good binary operation on the set of expressions of the form a/b, it does not make any sense as a binary operation on the set of rational numbers.
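
The failure can be seen in two lines (an illustrative sketch using the fractions above): feeding in equivalent inputs produces inequivalent outputs.

def circled_plus(frac1, frac2):
    # the natural-looking but flawed rule: a/b "+" c/d = (a + c)/(b + d)
    a, b = frac1
    c, d = frac2
    return (a + c, b + d)

print(circled_plus((1, 2), (1, 3)))   # (2, 5)
print(circled_plus((2, 4), (1, 3)))   # (3, 7), although 2/4 denotes the same
                                      # rational number as 1/2 and 3/7 does not
                                      # denote the same rational number as 2/5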

In general, it is essential to check that if you put equivalent objects in then you get equivalent objects out. For example, when defining addition and multiplication for the field ℚ[x]/(x³ - x - 1), one must check that if P and P′ differ by a multiple of x³ - x - 1, and Q and Q′ also differ by a multiple of x³ - x - 1, then so do P + Q and P′ + Q′, and so do PQ and P′Q′. This is an easy exercise.

An important example of a quotient construction is that of a quotient group. If G is a group and H is a subgroup of G, then it is natural to try to do what we did for polynomials and define g1 and g2 to be equivalent if g1⁻¹g2 (the obvious notion of the “difference” between g1 and g2) belongs to H. The equivalence class of an element g is easily seen to be the set of all elements gh such that h ∈ H, which is usually written gH. (It is called a left coset of H.)

There is a natural candidate for a binary operation * on the set of all left cosets: g1H * g2H = g1g2H. In other words, given two left cosets, pick elements g1 and g2 from each, form the product g1g2, and take the left coset g1g2H. Once again, it is important to check that if you pick different elements from the original cosets, then you will still get the coset g1g2H. It turns out that this is not always the case: one needs the additional assumption that H is a normal subgroup, which means that if h is any element of H, then ghg⁻¹ is an element of H for every element g of G. Elements of the form ghg⁻¹ are called conjugates of h; thus, a normal subgroup is a subgroup that is “closed under conjugation.”

If H is a normal subgroup, then the set of left cosets forms a group under the binary operation just defined. This group is written G/H and is called the quotient of G by H. One can regard G as a product of H and G/H (though it may be a somewhat complicated product), so if you understand both H and G/H, then for many purposes you understand G. Therefore, groups G that do not have normal subgroups (other than G itself and the subgroup that consists of just the identity element) have a special role, a bit like the role of prime numbers in number theory. They are called simple groups. (See THE CLASSIFICATION OF FINITE SIMPLE GROUPS [V.7].)
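
For a tiny worked example (illustrative only), take G to be the integers modulo 6 under addition and H the subgroup {0, 3}; since G is Abelian, H is automatically normal, and the three cosets form a quotient group of order 3.

G = range(6)     # the integers mod 6 under addition
H = {0, 3}       # a subgroup (and, since G is Abelian, a normal one)

def coset(g):
    # the coset of g, written additively: g + H
    return frozenset((g + h) % 6 for h in H)

print(len({coset(g) for g in G}))   # 3: the quotient group G/H has three elements

# The operation on cosets is well-defined: choosing different representatives
# of the same two cosets always yields the same answer.
for g1 in G:
    for g2 in G:
        for a in coset(g1):
            for b in coset(g2):
                assert coset(a + b) == coset(g1 + g2)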

Why is the word “quotient” used? Well, a quotient is normally what you get when you divide one number by another, so to understand the analogy let us think about dividing 21 by 3. We can think of this as dividing up twenty-one objects into sets of three objects each and asking how many sets we get. This can be described in terms of equivalence as follows. Let us call two objects equivalent if they belong to the same one of the seven sets. Then there can be at most seven inequivalent objects. So when we regard equivalent objects as the same, we “divide out by the equivalence,” obtaining a “quotient set” that has seven elements.

A rather different use of quotients leads to an elegant definition of the mathematical shape known as a torus: that is, the shape of the surface of a doughnut (of the kind that has a hole). We start with the plane, ℝ², and define two points (x, y) and (x′, y′) to be equivalent if x - x′ and y - y′ are both integers. Suppose that we regard any two equivalent points as the same and that we start at a point (x, y) and move right until we reach the point (x + 1, y). This point is “the same” as (x, y), since the difference is (1, 0). Therefore, it is as though the entire plane has been wrapped around a vertical cylinder of circumference 1 and we have gone around this cylinder once. If we now apply the same argument to the y-coordinate, noting that (x, y) is always “the same” point as (x, y + 1), then we find that this cylinder is itself “folded around” so that if you go “upwards” by a distance of 1 then you get back to where you started. But that is what a torus is: a cylinder that is folded back into itself. (This is not the only way of defining a torus, however. For example, it can be defined as the product of two circles.)

Many other important objects in modern geometry are defined using quotients. It often happens that the object one starts with is extremely big, but that at the same time the equivalence relation is very generous, in the sense that it is easy for one object to be equivalent to another. In that case the number of “genuinely distinct” objects can be quite small. This is a rather loose way of talking, since it is not really the number of distinct objects that is interesting so much as the complexity of the set of these objects. It might be better to say that one often starts with a hopelessly large and complicated structure but “divides out most of the mess” and ends up with a quotient object that has a structure that is simple enough to be manageable while still conveying important information. Good examples of this are the FUNDAMENTAL GROUP [IV.6 §2] and the HOMOLOGY AND COHOMOLOGY GROUPS [IV.6 §4] of a topological space; an even better example is the notion of a MODULI SPACE [IV.8].

Many people find the idea of a quotient somewhat difficult to grasp, but it is of major importance throughout mathematics, which is why it has been discussed at some length here.

4 Functions between Algebraic Structures

One rule with almost no exceptions is that mathematical structures are not studied in isolation: as well as the structures themselves one looks at certain functions defined on those structures. In this section we shall see which functions are worth considering, and why. (For a discussion of functions in general, see THE LANGUAGE AND GRAMMAR OF MATHEMATICS [I.2 §2.2].)

4.1 Homomorphisms, Isomorphisms, and Automorphisms

If X and Y are two examples of a particular mathematical structure, such as a group, field, or vector space, then, as was suggested in the discussion of symmetry in section 2.1, there is a class of functions from X to Y of particular interest, namely the functions that “preserve the structure.” Roughly speaking, a function f : X → Y is said to preserve the structure of X if, given any relationship between elements of X that is expressed in terms of that structure, there is a corresponding relationship between the images of those elements that is expressed in terms of the structure of Y. For example, if X and Y are groups and a, b, and c are elements of X such that ab = c, then, if f is to preserve the algebraic structure of X, f(a)f(b) must equal f(c) in Y. (Here, as is usual, we are using the same notation for the binary operations that make X and Y groups as is normally used for multiplication.) Similarly, if X and Y are fields, with binary operations that we shall write using the standard notation for addition and multiplication, then a function f : X → Y will be interesting only if f(a) + f(b) = f(c) whenever a + b = c and f(a)f(b) = f(c) whenever ab = c. For vector spaces, the functions of interest are ones that preserve linear combinations: if V and W are vector spaces, then f(av + bw) should always equal af(v) + bf(w).

A function that preserves structure is called a homomorphism, though homomorphisms of particular mathematical structures often have their own names: for example, a homomorphism of vector spaces is called a linear map.

There are some useful properties that a homomorphism may have if we are lucky. To see why further properties can be desirable, consider the following example. Let X and Y be groups and let f : X → Y be the function that takes every element of X to the identity element e of Y. Then, according to the definition above, f preserves the structure of X, since whenever ab = c, we have f(a)f(b) = ee = e = f(c). However, it seems more accurate to say that f has collapsed the structure. One can make this idea more precise: although f(a)f(b) = f(c) whenever ab = c, the converse does not hold: it is perfectly possible for f(a)f(b) to equal f(c) without ab equaling c, and indeed that happens in the example just given.
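
The contrast between preserving structure and collapsing it can be made concrete. In the sketch below (illustrative only), reduction modulo 4 is a homomorphism from the additive group of the integers to the integers modulo 4, and the map sending everything to 0 is the collapsing homomorphism described above.

n = 4

def f(x):
    # reduction mod n: a homomorphism from (Z, +) to (Z_n, +)
    return x % n

def g(x):
    # the map sending every element to the identity: also a homomorphism,
    # but one that collapses all of the structure
    return 0

for a in range(-5, 6):
    for b in range(-5, 6):
        assert f(a + b) == (f(a) + f(b)) % n
        assert g(a + b) == (g(a) + g(b)) % n

print(f(2), f(6))   # both equal 2: even for f the converse fails, since 6 is sent
                    # to the same element as 2, although far less is lost than with g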

An isomorphism between two structures X and Y is a homomorphism f : X → Y that has an inverse g : Y → X that is also a homomorphism. For most algebraic structures, if f has an inverse g, then g is automatically a homomorphism; in such cases we can simply say that an isomorphism is a homomorphism that is also a BIJECTION [I.2 §2.2]. That is, f is a one-to-one correspondence between X and Y that preserves structure.1

If X and Y are fields, then these considerations are less interesting: it is a simple exercise to show that every homomorphism f : X → Y that is not identically zero is automatically an isomorphism between X and its image f(X), that is, the set of all values taken by the function f. So structure cannot be collapsed without being lost. (The proof depends on the fact that the zero in Y has no multiplicative inverse.)

In general, if there is an isomorphism between two algebraic structures X and Y, then X and Y are said to be isomorphic (coming from the Greek words for “same” and “shape”). Loosely, the word “isomorphic” means “the same in all essential respects,” where what counts as essential is precisely the algebraic structure. What is absolutely not essential is the nature of the objects that have the structure: for example, one group might consist of certain complex numbers, another of integers modulo a prime p, and a third of rotations of a geometrical figure, and they could all turn out to be isomorphic. The idea that two mathematical constructions can have very different constituent parts and yet in a deeper sense be “the same” is one of the most important in mathematics.

An automorphism of an algebraic structure X is an isomorphism from X to itself. Since it is hardly surprising that X is isomorphic to itself, one might ask what the point is of automorphisms. The answer is that automorphisms are precisely the algebraic symmetries alluded to in our discussion of groups. An automorphism of X is a function from X to itself that preserves the structure (which now comes in the form of statements like ab = c). The composition of two automorphisms is clearly a third, and as a result the automorphisms of a structure X form a group. Although the individual automorphisms may not be of much interest, the group certainly is, as it often encapsulates what one really wants to know about a structure X that is too complicated to analyze directly.

A spectacular example of this is when X is a field. To illustrate, let us take the example of ℚ(√2). If f : ℚ(√2) → ℚ(√2) is an automorphism, then f(1) = 1. (This follows easily from the fact that 1 is the only multiplicative identity.) It follows that f(2) = f(1 + 1) = f(1) + f(1) = 1 + 1 = 2. Continuing like this, we can show that f(n) = n for every positive integer n. Then f(n) + f(-n) = f(n + (-n)) = f(0) = 0, so f(-n) = -f(n) = -n. Finally, f(p/q) = f(p)/f(q) = p/q when p and q are integers with q ≠ 0. So f takes every rational number to itself. What can we say about f(√2)? Well, f(√2)f(√2) = f(√2 · √2) = f(2) = 2, but this implies only that f(√2) is √2 or -√2. It turns out that both choices are possible: one automorphism is the “trivial” one, f(a + b√2) = a + b√2, and the other is the more interesting one, f(a + b√2) = a - b√2. This observation demonstrates that there is no algebraic difference between the two square roots; in this sense, the field ℚ(√2) does not know which square root of 2 is positive and which negative. These two automorphisms form a group, which is isomorphic to the group consisting of the elements ±1 under multiplication, or the group of integers modulo 2, or the group of symmetries of an isosceles triangle that is not equilateral, or. . . . The list is endless.
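
The nontrivial automorphism can be checked with exact arithmetic. In this sketch (illustrative only; an element a + b√2 is stored as the pair (a, b) of rationals), the map taking a + b√2 to a - b√2 is seen to preserve both addition and multiplication on some sample elements.

from fractions import Fraction
from itertools import product

def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def mul(x, y):
    # (a + b*sqrt(2))(c + d*sqrt(2)) = (ac + 2bd) + (ad + bc)*sqrt(2)
    a, b = x
    c, d = y
    return (a * c + 2 * b * d, a * d + b * c)

def f(x):
    # the candidate automorphism a + b*sqrt(2) -> a - b*sqrt(2)
    a, b = x
    return (a, -b)

samples = [(Fraction(1), Fraction(2)), (Fraction(-3, 4), Fraction(5)),
           (Fraction(0), Fraction(1)), (Fraction(7, 2), Fraction(-1, 3))]

for x, y in product(samples, repeat=2):
    assert f(add(x, y)) == add(f(x), f(y))
    assert f(mul(x, y)) == mul(f(x), f(y))

print("f preserves addition and multiplication on these samples")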

The automorphism groups associated with certain field extensions are called Galois groups, and are a vital component of the proof of THE INSOLUBILITY OF THE QUINTIC [V.21], as well as large parts of ALGEBRAIC NUMBER THEORY [IV.1].

An important concept associated with a homomorphism φ between algebraic structures is that of a kernel. This is defined to be the set of all elements x of X such that φ(x) is the identity element of Y (where this means the additive identity if X and Y are structures that involve both additive and multiplicative binary operations). The kernel of a homomorphism tends to be a substructure of X with interesting properties. For instance, if G and K are groups, then the kernel of a homomorphism from G to K is a normal subgroup of G; and conversely, if H is a normal subgroup of G, then the quotient map, which takes each element g to the left coset gH, is a homomorphism from G to the quotient group G/H with kernel H. Similarly, the kernel of any ring homomorphism is an IDEAL [III.81], and every ideal I in a ring R is the kernel of a “quotient map” from R to R/I. (This quotient construction is discussed in more detail in RINGS, IDEALS, AND MODULES [III.81].)

4.2 Linear Maps and Matrices

Homomorphisms between vector spaces have a distinctive geometrical property: they send straight lines to straight lines. For this reason they are called linear maps, as was mentioned in the previous subsection. From a more algebraic point of view, the structure that linear maps preserve is that of linear combinations: a function f from one vector space V to another is a linear map if f(au + bυ) = af(u) + bf(υ) for every pair of vectors u, υ ∈ V and every pair of scalars a and b. From this one can deduce the more general assertion that f(a1υ1 + . . . + anυn) is always equal to a1f(υ1) + . . . + anf(υn).

Suppose that we wish to define a linear map from V to W. How much information do we need to provide? In order to see what sort of answer is required, let us begin with a similar but slightly easier question: how much information is needed to specify a point in space? The answer is that, once one has devised a sensible coordinate system, three numbers will suffice. If the point is not too far from Earth’s surface then one might wish to use its latitude, its longitude, and its height above sea level, for instance. Can a linear map from V to W similarly be specified by just a few numbers?

The answer is that it can, at least if V and W are finite dimensional. Suppose that V has a basis υ1, . . . , υn, that W has a basis w1, . . . , wm, and that f : V → W is the linear map we would like to specify. Since every vector in V can be written in the form a1υ1 + . . . + an υn and since f(a1υ1 + . . . + an υn) is always equal to a1 f(υ1) + . . . + an f(υn), once we decide what f(υ1), . . . , f(υn) are we have specified f completely. But each vector f(υj) is a linear combination of the basis vectors w1, . . . , wm: that is, it can be written in the form

f(υj) = a1jw1 + . . . + amjwm.

Thus, to specify an individual f(υj) needs m numbers, the scalars a1j, . . . , amj. Since there are n different vectors υj, the linear map is determined by the mn numbers aij, where i runs from 1 to m and j from 1 to n.

These numbers can be written in an array, as follows:

a11  a12  · · ·  a1n
a21  a22  · · ·  a2n
 ⋮     ⋮             ⋮
am1  am2  · · ·  amn

An array like this is called a matrix. It is important to note that a different choice of basis vectors for V and W would lead to a different matrix, so one often talks of the matrix of f relative to a given pair of bases (a basis for V and a basis for W).
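The recipe just described, recording the coefficients of each f(υj) as the jth column, is easy to carry out in practice. The following sketch (using NumPy; the particular map and the choice of standard bases are assumptions made for the example) assembles the matrix of a linear map from ℝ^3 to ℝ^2 and checks that multiplying by it reproduces the map.

import numpy as np

# A linear map f from R^3 to R^2, defined directly on coordinate vectors.
def f(v):
    x, y, z = v
    return np.array([2 * x + y, y - 3 * z], dtype=float)

# The standard basis of V = R^3 (so coordinates of f(v_j) are just its components).
basis_V = [np.array(e, dtype=float) for e in ((1, 0, 0), (0, 1, 0), (0, 0, 1))]

# The matrix of f: its jth column records the coordinates of f(v_j).
A = np.column_stack([f(v) for v in basis_V])
print(A)

# Applying f to any vector agrees with multiplying by the matrix.
w = np.array([1.0, 2.0, 3.0])
assert np.allclose(f(w), A @ w)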

Now suppose that f is a linear map from V to W and that g is a linear map from U to V. Then fg stands for the linear map from U to W obtained by doing first g, then f. If the matrices of f and g, relative to certain bases of U, V, and W, are A and B, then what is the matrix of fg? To work it out, one takes a basis vector uk of U and applies to it the function g, obtaining a linear combination b1kυ1 + . . . + bnk υn of the basis vectors of V. To this linear combination one applies the function f, obtaining a rather complicated linear combination of linear combinations of the basis vectors w1, . . . , wm of W.

Pursuing this idea, one can calculate that the entry in row i and column j of the matrix P of fg is ai1 b1j + ai2 b2j + . . . + ainbnj. This matrix P is called the product of A and B and is written AB. If you have not seen this definition then you will find it hard to grasp, but the main point to remember is that there is a way of calculating the matrix for fg from the matrices A and B of f and g, and that this matrix is denoted AB. Matrix multiplication of this kind is associative but not commutative. That is, A(BC) is always equal to (AB)C but AB is not necessarily the same as BA. The associativity follows from the fact that composition of the underlying linear maps is associative: if A, B, and C are the matrices of f, g, and h, respectively, then A(BC) is the matrix of the linear map “do h-then-g, then f” and (AB)C is the matrix of the linear map “do h, then g-then-f,” and these are the same linear map.
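The entry formula translates directly into code. In the sketch below (NumPy; arbitrary 3 × 3 matrices are used purely for illustration), the product is computed entry by entry from ai1b1j + . . . + ainbnj, compared with the built-in product, and then used to illustrate that matrix multiplication is associative but generally not commutative.

import numpy as np

def product(A, B):
    # entry (i, j) of AB is the sum over k of A[i, k] * B[k, j]
    m, n = A.shape
    n2, p = B.shape
    assert n == n2
    P = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            P[i, j] = sum(A[i, k] * B[k, j] for k in range(n))
    return P

rng = np.random.default_rng(0)
A, B, C = (rng.integers(-5, 5, size=(3, 3)).astype(float) for _ in range(3))

assert np.allclose(product(A, B), A @ B)                 # matches NumPy's own product
assert np.allclose(product(A, product(B, C)),
                   product(product(A, B), C))            # associative
print(np.allclose(product(A, B), product(B, A)))         # typically False: not commutative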

Let us now confine our attention to automorphisms of a vector space V. These are linear maps f : V → V that can be inverted; that is, for which there exists a linear map g : V → V such that fg(v) = gf(v) = v for every vector v in V. These we can think of as “symmetries” of the vector space V, and as such they form a group under composition. If V is n-dimensional and the scalars come from the field 𝔽, then this group is called GLn(𝔽). The letters “G” and “L” stand for “general” and “linear;” some of the most important and difficult problems in mathematics arise when one tries to understand the structure of the general linear groups (and related groups) for certain interesting fields 𝔽 (see REPRESENTATION THEORY [IV.9 §§5,6]).

While matrices are very useful, many interesting linear maps are between infinite-dimensional vector spaces, and we close this section with two examples for the reader who is familiar with elementary calculus. (There will be a brief discussion of calculus later in this article.) For the first, let V be the set of all functions from ℝ to ℝ that can be differentiated and let W be the set of all functions from ℝ to ℝ. These can be made into vector spaces in a simple way: if f and g are functions, then their sum is the function h defined by the formula h(x) = f(x) + g(x), and if a is a real number then af is the function k defined by the formula k(x) = af(x). (So, for example, we could regard the polynomial x2 + 3x + 2 as a linear combination of the functions x2, x, and the constant function 1.) Then differentiation is a linear map (from V to W), since the derivative (af + bg)′ is af′ + bg′. This is clearer if we write Df for the derivative of f: then we are saying that D(af + bg) = aDf + bDg.
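On the subspace of polynomial functions this linearity can be made very concrete, since a polynomial is determined by its list of coefficients. The following sketch (Python; the coefficient-list representation is an assumption made for the example) implements D on such lists and checks the identity D(af + bg) = aDf + bDg.

# Represent a polynomial c0 + c1*x + c2*x^2 + ... by its list of coefficients.
def D(p):
    # derivative: the coefficient of x^(k-1) in Dp is k times the coefficient of x^k in p
    return [k * c for k, c in enumerate(p)][1:]

def combine(a, p, b, q):
    # the linear combination a*p + b*q (pad the shorter list with zeros)
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a * x + b * y for x, y in zip(p, q)]

f = [2, 3, 1]        # x^2 + 3x + 2
g = [0, 0, 0, 5]     # 5x^3

lhs = D(combine(4, f, -2, g))          # D(4f - 2g)
rhs = combine(4, D(f), -2, D(g))       # 4Df - 2Dg
print(lhs, rhs)                        # both [12, 8, -30]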

A second example uses integration. Let V be another vector space of functions, and let u be a function of two variables. (The functions involved have to have certain properties for the definition to work, but let us ignore the technicalities.) Then we can define a linear map T on the space V by the formula

Tf(x) = ∫ u(x, y)f(y) dy.

Definitions like this one can be hard to take in, because they involve holding in one’s mind three different levels of complexity. At the bottom we have real numbers, denoted by x and y. In the middle are functions like f, u, and Tf, which turn real numbers (or pairs of them) into real numbers. At the top is another function, T, but the “objects” that it transforms are themselves functions: it turns a function like f into a different function Tf. This is just one example where it is important to think of a function as a single, elementary “thing” rather than as a process of transformation. (See the discussion of functions in THE LANGUAGE AND GRAMMAR OF MATHEMATICS [I.2 §2.2].) Another remark that may help to clarify the definition is that there is a very close analogy between the role of the two-variable function u(x,y) and the role of a matrix aij (which can itself be thought of as a function of the two integer variables i and j). Functions like u are sometimes called kernels (which should not be confused with kernels of homomorphisms). For more about linear maps between infinite-dimensional spaces, see OPERATOR ALGEBRAS [IV.15] and LINEAR OPERATORS [III.50].
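The analogy with matrices can be made quite literal by discretization. In the sketch below (NumPy; the particular kernel u and function f are invented for the example), sampling u(x, y) on a grid of points turns it into an ordinary matrix, and applying T to a sampled function becomes a matrix–vector product that approximates the integral.

import numpy as np

n = 200
y = np.linspace(0.0, 1.0, n)
dy = y[1] - y[0]

def u(x, y):
    # an arbitrary kernel of two variables
    return np.exp(-(x - y) ** 2)

def f(y):
    return np.sin(2 * np.pi * y)

# Sampling u on a grid gives a matrix: rows indexed by x, columns by y.
U = u(y[:, None], y[None, :])

# (Tf)(x) is the integral of u(x, t)f(t) dt, approximated by a sum over the grid,
# which is exactly a matrix-vector product.
Tf = U @ f(y) * dy
print(Tf[:5])   # the first few sampled values of the new function Tf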

4.3 Eigenvalues and Eigenvectors

Let V be a vector space and let S : V → V be a linear map from V to itself. An eigenvector of S is a nonzero vector v in V such that Sv is proportional to v; that is, Sv = λv for some scalar λ. The scalar in question is called the eigenvalue corresponding to v. This simple pair of definitions is extraordinarily important: it is hard to think of any branch of mathematics where eigenvectors and eigenvalues do not have a major part to play. But what is so interesting about Sv being proportional to v? A rather vague answer is that in many cases the eigenvectors and eigenvalues associated with a linear map contain all the information one needs about the map, and in a very convenient form. Another answer is that linear maps occur in many different contexts, and questions that arise in those contexts often turn out to be questions about eigenvectors and eigenvalues, as the following two examples illustrate.

First, imagine that you are given a linear map T from a vector space V to itself and want to understand what happens if you perform the map repeatedly. One approach would be to pick a basis of V, work out the corresponding matrix A of T, and calculate the powers of A by matrix multiplication. The trouble is that the calculation will be messy and uninformative, and it does not really give much insight into the linear map.

However, it often happens that one can pick a very special basis, consisting only of eigenvectors, and in that case understanding the powers of T becomes easy. Indeed, suppose that the basis vectors are v1, v2, . . . , vn and that each vi is an eigenvector with corresponding eigenvalue λi. That is, suppose that T(vi) = λivi for every i. If w is any vector in V, then there is exactly one way of writing it in the form a1v1 + . . . + anvn, and then

T(w) = λ1a1v1 + . . . + λnanvn.

Roughly speaking, this says that T stretches the part of w in direction vi by a factor of λi. But now it is easy to say what happens if we apply T not just once but m times to w. The result will be

T^m(w) = λ1^m a1v1 + . . . + λn^m anvn.

In other words, now the amount by which we stretch in the vi direction is λi^m, and that is all there is to it.
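This is also how one computes in practice. The sketch below (NumPy; the 2 × 2 matrix is an arbitrary example chosen to have a basis of eigenvectors) writes a vector w in an eigenbasis and checks that applying T five times simply multiplies the coefficient in each eigen-direction by λi^5.

import numpy as np

T = np.array([[2.0, 1.0],
              [1.0, 2.0]])          # a symmetric map, so it has a basis of eigenvectors

eigenvalues, eigenvectors = np.linalg.eig(T)   # the columns are the eigenvectors v1, v2

w = np.array([3.0, -1.0])
a = np.linalg.solve(eigenvectors, w)           # coefficients of w in the eigenbasis

m = 5
direct = np.linalg.matrix_power(T, m) @ w               # apply T five times
via_eigen = eigenvectors @ (eigenvalues ** m * a)       # stretch each direction by its eigenvalue to the mth power
assert np.allclose(direct, via_eigen)
print(direct)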

Why should one be interested in doing linear maps over and over again? There are many reasons: one fairly convincing one is that this sort of calculation is exactly what Google does in order to put Web sites into a useful order. Details can be found in THE MATHEMATICS OF ALGORITHM DESIGN [VII.5].

The second example concerns the interesting property of the EXPONENTIAL FUNCTION [III.25] e^x: that its derivative is the same function. In other words, if f(x) = e^x, then f′(x) = f(x). Now differentiation, as we saw earlier, can be thought of as a linear map, and if f′(x) = f(x) then this map leaves the function f unchanged, which says that f is an eigenvector with eigenvalue 1. More generally, if g(x) = e^λx, then g′(x) = λe^λx = λg(x), so g is an eigenvector of the differentiation map, with eigenvalue λ. Many linear differential equations can be thought of as asking for eigenvectors of linear maps defined using differentiation. (Differentiation and differential equations will be discussed in the next section.)

5 Basic Concepts of Mathematical Analysis

Mathematics took a huge leap forward in sophistication with the invention of calculus, and the notion that one can specify a mathematical object indirectly by means of better and better approximations. These ideas form the basis of a broad area of mathematics known as analysis, and the purpose of this section is to help the reader who is unfamiliar with them. However, it will not be possible to do full justice to the subject, and what is written here will be hard to understand without at least some prior knowledge of calculus.

5.1 Limits

In our discussion of real numbers (section 1.4) there was a brief discussion of the square root of 2. How do we know that 2 has a square root? One answer is the one given there: that we can calculate its decimal expansion. If we are asked to be more precise, we may well end up saying something like this. The real numbers 1, 1.4, 1.41, 1.414, 1.4142, 1.41421, . . . , which have terminating decimal expansions (and are therefore rational), approach another real number x = 1.4142135. . . . We cannot actually write down x properly because it has an infinite decimal expansion but we can at least explain how its digits are defined: for example, the third digit after the decimal point is a 4 because 1.414 is the largest multiple of 0.001 that squares to less than 2. It follows that the squares of the original numbers, 1, 1.96, 1.9881, 1.999396, 1.99996164, 1.9999899241, . . . , approach 2, and this is why we are entitled to say that x2 = 2.
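The successive approximations described above are easy to generate. Here is a small sketch (Python, written only for this illustration): for each number of decimal places it finds the largest terminating decimal whose square is less than 2, recovering the sequence 1, 1.4, 1.41, 1.414, . . . together with the squares listed above.

from decimal import Decimal

# For each k, find the largest multiple of 10^-k whose square is less than 2.
x = Decimal(1)
for k in range(8):
    step = Decimal(10) ** -k
    while (x + step) ** 2 < 2:
        x += step
    print(x, x * x)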

Suppose that we are asked to determine the length of a curve drawn on a piece of paper, and that we are given a ruler to help us. We face a problem: the ruler is straight and the curve is not. One way of tackling the problem is as follows. First, draw a few points P0, P1, P2, . . . , Pn along the curve, with P0 at one end and Pn at the other. Next, measure the distance from P0 to P1, the distance from P1 to P2, and so on up to Pn. Finally, add all these distances up. The result will not be an exactly correct answer, but if there are enough points, spaced reasonably evenly, and if the curve does not wiggle too much, then our procedure will give us a good notion of the “approximate length” of the curve. Moreover, it gives us a way to define what we mean by the “exact length”: suppose that, as we take more and more points, we find that the approximate lengths, in the sense just defined, approach some number l. Then we say that l is the length of the curve.
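The same procedure can be carried out numerically. In the sketch below (Python; the curve is taken to be the upper half of the unit circle because its exact length, π, is known), the sum of the straight-line distances approaches π as the number of points grows.

import numpy as np

def approximate_length(n):
    # n + 1 points P0, ..., Pn spaced along the upper half of the unit circle
    t = np.linspace(0.0, np.pi, n + 1)
    points = np.column_stack([np.cos(t), np.sin(t)])
    segments = np.diff(points, axis=0)                      # the n straight pieces
    return np.sum(np.sqrt((segments ** 2).sum(axis=1)))     # add up their lengths

for n in (2, 4, 16, 256):
    print(n, approximate_length(n))   # approaches pi = 3.14159...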

In both these examples there is a number that we reach by means of better and better approximations. I used the word “approach” in both cases, but this is rather vague, and it is important to make it precise. Let a1, a2, a3, . . . be a sequence of real numbers. What does it mean to say that these numbers approach a specified real number l?

The following two examples are worth bearing in mind. The first is the sequence 1/2, 3/4, 7/8, 15/16, 31/32, . . . . In a sense, the numbers in this sequence approach 2, since each one is closer to 2 than the one before, but it is clear that this is not what we mean. What matters is not so much that we get closer and closer, but that we get arbitrarily close, and the only number that is approached in this stronger sense is the obvious “limit,” 1.

A second sequence illustrates this in a different way: 1/2, 1/4, 1/3, 1/6, 1/5, 1/8, 1/7, . . . . Here, we would like to say that the numbers approach 0, even though it is not true that each one is closer than the one before. Nevertheless, it is true that eventually the sequence gets as close as you like to 0 and remains at least that close.

This last phrase serves as a definition of the mathematical notion of a limit: the limit of the sequence of numbers a1, a2, a3, . . . is l if eventually the sequence gets as close as you like to l and remains that close. However, in order to meet the standards of precision demanded by mathematics, we need to know how to translate English words like “eventually” into mathematics, and for this we need QUANTIFIERS [I.2 §3.2].

Suppose δ is a positive number (which one usually imagines as small). Let us say that an is δ-close to l if |an - l|, the difference between an and l, is less than δ. What would it mean to say that eventually the sequence gets δ-close to l and stays there? It means that from some point onwards, all the an are δ-close to l. And what is the meaning of “from some point onwards”? It is that there is some number N (the point in question) with the property that an is δ-close to l from N onwards—that is, for every n that is greater than or equal to N. In symbols:

∃N ∀n ≥ N    an is δ-close to l.

It remains to capture the idea of “as close as you like.” What this means is that the above sentence is true for any δ you might wish to specify. In symbols:

∀δ > 0 ∃N ∀n ≥ N    an is δ-close to l.

Finally, let us stop using the nonstandard phrase “δ-close”:

∀δ > 0 ∃N ∀n ≥ N    |an - l| < δ.

This sentence is not particularly easy to understand. Unfortunately (and interestingly in the light of the discussion in [I.2 §4]), using a less symbolic language does not necessarily make things much easier: “Whatever positive δ you choose, there is some number N such that for all bigger numbers n the difference between an and l is less than δ.”

The notion of limit applies much more generally than just to real numbers. If you have any collection of mathematical objects and can say what you mean by the distance between any two of those objects, then you can talk of a sequence of those objects having a limit. Two objects are now called δ-close if the distance between them is less than δ, rather than the difference. (The idea of distance is discussed further in METRIC SPACES [III.56].) For example, a sequence of points in space can have a limit, as can a sequence of functions. (In the second case it is less obvious how to define distance—there are many natural ways to do it.) A further example comes in the theory of fractals (see DYNAMICS [IV.14]): the very complicated shapes that appear there are best defined as limits of simpler ones.

Two other ways of saying “the limit of the sequence a1, a2, . . . is l” are “an converges to l” and “an tends to l.” One sometimes says that this happens as n tends to infinity. Any sequence that has a limit is called convergent. If an converges to l then one often writes an → l.

5.2 Continuity

Suppose you want to know the approximate value of π2. Perhaps the easiest thing to do is to press a π button on a calculator, which displays 3.1415927, and then an x2 button, after which it displays 9.8696044. Of course, one knows that the calculator has not actually squared π: instead it has squared the number 3.1415927. (If it is a good one, then it may have secretly used a few more digits of π without displaying them, but not infinitely many.) Why does it not matter that the calculator has squared the wrong number?

A first answer is that it was only an approximate value of π2 that was required. But that is not quite a complete explanation: how do we know that if x is a good approximation to π then x2 is a good approximation to π2? Here is how one might show this. If x is a good approximation to π, then we can write x = π + δ for some very small number δ (which could be negative). Then x2 = π2 + 2δπ + δ2. Since δ is small, so is 2δπ + δ2, so x2 is indeed a good approximation to π2.

What makes the above reasoning work is that the function that takes a number x to its square is continuous. Roughly speaking, this means that if two numbers are close, then so are their squares.

To be more precise about this, let us return to the calculation of π2, and imagine that we wish to work it out to a much greater accuracy—so that the first hundred digits after the decimal point are correct, for example. A calculator will not be much help, but what we might do is find a list of the digits of π (on the Internet you can find sites that tell you at least the first fifty million), use this to define a new x that is a much better approximation to π, and then calculate the new x2 by getting a computer to do the necessary long multiplication.

How close to π do we need x to be for x2 to be within 10-100 of π2? To answer this, we can use our earlier argument. Let x = π + δ again. Then x2 - π2 = 2δπ + δ2, and an easy calculation shows that this has modulus less than 10-100 if δ has modulus less than 10-101. So we will be all right if we take the first 101 digits of π after the decimal point.

More generally, however accurate we wish our estimate of π2 to be, we can achieve this accuracy if we are prepared to make x a sufficiently good approximation to π. In mathematical parlance, the function f(x) = x2 is continuous at π.

Let us try to say this more symbolically. The statement “x2 = π2 to within an accuracy of ε” means that |x2 - π2| < ε. To capture the phrase “however accurate,” we need this to be true for every positive ε, so we should start by saying ∀ε > 0. Now let us think about the words “if we are prepared to make x a sufficiently good approximation to π.” The thought behind them is that there is some δ > 0 for which the approximation is guaranteed to be accurate to within ε as long as x is within δ of π. That is, there exists a δ > 0 such that if |x - π| < δ then it is guaranteed that |x2 - π2| < ε. Putting everything together, we end up with the following symbolic sentence:

∀ε > 0 ∃δ > 0    (|x - π| < δ ⇒ |x2 - π2| < ε).

To put that in words: “Given any positive number ε there is a positive number δ such that if |x - π| is less than δ then |x2 - π2| is less than ε.” Earlier, we found a δ that worked when ε was chosen to be 10-100: it was 10-101.
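The quantifier structure lends itself to a direct computational check. The sketch below (Python; the particular choice δ = ε/(2π + 1), valid whenever ε ≤ 1, follows the argument above with the small term δ2 absorbed) confirms on many sample points that |x - π| < δ really does force |x2 - π2| < ε.

import math
import numpy as np

def delta_for(eps):
    # from x^2 - pi^2 = 2*delta*pi + delta^2: taking delta = eps / (2*pi + 1) works for eps <= 1
    return eps / (2 * math.pi + 1)

for eps in (1e-1, 1e-3, 1e-6):
    delta = delta_for(eps)
    xs = math.pi + np.linspace(-delta, delta, 10001)[1:-1]   # sample points with |x - pi| < delta
    assert np.all(np.abs(xs ** 2 - math.pi ** 2) < eps)
    print(f"eps = {eps:g}: delta = {delta:.3g} works")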

What we have just shown is that the function f(x) = x2 is continuous at the point x = π. Now let us generalize this idea: let f be any function and let a be any real number. We say that f is continuous at a if

∀ε > 0 ∃δ > 0    (|x - a| < δ ⇒ |f(x) - f(a)| < ε).

This says that however accurate an estimate for f(a) you wish f(x) to be, you can achieve this accuracy if you are prepared to make x a sufficiently good approximation to a. The function f is said to be continuous if it is continuous at every a. Roughly speaking, what this means is that f has no “sudden jumps.” (It also rules out certain kinds of very rapid oscillations that would also make accurate estimates difficult.)

As with limits, the idea of continuity applies in much more general contexts, and for the same reason. Let f be a function from a set X to a set Y, and suppose that we have two notions of distance, one for elements of X and the other for elements of Y. Using the expression d(x, a) to denote the distance between x and a, and similarly for d(f(x),f(a)), one says that f is continuous at a if

∀ε > 0 ∃δ > 0    (d(x, a) < δ ⇒ d(f(x), f(a)) < ε),

and that f is continuous if it is continuous at every a in X. In other words, we replace differences such as |x - a| by distances such as d(x, a).

Like homomorphisms (which are discussed in section 4.1 above), continuous functions can be regarded as preserving a certain sort of structure. It can be shown that a function f is continuous if and only if, whenever an → x, we also have f(an) → f(x). That is, continuous functions are functions that preserve the structure provided by convergent sequences and their limits.

5.3 Differentiation

The derivative of a function f at a value a is usually presented as a number that measures the rate of change of f(x) as x passes through a. The purpose of this section is to promote a slightly different way of regarding it, one that is more general and that opens the door to much of modern mathematics. This is the idea of differentiation as linear approximation.

Intuitively speaking, to say that f′(a) = m is to say that if one looks through a very powerful microscope at the graph of f in a tiny region that includes the point (a, f(a)), then what one sees is almost exactly a straight line of gradient m. In other words, in a sufficiently small neighborhood of the point a, the function f is approximately linear. We can even write down a formula for the linear function g that approximates f:

g(x) = f(a) + m(x - a).

This is the equation of the straight line of gradient m that passes through the point (a, f(a)). Another way of writing it, which is a little clearer, is

g(a + h) = f(a) + mh,

and to say that g approximates f in a small neighborhood of a is to say that f(a + h) is approximately equal to f(a) + mh when h is small.

One must be a little careful here: after all, if f does not jump suddenly, then, when h is small, f(a + h) will be close to f(a) and mh will be small, so f(a + h) is approximately equal to f(a) + mh. This line of reasoning seems to work regardless of the value of m, and yet we wanted there to be something special about the choice m = f′(a). What singles out that particular value is that f(a + h) is not just close to f(a) + mh, but so close that the difference ε(h) = f(a + h) - f(a) - mh is small compared with h. That is, ε(h)/h → 0 as h → 0. (This is a slightly more general notion of limit than the one discussed in section 5.1. It means that you can make ε(h)/h as small as you like if you make h small enough.)
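A quick numerical experiment makes the distinction visible. In the sketch below (Python; the function, the point a, and the candidate slopes are chosen only for illustration), ε(h)/h shrinks with h when m is the true derivative f′(a), but settles near a nonzero value when m is anything else.

import math

f = math.sin
a = 1.0
correct_m = math.cos(a)     # the derivative of sin at a
wrong_m = 0.3               # any other slope

for m, label in ((correct_m, "m = f'(a)"), (wrong_m, "wrong m")):
    print(label)
    for h in (1e-1, 1e-2, 1e-3, 1e-4):
        eps = f(a + h) - f(a) - m * h        # the error in the linear approximation
        print(f"  h = {h:.0e}   eps(h)/h = {eps / h: .6f}")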

The reason these ideas can be generalized is that the notion of a linear map is much more general than simply a function from ℝ to ℝ of the form g(x) = mx + c. Many functions that arise naturally in mathematics—and also in science, engineering, economics, and many other areas—are functions of several variables, and can therefore be regarded as functions defined on a vector space of dimension greater than 1. As soon as we look at them this way, we can ask ourselves whether, in a small neighborhood of a point, they can be approximated by linear maps. It is very useful if they can: a general function can behave in very complicated ways, but if it can be approximated by a linear function, then at least in small regions of n-dimensional space its behavior is much easier to understand. In this situation one can use the machinery of linear algebra and matrices, which leads to calculations that are feasible, especially if one has the help of a computer.

Imagine, for instance, a meteorologist interested in how the direction and speed of the wind change as one looks at different parts of some three-dimensional region above Earth’s surface. Wind behaves in complicated, chaotic ways, but to get some sort of handle on this behavior one can describe it as follows. To each point (x,y,z) in the region (think of x and y as horizontal coordinates and z as a vertical one) one can associate a vector (u, v, w) representing the velocity of the wind at that point: u, v, and w are the components of the velocity in the x-, y-, and z-directions.

Now let us change the point (x,y,z) very slightly by choosing three small numbers h, k, and l and looking at (x + h, y + k, z + l). At this new point, we would expect the wind vector to be slightly different as well, so let us write it (u + p, v + q, w + r). How does the small change (p, q, r) in the wind vector depend on the small change (h, k, l) in the position vector? Provided the wind is not too turbulent and h, k, and l are small enough, we expect the dependence to be roughly linear: that is how nature seems to work. In other words, we expect there to be some linear map T such that (p, q, r) is roughly T(h, k, l) when h, k, and l are small. Notice that each of p, q, and r depends on each of h, k, and l, so nine numbers will be needed in order to specify this linear map. In fact, we can express it in matrix form:

⎛p⎞   ⎛a11  a12  a13⎞ ⎛h⎞
⎜q⎟ = ⎜a21  a22  a23⎟ ⎜k⎟
⎝r⎠   ⎝a31  a32  a33⎠ ⎝l⎠

The matrix entries aij express individual dependencies. For example, if x and z are held fixed, then we are setting h = l = 0, from which it follows that the rate of change of u as just y varies is given by the entry a12. That is, a12 is the partial derivative ∂u/∂y at the point (x, y, z).
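The nine numbers can be estimated by exactly this recipe: perturb one coordinate at a time and record how each component of the velocity changes. The sketch below (NumPy; the velocity field is an invented smooth example) builds the 3 × 3 matrix T by finite differences and checks that u(x + h) - u(x) is close to T(h) for a small displacement h.

import numpy as np

def u(x):
    # a made-up smooth "wind" field: velocity (u, v, w) at position (x, y, z)
    X, Y, Z = x
    return np.array([np.sin(Y) + Z, X * Z, np.cos(X) + Y ** 2])

def jacobian(u, x, step=1e-6):
    # entry (i, j) is the partial derivative of component i with respect to coordinate j
    T = np.zeros((3, 3))
    for j in range(3):
        e = np.zeros(3)
        e[j] = step
        T[:, j] = (u(x + e) - u(x - e)) / (2 * step)
    return T

x = np.array([0.5, 1.0, 2.0])
T = jacobian(u, x)
h = np.array([1e-3, -2e-3, 1e-3])
print(u(x + h) - u(x))   # roughly equal to ...
print(T @ h)             # ... the linear prediction T(h)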

This tells us how to calculate the matrix, but from the conceptual point of view it is easier to use vector notation. Write x for (x,y,z), u(x) for (u, v, w), h for (h, k, l), and p for (p, q, r). Then what we are saying is that

p = T(h) + ε(h)

for some vector ε(h) that is small relative to h. Alternatively, we can write

u(x + h) = u(x) + T(h) + ε(h),

a formula that is closely analogous to our earlier formula f(a + h) = f(a) + mh + ε(h). This tells us that if we add a small vector h to x, then u(x) will change by roughly T(h).

More generally, let u be a function from ℝ^n to ℝ^m. Then u is defined to be differentiable at a point x ∈ ℝ^n if there is a linear map T : ℝ^n → ℝ^m such that, once again, the formula

u(x + h) = u(x) + T(h) + ε(h)

holds, where, as before, ε(h) is small compared with h.

1. Let us see how this claim is proved for groups. If X and Y are groups, f : X → Y is a homomorphism with inverse g : Y → X, and u, v, and w are elements of Y with uv = w, then we must show that g(u)g(v) = g(w). To do this, let a = g(u), b = g(v), and d = g(w). Since f and g are inverse functions, f(a) = u, f(b) = v, and f(d) = w. Now let c = ab. Then w = uv = f(a)f(b) = f(ab) = f(c), since f is a homomorphism. But then f(c) = f(d), which implies that c = d (just apply the function g to f(c) and f(d)). Therefore g(u)g(v) = ab = c = d = g(w), as we needed to show.
