CHAPTER 7 Linear Algebra: Matrices, Vectors, Determinants. Linear Systems

Linear algebra is a fairly extensive subject that covers vectors and matrices, determinants, systems of linear equations, vector spaces and linear transformations, eigenvalue problems, and other topics. As an area of study it has a broad appeal in that it has many applications in engineering, physics, geometry, computer science, economics, and other areas. It also contributes to a deeper understanding of mathematics itself.

Matrices, which are rectangular arrays of numbers or functions, and vectors are the main tools of linear algebra. Matrices are important because they let us express large amounts of data and functions in an organized and concise form. Furthermore, since matrices are single objects, we denote them by single letters and calculate with them directly. All these features have made matrices and vectors very popular for expressing scientific and mathematical ideas.

The chapter keeps a good mix between applications (electric networks, Markov processes, traffic flow, etc.) and theory. Chapter 7 is structured as follows: Sections 7.1 and 7.2 provide an intuitive introduction to matrices and vectors and their operations, including matrix multiplication. The next block of sections, Secs. 7.3–7.5, presents the most important method for solving systems of linear equations, the Gauss elimination method. This method is a cornerstone of linear algebra, and the method itself and variants of it appear in different areas of mathematics and in many applications. It leads to a consideration of the behavior of solutions and concepts such as rank of a matrix, linear independence, and bases. We shift to determinants, a topic that has declined in importance, in Secs. 7.6 and 7.7. Section 7.8 covers inverses of matrices. The chapter ends with vector spaces, inner product spaces, linear transformations, and composition of linear transformations. Eigenvalue problems follow in Chap. 8.

COMMENT. Numeric linear algebra (Secs. 20.1–20.5) can be studied immediately after this chapter.

Prerequisite: None.

Sections that may be omitted in a short course: 7.5, 7.9.

References and Answers to Problems: App. 1 Part B, and App. 2.

7.1 Matrices, Vectors: Addition and Scalar Multiplication

The basic concepts and rules of matrix and vector algebra are introduced in Secs. 7.1 and 7.2 and are followed by linear systems (systems of linear equations), a main application, in Sec. 7.3.

Let us first take a leisurely look at matrices before we formalize our discussion. A matrix is a rectangular array of numbers or functions which we will enclose in brackets. For example,

are matrices. The numbers (or functions) are called entries or, less commonly, elements of the matrix. The first matrix in (1) has two rows, which are the horizontal lines of entries. Furthermore, it has three columns, which are the vertical lines of entries. The second and third matrices are square matrices, which means that each has as many rows as columns—3 and 2, respectively. The entries of the second matrix have two indices, signifying their location within the matrix. The first index is the number of the row and the second is the number of the column, so that together the entry's position is uniquely identified. For example, a₂₃ (read a two three) is in Row 2 and Column 3, etc. The notation is standard and applies to all matrices, including those that are not square.

Matrices having just a single row or column are called vectors. Thus, the fourth matrix in (1) has just one row and is called a row vector. The last matrix in (1) has just one column and is called a column vector. Because the goal of the indexing of entries was to uniquely identify the position of an element within a matrix, one index suffices for vectors, whether they are row or column vectors. Thus, the third entry of the row vector in (1) is denoted by a₃.

Matrices are handy for storing and processing data in applications. Consider the following two common examples.

EXAMPLE 1 Linear Systems, a Major Application of Matrices

We are given a system of linear equations, briefly a linear system, such as

where x₁, x₂, x₃ are the unknowns. We form the coefficient matrix, call it A, by listing the coefficients of the unknowns in the position in which they appear in the linear equations. In the second equation, there is no unknown x₂, which means that the coefficient of x₂ is 0 and hence in matrix A, a₂₂ = 0. Thus,

We form another matrix Ã by augmenting A with the right sides of the linear system and call it the augmented matrix of the system.

Since we can go back and recapture the system of linear equations directly from the augmented matrix Ã, Ã contains all the information of the system and can thus be used to solve the linear system. This means that we can just use the augmented matrix to do the calculations needed to solve the system. We shall explain this in detail in Sec. 7.3. Meanwhile you may verify by substitution that the solution is .

The notation x₁, x₂, x₃ for the unknowns is practical but not essential; we could choose x, y, z or some other letters.

EXAMPLE 2 Sales Figures in Matrix Form

Sales figures for three products I, II, III in a store on Monday (Mon), Tuesday (Tues), … may for each week be arranged in a matrix

If the company has 10 stores, we can set up 10 such matrices, one for each store. Then, by adding corresponding entries of these matrices, we can get a matrix showing the total sales of each product on each day. Can you think of other data which can be stored in matrix form? For instance, in transportation or storage problems? Or in listing distances in a network of roads?

General Concepts and Notations

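Let us formalize what we have just discussed. We shall denote matrices by capital boldface letters A, B, C, …, or by writing the general entry in brackets; thus A = [a_jk], and so on. By an m × n matrix (read m by n matrix) we mean a matrix with m rows and n columns; rows always come first! m × n is called the size of the matrix. Thus an m × n matrix is of the form

\[
\mathbf{A} = [a_{jk}] =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots &        & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}.
\tag{2}
\]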

The matrices in (1) are of sizes 2 × 3, 3 × 3, 2 × 2, 1 × 3, and 2 × 1, respectively.

Each entry in (2) has two subscripts. The first is the row number and the second is the column number. Thus a₂₁ is the entry in Row 2 and Column 1.

If m = n we call A an n × n square matrix. Then its diagonal containing the entries a₁₁, a₂₂, …, a_nn is called the main diagonal of A. Thus the main diagonals of the two square matrices in (1) are a₁₁, a₂₂, a₃₃ and e^−x, 4x, respectively.

Square matrices are particularly important, as we shall see. A matrix of any size m × n is called a rectangular matrix; this includes square matrices as a special case.

Vectors

A vector is a matrix with only one row or column. Its entries are called the components of the vector. We shall denote vectors by lowercase boldface letters a, b, … or by their general component in brackets, a = [a_j], and so on. Our special vectors in (1) suggest that a (general) row vector is of the form

\[
\mathbf{a} = [a_1 \;\; a_2 \;\; \cdots \;\; a_n].
\]

A column vector is of the form

\[
\mathbf{b} =
\begin{bmatrix}
b_1 \\ b_2 \\ \vdots \\ b_m
\end{bmatrix}.
\]

Addition and Scalar Multiplication of Matrices and Vectors

What makes matrices and vectors really useful and particularly suitable for computers is the fact that we can calculate with them almost as easily as with numbers. Indeed, we now introduce rules for addition and for scalar multiplication (multiplication by numbers) that were suggested by practical applications. (Multiplication of matrices by matrices follows in the next section.) We first need the concept of equality.

DEFINITION Equality of Matrices

Two matrices A = [a_jk] and B = [b_jk] are equal, written A = B, if and only if they have the same size and the corresponding entries are equal, that is, a₁₁ = b₁₁, a₁₂ = b₁₂, and so on. Matrices that are not equal are called different. Thus, matrices of different sizes are always different.

EXAMPLE 3 Equality of Matrices

Let

Then

The following matrices are all different. Explain!

DEFINITION Addition of Matrices

The sum of two matrices A = [a_jk] and B = [b_jk] of the same size is written A + B and has the entries a_jk + b_jk obtained by adding the corresponding entries of A and B. Matrices of different sizes cannot be added.

As a special case, the sum a + b of two row vectors or two column vectors, which must have the same number of components, is obtained by adding the corresponding components.

EXAMPLE 4 Addition of Matrices and Vectors

A in Example 3 and our present A cannot be added. If a = [5 7 2] and b = [−6 2 0], then a + b = [−1 9 2].

An application of matrix addition was suggested in Example 2. Many others will follow.
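The definition of addition can be tried out directly. The following is a minimal sketch in Python, with matrices stored as lists of rows; the function name `mat_add` and the sales figures for the two stores are hypothetical and serve only as an illustration, whereas the vectors a and b are those of Example 4.

```python
def mat_add(a, b):
    """Return the entrywise sum of two matrices stored as lists of rows."""
    same_size = len(a) == len(b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(a, b)
    )
    if not same_size:
        raise ValueError("matrices of different sizes cannot be added")
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Row vectors from Example 4, stored as 1 x 3 matrices.
a = [[5, 7, 2]]
b = [[-6, 2, 0]]
vector_sum = mat_add(a, b)
print(vector_sum)  # [[-1, 9, 2]]

# Hypothetical sales figures (3 products, 2 days) for two stores.
store_1 = [[40, 33], [0, 12], [10, 0]]
store_2 = [[25, 20], [5, 8], [7, 3]]
total_sales = mat_add(store_1, store_2)
print(total_sales)  # [[65, 53], [5, 20], [17, 3]]
```

The size check mirrors the rule that matrices of different sizes cannot be added.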

DEFINITION Scalar Multiplication (Multiplication by a Number)

The product of any m × n matrix A = [a_jk] and any scalar c (number c) is written cA and is the m × n matrix cA = [ca_jk] obtained by multiplying each entry of A by c.

Here (−1)A is simply written −A and is called the negative of A. Similarly, (−k)A is written −kA. Also, A + (−B) is written A − B and is called the difference of A and B (which must have the same size!).

EXAMPLE 5 Scalar Multiplication

If a matrix B shows the distances between some cities in miles, 1.609B gives these distances in kilometers.
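As a small illustration of scalar multiplication and of the difference A − B = A + (−1)B, consider the following sketch. The names `scalar_mul` and `mat_sub` and all numbers are assumptions made for this illustration; only the conversion factor 1.609 is taken from the text.

```python
def scalar_mul(c, a):
    """Return cA: every entry of the matrix a multiplied by the scalar c."""
    return [[c * entry for entry in row] for row in a]


def mat_sub(a, b):
    """Return A - B, computed as A + (-1)B (equal sizes are assumed)."""
    neg_b = scalar_mul(-1, b)
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, neg_b)]


# Hypothetical road distances in miles between three cities.
B_miles = [[0, 120, 300], [120, 0, 210], [300, 210, 0]]
B_km = scalar_mul(1.609, B_miles)
for row in B_km:
    print([round(d, 1) for d in row])

difference = mat_sub([[5, 7], [2, 0]], [[1, 7], [3, -4]])
print(difference)  # [[4, 0], [-1, 4]]
```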

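Rules for Matrix Addition and Scalar Multiplication. From the familiar laws for the addition of numbers we obtain similar laws for the addition of matrices of the same size m × n, namely,

\[
\begin{aligned}
\text{(a)}\quad & \mathbf{A} + \mathbf{B} = \mathbf{B} + \mathbf{A} \\
\text{(b)}\quad & (\mathbf{A} + \mathbf{B}) + \mathbf{C} = \mathbf{A} + (\mathbf{B} + \mathbf{C}) \\
\text{(c)}\quad & \mathbf{A} + \mathbf{0} = \mathbf{A} \\
\text{(d)}\quad & \mathbf{A} + (-\mathbf{A}) = \mathbf{0}.
\end{aligned}
\tag{3}
\]

Here 0 denotes the zero matrix (of size m × n), that is, the m × n matrix with all entries zero. If m = 1 or n = 1, this is a vector, called a zero vector.

Hence matrix addition is commutative and associative [by (3a) and (3b)]. Similarly, for scalar multiplication we obtain the rules

\[
\begin{aligned}
\text{(a)}\quad & c(\mathbf{A} + \mathbf{B}) = c\mathbf{A} + c\mathbf{B} \\
\text{(b)}\quad & (c + k)\mathbf{A} = c\mathbf{A} + k\mathbf{A} \\
\text{(c)}\quad & c(k\mathbf{A}) = (ck)\mathbf{A} \\
\text{(d)}\quad & 1\mathbf{A} = \mathbf{A}.
\end{aligned}
\tag{4}
\]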

PROBLEM SET 7.1

1–7 GENERAL QUESTIONS

1. Equality. Give reasons why the five matrices in Example 3 are all different.
2. Double subscript notation. If you write the matrix in Example 2 in the form A = [a_jk], what is a₃₁? a₁₃? a₂₆? a₃₃?
3. Sizes. What sizes do the matrices in Examples 1, 2, 3, and 5 have?
4. Main diagonal. What is the main diagonal of A in Example 1? Of A and B in Example 3?
5. Scalar multiplication. If A in Example 2 shows the number of items sold, what is the matrix B of units sold if a unit consists of (a) 5 items and (b) 10 items?
6. If a 12 × 12 matrix A shows the distances between 12 cities in kilometers, how can you obtain from A the matrix B showing these distances in miles?
7. Addition of vectors. Can you add: A row and a column vector with different numbers of components? With the same number of components? Two row vectors with the same number of components but different numbers of zeros? A vector and a scalar? A vector with four components and a 2 × 2 matrix?

8–16 ADDITION AND SCALAR MULTIPLICATION OF MATRICES AND VECTORS

Let

Find the following expressions, indicating which of the rules in (3) or (4) they illustrate, or give reasons why they are not defined.

8. 2A + 4B, 4B + 2A, 0A + B, 0.4B − 4.2A
9. 3A, 0.5B, 3A + 0.5B, 3A + 0.5B + C
10. (4 · 3)A, 4(3A), 14B − 3B, 11B
11. 8C + 10D, 2(5D + 4C), 0.6C − 0.6D, 0.6(C − D)
12. (C + D) + E, (D + E) + C, 0(C − E) + 4D, A − 0C
13. (2 · 7)C, 2(7C), −D + 0E, E − D + C + u
14.
15. (u + v) − w, u + (v − w), C + 0w, 0E + u − v
16. 15v − 3w − 0u, −3w + 15v, D − u + 3C, 8.5w − 11.1u + 0.4v
17. Resultant of forces. If the above vectors u, v, w represent forces in space, their sum is called their resultant. Calculate it.
18. Equilibrium. By definition, forces are in equilibrium if their resultant is the zero vector. Find a force p such that the above u, v, w, and p are in equilibrium.
19. General rules. Prove (3) and (4) for general 2 × 3 matrices and scalars c and k.
20. TEAM PROJECT. Matrices for Networks. Matrices have various engineering applications, as we shall see. For instance, they can be used to characterize connections in electrical networks, in nets of roads, in production processes, etc., as follows.
(a) Nodal Incidence Matrix. The network in Fig. 155 consists of six branches (connections) and four nodes (points where two or more branches come together). One node is the reference node (grounded node, whose voltage is zero). We number the other nodes and number and direct the branches. This we do arbitrarily. The network can now be described by a matrix A = [a_jk], where

A is called the nodal incidence matrix of the network. Show that for the network in Fig. 155 the matrix A has the given form.

Fig. 155. Network and nodal incidence matrix in Team Project 20(a)

(b) Find the nodal incidence matrices of the networks in Fig. 156.

Fig. 156. Electrical networks in Team Project 20(b)

(c) Sketch the three networks corresponding to the nodal incidence matrices

(d) Mesh Incidence Matrix. A network can also be characterized by the mesh incidence matrix M = [m_jk], where

and a mesh is a loop with no branch in its interior (or in its exterior). Here, the meshes are numbered and directed (oriented) in an arbitrary fashion. Show that for the network in Fig. 157, the matrix M has the given form, where Row 1 corresponds to mesh 1, etc.

Fig. 157. Network and matrix M in Team Project 20(d)

7.2 Matrix Multiplication

Matrix multiplication means that one multiplies matrices by matrices. Its definition is standard but it looks artificial. Thus you have to study matrix multiplication carefully and multiply a few matrices together for practice until you understand how to do it. Here then is the definition. (Motivation follows later.)

DEFINITION Multiplication of a Matrix by a Matrix

The product C = AB (in this order) of an m × n matrix A = [a_jk] times an r × p matrix B = [b_jk] is defined if and only if r = n and is then the m × p matrix C = [c_jk] with entries

\[
c_{jk} = \sum_{l=1}^{n} a_{jl} b_{lk} = a_{j1}b_{1k} + a_{j2}b_{2k} + \cdots + a_{jn}b_{nk},
\qquad j = 1, \dots, m; \quad k = 1, \dots, p.
\tag{1}
\]

The condition r = n means that the second factor, B, must have as many rows as the first factor has columns, namely n. A diagram of sizes that shows when matrix multiplication is possible is as follows:

\[
\underset{[m \times n]}{\mathbf{A}} \;\; \underset{[n \times p]}{\mathbf{B}} \;=\; \underset{[m \times p]}{\mathbf{C}}.
\]

The entry c_jk in (1) is obtained by multiplying each entry in the jth row of A by the corresponding entry in the kth column of B and then adding these n products. For instance, c₂₁ = a₂₁b₁₁ + a₂₂b₂₁ + … + a_2n b_n1, and so on. One calls this briefly a multiplication of rows into columns. For n = 3, this is illustrated by

where we shaded the entries that contribute to the calculation of entry c₂₁ just discussed.

Matrix multiplication will be motivated by its use in linear transformations in this section and more fully in Sec. 7.9.

Let us illustrate the main points of matrix multiplication by some examples. Note that matrix multiplication also includes multiplying a matrix by a vector, since, after all, a vector is a special matrix.

EXAMPLE 1 Matrix Multiplication

Here c₁₁ = 3 · 2 + 5 · 5 + (−1) · 9 = 22, and so on. The entry in the box is c₂₃ = 4 · 3 + 0 · 7 + 2 · 1 = 14. The product BA is not defined.
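The multiplication of rows into columns in (1) translates almost literally into code. The sketch below is an illustration under stated assumptions: the function name `mat_mul` is hypothetical, and only the two rows of A and the two columns of B that are quoted in Example 1 are used, so the result is a 2 × 2 part of the full product (its entries are c₁₁, c₁₃, c₂₁, c₂₃).

```python
def mat_mul(a, b):
    """Product C = AB by formula (1); a is m x n, b must be n x p."""
    n = len(a[0])
    if len(b) != n:
        raise ValueError("B must have as many rows as A has columns")
    p = len(b[0])
    # c_jk is the sum over l of a_jl * b_lk (row j of A into column k of B).
    return [
        [sum(a[j][l] * b[l][k] for l in range(n)) for k in range(p)]
        for j in range(len(a))
    ]


# Rows 1 and 2 of A as quoted in Example 1.
A_rows = [[3, 5, -1], [4, 0, 2]]
# Columns 1 and 3 of B as quoted in Example 1, stored side by side.
B_cols = [[2, 3], [5, 7], [9, 1]]

C_part = mat_mul(A_rows, B_cols)
print(C_part)  # [[22, 43], [26, 14]]
```

The entries 22 and 14 agree with c₁₁ and c₂₃ as computed in the example. Calling `mat_mul(A_rows, A_rows)` raises an error, because a 2 × 3 matrix cannot be multiplied by a 2 × 3 matrix.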

EXAMPLE 2 Multiplication of a Matrix and a Vector

EXAMPLE 3 Products of Row and Column Vectors

EXAMPLE 4 CAUTION! Matrix Multiplication Is Not Commutative, AB ≠ BA in General

This is illustrated by Examples 1 and 2, where one of the two products is not even defined, and by Example 3, where the two products have different sizes. But it also holds for square matrices. For instance,

It is interesting that this also shows that AB = 0 does not necessarily imply BA = 0 or A = 0 or B = 0. We shall discuss this further in Sec. 7.8, along with reasons when this happens.
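A quick numerical check makes the caution concrete. In the following sketch the two matrices are an illustrative choice (an assumption, not necessarily the pair used in the display of this example); they are picked so that AB is the zero matrix although neither factor is zero and BA is not zero.

```python
def mat_mul(a, b):
    """Product of two matrices stored as lists of rows (sizes assumed compatible)."""
    n = len(b)
    return [
        [sum(row[l] * b[l][k] for l in range(n)) for k in range(len(b[0]))]
        for row in a
    ]


# Illustrative 2 x 2 matrices, chosen so that AB = 0 but BA != 0.
A = [[1, 1], [100, 100]]
B = [[-1, 1], [1, -1]]

AB = mat_mul(A, B)
BA = mat_mul(B, A)
print(AB)  # [[0, 0], [0, 0]]
print(BA)  # [[99, 99], [-99, -99]]
```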

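Our examples show that in matrix products the order of factors must always be observed very carefully. Otherwise matrix multiplication satisfies rules similar to those for numbers, namely,

\[
\begin{aligned}
\text{(a)}\quad & (k\mathbf{A})\mathbf{B} = k(\mathbf{A}\mathbf{B}) = \mathbf{A}(k\mathbf{B}) \\
\text{(b)}\quad & \mathbf{A}(\mathbf{B}\mathbf{C}) = (\mathbf{A}\mathbf{B})\mathbf{C} \\
\text{(c)}\quad & (\mathbf{A} + \mathbf{B})\mathbf{C} = \mathbf{A}\mathbf{C} + \mathbf{B}\mathbf{C} \\
\text{(d)}\quad & \mathbf{C}(\mathbf{A} + \mathbf{B}) = \mathbf{C}\mathbf{A} + \mathbf{C}\mathbf{B}
\end{aligned}
\tag{2}
\]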

provided A, B, and C are such that the expressions on the left are defined; here, k is any scalar. (2b) is called the associative law. (2c) and (2d) are called the distributive laws.

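Since matrix multiplication is a multiplication of rows into columns, we can write the defining formula (1) more compactly as

\[
c_{jk} = \mathbf{a}_j \mathbf{b}_k, \qquad j = 1, \dots, m; \quad k = 1, \dots, p,
\tag{3}
\]

where a_j is the jth row vector of A and b_k is the kth column vector of B, so that in agreement with (1),

\[
\mathbf{a}_j \mathbf{b}_k = [a_{j1} \;\; a_{j2} \;\; \cdots \;\; a_{jn}]
\begin{bmatrix} b_{1k} \\ \vdots \\ b_{nk} \end{bmatrix}
= a_{j1}b_{1k} + a_{j2}b_{2k} + \cdots + a_{jn}b_{nk}.
\]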

EXAMPLE 5 Product in Terms of Row and Column Vectors

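If A = [a_jk] is of size 3 × 3 and B = [b_jk] is of size 3 × 4, then

\[
\mathbf{A}\mathbf{B} =
\begin{bmatrix}
\mathbf{a}_1\mathbf{b}_1 & \mathbf{a}_1\mathbf{b}_2 & \mathbf{a}_1\mathbf{b}_3 & \mathbf{a}_1\mathbf{b}_4 \\
\mathbf{a}_2\mathbf{b}_1 & \mathbf{a}_2\mathbf{b}_2 & \mathbf{a}_2\mathbf{b}_3 & \mathbf{a}_2\mathbf{b}_4 \\
\mathbf{a}_3\mathbf{b}_1 & \mathbf{a}_3\mathbf{b}_2 & \mathbf{a}_3\mathbf{b}_3 & \mathbf{a}_3\mathbf{b}_4
\end{bmatrix}.
\tag{4}
\]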

Taking a₁ = [3 5 −1], a₂ = [4 0 2], etc., verify (4) for the product in Example 1.

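Parallel processing of products on the computer is facilitated by a variant of (3) for computing C = AB, which is used by standard algorithms (such as in Lapack). In this method, A is used as given, B is taken in terms of its column vectors, and the product is computed columnwise; thus,

\[
\mathbf{A}\mathbf{B} = \mathbf{A}[\mathbf{b}_1 \;\; \mathbf{b}_2 \;\; \cdots \;\; \mathbf{b}_p]
= [\mathbf{A}\mathbf{b}_1 \;\; \mathbf{A}\mathbf{b}_2 \;\; \cdots \;\; \mathbf{A}\mathbf{b}_p].
\tag{5}
\]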

Columns of B are then assigned to different processors (individually or several to each processor), which simultaneously compute the columns of the product matrix Ab₁, Ab₂, etc.
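The columnwise scheme can be sketched as follows. This is an illustration of the idea only, not the way a library such as Lapack is actually implemented: the columns are computed one after another in an ordinary loop, but each column depends only on A and on one column of B, which is what allows the work to be distributed. The function names and the numerical values are assumptions made for this sketch.

```python
def mat_vec(a, x):
    """Product Ax of a matrix (list of rows) and a column vector (flat list)."""
    return [sum(a_jl * x_l for a_jl, x_l in zip(row, x)) for row in a]


def mat_mul_columnwise(a, b):
    """Compute AB as [Ab_1  Ab_2  ...  Ab_p], one column of B at a time."""
    p = len(b[0])
    # Each product column A b_k is independent of the others.
    product_columns = [mat_vec(a, [row[k] for row in b]) for k in range(p)]
    # Reassemble the columns into a matrix stored as a list of rows.
    return [[product_columns[k][j] for k in range(p)] for j in range(len(a))]


# Illustrative values (an assumption for this sketch).
A = [[4, 1], [-5, 2]]
B = [[3, 0, 7], [-1, 4, 6]]

columns = [mat_vec(A, [row[k] for row in B]) for k in range(3)]
print(columns)                   # [[11, -17], [4, 8], [34, -23]]
print(mat_mul_columnwise(A, B))  # [[11, 4, 34], [-17, 8, -23]]
```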

EXAMPLE 6 Computing Products Columnwise by (5)

To obtain

from (5), calculate the columns

of AB and then write them as a single matrix, as shown in the first formula on the right.

Motivation of Multiplication by Linear Transformations

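Let us now motivate the “unnatural” matrix multiplication by its use in linear transformations. For n = 2 variables these transformations are of the form

\[
\begin{aligned}
y_1 &= a_{11}x_1 + a_{12}x_2 \\
y_2 &= a_{21}x_1 + a_{22}x_2
\end{aligned}
\tag{6*}
\]

and suffice to explain the idea. (For general n they will be discussed in Sec. 7.9.) For instance, (6*) may relate an x₁x₂-coordinate system to a y₁y₂-coordinate system in the plane. In vectorial form we can write (6*) as

\[
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \mathbf{A}\mathbf{x}
= \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} a_{11}x_1 + a_{12}x_2 \\ a_{21}x_1 + a_{22}x_2 \end{bmatrix}.
\tag{6}
\]

Now suppose further that the x₁x₂-system is related to a w₁w₂-system by another linear transformation, say,

\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \mathbf{B}\mathbf{w}
= \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}
= \begin{bmatrix} b_{11}w_1 + b_{12}w_2 \\ b_{21}w_1 + b_{22}w_2 \end{bmatrix}.
\tag{7}
\]

Then the y₁y₂-system is related to the w₁w₂-system indirectly via the x₁x₂-system, and we wish to express this relation directly. Substitution will show that this direct relation is a linear transformation, too, say,

\[
\mathbf{y} = \mathbf{C}\mathbf{w}
= \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}
= \begin{bmatrix} c_{11}w_1 + c_{12}w_2 \\ c_{21}w_1 + c_{22}w_2 \end{bmatrix}.
\tag{8}
\]

Indeed, substituting (7) into (6), we obtain

\[
\begin{aligned}
y_1 &= a_{11}(b_{11}w_1 + b_{12}w_2) + a_{12}(b_{21}w_1 + b_{22}w_2)
     = (a_{11}b_{11} + a_{12}b_{21})w_1 + (a_{11}b_{12} + a_{12}b_{22})w_2 \\
y_2 &= a_{21}(b_{11}w_1 + b_{12}w_2) + a_{22}(b_{21}w_1 + b_{22}w_2)
     = (a_{21}b_{11} + a_{22}b_{21})w_1 + (a_{21}b_{12} + a_{22}b_{22})w_2.
\end{aligned}
\]

Comparing this with (8), we see that

\[
\begin{aligned}
c_{11} &= a_{11}b_{11} + a_{12}b_{21}, & c_{12} &= a_{11}b_{12} + a_{12}b_{22}, \\
c_{21} &= a_{21}b_{11} + a_{22}b_{21}, & c_{22} &= a_{21}b_{12} + a_{22}b_{22}.
\end{aligned}
\]

This proves that C = AB with the product defined as in (1). For larger matrix sizes the idea and result are exactly the same. Only the number of variables changes. We then have m variables y and n variables x and p variables w. The matrices A, B, and C = AB then have sizes m × n, n × p, and m × p, respectively. And the requirement that C be the product AB leads to formula (1) in its general form. This motivates matrix multiplication.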
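The statement that composing two linear transformations corresponds to multiplying their matrices can also be checked numerically. In the sketch below the matrices A, B and the vector w are arbitrary illustrative values (assumptions); the check compares the two-step computation y = A(Bw) with the direct one y = (AB)w.

```python
def mat_vec(a, x):
    """Product Ax of a matrix (list of rows) and a column vector (flat list)."""
    return [sum(a_jl * x_l for a_jl, x_l in zip(row, x)) for row in a]


def mat_mul(a, b):
    """Product AB by rows into columns (sizes assumed compatible)."""
    n = len(b)
    return [
        [sum(row[l] * b[l][k] for l in range(n)) for k in range(len(b[0]))]
        for row in a
    ]


# Arbitrary illustrative data.
A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -1]]
w = [2, 3]

x = mat_vec(B, w)           # first transformation, x = Bw
y_two_steps = mat_vec(A, x)  # second transformation, y = Ax
C = mat_mul(A, B)            # matrix of the composed transformation
y_direct = mat_vec(C, w)     # y = Cw

print(x)            # [3, 7]
print(y_two_steps)  # [17, 37]
print(C)            # [[10, -1], [20, -1]]
print(y_direct)     # [17, 37]
```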

Transposition

We obtain the transpose of a matrix by writing its rows as columns (or equivalently its columns as rows). This also applies to the transpose of vectors. Thus, a row vector becomes a column vector and vice versa. In addition, for square matrices, we can also “reflect” the elements along the main diagonal, that is, interchange entries that are symmetrically positioned with respect to the main diagonal to obtain the transpose. Hence a₁₂ becomes a₂₁, a₃₁ becomes a₁₃, and so forth. Example 7 illustrates these ideas. Also note that, if A is the given matrix, then we denote its transpose by A^T.

EXAMPLE 7 Transposition of Matrices and Vectors

A little more compactly, we can write

Furthermore, the transpose [6 2 3]^T of the row vector [6 2 3] is the column vector

DEFINITION Transposition of Matrices and Vectors

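The transpose of an m × n matrix A = [a_jk] is the n × m matrix A^T (read A transpose) that has the first row of A as its first column, the second row of A as its second column, and so on. Thus the transpose of A in (2) is A^T = [a_kj], written out

\[
\mathbf{A}^T = [a_{kj}] =
\begin{bmatrix}
a_{11} & a_{21} & \cdots & a_{m1} \\
a_{12} & a_{22} & \cdots & a_{m2} \\
\vdots & \vdots &        & \vdots \\
a_{1n} & a_{2n} & \cdots & a_{mn}
\end{bmatrix}.
\tag{9}
\]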

As a special case, transposition converts row vectors to column vectors and conversely.

Transposition gives us a choice in that we can work either with the matrix or its transpose, whichever is more convenient.

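Rules for transposition are

\[
\begin{aligned}
\text{(a)}\quad & (\mathbf{A}^T)^T = \mathbf{A} \\
\text{(b)}\quad & (\mathbf{A} + \mathbf{B})^T = \mathbf{A}^T + \mathbf{B}^T \\
\text{(c)}\quad & (c\mathbf{A})^T = c\mathbf{A}^T \\
\text{(d)}\quad & (\mathbf{A}\mathbf{B})^T = \mathbf{B}^T \mathbf{A}^T.
\end{aligned}
\tag{10}
\]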

CAUTION! Note that in (10d) the transposed matrices are in reversed order. We leave the proofs as an exercise in Probs. 9 and 10.
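The reversed order in (10d) is easy to confirm on a small case. The following sketch uses arbitrary illustrative matrices (assumptions) of sizes 2 × 3 and 3 × 2; it is a numerical check, not a proof.

```python
def transpose(a):
    """Return A^T: the rows of a written as columns."""
    return [list(column) for column in zip(*a)]


def mat_mul(a, b):
    """Product AB by rows into columns (sizes assumed compatible)."""
    n = len(b)
    return [
        [sum(row[l] * b[l][k] for l in range(n)) for k in range(len(b[0]))]
        for row in a
    ]


# Arbitrary illustrative matrices: A is 2 x 3, B is 3 x 2.
A = [[5, -8, 1], [4, 0, 0]]
B = [[1, 2], [0, -1], [3, 4]]

left = transpose(mat_mul(A, B))             # (AB)^T
right = mat_mul(transpose(B), transpose(A))  # B^T A^T
wrong_order = mat_mul(transpose(A), transpose(B))  # A^T B^T, a 3 x 3 matrix

print(left)   # [[8, 4], [22, 8]]
print(right)  # [[8, 4], [22, 8]]
print(len(wrong_order), len(wrong_order[0]))  # 3 3
```

Here A^T B^T is defined but has size 3 × 3, so it cannot equal the 2 × 2 matrix (AB)^T.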

Special Matrices

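Certain kinds of matrices will occur quite frequently in our work, and we now list the most important ones.

Symmetric and Skew-Symmetric Matrices. Transposition gives rise to two useful classes of matrices. Symmetric matrices are square matrices whose transpose equals the matrix itself. Skew-symmetric matrices are square matrices whose transpose equals minus the matrix. Both cases are defined in (11) and illustrated by Example 8.

\[
\underbrace{\mathbf{A}^T = \mathbf{A} \quad (\text{thus } a_{kj} = a_{jk})}_{\text{Symmetric Matrix}},
\qquad
\underbrace{\mathbf{A}^T = -\mathbf{A} \quad (\text{thus } a_{kj} = -a_{jk}, \text{ hence } a_{jj} = 0)}_{\text{Skew-Symmetric Matrix}}.
\tag{11}
\]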

EXAMPLE 8 Symmetric and Skew-Symmetric Matrices

For instance, if a company has three building supply centers C₁, C₂, C₃, then A could show costs, say, a_jj for handling 1000 bags of cement at center C_j, and a_jk (j ≠ k) the cost of shipping 1000 bags from C_j to C_k. Clearly, a_jk = a_kj if we assume shipping in the opposite direction will cost the same.

Symmetric matrices have several general properties which make them important. This will be seen as we proceed.
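Both defining conditions, A^T = A and A^T = −A, can be tested mechanically. The sketch below is a minimal illustration; the function names and the two sample matrices are assumptions made for this purpose.

```python
def transpose(a):
    """Return A^T: the rows of a written as columns."""
    return [list(column) for column in zip(*a)]


def is_square(a):
    return all(len(row) == len(a) for row in a)


def is_symmetric(a):
    """True if a is square and A^T = A."""
    return is_square(a) and transpose(a) == a


def is_skew_symmetric(a):
    """True if a is square and A^T = -A."""
    return is_square(a) and transpose(a) == [[-entry for entry in row] for row in a]


# Illustrative matrices (assumed values).
A = [[20, 120, 200], [120, 10, 150], [200, 150, 30]]
B = [[0, 1, -3], [-1, 0, -2], [3, 2, 0]]

print(is_symmetric(A), is_skew_symmetric(A))  # True False
print(is_symmetric(B), is_skew_symmetric(B))  # False True
# The main diagonal of a skew-symmetric matrix consists of zeros.
print([B[j][j] for j in range(3)])            # [0, 0, 0]
```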

Triangular Matrices. Upper triangular matrices are square matrices that can have nonzero entries only on and above the main diagonal, whereas any entry below the diagonal must be zero. Similarly, lower triangular matrices can have nonzero entries only on and below the main diagonal. Any entry on the main diagonal of a triangular matrix may be zero or not.

EXAMPLE 9 Upper and Lower Triangular Matrices

Diagonal Matrices. These are square matrices that can have nonzero entries only on the main diagonal. Any entry above or below the main diagonal must be zero.

If all the diagonal entries of a diagonal matrix S are equal, say, c, we call S a scalar matrix because multiplication of any square matrix A of the same size by S has the same effect as the multiplication by a scalar, that is,

\[
\mathbf{A}\mathbf{S} = \mathbf{S}\mathbf{A} = c\mathbf{A}.
\tag{12}
\]

In particular, a scalar matrix, whose entries on the main diagonal are all 1, is called a unit matrix (or identity matrix) and is denoted by I_n or simply by I. For I, formula (12) becomes

\[
\mathbf{A}\mathbf{I} = \mathbf{I}\mathbf{A} = \mathbf{A}.
\tag{13}
\]

EXAMPLE 10 Diagonal Matrix D. Scalar Matrix S. Unit Matrix I

Some Applications of Matrix Multiplication

EXAMPLE 11 Computer Production. Matrix Times Matrix

Supercomp Ltd produces two computer models PC1086 and PC1186. The matrix A shows the cost per computer (in thousands of dollars) and B the production figures for the year 2010 (in multiples of 10,000 units). Find a matrix C that shows the shareholders the cost per quarter (in millions of dollars) for raw material, labor, and miscellaneous.

Solution.

Since cost is given in multiples of $1000 and production in multiples of 10,000 units, the entries of C are multiples of $10 million; thus c₁₁ = 13.2 means $132 million, etc.

EXAMPLE 12 Weight Watching. Matrix Times Vector

Suppose that in a weight-watching program, a person of 185 lb burns 350 cal/hr in walking (3 mph), 500 in bicycling (13 mph), and 950 in jogging (5.5 mph). Bill, weighing 185 lb, plans to exercise according to the matrix shown. Verify the calculations (W = Walking, B = Bicycling, J = Jogging).

EXAMPLE 13 Markov Process. Powers of a Matrix. Stochastic Matrix

Suppose that the 2004 state of land use in a city of 60 mi² of built-up area is

C: Commercially used 25%
I: Industrially used 20%
R: Residentially used 55%.

Find the states in 2009, 2014, and 2019, assuming that the transition probabilities for 5-year intervals are given by the matrix A and remain practically the same over the time considered.

A is a stochastic matrix, that is, a square matrix with all entries nonnegative and all column sums equal to 1. Our example concerns a Markov process,¹ that is, a process for which the probability of entering a certain state depends only on the last state occupied (and the matrix A), not on any earlier state.

Solution. From the matrix A and the 2004 state we can compute the 2009 state,

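To explain: The 2009 figure for C equals 25% times the probability 0.7 that C goes into C, plus 20% times the probability 0.1 that I goes into C, plus 55% times the probability 0 that R goes into C. Together,

\[
25 \cdot 0.7 + 20 \cdot 0.1 + 55 \cdot 0 = 19.5 \; [\%].
\]

Similarly, the new R is 46.5%. We see that the 2009 state vector is the column vector

\[
\mathbf{y} = [19.5 \;\; 34.0 \;\; 46.5]^T = \mathbf{A}\mathbf{x},
\]

where the column vector x = [25 20 55]^T is the given 2004 state vector. Note that the sum of the entries of y is 100 [%]. Similarly, you may verify that for 2014 and 2019 we get the state vectors

\[
\mathbf{A}\mathbf{y} = \mathbf{A}^2\mathbf{x} = [17.05 \;\; 43.80 \;\; 39.15]^T,
\qquad
\mathbf{A}^3\mathbf{x} = [16.315 \;\; 50.660 \;\; 33.025]^T.
\]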

Answer. In 2009 the commercial area will be 19.5% (11.7 mi²), the industrial 34% (20.4 mi²), and the residential 46.5% (27.9 mi²). For 2014 the corresponding figures are 17.05%, 43.80%, and 39.15%. For 2019 they are 16.315%, 50.660%, and 33.025%. (In Sec. 8.2 we shall see what happens in the limit, assuming that those probabilities remain the same. In the meantime, can you experiment or guess?)
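The repeated multiplication by A can be carried out with a few lines of code. The sketch below rests on an assumption that should be stated clearly: only the first row of A (the probabilities 0.7, 0.1, 0 of going into C) is quoted in the explanation above; the second and third rows used here are assumed values, chosen so that every column sums to 1 and so that the state vectors quoted in the Answer are reproduced.

```python
def mat_vec(a, x):
    """Product Ax of a matrix (list of rows) and a column vector (flat list)."""
    return [sum(a_jk * x_k for a_jk, x_k in zip(row, x)) for row in a]


# Transition matrix for 5-year intervals; column k holds the probabilities
# of leaving state k (order C, I, R).
A = [
    [0.7, 0.1, 0.0],  # into C (quoted in the text)
    [0.2, 0.9, 0.2],  # into I (assumed, consistent with the quoted results)
    [0.1, 0.0, 0.8],  # into R (assumed, consistent with the quoted results)
]

# A stochastic matrix has nonnegative entries and column sums equal to 1.
column_sums = [sum(row[k] for row in A) for k in range(3)]
is_stochastic = all(entry >= 0 for row in A for entry in row) and all(
    abs(s - 1.0) < 1e-12 for s in column_sums
)

state = [25.0, 20.0, 55.0]  # 2004 state vector x in percent
history = {2004: state}
for year in (2009, 2014, 2019):
    state = mat_vec(A, state)  # one 5-year step
    history[year] = state
    print(year, [round(value, 3) for value in state])
```

Extending the tuple of years continues the process and gives a first impression of the limiting behavior mentioned in the Answer.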

PROBLEM SET 7.2

1–10 GENERAL QUESTIONS

1. Multiplication. Why is multiplication of matrices restricted by conditions on the factors?
2. Square matrix. What form does a 3 × 3 matrix have if it is symmetric as well as skew-symmetric?
3. Product of vectors. Can every 3 × 3 matrix be represented by two vectors as in Example 3?
4. Skew-symmetric matrix. How many different entries can a 4 × 4 skew-symmetric matrix have? An n × n skew-symmetric matrix?
5. Same questions as in Prob. 4 for symmetric matrices.
6. Triangular matrix. If U₁, U₂ are upper triangular and L₁, L₂ are lower triangular, which of the following are triangular?
7. Idempotent matrix, defined by A² = A. Can you find four 2 × 2 idempotent matrices?
8. Nilpotent matrix, defined by B^m = 0 for some m. Can you find three 2 × 2 nilpotent matrices?
9. Transposition. Can you prove (10a)–(10c) for 3 × 3 matrices? For m × n matrices?
10. Transposition. (a) Illustrate (10d) by simple examples. (b) Prove (10d).

11–20 MULTIPLICATION, ADDITION, AND TRANSPOSITION OF MATRICES AND VECTORS

Let

Showing all intermediate results, calculate the following expressions or give reasons why they are undefined:

11. AB, AB^T, BA, B^TA
12. AA^T, A², BB^T, B²
13. CC^T, BC, CD, C^TB
14. 3A − 2B, (3A − 2B)^T, 3A^T − 2B^T, (3A − 2B)^Ta^T
15. Aa, Aa^T, (Ab)^T, b^TA^T
16. BC, BC^T, Bb, b^TB
17. ABC, ABa, ABb, Ca^T
18. ab, ba, aA, Bb
19. 1.5a + 3.0b, 1.5a^T + 3.0b, (A − B)b, Ab − Bb
20. b^TAb, aBa^T, aCC^T, C^Tba
21. General rules. Prove (2) for 2 × 2 matrices A = [a_jk], B = [b_jk], C = [c_jk], and a general scalar.
22. Product. Write AB in Prob. 11 in terms of row and column vectors.
23. Product. Calculate AB in Prob. 11 columnwise. See Example 6.
24. Commutativity. Find all 2 × 2 matrices A = [a_jk] that commute with B = [b_jk], where b_jk = j + k.
25. TEAM PROJECT. Symmetric and Skew-Symmetric Matrices. These matrices occur quite frequently in applications, so it is worthwhile to study some of their most important properties.
(a) Verify the claims in (11) that a_kj = a_jk for a symmetric matrix, and a_kj = −a_jk for a skew-symmetric matrix. Give examples.

(b) Show that for every square matrix C the matrix C + C^T is symmetric and C − C^T is skew-symmetric. Write C in the form C = S + T, where S is symmetric and T is skew-symmetric and find S and T in terms of C. Represent A and B in Probs. 11–20 in this form.

(c) A linear combination of matrices A, B, C, …, M of the same size is an expression of the form

where a, …, m are any scalars. Show that if these matrices are square and symmetric, so is (14); similarly, if they are skew-symmetric, so is (14).

(d) Show that AB with symmetric A and B is symmetric if and only if A and B commute, that is, AB = BA.

(e) Under what condition is the product of skew-symmetric matrices skew-symmetric?

26–30 FURTHER APPLICATIONS

26. Production. In a production process, let N mean “no trouble” and T “trouble.” Let the transition probabilities from one day to the next be 0.8 for N → N, hence 0.2 for N → T, and 0.5 for T → N, hence 0.5 for T → T.
If today there is no trouble, what is the probability of N two days after today? Three days after today?
27. CAS Experiment. Markov Process. Write a program for a Markov process. Use it to calculate further steps in Example 13 of the text. Experiment with other stochastic 3 × 3 matrices, also using different starting values.
28. Concert subscription. In a community of 100,000 adults, subscribers to a concert series tend to renew their subscription with probability 90% and persons presently not subscribing will subscribe for the next season with probability 0.2%. If the present number of subscribers is 1200, can one predict an increase, decrease, or no change over each of the next three seasons?
29. Profit vector. Two factory outlets F₁ and F₂ in New York and Los Angeles sell sofas (S), chairs (C), and tables (T) with a profit of $35, $62, and $30, respectively. Let the sales in a certain week be given by the matrix

Introduce a “profit vector” p such that the components of v = Ap give the total profits of F₁ and F₂.
30. TEAM PROJECT. Special Linear Transformations. Rotations have various applications. We show in this project how they can be handled by matrices.
(a) Rotation in the plane. Show that the linear transformation y = Ax with

is a counterclockwise rotation of the Cartesian x₁x₂-coordinate system in the plane about the origin, where θ is the angle of rotation.

(b) Rotation through nθ. Show that in (a)

Is this plausible? Explain this in words.

(c) Addition formulas for cosine and sine. By geometry we should have

Derive from this the addition formulas (6) in App. A3.1.

(d) Computer graphics. To visualize a three-dimensional object with plane faces (e.g., a cube), we may store the position vectors of the vertices with respect to a suitable x₁x₂x₃-coordinate system (and a list of the connecting edges) and then obtain a two-dimensional image on a video screen by projecting the object onto a coordinate plane, for instance, onto the x₁x₂-plane by setting x₃ = 0. To change the appearance of the image, we can impose a linear transformation on the position vectors stored. Show that a diagonal matrix D with main diagonal entries 3, 1, gives from an x = [x_j] the new position vector y = Dx, where y₁ = 3x₁ (stretch in the x₁-direction by a factor 3), y₂ = x₂ (unchanged), (contraction in the x₃-direction). What effect would a scalar matrix have?

(e) Rotations in space. Explain y = Ax geometrically when A is one of the three matrices

What effect would these transformations have in situations such as that described in (d)?

7.3 Linear Systems of Equations. Gauss Elimination

We now come to one of the most important uses of matrices, that is, using matrices to solve systems of linear equations. We showed informally, in Example 1 of Sec. 7.1, how to represent the information contained in a system of linear equations by a matrix, called the augmented matrix. This matrix will then be used in solving the linear system of equations. Our approach to solving linear systems is called the Gauss elimination method. Since this method is so fundamental to linear algebra, the student should be alert.

A shorter term for systems of linear equations is just linear systems. Linear systems model many applications in engineering, economics, statistics, and many other areas. Electrical networks, traffic flow, and commodity markets may serve as specific examples of applications.

Linear System, Coefficient Matrix, Augmented Matrix

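A linear system of m equations in n unknowns x₁, …, x_n is a set of equations of the form

\[
\begin{aligned}
a_{11}x_1 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + \cdots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{m1}x_1 + \cdots + a_{mn}x_n &= b_m.
\end{aligned}
\tag{1}
\]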

The system is called linear because each variable x_j appears in the first power only, just as in the equation of a straight line. a₁₁, …, a_mn are given numbers, called the coefficients of the system. b₁, …, b_m on the right are also given numbers. If all the b_j are zero, then (1) is called a homogeneous system. If at least one b_j is not zero, then (1) is called a nonhomogeneous system.

A solution of (1) is a set of numbers x₁, …, x_n that satisfies all the m equations. A solution vector of (1) is a vector x whose components form a solution of (1). If the system (1) is homogeneous, it always has at least the trivial solution x₁ = 0, …, x_n = 0.

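Matrix Form of the Linear System (1). From the definition of matrix multiplication we see that the m equations of (1) may be written as a single vector equation

\[
\mathbf{A}\mathbf{x} = \mathbf{b}
\tag{2}
\]

where the coefficient matrix A = [a_jk] is the m × n matrix

\[
\mathbf{A} =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots &        & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix},
\qquad \text{and} \qquad
\mathbf{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}
\quad \text{and} \quad
\mathbf{b} = \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}
\]

are column vectors. We assume that the coefficients a_jk are not all zero, so that A is not a zero matrix. Note that x has n components, whereas b has m components. The matrix

\[
\tilde{\mathbf{A}} =
\left[
\begin{array}{ccc|c}
a_{11} & \cdots & a_{1n} & b_1 \\
\vdots &        & \vdots & \vdots \\
a_{m1} & \cdots & a_{mn} & b_m
\end{array}
\right]
\]

is called the augmented matrix of the system (1). The dashed vertical line could be omitted, as we shall do later. It is merely a reminder that the last column of Ã did not come from matrix A but came from vector b. Thus, we augmented the matrix A.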

Note that the augmented matrix Ã determines the system (1) completely because it contains all the given numbers appearing in (1).
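As a small illustration of how Ã collects A and b, the following Python sketch builds the augmented matrix and checks a candidate solution vector against Ax = b. The function names `augmented_matrix` and `matvec` and the 2 × 2 numbers are assumptions made for this example; they are not taken from the text.

```python
from fractions import Fraction

def augmented_matrix(A, b):
    """Return [A | b]: each row of A with the matching entry of b appended."""
    if len(A) != len(b):
        raise ValueError("A and b must have the same number of rows")
    return [[Fraction(a) for a in row] + [Fraction(bi)] for row, bi in zip(A, b)]

def matvec(A, x):
    """Compute the product Ax as a list with one entry per equation."""
    return [sum(Fraction(a) * xk for a, xk in zip(row, x)) for row in A]

# Illustrative system (hypothetical numbers): x1 + x2 = 3, x1 - x2 = 1
A = [[1, 1], [1, -1]]
b = [3, 1]
A_tilde = augmented_matrix(A, b)

x = [2, 1]                        # candidate solution vector
is_solution = matvec(A, x) == b   # True exactly when Ax = b
```

Exact fractions are used instead of floating-point numbers so that the equality test Ax = b is not affected by rounding.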

EXAMPLE 1 Geometric Interpretation. Existence and Uniqueness of Solutions

If m = n = 2, we have two equations in two unknowns x₁, x₂

a₁₁x₁ + a₁₂x₂ = b₁
a₂₁x₁ + a₂₂x₂ = b₂

If we interpret x₁, x₂ as coordinates in the x₁x₂-plane, then each of the two equations represents a straight line, and (x₁, x₂) is a solution if and only if the point P with coordinates x₁, x₂ lies on both lines. Hence there are three possible cases (see Fig. 158):

(a) Precisely one solution if the lines intersect
(b) Infinitely many solutions if the lines coincide
(c) No solution if the lines are parallel

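For instance, the system x₁ + x₂ = 1, x₁ + x₂ = 0 falls under the third case: the two lines are parallel, so there is no solution.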

Fig. 158. Three equations in three unknowns interpreted as planes in space

If the system is homogeneous, Case (c) cannot happen, because then those two straight lines pass through the origin, whose coordinates (0, 0) constitute the trivial solution. Similarly, our present discussion can be extended from two equations in two unknowns to three equations in three unknowns. We give the geometric interpretation of three possible cases concerning solutions in Fig. 158. Instead of straight lines we have planes and the solution depends on the positioning of these planes in space relative to each other. The student may wish to come up with some specific examples.

Our simple example illustrated that a system (1) may have no solution. This leads to such questions as: Does a given system (1) have a solution? Under what conditions does it have precisely one solution? If it has more than one solution, how can we characterize the set of all solutions? We shall consider such questions in Sec. 7.5.

First, however, let us discuss an important systematic method for solving linear systems.

Gauss Elimination and Back Substitution

The Gauss elimination method can be motivated as follows. Consider a linear system that is in triangular form (in full, upper triangular form) such as

2x₁ + 5x₂ = 2
13x₂ = −26

(Triangular means that all the nonzero entries of the corresponding coefficient matrix lie on or above the diagonal and form an upside-down 90° triangle.) Then we can solve the system by back substitution, that is, we solve the last equation for the variable, x₂ = −26/13 = −2, and then work backward, substituting x₂ = −2 into the first equation and solving it for x₁, obtaining x₁ = ½(2 − 5x₂) = ½(2 − 5 · (−2)) = 6. This gives us the idea of first reducing a general system to triangular form. For instance, let the given system be

2x₁ + 5x₂ = 2
−4x₁ + 3x₂ = −30

We leave the first equation as it is. We eliminate x₁ from the second equation, to get a triangular system. For this we add twice the first equation to the second, and we do the same operation on the rows of the augmented matrix. This gives −4x₁ + 4x₁ + 3x₂ + 10x₂ = −30 + 2 · 2, that is,

2x₁ + 5x₂ = 2
13x₂ = −26     (Row 2 + 2 Row 1)

where Row 2 + 2 Row 1 means “Add twice Row 1 to Row 2” in the original matrix. This is the Gauss elimination (for 2 equations in 2 unknowns) giving the triangular form, from which back substitution now yields x₂ = −2 and x₁ = 6, as before.
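The elimination step and the back substitution just described can be traced in a few lines of Python. This is only a sketch for the 2 × 2 system implied by the sum −4x₁ + 4x₁ + 3x₂ + 10x₂ = −30 + 2 · 2 above, namely 2x₁ + 5x₂ = 2, −4x₁ + 3x₂ = −30; the variable names are chosen for illustration.

```python
from fractions import Fraction

# Augmented matrix of 2x1 + 5x2 = 2, -4x1 + 3x2 = -30
M = [[Fraction(2), Fraction(5), Fraction(2)],
     [Fraction(-4), Fraction(3), Fraction(-30)]]

# Row 2 + 2 Row 1: the multiplier -a21/a11 = 2 removes x1 from the second row
multiplier = -M[1][0] / M[0][0]
M[1] = [r2 + multiplier * r1 for r1, r2 in zip(M[0], M[1])]

# Back substitution: solve the last equation first, then the first equation
x2 = M[1][2] / M[1][1]
x1 = (M[0][2] - M[0][1] * x2) / M[0][0]
```

After the row operation the second row reads 0, 13, −26, which is the triangular form, and the back substitution reproduces x₂ = −2 and x₁ = 6.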

Since a linear system is completely determined by its augmented matrix, Gauss elimination can be done by merely considering the matrices, as we have just indicated. We do this again in the next example, emphasizing the matrices by writing them first and the equations behind them, just as a help in order not to lose track.

EXAMPLE 2 Gauss Elimination. Electrical Network

Solve the linear system

Derivation from the circuit in Fig. 159 (Optional). This is the system for the unknown currents x₁ = i₁, x₂ = i₂, x₃ = i₃ in the electrical network in Fig. 159. To obtain it, we label the currents as shown, choosing directions arbitrarily; if a current will come out negative, this will simply mean that the current flows against the direction of our arrow. The current entering each battery will be the same as the current leaving it. The equations for the currents result from Kirchhoff's laws:

Kirchhoff's Current Law (KCL). At any point of a circuit, the sum of the inflowing currents equals the sum of the outflowing currents.

Kirchhoff's Voltage Law (KVL). In any closed loop, the sum of all voltage drops equals the impressed electromotive force.

Node P gives the first equation, node Q the second, the right loop the third, and the left loop the fourth, as indicated in the figure.

Fig. 159. Network in Example 2 and equations relating the currents

Solution by Gauss Elimination. This system could be solved rather quickly by noticing its particular form. But this is not the point. The point is that the Gauss elimination is systematic and will work in general, also for large systems. We apply it to our system and then do back substitution. As indicated, let us write the augmented matrix of the system first and then the system itself:

Step 1. Elimination of x₁

Call the first row of A the pivot row and the first equation the pivot equation. Call the coefficient 1 of its x₁-term the pivot in this step. Use this equation to eliminate x₁ (get rid of x₁) in the other equations. For this, do:

Add 1 times the pivot equation to the second equation.

Add −20 times the pivot equation to the fourth equation.

This corresponds to row operations on the augmented matrix as indicated in BLUE behind the new matrix in (3). So the operations are performed on the preceding matrix. The result is

Step 2. Elimination of x₂

The first equation remains as it is. We want the new second equation to serve as the next pivot equation. But since it has no x₂-term (in fact, it is 0 = 0), we must first change the order of the equations and the corresponding rows of the new matrix. We put 0 = 0 at the end and move the third equation and the fourth equation one place up. This is called partial pivoting (as opposed to the rarely used total pivoting, in which the order of the unknowns is also changed). It gives

To eliminate x₂, do:

Add −3 times the pivot equation to the third equation.

The result is

Back Substitution. Determination of x₃, x₂, x₁ (in this order)

Working backward from the last to the first equation of this “triangular” system (4), we can now readily find x₃, then x₂, and then x₁:

where A stands for “amperes.” This is the answer to our problem. The solution is unique.
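The procedure of Example 2 (eliminate column by column, interchange rows when the intended pivot is zero, then substitute backward) can be sketched as a general routine. The function name `gauss_solve` is hypothetical, and the sketch assumes that the system has exactly one solution. It swaps the pivot row with the first later row that has a nonzero entry, rather than moving the 0 = 0 row to the end as in the text; this leads to an equivalent triangular system. Since the displayed system of Example 2 is not reproduced here, the coefficients in the usage lines are an assumption chosen to match the steps described (pivot 1, multipliers 1 and −20 in Step 1, a row 0 = 0, multiplier −3 in Step 2).

```python
from fractions import Fraction

def gauss_solve(A, b):
    """Solve Ax = b by Gauss elimination with row interchanges and back substitution.

    Assumes exactly one solution exists; raises ValueError otherwise.
    """
    m, n = len(A), len(A[0])
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        # Look for a row at or below position `col` with a nonzero entry in this column
        pivot_row = next((r for r in range(col, m) if M[r][col] != 0), None)
        if pivot_row is None:
            raise ValueError("no pivot in column %d: solution is not unique" % col)
        M[col], M[pivot_row] = M[pivot_row], M[col]      # interchange of two rows
        for r in range(col + 1, m):
            factor = -M[r][col] / M[col][col]
            M[r] = [x + factor * p for x, p in zip(M[r], M[col])]
    # Rows below the n-th now have zero coefficients; they must read 0 = 0
    if any(M[r][n] != 0 for r in range(n, m)):
        raise ValueError("inconsistent system")
    x = [Fraction(0)] * n
    for i in reversed(range(n)):                         # back substitution
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

# Assumed stand-in for the network system (four equations, three unknowns)
A = [[1, -1, 1],
     [-1, 1, -1],
     [0, 10, 25],
     [20, 10, 0]]
b = [0, 0, 90, 80]
currents = gauss_solve(A, b)
```

In numerical work, partial pivoting usually means choosing the entry of largest absolute value in the column as pivot; the sketch only avoids zero pivots, which is all that exact arithmetic requires.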

Elementary Row Operations. Row-Equivalent Systems

Example 2 illustrates the operations of the Gauss elimination. These are the first two of three operations, which are called

Elementary Row Operations for Matrices:

Interchange of two rows

Addition of a constant multiple of one row to another row

Multiplication of a row by a nonzero constant c

CAUTION! These operations are for rows, not for columns! They correspond to the following

Elementary Operations for Equations:

Interchange of two equations

Addition of a constant multiple of one equation to another equation

Multiplication of an equation by a nonzero constant c

Clearly, the interchange of two equations does not alter the solution set. Neither does their addition because we can undo it by a corresponding subtraction. Similarly for their multiplication, which we can undo by multiplying the new equation by 1/c (since c ≠ 0), producing the original equation.

We now call a linear system S₁ row-equivalent to a linear system S₂ if S₁ can be obtained from S₂ by (finitely many!) row operations. This justifies Gauss elimination and establishes the following result.

THEOREM 1 Row-Equivalent Systems

Row-equivalent linear systems have the same set of solutions.

Because of this theorem, systems having the same solution sets are often called equivalent systems. But note well that we are dealing with row operations. No column operations on the augmented matrix are permitted in this context because they would generally alter the solution set.
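Theorem 1 can be checked on a small case. The sketch below implements the three elementary row operations on an augmented matrix and confirms that a known solution still satisfies every row-equivalent system. The function names are assumptions for this illustration, and the system used is 2x₁ + 5x₂ = 2, −4x₁ + 3x₂ = −30 with solution x₁ = 6, x₂ = −2.

```python
from fractions import Fraction

def swap_rows(M, i, j):
    """Interchange of two rows."""
    N = [row[:] for row in M]
    N[i], N[j] = N[j], N[i]
    return N

def add_multiple(M, src, dst, c):
    """Addition of c times row `src` to row `dst`."""
    N = [row[:] for row in M]
    N[dst] = [d + c * s for s, d in zip(N[src], N[dst])]
    return N

def scale_row(M, i, c):
    """Multiplication of row i by a nonzero constant c."""
    if c == 0:
        raise ValueError("c must be nonzero")
    N = [row[:] for row in M]
    N[i] = [c * v for v in N[i]]
    return N

def satisfies(M, x):
    """True if x solves every equation of the augmented matrix M = [A | b]."""
    return all(sum(a * xk for a, xk in zip(row[:-1], x)) == row[-1] for row in M)

M = [[Fraction(2), Fraction(5), Fraction(2)],
     [Fraction(-4), Fraction(3), Fraction(-30)]]
x = [Fraction(6), Fraction(-2)]

M1 = add_multiple(M, 0, 1, 2)                              # Row 2 + 2 Row 1
M2 = scale_row(swap_rows(M1, 0, 1), 0, Fraction(1, 13))    # swap, then scale
checks = [satisfies(N, x) for N in (M, M1, M2)]
```

Each operation can be undone by an operation of the same kind (for instance, `add_multiple(M1, 0, 1, -2)` restores `M`), which is the reason the solution set cannot change.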

A linear system (1) is called overdetermined if it has more equations than unknowns, as in Example 2, determined if m = n, as in Example 1, and underdetermined if it has fewer equations than unknowns.

Furthermore, a system (1) is called consistent if it has at least one solution (thus, one solution or infinitely many solutions), but inconsistent if it has no solutions at all, as x₁ + x₂ = 1, x₁ + x₂ = 0 in Example 1, Case (c).

Gauss Elimination: The Three Possible Cases of Systems

We have seen, in Example 2, that Gauss elimination can solve linear systems that have a unique solution. This leaves us to apply Gauss elimination to a system with infinitely many solutions (in Example 3) and one with no solution (in Example 4).

EXAMPLE 3 Gauss Elimination if Infinitely Many Solutions Exist

Solve the following linear system of three equations in four unknowns whose augmented matrix is

Solution. As in the previous example, we circle pivots and box terms of equations and corresponding entries to be eliminated. We indicate the operations in terms of equations and operate on both equations and matrices.

Step 1. Elimination of x₁ from the second and third equations by adding

This gives the following, in which the pivot of the next step is circled.

Step 2. Elimination of x₂ from the third equation of (6) by adding

This gives

Back Substitution. From the second equation, x₂ = 1 − x₃ + 4x₄. From this and the first equation, x₁ = 2 − x₄. Since x₃ and x₄ remain arbitrary, we have infinitely many solutions. If we choose a value of x₃ and a value of x₄, then the corresponding values of x₁ and x₂ are uniquely determined.

On Notation. If unknowns remain arbitrary, it is also customary to denote them by other letters t₁, t₂, …. In this example we may thus write x₁ = 2 − x₄ = 2 − t₂, x₂ = 1 − x₃ + 4x₄ = 1 − t₁ + 4t₂, x₃ = t₁ (first arbitrary unknown), x₄ = t₂ (second arbitrary unknown).
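With this notation the solution set can also be written in vector form, which separates one particular solution from the two arbitrary parameters. The display below merely rearranges x₁ = 2 − t₂, x₂ = 1 − t₁ + 4t₂, x₃ = t₁, x₄ = t₂; it adds no new information.

```latex
\mathbf{x} =
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
=
\begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \end{bmatrix}
+ t_1 \begin{bmatrix} 0 \\ -1 \\ 1 \\ 0 \end{bmatrix}
+ t_2 \begin{bmatrix} -1 \\ 4 \\ 0 \\ 1 \end{bmatrix},
\qquad t_1,\ t_2 \ \text{arbitrary}.
```

Setting t₁ = t₂ = 0 gives the particular solution x = [2 1 0 0]^T; every other solution differs from it by a linear combination of the two remaining vectors.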

EXAMPLE 4 Gauss Elimination if no Solution Exists

What will happen if we apply the Gauss elimination to a linear system that has no solution? The answer is that in this case the method will show this fact by producing a contradiction. For instance, consider

Step 1. Elimination of x₁ from the second and third equations by adding

This gives

Step 2. Elimination of x₂ from the third equation gives

The false statement 0 = 12 shows that the system has no solution.

Row Echelon Form and Information From It

At the end of the Gauss elimination the coefficient matrix, the augmented matrix, and the system itself are said to be in row echelon form. In it, rows of zeros, if present, are the last rows, and, in each nonzero row, the leftmost nonzero entry is farther to the right than in the previous row. For instance, in Example 4 the coefficient matrix and its augmented matrix in row echelon form are

Note that we do not require that the leftmost nonzero entries be 1 since this would have no theoretic or numeric advantage. (The so-called reduced echelon form, in which those entries are 1, will be discussed in Sec. 7.8.)

The original system of m equations in n unknowns has augmented matrix [A|b]. This is to be row reduced to matrix [R|f]. The two systems Ax = b and Rx = f are equivalent: if either one has a solution, so does the other, and the solutions are identical.

At the end of the Gauss elimination (before the back substitution), the row echelon form of the augmented matrix will be

Here, r ≤ m, r₁₁ ≠ 0, and all entries in the blue triangle and blue rectangle are zero.

The number of nonzero rows, r, in the row-reduced coefficient matrix R is called the rank of R and also the rank of A. Here is the method for determining whether Ax = b has solutions and what they are:

(a) No solution. If r is less than m (meaning that R actually has at least one row of all 0s) and at least one of the numbers f_{r+1}, f_{r+2}, …, f_m is not zero, then the system Rx = f is inconsistent: No solution is possible. Therefore the system Ax = b is inconsistent as well. See Example 4, where r = 2 < m = 3 and f_{r+1} = f₃ = 12.

If the system is consistent (either r = m, or r < m and all the numbers f_{r+1}, f_{r+2}, …, f_m are zero), then there are solutions.

(b) Unique solution. If the system is consistent and r = n, there is exactly one solution, which can be found by back substitution. See Example 2, where r = n = 3 and m = 4.
(c) Infinitely many solutions. If the system is consistent and r < n, there are infinitely many solutions. To obtain any of these solutions, choose values of x_{r+1}, …, x_n arbitrarily. Then solve the rth equation for x_r (in terms of those arbitrary values), then the (r − 1)st equation for x_{r−1}, and so on up the line. See Example 3.
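The three cases can be read off mechanically once [A | b] has been reduced to [R | f]. The following sketch does this; `reduce_augmented` and `classify` are hypothetical names. Unlike the form displayed above, the sketch allows pivots in any column of A, so the arbitrary unknowns need not be the last n − r ones.

```python
from fractions import Fraction

def reduce_augmented(A, b):
    """Row-reduce [A | b] to [R | f], choosing pivots only in the columns of A.

    Returns (M, r), where r is the number of nonzero rows of R.
    """
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    m, n = len(A), len(A[0])
    r = 0
    for c in range(n):
        if r == m:
            break
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, m):
            factor = -M[i][c] / M[r][c]
            M[i] = [a + factor * q for a, q in zip(M[i], M[r])]
        r += 1
    return M, r

def classify(A, b):
    """Return 'none', 'unique', or 'infinite' for the system Ax = b."""
    M, r = reduce_augmented(A, b)
    m, n = len(A), len(A[0])
    if any(M[i][n] != 0 for i in range(r, m)):   # some f_{r+1}, ..., f_m is nonzero
        return "none"
    return "unique" if r == n else "infinite"

case_c = classify([[1, 1], [1, 1]], [1, 0])       # x1 + x2 = 1, x1 + x2 = 0
case_a = classify([[2, 5], [-4, 3]], [2, -30])    # 2x1 + 5x2 = 2, -4x1 + 3x2 = -30
case_b = classify([[1, 1], [2, 2]], [1, 2])       # illustrative: second row is twice the first
```

The first call uses the inconsistent system x₁ + x₂ = 1, x₁ + x₂ = 0; the third system is made up for illustration.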

Orientation. Gauss elimination is reasonable in computing time and storage demand. We shall consider those aspects in Sec. 20.1 in the chapter on numeric linear algebra. Section 7.4 develops fundamental concepts of linear algebra such as linear independence and rank of a matrix. These in turn will be used in Sec. 7.5 to fully characterize the behavior of linear systems in terms of existence and uniqueness of solutions.

PROBLEM SET 7.3

1–14 GAUSS ELIMINATION

Solve the linear system given explicitly or by its augmented matrix. Show details.

1. 4x − 6y = −11
−3x + 8y = 10
15. Equivalence relation. By definition, an equivalence relation on a set is a relation satisfying three conditions (named as indicated):
(i) Each element A of the set is equivalent to itself (Reflexivity).
(ii) If A is equivalent to B, then B is equivalent to A (Symmetry).
(iii) If A is equivalent to B and B is equivalent to C, then A is equivalent to C (Transitivity).
Show that row equivalence of matrices satisfies these three conditions. Hint. Show that for each of the three elementary row operations these conditions hold.
16. CAS PROJECT. Gauss Elimination and Back Substitution. Write a program for Gauss elimination and back substitution (a) that does not include pivoting and (b) that does include pivoting. Apply the programs to Probs. 11–14 and to some larger systems of your choice.

17–21 MODELS OF NETWORKS

In Probs. 17–19, using Kirchhoff's laws (see Example 2) and showing the details, find the currents:

17.
18.
19.
20. Wheatstone bridge. Show that if R_x/R₃ = R₁/R₂ in the figure, then I = 0. (R₀ is the resistance of the instrument by which I is measured.) This bridge is a method for determining R_x. R₁, R₂, R₃ are known. R₃ is variable. To get R_x, make I = 0 by varying R₃. Then calculate R_x = R₃R₁/R₂.
21. Traffic flow. Methods of electrical circuit analysis have applications to other fields. For instance, applying the analog of Kirchhoff's Current Law, find the traffic flow (cars per hour) in the net of one-way streets (in the directions indicated by the arrows) shown in the figure. Is the solution unique?
22. Models of markets. Determine the equilibrium solution (D₁ = S₁, D₂ = S₂) of the two-commodity market with linear model (D, S, P = demand, supply, price; index 1 = first commodity, index 2 = second commodity)
23. Balancing a chemical equation x₁C₃H₈ + x₂O₂ → x₃CO₂ + x₄H₂O means finding integer x₁, x₂, x₃, x₄ such that the numbers of atoms of carbon (C), hydrogen (H), and oxygen (O) are the same on both sides of this reaction, in which propane C₃H₈ and O₂ give carbon dioxide and water. Find the smallest positive integers x₁, …, x₄.
24. PROJECT. Elementary Matrices. The idea is that elementary operations can be accomplished by matrix multiplication. If A is an m × n matrix on which we want to do an elementary operation, then there is a matrix E such that EA is the new matrix after the operation. Such an E is called an elementary matrix. This idea can be helpful, for instance, in the design of algorithms. (Computationally, it is generally preferable to do row operations directly, rather than by multiplication by E.)
(a) Show that the following are elementary matrices, for interchanging Rows 2 and 3, for adding −5 times the first row to the third, and for multiplying the fourth row by 8.

Apply E₁, E₂, E₃ to a vector and to a 4 × 3 matrix of your choice. Find B = E₃E₂E₁A, where A = [a_jk] is the general 4 × 2 matrix. Is B equal to C = E₁E₂E₃A?

(b) Conclude that E₁, E₂, E₃ are obtained by doing the corresponding elementary operations on the 4 × 4 unit matrix. Prove that if M is obtained from A by an elementary row operation, then

M = EA,

where E is obtained from the m × m unit matrix I_m by the same row operation (m being the number of rows of A, so that the product EA is defined).

7.4 Linear Independence. Rank of a Matrix. Vector Space

Since our next goal is to fully characterize the behavior of linear systems in terms of existence and uniqueness of solutions (Sec. 7.5), we have to introduce new fundamental linear algebraic concepts that will aid us in doing so. Foremost among these are linear independence and the rank of a matrix. Keep in mind that these concepts are intimately linked with the important Gauss elimination method and how it works.

Linear Independence and Dependence of Vectors

Given any set of m vectors a₍₁₎, …, a_(m) (with the same number of components), a linear combination of these vectors is an expression of the form

c₁a₍₁₎ + c₂a₍₂₎ + ⋯ + c_m a_(m)

where c₁, c₂, …, c_m are any scalars. Now consider the equation

(1) c₁a₍₁₎ + c₂a₍₂₎ + ⋯ + c_m a_(m) = 0.

Clearly, this vector equation (1) holds if we choose all c_j's zero, because then it becomes 0 = 0. If this is the only m-tuple of scalars for which (1) holds, then our vectors a₍₁₎, …, a_(m) are said to form a linearly independent set or, more briefly, we call them linearly independent. Otherwise, if (1) also holds with scalars not all zero, we call these vectors linearly dependent. This means that we can express at least one of the vectors as a linear combination of the other vectors. For instance, if (1) holds with, say, c₁ ≠ 0, we can solve (1) for a₍₁₎:

a₍₁₎ = k₂a₍₂₎ + ⋯ + k_m a_(m), where k_j = −c_j/c₁.

(Some k_j's may be zero. Or even all of them, namely, if a₍₁₎ = 0.)

Why is linear independence important? Well, if a set of vectors is linearly dependent, then we can discard one or more of the vectors until we get a linearly independent set. This set is then the smallest “truly essential” set with which we can work. Thus, we cannot express any of the vectors of this set linearly in terms of the others.

EXAMPLE 1 Linear Independence and Dependence

The three vectors

are linearly dependent because

Although this is easily checked by vector arithmetic (do it!), it is not so easy to discover. However, a systematic method for finding out about linear independence and dependence follows below.

The first two of the three vectors are linearly independent because c₁a₍₁₎ + c₂a₍₂₎ = 0 implies c₂ = 0 (from the second components) and then c₁ = 0 (from any other component of a₍₁₎).

Rank of a Matrix

DEFINITION

The rank of a matrix A is the maximum number of linearly independent row vectors of A. It is denoted by rank A.

Our further discussion will show that the rank of a matrix is an important key concept for understanding general properties of matrices and linear systems of equations.

EXAMPLE 2 Rank

The matrix

has rank 2, because Example 1 shows that the first two row vectors are linearly independent, whereas all three row vectors are linearly dependent.

Note further that rank A = 0 if and only if A = 0. This follows directly from the definition.

We call a matrix A₁ row-equivalent to a matrix A₂ if A₁ can be obtained from A₂ by (finitely many!) elementary row operations.

Now the maximum number of linearly independent row vectors of a matrix does not change if we change the order of rows or multiply a row by a nonzero c or take a linear combination by adding a multiple of a row to another row. This shows that rank is invariant under elementary row operations:

THEOREM 1 Row-Equivalent Matrices

Row-equivalent matrices have the same rank.

Hence we can determine the rank of a matrix by reducing the matrix to row-echelon form, as was done in Sec. 7.3. Once the matrix is in row-echelon form, we count the number of nonzero rows, which is precisely the rank of the matrix.

EXAMPLE 3 Determination of Rank

For the matrix in Example 2 we obtain successively

The last matrix is in row-echelon form and has two nonzero rows. Hence rank A = 2, as before.

Examples 1–3 illustrate the following useful theorem (with p = 3, n = 3, and the rank of the matrix = 2).

THEOREM 2 Linear Independence and Dependence of Vectors

Consider p vectors that each have n components. Then these vectors are linearly independent if the matrix formed, with these vectors as row vectors, has rank p. However, these vectors are linearly dependent if that matrix has rank less than p.
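Theorem 2 turns the question of linear independence into a rank computation, and the rank is obtained by counting the nonzero rows of a row-echelon form. A minimal sketch of both steps follows; the names `rank` and `linearly_independent` and the three vectors are assumptions made for illustration (the third vector is the sum of the first two).

```python
from fractions import Fraction

def rank(rows):
    """Number of nonzero rows after reduction to row-echelon form."""
    M = [[Fraction(v) for v in row] for row in rows]
    m, n = len(M), len(M[0])
    r = 0
    for c in range(n):
        if r == m:
            break
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue                      # column has no pivot
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, m):
            factor = -M[i][c] / M[r][c]
            M[i] = [a + factor * q for a, q in zip(M[i], M[r])]
        r += 1
    return r

def linearly_independent(vectors):
    """p vectors are independent exactly when the matrix of rows has rank p."""
    return rank(vectors) == len(vectors)

# Hypothetical vectors with three components each
v1 = [1, 0, 2]
v2 = [0, 1, 1]
v3 = [1, 1, 3]                            # v3 = v1 + v2

first_two = linearly_independent([v1, v2])        # rank 2 = p
all_three = linearly_independent([v1, v2, v3])    # rank 2 < p = 3
```

Here the first two vectors are independent, whereas all three together are dependent, which mirrors the situation of Examples 1 to 3.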

Further important properties will result from the basic

THEOREM 3 Rank in Terms of Column Vectors

The rank r of a matrix A equals the maximum number of linearly independent column vectors of A.

Hence A and its transpose A^T have the same rank.

PROOF

In this proof we write simply “rows” and “columns” for row and column vectors. Let A be an m × n matrix of rank A = r. Then by definition of rank, A has r linearly independent rows which we denote by v₍₁₎, …, v_(r) (regardless of their position in A), and all the rows a₍₁₎, …, a_(m) of A are linear combinations of those, say,

These are vector equations for rows. To switch to columns, we write (3) in terms of components as n such systems, with k = 1, …, n,

and collect components in columns. Indeed, we can write (4) as

where k = 1, …, n. Now the vector on the left is the kth column vector of A. We see that each of these n columns is a linear combination of the same r columns on the right. Hence A cannot have more linearly independent columns than rows, whose number is rank A = r. Now rows of A are columns of the transpose A^T. For A^T our conclusion is that A^T cannot have more linearly independent columns than rows, so that A cannot have more linearly independent rows than columns. Together, the number of linearly independent columns of A must be r, the rank of A. This completes the proof.

EXAMPLE 4 Illustration of Theorem 3

The matrix in (2) has rank 2. From Example 3 we see that the first two row vectors are linearly independent and by “working backward” we can verify that Row 3 = 6 Row 1 − ½ Row 2. Similarly, the first two columns are linearly independent, and by reducing the last matrix in Example 3 by columns we find that

Combining Theorems 2 and 3 we obtain

THEOREM 4 Linear Dependence of Vectors

Consider p vectors each having n components. If n < p, then these vectors are linearly dependent.

PROOF

The matrix A with those p vectors as row vectors has p rows and n < p columns; hence by Theorem 3 it has rank A ≤ n < p, which implies linear dependence by Theorem 2.

Vector Space

The following related concepts are of general interest in linear algebra. In the present context they provide a clarification of essential properties of matrices and their role in connection with linear systems.

Consider a nonempty set V of vectors where each vector has the same number of components. If, for any two vectors a and b in V, all their linear combinations αa + βb (α, β any real numbers) are also elements of V, and if, furthermore, a and b satisfy the laws (3a), (3c), (3d), and (4) in Sec. 7.1, and any vectors a, b, c in V satisfy (3b), then V is a vector space. Note that here we wrote laws (3) and (4) of Sec. 7.1 in lowercase letters a, b, c, which is our notation for vectors. More on vector spaces in Sec. 7.9.

The maximum number of linearly independent vectors in V is called the dimension of V and is denoted by dim V. Here we assume the dimension to be finite; infinite dimension will be defined in Sec. 7.9.

A linearly independent set in V consisting of a maximum possible number of vectors in V is called a basis for V. In other words, any largest possible set of independent vectors in V forms a basis for V. That means, if we add one or more vectors to that set, the set will be linearly dependent. (See also the beginning of Sec. 7.4 on linear independence and dependence of vectors.) Thus, the number of vectors of a basis for V equals dim V.

The set of all linear combinations of given vectors a₍₁₎, …, a_(p) with the same number of components is called the span of these vectors. Obviously, a span is a vector space. If in addition, the given vectors a₍₁₎, …, a_(p) are linearly independent, then they form a basis for that vector space.

This then leads to another equivalent definition of basis. A set of vectors is a basis for a vector space V if (1) the vectors in the set are linearly independent, and if (2) any vector in V can be expressed as a linear combination of the vectors in the set. If (2) holds, we also say that the set of vectors spans the vector space V.

By a subspace of a vector space V we mean a nonempty subset of V (including V itself) that forms a vector space with respect to the two algebraic operations (addition and scalar multiplication) defined for the vectors of V.

EXAMPLE 5 Vector Space, Dimension, Basis

The span of the three vectors in Example 1 is a vector space of dimension 2. A basis of this vector space consists of any two of those three vectors, for instance, a₍₁₎, a₍₂₎, or a₍₁₎, a₍₃₎, etc.

We further note the simple

THEOREM 5 Vector Space Rⁿ

The vector space Rⁿ consisting of all vectors with n components (n real numbers) has dimension n.

PROOF

A basis of n vectors is a₍₁₎ = [1 0 … 0], a₍₂₎ = [0 1 0 … 0], …, a_(n) = [0 … 0 1].

For a matrix A, we call the span of the row vectors the row space of A. Similarly, the span of the column vectors of A is called the column space of A.

Now, Theorem 3 shows that a matrix A has as many linearly independent rows as columns. By the definition of dimension, their number is the dimension of the row space or the column space of A. This proves

THEOREM 6 Row Space and Column Space

The row space and the column space of a matrix A have the same dimension, equal to rank A.

Finally, for a given matrix A the solution set of the homogeneous system Ax = 0 is a vector space, called the null space of A, and its dimension is called the nullity of A. In the next section we motivate and prove the basic relation

rank A + nullity A = number of columns of A.

PROBLEM SET 7.4

1–10 RANK, ROW SPACE, COLUMN SPACE

Find the rank. Find a basis for the row space. Find a basis for the column space. Hint. Row-reduce the matrix and its transpose. (You may omit obvious factors from the vectors of these bases.)

11. CAS Experiment. Rank. (a) Show experimentally that the n × n matrix A = [a_jk] with a_jk = j + k − 1 has rank 2 for any n. (Problem 20 shows n = 4.) Try to prove it.
(b) Do the same when a_jk = j + k + c, where c is any positive integer.

(c) What is rank A if a_jk = 2^(j+k−2)? Try to find other large matrices of low rank independent of n.

12–16 GENERAL PROPERTIES OF RANK

Show the following:

12. rank B^TA^T = rank AB. (Note the order!)
13. rank A = rank B does not imply rank A² = rank B². (Give a counterexample.)
14. If A is not square, either the row vectors or the column vectors of A are linearly dependent.
15. If the row vectors of a square matrix are linearly independent, so are the column vectors, and conversely.
16. Give examples showing that the rank of a product of matrices cannot exceed the rank of either factor.

17–25 LINEAR INDEPENDENCE

Are the following sets of vectors linearly independent? Show the details of your work.

17. [3 4 0 2], [2 −1 3 7], [1 16 −12 −22]
18.
19. [0 1 1], [1 1 1], [0 0 1]
20. [1 2 3 4], [2 3 4 5], [3 4 5 6], [4 5 6 7]
21. [2 0 0 7], [2 0 0 8], [2 0 0 9], [2 0 1 0]
22. [0.4 −0.2 0.2], [0 0 0], [3.0 −0.6 1.5]
23. [9 8 7 6 5], [9 7 5 3 1]
24. [4 −1 3], [0 8 1], [1 3 −5], [2 6 1]
25. [6 0 −1 3], [2 2 5 0], [−4 −4 −4 −4]
26. Linearly independent subset. Beginning with the last of the vectors [3 0 1 2], [6 1 0 0], [12 1 2 4], [6 0 2 4], and [9 0 1 2] omit one after another until you get a linearly independent set.

27–35 VECTOR SPACE

Is the given set of vectors a vector space? Give reasons. If your answer is yes, determine the dimension and find a basis. (υ₁, υ₂, … denote components.)

27. All vectors in R³ with υ₁ − υ₂ + 2υ₃ = 0
28. All vectors in R³ with 3υ₂ + υ₃ = k
29. All vectors in R² with υ₁ υ₂
30. All vectors in Rⁿ with the first n − 2 components zero
31. All vectors in R⁵ with positive components
32. All vectors in R³ with 3υ₁ − 2υ₂ + υ₃ = 0, 4υ₁ + 5υ₂ = 0
33. All vectors in R³ with 3υ₁ − υ₃ = 0, 2υ₁ + 3υ₂ − 4υ₃ = 0
34. All vectors in Rⁿ with |υ_j| = 1 for j = 1, …, n
35. All vectors in R⁴ with υ₁ = 2υ₂ = 3υ₃ = 4υ₄

7.5 Solutions of Linear Systems: Existence, Uniqueness

Rank, as just defined, gives complete information about existence, uniqueness, and general structure of the solution set of linear systems as follows.

A linear system of equations in n unknowns has a unique solution if the coefficient matrix and the augmented matrix have the same rank n, and infinitely many solutions if that common rank is less than n. The system has no solution if those two matrices have different rank.

To state this precisely and prove it, we shall use the generally important concept of a submatrix of A. By this we mean any matrix obtained from A by omitting some rows or columns (or both). By definition this includes A itself (as the matrix obtained by omitting no rows or columns); this is practical.

THEOREM 1 Fundamental Theorem for Linear Systems

(a) Existence. A linear system of m equations in n unknowns x₁, …, x_n

is consistent, that is, has solutions, if and only if the coefficient matrix A and the augmented matrix Ã have the same rank. Here,

(b) Uniqueness. The system (1) has precisely one solution if and only if this common rank r of A and Ã equals n.

(c) Infinitely many solutions. If this common rank r is less than n, the system (1) has infinitely many solutions. All of these solutions are obtained by determining r suitable unknowns (whose submatrix of coefficients must have rank r) in terms of the remaining n − r unknowns, to which arbitrary values can be assigned. (See Example 3 in Sec. 7.3.)

(d) Gauss elimination (Sec. 7.3). If solutions exist, they can all be obtained by the Gauss elimination. (This method will automatically reveal whether or not solutions exist; see Sec. 7.3.)

PROOF

(a) We can write the system (1) in vector form Ax = b or in terms of column vectors c₍₁₎, …, c_(n) of A:

(2) c₍₁₎x₁ + c₍₂₎x₂ + ⋯ + c_(n)x_n = b.

Ã is obtained by augmenting A by a single column b. Hence, by Theorem 3 in Sec. 7.4, rank Ã equals rank A or rank A + 1. Now if (1) has a solution x, then (2) shows that b must be a linear combination of those column vectors, so that Ã and A have the same maximum number of linearly independent column vectors and thus the same rank.

Conversely, if rank Ã = rank A, then b must be a linear combination of the column vectors of A, say,

(2*) b = α₁c₍₁₎ + ⋯ + α_n c_(n)

since otherwise rank Ã = rank A + 1. But (2*) means that (1) has a solution, namely, x₁ = α₁, …, x_n = α_n, as can be seen by comparing (2*) and (2).

(b) If rank A = n, the n column vectors in (2) are linearly independent by Theorem 3 in Sec. 7.4. We claim that then the representation (2) of b is unique because otherwise

c₍₁₎x₁ + ⋯ + c_(n)x_n = c₍₁₎x̃₁ + ⋯ + c_(n)x̃_n.

This would imply (take all terms to the left, with a minus sign)

(x₁ − x̃₁)c₍₁₎ + ⋯ + (x_n − x̃_n)c_(n) = 0

and x₁ − x̃₁ = 0, …, x_n − x̃_n = 0 by linear independence. But this means that the scalars x₁, …, x_n in (2) are uniquely determined, that is, the solution of (1) is unique.

(c) If rank A = rank Ã = r < n, then by Theorem 3 in Sec. 7.4 there is a linearly independent set K of r column vectors of A such that the other n − r column vectors of A are linear combinations of those vectors. We renumber the columns and unknowns, denoting the renumbered quantities by ˆ, so that {ĉ₍₁₎, …, ĉ_(r)} is that linearly independent set K. Then (2) becomes

ĉ₍₁₎x̂₁ + ⋯ + ĉ_(r)x̂_r + ĉ_(r+1)x̂_{r+1} + ⋯ + ĉ_(n)x̂_n = b,

where ĉ_(r+1), …, ĉ_(n) are linear combinations of the vectors of K, and so are the vectors x̂_{r+1}ĉ_(r+1), …, x̂_n ĉ_(n). Expressing these vectors in terms of the vectors of K and collecting terms, we can thus write the system in the form

(3) ĉ₍₁₎y₁ + ⋯ + ĉ_(r)y_r = b

with y_j = x̂_j + β_j, where β_j results from the n − r terms ĉ_(r+1)x̂_{r+1}, …, ĉ_(n)x̂_n; here, j = 1, …, r. Since the system has a solution, there are y₁, …, y_r satisfying (3). These scalars are unique since K is linearly independent. Choosing x̂_{r+1}, …, x̂_n fixes the β_j and corresponding x̂_j = y_j − β_j, where j = 1, …, r.

(d) This was discussed in Sec. 7.3 and is restated here as a reminder.

The theorem is illustrated in Sec. 7.3. In Example 2 there is a unique solution since rank Ã = rank A = n = 3 (as can be seen from the last matrix in the example). In Example 3 we have rank Ã = rank A = 2 < n = 4 and can choose x₃ and x₄ arbitrarily. In Example 4 there is no solution because rank A = 2 < rank Ã = 3.
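Parts (a) to (c) of the theorem amount to comparing two ranks with n. The sketch below applies this test directly; `rank` and `solution_type` are hypothetical names. The first two calls use systems from Sec. 7.3 (x₁ + x₂ = 1, x₁ + x₂ = 0, and 2x₁ + 5x₂ = 2, −4x₁ + 3x₂ = −30); the third system is made up for illustration.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix: number of nonzero rows of a row-echelon form."""
    M = [[Fraction(v) for v in row] for row in rows]
    m, n = len(M), len(M[0])
    r = 0
    for c in range(n):
        if r == m:
            break
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, m):
            factor = -M[i][c] / M[r][c]
            M[i] = [a + factor * q for a, q in zip(M[i], M[r])]
        r += 1
    return r

def solution_type(A, b):
    """Apply the rank test: compare rank A, rank of [A | b], and n."""
    n = len(A[0])
    A_tilde = [list(row) + [bi] for row, bi in zip(A, b)]
    r, r_tilde = rank(A), rank(A_tilde)
    if r != r_tilde:
        return "no solution"                   # ranks differ: inconsistent
    if r == n:
        return "unique solution"               # common rank equals n
    return "infinitely many solutions"         # common rank less than n

t1 = solution_type([[1, 1], [1, 1]], [1, 0])
t2 = solution_type([[2, 5], [-4, 3]], [2, -30])
t3 = solution_type([[1, 1, 1], [0, 1, 2]], [3, 1])   # illustrative: rank 2 < n = 3
```

Computing two separate ranks repeats work that a single elimination on the augmented matrix would share; the sketch favors a direct reading of the theorem over efficiency.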

Homogeneous Linear System

Recall from Sec. 7.3 that a linear system (1) is called homogeneous if all the b_j's are zero, and nonhomogeneous if one or several b_j's are not zero. For the homogeneous system we obtain from the Fundamental Theorem the following results.

THEOREM 2 Homogeneous Linear System

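A homogeneous linear system

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= 0\\
\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots &\\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= 0
\end{aligned} \tag{4}$$

always has the trivial solution x₁ = 0, …, x_n = 0. Nontrivial solutions exist if and only if rank A < n. If rank A = r < n, these solutions, together with x = 0, form a vector space (see Sec. 7.4) of dimension n − r called the solution space of (4).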

In particular, if x₍₁₎ and x₍₂₎ are solution vectors of (4), then x = c₁x₍₁₎ + c₂x₍₂₎ with any scalars c₁ and c₂ is a solution vector of (4). (This does not hold for nonhomogeneous systems. Also, the term solution space is used for homogeneous systems only.)

PROOF

The first proposition can be seen directly from the system. It agrees with the fact that b = 0 implies that rank Ã = rank A, so that a homogeneous system is always consistent. If rank A = n, the trivial solution is the unique solution according to (b) in Theorem 1. If rank A < n, there are nontrivial solutions according to (c) in Theorem 1. The solutions form a vector space because if x₍₁₎ and x₍₂₎ are any of them, then Ax₍₁₎ = 0, Ax₍₂₎ = 0, and this implies A(x₍₁₎ + x₍₂₎) = Ax₍₁₎ + Ax₍₂₎ = 0 as well as A(cx₍₁₎) = cAx₍₁₎ = 0, where c is arbitrary. If rank A = r < n, Theorem 1 (c) implies that we can choose n − r suitable unknowns, call them x_{r+1}, …, x_n, in an arbitrary fashion, and every solution is obtained in this way. Hence a basis for the solution space, briefly called a basis of solutions of (4), is y₍₁₎, …, y_(n−r), where the basis vector y_(j) is obtained by choosing x_{r+j} = 1 and the other x_{r+1}, …, x_n zero; the corresponding first r components of this solution vector are then determined. Thus the solution space of (4) has dimension n − r. This proves Theorem 2.

The solution space of (4) is also called the null space of A because Ax = 0 for every x in the solution space of (4). Its dimension is called the nullity of A. Hence Theorem 2 states that

$$\operatorname{rank} A + \operatorname{nullity} A = n \tag{5}$$

where n is the number of unknowns (number of columns of A).
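The construction of a basis of solutions in the proof of Theorem 2 can be sketched in code. This is an illustration under stated assumptions: the function names `rref` and `null_space_basis` are hypothetical, the free unknowns are taken to be the non-pivot columns of the reduced echelon form (instead of renumbering them to x_{r+1}, …, x_n), and the matrix is an arbitrary example with fewer equations than unknowns.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form; returns (matrix, list of pivot columns)."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots = []
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        m[r] = [v / m[r][col] for v in m[r]]  # scale pivot to 1
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(col)
        r += 1
        if r == len(m):
            break
    return m, pivots

def null_space_basis(A):
    """One basis vector per free unknown: set it to 1 and the other free unknowns to 0."""
    n = len(A[0])
    m, pivots = rref(A)
    free = [c for c in range(n) if c not in pivots]
    basis = []
    for f in free:
        x = [Fraction(0)] * n
        x[f] = Fraction(1)
        # the pivot unknowns are then determined by the reduced equations
        for row_index, p in enumerate(pivots):
            x[p] = -m[row_index][f]
        basis.append(x)
    return basis, len(pivots)

A = [[1, 2, 1, 0], [2, 4, 0, 2], [3, 6, 1, 2]]  # m = 3 equations, n = 4 unknowns
basis, r = null_space_basis(A)
print("rank:", r, "nullity:", len(basis))
```

For this matrix the number of basis vectors plus the rank equals the number of columns, in agreement with (5).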

Furthermore, by the definition of rank we have rank A ≤ m in (4). Hence if m < n, then rank A < n. By Theorem 2 this gives the practically important

THEOREM 3 Homogeneous Linear System with Fewer Equations Than Unknowns

A homogeneous linear system with fewer equations than unknowns always has nontrivial solutions.

Nonhomogeneous Linear Systems

The characterization of all solutions of the linear system (1) is now quite simple, as follows.

THEOREM 4 Nonhomogeneous Linear System

If a nonhomogeneous linear system (1) is consistent, then all of its solutions are obtained as

$$x = x_0 + x_h \tag{6}$$

where x₀ is any (fixed) solution of (1) and x_h runs through all the solutions of the corresponding homogeneous system (4).

PROOF

The difference x_h = x − x₀ of any two solutions of (1) is a solution of (4) because Ax_h = A(x − x₀) = Ax − Ax₀ = b − b = 0. Since x is any solution of (1), we get all the solutions of (1) if in (6) we take any solution x₀ of (1) and let x_h vary throughout the solution space of (4).
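As a small numerical illustration of Theorem 4, the sketch below adds multiples of a homogeneous solution to one particular solution and checks that the result still solves the system. The matrix, the right side, the particular solution `x0`, and the homogeneous solution `xh` are all assumptions chosen by hand for this example, and the function names are hypothetical.

```python
def matvec(A, x):
    """Matrix-vector product Ax for a matrix given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 2, 1], [2, 4, 0]]
b = [4, 6]
x0 = [3, 0, 1]    # one particular solution: A x0 = b
xh = [-2, 1, 0]   # spans the solution space of A x = 0 (rank A = 2, n = 3)

def general_solution(c):
    """x = x0 + c * xh, one member of the solution family of Ax = b."""
    return [p + c * h for p, h in zip(x0, xh)]

for c in (-2, 0, 1, 5):
    print(c, general_solution(c), matvec(A, general_solution(c)) == b)
```

Since rank A = 2 and n = 3 here, the solution space of the homogeneous system has dimension 1, so a single parameter c describes all solutions.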

This covers a main part of our discussion of characterizing the solutions of systems of linear equations. Our next main topic is determinants and their role in linear equations.

7.6 For Reference: Second- and Third-Order Determinants

We created this section as a quick general reference section on second- and third-order determinants. It is completely independent of the theory in Sec. 7.7 and suffices as a reference for many of our examples and problems. Since this section is for reference, go on to the next section, consulting this material only when needed.

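A determinant of second order is denoted and defined by

$$D = \det A = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}. \tag{1}$$

So here we have bars (whereas a matrix has brackets).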

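Cramer's rule for solving linear systems of two equations in two unknowns

$$\begin{aligned} \text{(a)}\quad a_{11}x_1 + a_{12}x_2 &= b_1 \\ \text{(b)}\quad a_{21}x_1 + a_{22}x_2 &= b_2 \end{aligned} \tag{2}$$

is

$$x_1 = \frac{\begin{vmatrix} b_1 & a_{12} \\ b_2 & a_{22} \end{vmatrix}}{D} = \frac{b_1a_{22} - a_{12}b_2}{D}, \qquad x_2 = \frac{\begin{vmatrix} a_{11} & b_1 \\ a_{21} & b_2 \end{vmatrix}}{D} = \frac{a_{11}b_2 - b_1a_{21}}{D} \tag{3}$$

with D as in (1), provided D ≠ 0.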

The value D = 0 appears for homogeneous systems with nontrivial solutions.

PROOF

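We prove (3). To eliminate x₂ multiply (2a) by a₂₂ and (2b) by −a₁₂ and add,

$$(a_{11}a_{22} - a_{12}a_{21})x_1 = b_1a_{22} - a_{12}b_2.$$

Similarly, to eliminate x₁ multiply (2a) by −a₂₁ and (2b) by a₁₁ and add,

$$(a_{11}a_{22} - a_{12}a_{21})x_2 = a_{11}b_2 - b_1a_{21}.$$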

Assuming that D = a₁₁a₂₂ − a₁₂a₂₁ ≠ 0, dividing, and writing the right sides of these two equations as determinants, we obtain (3).
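The two quotients of second-order determinants translate directly into code. The following is a minimal sketch; the function names `det2` and `cramer2` and the sample system are assumptions made for illustration only.

```python
from fractions import Fraction

def det2(a11, a12, a21, a22):
    """Second-order determinant a11*a22 - a12*a21."""
    return a11 * a22 - a12 * a21

def cramer2(a11, a12, a21, a22, b1, b2):
    """Solve a11*x1 + a12*x2 = b1, a21*x1 + a22*x2 = b2 (integer data assumed)."""
    D = det2(a11, a12, a21, a22)
    if D == 0:
        raise ValueError("D = 0: Cramer's rule does not apply")
    x1 = Fraction(det2(b1, a12, b2, a22), D)  # first column replaced by b
    x2 = Fraction(det2(a11, b1, a21, b2), D)  # second column replaced by b
    return x1, x2

# Illustrative system: 2*x1 + x2 = 5, x1 - 3*x2 = -1
print(cramer2(2, 1, 1, -3, 5, -1))
```

Substituting the computed values back into both equations is a quick check of the result.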

EXAMPLE 1 Cramer's Rule for Two Equations

Third-Order Determinants

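A determinant of third order can be defined by

$$D = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{21}\begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix} + a_{31}\begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix}. \tag{4}$$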

Note the following. The signs on the right are + − +. Each of the three terms on the right is an entry in the first column of D times its minor, that is, the second-order determinant obtained from D by deleting the row and column of that entry; thus, for a₁₁ delete the first row and first column, and so on.

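If we write out the minors in (4), we obtain

$$D = a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} + a_{21}a_{13}a_{32} - a_{21}a_{12}a_{33} + a_{31}a_{12}a_{23} - a_{31}a_{13}a_{22}. \tag{4*}$$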

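Cramer's Rule for Linear Systems of Three Equations

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 &= b_1 \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 &= b_2 \\ a_{31}x_1 + a_{32}x_2 + a_{33}x_3 &= b_3 \end{aligned} \tag{5}$$

is

$$x_1 = \frac{D_1}{D}, \qquad x_2 = \frac{D_2}{D}, \qquad x_3 = \frac{D_3}{D} \qquad (D \neq 0) \tag{6}$$

with the determinant D of the system given by (4) and

$$D_1 = \begin{vmatrix} b_1 & a_{12} & a_{13} \\ b_2 & a_{22} & a_{23} \\ b_3 & a_{32} & a_{33} \end{vmatrix}, \qquad D_2 = \begin{vmatrix} a_{11} & b_1 & a_{13} \\ a_{21} & b_2 & a_{23} \\ a_{31} & b_3 & a_{33} \end{vmatrix}, \qquad D_3 = \begin{vmatrix} a_{11} & a_{12} & b_1 \\ a_{21} & a_{22} & b_2 \\ a_{31} & a_{32} & b_3 \end{vmatrix}.$$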

Note that D₁, D₂, D₃ are obtained by replacing Columns 1, 2, 3, respectively, by the column of the right sides of (5).

Cramer's rule (6) can be derived by eliminations similar to those for (3), but it also follows from the general case (Theorem 4) in the next section.

7.7 Determinants. Cramer's Rule

Determinants were originally introduced for solving linear systems. Although impractical in computations, they have important engineering applications in eigenvalue problems (Sec. 8.1), differential equations, vector algebra (Sec. 9.3), and in other areas. They can be introduced in several equivalent ways. Our definition is particularly suited to dealing with linear systems.

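A determinant of order n is a scalar associated with an n × n (hence square!) matrix A = [a_jk], and is denoted by

$$D = \det A = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}. \tag{1}$$

For n = 1, this determinant is defined by

$$D = a_{11}. \tag{2}$$

For n ≥ 2 by

$$D = a_{j1}C_{j1} + a_{j2}C_{j2} + \cdots + a_{jn}C_{jn} \qquad (j = 1, 2, \ldots, \text{or } n) \tag{3a}$$

or

$$D = a_{1k}C_{1k} + a_{2k}C_{2k} + \cdots + a_{nk}C_{nk} \qquad (k = 1, 2, \ldots, \text{or } n). \tag{3b}$$

Here,

$$C_{jk} = (-1)^{j+k}M_{jk}$$

and M_jk is a determinant of order n − 1, namely, the determinant of the submatrix of A obtained from A by omitting the row and column of the entry a_jk, that is, the jth row and the kth column.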

In this way, D is defined in terms of n determinants of order n − 1, each of which is, in turn, defined in terms of n − 1 determinants of order n − 2, and so on—until we finally arrive at second-order determinants, in which those submatrices consist of single entries whose determinant is defined to be the entry itself.

From the definition it follows that we may expand D by any row or column, that is, choose in (3) the entries in any row or column, similarly when expanding the C_jk's in (3), and so on.

This definition is unambiguous, that is, it yields the same value for D no matter which columns or rows we choose in expanding. A proof is given in App. 4.
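The recursive definition by cofactor expansion can be written down almost literally. The sketch below is illustrative only: the function names are hypothetical, indices are 0-based (which leaves the sign (−1)^{j+k} unchanged), and the 3 × 3 matrix is an example chosen here.

```python
def minor(A, j, k):
    """Submatrix of A with row j and column k removed (0-based indices)."""
    return [row[:k] + row[k + 1:] for i, row in enumerate(A) if i != j]

def det_by_row(A, j=0):
    """Expand det A along row j using cofactors C_jk = (-1)^(j+k) M_jk."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** (j + k) * A[j][k] * det_by_row(minor(A, j, k))
               for k in range(len(A)))

def det_by_col(A, k=0):
    """Expand det A along column k; the minors are expanded by their first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** (j + k) * A[j][k] * det_by_row(minor(A, j, k))
               for j in range(len(A)))

A = [[1, 3, 0], [2, 6, 4], [-1, 0, 2]]
print([det_by_row(A, j) for j in range(3)])
print([det_by_col(A, k) for k in range(3)])
```

All six expansions of this matrix give the same value, as the definition requires. The recursion is meant for small n only, since the work grows like n!.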

Terms used in connection with determinants are taken from matrices. In D we have n² entries a_jk, also n rows and n columns, and a main diagonal on which a₁₁, a₂₂, …, a_nn stand. Two terms are new:

M_jk is called the minor of a_jk in D, and C_jk the cofactor of a_jk in D.

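For later use we note that (3) may also be written in terms of minors

$$D = \sum_{k=1}^{n} (-1)^{j+k} a_{jk} M_{jk} \qquad (j = 1, 2, \ldots, \text{or } n) \tag{4a}$$

$$D = \sum_{j=1}^{n} (-1)^{j+k} a_{jk} M_{jk} \qquad (k = 1, 2, \ldots, \text{or } n). \tag{4b}$$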

EXAMPLE 1 Minors and Cofactors of a Third-Order Determinant

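In (4) of the previous section the minors and cofactors of the entries in the first column can be seen directly. For the entries in the second row the minors are

$$M_{21} = \begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix}, \qquad M_{22} = \begin{vmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{vmatrix}, \qquad M_{23} = \begin{vmatrix} a_{11} & a_{12} \\ a_{31} & a_{32} \end{vmatrix}$$

and the cofactors are C₂₁ = −M₂₁, C₂₂ = +M₂₂, and C₂₃ = −M₂₃. Similarly for the third row; write these down yourself. And verify that the signs in C_jk form a checkerboard pattern

$$\begin{matrix} + & - & + \\ - & + & - \\ + & - & + \end{matrix}$$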

EXAMPLE 2 Expansions of a Third-Order Determinant

This is the expansion by the first row. The expansion by the third column is

Verify that the other four expansions also give the value −12.

EXAMPLE 3 Determinant of a Triangular Matrix

Inspired by this, can you formulate a little theorem on determinants of triangular matrices? Of diagonal matrices?

General Properties of Determinants

There is an attractive way of finding determinants (1) that consists of applying elementary row operations to (1). By doing so we obtain an “upper triangular” determinant (see Sec. 7.1 for the definition, with “matrix” replaced by “determinant”) whose value is then very easy to compute, being just the product of its diagonal entries. This approach is similar (but not the same!) to what we did to matrices in Sec. 7.3. In particular, be aware that interchanging two rows in a determinant introduces a multiplicative factor of −1 to the value of the determinant! Details are as follows.

THEOREM 1 Behavior of an nth-Order Determinant under Elementary Row Operations

(a) Interchange of two rows multiplies the value of the determinant by −1.

(b) Addition of a multiple of a row to another row does not alter the value of the determinant.

(c) Multiplication of a row by a nonzero constant c multiplies the value of the determinant by c. (This holds also when c = 0, but no longer gives an elementary row operation.)

PROOF

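(a) By induction. The statement holds for n = 2 because

$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc, \qquad \text{but} \qquad \begin{vmatrix} c & d \\ a & b \end{vmatrix} = bc - ad.$$

We now make the induction hypothesis that (a) holds for determinants of order n − 1 ≥ 2 and show that it then holds for determinants of order n. Let D be of order n. Let E be obtained from D by the interchange of two rows. Expand D and E by a row that is not one of those interchanged, call it the jth row. Then by (4a),

$$D = \sum_{k=1}^{n} (-1)^{j+k} a_{jk} M_{jk}, \qquad E = \sum_{k=1}^{n} (-1)^{j+k} a_{jk} N_{jk} \tag{5}$$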

where N_jk is obtained from the minor M_jk of a_jk in D by the interchange of those two rows which have been interchanged in D (and which N_jk must both contain because we expand by another row!). Now these minors are of order n − 1. Hence the induction hypothesis applies and gives N_jk = −M_jk. Thus E = −D by (5).

(b) Add c times Row i to Row j. Let D̃ be the new determinant. Its entries in Row j are a_jk + ca_ik. If we expand D̃ by this Row j, we see that we can write it as D̃ = D₁ + cD₂, where D₁ = D has in Row j the a_jk, whereas D₂ has in that Row j the a_ik from the addition. Hence D₂ has a_ik in both Row i and Row j. Interchanging these two rows gives D₂ back, but on the other hand it gives −D₂ by (a). Together D₂ = −D₂ = 0, so that D̃ = D₁ = D.

(c) Expand the determinant by the row that has been multiplied.

CAUTION! det (cA) = cⁿ det A (not c det A). Explain why.
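Theorem 1 suggests a simple evaluation scheme: reduce to triangular form using only row interchanges and additions of multiples of rows, keep track of the sign, and multiply the diagonal entries. The following is a sketch under these assumptions; the function name `det_by_elimination` and the sample matrix are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def det_by_elimination(rows):
    """Determinant by reduction to upper triangular form (Theorem 1(a), (b))."""
    m = [[Fraction(v) for v in row] for row in rows]
    n = len(m)
    sign = 1
    for col in range(n):
        pivot = next((i for i in range(col, n) if m[i][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # a zero appears on the diagonal
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign  # each row interchange multiplies the value by -1
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            # adding a multiple of a row to another row leaves the value unchanged
            m[i] = [a - factor * b for a, b in zip(m[i], m[col])]
    product = Fraction(sign)
    for i in range(n):
        product *= m[i][i]
    return product

A = [[0, 2, 1], [1, 1, 0], [2, 0, 3]]
c = 3
cA = [[c * v for v in row] for row in A]
print(det_by_elimination(A), det_by_elimination(cA))
```

The second value equals c³ times the first, since multiplying the whole 3 × 3 matrix by c multiplies each of its three rows by c.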

EXAMPLE 4 Evaluation of Determinants by Reduction to Triangular Form

Because of Theorem 1 we may evaluate determinants by reduction to triangular form, as in the Gauss elimination for a matrix. For instance (with the blue explanations always referring to the preceding determinant)

THEOREM 2 Further Properties of nth-Order Determinants

(a)–(c) in Theorem 1 hold also for columns.

(d) Transposition leaves the value of a determinant unaltered.

(e) A zero row or column renders the value of a determinant zero.

(f) Proportional rows or columns render the value of a determinant zero. In particular, a determinant with two identical rows or columns has the value zero.

PROOF

(a)–(e) follow directly from the fact that a determinant can be expanded by any row or column. In (d), transposition is defined as for matrices, that is, the jth row becomes the jth column of the transpose.

(f) If Row j = c times Row i, then D = cD₁, where D₁ has Row j = Row i. Hence an interchange of these rows reproduces D₁, but it also gives −D₁ by Theorem 1(a). Hence D₁ = 0 and D = cD₁ = 0. Similarly for columns.

It is quite remarkable that the important concept of the rank of a matrix A, which is the maximum number of linearly independent row or column vectors of A (see Sec. 7.4), can be related to determinants. Here we may assume that rank A > 0 because the only matrices with rank 0 are the zero matrices (see Sec. 7.4).

THEOREM 3 Rank in Terms of Determinants

Consider an m × n matrix A = [a_jk]:

(1) A has rank r ≥ 1 if and only if A has an r × r submatrix with a nonzero determinant.
(2) The determinant of any square submatrix with more than r rows, contained in A (if such a matrix exists!) has a value equal to zero.

Furthermore, if m = n, we have:

(3) An n × n square matrix A has rank n if and only if

$$\det A \neq 0.$$

PROOF

The key idea is that elementary row operations (Sec. 7.3) alter neither rank (by Theorem 1 in Sec. 7.4) nor the property of a determinant being nonzero (by Theorem 1 in this section). The echelon form Â of A (see Sec. 7.3) has r nonzero row vectors (which are the first r row vectors) if and only if rank A = r. Without loss of generality, we can assume that r ≥ 1. Let R̂ be the r × r submatrix in the left upper corner of Â (so that the entries of R̂ are in both the first r rows and r columns of Â). Now R̂ is triangular, with all diagonal entries r_jj nonzero. Thus, det R̂ = r₁₁ ⋯ r_rr ≠ 0. Also det R ≠ 0 for the corresponding r × r submatrix R of A because R̂ results from R by elementary row operations. This proves part (1).

Similarly, det S = 0 for any square submatrix S of r + 1 or more rows perhaps contained in A because the corresponding submatrix Ŝ of Â must contain a row of zeros (otherwise we would have rank A ≥ r + 1), so that det Ŝ = 0 by Theorem 2. This proves part (2). Furthermore, we have proven the theorem for an m × n matrix.

For an n × n square matrix A we proceed as follows. To prove (3), we apply part (1) (already proven!). This gives us that rank A = n ≥ 1 if and only if A contains an n × n submatrix with nonzero determinant. But the only such submatrix contained in our square matrix A is A itself, hence det A ≠ 0. This proves part (3).

Cramer's Rule

Theorem 3 opens the way to the classical solution formula for linear systems known as Cramer's rule,² which gives solutions as quotients of determinants. Cramer's rule is not practical in computations for which the methods in Secs. 7.3 and 20.1–20.3 are suitable. However, Cramer's rule is of theoretical interest in differential equations (Secs. 2.10 and 3.3) and in other theoretical work that has engineering applications.

THEOREM 4 Cramer's Theorem (Solution of Linear Systems by Determinants)

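(a) If a linear system of n equations in the same number of unknowns x₁, …, x_n

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2\\
\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots &\\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= b_n
\end{aligned} \tag{6}$$

has a nonzero coefficient determinant D = det A, the system has precisely one solution. This solution is given by the formulas

$$x_1 = \frac{D_1}{D}, \qquad x_2 = \frac{D_2}{D}, \qquad \ldots, \qquad x_n = \frac{D_n}{D} \qquad \text{(Cramer's rule)} \tag{7}$$

where D_k is the determinant obtained from D by replacing in D the kth column by the column with the entries b₁, …, b_n.

(b) Hence if the system (6) is homogeneous and D ≠ 0, it has only the trivial solution x₁ = 0, x₂ = 0, …, x_n = 0. If D = 0, the homogeneous system also has nontrivial solutions.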

PROOF

The augmented matrix Ã of the system (6) is of size n × (n + 1). Hence its rank can be at most n. Now if

$$D = \det A \neq 0, \tag{8}$$

then rank A = n by Theorem 3. Thus rank Ã = rank A. Hence, by the Fundamental Theorem in Sec. 7.5, the system (6) has a unique solution.

Let us now prove (7). Expanding D by its kth column, we obtain

$$D = a_{1k}C_{1k} + a_{2k}C_{2k} + \cdots + a_{nk}C_{nk}, \tag{9}$$

where C_ik is the cofactor of entry a_ik in D. If we replace the entries in the kth column of D by any other numbers, we obtain a new determinant, say, D̂. Clearly, its expansion by the kth column will be of the form (9), with a_1k, …, a_nk replaced by those new numbers and the cofactors C_ik as before. In particular, if we choose as new numbers the entries a_1l, …, a_nl of the lth column of D (where l ≠ k), we have a new determinant D̂ which has the column [a_1l … a_nl]^T twice, once as its lth column, and once as its kth because of the replacement. Hence D̂ = 0 by Theorem 2(f). If we now expand D̂ by the column that has been replaced (the kth column), we thus obtain

$$a_{1l}C_{1k} + a_{2l}C_{2k} + \cdots + a_{nl}C_{nk} = 0 \qquad (l \neq k). \tag{10}$$

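We now multiply the first equation in (6) by C_1k on both sides, the second by C_2k, …, the last by C_nk and add the resulting equations. This gives

$$C_{1k}(a_{11}x_1 + \cdots + a_{1n}x_n) + \cdots + C_{nk}(a_{n1}x_1 + \cdots + a_{nn}x_n) = b_1C_{1k} + \cdots + b_nC_{nk}. \tag{11}$$

Collecting terms with the same x_j, we can write the left side as

$$x_1(a_{11}C_{1k} + a_{21}C_{2k} + \cdots + a_{n1}C_{nk}) + \cdots + x_n(a_{1n}C_{1k} + a_{2n}C_{2k} + \cdots + a_{nn}C_{nk}).$$

From this we see that x_k is multiplied by

$$a_{1k}C_{1k} + a_{2k}C_{2k} + \cdots + a_{nk}C_{nk}.$$

Equation (9) shows that this equals D. Similarly, x_l is multiplied by

$$a_{1l}C_{1k} + a_{2l}C_{2k} + \cdots + a_{nl}C_{nk}.$$

Equation (10) shows that this is zero when l ≠ k. Accordingly, the left side of (11) equals simply x_kD so that (11) becomes

$$x_kD = b_1C_{1k} + b_2C_{2k} + \cdots + b_nC_{nk}.$$

Now the right side of this is D_k as defined in the theorem, expanded by its kth column, so that division by D gives (7). This proves Cramer's rule.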

If (6) is homogeneous and D ≠ 0, then each D_k has a column of zeros, so that D_k = 0 by Theorem 2(e), and (7) gives the trivial solution.

Finally, if (6) is homogeneous and D = 0, then rank A < n by Theorem 3, so that nontrivial solutions exist by Theorem 2 in Sec. 7.5.
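Cramer's theorem can be tried out on small systems with a few lines of code. The following is a sketch only, in keeping with the remark that the rule is not practical for computation: the names `det` and `cramer` are hypothetical, integer data are assumed, and the determinant is computed by expansion along the first row.

```python
from fractions import Fraction

def det(A):
    """Determinant by cofactor expansion along the first row (small n only)."""
    if len(A) == 1:
        return A[0][0]
    total = 0
    for k in range(len(A)):
        sub = [row[:k] + row[k + 1:] for row in A[1:]]
        total += (-1) ** k * A[0][k] * det(sub)
    return total

def cramer(A, b):
    """Return [D_1/D, ..., D_n/D] for a square system Ax = b with D != 0."""
    D = det(A)
    if D == 0:
        raise ValueError("D = 0: no unique solution, Cramer's rule does not apply")
    solution = []
    for k in range(len(A)):
        # D_k: replace the kth column of A by the right sides b
        A_k = [row[:k] + [bk] + row[k + 1:] for row, bk in zip(A, b)]
        solution.append(Fraction(det(A_k), D))
    return solution

A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]  # illustrative system, D = 4
b = [1, 0, 1]
print(cramer(A, b))
```

With b = 0 every D_k has a zero column, so the same function returns the trivial solution, which matches part (b) of the theorem for D ≠ 0.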

EXAMPLE 5 Illustration of Cramer's Rule (Theorem 4)

For n = 2, see Example 1 of Sec. 7.6. Also, at the end of that section, we give Cramer's rule for a general linear system of three equations.

Finally, an important application for Cramer's rule dealing with inverse matrices will be given in the next section.

PROBLEM SET 7.7

1–6 GENERAL PROBLEMS

1. General Properties of Determinants. Illustrate each statement in Theorems 1 and 2 with an example of your choice.
2. Second-Order Determinant. Expand a general second-order determinant in four possible ways and show that the results agree.
3. Third-Order Determinant. Do the task indicated in Example 2. Also evaluate D by reduction to triangular form.
4. Expansion Numerically Impractical. Show that the computation of an nth-order determinant by expansion involves n! multiplications, which if a multiplication takes 10⁻⁹ sec would take these times:
5. Multiplication by Scalar. Show that det(kA) = kⁿ det A (not k det A). Give an example.
6. Minors, cofactors. Complete the list in Example 1.

7–15 EVALUATION OF DETERMINANTS

Showing the details, evaluate:

7.
8.
9.
10.
11.
12.
13.
14.
15.
16. CAS EXPERIMENT. Determinant of Zeros and Ones. Find the value of the determinant of the n × n matrix A_n with main diagonal entries all 0 and all others 1. Try to find a formula for this. Try to prove it by induction. Interpret A₃ and A₄ as incidence matrices (as in Problem Set 7.1 but without the minuses) of a triangle and a tetrahedron, respectively; similarly for an n-simplex, having n vertices and n(n − 1)/2 edges (and spanning Rⁿ⁻¹, n = 5, 6, …).

17–19 RANK BY DETERMINANTS

Find the rank by Theorem 3 (which is not very practical) and check by row reduction. Show details.

17.
18.
19.
20. TEAM PROJECT. Geometric Applications: Curves and Surfaces Through Given Points. The idea is to get an equation from the vanishing of the determinant of a homogeneous linear system as the condition for a nontrivial solution in Cramer's theorem. We explain the trick for obtaining such a system for the case of a line L through two given points P₁: (x₁, y₁) and P₂: (x₂, y₂). The unknown line is ax + by = −c, say. We write it as ax + by + c · 1 = 0. To get a nontrivial solution a, b, c, the determinant of the “coefficients” x, y, 1 must be zero. The system is

$$\begin{aligned} ax + by + c \cdot 1 &= 0 \qquad \text{(Line } L\text{)} \\ ax_1 + by_1 + c \cdot 1 &= 0 \qquad (P_1 \text{ on } L) \\ ax_2 + by_2 + c \cdot 1 &= 0 \qquad (P_2 \text{ on } L). \end{aligned} \tag{12}$$

(a) Line through two points. Derive from D = 0 in (12) the familiar formula

$$\frac{x - x_1}{x_1 - x_2} = \frac{y - y_1}{y_1 - y_2}.$$

(b) Plane. Find the analog of (12) for a plane through three given points. Apply it when the points are (1, 1, 1), (3, 2, 6), (5, 0, 5).

(c) Circle. Find a similar formula for a circle in the plane through three given points. Find and sketch the circle through (2, 6), (6, 4), (7, 1).

(d) Sphere. Find the analog of the formula in (c) for a sphere through four given points. Find the sphere through (0, 0, 5), (4, 0, 1), (0, 4, 1), (0, 0, −3) by this formula or by inspection.

(e) General conic section. Find a formula for a general conic section (the vanishing of a determinant of 6th order). Try it out for a quadratic parabola and for a more general conic section of your own choice.

21–25 CRAMER'S RULE

Solve by Cramer's rule. Check by Gauss elimination and back substitution. Show details.

21.
22.
23.
24.
25.

7.8 Inverse of a Matrix. Gauss–Jordan Elimination

In this section we consider square matrices exclusively.

The inverse of an n × n matrix A = [a_jk] is denoted by A⁻¹ and is an n × n matrix such that

$$AA^{-1} = A^{-1}A = I \tag{1}$$

where I is the n × n unit matrix (see Sec. 7.2).

If A has an inverse, then A is called a nonsingular matrix. If A has no inverse, then A is called a singular matrix.

If A has an inverse, the inverse is unique.

Indeed, if both B and C are inverses of A, then AB = I and CA = I, so that we obtain the uniqueness from

$$B = IB = (CA)B = C(AB) = CI = C.$$

We prove next that A has an inverse (is nonsingular) if and only if it has maximum possible rank n. The proof will also show that Ax = b implies x = A⁻¹b provided A⁻¹ exists, and will thus give a motivation for the inverse as well as a relation to linear systems. (But this will not give a good method of solving Ax = b numerically because the Gauss elimination in Sec. 7.3 requires fewer computations.)

THEOREM 1 Existence of the Inverse

The inverse A⁻¹ of an n × n matrix A exists if and only if rank A = n, thus (by Theorem 3, Sec. 7.7) if and only if det A ≠ 0. Hence A is nonsingular if rank A = n, and is singular if rank A < n.

PROOF

Let A be a given n × n matrix and consider the linear system

$$Ax = b. \tag{2}$$

If the inverse A⁻¹ exists, then multiplication from the left on both sides and use of (1) gives

$$A^{-1}Ax = x = A^{-1}b.$$

This shows that (2) has a solution x, which is unique because, for another solution u, we have Au = b, so that u = A⁻¹b = x. Hence A must have rank n by the Fundamental Theorem in Sec. 7.5.

Conversely, let rank A = n. Then by the same theorem, the system (2) has a unique solution x for any b. Now the back substitution following the Gauss elimination (Sec. 7.3) shows that the components x_j of x are linear combinations of those of b. Hence we can write

$$x = Bb \tag{3}$$

with B to be determined. Substitution into (2) gives

$$Ax = A(Bb) = (AB)b = Cb = b \qquad (C = AB)$$

for any b. Hence C = AB = I, the unit matrix. Similarly, if we substitute (2) into (3) we get

$$x = Bb = B(Ax) = (BA)x$$

for any x (and b = Ax). Hence BA = I. Together, B = A⁻¹ exists.

Determination of the Inverse by the Gauss–Jordan Method

To actually determine the inverse A⁻¹ of a nonsingular n × n matrix A, we can use a variant of the Gauss elimination (Sec. 7.3), called the Gauss–Jordan elimination.³ The idea of the method is as follows.

Using A, we form n linear systems

$$Ax_{(1)} = e_{(1)}, \quad \ldots, \quad Ax_{(n)} = e_{(n)}$$

where the vectors e₍₁₎, …, e_(n) are the columns of the n × n unit matrix I; thus, e₍₁₎ = [1 0 … 0]^T, e₍₂₎ = [0 1 0 … 0]^T, etc. These are n vector equations in the unknown vectors x₍₁₎, …, x_(n). We combine them into a single matrix equation AX = I, with the unknown matrix X having the columns x₍₁₎, …, x_(n). Correspondingly, we combine the n augmented matrices [A e₍₁₎], …, [A e_(n)] into one wide n × 2n “augmented matrix” Ã = [A I]. Now multiplication of AX = I by A⁻¹ from the left gives X = A⁻¹I = A⁻¹. Hence, to solve AX = I for X, we can apply the Gauss elimination to Ã = [A I]. This gives a matrix of the form [U H] with upper triangular U because the Gauss elimination triangularizes systems. The Gauss–Jordan method reduces U by further elementary row operations to diagonal form, in fact to the unit matrix I. This is done by eliminating the entries of U above the main diagonal and making the diagonal entries all 1 by multiplication (see Example 1). Of course, the method operates on the entire matrix [U H], transforming H into some matrix K, hence the entire [U H] to [I K]. This is the “augmented matrix” of IX = K. Now IX = X = A⁻¹, as shown before. By comparison, K = A⁻¹, so that we can read A⁻¹ directly from [I K].

The following example illustrates the practical details of the method.

EXAMPLE 1 Finding the Inverse of a Matrix by Gauss–Jordan Elimination

Determine the inverse A⁻¹ of

Solution. We apply the Gauss elimination (Sec. 7.3) to the following n × 2n = 3 × 6 matrix, where BLUE always refers to the previous matrix.

This is [U H] as produced by the Gauss elimination. Now follow the additional Gauss–Jordan steps, reducing U to I, that is, to diagonal form with entries 1 on the main diagonal.

The last three columns constitute A⁻¹. Check:

Hence AA⁻¹ = I. Similarly, A⁻¹A = I.
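The Gauss–Jordan idea of reducing [A I] to [I K] can be sketched as follows. This is an illustration under stated assumptions, not the exact sequence of steps of the example: the function name `gauss_jordan_inverse` is hypothetical, entries above and below each pivot are eliminated in a single pass (rather than first triangularizing and then clearing the upper part), and the matrix used is an illustrative one chosen for this sketch.

```python
from fractions import Fraction

def gauss_jordan_inverse(A):
    """Reduce the wide matrix [A I] to [I K] by row operations and return K."""
    n = len(A)
    m = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((i for i in range(col, n) if m[i][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular (rank A < n)")
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]  # make the diagonal entry 1
        for i in range(n):
            if i != col:
                factor = m[i][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[col])]
    return [row[n:] for row in m]  # the last n columns constitute the inverse

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[-1, 1, 2], [3, -1, 1], [-1, 3, 4]]
A_inv = gauss_jordan_inverse(A)
print(A_inv)
print(matmul(A, A_inv))
```

Printing the product A·A⁻¹ gives the unit matrix, which is the check by (1).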

Formulas for Inverses

Since finding the inverse of a matrix is really a problem of solving a system of linear equations, it is not surprising that Cramer's rule (Theorem 4, Sec. 7.7) might come into play. And similarly, as Cramer's rule was useful for theoretical study but not for computation, so too is the explicit formula (4) in the following theorem useful for theoretical considerations but not recommended for actually determining inverse matrices, except for the frequently occurring 2 × 2 case as given in (4*).

THEOREM 2 Inverse of a Matrix by Determinants

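The inverse of a nonsingular n × n matrix A = [a_jk] is given by

$$A^{-1} = \frac{1}{\det A}\,[C_{jk}]^{T} = \frac{1}{\det A}\begin{bmatrix} C_{11} & C_{21} & \cdots & C_{n1} \\ C_{12} & C_{22} & \cdots & C_{n2} \\ \vdots & \vdots & & \vdots \\ C_{1n} & C_{2n} & \cdots & C_{nn} \end{bmatrix}, \tag{4}$$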

where C_jk is the cofactor of a_jk in det A (see Sec. 7.7). (CAUTION! Note well that in A⁻¹, the cofactor C_jk occupies the same place as a_kj (not a_jk) does in A.)

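In particular, the inverse of

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \qquad \text{is} \qquad A^{-1} = \frac{1}{\det A}\begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}. \tag{4*}$$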

PROOF

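We denote the right side of (4) by B and show that BA = I. We first write

$$G = [g_{kl}] = BA \tag{5}$$

and then show that G = I. Now by the definition of matrix multiplication and because of the form of B in (4), we obtain (CAUTION! C_sk, not C_ks)

$$g_{kl} = \sum_{s=1}^{n} \frac{C_{sk}}{\det A}\,a_{sl} = \frac{1}{\det A}\,(a_{1l}C_{1k} + \cdots + a_{nl}C_{nk}). \tag{6}$$

Now (9) and (10) in Sec. 7.7 show that the sum (…) on the right is D = det A when l = k, and is zero when l ≠ k. Hence

$$g_{kk} = \frac{1}{\det A}\,\det A = 1, \qquad g_{kl} = 0 \quad (l \neq k).$$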

In particular, for n = 2 we have in (4), in the first row, C₁₁ = a₂₂, C₂₁ = −a₁₂ and, in the second row, C₁₂ = −a₂₁, C₂₂ = a₁₁. This gives (4*).

The special case n = 2 occurs quite frequently in geometric and other applications. You may perhaps want to memorize formula (4*). Example 2 gives an illustration of (4*).
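Formula (4) and its index transposition are easy to get wrong by hand, so a short sketch may help. The names `det`, `cofactor`, and `inverse_by_cofactors` are hypothetical, integer entries are assumed, and both sample matrices are illustrative choices.

```python
from fractions import Fraction

def det(A):
    """Determinant by cofactor expansion along the first row (small n only)."""
    if not A:
        return 1  # convention for the empty minor of a 1 x 1 matrix
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** k * A[0][k] * det([row[:k] + row[k + 1:] for row in A[1:]])
               for k in range(len(A)))

def cofactor(A, j, k):
    """C_jk = (-1)^(j+k) M_jk, with 0-based indices j, k."""
    sub = [row[:k] + row[k + 1:] for i, row in enumerate(A) if i != j]
    return (-1) ** (j + k) * det(sub)

def inverse_by_cofactors(A):
    """Formula (4): the entry in row j, column k of the inverse is C_kj / det A."""
    n = len(A)
    D = det(A)
    if D == 0:
        raise ValueError("det A = 0: A is singular")
    return [[Fraction(cofactor(A, k, j), D) for k in range(n)] for j in range(n)]

print(inverse_by_cofactors([[2, 1], [5, 3]]))                      # 2 x 2 case, as in (4*)
print(inverse_by_cofactors([[-1, 1, 2], [3, -1, 1], [-1, 3, 4]]))  # 3 x 3 case
```

Note the swapped indices in the list comprehension: it is the transposition [C_jk]^T that the CAUTION in the theorem warns about.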

EXAMPLE 2 Inverse of a 2 × 2 Matrix by Determinants

EXAMPLE 3 Further Illustration of Theorem 2

Using (4), find the inverse of

Solution. We obtain det A = −1(−7) − 1 · 13 + 2 · 8 = 10, and in (4),

so that by (4), in agreement with Example 1,

Diagonal matrices A = [a_jk], a_jk = 0 when j ≠ k, have an inverse if and only if all a_jj ≠ 0. Then A⁻¹ is diagonal, too, with entries 1/a₁₁, …, 1/a_nn.

PROOF

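For a diagonal matrix we have in (4)

$$\frac{C_{11}}{D} = \frac{a_{22} \cdots a_{nn}}{a_{11}a_{22} \cdots a_{nn}} = \frac{1}{a_{11}}, \qquad \text{etc.}$$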

EXAMPLE 4 Inverse of a Diagonal Matrix

Let

Then we obtain the inverse A⁻¹ by inverting each individual diagonal element of A, that is, by taking 1/a₁₁, 1/a₂₂, and 1/a₃₃ as the diagonal entries of A⁻¹, that is,

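Products can be inverted by taking the inverse of each factor and multiplying these inverses in reverse order,

$$(AC)^{-1} = C^{-1}A^{-1}. \tag{7}$$

Hence for more than two factors,

$$(AC \cdots PQ)^{-1} = Q^{-1}P^{-1} \cdots C^{-1}A^{-1}. \tag{8}$$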

PROOF

The idea is to start from (1) for AC instead of A, that is, AC(AC)⁻¹ = I, and multiply it on both sides from the left, first by A⁻¹, which because of A⁻¹A = I gives

$$A^{-1}AC(AC)^{-1} = C(AC)^{-1} = A^{-1}I = A^{-1},$$

and then multiplying this on both sides from the left, this time by C⁻¹ and by using C⁻¹C = I,

$$C^{-1}C(AC)^{-1} = (AC)^{-1} = C^{-1}A^{-1}.$$

This proves (7), and from it, (8) follows by induction.

We also note that the inverse of the inverse is the given matrix, as you may prove,

$$(A^{-1})^{-1} = A. \tag{9}$$

Unusual Properties of Matrix Multiplication. Cancellation Laws

Section 7.2 contains warnings that some properties of matrix multiplication deviate from those for numbers, and we are now able to explain the restricted validity of the so-called cancellation laws [2] and [3] below, using rank and inverse, concepts that were not yet available in Sec. 7.2. The deviations from the usual are of great practical importance and must be carefully observed. They are as follows.

[1] Matrix multiplication is not commutative, that is, in general we have

$$AB \neq BA.$$

[2] AB = 0 does not generally imply A = 0 or B = 0 (or BA = 0); for example,

[3] AC = AD does not generally imply C = D (even when A ≠ 0).

Complete answers to [2] and [3] are contained in the following theorem.

THEOREM 3 Cancellation Laws

Let A, B, C be n × n matrices. Then:

(a) If rank A = n and AB = AC, then B = C.
(b) If rank A = n, then AB = 0 implies B = 0. Hence if AB = 0, but A ≠ 0 as well as B ≠ 0, then rank A < n and rank B < n.
(c) If A is singular, so are BA and AB.

PROOF

(a) The inverse of A exists by Theorem 1. Multiplication by A⁻¹ from the left gives A⁻¹AB = A⁻¹AC, hence B = C.

(b) Let rank A = n. Then A⁻¹ exists, and AB = 0 implies A⁻¹AB = B = 0. Similarly when rank B = n. This implies the second statement in (b).

(c₁) Rank A < n by Theorem 1. Hence Ax = 0 has nontrivial solutions by Theorem 2 in Sec. 7.5. Multiplication by B shows that these solutions are also solutions of BAx = 0, so that rank (BA) < n by Theorem 2 in Sec. 7.5 and BA is singular by Theorem 1.

(c₂) A^T is singular by Theorem 2(d) in Sec. 7.7. Hence B^TA^T is singular by part (c₁), and is equal to (AB)^T by (10d) in Sec. 7.2. Hence AB is singular by Theorem 2(d) in Sec. 7.7.
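The failure of the cancellation laws for singular matrices can be observed on very small examples. In the sketch below the matrices A, B, C, D are assumptions chosen for illustration (A and B are both of rank 1), and `matmul` is a hypothetical helper.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[1, 1], [2, 2]]     # rank 1, hence singular
B = [[-1, 1], [1, -1]]   # rank 1, hence singular
zero = [[0, 0], [0, 0]]

print(matmul(A, B))  # the zero matrix, although A != 0 and B != 0
print(matmul(B, A))  # not the zero matrix, so AB = 0 does not imply BA = 0

C = [[1, 0], [0, 1]]
D = [[0, 1], [1, 0]]     # D = C + B, so AD = AC + AB = AC
print(matmul(A, C) == matmul(A, D), C == D)
```

This is consistent with Theorem 3: since AB = 0 with A ≠ 0 and B ≠ 0, both factors must have rank less than n = 2.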

Determinants of Matrix Products

The determinant of a matrix product AB or BA can be written as the product of the determinants of the factors, and it is interesting that det AB = det BA, although AB ≠ BA in general. The corresponding formula (10) is needed occasionally and can be obtained by Gauss–Jordan elimination (see Example 1) and from the theorem just proved.

THEOREM 4 Determinant of a Product of Matrices

For any n × n matrices A and B,

$$\det(AB) = \det(BA) = \det A \det B. \tag{10}$$

PROOF

If A or B is singular, so are AB and BA by Theorem 3(c), and (10) reduces to 0 = 0 by Theorem 3 in Sec. 7.7.

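Now let A and B be nonsingular. Then we can reduce A to a diagonal matrix Â = [â_jk] by Gauss–Jordan steps. Under these operations, det A retains its value, by Theorem 1 in Sec. 7.7, (a) and (b) [not (c)] except perhaps for a sign reversal in row interchanging when pivoting. But the same operations reduce AB to ÂB with the same effect on det(AB). Hence it remains to prove (10) for ÂB; written out,

$$\hat{A}B = \begin{bmatrix} \hat{a}_{11} & 0 & \cdots & 0 \\ 0 & \hat{a}_{22} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \hat{a}_{nn} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{nn} \end{bmatrix} = \begin{bmatrix} \hat{a}_{11}b_{11} & \hat{a}_{11}b_{12} & \cdots & \hat{a}_{11}b_{1n} \\ \hat{a}_{22}b_{21} & \hat{a}_{22}b_{22} & \cdots & \hat{a}_{22}b_{2n} \\ \vdots & \vdots & & \vdots \\ \hat{a}_{nn}b_{n1} & \hat{a}_{nn}b_{n2} & \cdots & \hat{a}_{nn}b_{nn} \end{bmatrix}.$$

We now take the determinant det (ÂB). On the right we can take out a factor â₁₁ from the first row, â₂₂ from the second, …, â_nn from the nth. But this product â₁₁â₂₂ … â_nn equals det Â because Â is diagonal. The remaining determinant is det B. This proves (10) for det (AB), and the proof for det (BA) follows by the same idea.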
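Formula (10) is easy to check numerically. The following sketch uses two illustrative integer matrices chosen here; the helpers `det` and `matmul` are hypothetical names, and the determinant is computed by first-row expansion.

```python
def det(A):
    """Determinant by cofactor expansion along the first row (small n only)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** k * A[0][k] * det([row[:k] + row[k + 1:] for row in A[1:]])
               for k in range(len(A)))

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]
B = [[2, 0, 1], [1, 1, 0], [0, 3, 1]]

print(det(A), det(B))
print(det(matmul(A, B)), det(matmul(B, A)))
print(matmul(A, B) == matmul(B, A))
```

The two products AB and BA differ as matrices, yet their determinants agree and equal det A det B.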

This completes our discussion of linear systems (Secs. 7.3–7.8). Section 7.9 on vector spaces and linear transformations is optional. Numeric methods are discussed in Secs. 20.1–20.4, which are independent of other sections on numerics.

PROBLEM SET 7.8

1–10 INVERSE

Find the inverse by Gauss–Jordan (or by (4*) if n = 2). Check by using (1).

11–18 SOME GENERAL FORMULAS

11. Inverse of the square. Verify (A²)⁻¹ = (A⁻¹)² for A in Prob. 1.
12. Prove the formula in Prob. 11.
13. Inverse of the transpose. Verify (A^T)⁻¹ = (A⁻¹)^T for A in Prob. 1.
14. Prove the formula in Prob. 13.
15. Inverse of the inverse. Prove that (A⁻¹)⁻¹ = A.
16. Rotation. Give an application of the matrix in Prob. 2 that makes the form of the inverse obvious.
17. Triangular matrix. Is the inverse of a triangular matrix always triangular (as in Prob. 5)? Give reason.
18. Row interchange. Same task as in Prob. 16 for the matrix in Prob. 7.

19–20 FORMULA (4)

Formula (4) is occasionally needed in theory. To understand it, apply it and check the result by Gauss–Jordan:

19. In Prob. 3
20. In Prob. 6

7.9 Vector Spaces, Inner Product Spaces, Linear Transformations (Optional)

We have captured the essence of vector spaces in Sec. 7.4. There we dealt with special vector spaces that arose quite naturally in the context of matrices and linear systems. The elements of these vector spaces, called vectors, satisfied rules (3) and (4) of Sec. 7.1 (which were similar to those for numbers). These special vector spaces were generated by spans, that is, linear combinations of finitely many vectors. Furthermore, each such vector had n real numbers as components. Review this material before going on.

We can generalize this idea by taking all vectors with n real numbers as components and obtain the very important real n-dimensional vector space Rⁿ. The vectors are known as “real vectors.” Thus, each vector in Rⁿ is an ordered n-tuple of real numbers.

Now we can consider special values for n. For n = 2, we obtain R², the vector space of all ordered pairs, which correspond to the vectors in the plane. For n = 3, we obtain R³, the vector space of all ordered triples, which are the vectors in 3-space. These vectors have wide applications in mechanics, geometry, and calculus and are basic to the engineer and physicist.

Similarly, if we take all ordered n-tuples of complex numbers as vectors and complex numbers as scalars, we obtain the complex vector space Cⁿ, which we shall consider in Sec. 8.5.

Furthermore, there are other sets of practical interest consisting of matrices, functions, transformations, or others for which addition and scalar multiplication can be defined in an almost natural way so that they too form vector spaces.

It is perhaps not too great an intellectual jump to create, from the concrete model Rⁿ, the abstract concept of a real vector space V by taking the basic properties (3) and (4) in Sec. 7.1 as axioms. In this way, the definition of a real vector space arises.

DEFINITION Real Vector Space

A nonempty set V of elements a, b, … is called a real vector space (or real linear space), and these elements are called vectors (regardless of their nature, which will come out from the context or will be left arbitrary) if, in V, there are defined two algebraic operations (called vector addition and scalar multiplication) as follows.

I. Vector addition associates with every pair of vectors a and b of V a unique vector of V, called the sum of a and b and denoted by a + b, such that the following axioms are satisfied.

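I.1 Commutativity. For any two vectors a and b of V,

$$\mathbf{a} + \mathbf{b} = \mathbf{b} + \mathbf{a}.$$

I.2 Associativity. For any three vectors a, b, c of V,

$$(\mathbf{a} + \mathbf{b}) + \mathbf{c} = \mathbf{a} + (\mathbf{b} + \mathbf{c}).$$

I.3 There is a unique vector in V, called the zero vector and denoted by 0, such that for every a in V,

$$\mathbf{a} + \mathbf{0} = \mathbf{a}.$$

I.4 For every a in V there is a unique vector in V that is denoted by −a and is such that

$$\mathbf{a} + (-\mathbf{a}) = \mathbf{0}.$$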

II. Scalar multiplication. The real numbers are called scalars. Scalar multiplication associates with every a in V and every scalar c a unique vector of V, called the product of c and a and denoted by ca (or ac) such that the following axioms are satisfied.

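II.1 Distributivity. For every scalar c and vectors a and b in V,

$$c(\mathbf{a} + \mathbf{b}) = c\mathbf{a} + c\mathbf{b}.$$

II.2 Distributivity. For all scalars c and k and every a in V,

$$(c + k)\mathbf{a} = c\mathbf{a} + k\mathbf{a}.$$

II.3 Associativity. For all scalars c and k and every a in V,

$$c(k\mathbf{a}) = (ck)\mathbf{a}.$$

II.4 For every a in V,

$$1\mathbf{a} = \mathbf{a}.$$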

If, in the above definition, we take complex numbers as scalars instead of real numbers, we obtain the axiomatic definition of a complex vector space.

Take a look at the axioms in the above definition. Each axiom stands on its own: It is concise, useful, and it expresses a simple property of V. There are as few axioms as possible and together they express all the desired properties of V. Selecting good axioms is a process of trial and error that often extends over a long period of time. But once agreed upon, axioms become standard such as the ones in the definition of a real vector space.
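As a quick numerical sanity check, the eight axioms can be tested for a few sample vectors in R³. The following is only a minimal sketch: the helper names `add` and `scale` and the sample values are assumptions made for illustration, and passing the checks for particular vectors is of course not a proof.

```python
from fractions import Fraction

def add(a, b):
    """Componentwise vector addition in R^n."""
    return tuple(x + y for x, y in zip(a, b))

def scale(c, a):
    """Scalar multiplication c*a in R^n."""
    return tuple(c * x for x in a)

# Sample vectors and scalars (exact fractions avoid rounding effects).
a = (Fraction(1), Fraction(-2), Fraction(3))
b = (Fraction(0), Fraction(5), Fraction(1, 2))
c_vec = (Fraction(4), Fraction(4), Fraction(-1))
zero = (Fraction(0),) * 3
c, k = Fraction(2, 3), Fraction(-5)

checks = {
    "I.1  a + b = b + a":             add(a, b) == add(b, a),
    "I.2  (a + b) + c = a + (b + c)": add(add(a, b), c_vec) == add(a, add(b, c_vec)),
    "I.3  a + 0 = a":                 add(a, zero) == a,
    "I.4  a + (-a) = 0":              add(a, scale(-1, a)) == zero,
    "II.1 c(a + b) = ca + cb":        scale(c, add(a, b)) == add(scale(c, a), scale(c, b)),
    "II.2 (c + k)a = ca + ka":        scale(c + k, a) == add(scale(c, a), scale(k, a)),
    "II.3 c(ka) = (ck)a":             scale(c, scale(k, a)) == scale(c * k, a),
    "II.4 1a = a":                    scale(1, a) == a,
}

for name, ok in checks.items():
    print(name, "->", ok)
```

Exact rational arithmetic is used here on purpose: with floating-point numbers, the associativity and distributivity checks can fail by rounding even though the axioms hold for real numbers.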

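The following concepts related to a vector space are defined exactly as those given in Sec. 7.4. Indeed, a linear combination of vectors a₍₁₎, …, a_(m) in a vector space V is an expression

$$c_1\mathbf{a}_{(1)} + \cdots + c_m\mathbf{a}_{(m)} \qquad (c_1, \ldots, c_m \text{ any scalars}).$$

These vectors form a linearly independent set (briefly, they are called linearly independent) if

$$c_1\mathbf{a}_{(1)} + \cdots + c_m\mathbf{a}_{(m)} = \mathbf{0} \tag{1}$$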

implies that c₁ = 0, …, c_m = 0. Otherwise, if (1) also holds with scalars not all zero, the vectors are called linearly dependent.

Note that (1) with m = 1 is ca = 0 and shows that a single vector a is linearly independent if and only if a ≠ 0.

V has dimension n, or is n-dimensional, if it contains a linearly independent set of n vectors, whereas any set of more than n vectors in V is linearly dependent. That set of n linearly independent vectors is called a basis for V. Then every vector in V can be written as a linear combination of the basis vectors. Furthermore, for a given basis, this representation is unique (see Prob. 2).
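For vectors in Rⁿ, linear independence can be tested by writing the vectors as the rows of a matrix and comparing its rank with the number of vectors (Sec. 7.4). The sketch below assumes this rank criterion; the function names `rank` and `linearly_independent` are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gauss elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    r = 0  # number of pivot rows found so far
    for col in range(n_cols):
        # Look for a nonzero pivot in this column at or below row r.
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate the entries below the pivot.
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def linearly_independent(vectors):
    """True if the given vectors of R^n form a linearly independent set."""
    return rank(vectors) == len(vectors)

print(linearly_independent([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))  # standard basis of R^3
print(linearly_independent([[1, 2, 3], [2, 4, 6]]))             # second vector is twice the first
print(linearly_independent([[0, 0, 0]]))                        # a single zero vector
print(linearly_independent([[1, 0], [0, 1], [1, 1]]))           # three vectors in R^2
```

The last call illustrates the definition of dimension: R² contains two linearly independent vectors, but any three vectors in R² are linearly dependent.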

EXAMPLE 1 Vector Space of Matrices

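The real 2 × 2 matrices form a four-dimensional real vector space. A basis is

$$\mathbf{B}_{11} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad \mathbf{B}_{12} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad \mathbf{B}_{21} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \quad \mathbf{B}_{22} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$$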

because any 2 × 2 matrix A = [a_jk] has a unique representation A = a₁₁B₁₁ + a₁₂B₁₂ + a₂₁B₂₁ + a₂₂B₂₂. Similarly, the real m × n matrices with fixed m and n form an mn-dimensional vector space. What is the dimension of the vector space of all 3 × 3 skew-symmetric matrices? Can you find a basis?

EXAMPLE 2 Vector Space of Polynomials

The set of all constant, linear, and quadratic polynomials in x together is a vector space of dimension 3 with basis {1, x, x²} under the usual addition and multiplication by real numbers because these two operations give polynomials not exceeding degree 2. What is the dimension of the vector space of all polynomials of degree not exceeding a given fixed n? Can you find a basis?

If a vector space V contains a linearly independent set of n vectors for every n, no matter how large, then V is called infinite dimensional, as opposed to a finite dimensional (n-dimensional) vector space just defined. An example of an infinite dimensional vector space is the space of all continuous functions on some interval [a, b] of the x-axis, as we mention without proof.

Inner Product Spaces

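If a and b are vectors in Rⁿ, regarded as column vectors, we can form the product a^Tb. This is a 1 × 1 matrix, which we can identify with its single entry, that is, with a number. This product is called the inner product or dot product of a and b. Other notations for it are (a, b) and a • b. Thus

$$\mathbf{a}^{\mathsf{T}}\mathbf{b} = (\mathbf{a}, \mathbf{b}) = \mathbf{a} \cdot \mathbf{b} = \sum_{l=1}^{n} a_l b_l = a_1 b_1 + \cdots + a_n b_n.$$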

We now extend this concept to general real vector spaces by taking basic properties of (a, b) as axioms for an “abstract inner product” (a, b) as follows.

DEFINITION Real Inner Product Space

A real vector space V is called a real inner product space (or real pre-Hilbert⁴ space) if it has the following property. With every pair of vectors a and b in V there is associated a real number, which is denoted by (a, b) and is called the inner product of a and b, such that the following axioms are satisfied.

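I. For all scalars q₁ and q₂ and all vectors a, b, c in V,

$$(q_1\mathbf{a} + q_2\mathbf{b}, \mathbf{c}) = q_1(\mathbf{a}, \mathbf{c}) + q_2(\mathbf{b}, \mathbf{c}) \qquad \text{(Linearity)}.$$

II. For all vectors a and b in V,

$$(\mathbf{a}, \mathbf{b}) = (\mathbf{b}, \mathbf{a}) \qquad \text{(Symmetry)}.$$

III. For every a in V,

$$(\mathbf{a}, \mathbf{a}) \ge 0, \qquad (\mathbf{a}, \mathbf{a}) = 0 \ \text{ if and only if } \ \mathbf{a} = \mathbf{0} \qquad \text{(Positive-definiteness)}.$$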

Vectors whose inner product is zero are called orthogonal.

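The length or norm of a vector in V is defined by

$$\|\mathbf{a}\| = \sqrt{(\mathbf{a}, \mathbf{a})} \quad (\ge 0). \tag{2}$$

A vector of norm 1 is called a unit vector.

From these axioms and from (2) one can derive the basic inequality

$$|(\mathbf{a}, \mathbf{b})| \le \|\mathbf{a}\|\,\|\mathbf{b}\| \qquad \text{(Cauchy–Schwarz⁵ inequality)}. \tag{3}$$

From this follows

$$\|\mathbf{a} + \mathbf{b}\| \le \|\mathbf{a}\| + \|\mathbf{b}\| \qquad \text{(Triangle inequality)}. \tag{4}$$

A simple direct calculation gives

$$\|\mathbf{a} + \mathbf{b}\|^2 + \|\mathbf{a} - \mathbf{b}\|^2 = 2\left(\|\mathbf{a}\|^2 + \|\mathbf{b}\|^2\right) \qquad \text{(Parallelogram equality)}. \tag{5}$$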

EXAMPLE 3 n-Dimensional Euclidean Space

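Rⁿ with the inner product

$$(\mathbf{a}, \mathbf{b}) = \mathbf{a}^{\mathsf{T}}\mathbf{b} = a_1 b_1 + \cdots + a_n b_n$$

(where both a and b are column vectors) is called the n-dimensional Euclidean space and is denoted by Eⁿ or again simply by Rⁿ. Axioms I–III hold, as direct calculation shows. Equation (2) gives the “Euclidean norm”

$$\|\mathbf{a}\| = \sqrt{(\mathbf{a}, \mathbf{a})} = \sqrt{\mathbf{a}^{\mathsf{T}}\mathbf{a}} = \sqrt{a_1^2 + \cdots + a_n^2}.$$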
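The Euclidean inner product and norm are easy to compute, and the Cauchy–Schwarz inequality, the triangle inequality, and the parallelogram equality can be checked for concrete vectors. The sketch below does this for two sample vectors in R³; the vectors and the function names `inner` and `norm` are assumptions chosen for illustration.

```python
import math

def inner(a, b):
    """Euclidean inner product a^T b = a1*b1 + ... + an*bn."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean norm sqrt((a, a))."""
    return math.sqrt(inner(a, a))

a = [1, 2, 2]    # norm 3
b = [3, 0, -4]   # norm 5
s = [x + y for x, y in zip(a, b)]  # a + b
d = [x - y for x, y in zip(a, b)]  # a - b

print("(a, b)          =", inner(a, b))
print("Cauchy-Schwarz  :", abs(inner(a, b)) <= norm(a) * norm(b))
print("Triangle        :", norm(s) <= norm(a) + norm(b))
# Squared norms are integers here, so the equality can be tested exactly.
print("Parallelogram   :", inner(s, s) + inner(d, d) == 2 * (inner(a, a) + inner(b, b)))
```

The parallelogram equality is tested with squared norms, that is, with inner(s, s) rather than norm(s)**2, so that no rounding from the square root enters the comparison.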

EXAMPLE 4 An Inner Product for Functions. Function Space

The set of all real-valued continuous functions f(x), g(x), … on a given interval α ≤ x ≤ β is a real vector space under the usual addition of functions and multiplication by scalars (real numbers). On this “function space” we can define an inner product by the integral

$$(f, g) = \int_{\alpha}^{\beta} f(x)\,g(x)\,dx.$$

Axioms I–III can be verified by direct calculation. Equation (2) gives the norm

$$\|f\| = \sqrt{(f, f)} = \sqrt{\int_{\alpha}^{\beta} f(x)^2\,dx}.$$
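The inner product of two functions, that is, the integral of f(x)g(x) over the interval from α to β, can be approximated numerically. The sketch below uses the composite Simpson rule, which is an assumption of this illustration and not a method prescribed by the text; the functions sin x and cos x on [0, 2π] are likewise chosen only as an example.

```python
import math

def inner(f, g, alpha, beta, n=1000):
    """Approximate (f, g) = integral of f(x)*g(x) over [alpha, beta].

    Uses the composite Simpson rule with n subintervals (n must be even).
    """
    if n % 2:
        raise ValueError("n must be even for Simpson's rule")
    h = (beta - alpha) / n
    total = f(alpha) * g(alpha) + f(beta) * g(beta)
    for i in range(1, n):
        x = alpha + i * h
        total += (4 if i % 2 else 2) * f(x) * g(x)
    return total * h / 3

def norm(f, alpha, beta):
    """Norm of f induced by the integral inner product."""
    return math.sqrt(inner(f, f, alpha, beta))

ip = inner(math.sin, math.cos, 0.0, 2 * math.pi)
print("(sin, cos) is approximately", ip)   # close to 0: sin and cos are orthogonal here
print("norm of sin is approximately", norm(math.sin, 0.0, 2 * math.pi))
print("sqrt(pi) =", math.sqrt(math.pi))    # exact value of the norm, for comparison
```

In this function space, orthogonality means that the integral of the product vanishes, just as orthogonality in Rⁿ means that the sum of the products of components vanishes.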

Our examples give a first impression of the great generality of the abstract concepts of vector spaces and inner product spaces. Further details belong to more advanced courses (on functional analysis, meaning abstract modern analysis; see [GenRef7] listed in App. 1) and cannot be discussed here. Instead we now take up a related topic where matrices play a central role.

Linear Transformations

Let X and Y be any vector spaces. To each vector x in X we assign a unique vector y in Y. Then we say that a mapping (or transformation or operator) of X into Y is given. Such a mapping is denoted by a capital letter, say F. The vector y in Y assigned to a vector x in X is called the image of x under F and is denoted by F(x) [or Fx, without parentheses].

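F is called a linear mapping or linear transformation if, for all vectors v and x in X and scalars c,

$$F(\mathbf{v} + \mathbf{x}) = F(\mathbf{v}) + F(\mathbf{x}), \qquad F(c\mathbf{x}) = cF(\mathbf{x}). \tag{10}$$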

Linear Transformation of Space Rⁿ into Space R^m

From now on we let X = Rⁿ and Y = R^m. Then any real m × n matrix A = [a_jk] gives a transformation of Rⁿ into R^m,

$$\mathbf{y} = \mathbf{A}\mathbf{x}. \tag{11}$$

Since A(u + x) = Au + Ax and A(cx) = cAx, this transformation is linear.

We show that, conversely, every linear transformation F of Rⁿ into R^m can be given in terms of an m × n matrix A, after a basis for Rⁿ and a basis for R^m have been chosen. This can be proved as follows.

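Let e₍₁₎, …, e_(n) be any basis for Rⁿ. Then every x in Rⁿ has a unique representation

$$\mathbf{x} = x_1\mathbf{e}_{(1)} + \cdots + x_n\mathbf{e}_{(n)}.$$

Since F is linear, this representation implies for the image F(x):

$$F(\mathbf{x}) = F(x_1\mathbf{e}_{(1)} + \cdots + x_n\mathbf{e}_{(n)}) = x_1F(\mathbf{e}_{(1)}) + \cdots + x_nF(\mathbf{e}_{(n)}).$$

Hence F is uniquely determined by the images of the vectors of a basis for Rⁿ. We now choose for Rⁿ the “standard basis”

$$\mathbf{e}_{(1)} = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \mathbf{e}_{(2)} = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad \mathbf{e}_{(n)} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}$$

where e_(j) has its jth component equal to 1 and all others 0. We show that we can now determine an m × n matrix A = [a_jk] such that for every x in Rⁿ and image y = F(x) in R^m,

$$\mathbf{y} = F(\mathbf{x}) = \mathbf{A}\mathbf{x}.$$

Indeed, from the image y⁽¹⁾ = F(e₍₁₎) of e₍₁₎ we get the condition

$$\mathbf{y}^{(1)} = \begin{bmatrix} y_1^{(1)} \\ y_2^{(1)} \\ \vdots \\ y_m^{(1)} \end{bmatrix} = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ a_{21} & \cdots & a_{2n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

from which we can determine the first column of A, namely a₁₁ = y₁⁽¹⁾, a₂₁ = y₂⁽¹⁾, …, a_m1 = y_m⁽¹⁾. Similarly, from the image of e₍₂₎ we get the second column of A, and so on. This completes the proof.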

We say that A represents F, or is a representation of F, with respect to the bases for Rⁿ and R^m. Quite generally, the purpose of a “representation” is the replacement of one object of study by another object whose properties are more readily apparent.
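The proof is constructive: the jth column of A is the image of the jth standard basis vector under F. A minimal sketch of this construction follows. The function names `matrix_of` and `apply`, and the sample map `F` from R³ into R², are hypothetical and serve only as an illustration.

```python
def matrix_of(F, n):
    """Return the m x n matrix (list of rows) whose j-th column is F(e_j).

    F must be a linear map taking a list of n numbers to a list of m numbers.
    """
    columns = []
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]  # j-th standard basis vector
        columns.append(F(e_j))
    m = len(columns[0])
    # Transpose the list of columns into a list of rows.
    return [[columns[j][i] for j in range(n)] for i in range(m)]

def apply(A, x):
    """Matrix-vector product Ax."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def F(x):
    """A sample linear map R^3 -> R^2 (chosen only for illustration)."""
    x1, x2, x3 = x
    return [x1 + 2 * x3, 3 * x2 - x3]

A = matrix_of(F, 3)
print(A)

x = [4, -1, 5]
print(F(x), apply(A, x))  # both expressions give the same image vector
```

Note that the construction only reproduces F if F is in fact linear; for a nonlinear map the matrix built from the basis images would agree with F on the basis vectors but not in general.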

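In three-dimensional Euclidean space E³ the standard basis is usually written e₍₁₎ = i, e₍₂₎ = j, e₍₃₎ = k. Thus,

$$\mathbf{i} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{j} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad \mathbf{k} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$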

These are the three unit vectors in the positive directions of the axes of the Cartesian coordinate system in space, that is, the usual coordinate system with the same scale of measurement on the three mutually perpendicular coordinate axes.

EXAMPLE 5 Linear Transformations

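Interpreted as transformations of Cartesian coordinates in the plane, the matrices

$$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \quad \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}, \quad \begin{bmatrix} a & 0 \\ 0 & 1 \end{bmatrix}$$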

represent a reflection in the line x₂ = x₁, a reflection in the x₁-axis, a reflection in the origin, and a stretch (when a > 1, or a contraction when 0 < a < 1) in the x₁-direction, respectively.
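To see these four transformations at work, one can apply them to a sample point. In the sketch below the matrices are written out from the geometric descriptions above (an assumption of this illustration), with the stretch factor a = 2 and the point (3, 1) chosen arbitrarily; the helper name `apply` is hypothetical.

```python
def apply(M, v):
    """Matrix-vector product Mv for a matrix given as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

a = 2  # stretch factor (a > 1 stretches, 0 < a < 1 contracts)

transformations = {
    "reflection in the line x2 = x1": [[0, 1], [1, 0]],
    "reflection in the x1-axis":      [[1, 0], [0, -1]],
    "reflection in the origin":       [[-1, 0], [0, -1]],
    "stretch in the x1-direction":    [[a, 0], [0, 1]],
}

point = [3, 1]
images = {name: apply(M, point) for name, M in transformations.items()}
for name, image in images.items():
    print(f"{name}: {point} -> {image}")
```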

EXAMPLE 6 Linear Transformations

Our discussion preceding Example 5 is simpler than it may look at first sight. To see this, find A representing the linear transformation that maps (x₁, x₂) onto (2x₁ − 5x₂, 3x₁ + 4x₂).

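Solution. Obviously, the transformation is

$$\begin{aligned} y_1 &= 2x_1 - 5x_2 \\ y_2 &= 3x_1 + 4x_2. \end{aligned}$$

From this we can directly see that the matrix is

$$\mathbf{A} = \begin{bmatrix} 2 & -5 \\ 3 & 4 \end{bmatrix}.$$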

If A in (11) is square, n × n, then (11) maps Rⁿ into Rⁿ. If this A is nonsingular, so that A⁻¹ exists (see Sec. 7.8), then multiplication of (11) by A⁻¹ from the left and use of A⁻¹A = I gives the inverse transformation

$$\mathbf{x} = \mathbf{A}^{-1}\mathbf{y}. \tag{14}$$

It maps every y = y₀ onto that x, which by (11) is mapped onto y₀. The inverse of a linear transformation is itself linear, because it is given by a matrix, as (14) shows.
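As an illustration, the inverse transformation can be computed for the 2 × 2 matrix read off from Example 6, which maps (x₁, x₂) onto (2x₁ − 5x₂, 3x₁ + 4x₂). The sketch assumes the standard formula for the inverse of a 2 × 2 matrix; the function names `inverse_2x2` and `apply` are hypothetical.

```python
from fractions import Fraction

def inverse_2x2(A):
    """Inverse of a 2 x 2 matrix [[a, b], [c, d]] via (1/det) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular; no inverse transformation exists")
    return [[d / det, -b / det], [-c / det, a / det]]

def apply(M, v):
    """Matrix-vector product Mv."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A = [[2, -5], [3, 4]]        # matrix of the transformation in Example 6
A_inv = inverse_2x2(A)       # det A = 2*4 - (-5)*3 = 23

x = [1, 2]
y = apply(A, x)              # image of x under y = Ax
x_back = apply(A_inv, y)     # inverse transformation x = A^(-1) y

print("y      =", y)
print("x_back =", x_back)
```

Mapping x to y and then applying the inverse transformation returns the original x, which is exactly what (14) expresses.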

Composition of Linear Transformations

We want to give you a flavor of how linear transformations in general vector spaces work. You will notice, if you read carefully, that definitions and verifications (Example 7) strictly follow the given rules and you can think your way through the material by going in a slow systematic fashion.

The last operation we want to discuss is composition of linear transformations. Let X, Y, W be general vector spaces. As before, let F be a linear transformation from X to Y. Let G be a linear transformation from W to X. Then we denote, by H, the composition of F and G, that is,

$$H = F \circ G = FG = F(G), \tag{15}$$

which means that we first apply the transformation G and then apply the transformation F to the result (in that order! Note that G, the transformation applied first, is written on the right, so the symbols in FG are read from right to left).

Now, to give this a more concrete meaning, if we let w be a vector in W, then G(w) is a vector in X and F(G(w)) is a vector in Y. Thus, H maps W to Y, and we can write

$$H(\mathbf{w}) = (F \circ G)(\mathbf{w}) = (FG)(\mathbf{w}) = F(G(\mathbf{w})),$$

which completes the definition of composition in a general vector space setting. But is composition really linear? To check this we have to verify that H, as defined in (15), obeys the two equations of (10).

EXAMPLE 7 The Composition of Linear Transformations Is Linear

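To show that H is indeed linear we must show that (10) holds. We have, for two vectors w₁, w₂ in W,

$$\begin{aligned} H(\mathbf{w}_1 + \mathbf{w}_2) &= (F \circ G)(\mathbf{w}_1 + \mathbf{w}_2) \\ &= F(G(\mathbf{w}_1 + \mathbf{w}_2)) \\ &= F(G(\mathbf{w}_1) + G(\mathbf{w}_2)) && \text{(by linearity of } G\text{)} \\ &= F(G(\mathbf{w}_1)) + F(G(\mathbf{w}_2)) && \text{(by linearity of } F\text{)} \\ &= (F \circ G)(\mathbf{w}_1) + (F \circ G)(\mathbf{w}_2) \\ &= H(\mathbf{w}_1) + H(\mathbf{w}_2). \end{aligned}$$

Similarly, for any scalar c,

$$H(c\mathbf{w}_2) = (F \circ G)(c\mathbf{w}_2) = F(G(c\mathbf{w}_2)) = F(cG(\mathbf{w}_2)) = cF(G(\mathbf{w}_2)) = c(F \circ G)(\mathbf{w}_2) = cH(\mathbf{w}_2).$$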

We defined composition as a linear transformation in a general vector space setting and showed that the composition of linear transformations is indeed linear.

Next we want to relate composition of linear transformations to matrix multiplication.

To do so we let X = Rⁿ, Y = R^m and W = R^p. This choice of particular vector spaces allows us to represent the linear transformations as matrices and form matrix equations, as was done in (11). Thus F can be represented by a general real m × n matrix A = [a_jk] and G by an n × p matrix B = [b_jk]. Then we can write for F, with a column vector x with n entries and a resulting vector y with m entries,

$$\mathbf{y} = \mathbf{A}\mathbf{x} \tag{16}$$

and similarly for G, with column vector w with p entries,

$$\mathbf{x} = \mathbf{B}\mathbf{w}. \tag{17}$$

Substituting (17) into (16) gives

$$\mathbf{y} = \mathbf{A}\mathbf{x} = \mathbf{A}(\mathbf{B}\mathbf{w}) = (\mathbf{A}\mathbf{B})\mathbf{w} = \mathbf{A}\mathbf{B}\mathbf{w} = \mathbf{C}\mathbf{w} \qquad \text{where } \mathbf{C} = \mathbf{A}\mathbf{B}. \tag{18}$$

This is (15) in a matrix setting, that is, we can define the composition of linear transformations in the Euclidean spaces as multiplication by matrices. Hence, the real m × p matrix C represents a linear transformation H which maps R^p to R^m with vector w, a column vector with p entries.

Remarks. Our discussion is similar to the one in Sec. 7.2, where we motivated the “unnatural” matrix multiplication of matrices. Look back and see that our current, more general, discussion is written out there for the case of dimension m = 2, n = 2 and p = 2. (You may want to write out our development by picking small distinct dimensions, such as m = 2, n = 3, and p = 4, and writing down the matrices and vectors. This is a trick of the trade of mathematicians in that we like to develop and test theories on smaller examples to see that they work.)
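Following this suggestion, the sketch below writes out the development for m = 2, n = 3, and p = 4. The entries of A, B, and w are arbitrary sample values, and the function names `matmul` and `apply` are hypothetical. It checks that applying B and then A to w gives the same vector as applying the single matrix C = AB.

```python
def matmul(A, B):
    """Product of A (m x n) and B (n x p), both given as lists of rows."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("AB is defined only if A has as many columns as B has rows")
    p = len(B[0])
    return [[sum(A[j][l] * B[l][k] for l in range(n)) for k in range(p)]
            for j in range(len(A))]

def apply(M, v):
    """Matrix-vector product Mv."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A = [[1, 0, 2],
     [-1, 3, 1]]          # 2 x 3, represents F: R^3 -> R^2
B = [[1, 2, 0, -1],
     [0, 1, 1, 0],
     [2, 0, -1, 3]]       # 3 x 4, represents G: R^4 -> R^3
w = [1, -2, 3, 4]         # vector in R^4

x = apply(B, w)           # x = Bw, a vector in R^3
y = apply(A, x)           # y = Ax = A(Bw), a vector in R^2
C = matmul(A, B)          # 2 x 4 matrix representing the composition

print("x  =", x)
print("y  =", y)
print("C  =", C)
print("Cw =", apply(C, w))  # agrees with y
```

Choosing three distinct dimensions makes it easy to see which products are defined: AB exists because A has 3 columns and B has 3 rows, whereas BA is not defined here.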

EXAMPLE 8 Linear Transformations. Composition

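In Example 5 of Sec. 7.9, let A be the first matrix and B be the fourth matrix with a > 1. Then applying B to a vector w = [w₁ w₂]^T stretches the component w₁ by the factor a in the x₁-direction. Next, when we apply A to the “stretched” vector, we reflect the vector in the line x₁ = x₂, resulting in a vector y = [w₂ aw₁]^T. But this represents, precisely, a geometric description for the composition H of two linear transformations F and G represented by matrices A and B. We now show that, for this example, our result can be obtained by straightforward matrix multiplication, that is,

$$\mathbf{A}\mathbf{B} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} a & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ a & 0 \end{bmatrix}$$

and as in (18) calculate

$$\mathbf{A}\mathbf{B}\mathbf{w} = \begin{bmatrix} 0 & 1 \\ a & 0 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} w_2 \\ aw_1 \end{bmatrix},$$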

which is the same as before. This shows that indeed AB = C, and we see that the composition of linear transformations can be represented by a matrix product. It also shows that the order of matrix multiplication is important (!). You may want to try applying A first and then B, resulting in BA. What do you see? Does it make geometric sense? Is it the same result as AB?
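A small numerical experiment can help with these questions. The sketch below takes the reflection in the line x₂ = x₁ as A and the stretch in the x₁-direction as B, with the sample value a = 2 and a sample vector w; these values and the function names `matmul` and `apply` are assumptions made for illustration. The geometric interpretation of the two results is left to the reader.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[j][l] * B[l][k] for l in range(len(B))) for k in range(len(B[0]))]
            for j in range(len(A))]

def apply(M, v):
    """Matrix-vector product Mv."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

a = 2                      # sample stretch factor, a > 1
A = [[0, 1], [1, 0]]       # reflection in the line x2 = x1
B = [[a, 0], [0, 1]]       # stretch by a in the x1-direction

AB = matmul(A, B)          # first B (stretch), then A (reflection)
BA = matmul(B, A)          # first A (reflection), then B (stretch)

w = [3, 5]
print("AB =", AB, " ABw =", apply(AB, w))
print("BA =", BA, " BAw =", apply(BA, w))
print("AB == BA ?", AB == BA)
```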

We have learned several abstract concepts such as vector space, inner product space, and linear transformation. The introduction of such concepts allows engineers and scientists to communicate in a concise and common language. For example, the concept of a vector space encapsulates a lot of ideas in a very concise manner. For the student, learning such concepts provides a foundation for more advanced studies in engineering.

This concludes Chapter 7. The central theme was the Gaussian elimination of Sec. 7.3 from which most of the other concepts and theory flowed. The next chapter again has a central theme, that is, eigenvalue problems, an area very rich in applications such as in engineering, modern physics, and other areas.

PROBLEM SET 7.9

1. Basis. Find three bases of R².
2. Uniqueness. Show that the representation v = c₁a₍₁₎ + … + c_na_(n) of any given vector in an n-dimensional vector space V in terms of a given basis a₍₁₎, …, a_(n) for V is unique. Hint. Take two representations and consider the difference.

3–10 VECTOR SPACE

(More problems in Problem Set 9.4.) Is the given set, taken with the usual addition and scalar multiplication, a vector space? Give reason. If your answer is yes, find the dimension and a basis.

3. All vectors in R³ satisfying −υ₁ + 2υ₂ + 3υ₃ = 0, −4υ₁ + υ₂ + υ₃ = 0.
4. All skew-symmetric 3 × 3 matrices.
5. All polynomials in x of degree 4 or less with nonnegative coefficients.
6. All functions y(x) = a cos 2x + b sin 2x with arbitrary constants a and b.
7. All functions y(x) = (ax + b)e^(−x) with any constants a and b.
8. All n × n matrices A with fixed n and det A = 0.
9. All 2 × 2 matrices [a_jk] with a₁₁ + a₂₂ = 0.
10. All 3 × 2 matrices [a_jk] with first column any multiple of [3 0 −5]^T.

11–14 LINEAR TRANSFORMATIONS

Find the inverse transformation. Show the details.

11. y₁ = 0.5x₁ − 0.5x₂
y₂ = 1.5x₁ − 2.5x₂
12.
13.
14.

15–20 EUCLIDEAN NORM

Find the Euclidean norm of the vectors:

15. [3 1 −4]^T
16.
17. [1 0 0 1 −1 0 −1 1]^T
18. [−4 8 −1]^T
19.
20.

21–25 INNER PRODUCT. ORTHOGONALITY

21. Orthogonality. For what value(s) of k are the vectors and orthogonal?
22. Orthogonality. Find all vectors in R³ orthogonal to [2 0 1]. Do they form a vector space?
23. Triangle inequality. Verify (4) for the vectors in Probs. 15 and 18.
24. Cauchy–Schwarz inequality. Verify (3) for the vectors in Probs. 16 and 19.
25. Parallelogram equality. Verify (5) for the first two column vectors of the coefficient matrix in Prob. 13.

CHAPTER 7 REVIEW QUESTIONS AND PROBLEMS

1. What properties of matrix multiplication differ from those of the multiplication of numbers?
2. Let A be a 100 × 100 matrix and B a 100 × 50 matrix. Are the following expressions defined or not? A + B, A², B², AB, BA, AA^T, B^TA, B^TB, B^TAB. Give reasons.
3. Are there any linear systems without solutions? With one solution? With more than one solution? Give simple examples.
4. Let C be a 10 × 10 matrix and a a column vector with 10 components. Are the following expressions defined or not? Ca, C^Ta, Ca^T, aC, a^TC, (Ca^T)^T.
5. Motivate the definition of matrix multiplication.
6. Explain the use of matrices in linear transformations.
7. How can you give the rank of a matrix in terms of row vectors? Of column vectors? Of determinants?
8. What is the role of rank in connection with solving linear systems?
9. What is the idea of Gauss elimination and back substitution?
10. What is the inverse of a matrix? When does it exist? How would you determine it?

11–20 MATRIX AND VECTOR CALCULATIONS

Showing the details, calculate the following expressions or give reason why they are not defined, when

11. AB, BA
12. A^T, B^T
13. Au, u^TA
14. u^Tv, uv^T
15. u^TAu, v^TBv
16. A⁻¹, B⁻¹
17. det A, det A², (det A)², det B
18. (A²)⁻¹, (A⁻¹)²
19. AB − BA
20. (A + A^T)(B − B^T)

21–28 LINEAR SYSTEMS

Showing the details, find all solutions or indicate that no solution exists.

21.
22.
23.
24.
25.
26.
27.
28.

29–32 RANK

Determine the ranks of the coefficient matrix and the augmented matrix and state how many solutions the linear system will have.

29. In Prob. 23
30. In Prob. 24
31. In Prob. 27
32. In Prob. 26

33–35 NETWORKS

Find the currents.

33.
34.
35.

SUMMARY OF CHAPTER 7 Linear Algebra: Matrices, Vectors, Determinants. Linear Systems

An m × n matrix A = [a_jk] is a rectangular array of numbers or functions (“entries,” “elements”) arranged in m horizontal rows and n vertical columns. If m = n, the matrix is called square. A 1 × n matrix is called a row vector and an m × 1 matrix a column vector (Sec. 7.1).

The sum A + B of matrices of the same size (i.e., both m × n) is obtained by adding corresponding entries. The product of A by a scalar c is obtained by multiplying each a_jk by c (Sec. 7.1).

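The product C = AB of an m × n matrix A by an r × p matrix B = [b_jk] is defined only when r = n, and is the m × p matrix C = [c_jk] with entries

$$c_{jk} = a_{j1}b_{1k} + a_{j2}b_{2k} + \cdots + a_{jn}b_{nk} \qquad \text{(row } j \text{ of } \mathbf{A} \text{ times column } k \text{ of } \mathbf{B}\text{)}. \tag{1}$$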

This multiplication is motivated by the composition of linear transformations (Secs. 7.2, 7.9). It is associative, but is not commutative: if AB is defined, BA may not be defined, but even if BA is defined, AB ≠ BA in general. Also AB = 0 may not imply A = 0 or B = 0 or BA = 0 (Secs. 7.2, 7.8). Illustrations:

The transpose A^T of a matrix A = [a_jk] is A^T = [a_kj]; rows become columns and conversely (Sec. 7.2). Here, A need not be square. If it is and A = A^T, then A is called symmetric; if A = −A^T, it is called skew-symmetric. For a product, (AB)^T = B^TA^T (Sec. 7.2).

A main application of matrices concerns linear systems of equations

$$\mathbf{A}\mathbf{x} = \mathbf{b} \tag{2}$$

(m equations in n unknowns x₁, …, x_n, A and b given). The most important method of solution is the Gauss elimination (Sec. 7.3), which reduces the system to “triangular” form by elementary row operations, which leave the set of solutions unchanged. (Numeric aspects and variants, such as Doolittle's and Cholesky's methods, are discussed in Secs. 20.1 and 20.2.)

Cramer's rule (Secs. 7.6, 7.7) represents the unknowns in a system (2) of n equations in n unknowns as quotients of determinants; for numeric work it is impractical. Determinants (Sec. 7.7) have decreased in importance, but will retain their place in eigenvalue problems, elementary geometry, etc.

The inverse A⁻¹ of a square matrix satisfies AA⁻¹ = A⁻¹A = I. It exists if and only if det A ≠ 0. It can be computed by the Gauss–Jordan elimination (Sec. 7.8).

The rank r of a matrix A is the maximum number of linearly independent rows or columns of A or, equivalently, the number of rows of the largest square submatrix of A with nonzero determinant (Secs. 7.4, 7.7).

The system (2) has solutions if and only if rank A = rank [A b], where [A b] is the augmented matrix (Fundamental Theorem, Sec. 7.5).

The homogeneous system

$$\mathbf{A}\mathbf{x} = \mathbf{0} \tag{3}$$

has solutions x ≠ 0 (“nontrivial solutions”) if and only if rank A < n, in the case m = n equivalently if and only if det A = 0 (Secs. 7.6, 7.7).
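The rank criteria summarized above can be turned into a small classification routine. The sketch below compares rank A with the rank of the augmented matrix [A b] and with the number of unknowns n (Sec. 7.5). The function names `rank` and `classify` and the sample systems are assumptions made for illustration, and the elimination uses exact fractions rather than the numerically oriented variants of Chap. 20.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gauss elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    r = 0  # number of pivot rows found so far
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def classify(A, b):
    """Classify the system Ax = b by comparing rank A, rank [A b], and n."""
    n = len(A[0])  # number of unknowns
    augmented = [list(row) + [bi] for row, bi in zip(A, b)]
    r, r_aug = rank(A), rank(augmented)
    if r < r_aug:
        return "no solution"
    return "unique solution" if r == n else "infinitely many solutions"

print(classify([[1, 1], [1, -1]], [2, 0]))  # rank A = rank [A b] = n = 2
print(classify([[1, 1], [2, 2]], [1, 2]))   # rank A = rank [A b] = 1 < n
print(classify([[1, 1], [2, 2]], [1, 3]))   # rank A = 1 < rank [A b] = 2
print(classify([[1, 2], [2, 4]], [0, 0]))   # homogeneous, rank A < n: nontrivial solutions
```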

Vector spaces, inner product spaces, and linear transformations are discussed in Sec. 7.9. See also Sec. 7.4.

¹ANDREI ANDREJEVITCH MARKOV (1856–1922), Russian mathematician, known for his work in probability theory.

²GABRIEL CRAMER (1704–1752), Swiss mathematician.

³WILHELM JORDAN (1842–1899), German geodesist and mathematician. He did important geodesic work in Africa, where he surveyed oases. [See Althoen, S.C. and R. McLaughlin, Gauss–Jordan reduction: A brief history. American Mathematical Monthly, Vol. 94, No. 2 (1987), pp. 130–142.]

We do not recommend it as a method for solving systems of linear equations, since the number of operations in addition to those of the Gauss elimination is larger than that for back substitution, which the Gauss–Jordan elimination avoids. See also Sec. 20.1.

⁴DAVID HILBERT (1862–1943), great German mathematician, taught at Königsberg and Göttingen and was the creator of the famous Göttingen mathematical school. He is known for his basic work in algebra, the calculus of variations, integral equations, functional analysis, and mathematical logic. His “Foundations of Geometry” helped the axiomatic method to gain general recognition. His famous 23 problems (presented in 1900 at the International Congress of Mathematicians in Paris) considerably influenced the development of modern mathematics.

If V is finite dimensional, it is actually a so-called Hilbert space; see [GenRef7], p. 128, listed in App. 1.

⁵HERMANN AMANDUS SCHWARZ (1843–1921). German mathematician, known by his work in complex analysis (conformal mapping) and differential geometry. For Cauchy see Sec. 2.5.
