In Chapter 8, we introduced vectors as objects associated with a direction in everyday three-dimensional space and showed how they can be discussed using equations for their three components in a given reference frame. Here we shall show how to extend the number of components to define vectors in spaces of more than three dimensions. This leads to the introduction of matrices, which are two-dimensional arrays that enable vectors to be transformed into other vectors. The properties of matrices are discussed in detail and their uses illustrated in, for example, solving simultaneous linear equations. In the following chapter we continue the discussion of matrices, with applications to vibrating systems and to geometry. Firstly, however, we study related quantities called determinants, which will play a crucial role in this development.
These occur in many contexts and we have already met examples in the discussion of vectors in Chapter 8. From (8.16b), the vector product of two vectors a and b in Cartesian co-ordinates has an x-component (aybz − azby). Any four quantities aij(i, j = 1, 2) combined in this way can be written in the form of a square array, denoted by Δ2, called a determinant. This is written in the form
where the quantities aij(i, j = 1, 2) are called the elements of the determinant. For example,
The result, in this case − 5, is called the value of the determinant. It is important to note that the vertical bars in (9.1) do not mean that a modulus is to be taken, as this example confirms. Although we have used real numbers for the elements in this example, in general they can be algebraic expressions, real or complex, so the value of the determinant may also be a real or complex expression or number.
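A quick numerical check of the 2 × 2 rule a11a22 − a12a21 is sketched below in Python. The elements are hypothetical, chosen only so that the value happens to be −5 as in the example above; they are not the elements used in the text.

```python
# 2 x 2 determinant: a11*a22 - a12*a21 (hypothetical elements, for illustration only)
a11, a12 = 1, 3
a21, a22 = 4, 7
value = a11 * a22 - a12 * a21
print(value)   # 1*7 - 3*4 = -5: a signed value, not a modulus
```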
Determinants of larger dimensionality can also be constructed. Thus the 3 × 3 determinant
is defined as
Comparing this with (8.18), we see that the triple scalar product of three vectors a, b, c
is a determinant whose elements are the Cartesian components of the vectors. Likewise, comparing (9.2a) with (8.16a) shows that the vector product of two vectors a and b can also be written as a 3 × 3 determinant
The two compact forms (9.3a) and (9.3b) are probably the easiest way of remembering the expressions (8.18) and (8.16a) for the triple scalar product and vector product, respectively.
Returning to (9.2b), we see that the terms in brackets on the right-hand side are themselves 2 × 2 determinants. Hence we can write
where the determinants that occur on the right-hand side are examples of minors. In general, the minor mij of any element aij of Δ3 is the 2 × 2 determinant obtained by deleting all the elements in the ith row and jth column of Δ3. Therefore (9.2b) can be written
where the co-factor of any element aij is defined by
Equation (9.4a) is called the Laplace expansion along the first row of Δ3. For example, the minors of the elements along the first row of the determinant
are
so that (9.4a) gives
Laplace expansions can be made along any row or column. For example, the expression in (9.2b) can be rearranged to give
which is the Laplace expansion
along the second row. Using this expansion for the determinant (9.5) gives
in agreement with the value obtained by expanding along the first row. Alternatively (9.2b) can be written in the form
which is a Laplace expansion along the third column.
The definition of a determinant can now be extended to integers n > 3 by generalising the Laplace expansion (9.4a) to any n. To do this, we first write an n × n array
(9.6a)
where the elements are aij (i, j = 1, 2, …, n) and the indices i and j again label the rows and columns, respectively. Then, by analogy with the expansion (9.4a) for 3 × 3 determinants, we define
where the minors mij are again the determinants obtained by deleting all the elements of the ith row and jth column, and the co-factors are given by (9.4b). Since the minors associated with the elements of an n × n determinant are (n − 1) × (n − 1) determinants, (9.6b) defines 4 × 4 determinants in terms of a sum of 3 × 3 determinants, and so on. Such higher order determinants are required in, for example, the solution of n simultaneous linear equations, as we shall see in Sections 9.1.2 and 9.4.4.
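As an illustration of how the recursive definition (9.6b) can be turned into a calculation, the following Python sketch expands a determinant along its first row. It is meant only to make the definition concrete: the number of operations grows roughly factorially with n, so it is not how determinants are evaluated in practice.

```python
def det_laplace(a):
    """Determinant by Laplace expansion along the first row.

    `a` is a list of n lists, each of length n. Educational sketch only:
    the cost grows roughly as n!, so use library routines for large n.
    """
    n = len(a)
    if n == 1:
        return a[0][0]
    total = 0
    for j in range(n):
        # minor m_1j: delete the first row and column j (0-based column index)
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        cofactor = (-1) ** j * det_laplace(minor)   # sign (-1)**(1+j) in 1-based indices
        total += a[0][j] * cofactor
    return total

print(det_laplace([[1, 2], [3, 4]]))                    # -2
print(det_laplace([[2, 0, 1], [1, 3, 4], [0, 5, 6]]))   # 1
```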
The evaluation of determinants using the Laplace expansion involves the arithmetical operations of addition, subtraction and multiplication, the number of which increases rapidly as the dimensionality of the determinant increases. The work involved can sometimes be reduced by exploiting a number of general properties of determinants that are given below.
Although these results hold in general, here we will only consider the case for 3 × 3 determinants. In this case it is convenient to define the totally antisymmetric symbol ϵijk as follows:
where cyclic permutations were defined following (8.16b). Using (9.7), Eqs. (9.2a) and (9.2b) may be written
where we have used a shorthand notation for a sum over three dummy indices i, j and k, which may each take the values 1, 2 and 3, i.e.
The theorems are as follows.
(i) The value of a determinant is unchanged by interchanging (called transposing) its rows and columns.
This corresponds to the transformation aij → aji for i, j equal to 1, 2 and 3. Using the notation in (9.8) and denoting the new determinant by ΔT3, this gives
Rearranging the right-hand side gives
It follows that theorems about rows also apply to columns, so it is sufficient to prove them only for the former.
(ii) The sign of a determinant is reversed by interchanging any two of its rows (or columns).
This result again follows directly from (9.8). For example, interchanging the first and second rows gives
and using the definition (9.7),
(iii) The value of a determinant is zero if any two rows (or columns) are identical.
This follows immediately from the preceding result, because this interchange gives Δ3 = −Δ3 and hence Δ3 = 0.
(iv) If the elements of any one row (or column) are multiplied by a common factor, the value of the determinant is multiplied by this factor.
This follows trivially, because each term in (9.8) contains a single element from each row (or column).
Using these theorems, a number of other useful results may be established. In particular, the one used repeatedly below [property (vii)] is that the value of a determinant is unchanged if a multiple of any row (or column) is added to another row (or column).
These properties can often be used to manipulate a determinant into a form that is easier to evaluate. For example, consider the determinant
The elements of the first row are all multiples of 9, which can therefore be factored out to give
Then by property (vii) we can add row 3 to row 1 without changing the value of the determinant, when we obtain
because a determinant with two equal rows has a value zero [property (iii)].
In other cases, property (vii) can often be used to manipulate a determinant into a form where it has one or more zeros in a given row or column. Then if this row or column is used in the Laplace expansion, the number of arithmetic operations can be reduced considerably. Consider the evaluation of the determinant
In this case, one way of proceeding is to add column 4 to each of columns 1 and 3, and add twice column 4 to column 2, when we obtain
Making a Laplace expansion along the first row gives
Then subtracting row 2 from row 1 gives
The Laplace expansion is most suited for determinants of low dimensionality (i.e., small values of n) and where in numerical calculations the elements do not differ much in magnitude. For large-dimensional determinants, the final result may still be formed from the addition and subtraction of many terms, each of which is itself the product of several elements. In these cases there is a significant probability of inaccuracies being introduced in numerical calculations due to rounding errors, particularly if the elements differ considerably in magnitude. Special computer programs exist1 that address this problem, and are capable of evaluating determinants exactly.
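In practice, numerical libraries evaluate determinants by factorisation rather than by a direct Laplace expansion. A minimal sketch using NumPy (assumed available; not a routine referred to in the text) is:

```python
import numpy as np

A = np.array([[4.0, 2.0, 1.0],
              [3.0, 5.0, 7.0],
              [1.0, 0.0, 2.0]])

# numpy.linalg.det works via an LU factorisation, which is far more efficient
# and numerically stable than a direct Laplace expansion for large matrices.
print(np.linalg.det(A))   # ~37.0 (floating point, so expect tiny rounding error)
```

For exact results with integer or symbolic elements, a computer-algebra system can be used instead of floating-point routines.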
We have seen that determinants appear naturally when manipulating vectors. They also appear in the theory of simultaneous linear equations. If there are n simultaneous linear equations in n unknowns xi(i = 1, 2, …, n), they may be written in the general form,
where the aij (i, j = 1, 2, …, n) and bj (j = 1, 2, …, n) are constants. These equations are not necessarily compatible. In the general case where the bj are not all zero, the equations are called inhomogeneous, and their solution will be discussed in Section 9.4.4. In the simpler homogeneous case, where all the constants bj are zero, the equations are never inconsistent, because they always have a so-called trivial solution where all the xi are zero. But they may also have non-trivial solutions, where not all the xi are zero. Because the equations are linear and homogeneous, it follows that if a non-trivial solution exists for a particular set of values xi(i = 1, 2, …, n), then the set cxi(i = 1, 2, …, n), where c is a constant, is also a solution. Thus non-trivial solutions are characterised by the ratios x1 : x2 : x3: ⋅⋅⋅: xn, rather than by unique values.
We will examine below how to find non-trivial solutions, using initially the example of n = 3, that is, the set of equations
which has an associated determinant of coefficients
The value of this determinant determines whether or not a non-trivial solution exists.
An obvious way to proceed is to use the third equation in (9.10) to give an expression for x3 in terms of x2 and x1, then substitute this into the other two equations and examine the two resulting equations in x1 and x2 to see if they have compatible solutions. However, this is algebraically rather cumbersome and rapidly becomes very tedious if one considers more than three equations.
Instead, we will use another method, in which the key result follows from the equation
obtained by multiplying the first equation in (9.10) by the co-factor A11, the second by A21, and the third by A31, and adding the three resulting equations together. The first term in brackets in (9.11a) is seen to be the Laplace expansion of Δ using the first column, and so has the value Δ. On comparing the second bracket with the first, we see that it is the Laplace expansion of a determinant in which the first column a11, a21, a31 of Δ has been replaced by a12, a22, a32. Hence
because two columns are identical. The third bracket in (9.11a) vanishes for a similar reason, so that (9.11a) reduces to
x1Δ = 0, (9.11b)
and therefore x1 = 0 unless Δ = 0. Analogous arguments show that x2Δ = x3Δ = 0, so a necessary condition for a non-trivial solution to (9.10) is
Furthermore, if we substitute
into (9.10), we see that the left-hand sides of the three equations (9.10) equal the three terms in brackets in (9.11a), which have all been shown to vanish for Δ = 0. Hence (9.13a) is the desired non-trivial solution and (9.12) is both a necessary and sufficient condition for it to exist. A similar argument shows that the solution can equally well be expressed in the form
(9.13b)
In contrast to the direct method of solution, the above chain of reasoning can be extended in a straightforward way to solve n homogeneous linear equations for any integer n. The condition for a non-trivial solution then becomes
(9.14)
and provided this is satisfied, the non-trivial solution is given by the co-factors, i.e.,
(9.15)
Finally, we note that for the case n = 3, the homogeneous equations (9.10) have a simple geometrical interpretation if we interpret x1, x2 and x3 as Cartesian co-ordinates x, y and z. On comparing to (1.51), we see that the three equations (9.10) are those of three planes passing through the origin. Hence the line of intersection of two of these planes, assuming they are not identical, will also pass through the origin. If this line lies in the plane described by the third equation, then any point on it is a solution to all three equations (9.10). In this case, there is a non-trivial solution given by (9.13a), which is indeed the equation of a straight line through the origin, as can be seen by comparing with (8.40). On the other hand, if it does not lie in the plane described by the third equation, then it just passes through that plane at the origin and there is no non-trivial solution to all three equations.
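The condition (9.12) and the cofactor form of the non-trivial solution are easy to check numerically. The sketch below (assuming NumPy) uses a coefficient matrix whose third row is the sum of the first two, so that its determinant vanishes, and verifies that the co-factors of one row solve the homogeneous equations; whether (9.13a) uses the co-factors of a row or of a column is not shown above, but either choice works.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [5.0, 7.0, 9.0]])        # row 3 = row 1 + row 2, so det A = 0

print(np.isclose(np.linalg.det(A), 0.0))   # True: a non-trivial solution exists

def cofactor(a, i, j):
    """Co-factor A_ij = (-1)**(i+j) times the minor m_ij."""
    minor = np.delete(np.delete(a, i, axis=0), j, axis=1)
    return (-1) ** (i + j) * np.linalg.det(minor)

# The co-factors of the first row give one non-trivial solution (up to an overall scale)
x = np.array([cofactor(A, 0, j) for j in range(3)])
print(x)        # [ 3. -6.  3.]
print(A @ x)    # approximately [0. 0. 0.]
```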
In Chapter 8, three-dimensional vectors were defined as mathematical quantities having magnitude and direction and satisfying the parallelogram law of addition. This approach is a geometrical one and is independent of the co-ordinate system. We also developed an algebraic approach using basis vectors (i, j, k) in the directions of the x, y, z axes of a three-dimensional Cartesian co-ordinate system. Any vector a could then be specified by its components ax, ay, az along the directions of the basis vectors, i.e.
or equivalently a = (ax, ay, az). The basis vectors are not unique (for example, we could rotate the three axes through a fixed angle and use these new directions to define new basis vectors) but they are linearly independent. This means that there is no linear combination of them that vanishes, unless the coefficients are all zero. That is,
only if
In the physical sciences it is common to encounter ordered sets of n quantities a = (a1, a2, …, an), b = (b1, b2, …, bn) etc., whose elements satisfy the same algebraic properties as the components of vectors. In particular, if we define their sums by a + b = (a1 + b1, a2 + b2, …, an + bn) (9.16a)
and multiplication by a scalar λ by λa = (λa1, λa2, …, λan), (9.16b)
then they obey all the general rules (8.1), (8.2) deduced for vectors in Chapter 8. For this reason (a1, a2, …, an) and (b1, b2, …, bn) are referred to as the components of vectors a and b in an n-dimensional vector space. In addition, we can define a null vector 0, whose n components are all zero, so that for any vector a,
Implicit in the choice of the word ‘component’ to describe (a1, a2, …, an), (b1, b2, …, bn), etc. is the existence of a set of basis vectors, for example, e1 = (1, 0, 0, …, 0), e2 = (0, 1, 0, …, 0), …, en = (0, 0, 0, …, 1), (9.17)
so that a = a1e1 + a2e2 + ⋯ + anen, (9.18)
in analogy to a = axi + ayj + azk for ordinary three-dimensional vectors. As for the case of ordinary vectors, the choice of basis vectors is not unique, and we can equally well expand the vector a in terms of any set of basis vectors ei(i = 1, 2, …, n), providing the latter are linearly independent, that is, provided that
μ1e1 + μ2e2 + ⋯ + μnen = 0 (9.19a)
has no solutions for the constants μi except
μi = 0 for all i = 1, 2, …, n. (9.19b)
This ensures that none of the basis vectors can be expressed in terms of the others, and, in general, a vector space is said to be n-dimensional if it contains no linearly independent set of vectors with more than n members. Such a set of n linearly independent vectors is called a complete set. Linear independence also guarantees the uniqueness of the expansion (9.18). This is easily seen by writing
and equating this to (9.18) gives
which from (9.19) has no solution other than a′i = ai for all i = 1, 2, …, n. Of course the components (a1, a2, …, an) will depend on the particular basis vectors chosen, and (a1, a2, …, an) is said to be a representation of a in the basis ei(i = 1, 2, …, n).
In what follows, we will need to relate the components ai in a given representation (9.18) to the components a′i in a representation
defined with respect to a different set of basis vectors e′i (i = 1, 2, …, n). To do this, we note that any vector in the space can be written in the form (9.18), including the new basis vectors e′i. Hence we can write
where pij are numerical constants. On substituting (9.21a) into (9.20), we obtain
This is only compatible with (9.18) for arbitrary vectors a if
(9.21b)
which is the desired relation.
The components of vectors need not be restricted to real quantities. Complex vectors in an arbitrary number of dimensions play an important role in, for example, quantum mechanics. Generalising the vectors and scalar variables to complex quantities does not alter any of the equations (8.1), (8.2) or (9.16)–(9.18), but does affect the definition of the scalar product. To distinguish this from the scalar product defined in Chapter 8 for three-dimensional vectors, we will use the notation (a, b) (also called the inner product in this context).
For the moment, we restrict ourselves to the basis (9.17), when the inner product of two vectors a = (a1, a2, …, an) and b = (b1, b2, …, bn) is defined to be
It reduces to the scalar (dot) product defined in Chapter 8 for the case of real coefficients and ensures that the squared length
remains real and positive. This leads to the basic properties
(9.23b)
from which it follows that
(9.23d)
and
(9.23e)
where λ and μ are both in general complex constants. Note that these relations reduce to the corresponding relations (8.8a), (8.8b) and (8.8c) for the real vectors discussed in Chapter 8 when λ, μ and the vectors themselves are real. In particular, we see from (9.23c) that the scalar product is only commutative for real vectors.
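These properties can be checked numerically. The sketch below (assuming NumPy) uses numpy.vdot, which conjugates its first argument; this corresponds to the convention in which the first vector in (a, b) is conjugated, and if the text conjugates the second vector instead, the two arguments are simply swapped.

```python
import numpy as np

a = np.array([1 + 2j, 3 - 1j])
b = np.array([2 - 1j, 0 + 1j])

ab = np.vdot(a, b)            # sum of conj(a_i) * b_i
ba = np.vdot(b, a)
print(ab, ba)                 # (-1-2j) and (-1+2j): complex conjugates, not equal
print(np.vdot(a, a).real)     # squared length: real and positive (here 15.0)
```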
We can now apply the general properties (9.23a)–(9.23e) to a general basis (9.18). In doing so, we will assume that the chosen basis satisfies the orthonormality relations [cf. (8.11)]
(ei, ej) = δij, (9.24a)
where δij is the Kronecker delta symbol, defined by
δij = 1 if i = j, and δij = 0 if i ≠ j. (9.24b)
Then using (9.23) repeatedly we have
using (9.24). Thus the expression (9.22) holds in all bases (9.18) provided the orthonormality relations (9.24) are satisfied. Furthermore, using (9.18) and (9.24) we have
i.e. the vector a is given by
(9.25)
In this section we introduce matrices and discuss their role in transforming vectors into other vectors.
Consider the set of linear simultaneous equations
where the coefficients aij(i = 1, 2, …, m; j = 1, 2, …, n) are constants. These equations determine m variables yi(i = 1, 2, …, m) in terms of n given variables xj(j = 1, 2, …, n), where the integers m and n are not necessarily equal. It is convenient to write (9.27) in a form that separates the variables xj from the coefficients aij as follows:
This array of coefficients is called a matrix and the quantities aij are called the elements of the matrix. It is said to be of order m × n because it has m rows and n columns. The vertical arrays yi(i = 1, 2, …, m) and xj(j = 1, 2, …, n) are also matrices, in this case of order m × 1 and n × 1. They are referred to as column matrices, or column vectors. Likewise, matrices of order 1 × n are referred to as row matrices, or row vectors. On comparing (9.28) with (9.27), we see that each of the yi(i = 1, 2, …, m) is obtained by multiplying the element in the ith row of the m × n matrix by the numbers xj(j = 1, 2, …, n) in turn and adding, so that yi = ∑j aij xj (i = 1, 2, …, m). (9.29)
For example, if
then
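The index relation (9.29) is exactly what a numerical matrix–vector product computes. The sketch below (assuming NumPy; the matrix and vector are hypothetical, not those of the example above) forms y = Ax for a 2 × 3 matrix A.

```python
import numpy as np

A = np.array([[1, 2, 0],
              [3, -1, 4]])    # a 2 x 3 matrix: maps a 3-component x to a 2-component y
x = np.array([2, 1, -1])

y = A @ x                     # y_i = sum_j a_ij * x_j, as in (9.29)
print(y)                      # [4 1]
```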
So far we have merely rewritten (9.27) in the different, but equivalent, form (9.28). The usefulness of this form results from developing rules for manipulating matrices directly. In doing this, it is convenient to denote matrices by upper-case bold Roman letters A, B, C, etc., with the exception that both row and column vectors are denoted by lower-case bold Roman letters a, b, c, etc. Thus, (9.28) may be written in the compact form y = Ax. (9.30)
Matrix algebra is then defined by the following rules.
Equality
Two matrices A, with elements aij, and B, with elements bij, are equal, if, and only if, they are of the same order m × n, and aij = bij for all i = 1, 2, …, m and j = 1, 2, …, n.
Addition
The sum S of two matrices A and B is defined if, and only if, they have the same order. The elements of S are then given by
This leads directly to the commutative and associative laws
A + B = B + A (9.32a)
and
(A + B) + C = A + (B + C), (9.32b)
respectively.
Scalar multiplication
If a matrix A is multiplied by a scalar quantity λ, then every element of A is multiplied by λ, i.e.
If λ and μ are arbitrary constants, (9.31)–(9.33) lead to the associative and distributive laws
(9.34a)
(9.34b)
and
λ(A + B) = λA + λB, (9.34c)
provided again that A and B are of the same order. In addition, we define null matrices 0 of any dimension, whose elements are all zero, so that
A + 0 = 0 + A = A. (9.34d)
Matrix multiplication
The product of two matrices AB is defined if, and only if, the number of columns in A is the same as the number of rows in B. Then, if A is an l × m matrix and B is an m × n matrix, the product AB is an l × n matrix whose elements are defined by (AB)ik = ∑j aij bjk (9.35)
for all i = 1, 2, …, l; j = 1, 2, …, n. In other words, the element (AB)ik is obtained by multiplying each element of row i of A by the corresponding element of column k of B, and adding. For example, if
then AB is the 2 × 2 matrix
It is worth noting that, just as the scalar product of two ordinary three-dimensional vectors can vanish without either vector being zero, the product AB can be a null matrix even though neither A nor B is a null matrix. For example, if
then
but neither A nor B is a null matrix.
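Both features, the failure of commutativity and the possibility of a vanishing product with non-null factors, can be exhibited with very small matrices; the pair below is a standard illustration, not the pair used in the text's example.

```python
import numpy as np

A = np.array([[0, 1],
              [0, 0]])
B = np.array([[1, 0],
              [0, 0]])

print(A @ B)    # [[0 0] [0 0]]: AB is the null matrix, yet neither A nor B is null
print(B @ A)    # [[0 1] [0 0]]: and BA is not equal to AB
```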
To motivate the definition (9.35) and to derive another important relation, let us suppose the n-component column vector x in (9.30) is related to a p-component column vector z by x = Bz, (9.37a)
where B is an n × p matrix, so that xj = ∑k bjk zk. (9.37b)
Substituting (9.37a) into (9.30) gives y = A(Bz). (9.38a)
On the other hand, substituting (9.37b) into (9.29), gives
which, on comparing with (9.35), is seen to be
Hence y = (AB)z and on comparing this with (9.38a), we finally obtain A(Bz) = (AB)z. (9.38b)
From this we see that the position of the brackets is immaterial and we can write y = ABz without ambiguity. By a similar argument one can show that
(AB)C = A(BC) = ABC, (9.39)
and so on. However, while the position of brackets in matrix products is not important, the order is crucial, since matrix multiplication is not in general commutative, that is, AB ≠ BA. This is obvious for the multiplication of an n × m matrix A and an m × n matrix B, because the products AB and BA have different dimensionalities, but it is also true even if n = m. Matrix multiplication is however distributive with respect to addition, i.e.
and
Column matrices are special cases of m × n matrices with n = 1 and are written with the second index suppressed, that is, we write them with a single row index. For example,
(9.41)
With this convention, for any two column matrices a and b, (9.31) and (9.33) reduce to
(a + b)i = ai + bi (9.42a)
and
(λa)i = λai. (9.42b)
These relations are identical to (9.16a) and (9.16b) used to characterise the components of an n-dimensional vector in Section 9.2. Similarly, the matrix relations (9.32)–(9.34) reduce to the vector relations (8.1) and (8.2) when applied to column matrices. Hence column matrices are with justification referred to as column vectors. The scalar product of a vector a with a vector b is also easily expressed in matrix notation, since the product of a row vector and a column vector of the same order n is given by ab = a1b1 + a2b2 + ⋯ + anbn = ∑i aibi. (9.43)
Comparing this with (9.22), we see that in an orthonormal basis, the scalar product is (a, b) = a†b, (9.44)
where the row vector a† corresponding to the column vector a is defined by
and is called the Hermitian conjugate of a for reasons that will become clear in Section 9.3.3.
Returning to (9.30), we now interpret the matrix A as a matrix operator that transforms an n-dimensional vector x into an m-dimensional vector y. By an operator we mean anything that acts on the object to its right, called the operand, to give a new object. Furthermore, it is easy to show, using (9.29) and (9.42), that A(λa + μb) = λAa + μAb, (9.45)
where λ and μ are arbitrary constants and a, b are arbitrary vectors. Any operator that satisfies an equation of the form (9.45) is called a linear operator and, correspondingly, (9.30) is called a linear transformation. Another linear operator, which we will meet in Chapter 10, is the differential operator d/dx, which transforms a function f(x) into its derivative. Thus,
(9.46a)
where the linearity condition
(d/dx)[λf(x) + μg(x)] = λ df(x)/dx + μ dg(x)/dx (9.46b)
follows directly from (3.19).
Linear operators and transformations are widely used in mathematics and physical science. Here we shall confine ourselves to matrix operators. A simple example is provided by considering a position vector in two dimensions,
When rotated through an angle θ, this gives a new position vector
of the same length r, as shown in Figure 9.1. Using the trigonometric identities (2.36), we have
and similarly
Hence in matrix notation,
(9.48)
or equivalently,
r′ = R(θ)r, (9.49)
where the rotation matrix
Finally, we consider the product of two transformation matrices A and B. Equation (9.38b) implies
so that the transformation AB is equivalent to the operator B acting first, followed by the operator A. In other words, the operator on the right acts first, and if A acts before B, the appropriate operator is BA ≠ AB, since in general matrices do not commute.
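A short numerical sketch of the rotation matrix and of operator composition is given below, assuming NumPy and the usual anticlockwise sign convention (which may differ from the convention in Figure 9.1). It also checks that two rotations in the plane compose into a single rotation through the summed angle.

```python
import numpy as np

def R(theta):
    """2 x 2 rotation matrix, anticlockwise convention."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

r = np.array([1.0, 0.0])
print(R(np.pi / 2) @ r)                     # ~[0, 1]
print(np.linalg.norm(R(0.3) @ r))           # 1.0: the length is unchanged

# In the plane, applying R(b) and then R(a) equals a single rotation R(a + b)
a, b = 0.4, 1.1
print(np.allclose(R(a) @ R(b), R(a + b)))   # True
```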
Given a matrix A with elements aij, it is useful to define three related matrices, as follows.
The transpose of A, denoted AT, is obtained by interchanging rows and columns. An example is
while the general relation is
(AT)ij = aji. (9.51)
It follows from this that (AT)T = A, since ((AT)T)ij = (AT)ji = aij.
In general, the transpose of a product of matrices is the product of the individual transposed matrices taken in reverse order. Thus, (AB)T = BTAT (9.52) and (ABC)T = CTBTAT,
and so on, which follows by repeated application of (9.52).
The complex conjugate of a matrix A is denoted A* and has elements a*ij. Complex conjugation has no effect on the order in products, i.e.
The Hermitian conjugate2 of a matrix A, written A†, is defined as the transpose of the complex conjugate matrix, or vice versa, i.e.
A† ≡ (A*)T = (AT)*, (9.53a)
so that3
Since Hermitian conjugation involves a transpose, it also reverses the order of products, i.e.
(AB)† = B†A†, (9.54)
For a real matrix, the Hermitian conjugate is just the transpose.
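In NumPy (assuming complex arrays), the three related matrices correspond directly to the operations in this sketch.

```python
import numpy as np

A = np.array([[1 + 1j, 2 + 0j],
              [0 + 3j, 4 - 2j]])

print(A.T)          # transpose A^T
print(A.conj())     # complex conjugate A*
print(A.conj().T)   # Hermitian conjugate: conjugate and transpose, in either order
print(np.array_equal(A.conj().T, A.T.conj()))   # True: the order does not matter
```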
Matrices with the same number of rows and columns are called square matrices, and their dimension n = m is called their order. We discuss here some of the most important types of square matrices that will be required in later sections.
Diagonal matrix
A matrix A is diagonal if its elements aij are zero unless they lie on the leading diagonal i = j, so that aij = aiδij, where δij is the Kronecker delta symbol of (9.24b). The sum of the elements along this diagonal is called the trace, denoted Tr. As an exception to the general rule, diagonal matrices of the same order commute under multiplication, that is, AB = BA if A and B are both diagonal. An important example of a diagonal matrix is the unit matrix I defined by
(I)ij = δij, (9.55)
which has the property
AI = IA = A (9.56)
for any matrix A (not necessarily diagonal) of the same order.
Symmetric and anti-symmetric matrices
A matrix is symmetric if it satisfies the condition A = AT, i.e. aij = aji, and anti-symmetric (or skew symmetric) if A = −AT, i.e. aij = −aji, where AT is the transpose of A. Any matrix A may be expressed as the sum of a symmetric and an anti-symmetric matrix, by analogy with the decomposition of functions as the sum of symmetric and anti-symmetric functions, as discussed in Section 1.3.1. Thus
where by construction the first bracket is a symmetric matrix and the second is anti-symmetric.
Hermitian matrix
A matrix is Hermitian, if it satisfies A = A†, where the dagger indicates the combined operation of complex conjugation and transposition, carried out in either order, that is, if a†ij = (aji)* = aij. If A† = −A, the matrix A is said to be anti-Hermitian (or skew Hermitian). Any complex matrix can be expressed as the sum of a Hermitian matrix and an anti-Hermitian matrix. Thus,
where by construction the first bracket is a Hermitian matrix and the second is anti-Hermitian. A real, symmetric matrix is automatically Hermitian, because A† = AT in this case.
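The two decompositions are straightforward to verify numerically; a minimal sketch for the Hermitian case (assuming NumPy) is:

```python
import numpy as np

A = np.array([[1 + 2j, 4 + 0j],
              [2 - 1j, 0 + 3j]])

H  = (A + A.conj().T) / 2        # Hermitian part
AH = (A - A.conj().T) / 2        # anti-Hermitian part

print(np.allclose(H, H.conj().T))      # True: H is Hermitian
print(np.allclose(AH, -AH.conj().T))   # True: AH is anti-Hermitian
print(np.allclose(H + AH, A))          # True: they sum back to A
```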
Unitary matrix
A matrix U is said to be unitary if it satisfies
If we make the unitary transformation
on a vector x, then by (9.43) and (9.57a),
so that the length of the vector is unchanged.
Orthogonal matrix
An orthogonal matrix O is a real unitary matrix. It therefore also leaves the length of a vector unchanged and (9.57a) becomes
OTO = OOT = I. (9.57b)
Given a square matrix A of order n, we can define an associated determinant by
(9.58)
If det A = 0, the matrix is said to be singular; if det A ≠ 0, then A is non-singular.
The properties of determinants have been summarised in Section 9.1. Since interchanging rows and columns leaves the value of the determinant unchanged, it follows that
det AT = det A. (9.59a)
Similarly, since det A* = (det A)*, we have
det A† = (det A)* (9.59b)
for the Hermitian conjugate matrix A†. Multiplying a matrix by a scalar constant λ multiplies every element aij by λ, but since each term in the determinant contains exactly one element from each row, we have
det (λA) = λⁿ det A (9.60a)
for a square matrix of order n. The determinant of a product of matrices is equal to the product of the determinants, i.e. det (AB) = det A det B. (9.60b)
The proof of (9.60b) is rather lengthy and will not be reproduced here4. However, it follows from it that
and repeated application of (9.60b) leads to
det (ABC ⋯) = det A det B det C ⋯ (9.60d)
for any number of matrices, independent of their order.
Equation (9.60b) also leads to useful results for unitary and orthogonal matrices. Specifically, from (9.57a) and (9.60b), we obtain det U† det U = (det U)* det U = |det U|² = det I = 1. (9.61)
Hence the modulus of the determinant of a unitary matrix is unity, and since an orthogonal matrix O is just a real unitary matrix, its determinant is real, so that det O = ±1.
A simple example of an orthogonal matrix is the rotation matrix in two dimensions R(θ) described in (9.50). One sees that det R(θ) = cos²θ + sin²θ = 1,
consistent with (9.61). In contrast, a matrix that generates a reflection in a given axis, for example
so that x′ = −x, y′ = y, has determinant − 1. This behaviour is characteristic of rotations and reflections about any given axis.
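The determinants of rotations and reflections can be checked directly; the sketch below (assuming NumPy) uses the two-dimensional rotation matrix and the reflection x′ = −x, y′ = y mentioned above.

```python
import numpy as np

theta = 0.7
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
reflection = np.array([[-1.0, 0.0],
                       [0.0, 1.0]])     # x' = -x, y' = y

print(np.linalg.det(rotation))    # ~ +1
print(np.linalg.det(reflection))  #   -1
```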
We can now complete the discussion of matrix algebra. The operation of division by a matrix is not defined. However, if we can find a matrix D such that AD = DA = I, then D is called the inverse of A and is written A−1, so that AA−1 = A−1A = I. (9.62)
The analogy with division is then multiplication by A−1, so that, for example,
Equation (9.62) can only be satisfied if A and A−1 are square matrices of the same order, while (9.60b) then implies det A det A−1 = det I = 1,
so that a singular matrix (one having det A = 0) has no inverse, whereas a non-singular matrix does have an inverse. To find the inverse of a matrix A, we need a new matrix called the adjoint, denoted adj A. This is defined as the transpose of the matrix of co-factors of A. Thus for the n × n matrix A, with co-factors Aij corresponding to the elements aij, the adjoint matrix is
(adj A)ij = Aji, (9.63)
from which it follows that ∑k aik Ajk = δij det A. (9.64)
To see this, we note that for i = j, (9.64) is just the Laplace expansion of det A along row i; while for i ≠ j, it is the Laplace expansion of the determinant of a matrix A′ which differs from A in that the jth row is replaced by the ith row, and which therefore vanishes because two of its rows are identical. Thus we have arrived at the result that the matrix defined by D = adj A/det A has the property that AD = I and hence D can be identified with the inverse matrix A−1, i.e. A−1 = adj A/det A, (9.65)
and AA−1 = I. A similar argument gives A−1A = I, and hence (9.62) is satisfied.
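The construction A−1 = adj A/det A can be written out directly. The sketch below (assuming NumPy, and using np.linalg.det for the minors) builds the adjoint as the transpose of the matrix of co-factors and checks that it gives the inverse.

```python
import numpy as np

def adjoint(a):
    """Transpose of the matrix of co-factors of a square matrix a (a sketch of (9.63))."""
    n = a.shape[0]
    cof = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(a, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof.T

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])

A_inv = adjoint(A) / np.linalg.det(A)
print(np.allclose(A @ A_inv, np.eye(3)))   # True
print(np.allclose(A_inv @ A, np.eye(3)))   # True
```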
Using this result, it is easy to prove that
(9.66a)
and
(9.66b)
while
(9.66c)
For a 2 × 2 matrix A, (9.65) reduces to
(9.67)
but the evaluation of the inverses of matrices with higher dimensionality can be somewhat tedious. However the computational work needed can be reduced by a process called row reduction, or Gaussian elimination.
The three elementary operations used in row reductions are: interchanging two rows; multiplying a row by a non-zero constant; and adding a multiple of one row to another row.
Since, by the law of matrix multiplication, the identity AA−1 = I involves only the rows of A and the columns of A−1, it follows that the equality is preserved if one applies the same row reductions to A and to the unit matrix; hence if a set of row reductions can be found that transforms A to I, the same set will transform I to A−1. For example, if
then the row reduction r1 → r1 − 2r3 transforms the first row of A to (1, 0, 0), and when followed by the reduction r2 → r2 − r1 yields a unit matrix, as follows:
Applying the same sequence of reductions to the unit matrix I gives
so that
The calculations involved in manipulating matrices of large dimensionality can be very tedious and in these cases useful computer programs exist, such as that referenced in footnote 1 in Section 9.1.1. Simpler, but effective, free programs may also be found on the internet.
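A bare-bones version of Gauss–Jordan elimination for the inverse can be sketched as follows (assuming NumPy). It applies the same row operations to A and to the unit matrix by working on the augmented array [A | I], and it omits the pivoting and singularity checks that a serious implementation would need.

```python
import numpy as np

def inverse_by_row_reduction(a):
    """Gauss-Jordan sketch: reduce [A | I] until the left block is I;
    the right block is then the inverse. No pivoting or singularity checks."""
    n = a.shape[0]
    aug = np.hstack([a.astype(float), np.eye(n)])    # the augmented array [A | I]
    for col in range(n):
        aug[col] /= aug[col, col]                    # scale the pivot row
        for row in range(n):
            if row != col:
                aug[row] -= aug[row, col] * aug[col] # clear the rest of the column
    return aug[:, n:]

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
print(inverse_by_row_reduction(A))   # [[ 1. -1.] [-1.  2.]]
print(np.linalg.inv(A))              # the library routine agrees
```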
The n simultaneous linear equations in n unknowns xi(i = 1, 2, …, n) given in (9.9) are conveniently written in matrix form Ax = b, (9.68a)
where
The solution of (9.68) for the homogeneous case b = 0 was discussed in Section 9.1.3. Here we consider the inhomogeneous case, when b ≠ 0. We will also start by assuming that A is non-singular so that A−1 exists. Then the solution of (9.68) is
x = A−1b, (9.69)
and the solution is unique. The latter statement follows from assuming there are two solutions, x(1) and x(2), so that Ax(i) = b (i = 1, 2). Then Ax(1) = Ax(2), and since A has an inverse, we may multiply by A−1 to obtain x(1) = x(2), as required for the solution to be unique.
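Numerically, inhomogeneous systems are usually solved without forming A−1 explicitly; library routines such as numpy.linalg.solve use elimination directly, as in this sketch (with a hypothetical A and b).

```python
import numpy as np

A = np.array([[2.0, 1.0, -1.0],
              [1.0, 3.0, 2.0],
              [1.0, 0.0, 1.0]])
b = np.array([1.0, 13.0, 4.0])

x = np.linalg.solve(A, b)       # solves Ax = b by elimination, not by forming inv(A)
print(x)                        # ~[1. 2. 3.]
print(np.allclose(A @ x, b))    # True: the solution reproduces b
```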
The solution of linear simultaneous equations by finding the inverse matrix A−1 can be tedious and it is sometimes simpler to use an alternative method based on Cramer's rule, which we now discuss. We will again consider the set of equations (9.68a), which we will write in the form
∑j aij xj = bi (i = 1, 2, …, n). (9.70)
Multiplying the equation for bi by Aij and summing over i using (9.64) gives xj det A = ∑i Aij bi. (9.71)
Hence, provided det A ≠ 0, and setting Δ = det A, (9.71) becomes xj = (1/Δ) ∑i Aij bi, (9.72a)
or equivalently, xj = Δj/Δ, (9.72b)
where Δj is the determinant obtained by replacing the elements in the jth column of Δ by the elements of the column vector b. Equations (9.72a) and (9.72b) are the combined statement of Cramer's rule.
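Cramer's rule translates directly into code: each unknown is a ratio of two determinants. The sketch below (assuming NumPy) replaces one column of A at a time by b; this is convenient for small systems but far more costly than elimination for large ones.

```python
import numpy as np

def cramer(A, b):
    """Solve Ax = b by Cramer's rule: x_j = det(A_j) / det(A),
    where A_j is A with its column j replaced by b."""
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for j in range(len(b)):
        Aj = A.copy()
        Aj[:, j] = b
        x[j] = np.linalg.det(Aj) / d
    return x

A = np.array([[2.0, 1.0, -1.0],
              [1.0, 3.0, 2.0],
              [1.0, 0.0, 1.0]])
b = np.array([1.0, 13.0, 4.0])

print(cramer(A, b))              # ~[1. 2. 3.]
print(np.linalg.solve(A, b))     # agrees with the elimination-based routine
```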
We now briefly consider the cases where A−1 does not exist, that is, when det A = 0. There are two possibilities:
In the case of three simultaneous equations, these results have a simple geometrical interpretation. For n = 3, (9.68b) reduces to the three equations
and if we interpret x1, x2 and x3 as Cartesian co-ordinates x, y and z, on comparing to (1.51) we see that these are the equations of three planes. Assuming they are not identical, the first two planes will intersect in a straight line. There are then three possibilities. If the line lies in the plane described by the third equation, then any point on it is a solution to all three equations so that there is an infinite number of solutions. This corresponds to case (ii) above. Alternatively, if the line of intersection is parallel to, but not in, the third plane, there is no solution. This corresponds to case (i) above. Finally, if the line of intersection is not parallel to the third plane, it will pass through it at a single point, corresponding to a unique solution.
The vectors a, b, c, are given by
Use determinants to evaluate a × b and b · a × c.
Evaluate the determinant
by using the Laplace expansion about (i) the third column and (ii) the first row.
Use the general properties of a determinant, as stated in Section 9.1.2, to show that the determinant
may be written
and find its value.
Simplify and hence evaluate the determinant
Solve the equation
Write the determinant
as the product of factors that are linear in α, β, γ.
The n × n determinant Δn is given by
Establish a recurrence relation for Sn ≡ Δn + Δn−1 and hence find an explicit formula for Δn.
Consider the two sets of homogeneous equations
Determine whether these sets have non-trivial solutions for x, y, z and, if so, find them.
Find the values of α for which the equations
have a unique consistent solution and solve the equations for the larger of these values.
Given two vectors a and b in an arbitrary number of dimensions, use the properties of the inner product and the Cauchy–Schwarz inequality, (9.26), to prove:
the parallelogram equality
Consider the matrices
The three matrices
called the Pauli spin matrices, form a ‘vector’ σ. Show that (σ · a)2 = a2 I, where a is an arbitrary real vector a = (ax, ay, az) and I is the 2 × 2 unit matrix.
If the matrices M± are defined by M± ≡ Mx ± iMy, where
show that the commutator [M+, M−] ≡ M+M− − M−M+ = 2Mz.
Write down the matrix operator corresponding to a rotation R(θ) through an angle θ about the z-axis in three dimensions, where positive θ corresponds to the x-axis moving towards the original y-axis. Use the form of this matrix to verify explicitly that
and that
The matrix operators corresponding to rotations Rx(θ) and Ry(θ) through an angle θ about the x and y axes are given by
Show that the matrix corresponding to a rotation through θ1 about the x-axis, followed by a rotation through θ2 about the y-axis, is given by
Do Rx(θ1) and Ry(θ2) commute?
The powers of a matrix X are defined by X2 ≡ XX, X3 ≡ XXX etc., while its exponential is defined as
If A and B are square matrices: (a) find an expression for (A + B)3 in terms of the products of A and B and their powers; (b) derive a condition for the relation
to be valid.
Find the transpose, complex conjugate and Hermitian conjugate of the matrix
Verify that the matrix
is unitary.
Express the matrix
in the form AS + AAS, where AS is a symmetric matrix and AAS is an anti-symmetric matrix.
Which of the matrices below are: (i) symmetric, (ii) orthogonal, (iii) unitary or (iv) Hermitian? Use the matrix that has none of these properties to construct (v) an anti-symmetric matrix and (vi) an anti-Hermitian matrix.
Find the inverse of the matrix
and check the answer by direct multiplication.
Find the inverse of the matrix
and hence solve the matrix equation
Find by matrix inversion the solution of the equations
Find the solution of the equations
by Cramer's rule.
The half-life τ of a radioactive atom is defined as the time it takes for half of a given quantity of atoms to decay. A sample consists of just two radioactive components A and B, both of which decay to gaseous products that rapidly disperse. The sample is weighed after 8 and 12 hours and is found to weigh 90 and 30 grams, respectively. If the half-lives of A and B are τa = 2 h and τb = 4 h, respectively, use Cramer's rule to calculate the amounts of A and B initially in the sample.
For what values of the constants α and β do the simultaneous equations
have a unique solution?
Comment on both the existence and uniqueness of solutions in the cases: (i) α = 3, β = 6 ; (ii) α = 3, β = 2.