Given a square matrix A, it is often required to find scalar constants λ and vectors x such that
$$A\mathbf{x} = \lambda\mathbf{x} \tag{10.1}$$
is satisfied. This equation only has non-trivial solutions x ≠ 0 for particular values of λ. These values are called eigenvalues and the corresponding vectors x are called eigenvectors. In physical applications the eigenvalues often correspond to the allowed values of observable quantities. In what follows, we shall first consider the solutions of (10.1) in general, before specialising to Hermitian matrices, which are the most important in physical applications. We then show how knowledge of the eigenvalues can be used to transform the matrix A to diagonal form, with applications to the theory of small vibrations and geometry.
The eigenvalue equation (10.1) may be written in the form
$$(A - \lambda I)\,\mathbf{x} = \mathbf{0}. \tag{10.2}$$
This is a set of homogeneous linear simultaneous equations in the components xi (i = 1, 2, …, n) of the type discussed in Section 9.1.2 and has non-trivial solutions if, and only if,
$$\det(A - \lambda I) = 0, \tag{10.3}$$
which is called the characteristic equation of the matrix A. The determinant is given by
where
is a polynomial f(λ) in λ of degree n, called the characteristic polynomial, whose coefficients αi (i = 1, 2, …, n) depend on the matrix elements aij. Solving (10.3) is equivalent to finding the roots of this polynomial. In general, any polynomial of degree n has n roots when complex values are allowed, so (10.4b) may be written in the form
$$f(\lambda) = (\lambda_1 - \lambda)(\lambda_2 - \lambda)\cdots(\lambda_n - \lambda), \tag{10.4c}$$
and thus (10.3) gives rise to n eigenvalues λi (i = 1, 2, …, n). However, not all these eigenvalues are necessarily distinct, that is, two or more may have the same numerical value.
Once the eigenvalues have been determined, each value of λ = λi may be substituted into (10.2). In each case this yields a set of n simultaneous homogeneous linear equations in the components [x(i)]j of the corresponding eigenvector x(i), which may be solved by the methods discussed in Section 9.1.2, as we shall shortly illustrate. However, this does not uniquely determine the eigenvectors, because if x is a solution of (10.2), then so is αx, where α is any constant. We will usually exploit this to choose normalised eigenvectors x of unit modulus, that is, such that (x, x) = |x|² = 1.
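The procedure just described can be sketched for the simplest case of a 2 × 2 matrix, where the characteristic equation is the quadratic λ² − (Tr A)λ + det A = 0. This is a minimal illustration and not a general-purpose solver: the function names `eig_2x2` and `eigenvector_2x2`, the tolerance `tol`, and the test matrix are assumptions made for the example.

```python
import cmath

def eigenvector_2x2(a11, a12, a21, a22, lam, tol=1e-12):
    """Unit-modulus solution x of (A - lam*I) x = 0 for a 2x2 matrix A."""
    x = (a12, lam - a11)              # solves the first row of (A - lam*I) x = 0
    if abs(x[0]) < tol and abs(x[1]) < tol:
        x = (lam - a22, a21)          # first row vanishes: use the second row
    if abs(x[0]) < tol and abs(x[1]) < tol:
        x = (1.0, 0.0)                # A - lam*I = 0: every vector is an eigenvector
    norm = (abs(x[0]) ** 2 + abs(x[1]) ** 2) ** 0.5
    return (x[0] / norm, x[1] / norm)

def eig_2x2(a11, a12, a21, a22):
    """Eigenvalues (roots of the characteristic equation) and unit eigenvectors."""
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    # Characteristic equation: lam**2 - trace*lam + det = 0
    root = cmath.sqrt(trace * trace - 4 * det)
    eigenvalues = [(trace + root) / 2, (trace - root) / 2]
    eigenvectors = [eigenvector_2x2(a11, a12, a21, a22, lam) for lam in eigenvalues]
    return eigenvalues, eigenvectors

# Illustrative matrix (an assumption of this sketch): A = [[2, 1], [1, 2]]
eigenvalues, eigenvectors = eig_2x2(2, 1, 1, 2)
for lam, x in zip(eigenvalues, eigenvectors):
    print(lam, x)
```

Because `cmath.sqrt` is used, the same sketch also handles real matrices whose eigenvalues are complex, such as a rotation matrix. Each eigenvector is fixed only up to a constant factor, so the sign or phase returned here is one choice among many.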
In this section we will derive some useful properties of eigenvalues that follow directly from (10.3).
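Firstly, if A is singular, that is, det A = 0, then it follows from (10.3) that it has an eigenvalue λ = 0; conversely, if A has an eigenvalue λ = 0, then it is singular. Secondly, it follows from (10.4a) and (10.4c) that
$$\det(A - \lambda I) = (\lambda_1 - \lambda)(\lambda_2 - \lambda)\cdots(\lambda_n - \lambda). \tag{10.5}$$
Setting λ = 0 then gives
$$\det A = \lambda_1\lambda_2\cdots\lambda_n, \tag{10.6}$$
that is, the determinant of any matrix is equal to the product of its eigenvalues. Similarly, as we shall show, the sum of the eigenvalues is given by
$$\sum_{i=1}^{n}\lambda_i = \mathrm{Tr}\,A, \tag{10.7}$$
where the trace is
$$\mathrm{Tr}\,A \equiv \sum_{i=1}^{n} a_{ii}. \tag{10.8}$$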
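Together with (10.6), Equation (10.7) is very useful in checking that the eigenvalues of a given matrix have been computed correctly. It is proved by computing the coefficient of λⁿ⁻¹ in (10.4b) using (10.4a) and (10.4c) in turn, and comparing the results. In (10.4a), the co-factors of a12, a13, …, a1n are polynomials in λ of order n − 2. Hence terms of order λⁿ⁻¹ can only occur in the product of the diagonal elements in (10.4a), giving
$$(a_{11} - \lambda)(a_{22} - \lambda)\cdots(a_{nn} - \lambda) = (-\lambda)^n + (-\lambda)^{n-1}\sum_{i=1}^{n} a_{ii} + \cdots$$
On the other hand, expanding (10.4c) gives
$$(\lambda_1 - \lambda)(\lambda_2 - \lambda)\cdots(\lambda_n - \lambda) = (-\lambda)^n + (-\lambda)^{n-1}\sum_{i=1}^{n}\lambda_i + \cdots$$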
and comparing the two expressions yields the desired result.
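As a quick check of the determinant and trace relations, consider the following worked case. The 2 × 2 matrix is chosen purely for illustration and is not taken from the text.

```latex
\begin{aligned}
A &= \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
\det(A - \lambda I) = (2 - \lambda)^2 - 1 = (1 - \lambda)(3 - \lambda), \\
\lambda_1 &= 1, \quad \lambda_2 = 3, \\
\lambda_1 \lambda_2 &= 3 = \det A = 2 \cdot 2 - 1 \cdot 1, \qquad
\lambda_1 + \lambda_2 = 4 = \mathrm{Tr}\,A = 2 + 2 .
\end{aligned}
```

Both the product and the sum agree, as they must if the eigenvalues have been computed correctly.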
Finally, suppose that an n × n matrix A has k ≤ n distinct eigenvalues λ1, λ2, …, λk, that is, λi ≠ λj for i ≠ j and i, j ≤ k. Then the following related matrices also have a total of k distinct eigenvalues, as specified below.
Here we will prove (iii) and leave the others as exercises for the reader. Since λi is an eigenvalue of A,
$$\det(A - \lambda_i I) = 0,$$
which by (9.59b) implies
$$\det\left[(A - \lambda_i I)^\dagger\right] = 0.$$
From (9.33) and (9.55), we have
$$(A - \lambda_i I)^\dagger = A^\dagger - \lambda_i^* I,$$
so that
$$\det(A^\dagger - \lambda_i^* I) = 0,$$
and hence λ*i is an eigenvalue of A† for all i = 1, 2, …, k. That they are the only distinct eigenvalues of A†, even if k < n, follows by using the argument in reverse. Suppose A† had an extra eigenvalue λ′ ≠ λ*i, i = 1, 2, …, k. Then since (A†)† = A, this would imply that A had a distinct eigenvalue λ′* ≠ λi, i = 1, 2, …, k, in contradiction to the requirement that k is the total number of distinct eigenvalues of A.
If x(i) (i = 1, 2, …, k) is a set of eigenvectors corresponding to k different eigenvalues λi (i = 1, 2, …, k), then the x(i) are linearly independent. That is, there is no linear relationship of the type
$$c_1\mathbf{x}^{(1)} + c_2\mathbf{x}^{(2)} + \cdots + c_k\mathbf{x}^{(k)} = \mathbf{0}, \tag{10.9}$$
where the ci are constants, except the trivial case ci = 0 for i = 1, 2, …, k. The proof is as follows.
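Since Ax(i) = λix(i) (i = 1, 2, …, k),
$$(A - \lambda_j I)\,\mathbf{x}^{(i)} = (\lambda_i - \lambda_j)\,\mathbf{x}^{(i)}. \tag{10.10}$$
Suppose now that a condition of the form (10.9) does exist and we operate on it by (A − λjI), with the result
$$(A - \lambda_j I)\sum_{i=1}^{k} c_i\,\mathbf{x}^{(i)} = \mathbf{0}. \tag{10.11}$$
For j = 2, using (10.10) and (10.11) gives
$$c_1(\lambda_1 - \lambda_2)\,\mathbf{x}^{(1)} + c_3(\lambda_3 - \lambda_2)\,\mathbf{x}^{(3)} + \cdots + c_k(\lambda_k - \lambda_2)\,\mathbf{x}^{(k)} = \mathbf{0}, \tag{10.12}$$
where the term in x(2) is absent. If this operation is now repeated on (10.12) using j = 3, an additional bracket (λi − λ3) multiplying the ith term will be generated and the term in x(3) will be eliminated. Repeating the operation successively for the remaining values of j eventually yields the result
$$c_1(\lambda_1 - \lambda_2)(\lambda_1 - \lambda_3)\cdots(\lambda_1 - \lambda_k)\,\mathbf{x}^{(1)} = \mathbf{0},$$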
and since all the λi are assumed to be different, this implies that c1 = 0. The same method can be used to show that c2 = 0, and so on. Hence if all the values of λi are different, only the trivial solution ci = 0 (i = 1, 2, 3, …, k) exists, and so the eigenvectors are linearly independent.
We next consider the implications of this for an n × n matrix A. If all the eigenvalues λi (i = 1, 2, …, n) are distinct, then k = n above and there are n linearly independent eigenvectors x(1), x(2), …, x(n). Since an n-dimensional space cannot contain more than n linearly independent vectors, the eigenvectors form a complete set of linearly independent vectors, as defined in Section 9.2.1. Hence an arbitrary vector x can always be written as a sum of eigenvectors of the form
$$\mathbf{x} = \sum_{i=1}^{n}\alpha_i\,\mathbf{x}^{(i)}, \tag{10.13}$$
where the numerical constants αi depend on x.
It remains to consider the case where k < n, that is, when there are fewer than n distinct eigenvalues. To illustrate this, suppose the characteristic polynomial is of the form
$$f(\lambda) = (\lambda_1 - \lambda)(\lambda_2 - \lambda)\cdots(\lambda_{n-2} - \lambda)(\lambda_{n-1} - \lambda)^2,$$
so that there are k = n − 1 distinct eigenvalues. Nonetheless, one can usually find two linearly independent eigenvectors x(n − 1), x(n) that both have eigenvalue λn−1. Hence there are still n linearly independent eigenvectors, and an arbitrary vector x can still be expanded in the form (10.13). However, sometimes, as we shall illustrate by an example below, there is only a single eigenvector x(n − 1) corresponding to λn−1. Hence there are only n − 1 linearly independent eigenvectors. Matrices like these, which have fewer independent eigenvectors than the dimension of the matrix, are called defective matrices. For such matrices, an arbitrary vector in the n-dimensional space cannot be expanded in terms of its eigenvectors.
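A minimal worked case of a defective matrix is sketched here. The 2 × 2 matrix is an assumed illustration, not necessarily the example the text refers to.

```latex
\begin{aligned}
A &= \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad
\det(A - \lambda I) = (1 - \lambda)^2 = 0
\;\Rightarrow\; \lambda_1 = \lambda_2 = 1, \\
(A - I)\,\mathbf{x} &= \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= \begin{pmatrix} x_2 \\ 0 \end{pmatrix} = \mathbf{0}
\;\Rightarrow\; x_2 = 0, \qquad
\mathbf{x} = \alpha \begin{pmatrix} 1 \\ 0 \end{pmatrix}.
\end{aligned}
```

Every eigenvector is a multiple of (1, 0)ᵀ, so there is only one linearly independent eigenvector in a two-dimensional space. By contrast, the unit matrix I has the same characteristic polynomial (1 − λ)² but is not defective, since every non-zero vector is an eigenvector.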
In most physical applications, and especially in quantum mechanics, the eigenvalues and eigenvectors of interest are those of Hermitian matrices. This is because the eigenvalues are real and so can correspond to measurable quantities. In addition, the eigenvectors corresponding to different eigenvalues are not only linearly independent, but also orthogonal. In particular, these results apply to real, symmetric matrices, which are automatically Hermitian.
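To prove these properties, consider a Hermitian matrix A and an eigenvector a, corresponding to an eigenvalue λa, so that
$$A\mathbf{a} = \lambda_a\mathbf{a}. \tag{10.14a}$$
Taking the Hermitian conjugate, we obtain
$$\mathbf{a}^\dagger A = \lambda_a^*\,\mathbf{a}^\dagger, \tag{10.14b}$$
where we have used A = A† and the relation
$$(A\mathbf{a})^\dagger = \mathbf{a}^\dagger A^\dagger,$$
which follows from (9.33) and (9.53). Then multiplying (10.14a) on the left by a† and (10.14b) on the right by a, we obtain
$$\mathbf{a}^\dagger A\,\mathbf{a} = \lambda_a\,(\mathbf{a}, \mathbf{a})$$
and
$$\mathbf{a}^\dagger A\,\mathbf{a} = \lambda_a^*\,(\mathbf{a}, \mathbf{a}).$$
Since (a, a) ≠ 0, these equations can only be satisfied if λa = λ*a, that is, the eigenvalue is real, as required.
Next we consider a second eigenvector b satisfying
$$A\mathbf{b} = \lambda_b\mathbf{b}, \qquad \lambda_b \neq \lambda_a. \tag{10.14c}$$
On multiplying (10.14c) on the left by a† and (10.14b) on the right by b, we obtain
$$\mathbf{a}^\dagger A\,\mathbf{b} = \lambda_b\,(\mathbf{a}, \mathbf{b})$$
and
$$\mathbf{a}^\dagger A\,\mathbf{b} = \lambda_a\,(\mathbf{a}, \mathbf{b}),$$
where in the second equation we have used the result λ* = λ proved above. Since λa ≠ λb, these two equations are only compatible if
$$(\mathbf{a}, \mathbf{b}) = \mathbf{a}^\dagger\mathbf{b} = 0, \tag{10.15}$$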
that is, the eigenvectors are orthogonal.
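Both properties can be checked numerically on a small complex example. The sketch below uses only built-in complex arithmetic; the 2 × 2 Hermitian matrix, its hand-computed eigenpairs and the helper names `inner` and `matvec` are assumptions chosen for illustration.

```python
def inner(a, b):
    """Scalar product (a, b) = a-dagger b for vectors given as lists."""
    return sum(complex(x).conjugate() * y for x, y in zip(a, b))

def matvec(A, x):
    """Matrix-vector product A x for a matrix given as a list of rows."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

# Illustrative Hermitian matrix: it equals its own conjugate transpose.
A = [[2, 1 - 1j],
     [1 + 1j, 3]]

# Eigenpairs found by hand from lam**2 - 5*lam + 4 = 0 (real roots 1 and 4).
lam_a, a = 1, [-(1 - 1j), 1]
lam_b, b = 4, [1 - 1j, 2]

for lam, v in ((lam_a, a), (lam_b, b)):
    residual = [Av - lam * vi for Av, vi in zip(matvec(A, v), v)]
    print("eigenvalue", lam, "residual A v - lam v =", residual)

print("(a, b) =", inner(a, b))   # vanishes for distinct eigenvalues
```

The eigenvectors here are not normalised; dividing each by its modulus would give an orthonormal pair.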
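An n × n Hermitian matrix A always has n linearly independent eigenvectors x(i). Hence an arbitrary n-dimensional vector can always be expanded in the form (10.13), that is,
$$\mathbf{x} = \sum_{i=1}^{n}\alpha_i\,\mathbf{x}^{(i)}, \tag{10.16a}$$
where
$$A\mathbf{x}^{(i)} = \lambda_i\,\mathbf{x}^{(i)} \tag{10.16b}$$
and we have chosen unit eigenvectors, (x(i), x(i)) = 1. If the eigenvalues λi are all different, then the eigenvectors are orthonormal, that is,
$$(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}) = \mathbf{x}^{(i)\dagger}\mathbf{x}^{(j)} = \delta_{ij}, \tag{10.17a}$$
where δij is the Kronecker delta symbol defined in (9.24b). Multiplying (10.16a) on the left by x(j)† and using (10.17a) then gives
$$\alpha_j = \mathbf{x}^{(j)\dagger}\mathbf{x} = (\mathbf{x}^{(j)}, \mathbf{x}) \tag{10.17b}$$
for the coefficients αj.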
Equations (10.16) and (10.17) are very convenient in applications, but are only automatically valid if the eigenvalues λi are all different. If this is not so, the eigenvectors (10.16b) are not uniquely defined. However, one may always choose a complete set of linearly independent eigenvectors (10.16a) and (10.16b) that do satisfy (10.17a) and (10.17b). To see this, let us suppose there are k linearly independent eigenvectors u(1), u(2), …, u(k) corresponding to a given eigenvalue λ, that is,
$$A\mathbf{u}^{(i)} = \lambda\,\mathbf{u}^{(i)}, \qquad i = 1, 2, \ldots, k.$$
Then the eigenvalue is said to be k-fold degenerate and any linear combination of the form
$$\mathbf{x} = \sum_{i=1}^{k}\alpha_i\,\mathbf{u}^{(i)}, \tag{10.18}$$
where the αi are arbitrary constants, is also an eigenvector. In particular, it is possible to choose a sequence of eigenvectors
$$\mathbf{x}^{(1)} = \mathbf{u}^{(1)}, \qquad \mathbf{x}^{(i)} = \mathbf{u}^{(i)} - \sum_{j<i}\frac{(\mathbf{x}^{(j)}, \mathbf{u}^{(i)})}{(\mathbf{x}^{(j)}, \mathbf{x}^{(j)})}\,\mathbf{x}^{(j)}, \tag{10.19a}$$
in which each x(i), i ≤ k, is chosen to be orthogonal to all x(j) with j < i. These can then be normalised:
$$\mathbf{x}^{(i)} \to \mathbf{x}^{(i)} / |\mathbf{x}^{(i)}|. \tag{10.19b}$$
This procedure is called Gram-Schmidt orthogonalisation, and the resulting eigenvectors x(i) satisfy (10.17a), as required. They are, however, not unique and other choices of linearly independent eigenvectors satisfying (10.17a) are also possible.
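The orthogonalisation procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `gram_schmidt`, the tolerance and the three test vectors are all hypothetical, and each vector is normalised as soon as it is constructed rather than in a separate final step, which gives the same orthonormal set.

```python
def inner(a, b):
    """Scalar product (a, b) = a-dagger b for vectors given as lists."""
    return sum(complex(x).conjugate() * y for x, y in zip(a, b))

def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal list built from linearly independent vectors."""
    basis = []
    for u in vectors:
        x = list(u)
        for e in basis:
            # Remove the component of x along the unit vector e found earlier.
            coeff = inner(e, x)
            x = [xi - coeff * ei for xi, ei in zip(x, e)]
        norm = abs(inner(x, x)) ** 0.5
        if norm < tol:
            raise ValueError("input vectors are not linearly independent")
        basis.append([xi / norm for xi in x])
    return basis

# Illustrative input: three linearly independent, non-orthogonal vectors.
u_vectors = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
x_vectors = gram_schmidt(u_vectors)
for x in x_vectors:
    print(x)
```

Presenting the input vectors in a different order generally produces a different orthonormal set, which illustrates the non-uniqueness noted above.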
In Section 9.2.1, we emphasised that the components of a vector depend on the choice of basis vectors. To find the corresponding dependence of a linear operator A, we first note that (9.21b) can be written in the matrix form a = Pa′. Re-labelling the vector a as x for convenience, this becomes
$$\mathbf{x} = P\mathbf{x}' \tag{10.20}$$
on transforming from the primed to unprimed basis. Furthermore, if we write the reverse transformation in the form x′ = P′x, then we have
$$\mathbf{x} = PP'\mathbf{x},$$
and since this must hold for any vector x, we must have P′ = P⁻¹ and hence
$$\mathbf{x}' = P^{-1}\mathbf{x}. \tag{10.21}$$
The corresponding transformation for a matrix A is then obtained by applying (10.21) to a vector y = Ax and using (10.20) to give
$$\mathbf{y}' = P^{-1}\mathbf{y} = P^{-1}A\mathbf{x} = P^{-1}AP\,\mathbf{x}' \equiv A'\mathbf{x}',$$
where
$$A' = P^{-1}AP. \tag{10.22}$$
Equations of the type (10.22) are called similarity transformations and two matrices A and A′ related in this way are said to be similar. In geometrical problems we know that a suitable choice of co-ordinates can often simplify calculations and likewise problems involving linear transformations can often be simplified by a judicious choice of basis. In particular, any n × n matrix with n linearly independent eigenvectors can be transformed to diagonal form by means of a similarity transformation. To see this, set
$$P = \left(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(n)}\right), \tag{10.23a}$$
i.e. the columns of P are the eigenvectors of A. Then from (10.22),
$$A' = P^{-1}AP = P^{-1}\left(\lambda_1\mathbf{x}^{(1)}, \lambda_2\mathbf{x}^{(2)}, \ldots, \lambda_n\mathbf{x}^{(n)}\right) = P^{-1}P\,\mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n).$$
The matrix A′ is thus diagonal with elements that are the eigenvalues of A, that is,
$$A' = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n). \tag{10.23b}$$
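Using this expression, together with (9.60b) and (10.8), it follows that
$$\det A = \det A' = \lambda_1\lambda_2\cdots\lambda_n, \qquad \mathrm{Tr}\,A = \mathrm{Tr}\,A' = \sum_{i=1}^{n}\lambda_i,$$
in accordance with (10.6) and (10.7). In addition, with this transformation, the basis vectors with respect to which A′ is defined are just the eigenvectors, since
$$\mathbf{x}'^{(1)} = P^{-1}\mathbf{x}^{(1)} = (1, 0, \ldots, 0)^{\mathrm T},$$
i.e. x′(1) = e(1) and so on.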
Finally, we note that for Hermitian operators A, and some other types of matrices, the eigenvectors can always be chosen to be an orthonormal set. We then have
$$(P^\dagger P)_{ij} = \mathbf{x}^{(i)\dagger}\mathbf{x}^{(j)} = \delta_{ij}.$$
Hence P is unitary, that is, P⁻¹ = P† and so the original matrix can be diagonalised by
$$A' = P^\dagger AP, \tag{10.24}$$
which is easier to evaluate.
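The unitary form of the transformation can be illustrated on a real symmetric 2 × 2 matrix, for which P† reduces to the transpose Pᵀ. The matrix, its hand-computed orthonormal eigenvectors and the helper names `transpose` and `matmul` are assumptions of this sketch.

```python
import math

def transpose(M):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    """Matrix product A B for matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Illustrative real symmetric matrix with eigenvalues 3 and 1.
A = [[2.0, 1.0],
     [1.0, 2.0]]

# Columns of P are the unit eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
s = 1.0 / math.sqrt(2.0)
P = [[s, s],
     [s, -s]]

# P is real and orthogonal, so its inverse is its transpose.
A_prime = matmul(transpose(P), matmul(A, P))
for row in A_prime:
    print(row)
```

Up to rounding error, `A_prime` is diagonal with the eigenvalues appearing in the same order as the corresponding columns of P, and no matrix inverse had to be computed.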
In physical applications, diagonalisation of a matrix often enables one to choose a set of variables that decouple from each other. A typical application in mechanics is that of coupled oscillations. An example is given in Figure 10.1. This shows two equal masses m that are joined by a spring and suspended from fixed points by strings of equal length l. We will analyse the motion of the system when the weights are displaced small distances from their equilibrium positions, as shown.
If the instantaneous displacements are x1 and x2, then the force due to the spring pulling the two masses together is mk(x2 − x1), where mk is the spring constant. The tension Ti in the string produces a horizontal restoring force of magnitude mgxi/l, for small displacements, and so the equations of motion of the system are
$$m\ddot{x}_1 = -\frac{mg}{l}\,x_1 + mk\,(x_2 - x_1) \tag{10.25a}$$
and
$$m\ddot{x}_2 = -\frac{mg}{l}\,x_2 - mk\,(x_2 - x_1). \tag{10.25b}$$
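These coupled equations may be written in the matrix form
$$\ddot{\mathbf{x}} = A\mathbf{x}, \tag{10.26a}$$
where
$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \qquad A = \begin{pmatrix} -(g/l + k) & k \\ k & -(g/l + k) \end{pmatrix}. \tag{10.26b}$$
We now look for a transformation P such that
$$\mathbf{x} = P\mathbf{x}'$$
and
$$P^{-1}AP = A' = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}.$$
Since P is independent of t, the equations of motion become
$$\ddot{\mathbf{x}}' = A'\mathbf{x}',$$
so that in terms of x′1 and x′2, the equations of motion decouple:
$$\ddot{x}'_1 = \lambda_1 x'_1, \qquad \ddot{x}'_2 = \lambda_2 x'_2. \tag{10.27}$$
The eigenvalues are obtained using the characteristic equation
$$\det(A - \lambda I) = (g/l + k + \lambda)^2 - k^2 = 0,$$
that is
$$\lambda_1 = -g/l, \qquad \lambda_2 = -(g/l + 2k).$$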
The solutions of the equations of motion (10.27) are then
$$x'_1 = a_1\cos\omega_1 t + b_1\sin\omega_1 t \tag{10.28a}$$
and
$$x'_2 = a_2\cos\omega_2 t + b_2\sin\omega_2 t, \tag{10.28b}$$
where ω1 = (g/l)^{1/2}, ω2 = (g/l + 2k)^{1/2}, and where a1, b1, a2, b2 are arbitrary constants. If the latter are chosen such that x′2 = 0 (or x′1 = 0), the system vibrates with a single frequency ω1 (or ω2) and the motion is called a normal mode of the system. In general the actual motion will be a linear combination of its normal modes.
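To express the motion (10.28) in terms of the original variables x1, x2, we need to find the matrix P. To do this, we first have to find the eigenvectors u(1) and u(2). Using the techniques discussed previously, we find the two eigenvectors
$$\mathbf{u}^{(1)} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \mathbf{u}^{(2)} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \end{pmatrix}.$$
Thus, from x = P x′,
$$x_1 = \frac{1}{\sqrt{2}}\,(x'_1 + x'_2), \qquad x_2 = \frac{1}{\sqrt{2}}\,(x'_1 - x'_2),$$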
which, together with (10.28), completes the matrix analysis of the solution. Specific motions depend on the values of the constants a1, b1, a2, b2, as shown in Example 10.6 below.
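The normal-mode solution can be sketched numerically as follows. The sketch assumes the equations of motion implied by the forces described above, namely ẍ1 = −(g/l)x1 + k(x2 − x1) and ẍ2 = −(g/l)x2 − k(x2 − x1), and takes the normal co-ordinates to be (x1 ± x2)/√2 with squared frequencies g/l and g/l + 2k. The function name `coupled_pendulums` and the numerical parameter values are hypothetical.

```python
import math

def coupled_pendulums(g_over_l, k, x0, v0):
    """Return a function t -> (x1, x2) for given initial positions and velocities."""
    w1 = math.sqrt(g_over_l)            # in-phase mode: the spring is not stretched
    w2 = math.sqrt(g_over_l + 2.0 * k)  # out-of-phase mode
    s = 1.0 / math.sqrt(2.0)
    # Normal co-ordinates at t = 0 fix the constants a_i and b_i.
    a1 = s * (x0[0] + x0[1])
    a2 = s * (x0[0] - x0[1])
    b1 = s * (v0[0] + v0[1]) / w1
    b2 = s * (v0[0] - v0[1]) / w2

    def position(t):
        q1 = a1 * math.cos(w1 * t) + b1 * math.sin(w1 * t)
        q2 = a2 * math.cos(w2 * t) + b2 * math.sin(w2 * t)
        # Transform back from normal co-ordinates to the displacements.
        return (s * (q1 + q2), s * (q1 - q2))

    return position

# Illustrative values: g/l = 9.81 s^-2, k = 2.0 s^-2, first mass displaced, both at rest.
position = coupled_pendulums(9.81, 2.0, x0=(1.0, 0.0), v0=(0.0, 0.0))
for t in (0.0, 0.5, 1.0):
    print(t, position(t))
```

Starting with equal displacements and zero velocities excites only the in-phase mode, while equal and opposite displacements excite only the out-of-phase mode. A general initial condition, such as the one used here, gives a superposition of the two.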
Finally, we note that coupled oscillations occur in a wide variety of contexts in physical science, which include compound pendulums, electrical circuits and infra-red spectroscopy. Provided the oscillations are small, as in the example above, they are always described by equations of the form (10.26a), where A can in general be a real n × n matrix with n ≥ 2. As in the example, these are solved by diagonalising the matrix to obtain a set of n decoupled equations analogous to (10.27), with solutions of the form (10.28) for each of the new variables. Further examples, from classical mechanics, are explored in the problems at the end of this chapter.
Another example of matrix diagonalisation occurs in the theory of quadratic forms. These are expressions of the type
$$Q = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}\,x_i x_j, \tag{10.30}$$
where the quantities xi and the coefficients aij are real. The latter form an n × n square matrix A, so (10.30) may be written
$$Q = \mathbf{x}^{\mathrm T}A\,\mathbf{x}, \tag{10.31}$$
where xT = (x1, x2, ⋯, xn). Furthermore, it can be seen from (10.30) that Q is the sum of terms of the form (aij + aji)xixj, which may be written (cijxixj + cjixjxi), where
$$c_{ij} = c_{ji} = \tfrac{1}{2}\,(a_{ij} + a_{ji}).$$
Hence the quadratic form (10.31) can always be written in the form
$$Q = \mathbf{x}^{\mathrm T}C\,\mathbf{x},$$
where C is a real symmetric matrix. Therefore, in considering the quadratic forms (10.30), we may, without loss of generality, consider only cases where A is a real symmetric matrix. If Q > 0 for all x ≠ 0, it is said to be positive definite.
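The symmetrisation step can be checked directly: replacing A by the symmetric matrix with elements (aij + aji)/2 leaves the value of the quadratic form unchanged. The function names `quadratic_form` and `symmetrise`, and the numerical values, are assumptions made for this sketch.

```python
def quadratic_form(A, x):
    """Q = sum over i and j of a_ij * x_i * x_j."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def symmetrise(A):
    """Symmetric matrix C with elements c_ij = (a_ij + a_ji) / 2."""
    n = len(A)
    return [[(A[i][j] + A[j][i]) / 2 for j in range(n)] for i in range(n)]

# Illustrative non-symmetric coefficient matrix and test vector.
A = [[1, 4],
     [0, 3]]
x = [2, -1]

C = symmetrise(A)
print(C)
print(quadratic_form(A, x), quadratic_form(C, x))   # the two values agree
```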
One application of quadratic forms is in analytic geometry. For example, suppose a surface in three-dimensional space is described by the equation
where x, y, z are Cartesian co-ordinates and k is a constant. Because of the cross terms in xy, etc., the geometrical nature of the surface is not obvious. Its visualisation would be simpler if the surface could be expressed in co-ordinates such that the cross terms were absent. This may be done by using the technique of diagonalisation. We start by writing (10.32) in the matrix form
$$\mathbf{x}^{\mathrm T}A\,\mathbf{x} = k, \tag{10.33}$$
where x = (x, y, z)T and A is a real symmetric matrix. Since A is Hermitian it can be diagonalised by a unitary matrix P, where P⁻¹ = P†; and since A is also real, P can be chosen to be a real orthogonal matrix, with P⁻¹ = PT, so that
$$P^{\mathrm T}AP = A' = \mathrm{diag}(\lambda_1, \lambda_2, \lambda_3),$$
where λi (i = 1, 2, 3) are the eigenvalues of A. Given P, we can define new co-ordinates x′ = (x′, y′, z′)T in terms of which (10.32) becomes simpler. The equation for the surface in these new co-ordinates may be found by writing
$$\mathbf{x}^{\mathrm T}A\,\mathbf{x} = \mathbf{x}^{\mathrm T}P\,(P^{\mathrm T}AP)\,P^{\mathrm T}\mathbf{x} = \mathbf{x}'^{\mathrm T}A'\,\mathbf{x}',$$
so that (10.33) becomes
$$\mathbf{x}'^{\mathrm T}A'\,\mathbf{x}' = k, \tag{10.34}$$
where x′ = PTx = (x′, y′, z′)T. Writing this in terms of the new Cartesian co-ordinates gives
$$\frac{x'^2}{k/\lambda_1} + \frac{y'^2}{k/\lambda_2} + \frac{z'^2}{k/\lambda_3} = 1, \tag{10.35}$$
which is the equation of the quadratic surface referred to new co-ordinate axes x′, y′, z′, called the principal axes, whose directions are defined by the eigenvectors of A. They are related to the original axes x, y, z by rotations about, and possibly a reflection in, the origin.
The geometrical interpretation depends on the signs of the denominators in (10.35). If all three are positive, then (10.35) describes an ellipsoid, as shown in Figure 10.3. In this case the principal axis x′, for example, cuts the quadratic surface where y′ = z′ = 0, which from (10.35) is where x′ = ±(k/λ1)^{1/2}. Thus the distance along the x′ axis from the origin to the point of intersection is a = (k/λ1)^{1/2}. This is called the length of the semi-axis. The lengths of the other semi-axes are similarly given by b = (k/λ2)^{1/2} and c = (k/λ3)^{1/2}, as shown in Figure 10.3.
If all three denominators are different, then the ellipsoid is said to be triaxial. More familiar shapes are obtained when two of the denominators are equal. For example, if a = b > c, the ellipsoid reduces to an oblate spheroid, as shown in Figure 10.4b; while if a = b < c, it reduces to a prolate spheroid, as shown in Figure 10.4a. A familiar example of the former is the shape of the Earth, which is to a good approximation an oblate spheroid; while a rugby (or American) football is roughly a prolate spheroid. If a = b = c, the spheroid reduces to a sphere.
Finally, if one of the denominators in (10.35) is negative, the shape is a hyperboloid of one sheet, while if two are negative, it corresponds to a hyperboloid of two sheets, as shown in Figure 10.5a and Figure 10.5b, respectively. Examples of the former are the large cooling towers seen at power stations.
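The classification by the signs of the denominators k/λi can be summarised in a short function. This is a sketch only: the function name `classify_quadric`, its return strings, and the treatment of the cases the text does not discuss (a zero eigenvalue, k = 0, or three negative denominators) are assumptions.

```python
import math

def classify_quadric(eigenvalues, k=1.0):
    """Classify the surface x'^2/(k/l1) + y'^2/(k/l2) + z'^2/(k/l3) = 1."""
    if k == 0 or any(lam == 0 for lam in eigenvalues):
        raise ValueError("k = 0 or a zero eigenvalue is not covered by this sketch")
    denominators = [k / lam for lam in eigenvalues]
    negatives = sum(1 for d in denominators if d < 0)
    if negatives == 1:
        return "hyperboloid of one sheet"
    if negatives == 2:
        return "hyperboloid of two sheets"
    if negatives == 3:
        return "no real surface"
    # All denominators positive: an ellipsoid with these semi-axis lengths.
    small, middle, large = sorted(math.sqrt(d) for d in denominators)
    if math.isclose(small, large):
        return "sphere"
    if math.isclose(middle, large):
        return "oblate spheroid"     # two equal semi-axes longer than the third
    if math.isclose(small, middle):
        return "prolate spheroid"    # two equal semi-axes shorter than the third
    return "triaxial ellipsoid"

for lams in ([1, 2, 3], [1, 1, 4], [4, 4, 1], [1, 2, -3], [1, -2, -3]):
    print(lams, classify_quadric(lams))
```

The eigenvalues themselves must be supplied, for example from a hand calculation or a numerical library; only the classification step is shown here.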
10.1 Given that one of the eigenvalues of the matrix
is λ = 3, find the other two eigenvalues, and hence the associated eigenvectors. Are the eigenvectors orthogonal?
10.2 Verify that the sum of the eigenvalues of the matrix
is equal to its trace and that their product is equal to det A.
10.3 Verify that the eigenvalues of the matrix
are the inverses of the eigenvalues of A⁻¹.
10.4 If A is an n × n matrix with eigenvalues λi (i = 1, 2, …, n), show that the transpose matrix AT also has eigenvalues λi, and that the inverse matrix A⁻¹, if it exists, has eigenvalues λi⁻¹.
10.6 Find the linearly independent eigenvectors of the matrix
Is the matrix defective?
10.7 Show that the eigenvalues of an anti-Hermitian matrix A† = −A are purely imaginary, and that the eigenvectors corresponding to distinct eigenvalues are orthogonal.
Find the eigenvalues and eigenvectors of the matrix
Are the eigenvectors orthogonal?
Verify that the eigenvectors of the Hermitian matrix
are orthogonal.
10.9 Confirm, by explicit calculation, that the eigenvalues of the real, symmetric matrix
are real, and its eigenvectors are orthogonal.
10.10 Use the Gram–Schmidt orthogonalisation process of Section 10.1.3 to construct the orthonormalised vectors (i = 1, 2, 3) corresponding to the vectors
10.11 Source a computer matrix-manipulation application on the internet (there are several free ones) and use it to find the determinant, the inverse, the eigenvalues and the eigenvectors of the matrix
*10.12 Find the matrix that diagonalises the matrix
Verify this result by finding the form of the resulting diagonal matrix.
*10.13 Consider three masses on the x-axis joined by springs that obey Hooke's law with a common spring constant k, as shown in Figure 10.7. If the three masses remain on the x-axis, find the normal modes, in which they all move with the same frequency. (This type of system provides a simple model of molecules like CO2, that is, carbon dioxide, where the three atoms are arranged linearly.)
A mass m, connected to two fixed points by identical stretched strings each of length l and with tension T, is displaced transversely from its equilibrium position by a distance y, as shown in Figure 10.8a. Assuming that for small displacements the change in the tension T can be neglected, show that
*10.15 Two masses, m and 3m, suspended from two springs with force constants 4k and k, respectively, are displaced downwards from their equilibrium positions by x1 and x2, as shown in Figure 10.9. If they are released from rest at x1 = 0, x2 = 1 at time t = 0, what will their positions be at time t = (m/k)^{1/2}?
*10.16 Consider the surface described by the equation
By writing this in the quadratic form xTAx = k, find the principal axes, and show that it is a two-sheet hyperboloid. What is the distance between the two sheets? Hint: One of the eigenvalues of A is λ1 = 18.
*10.17 Classify the surfaces described by the quadratic forms xTAx = k > 0, as ellipsoid or spheroid (specify which type in either case), when
*10.18 Show that the quadratic form
for any unit vector x, where λm is the smallest eigenvalue of A. Hence state the condition for Q to be positive definite (Q > 0) for all x, except for the null vector x = 0.
*10.19 Show that the curve described by the equation
is a hyperbola. Find the angle between the principal axes and the x and y axes, and sketch the hyperbola in the x–y plane. What are the x and y co-ordinates of the points at which the two branches are closest together?