III.64 Optimization and Lagrange Multipliers

Keith Ball


1 Optimization

Soon after being introduced to calculus, most students learn of its application to optimization: that is, to the problem of finding the largest or smallest value of a given differentiable function, which is usually referred to as the objective function. A very helpful observation is that if f is an objective function that is maximized or minimized at x, then the tangent to the graph at the point (x, f(x)) will be horizontal, since otherwise we could find some value x′ close to x for which f(x′) is larger (or, in the case of a minimum, smaller). This means that we can narrow down the search for the maximum and minimum values of f by looking just at the values of f(x) for which f′(x) = 0.
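To make this concrete, here is a minimal sketch in Python, assuming the sympy library is available and using a sample objective f(x) = x³ - 3x of our own choosing (not from the text):

```python
import sympy as sp

x = sp.symbols('x')
f = x**3 - 3*x                           # a sample objective, not from the text

critical = sp.solve(sp.diff(f, x), x)    # the points where f'(x) = 0
print(critical)                          # [-1, 1]
print([f.subs(x, c) for c in critical])  # [2, -2]: a local maximum and a local minimum
```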

Now suppose that we have an objective function of more than one variable, such as, for example, the function

F(x, y) = 2x + 10y - x² + 2xy - 3y².

The “graph” of F is obtained by plotting the values F(x, y) of F as heights above the corresponding points (x, y) of the plane, so now it is a surface instead of a curve. A smooth surface possesses not a tangent line at each point, but a tangent plane. If F has a maximum value, it will occur at a point where the tangent plane is horizontal.

The tangent plane at each point (x, y) is the graph of the linear function that best approximates F near (x, y). For small values of h and k, F(x + h, y + k) will be approximately equal to F(x, y) plus a function of the form

(h, k) ↦ ah + bk,

that is, F(x, y) plus a linear function of h and k. As explained in SOME FUNDAMENTAL MATHEMATICAL DEFINITIONS [I.3 §5.3], the derivative of F at (x, y) is this linear map. The map can be represented by the pair of numbers (a, b), which can in turn be thought of as a vector in ℝ². This derivative vector is usually called the gradient of the function F at the point (x, y) and is written ∇F(x, y). In vector notation (writing x for (x, y) and h for (h, k)), the approximation to F near (x, y) is

F(x + h) ≈ F(x) + h · ∇F(x).    (1)

Thus, ∇F points in the direction in which F increases most rapidly if you start at x, and the magnitude of ∇F is the slope of the “graph” of F in this direction.
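As an illustration, the following sketch (assuming numpy, and at an arbitrarily chosen base point) checks the approximation (1) numerically for the function F above; the formula for ∇F used here is the one derived in the next paragraph:

```python
import numpy as np

def F(x, y):
    return 2*x + 10*y - x**2 + 2*x*y - 3*y**2

def grad_F(x, y):
    # the partial derivatives computed in the next paragraph
    return np.array([2 - 2*x + 2*y, 10 + 2*x - 6*y])

x0, y0 = 1.0, 1.0                       # an arbitrary base point
h = np.array([1e-3, -2e-3])             # a small step (h, k)

exact = F(x0 + h[0], y0 + h[1])
linear = F(x0, y0) + grad_F(x0, y0) @ h
print(exact, linear)                    # agree to about 1e-5, the quadratic remainder
```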

The components a and b of the gradient can be calculated using partial differentiation. The number a tells us how quickly F(x, y) changes as we vary x while keeping y fixed: so to find a, we differentiate F(x, y) = 2x + 10y - x² + 2xy - 3y² with respect to x, treating y as a constant. In this case we get the partial derivative

∂F/∂x = 2 - 2x + 2y.

Similarly,

∂F/∂y = 10 + 2x - 6y.

Now, if we want to locate points where the tangent plane is horizontal, then we want to find the points at which the gradient is zero: that is, the points at which the vector (a, b) is the zero vector. So we solve the pair of simultaneous equations

   2 - 2x + 2y = 0,

10 + 2x - 6y = 0

to get x = 4, y = 3. Thus, the only candidate for the maximum is the point (4, 3), where F takes the value 19. It can be checked that 19 is indeed the maximum value of F.
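The same candidate point can be found mechanically; a minimal sketch, assuming the sympy library:

```python
import sympy as sp

x, y = sp.symbols('x y')
F = 2*x + 10*y - x**2 + 2*x*y - 3*y**2

# Solve the pair of simultaneous equations dF/dx = 0, dF/dy = 0.
sol = sp.solve([sp.diff(F, x), sp.diff(F, y)], [x, y])
print(sol)           # {x: 4, y: 3}
print(F.subs(sol))   # 19
```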

2 The Gradient and Contours

One of the most common ways of representing surfaces (landscapes on maps, for example) is by means of contour lines, or curves of constant height. In the xy-plane, we plot several curves of the form F(x, y) = V, for various “representative” values of V. For the function we considered earlier,

F(x, y) = 2x + 10y - x² + 2xy - 3y²,

the values 0, 8, 14, 18, 19 yield the contour plot shown in figure 1. The 14 contour, for example, contains all the points at which the surface has height 14. The figure indicates that this particular surface is an elliptical hump whose peak occurs at (4, 3) and has height 19.
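A plot along these lines can be reproduced with matplotlib; the plotting window below is our own choice, not from the text:

```python
import numpy as np
import matplotlib.pyplot as plt

# Sample F on a grid around the peak (window chosen by hand).
xs, ys = np.meshgrid(np.linspace(0, 8, 400), np.linspace(0, 6, 400))
zs = 2*xs + 10*ys - xs**2 + 2*xs*ys - 3*ys**2

cs = plt.contour(xs, ys, zs, levels=[0, 8, 14, 18, 19])
plt.clabel(cs)
plt.plot(4, 3, 'k+')   # the peak at (4, 3), height 19
plt.show()
```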

There is a simple geometrical relationship between the contour lines and the gradient vector. The vector equation (1) shows that the direction h in which F is instantaneously constant is the direction that makes the scalar product h · ∇F equal to 0: the direction perpendicular to ∇F. At each point, the gradient vector is therefore perpendicular to the contour through that point. This fact underlies the method of Lagrange multipliers that we shall discuss in the next section.
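The following sketch (assuming numpy, at an arbitrarily chosen point) checks this perpendicularity numerically: stepping at right angles to the gradient changes F only to second order, while stepping along the gradient changes it at full speed:

```python
import numpy as np

def F(x, y):
    return 2*x + 10*y - x**2 + 2*x*y - 3*y**2

def grad_F(x, y):
    return np.array([2 - 2*x + 2*y, 10 + 2*x - 6*y])

p = np.array([2.0, 2.0])           # an arbitrary point on some contour
g = grad_F(*p)
t = np.array([-g[1], g[0]])        # the gradient rotated by 90 degrees
t = t / np.linalg.norm(t)          # unit vector along the contour direction

eps = 1e-4
print(F(*(p + eps*t)) - F(*p))     # O(eps**2): along the contour, F barely moves
print(F(*(p + eps*g/np.linalg.norm(g))) - F(*p))   # about eps*|grad F|
```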

[Figure 1. A contour plot.]

[Figure 2. Constrained optimization.]

3 Constrained Optimization and Lagrange Multipliers

It often happens that we are interested in the maximum or minimum value of an objective function that depends upon several variables whose values are constrained to satisfy certain equations or inequalities. Consider, for example, the following problem.

Find the maximum value of

F(x, y) = 4y - x

over all pairs (x,y) satisfying the constraint

G(x, y) = x² - xy + y² - x + y - 4 = 0.    (2)

Figure 2 shows the curve in the xy-plane defined by G(x,y) = 0 (an ellipse), and also a number of contour lines of the function 4y - x. Our aim is to find the largest that 4y - x can be if (x, y) is a point on the curve. So we want to find the largest value of V for which the corresponding contour 4y - x = V contains a point on the curve. The value of V increases as the lines move up the diagram, and the uppermost line that touches the curve is the one labeled 4y - x = 7. So the maximum value we are looking for is 7, and it occurs at the point where the line 4y - x = 7 touches the curve. It is easy to check that this point is (1, 2).
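For comparison, a numerical solver reaches the same point. A minimal sketch assuming scipy; the starting point, and the expectation that the solver converges to the maximum from it, are our own assumptions:

```python
import numpy as np
from scipy.optimize import minimize

F = lambda v: 4*v[1] - v[0]
G = lambda v: v[0]**2 - v[0]*v[1] + v[1]**2 - v[0] + v[1] - 4

# Maximize F by minimizing -F, subject to the equality constraint G = 0.
res = minimize(lambda v: -F(v), x0=np.zeros(2),
               constraints=[{'type': 'eq', 'fun': G}])
print(res.x, F(res.x))   # approximately [1, 2] and 7
```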

How could we locate this point algebraically, rather than by drawing? The important thing to notice is that the optimizing line is tangent to the curve: the line and the curve are parallel at their common point. The line was chosen to be a contour of the function F. The curve is also a contour: the 0 contour of G. From the discussion in the previous section we know that these contours are perpendicular to the gradients of F and G, respectively (at the point in question). So the two gradient vectors are parallel to one another and are therefore multiples of one another: ∇F = λ∇G, say.

We thus have a way to hunt for solutions to the constrained optimization problem

maximize F(x, y)   subject to G(x, y) = 0.

We look for a point (x, y) and a number λ such that

∇F(x, y) = λ∇G(x, y),    G(x, y) = 0.    (3)

For our example (2), the gradient equation gives two partial derivative equations,

-1 = λ(2x - y - 1),      4 = λ(-x + 2y + 1),

from which we conclude that

x = (λ + 2)/(3λ),    y = (7 - λ)/(3λ).    (4)

Substituting these values into the equation G(x, y) = 0, we obtain

(13 - 13λ²)/(3λ²) = 0,    that is,    λ² = 1,

which has two solutions: λ = 1 and λ = -1. If we substitute λ = 1 into (4), we get the point (1, 2) where F is at its maximum. (Taking λ = -1 yields the minimum.)
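These steps can be replayed symbolically; a minimal sketch, assuming sympy:

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda')
G = x**2 - x*y + y**2 - x + y - 4

# Solve the two gradient equations of (3) for x and y in terms of lambda ...
sol = sp.solve([-1 - lam*(2*x - y - 1), 4 - lam*(-x + 2*y + 1)], [x, y])
print(sol[x], sol[y])    # (lambda + 2)/(3*lambda), (7 - lambda)/(3*lambda), matching (4)

# ... then substitute into the constraint G = 0 and solve for lambda.
print(sp.solve(G.subs(sol), lam))   # [-1, 1]
```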

The number λ that we introduced to solve the problem is called a Lagrange multiplier. It is possible to reformulate the problem by defining the Lagrangian

Λ(x, y, λ) = F(x, y) - λG(x, y)

and then condensing the equations (3) into a single equation

∇Λ = 0.

The reason this works is that if we differentiate Λ with respect to λ, then we obtain -G(x, y), so requiring this partial derivative to be zero is equivalent to requiring G(x, y) to be zero. And asking for the other two partial derivatives to be zero is equivalent to requiring that ∇F = λ∇G. The remarkable fact about this reformulation is that it has turned a constrained optimization problem involving x and y into an unconstrained problem involving x, y, and λ.
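A short sympy sketch illustrating the reformulation: differentiating the Lagrangian with respect to λ recovers (minus) the constraint, and solving the single equation ∇Λ = 0 recovers both the maximum and the minimum found above:

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda')
F = 4*y - x
G = x**2 - x*y + y**2 - x + y - 4
Lam = F - lam*G                      # the Lagrangian

print(sp.diff(Lam, lam))             # -G, expanded
eqs = [sp.diff(Lam, v) for v in (x, y, lam)]
print(sp.solve(eqs, [x, y, lam]))    # (1, 2, 1) and (-1/3, -8/3, -1), in some order
```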

4 The General Method of Lagrange Multipliers

In real problems we may wish to optimize a function F of many variables x1, . . . , xn under many constraints G1(x1, . . . , xn) = 0, G2(x1, . . . , xn) = 0, . . . , Gm(x1, . . . , xn) = 0. In this case we introduce a Lagrange multiplier for each constraint and define the Lagrangian Λ by the formula

Λ(x1, . . . , xn, λ1, . . . , λm) = F(x1, . . . , xn) - λ1G1(x1, . . . , xn) - . . . - λmGm(x1, . . . , xn).

The partial derivative of Λ with respect to λi is zero if and only if Gi(x1, . . . , xn) = 0. And the partial derivatives with respect to the xi will all be zero if and only if ∇F = λ1∇G1 + . . . + λm∇Gm. This tells us that any direction that is perpendicular to all the gradients ∇Gi (and therefore lies in all their “contour hypersurfaces”) will be perpendicular to the gradient ∇F as well, so we cannot find a direction in which F increases while all the constraints are satisfied.
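As an illustration of the general recipe, here is a sketch (assuming sympy) that builds the system ∇Λ = 0 for any number of constraints; the example problem at the end is hypothetical, not from the text:

```python
import sympy as sp

def lagrangian_equations(F, Gs, xs):
    """Return the equations grad(Lambda) = 0, plus the full variable list."""
    lams = sp.symbols(f'lambda1:{len(Gs) + 1}')    # lambda1, ..., lambdam
    Lam = F - sum(l*G for l, G in zip(lams, Gs))   # the general Lagrangian
    variables = list(xs) + list(lams)
    return [sp.diff(Lam, v) for v in variables], variables

# A hypothetical example: extremize x + y + z over the unit sphere,
# with the extra constraint z = 0.
x, y, z = sp.symbols('x y z')
eqs, vs = lagrangian_equations(x + y + z,
                               [x**2 + y**2 + z**2 - 1, z],
                               [x, y, z])
print(sp.solve(eqs, vs))   # the two points (sqrt(2)/2, sqrt(2)/2, 0) and its negative
```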

Problems of this kind occur frequently in economics, where the objective function F is a cost (which we are probably trying to minimize), and the constraints force us to allocate spending among different items so as to satisfy certain overall demands. For instance, we might want to minimize the total cost of supplies of various different foodstuffs that between them had to satisfy various nutritional demands. In this case, the Lagrange multipliers have an interpretation as “notional prices.” As we have just seen, at the optimum point we have an equation of the form ∇F = λ1∇G1 + . . . + λm∇Gm. This tells us how much F will vary as we vary the Gi by small amounts: that is, it tells us the costs associated with increasing the various demands.

For a further use of Lagrange multipliers, see THE MATHEMATICS OF TRAFFIC IN NETWORKS [VII.4].
