V.19 Inequalities


Let x and y be two nonnegative real numbers. Then (√x - √y)² = x + y - 2√(xy) is a nonnegative real number, from which it follows that ½(x + y) ≥ √(xy). That is, the arithmetic mean of x and y is at least as big as the geometric mean. This conclusion is a very simple example of a mathematical inequality; its generalization to n numbers is called the AM–GM inequality.
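
As an illustrative aside (not part of the original exposition), the following short Python sketch checks the two-variable AM–GM inequality, and the identity behind it, on random nonnegative inputs; the sample size and tolerances are arbitrary choices.

```python
# Numerical sanity check of the two-variable AM-GM inequality, following the
# (sqrt(x) - sqrt(y))^2 >= 0 argument above.
import math
import random

for _ in range(10_000):
    x, y = random.uniform(0, 100), random.uniform(0, 100)
    am = (x + y) / 2              # arithmetic mean
    gm = math.sqrt(x * y)         # geometric mean
    assert am >= gm - 1e-9        # AM >= GM, up to rounding error
    # the gap is exactly half of (sqrt(x) - sqrt(y))^2
    assert abs((am - gm) - 0.5 * (math.sqrt(x) - math.sqrt(y)) ** 2) < 1e-9
```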

In any branch of mathematics that has even the slightest flavor of analysis, inequalities will be of great importance: as well as analysis itself, this includes probability, and parts of combinatorics, number theory, and geometry. Inequalities are less prominent in some of the more abstract parts of analysis, but even there one needs them as soon as one wishes to apply the abstract results. For instance, one may not always need an inequality to prove a theorem about continuous LINEAR OPERATORS [III.50] between BANACH SPACES [III.62], but the statement that some specific linear operator between two specific Banach spaces is continuous is an inequality, and often a very interesting one. We do not have space to discuss more than a small handful of inequalities in this article, but we shall include some of the most important ones in the toolbox of any analyst.

Jensen’s inequality is another fairly simple but useful inequality. A function f : ℝ → ℝ is called convex if f(λx + μy) ≤ λf(x) + μf(y) whenever λ and μ are nonnegative real numbers with λ + μ = 1. Geometrically, this says that all chords of the graph of the function lie above the graph. A straightforward inductive argument can be used to show that this property implies the same property for n numbers:

f(λ1x1 + . . . + λnxn) ≤ λ1f(x1) + . . . + λnf(xn)

whenever all the λi are nonnegative and λ1 + . . . + λn = 1. This is Jensen’s inequality.
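
The following Python sketch is an optional numerical illustration of the n-point form of Jensen’s inequality for the convex function exp, with randomly chosen points and random convex weights; the ranges and tolerance are arbitrary.

```python
# Check Jensen's inequality f(sum l_i x_i) <= sum l_i f(x_i) for f = exp.
import math
import random

def jensen_gap(f, xs, lambdas):
    """Return f(sum(l*x)) - sum(l*f(x)); nonpositive when f is convex."""
    lhs = f(sum(l * x for l, x in zip(lambdas, xs)))
    rhs = sum(l * f(x) for l, x in zip(lambdas, xs))
    return lhs - rhs

for _ in range(1_000):
    n = random.randint(2, 6)
    xs = [random.uniform(-5, 5) for _ in range(n)]
    weights = [random.random() for _ in range(n)]
    total = sum(weights)
    lambdas = [w / total for w in weights]   # nonnegative weights summing to 1
    assert jensen_gap(math.exp, xs, lambdas) <= 1e-9
```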

The second derivative of the EXPONENTIAL FUNCTION [III.25] is positive, from which it follows that the exponential function itself is convex. If a1, . . . , an are positive real numbers and we apply Jensen’s inequality to the numbers xi = log(ai), then we find, using standard properties of exponentials and LOGARITHMS [III.25 §4], that

$$a_1^{\lambda_1} a_2^{\lambda_2} \cdots a_n^{\lambda_n} \le \lambda_1 a_1 + \lambda_2 a_2 + \cdots + \lambda_n a_n.$$

This is called the weighted AM–GM inequality. When all the λi are equal to 1/n it reduces to the usual AM–GM inequality. Applying Jensen’s inequality to other well-known convex functions produces several other well-known inequalities. For instance, if we apply it to the function x2, we obtain the inequality

$$(\lambda_1 x_1 + \cdots + \lambda_n x_n)^2 \le \lambda_1 x_1^2 + \cdots + \lambda_n x_n^2, \qquad (1)$$

which can be interpreted as saying that if X is a RANDOM VARIABLE [III.71 §4] on a finite sample space, then (𝔼X)² ≤ 𝔼X².
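
As another optional illustration, the sketch below checks the weighted AM–GM inequality and the probabilistic reading (𝔼X)² ≤ 𝔼X² on random data; the weights λi are normalized random numbers and the tolerances are arbitrary.

```python
# Weighted AM-GM (obtained above from Jensen applied to exp with x_i = log a_i)
# and the x^2 case read probabilistically.
import math
import random

for _ in range(1_000):
    n = random.randint(2, 6)
    a = [random.uniform(0.1, 10) for _ in range(n)]
    w = [random.random() for _ in range(n)]
    total = sum(w)
    lam = [wi / total for wi in w]                 # weights lambda_i summing to 1

    weighted_am = sum(li * ai for li, ai in zip(lam, a))
    weighted_gm = math.prod(ai ** li for li, ai in zip(lam, a))
    assert weighted_gm <= weighted_am + 1e-9       # weighted AM-GM

    # Jensen for x^2: (E X)^2 <= E X^2, where X takes the value a_i with
    # probability lambda_i.
    second_moment = sum(li * ai ** 2 for li, ai in zip(lam, a))
    assert weighted_am ** 2 <= second_moment + 1e-9
```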

The Cauchy–Schwarz inequality is perhaps the most important inequality in all of mathematics. Suppose that V is a real vector space with an INNER PRODUCT [III.37] 〈. , .〉 on it. One of the properties of an inner product is that 〈v,v〉 ≥ 0 for every v ∈ V, with equality if and only if v = 0. Let us write ||v|| for 〈v,v〉^(1/2). If x and y are any two vectors in V with ||x|| = ||y|| = 1, then 0 ≤ ||x - y||² = 〈x - y, x - y〉 = 〈x,x〉 + 〈y,y〉 - 2〈x,y〉 = 2 - 2〈x,y〉. It follows that 〈x,y〉 ≤ 1 = ||x|| ||y||. Moreover, equality holds only if x = y. We can obtain a general pair of vectors by multiplying x by λ and y by μ, for some nonnegative real numbers λ and μ. Then both sides of the inequality scale up by a factor of λμ, so we can conclude that the inequality 〈x,y〉 ≤ ||x|| ||y|| holds in general, with equality if and only if x and y are proportional.
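
The normalize-then-rescale argument can be illustrated numerically. The sketch below does this in ℝⁿ with the standard inner product (a particular choice of inner-product space, made only for the illustration); the abstract argument is not tied to this example.

```python
# The two steps of the argument above: <x/||x||, y/||y||> <= 1 for unit
# vectors, and rescaling gives <x, y> <= ||x|| ||y|| in general.
import math
import random

def inner(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(u):
    return math.sqrt(inner(u, u))

for _ in range(1_000):
    n = random.randint(2, 5)
    x = [random.gauss(0, 1) for _ in range(n)]
    y = [random.gauss(0, 1) for _ in range(n)]
    xu = [xi / norm(x) for xi in x]      # unit vector in the direction of x
    yu = [yi / norm(y) for yi in y]      # unit vector in the direction of y
    assert inner(xu, yu) <= 1 + 1e-9
    assert inner(x, y) <= norm(x) * norm(y) + 1e-9
```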

Particular inner-product spaces lead to special cases of this inequality, which are themselves often referred to as the Cauchy–Schwarz inequality. For instance, if we take the space ℝⁿ with the inner product 〈a,b〉 = a1b1 + . . . + anbn, then we obtain the inequality

$$a_1 b_1 + a_2 b_2 + \cdots + a_n b_n \le (a_1^2 + \cdots + a_n^2)^{1/2} (b_1^2 + \cdots + b_n^2)^{1/2}. \qquad (2)$$

It is not hard to deduce a similar inequality for complex scalars: one needs to replace ai² and bi² by |ai|² and |bi|² on the right-hand side. It is also not too hard to prove that inequality (2) is equivalent to inequality (1) above.
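
Here is a small numerical check of the complex form of inequality (2), reading the left-hand side as a modulus (one natural interpretation of the complex case; the check itself is only an illustration, and the sample size is arbitrary).

```python
# |sum a_i b_i| <= (sum |a_i|^2)^(1/2) (sum |b_i|^2)^(1/2) for complex entries.
import math
import random

def rand_complex():
    return complex(random.gauss(0, 1), random.gauss(0, 1))

for _ in range(1_000):
    n = random.randint(2, 6)
    a = [rand_complex() for _ in range(n)]
    b = [rand_complex() for _ in range(n)]
    lhs = abs(sum(ai * bi for ai, bi in zip(a, b)))
    rhs = math.sqrt(sum(abs(ai) ** 2 for ai in a)) * \
          math.sqrt(sum(abs(bi) ** 2 for bi in b))
    assert lhs <= rhs + 1e-9
```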

Hölder’s inequality is an important generalization of the Cauchy–Schwarz inequality. Again it has several versions, but the one that corresponds to inequality (2) is

$$a_1 b_1 + a_2 b_2 + \cdots + a_n b_n \le (|a_1|^p + \cdots + |a_n|^p)^{1/p} (|b_1|^q + \cdots + |b_n|^q)^{1/q},$$

where p belongs to the interval [1,∞] and q is the conjugate index of p, which is defined to be the number that satisfies the equation (1/p) + (1/q) = 1. (We interpret 1/∞ to be 0.) If we write ||a||p for the quantity (|a1|^p + . . . + |an|^p)^(1/p), then this inequality can be rewritten in the succinct form 〈a,b〉 ≤ ||a||p||b||q.

It is a straightforward exercise to find, for each sequence a, another (nonzero) sequence b such that equality occurs in the above inequality. Also, both sides of the inequality scale in the same way if you multiply b by a nonnegative scalar. It follows that ||a||p is the maximum of 〈a,b〉 over all sequences b such that ||b||q = 1. Using this fact, it is easy to verify that the function a ↦ ||a||p satisfies Minkowski’s inequality: ||x + y||p ≤ ||x||p + ||y||p.
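
The next sketch checks Hölder’s inequality and the Minkowski inequality that follows from it by the duality argument just described, for randomly chosen vectors and a randomly chosen exponent p; the range of p and the tolerances are arbitrary choices made for the illustration.

```python
# Hoelder: <a,b> <= ||a||_p ||b||_q, and Minkowski: ||a + b||_p <= ||a||_p + ||b||_p.
import random

def lp_norm(a, p):
    return sum(abs(ai) ** p for ai in a) ** (1 / p)

for _ in range(1_000):
    n = random.randint(2, 6)
    p = random.uniform(1.01, 10.0)
    q = p / (p - 1)                       # conjugate index: 1/p + 1/q = 1
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    dot = sum(ai * bi for ai, bi in zip(a, b))
    assert dot <= lp_norm(a, p) * lp_norm(b, q) + 1e-9
    s = [ai + bi for ai, bi in zip(a, b)]
    assert lp_norm(s, p) <= lp_norm(a, p) + lp_norm(b, p) + 1e-9
```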

This gives some idea of why Hölder’s inequality is so important. Once one has Minkowski’s inequality, it is very easy to check that ||·||p is (as the notation suggests) a NORM [III.62] on ℝⁿ. This is an even more basic example of the phenomenon mentioned at the beginning of the article: just to show that a certain normed space is a normed space, we have had to prove an inequality about real numbers. In particular, looking at the case p = 2, we see that the entire theory of HILBERT SPACES [III.37] depends on the Cauchy–Schwarz inequality.

Minkowski’s inequality is a particular case of the triangle inequality, which states that if x, y, and z are three points in a METRIC SPACE [III.56], then d(x, z) ≤ d(x,y) + d(y, z), where d(a, b) denotes the distance between a and b. When put like this, the triangle inequality is a tautology, since it is one of the axioms of a metric space. However, the statement that a particular notion of distance actually is a metric is far from vacuous. If our space is ℝⁿ and we define d(a,b) to be ||a - b||p, then Minkowski’s inequality is easily seen to be equivalent to the triangle inequality for this notion of distance.

The inequalities above have natural “continuous analogues” as well. For example, here is a continuous version of Hölder’s inequality. For two functions f and g defined on ℝ, let 〈f,g〉 be defined to be ∫ f(x)g(x) dx, and write ||f||p for the quantity (∫ |f(x)|^p dx)^(1/p). Then, once again, 〈f,g〉 ≤ ||f||p||g||q, where q is the conjugate index of p. Another example is a continuous version of Jensen’s inequality, which states, in a continuous setting, that if f is convex and X is a random variable, then f(𝔼X) ≤ 𝔼f(X).
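
As a rough illustration of the continuous version, the following sketch approximates the integrals by midpoint Riemann sums on [0, 1] for two sample functions; the functions, the interval, and the exponents p = 3 and q = 3/2 are arbitrary choices made for the illustration.

```python
# Continuous Hoelder: integral of f*g  <=  ||f||_p ||g||_q (approximated numerically).
import math

def riemann(h, n=20_000):
    """Midpoint Riemann sum of h over [0, 1]."""
    dx = 1.0 / n
    return sum(h((i + 0.5) * dx) for i in range(n)) * dx

def f(x):
    return math.sin(math.pi * x)

def g(x):
    return x * x

p, q = 3.0, 1.5                       # conjugate indices: 1/3 + 1/1.5 = 1
lhs = riemann(lambda x: f(x) * g(x))
rhs = riemann(lambda x: abs(f(x)) ** p) ** (1 / p) * \
      riemann(lambda x: abs(g(x)) ** q) ** (1 / q)
print(f"<f,g> = {lhs:.4f}  <=  ||f||_p ||g||_q = {rhs:.4f}")
```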

In all the inequalities we have so far mentioned, we have been comparing two quantities A and B, and it has been easy to identify the extreme cases where the ratio of A to B is maximized. However, not all inequalities are like this. Consider, for instance, the following two quantities associated with a sequence of real numbers a = (a1, a2, . . . , an). The first is the norm ||a||2 = (a1² + . . . + an²)^(1/2). The second is the average of |ε1a1 + . . . + εnan| over all the 2ⁿ sequences (ε1, ε2, . . . , εn) such that each εi is 1 or -1. (In other words, for each i you randomly decide whether to multiply ai by -1 or not, add up the results, and take the expected absolute value of the sum.) It is not the case that the first quantity is always less than the second. For instance, let n = 2, and let a1 = a2 = 1. Then the first quantity is √2 and the second is 1. However, Khinchin’s inequality (or, to be more accurate, an important special case of Khinchin’s inequality) is the remarkable statement that there is a constant C such that the first quantity is never more than C times the second. It is not hard to prove, using the inequality 𝔼X² ≥ (𝔼X)², that the first quantity is always at least as big as the second; so the two rather different-looking quantities are in fact “equivalent, up to a constant.” But what is the best constant? In other words, how much bigger can the first quantity be than the second? This question was not answered until 1976, by Stanislaw Szarek, over fifty years after Khinchin proved the original inequality. The answer turns out to be that the example given earlier is the extreme one: the ratio can never exceed √2.
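
This special case of Khinchin’s inequality is easy to explore by brute force for small n, since one can enumerate all 2ⁿ sign sequences. The sketch below does this and checks that the ratio of the two quantities always lies between 1 and √2; the sample sizes are arbitrary.

```python
# ||a||_2 divided by the average of |sum eps_i a_i| over all sign choices.
import itertools
import math
import random

def l2_to_average_ratio(a):
    n = len(a)
    total = sum(abs(sum(e * x for e, x in zip(eps, a)))
                for eps in itertools.product((1, -1), repeat=n))
    average = total / 2 ** n
    return math.sqrt(sum(x * x for x in a)) / average

print(l2_to_average_ratio([1, 1]))        # the extreme example above: sqrt(2)
for _ in range(200):
    a = [random.gauss(0, 1) for _ in range(random.randint(1, 10))]
    r = l2_to_average_ratio(a)
    assert 1 - 1e-9 <= r <= math.sqrt(2) + 1e-9
```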

This situation is typical. Another famous inequality for which the best constant was discovered much later than the inequality itself is the Hausdorff–Young inequality, which relates norms of functions with norms of their FOURIER TRANSFORMS [III.27]. Suppose that 1 ≤ p ≤ 2, and that f is a function from ℝ to ℂ with the property that the norm

$$\|f\|_p = \biggl(\int_{-\infty}^{\infty} |f(x)|^p \, dx\biggr)^{1/p}$$

exists and is finite. Let f̂ be the Fourier transform of f and let q be the conjugate index of p. Then ||f̂||q ≤ Cp||f||p for some constant Cp that depends on p only (and not on f). Again, it was an open problem for many years to determine the best constant Cp. Some idea of why it might have been difficult can be gleaned from the fact that the “extreme” functions in this case are Gaussians: that is, functions of the form f(x) = e^(-(x-μ)²/2σ²). A sketch of a proof of the Hausdorff–Young inequality can be found in HARMONIC ANALYSIS [IV.11 §3].
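
One can at least see the role of Gaussians in a small computation. The sketch below assumes (this normalization is not specified in the text) that the Fourier transform is defined by f̂(ξ) = ∫ f(x)e^(-2πixξ) dx, so that f(x) = e^(-πx²) is its own transform and its Lp norm has the closed form p^(-1/(2p)); under this normalization the inequality holds with Cp ≤ 1, and the printed Gaussian ratios are the extreme ones.

```python
# Hausdorff-Young on the Gaussian exp(-pi x^2), using closed-form Lp norms.
# Assumed normalization: fhat(xi) = integral of f(x) exp(-2*pi*i*x*xi) dx,
# under which this Gaussian equals its own Fourier transform.
import math

def gaussian_lp_norm(p):
    """||exp(-pi x^2)||_p = (integral of exp(-p*pi*x^2) dx)^(1/p) = p^(-1/(2p))."""
    return p ** (-1.0 / (2.0 * p))

for p in (1.0, 1.2, 1.5, 1.8, 2.0):
    if p == 1.0:
        fhat_q_norm = 1.0          # q = infinity; sup |fhat| = fhat(0) = integral of f = 1
    else:
        q = p / (p - 1)            # conjugate index of p
        fhat_q_norm = gaussian_lp_norm(q)
    ratio = fhat_q_norm / gaussian_lp_norm(p)
    print(f"p = {p:.1f}: ||fhat||_q / ||f||_p = {ratio:.4f}")   # never exceeds 1
```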

There is an important class of inequalities known as geometric inequalities, where the quantities that are being compared are parameters associated with geometric objects. A famous example of such an inequality is the Brunn–Minkowski inequality, which states the following. Let A and B be two subsets of ℝⁿ, and define A + B to be the set {x + y : x ∈ A, y ∈ B}. Then

(vol(A + B))1/n ≥ vol(A)1/n + vol(B)1/n.

Here, vol(X) denotes the n-dimensional volume (or, more formally, the LEBESGUE MEASURE [III.55]) of the set X. The Brunn–Minkowski inequality can be used to prove the equally famous isoperimetric inequality in ℝⁿ (which is one of a large class of isoperimetric inequalities). Informally, this states that, of all sets with a given volume, the one with the smallest surface area is a sphere. An explanation of why this follows from the Brunn–Minkowski inequality can be found in HIGH-DIMENSIONAL GEOMETRY AND ITS PROBABILISTIC ANALOGUES [IV.26 §3].
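
For axis-parallel boxes the Brunn–Minkowski inequality can be checked directly, since A + B is again a box and all the volumes are products of side lengths. The following sketch does this for random boxes; the dimensions and side-length ranges are arbitrary choices made only for the illustration.

```python
# Brunn-Minkowski for boxes: if A and B have sides a_i and b_i, then A + B
# has sides a_i + b_i, and the inequality reduces to a product comparison.
import math
import random

for _ in range(1_000):
    n = random.randint(1, 5)
    a = [random.uniform(0.1, 5.0) for _ in range(n)]   # side lengths of A
    b = [random.uniform(0.1, 5.0) for _ in range(n)]   # side lengths of B
    vol_a = math.prod(a)
    vol_b = math.prod(b)
    vol_sum = math.prod(ai + bi for ai, bi in zip(a, b))
    assert vol_sum ** (1 / n) >= vol_a ** (1 / n) + vol_b ** (1 / n) - 1e-9
```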

We finish this brief sample with one further inequality, the Sobolev inequality, which is important in the theory of partial differential equations. Suppose that f is a differentiable function from ℝ² to ℝ. We can visualize its graph as a smooth surface in ℝ³ lying above the xy-plane. Suppose also that f is compactly supported, which means that there exists an M such that f(x, y) = 0 if the distance from (x, y) to (0, 0) is greater than M. We would now like to bound the size of f, as measured by some Lp norm, in terms of the size of its GRADIENT [I.3 §5.3] ∇f, as measured by some other Lp norm. The Lp norm of a function f is defined here as

$$\|f\|_p = \biggl(\iint |f(x,y)|^p \, dx \, dy\biggr)^{1/p}.$$

In one dimension, it is clear that no such bound is possible. For instance, we could have a differentiable function that was 1 everywhere on the interval [-M, M], 0 everywhere outside the wider interval [-(M + 1), M + 1], and gently decaying from 1 to 0 in between. Then if we increased M we would not change the size of the derivative: we would just move the two nonzero parts of the derivative further apart. On the other hand, by increasing M we could increase the size of f as much as we liked. However, we cannot do this sort of construction in two dimensions, because now the “boundary” of the function increases as the size of the function increases. The Sobolev inequality tells us that if 1 ≤ p < 2 and r = 2p/(2 - p), then ||f||r ≤ Cp||∇f||p. To see why this might be reasonable, consider the case p = 1, so that r = 2. Let f be a function that is 1 everywhere inside the circle of radius M about the origin and 0 everywhere outside the circle of radius M + 1. Then as M increases, the norm ||f||2 increases in proportion to M (since the square of ||f||2 is approximately equal to the area of the circle of radius M), and so does ||∇f||1 (since it is roughly proportional to the length of the boundary of the circle). As this informal argument suggests, there are close connections between the Sobolev inequality and the isoperimetric inequality in the plane. And like the isoperimetric inequality, the Sobolev inequality has an n-dimensional version for each n: it is the same result, except that now the condition is that 1 ≤ p < n, and r is equal to np/(n - p).
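
The informal scaling argument can be made concrete with a small computation for the radial plateau function in the case p = 1, r = 2. The discretization below is a crude midpoint rule in polar coordinates, chosen only for the illustration; both norms grow roughly linearly in M, so their ratio stays bounded, consistent with ||f||2 ≤ C||∇f||1.

```python
# Radial plateau function: f = 1 for r <= M, f = M + 1 - r for M < r <= M + 1,
# f = 0 beyond; |grad f| = 1 on the annulus M < r < M + 1.
import math

def plateau_norms(M, steps=100_000):
    """Approximate ||f||_2 and ||grad f||_1 by a midpoint rule in polar coordinates."""
    dr = (M + 1) / steps
    f_sq_integral = 0.0
    grad_integral = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        value = 1.0 if r <= M else M + 1 - r
        f_sq_integral += value * value * 2 * math.pi * r * dr
        if r > M:
            grad_integral += 2 * math.pi * r * dr
    return math.sqrt(f_sq_integral), grad_integral

for M in (1, 2, 5, 10, 20):
    l2, grad_l1 = plateau_norms(M)
    print(f"M = {M:2d}: ||f||_2 = {l2:7.2f}, ||grad f||_1 = {grad_l1:7.2f}, "
          f"ratio = {l2 / grad_l1:.3f}")
```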
