III.18 Distributions

Terence Tao


A function is normally defined to be an object f : X → Y which assigns to each point x in a set X, known as the domain, a point f(x) in another set Y, known as the range (see THE LANGUAGE AND GRAMMAR OF MATHEMATICS [I.2 §2.2]). Thus, the definition of functions is set-theoretic, and the fundamental operation that one can perform on a function is evaluation: given an element x of X, one evaluates f at x to obtain the element f(x) of Y.

However, there are some fields of mathematics where this may not be the best way of describing functions. In geometry, for instance, the fundamental property of a function is not necessarily how it acts on points, but rather how it pushes forward or pulls back objects that are more complicated than points (e.g., other functions, BUNDLES [IV.6 §5] and sections, SCHEMES [IV.5 §3] and sheaves, etc.). Similarly, in analysis, a function need not necessarily be defined by what it does to points, but may instead be defined by what it does to objects of different kinds, such as sets or other functions; the former leads to the notion of a measure; the latter to that of a distribution.

Of course, all these notions of function and function-like objects are related. In analysis, it is helpful to think of the various notions of a function as forming a spectrum, with very “smooth” classes of functions at one end and very “rough” ones at the other. The smooth classes of functions are very restrictive in their membership: this means that they have good properties, and there are many operations that one can perform on them (such as, for example, differentiation), but it also means that one cannot necessarily ensure that the functions one is working with belong to this category. Conversely, the rough classes of functions are very general and inclusive: it is easy to ensure that one is working with them, but the price one pays is that the number of operations one can perform on these functions is often sharply reduced (see FUNCTION SPACES [III.29]).

Nevertheless, the various classes of functions can often be treated in a unified manner, because it is often possible to approximate rough functions arbitrarily well (in an appropriate TOPOLOGY [III.90]) by smooth ones. Then, given an operation that is naturally defined for smooth functions, there is a good chance that there will be exactly one natural way to extend it to an operation on rough functions: one takes a sequence of better and better smooth approximations to the rough functions, performs the operation on them, and passes to the limit.

Distributions, or generalized functions, belong at the rough end of the spectrum, but before we say what they are, it will be helpful to begin by considering some smoother classes of functions, partly for comparison and partly because one obtains rough classes of functions from smooth ones by a process known as duality. (A linear functional defined on a space E of functions is simply a linear map φ from E to the scalars ℝ or ℂ. Typically, E is a normed space, or at least comes with a topology, and the dual space is the space of continuous linear functionals.)

The class C^ω[-1,1] of analytic functions. These are in many ways the “nicest” functions of all, and include many familiar functions such as exp(x), sin(x), polynomials, and so on. However, we shall not discuss them further, because for many purposes they form too rigid a class to be useful. (For example, an analytic function that vanishes on a subinterval is forced to vanish everywhere.)

The class C^∞[-1,1] of test functions. These are the smooth (that is, infinitely differentiable) functions f, defined on the interval [-1,1], that vanish on neighborhoods of 1 and −1. (That is, one can find δ > 0 such that f(x) = 0 whenever x > 1 − δ or x < −1 + δ.) This class is larger than the class of analytic functions, and is therefore more flexible to work with. For instance, it is often useful to construct smooth “cutoff functions,” which are functions that vanish outside some small set but do not vanish inside it; an explicit example is given below. Also, all the operations from calculus (differentiation, integration, composition, convolution, evaluation, etc.) are available for these functions.
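
A standard explicit example of such a function (not given in the text, but classical) is the bump function defined by ϕ(x) = exp(−1/(¼ − x²)) for |x| < ½ and ϕ(x) = 0 otherwise: it is infinitely differentiable on all of [-1,1] and vanishes outside [−½, ½], and by translating and rescaling it one obtains the smooth cutoff functions just mentioned.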

The class C^0[-1,1] of continuous functions. These functions are regular enough for the notion of evaluation, x ↦ f(x), to make sense for every x ∈ [-1,1], and one can integrate such functions and perform algebraic operations such as multiplication and composition, but they are not regular enough for operations such as differentiation to be performed on them. Still, they are usually considered among the smoother examples of functions in analysis.

The class L^2[-1,1] of square-integrable functions. These are measurable functions f : [-1,1] → ℝ for which the Lebesgue integral ∫_{-1}^{1} |f(x)|² dx is finite. Usually one regards two such functions f and g as equal if the set of x such that f(x) ≠ g(x) has measure zero. (Thus, from the set-theoretic point of view, the object in question is really an EQUIVALENCE CLASS [I.2 §2.3] of functions.) Since a singleton {x} has measure zero, we can change the value of f(x) without changing the function. Thus, the notion of evaluation does not make sense for a square-integrable function f at any specific point x. However, two functions that differ on a set of measure zero have the same LEBESGUE INTEGRAL [III.55], so integration does make sense.

A key point about this class is that it is self-dual in the following sense. Any two functions in this class can be paired together by the inner product 〈f, g〉 = ∫_{-1}^{1} f(x)g(x) dx. Therefore, given a function g ∈ L^2[-1,1], the map f ↦ 〈f, g〉 defines a linear functional on L^2[-1,1], which turns out to be continuous. Moreover, given any continuous linear functional φ on L^2[-1,1], there is a unique function g ∈ L^2[-1,1] such that φ(f) = 〈f, g〉 for every f. This is a special case of one of the Riesz representation theorems.
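
The continuity here follows from a one-line argument (standard, though not spelled out in the text): by the Cauchy-Schwarz inequality,

|〈f, g〉| ≤ ‖f‖_{L²} ‖g‖_{L²},

so the functional f ↦ 〈f, g〉 is bounded, and bounded linear functionals on a normed space are continuous.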

The class C^0[-1,1]* of finite Borel measures. Any finite Borel MEASURE [III.55] µ gives rise to a continuous linear functional on C^0[-1,1] defined by f ↦ 〈µ, f〉 = ∫_{-1}^{1} f(x) dµ. Another of the Riesz representation theorems says that every continuous linear functional on C^0[-1,1] arises in this way, so one could in principle define a finite Borel measure to be a continuous linear functional on C^0[-1,1].

The class C^∞[-1,1]* of distributions. Just as measures can be viewed as continuous linear functionals on C^0[-1,1], a distribution µ is a continuous linear functional on C^∞[-1,1] (with an appropriate topology). Thus, a distribution can be viewed as a “virtual function”: it cannot itself be directly evaluated, or even integrated over an open set, but it can still be paired with any test function g ∈ C^∞[-1,1], producing a number 〈µ, g〉. A famous example is the Dirac distribution δ₀, defined as the functional which, when paired with any test function g, returns the evaluation g(0) of g at zero: 〈δ₀, g〉 = g(0). Similarly, we have (minus) the derivative of the Dirac distribution, −δ′₀, which, when paired with any test function g, returns the derivative g′(0) of g at zero: 〈−δ′₀, g〉 = g′(0). (The reason for the minus sign will be given later.) Since test functions have so many operations available to them, there are many ways to define continuous linear functionals, so the class of distributions is quite large. Despite this, and despite the indirect, virtual nature of distributions, one can still define many operations on them; we shall discuss this later.

The class C^ω[-1,1]* of hyperfunctions. There are classes of functions more general still than distributions. For instance, there are hyperfunctions, which roughly speaking one can think of as linear functionals that can be tested only against analytic functions g ∈ C^ω[-1,1] rather than against test functions g ∈ C^∞[-1,1]. However, as the class of analytic functions is so sparse, hyperfunctions tend not to be as useful in analysis as distributions.

At first glance, the concept of a distribution has limited utility, since all a distribution µ is empowered to do is to be tested against test functions g to produce inner products 〈µ, g〉. However, using this inner product, one can often take operations that are initially defined only on test functions and extend them to distributions by duality. A typical example is differentiation. Suppose one wants to know how to define the derivative µ′ of a distribution, or in other words how to define 〈µ′, g〉 for any test function g and distribution µ. If µ is itself a test function µ = f, then we can evaluate this using integration by parts (recalling that test functions vanish at −1 and 1). We have

〈f′, g〉 = ∫_{-1}^{1} f′(x)g(x) dx = −∫_{-1}^{1} f(x)g′(x) dx = −〈f, g′〉.

Note that if g is a test function, then so is g′. We can therefore generalize this formula to arbitrary distributions by defining 〈µ′, g〉 = −〈µ, g′〉. This is the justification for the differentiation of the Dirac distribution:

〈δ′₀, g〉 = −〈δ₀, g′〉 = −g′(0).
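
To illustrate the definition with a standard example (not worked out in the text), let H be the Heaviside function on [-1,1], with H(x) = 0 for x < 0 and H(x) = 1 for x ≥ 0. Pairing its distributional derivative with a test function g gives

〈H′, g〉 = −〈H, g′〉 = −∫_{0}^{1} g′(x) dx = g(0) − g(1) = g(0),

since g vanishes near 1. Thus H′ = δ₀ in the sense of distributions: differentiation converts the jump discontinuity into a Dirac mass.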

More formally, what we have done here is to compute the adjoint of the differentiation operation (as defined on the dense space of test functions). Then we have taken adjoints again to define the differentiation operation for general distributions. This procedure is well-defined, and also works for many other concepts; for instance, one can add two distributions, multiply a distribution by a smooth function, convolve two distributions, and compose distributions on both the left and the right with suitably smooth functions. One can even take Fourier transforms of distributions. For instance, the Fourier transform of the Dirac delta δ₀ is the constant function 1, and vice versa (this is essentially the Fourier inversion formula), while the distribution Σ_{n∈ℤ} δ₀(x − n) is its own Fourier transform (this is essentially the Poisson summation formula).

Thus, the space of distributions is quite a good space to work in, in that it contains a large class of functions (e.g., all measures and integrable functions), and is also closed under a large number of common operations in analysis. Because the test functions are dense in the space of distributions, the operations as defined on distributions are usually compatible with those on test functions. For instance, if f and g are test functions and f′ = g in the sense of distributions, then f′ = g will also be true in the classical sense. This often allows one to manipulate distributions as if they were test functions without fear of confusion or inaccuracy. The main operations one has to be careful about are evaluation and pointwise multiplication of distributions, both of which are usually not well-defined (e.g., the square of the Dirac delta distribution is not well-defined as a distribution).
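
The first of these Fourier facts can be checked directly from the duality definitions. Writing F for the Fourier transform, and assuming the convention (Fg)(ξ) = ∫ g(x)e^{−2πixξ} dx together with the duality definition 〈Fµ, g〉 = 〈µ, Fg〉 (one of several equivalent conventions), one computes

〈Fδ₀, g〉 = 〈δ₀, Fg〉 = (Fg)(0) = ∫ g(x) dx = 〈1, g〉,

so Fδ₀ is indeed the constant function 1.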

Another way to view distributions is as weak limits of test functions. A sequence of functions fₙ is said to converge weakly to a distribution µ if 〈fₙ, g〉 → 〈µ, g〉 for all test functions g. For instance, if ϕ is a test function with total integral ∫ ϕ = 1, then the test functions fₙ(x) = nϕ(nx) can be shown to converge weakly to the Dirac delta distribution δ₀, while their derivatives fₙ′(x) = n²ϕ′(nx) converge weakly to the derivative δ′₀ of the Dirac delta. On the other hand, the functions gₙ(x) = cos(nx)ϕ(x) converge weakly to zero (this is a variant of the Riemann-Lebesgue lemma). Thus weak convergence has some unusual features not present in stronger notions of convergence, in that severe oscillations can sometimes “disappear” in the limit. One advantage of working with distributions instead of smoother functions is that one often has some compactness in the space of distributions under weak limits (e.g., by the Banach-Alaoglu theorem). Thus, distributions can be thought of as asymptotic extremes of behavior of smoother functions, just as real numbers can be thought of as limits of rational numbers.
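
The first of these limits is easy to verify numerically. The following sketch (a minimal illustration in Python, assuming SciPy is available; the helper names bump, phi, and g are ours, not from the text) evaluates the pairings 〈fₙ, g〉 for fₙ(x) = nϕ(nx) and watches them approach g(0):

```python
import math
from scipy.integrate import quad

def bump(x):
    # An unnormalized C-infinity bump function supported in (-1/2, 1/2).
    if abs(x) >= 0.5:
        return 0.0
    return math.exp(-1.0 / (0.25 - x * x))

Z, _ = quad(bump, -0.5, 0.5)  # normalizing constant, so phi below has total integral 1

def phi(x):
    return bump(x) / Z

def g(x):
    # A smooth function to pair against; f_n is supported in [-1/(2n), 1/(2n)],
    # so the pairing agrees with that of a genuine test function equal to cos near 0.
    return math.cos(x)

for n in (1, 10, 100, 1000):
    # <f_n, g> with f_n(x) = n * phi(n * x); integrate over the support of f_n.
    val, _ = quad(lambda x: n * phi(n * x) * g(x), -0.5 / n, 0.5 / n)
    print(f"n = {n:4d}   <f_n, g> = {val:.6f}   (g(0) = {g(0.0):.6f})")
```

Replacing nϕ(nx) by n²ϕ′(nx) in the integrand would, in the same way, produce values approaching −g′(0), in accordance with the weak convergence to δ′₀.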

Because distributions can be easily differentiated, while still being closely connected to smoother functions, they have been extremely useful in the study of partial differential equations (PDEs), particularly when the equations are linear. For instance, the general solution to a linear PDE can often be described in terms of its fundamental solution, which solves the PDE in the sense of distributions. More generally, distribution theory (together with related concepts, such as that of a weak derivative) gives an important (though certainly not the only) means to define generalized solutions of both linear and nonlinear PDEs. As the name suggests, these generalize the concept of smooth (or classical) solutions by allowing the formation of singularities, shocks, and other nonsmooth behavior. In some cases the easiest way to construct a smooth solution to a PDE is first to construct a generalized solution and then to use additional arguments to show that the generalized solution is in fact smooth.
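
As a concrete illustration of a fundamental solution (a standard example, stated here under the assumption that we work on the whole real line), take the operator d²/dx² and the function E(x) = ½|x|. Two integrations by parts give, for every test function g,

〈E″, g〉 = 〈E, g″〉 = ½ ∫ |x| g″(x) dx = g(0),

so E″ = δ₀ in the sense of distributions, even though E fails to be twice differentiable at the origin in the classical sense.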
