4.3 Characteristic function
Fourier analysis is indeed sometimes performed for real functions only, which requires dealing with the real and imaginary parts of \(\mathrm e^{-\mathrm i\xi \cdot x}\) separately, resulting in complicated formulas involving sines and cosines. This approach leads to an unnecessarily complicated mess, making everything harder without any compensating advantages.
The main idea behind Fourier analysis is that any function can be represented as a superposition of plane waves and the corresponding coefficients are explicitly computable. This is very plainly illustrated in the following finite-dimensional setting. For each \(N \in \mathbb{N}^*,\) define the discrete cube \[\Lambda :=\{0, 1, \dots, N-1\}^d\] and the dual cube \[\Lambda^* :=\frac{2 \pi}{N} \Lambda\,.\] Consider the finite-dimensional complex Hilbert spaces \(V :=\mathbb{C}^{\Lambda}\) and \(V^* :=\mathbb{C}^{\Lambda^*}.\) We use the notations \(f = (f(x))_{x \in \Lambda} \in V\) and \(f = (f(\xi))_{\xi \in \Lambda^*} \in V^*\) for vectors in these spaces. They carry the complex inner products \[\langle f \mspace{2mu}, g\rangle_{V} :=\sum_{x \in \Lambda} \overline{f(x)} \!\, g(x)\,, \qquad \langle f \mspace{2mu}, g\rangle_{V^*} :=\sum_{\xi \in \Lambda^*} \overline{f(\xi)} \!\, g(\xi)\,.\]
For any \(\xi \in \Lambda^*\) we define the vector \(e_\xi \in V\) as the normalized plane wave \[e_\xi(x) :=\frac{1}{N^{d/2}} \, \mathrm e^{- \mathrm i\xi \cdot x}\,.\] Now the truly wonderful fact is that the family \((e_\xi)_{\xi \in \Lambda^*}\) is an orthonormal basis of \(V\)! I strongly recommend that you check this carefully; it is a simple exercise using finite geometric series.
The Fourier transform of a vector \(f \in V\) is the vector \(\widehat{f} \in V^*\) defined by \[\tag{4.5} \widehat{f}(\xi) :=\langle e_\xi \mspace{2mu}, f\rangle\,.\] In other words, Fourier transformation is nothing but a change of basis from one orthonormal basis (the standard basis of \(\mathbb{C}^{\Lambda}\)) to another orthonormal basis (the basis \((e_\xi)\)). Hence, we can write \(f\) as a superposition of plane waves, \[\tag{4.6} f = \sum_{\xi \in \Lambda^*} \widehat{f}(\xi) \, e_\xi\,.\] The relations (4.5) and (4.6) can be explicitly written as \[\tag{4.7} \widehat{f}(\xi) = \frac{1}{N^{d/2}} \sum_{x \in \Lambda} \mathrm e^{\mathrm i\xi \cdot x} \, f(x)\,, \qquad f(x) = \frac{1}{N^{d/2}} \sum_{\xi \in \Lambda^*} \mathrm e^{-\mathrm i\xi \cdot x} \, \widehat{f}(\xi)\,,\] respectively. The former is usually called the Fourier transform and the latter the inverse Fourier transform. Remarkably, they have almost exactly the same form (up to the sign of the argument).
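If you want to see this change of basis in action, here is a minimal numerical sketch in Python (using NumPy; an illustration only, not part of the development) for \(d = 1\) and \(N = 8\): it builds the plane waves \(e_\xi,\) verifies their orthonormality, and checks the inversion formula (4.7).

```python
import numpy as np

# A minimal numerical check of (4.5)-(4.7) in dimension d = 1.
# The matrix E has columns e_xi, the normalized plane waves of the text.
N = 8
x = np.arange(N)                      # the discrete cube Lambda
xi = 2 * np.pi * x / N                # the dual cube Lambda*

E = np.exp(-1j * np.outer(x, xi)) / np.sqrt(N)   # E[x, k] = e_{xi_k}(x)

# Orthonormality: the Gram matrix E^* E should be the identity.
assert np.allclose(E.conj().T @ E, np.eye(N))

# Fourier transform (4.7) as a change of basis, and its inverse.
f = np.random.randn(N) + 1j * np.random.randn(N)
f_hat = E.conj().T @ f                # hat f(xi) = <e_xi, f>
f_back = E @ f_hat                    # f = sum_xi hat f(xi) e_xi
assert np.allclose(f_back, f)

# Parseval: the change of basis is unitary, so norms are preserved.
assert np.allclose(np.linalg.norm(f_hat), np.linalg.norm(f))
```

Up to the sign convention in the exponent and the choice of normalisation, this is the discrete Fourier transform implemented by standard FFT libraries.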
Summarising, Fourier transformation can be viewed as simply a change of orthonormal basis. This is somewhat complicated by the fact that, as in this class, it is often applied in infinite dimensions, which leads to analytic complications (see e.g. the precise statement of Lemma 4.16 below, as well as Remark 4.17 for a simplified formulation under stronger analytic assumptions). It is a tremendously useful tool for many reasons. One such reason is that it diagonalises all constant-coefficient differential operators (to see why, you can immediately check that applying a partial derivative \(\partial_{x_j}\) to a plane wave \(\mathrm e^{-\mathrm i\xi \cdot x}\) gives \(-\mathrm i\xi_j\) times the same plane wave, so that a plane wave is an eigenfunction of every partial derivative operator). As a consequence, it is the most important and celebrated tool in all of analysis, upon which basically the entire modern theory of partial differential equations is founded. In this section we shall see other remarkable properties that make it particularly useful in probability theory. For another application, see Example 5.21 below.
Let us now bring this introductory digression to a close and return to probability theory. We begin with the following definition.
Let \(X\) be an \(\mathbb{R}^d\)-valued random variable. Define the characteristic function of \(X\), denoted by \(\Phi_X \,\colon\mathbb{R}^d \to \mathbb{C},\) as the Fourier transform of its law \(\mathbb{P}_X.\) That is, \[\Phi_X(\xi) = \widehat{\mathbb{P}}_X(\xi) = \int \mathrm e^{\mathrm i\xi \cdot x} \, \mathbb{P}_X(\mathrm dx) = \mathbb{E}[\mathrm e^{\mathrm i\xi \cdot X}]\,.\]
By dominated convergence, \(\Phi_X \in C_b(\mathbb{R}^d).\)
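To get a feel for this definition, here is a small Python sketch (an illustration, not part of the development) that estimates \(\Phi_X\) by Monte Carlo for a fair coin \(X \in \{-1, +1\},\) for which \(\Phi_X(\xi) = \frac{1}{2}(\mathrm e^{\mathrm i\xi} + \mathrm e^{-\mathrm i\xi}) = \cos \xi.\)

```python
import numpy as np

# Monte Carlo illustration of the definition Phi_X(xi) = E[exp(i xi X)].
# For a fair coin X in {-1, +1} the characteristic function is cos(xi).
rng = np.random.default_rng(0)
X = rng.choice([-1.0, 1.0], size=10**6)

for xi in [0.0, 0.5, 1.0, 2.0]:
    estimate = np.mean(np.exp(1j * xi * X))   # empirical E[exp(i xi X)]
    print(xi, estimate, np.cos(xi))           # estimate ~ cos(xi), imaginary part ~ 0
```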
The most important observation in all of Fourier analysis is the following computation for a Gaussian. For \(\sigma > 0,\) define \[\tag{4.8} g_\sigma(x) :=\frac{1}{\sigma \sqrt{2 \pi}}\,\mathrm e^{-\frac{x^2}{2 \sigma^2}}\,,\] the density of the Gaussian law with mean zero and variance \(\sigma^2.\)
Let \(X \in \mathbb{R}\) be a Gaussian random variable with law \(g_\sigma(x) \, \mathrm dx.\) Then \[\Phi_X(\xi) = \mathrm e^{-\frac{\sigma^2}{2} \xi^2}\,.\]
Proof. By definition, \[\Phi_X(\xi) = \int \frac{1}{\sigma \sqrt{2 \pi}} \, \mathrm e^{-\frac{x^2}{2 \sigma^2}}\, \mathrm e^{\mathrm i\xi x}\, \mathrm dx\,.\] By the change of variables \(x \mapsto \sigma x,\) we find \(\Phi_X(\xi) = f(\sigma \xi),\) where \[f(\xi) :=\int \frac{1}{\sqrt{2 \pi}} \, \mathrm e^{-\frac{x^2}{2}}\, \mathrm e^{\mathrm i\xi x}\, \mathrm dx\,.\] Differentiating under the integral and then integrating by parts, we find \[\begin{aligned} f'(\xi) &= \int \frac{1}{\sqrt{2 \pi}} \, \mathrm e^{-\frac{x^2}{2}}\, \mathrm ix\, \mathrm e^{\mathrm i\xi x}\, \mathrm dx \\ &= \int \frac{1}{\sqrt{2 \pi}} \, (- \mathrm i) \partial_x \Bigl(\mathrm e^{-\frac{x^2}{2}}\Bigr)\, \mathrm e^{\mathrm i\xi x}\, \mathrm dx \\ &= \int \frac{1}{\sqrt{2 \pi}} \, (- 1) \Bigl(\mathrm e^{-\frac{x^2}{2}}\Bigr)\, \xi \, \mathrm e^{\mathrm i\xi x}\, \mathrm dx \\ &= -\xi f(\xi)\,. \end{aligned}\] Thus, \(f\) satisfies the ordinary differential equation \[\begin{cases} f(0)= 1 \\ f'(\xi) = - \xi f(\xi)\,. \end{cases}\] As seen in analysis (since the right-hand side is a Lipschitz continuous function of \(f\), locally uniformly in \(\xi\)), this equation has a unique solution, \(f(\xi) = \mathrm e^{-\frac{\xi^2}{2}}.\) Hence \(\Phi_X(\xi) = f(\sigma \xi) = \mathrm e^{-\frac{\sigma^2}{2} \xi^2}.\)
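As a sanity check of this proposition, one can approximate the defining integral numerically; the following Python sketch compares a Riemann-sum approximation of \(\Phi_X(\xi)\) with \(\mathrm e^{-\frac{\sigma^2}{2}\xi^2}\) (the value \(\sigma = 1.7\) is an arbitrary choice).

```python
import numpy as np

# Numerical sanity check: for the Gaussian density g_sigma,
# int exp(i xi x) g_sigma(x) dx should equal exp(-sigma^2 xi^2 / 2).
# The integral is approximated by a Riemann sum on a large truncated grid.
sigma = 1.7
x = np.linspace(-30, 30, 200001)
dx = x[1] - x[0]
g = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

for xi in [0.0, 0.3, 1.0, 2.0]:
    lhs = np.sum(np.exp(1j * xi * x) * g) * dx   # numerical Phi_X(xi)
    rhs = np.exp(-sigma**2 * xi**2 / 2)
    print(xi, lhs, rhs)                          # real parts agree, imaginary part ~ 0
```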
Thanks to the preceding computation, we can invert the Fourier transform in the following sense. For simplicity, set \(d = 1\); the case \(d > 1\) is done in exactly the same way.
Note that the function \(g_\sigma\) is (the density of) an approximate delta function (recall Example 4.8 (iii)).
For any finite complex measure \(\mu\) on \(\mathbb{R},\) define the mollified function \[\tag{4.9} f_\sigma(x) :=(g_\sigma * \mu)(x) = \int g_\sigma(x - y) \, \mu(\mathrm dy)\,.\] Then \[\tag{4.10} f_\sigma(x) = \frac{1}{2 \pi} \int \mathrm e^{-\mathrm i\xi x} \, \mathrm e^{-\frac{\sigma^2}{2} \xi^2} \, \widehat{\mu}(\xi) \, \mathrm d\xi\,.\]
Proof. By Proposition 4.15 with \(\sigma\) replaced by \(1/\sigma,\) we have \[\sigma \sqrt{2 \pi } g_\sigma(x) = \mathrm e^{-\frac{x^2}{2 \sigma^2}} = \int \mathrm e^{\mathrm i\xi x}\, g_{1/\sigma}(\xi) \, \mathrm d\xi\,.\] Hence, \[\begin{aligned} f_\sigma(x) &= \int g_\sigma(x - y) \, \mu(\mathrm dy) \\ &= \frac{1}{\sigma \sqrt{2 \pi}} \int \int \mathrm e^{\mathrm i\xi (x - y)}\, g_{1/\sigma}(\xi) \, \mathrm d\xi \, \mu(\mathrm dy) \\ &= \frac{1}{2 \pi} \int \int \mathrm e^{\mathrm i\xi (x - y)}\, \mathrm e^{-\frac{\sigma^2}{2} \xi^2} \, \mathrm d\xi \, \mu(\mathrm dy) \\ &= \frac{1}{2 \pi} \int \mathrm e^{\mathrm i\xi x}\, \mathrm e^{-\frac{\sigma^2}{2} \xi^2} \int \mathrm e^{-\mathrm i\xi y} \mu(\mathrm dy) \, \mathrm d\xi \\ &= \frac{1}{2 \pi} \int \mathrm e^{\mathrm i\xi x}\, \mathrm e^{-\frac{\sigma^2}{2} \xi^2} \, \widehat{\mu}(-\xi)\, \mathrm d\xi\,, \end{aligned}\] where in the fourth step we used Fubini’s theorem. The claim follows by the change of variables \(\xi \mapsto -\xi.\)
If the measure \(\mu\) is sufficiently regular, then the Fourier inversion formula takes on a simpler form because one can take the limit \(\sigma \to 0\) and hence get rid of the mollifiers \(g_\sigma.\) Suppose that \(\mu(\mathrm dx) = f(x) \, \mathrm dx\) has a continuous density \(f\) that also satisfies \(\widehat{f} :=\widehat{\mu} \in L^1.\) (The latter condition is true provided that \(f\) is smooth enough.) Then by taking \(\sigma \to 0\) in (4.10), using Example 4.8 (iii) on the left-hand side and dominated convergence on the right-hand side, we find the Fourier inversion formula for regular functions \[f(x) = \frac{1}{2 \pi} \int \mathrm e^{- \mathrm i\xi x} \, \widehat{f}(\xi) \, \mathrm d\xi\,,\] where we recall that the Fourier transformation is given by \[\widehat{f}(\xi) = \int \mathrm e^{\mathrm i\xi x} \, f(x) \, \mathrm dx\,.\] Therefore, inverse Fourier transformation is, up to a sign in the argument, simply Fourier transformation itself! Compare these expressions to the finite-dimensional ones from (4.7).
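The following Python sketch illustrates this inversion formula on the example \(f = g_1\): by the proposition above, \(\widehat{f}(\xi) = \mathrm e^{-\frac{\xi^2}{2}},\) and a numerical evaluation of the inversion integral indeed recovers the Gaussian density.

```python
import numpy as np

# Numerical illustration of the Fourier inversion formula for a regular density.
# Take f = g_1 (standard Gaussian density), whose transform is
# hat f(xi) = exp(-xi^2 / 2), and recover f from hat f.
xi = np.linspace(-30, 30, 200001)
dxi = xi[1] - xi[0]
f_hat = np.exp(-xi**2 / 2)

for x in [0.0, 0.5, 1.0, 2.0]:
    inv = np.sum(np.exp(-1j * xi * x) * f_hat) * dxi / (2 * np.pi)
    f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    print(x, inv, f)                             # inv ~ f(x), imaginary part ~ 0
```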
The characteristic function provides yet another, extremely useful, equivalent criterion for convergence in law of random variables (to complement Propositions 4.11 and 4.12) – pointwise convergence of the characteristic function.
Let \(\mu_n\) and \(\mu\) be probability measures on \(\mathbb{R}^d.\) Then \(\mu_n \overset{\mathrm w}{\longrightarrow}\mu\) if and only if \(\widehat{\mu}_n(\xi) \to \widehat{\mu}(\xi)\) for all \(\xi \in \mathbb{R}^d.\)
Proof. The “only if” implication is obvious by definition of weak convergence, since, for each \(\xi \in \mathbb{R}^d,\) the real and imaginary parts of the function \(x \mapsto \mathrm e^{\mathrm i\xi \cdot x}\) are continuous and bounded.
To prove the “if” implication, we again suppose for simplicity that \(d = 1\) (the case \(d > 1\) is very similar). Suppose therefore that \(\widehat{\mu}_n(\xi) \to \widehat{\mu}(\xi)\) for all \(\xi \in \mathbb{R}.\) For \(\varphi \in C_c(\mathbb{R})\) we have, by Fubini’s theorem and the symmetry \(g_\sigma(-x) = g_\sigma(x),\) \[\int g_\sigma * \varphi \, \mathrm d\mu = \int \varphi(x) \, (g_\sigma * \mu) (x) \, \mathrm dx\,.\] The function \(g_\sigma * \mu\) is simply (4.9), so that Lemma 4.16 yields \[\int g_\sigma * \varphi \, \mathrm d\mu = \int \varphi(x) \, \frac{1}{2 \pi} \int \mathrm e^{-\mathrm i\xi x} \, \mathrm e^{-\frac{\sigma^2}{2} \xi^2} \, \widehat{\mu}(\xi) \, \mathrm d\xi \, \mathrm dx\,.\] An analogous formula holds for \(\mu_n.\) By dominated convergence, for any \(\sigma > 0\) we have \[\int \mathrm e^{-\mathrm i\xi x} \, \mathrm e^{-\frac{\sigma^2}{2} \xi^2} \, \widehat{\mu}_n(\xi) \, \mathrm d\xi \longrightarrow \int \mathrm e^{-\mathrm i\xi x} \, \mathrm e^{-\frac{\sigma^2}{2} \xi^2} \, \widehat{\mu}(\xi) \, \mathrm d\xi\] as \(n \to \infty\) for all \(x,\) so that another application of dominated convergence (to the integral over \(x\)) yields, for all \(\varphi \in C_c,\) \[\tag{4.11} \int g_\sigma * \varphi \, \mathrm d\mu_n \longrightarrow \int g_\sigma * \varphi \, \mathrm d\mu\] as \(n \to \infty.\)
To conclude the argument, we define the space of functions \[H :=\{g_\sigma * \varphi \,\colon\sigma > 0, \varphi \in C_c\}\,.\] If we can prove that the closure of \(H\) under \(\lVert \cdot \rVert_{\infty}\) contains \(C_c,\) then the proof will be complete by applying Proposition 4.11 to (4.11).
What remains, therefore, is to prove that the closure of \(H\) under \(\lVert \cdot \rVert_{\infty}\) contains \(C_c.\) To that end, choose \(\varphi \in C_c\) and estimate \[\begin{aligned} \lVert g_\sigma * \varphi - \varphi \rVert_\infty &= \sup_x \biggl\lvert \int \frac{1}{\sigma \sqrt{2 \pi}} \, \mathrm e^{-\frac{y^2}{2 \sigma^2}} \bigl(\varphi(x - y) - \varphi(x)\bigr)\, \mathrm dy \biggr\rvert \\ &= \sup_x \biggl\lvert \int \frac{1}{\sqrt{2 \pi}} \, \mathrm e^{-\frac{y^2}{2}} \bigl(\varphi(x - \sigma y) - \varphi(x)\bigr)\, \mathrm dy \biggr\rvert\,. \end{aligned}\] Now let \(\varepsilon> 0\) and choose \(K > 0\) such that \[\int_{\lvert y \rvert > K} \frac{1}{\sqrt{2 \pi}} \, \mathrm e^{-\frac{y^2}{2}} \, \mathrm dy \leqslant\frac{\varepsilon}{\lVert \varphi \rVert_\infty}\,.\] Splitting the \(y\)-integration into \(\lvert y \rvert \leqslant K\) and \(\lvert y \rvert > K,\) we conclude that \[\lVert g_\sigma * \varphi - \varphi \rVert_\infty \leqslant \sup_x \biggl\lvert \int_{\lvert y \rvert \leqslant K} \frac{1}{\sqrt{2 \pi}} \, \mathrm e^{-\frac{y^2}{2}} \bigl(\varphi(x - \sigma y) - \varphi(x)\bigr)\, \mathrm dy \biggr\rvert + 2 \varepsilon\,.\] In the remaining integral, the shift \(\sigma y\) is bounded in absolute value by \(\sigma K,\) so that by uniform continuity of \(\varphi\) the first term on the right-hand side tends to zero as \(\sigma \to 0.\) Since \(\varepsilon > 0\) was arbitrary, we conclude that \(\lVert g_\sigma * \varphi - \varphi \rVert_\infty \to 0\) as \(\sigma \to 0,\) i.e. \(\varphi\) lies in the closure of \(H.\) This concludes the proof.
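As an illustration of this criterion (a standard example, not needed for the sequel), consider the law of rare events: if \(\mu_n\) is the binomial law with parameters \(n\) and \(\lambda/n,\) then \(\widehat{\mu}_n(\xi) = \bigl(1 - \tfrac{\lambda}{n} + \tfrac{\lambda}{n} \mathrm e^{\mathrm i\xi}\bigr)^n \to \mathrm e^{\lambda(\mathrm e^{\mathrm i\xi} - 1)},\) the characteristic function of the Poisson law with parameter \(\lambda,\) so that \(\mu_n\) converges weakly to this Poisson law. A short Python check of the pointwise convergence:

```python
import numpy as np

# The "law of rare events" via characteristic functions: the characteristic
# function of Binomial(n, lambda/n) converges pointwise to that of Poisson(lambda).
lam = 3.0
xi = 1.3

for n in [10, 100, 1000, 10000]:
    p = lam / n
    binom = (1 - p + p * np.exp(1j * xi)) ** n    # hat mu_n(xi)
    print(n, binom)

print("limit", np.exp(lam * (np.exp(1j * xi) - 1)))   # hat mu(xi) for Poisson(lambda)
```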
4.4 The central limit theorem
Another, perhaps more pragmatic, justification is that, if one does not know the distribution of a random variable under consideration, one has no choice but to guess, and the Gaussian is a particularly convenient guess. Even if this guess is not correct, in many applications the Gaussian is a good enough approximation.
As a consequence, some very complicated systems admit a remarkably simple emergent effective description on large scales, although the full analysis of their individual components is hopelessly complicated. An example is the derivation of the emergent laws of hydrodynamics from a microscopic theory of matter. This idea is also famously at the core of Isaac Asimov’s Foundation trilogy.
Let \(X_1, X_2, \dots\) be a sequence of independent identically distributed real-valued random variables in \(L^1.\) The strong law of large numbers states that \[\frac{1}{n}(X_1 + \cdots + X_n) \longrightarrow \mathbb{E}[X_1]\] almost surely as \(n \to \infty.\) It is natural to ask how fast this convergence takes place, i.e. what is the typical size, or scale, of \(\frac{1}{n}(X_1 + \cdots + X_n) - \mathbb{E}[X_1],\) as a function of \(n.\)
For \(X_1 \in L^2,\) the answer is easy. Indeed, since \[\mathbb{E}\bigl[(X_1 + \cdots + X_n - n \mathbb{E}[X_1])^2\bigr] = \mathop{\mathrm{Var}}(X_1 + \cdots + X_n) = n \mathop{\mathrm{Var}}(X_1)\,,\] we find that \[\tag{4.12} \frac{1}{\sqrt{n}} (X_1 + \cdots + X_n - n \mathbb{E}[X_1])\] is typically of order one (since the expectation of its square is equal to \(\mathop{\mathrm{Var}}(X_1),\) which does not depend on \(n\)).
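Here is a quick Python simulation of this scaling (an illustration only): for \(X_1\) uniform on \([0,1],\) the variance of the rescaled sum (4.12) stays close to \(\mathop{\mathrm{Var}}(X_1) = \frac{1}{12}\) for every \(n.\)

```python
import numpy as np

# The rescaled sum (4.12) has variance Var(X_1) for every n, so its typical
# size neither grows nor shrinks with n. Here X_1 ~ Uniform(0, 1), Var = 1/12.
rng = np.random.default_rng(1)

for n in [10, 100, 1000]:
    X = rng.random((10000, n))                     # 10000 independent samples of (X_1, ..., X_n)
    Z = (X.sum(axis=1) - n * 0.5) / np.sqrt(n)     # the quantity (4.12)
    print(n, Z.var())                              # ~ 1/12 ~ 0.0833 for every n
```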
The central limit theorem is a more precise version of this observation, as it even identifies the limiting law of (4.12).
Let \(X_1, X_2, \dots\) be independent identically distributed random variables in \(L^2,\) with variance \(\sigma^2.\) Then, as \(n \to \infty,\) the quantity (4.12) converges in law to a Gaussian random variable with mean zero and variance \(\sigma^2.\)
Proof. Using the technology of characteristic functions developed in the previous section, the proof is remarkably straightforward. First, without loss of generality we may suppose that \(\mathbb{E}[X_1] = 0\) (otherwise just replace \(X_n\) with \(X_n - \mathbb{E}[X_n]\)). Since \(X_1 \in L^2,\) we may differentiate \(\Phi_{X_1}\) twice under the expectation (justified by dominated convergence), so that a second-order Taylor expansion at \(\xi = 0\) yields \[\tag{4.13} \Phi_{X_1}(\xi) = 1 + \mathrm i\xi \, \mathbb{E}[X_1] - \frac{\xi^2}{2} \, \mathbb{E}[X_1^2] + o(\xi^2) = 1 - \frac{\sigma^2}{2} \xi^2 + o(\xi^2)\] as \(\xi \to 0.\)
Here we recall the “little-o” notation for some complex-valued function \(f\) and nonnegative function \(g\): “\(f(\xi) = o(g(\xi))\) as \(\xi \to 0\)” means that \(\lim_{\xi \to 0} \frac{f(\xi)}{g(\xi)} = 0\); informally: “\(f\) is much smaller than \(g\)”. Contrast this to the “big-O” notation: “\(f(\xi) = O(g(\xi))\)” means that \(\frac{\lvert f(\xi) \rvert}{g(\xi)} \leqslant C\) for all \(\xi\) in a neighbourhood of \(0,\) for some constant \(C\) independent of \(\xi\); informally: “\(f\) is not much larger than \(g\)”.
With \(Z_n :=\frac{X_1 + \cdots + X_n}{\sqrt{n}}\) we have, by independence of the variables \(X_1, \dots, X_n,\) \[\Phi_{Z_n}(\xi) = \mathbb{E}\biggl[\exp \biggl(\mathrm i\xi \frac{X_1 + \cdots + X_n}{\sqrt{n}}\biggr)\biggr] = \mathbb{E}[\exp(\mathrm i\xi X_1 / \sqrt{n})]^n = \Phi_{X_1}(\xi / \sqrt{n})^n\,.\] By (4.13), we therefore get, for any \(\xi \in \mathbb{R},\) \[\Phi_{Z_n}(\xi) = \biggl(1 - \frac{\sigma^2 \xi^2}{2 n} + o\biggl(\frac{\xi^2}{n}\biggr)\biggr)^n \longrightarrow \mathrm e^{-\frac{\sigma^2}{2} \xi^2}\] as \(n \to \infty.\) The claim now follows from Propositions 4.15 and 4.18.
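To see the convergence of the last display concretely, here is a small Python check for \(X_1\) uniform on \([-1, 1],\) in which case \(\Phi_{X_1}(\xi) = \frac{\sin \xi}{\xi}\) and \(\sigma^2 = \frac{1}{3}.\)

```python
import numpy as np

# Numerical illustration of Phi_{Z_n}(xi) = Phi_{X_1}(xi / sqrt(n))^n -> exp(-sigma^2 xi^2 / 2)
# for X_1 ~ Uniform(-1, 1), where Phi_{X_1}(xi) = sin(xi)/xi and sigma^2 = 1/3.
xi = 2.0

for n in [1, 10, 100, 1000, 10000]:
    u = xi / np.sqrt(n)
    phi_Zn = (np.sin(u) / u) ** n                 # characteristic function of Z_n at xi
    print(n, phi_Zn)

print("limit", np.exp(-xi**2 / 6))                # exp(-sigma^2 xi^2 / 2) with sigma^2 = 1/3
```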