4.3 Characteristic function

The term characteristic function is used in probability theory to denote the Fourier transform of a law. As we shall see, it is a beautiful and incredibly powerful tool. Before giving the precise definition, it is helpful (also for your general mathematical culture) to review the key ideas and definitions of Fourier analysis. For any \(\xi \in \mathbb{R}^d,\) we define the plane wave to be the function \(\mathbb{R}^d \to \mathbb{C}\) given by \[x \mapsto \mathrm e^{-\mathrm i\xi \cdot x}\,.\] To understand the term plane wave, simply decompose \(\mathrm e^{-\mathrm i\xi \cdot x}\) into its real and imaginary parts and plot these as functions of \(x\) (for instance for \(d = 2\)): you will see a series of parallel waves, like those at open sea far from the shore.

The main idea behind Fourier analysis is that any function can be represented as a superposition of plane waves and the corresponding coefficients are explicitly computable. This is very plainly illustrated in the following finite-dimensional setting. For each \(N \in \mathbb{N}^*,\) define the discrete cube \[\Lambda :=\{0, 1, \dots, N-1\}^d\] and the dual cube \[\Lambda^* :=\frac{2 \pi}{N} \Lambda\,.\] Consider the finite-dimensional complex Hilbert spaces \(V :=\mathbb{C}^{\Lambda}\) and \(V^* :=\mathbb{C}^{\Lambda^*}.\) We use the notations \(f = (f(x))_{x \in \Lambda} \in V\) and \(f = (f(\xi))_{\xi \in \Lambda^*} \in V^*\) for vectors in these spaces. They carry the complex inner products \[\langle f \mspace{2mu}, g\rangle_{V} :=\sum_{x \in \Lambda} \overline{f(x)} \!\, g(x)\,, \qquad \langle f \mspace{2mu}, g\rangle_{V^*} :=\sum_{\xi \in \Lambda^*} \overline{f(\xi)} \!\, g(\xi)\,.\]

For any \(\xi \in \Lambda^*\) we define the vector \(e_\xi \in V\) as the normalized plane wave \[e_\xi(x) :=\frac{1}{N^{d/2}} \, \mathrm e^{- \mathrm i\xi \cdot x}\,.\] Now the truly wonderful fact is that the family \((e_\xi)_{\xi \in \Lambda^*}\) is an orthonormal basis of \(V\)! I strongly recommend that you check this carefully; it is a simple exercise using finite geometric series.
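
As a quick sanity check of the orthonormality claim, here is a minimal numerical sketch for \(d = 1\) and an arbitrarily chosen size \(N = 8\) (all names and parameters below are my own choices): it assembles the plane waves into a matrix and verifies that their Gram matrix is the identity.

```python
import numpy as np

N = 8
x = np.arange(N)                      # the discrete cube Lambda = {0, ..., N-1}, d = 1
xi = 2 * np.pi * x / N                # the dual cube Lambda^*

# Columns are the normalized plane waves e_xi(x) = N^{-1/2} exp(-i xi x).
E = np.exp(-1j * np.outer(x, xi)) / np.sqrt(N)

# The Gram matrix <e_xi, e_eta>_V should be the identity matrix.
gram = E.conj().T @ E
print(np.allclose(gram, np.eye(N)))   # True
```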

The Fourier transform of a vector \(f \in V\) is the vector \(\widehat{f} \in V^*\) defined by \[\tag{4.5} \widehat{f}(\xi) :=\langle e_\xi \mspace{2mu}, f\rangle_{V}\,.\] In other words, Fourier transformation is nothing but a change of basis from one orthonormal basis (the standard basis of \(\mathbb{C}^{\Lambda}\)) to another orthonormal basis (the basis \((e_\xi)\)). Hence, we can write \(f\) as a superposition of plane waves, \[\tag{4.6} f = \sum_{\xi \in \Lambda^*} \widehat{f}(\xi) \, e_\xi\,.\] The relations (4.5) and (4.6) can be written explicitly as \[\tag{4.7} \widehat{f}(\xi) = \frac{1}{N^{d/2}} \sum_{x \in \Lambda} \mathrm e^{\mathrm i\xi \cdot x} \, f(x)\,, \qquad f(x) = \frac{1}{N^{d/2}} \sum_{\xi \in \Lambda^*} \mathrm e^{-\mathrm i\xi \cdot x} \, \widehat{f}(\xi)\,,\] respectively. The former is usually called the Fourier transform and the latter the inverse Fourier transform. Remarkably, they have almost exactly the same form (up to the sign of the argument).
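
The two formulas in (4.7) can also be checked numerically. The following sketch (the size and the random test vector are arbitrary choices) implements them directly for \(d = 1\) and confirms that the inverse transform recovers \(f\); note that library FFT routines typically use the opposite sign convention and a different normalisation.

```python
import numpy as np

N = 16
x = np.arange(N)
xi = 2 * np.pi * x / N

rng = np.random.default_rng(0)
f = rng.normal(size=N) + 1j * rng.normal(size=N)      # an arbitrary vector in V

# Fourier transform, first formula in (4.7).
f_hat = np.exp(1j * np.outer(xi, x)) @ f / np.sqrt(N)

# Inverse Fourier transform, second formula in (4.7).
f_back = np.exp(-1j * np.outer(x, xi)) @ f_hat / np.sqrt(N)

print(np.allclose(f_back, f))                          # True
```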

Summarising, Fourier transformation can be viewed as simply a change of orthonormal basis. This picture is somewhat complicated by the fact that, as in this class, it is often applied in infinite dimensions, which leads to analytic complications (see e.g. the precise statement of Lemma 4.16 below, as well as Remark 4.17 for a simplified formulation under stronger analytic assumptions). It is a tremendously useful tool for many reasons. One such reason is that it diagonalises all constant-coefficient differential operators (to see why, you can immediately check that differentiating the plane wave \(x \mapsto \mathrm e^{-\mathrm i\xi x}\) in \(d = 1\) gives \(-\mathrm i\xi\) times the same plane wave, so that a plane wave is an eigenfunction of the derivative operator). As a consequence, it is the most important and celebrated tool in all of analysis, upon which basically the entire modern theory of partial differential equations is founded. In this section we shall see other remarkable properties that make it particularly useful in probability theory. For another application, see Example 5.21 below.
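
The eigenfunction property of plane waves mentioned above can be verified symbolically; here is a throwaway check in \(d = 1\), purely as an illustration.

```python
import sympy as sp

x, xi = sp.symbols('x xi', real=True)
wave = sp.exp(-sp.I * xi * x)                                # plane wave in d = 1

# d/dx applied to the plane wave, minus (-i xi) times it, should vanish identically.
print(sp.simplify(sp.diff(wave, x) - (-sp.I * xi) * wave))   # 0
```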

Let us now bring this introductory digression to a close and return to probability theory. We begin with the following definition.

Definition 4.14

  1. Let \(X\) be an \(\mathbb{R}^d\)-valued random variable. Define the characteristic function of \(X\), denoted by \(\Phi_X \,\colon\mathbb{R}^d \to \mathbb{C},\) as the Fourier transform of its law \(\mathbb{P}_X.\) That is, \[\Phi_X(\xi) = \widehat{\mathbb{P}}_X(\xi) = \int \mathrm e^{\mathrm i\xi \cdot x} \, \mathbb{P}_X(\mathrm dx) = \mathbb{E}[\mathrm e^{\mathrm i\xi \cdot X}]\,.\]

By dominated convergence, \(\Phi_X \in C_b(\mathbb{R}^d).\)
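
Definition 4.14 translates directly into a Monte Carlo estimator \(\frac{1}{n} \sum_k \mathrm e^{\mathrm i\xi X_k}\). As a quick illustration (the distribution and sample size below are arbitrary choices of mine), the following sketch estimates \(\Phi_X\) for an exponential random variable with rate \(1\) and compares it with the known closed form \(1/(1 - \mathrm i\xi)\).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.exponential(scale=1.0, size=200_000)   # samples of X ~ Exp(1)

xi = np.linspace(-5, 5, 11)
# Phi_X(xi) = E[exp(i xi X)], estimated by an empirical average over the samples.
phi_mc = np.exp(1j * np.outer(xi, X)).mean(axis=1)

phi_exact = 1.0 / (1.0 - 1j * xi)              # characteristic function of Exp(1)
print(np.max(np.abs(phi_mc - phi_exact)))      # small, of order 1/sqrt(200000)
```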

The most important observation in all of Fourier analysis is the following computation for a Gaussian. For \(\sigma > 0,\) define \[\tag{4.8} g_\sigma(x) :=\frac{1}{\sigma \sqrt{2 \pi}}\,\mathrm e^{-\frac{x^2}{2 \sigma^2}}\,,\] the density of the Gaussian law with mean zero and variance \(\sigma^2.\)

Proposition 4.15

Let \(X \in \mathbb{R}\) be a Gaussian random variable with law \(g_\sigma(x) \, \mathrm dx.\) Then \[\Phi_X(\xi) = \mathrm e^{-\frac{\sigma^2}{2} \xi^2}\,.\]

Proof. By definition, \[\Phi_X(\xi) = \int \frac{1}{\sigma \sqrt{2 \pi}} \, \mathrm e^{-\frac{x^2}{2 \sigma^2}}\, \mathrm e^{\mathrm i\xi x}\, \mathrm dx\,.\] By the change of variables \(x \mapsto \sigma x,\) we find \(\Phi_X(\xi) = f(\sigma \xi),\) where \[f(\xi) :=\int \frac{1}{\sqrt{2 \pi}} \, \mathrm e^{-\frac{x^2}{2}}\, \mathrm e^{\mathrm i\xi x}\, \mathrm dx\,,\] so that it suffices to compute \(f.\) Differentiating under the integral and then integrating by parts, we find \[\begin{aligned} f'(\xi) &= \int \frac{1}{\sqrt{2 \pi}} \, \mathrm e^{-\frac{x^2}{2}}\, \mathrm ix\, \mathrm e^{\mathrm i\xi x}\, \mathrm dx \\ &= \int \frac{1}{\sqrt{2 \pi}} \, (- \mathrm i) \partial_x \Bigl(\mathrm e^{-\frac{x^2}{2}}\Bigr)\, \mathrm e^{\mathrm i\xi x}\, \mathrm dx \\ &= \int \frac{1}{\sqrt{2 \pi}} \, (- 1) \Bigl(\mathrm e^{-\frac{x^2}{2}}\Bigr)\, \xi \, \mathrm e^{\mathrm i\xi x}\, \mathrm dx \\ &= -\xi f(\xi)\,. \end{aligned}\] Thus, \(f\) satisfies the ordinary differential equation \[\begin{cases} f(0)= 1 \\ f'(\xi) = - \xi f(\xi)\,. \end{cases}\] As seen in analysis (since \(f'\) is a Lipschitz continuous function of \(f\)), this equation has a unique solution, \(f(\xi) = \mathrm e^{-\frac{\xi^2}{2}}.\) Hence \(\Phi_X(\xi) = f(\sigma \xi) = \mathrm e^{-\frac{\sigma^2}{2} \xi^2},\) as claimed.
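
Proposition 4.15 is also easy to check numerically by quadrature of the defining integral; here is a short sketch (the value of \(\sigma\) is arbitrary).

```python
import numpy as np
from scipy.integrate import quad

sigma = 1.7

def phi(xi):
    # E[exp(i xi X)] for X ~ N(0, sigma^2), split into real and imaginary parts.
    density = lambda x: np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    re, _ = quad(lambda x: density(x) * np.cos(xi * x), -np.inf, np.inf)
    im, _ = quad(lambda x: density(x) * np.sin(xi * x), -np.inf, np.inf)
    return re + 1j * im

for xi in [0.0, 0.5, 1.0, 2.0]:
    # Compare with the closed form exp(-sigma^2 xi^2 / 2) from Proposition 4.15.
    print(abs(phi(xi) - np.exp(-sigma**2 * xi**2 / 2)))   # essentially zero
```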

Thanks to the preceding computation, we can invert the Fourier transform in the following sense. Let \(\mu\) be a finite complex measure on \(\mathbb{R}^d.\) For simplicity, set \(d = 1\); the case \(d > 1\) is done in exactly the same way.

Since the measure \(\mu\) can be quite rough (it need not have a density), it is very helpful to mollify it by convolving it (recall Definition 3.21 and Remark 3.22) with the smooth Gaussian density (4.8). This convolution has density \[\tag{4.9} f_\sigma(x) :=\int g_\sigma(x - y) \, \mu(\mathrm dy)\,.\]
Lemma 4.16 • Fourier inversion formula for measures

For any finite complex measure \(\mu\) on \(\mathbb{R},\) we have \[\tag{4.10} f_\sigma(x) = \frac{1}{2 \pi} \int \mathrm e^{-\mathrm i\xi x} \, \mathrm e^{-\frac{\sigma^2}{2} \xi^2} \, \widehat{\mu}(\xi) \, \mathrm d\xi\,.\]

Proof. By Proposition 4.15 with \(\sigma\) replaced by \(1/\sigma,\) we have \[\sigma \sqrt{2 \pi } g_\sigma(x) = \mathrm e^{-\frac{x^2}{2 \sigma^2}} = \int \mathrm e^{\mathrm i\xi x}\, g_{1/\sigma}(\xi) \, \mathrm d\xi\,.\] Hence, \[\begin{aligned} f_\sigma(x) &= \int g_\sigma(x - y) \, \mu(\mathrm dy) \\ &= \frac{1}{\sigma \sqrt{2 \pi}} \int \int \mathrm e^{\mathrm i\xi (x - y)}\, g_{1/\sigma}(\xi) \, \mathrm d\xi \, \mu(\mathrm dy) \\ &= \frac{1}{2 \pi} \int \int \mathrm e^{\mathrm i\xi (x - y)}\, \mathrm e^{-\frac{\sigma^2}{2} \xi^2} \, \mathrm d\xi \, \mu(\mathrm dy) \\ &= \frac{1}{2 \pi} \int \mathrm e^{\mathrm i\xi x}\, \mathrm e^{-\frac{\sigma^2}{2} \xi^2} \int \mathrm e^{-\mathrm i\xi y} \mu(\mathrm dy) \, \mathrm d\xi \\ &= \frac{1}{2 \pi} \int \mathrm e^{\mathrm i\xi x}\, \mathrm e^{-\frac{\sigma^2}{2} \xi^2} \, \widehat{\mu}(-\xi)\, \mathrm d\xi\,, \end{aligned}\] where in the fourth step we used Fubini’s theorem. The claim follows by the change of variables \(\xi \mapsto -\xi.\)
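
Lemma 4.16 is easy to test on a measure without a density. In the sketch below (a toy example of my own), \(\mu = \frac{1}{2}\delta_0 + \frac{1}{2}\delta_1,\) so that \(\widehat{\mu}(\xi) = \frac{1}{2}(1 + \mathrm e^{\mathrm i\xi}),\) and the right-hand side of (4.10) is computed by numerical quadrature and compared with (4.9).

```python
import numpy as np
from scipy.integrate import quad

sigma = 0.3
g = lambda x: np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# mu = (1/2) delta_0 + (1/2) delta_1 (a fair coin flip); its Fourier transform:
mu_hat = lambda xi: 0.5 * (1.0 + np.exp(1j * xi))

def f_sigma_via_inversion(x):
    # Right-hand side of (4.10); the imaginary part integrates to zero since f_sigma is real.
    integrand = lambda xi: np.real(np.exp(-1j * xi * x)
                                   * np.exp(-sigma**2 * xi**2 / 2) * mu_hat(xi))
    val, _ = quad(integrand, -np.inf, np.inf)
    return val / (2 * np.pi)

for x in [-0.5, 0.0, 0.7, 1.0, 2.0]:
    direct = 0.5 * g(x) + 0.5 * g(x - 1.0)            # (4.9) computed directly
    print(abs(f_sigma_via_inversion(x) - direct))     # essentially zero
```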

Remark 4.17

If the measure \(\mu\) is sufficiently regular, then the Fourier inversion formula takes on a simpler form because one can take the limit \(\sigma \to 0\) and hence get rid of the mollifiers \(g_\sigma.\) Suppose that \(\mu(\mathrm dx) = f(x) \, \mathrm dx\) has a continuous density \(f\) that also satisfies \(\widehat{f} :=\widehat{\mu} \in L^1.\) (The latter condition is true provided that \(f\) is smooth enough.) Then by taking \(\sigma \to 0\) in (4.10), using Example 4.8 (iii) on the left-hand side and dominated convergence on the right-hand side, we find the Fourier inversion formula for regular functions \[f(x) = \frac{1}{2 \pi} \int \mathrm e^{- \mathrm i\xi x} \, \widehat{f}(\xi) \, \mathrm d\xi\,,\] where we recall that the Fourier transformation is given by \[\widehat{f}(\xi) = \int \mathrm e^{\mathrm i\xi x} \, f(x) \, \mathrm dx\,.\] Therefore, inverse Fourier transformation is, up to a sign in the argument, simply Fourier transformation itself! Compare these expressions to the finite-dimensional ones from (4.7).
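
The regular-case inversion formula can be illustrated with the Cauchy density \(f(x) = \frac{1}{\pi(1 + x^2)},\) whose Fourier transform \(\widehat{f}(\xi) = \mathrm e^{-\lvert \xi \rvert}\) is indeed in \(L^1.\) The following numerical sketch (entirely illustrative) recovers \(f\) from \(\widehat{f}.\)

```python
import numpy as np
from scipy.integrate import quad

f = lambda x: 1.0 / (np.pi * (1.0 + x**2))   # Cauchy density, continuous
f_hat = lambda xi: np.exp(-np.abs(xi))       # its Fourier transform, which lies in L^1

def f_inverted(x):
    # Fourier inversion formula for regular functions; the integrand is even in xi,
    # so we integrate cos(xi x) f_hat(xi) over [0, inf) and multiply by 2 / (2 pi).
    val, _ = quad(lambda xi: np.cos(xi * x) * f_hat(xi), 0, np.inf)
    return val / np.pi

for x in [0.0, 0.5, 1.0, 3.0]:
    print(abs(f_inverted(x) - f(x)))         # essentially zero
```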

The characteristic function provides yet another, extremely useful, equivalent criterion for convergence in law of random variables (to complement Propositions 4.11 and 4.12) – pointwise convergence of the characteristic function.

Proposition 4.18

Let \(\mu_n\) and \(\mu\) be probability measures on \(\mathbb{R}^d.\) Then \(\mu_n \overset{\mathrm w}{\longrightarrow}\mu\) if and only if \(\widehat{\mu}_n(\xi) \to \widehat{\mu}(\xi)\) for all \(\xi \in \mathbb{R}^d.\)

Proof. The “only if” implication is immediate from the definition of weak convergence, since for each \(\xi \in \mathbb{R}^d\) the real and imaginary parts of the function \(x \mapsto \mathrm e^{\mathrm i\xi \cdot x}\) are continuous and bounded.

To prove the “if” implication, we again suppose for simplicity that \(d = 1\) (the case \(d > 1\) is very similar). Suppose therefore that \(\widehat{\mu}_n(\xi) \to \widehat{\mu}(\xi)\) for all \(\xi \in \mathbb{R}.\) For \(\varphi \in C_c(\mathbb{R})\) we have, by Fubini’s theorem and the symmetry \(g_\sigma(-x) = g_\sigma(x),\) \[\int g_\sigma * \varphi \, \mathrm d\mu = \int \varphi(x) \, (g_\sigma * \mu) (x) \, \mathrm dx\,.\] The function \(g_\sigma * \mu\) is simply (4.9), so that Lemma 4.16 yields \[\int g_\sigma * \varphi \, \mathrm d\mu = \int \varphi(x) \, \frac{1}{2 \pi} \int \mathrm e^{-\mathrm i\xi x} \, \mathrm e^{-\frac{\sigma^2}{2} \xi^2} \, \widehat{\mu}(\xi) \, \mathrm d\xi \, \mathrm dx\,.\] An analogous formula holds for \(\mu_n.\) By dominated convergence (using that \(\lvert \widehat{\mu}_n \rvert \leqslant 1\) since the \(\mu_n\) are probability measures, and that \(\mathrm e^{-\frac{\sigma^2}{2} \xi^2}\) is integrable), for any \(\sigma > 0\) we have \[\int \mathrm e^{-\mathrm i\xi x} \, \mathrm e^{-\frac{\sigma^2}{2} \xi^2} \, \widehat{\mu}_n(\xi) \, \mathrm d\xi \longrightarrow \int \mathrm e^{-\mathrm i\xi x} \, \mathrm e^{-\frac{\sigma^2}{2} \xi^2} \, \widehat{\mu}(\xi) \, \mathrm d\xi\] as \(n \to \infty\) for all \(x,\) so that another application of dominated convergence (to the integral over \(x\)) yields, for all \(\varphi \in C_c,\) \[\tag{4.11} \int g_\sigma * \varphi \, \mathrm d\mu_n \longrightarrow \int g_\sigma * \varphi \, \mathrm d\mu\] as \(n \to \infty.\)

To conclude the argument, we define the space of functions \[H :=\{g_\sigma * \varphi \,\colon\sigma > 0, \varphi \in C_c\}\,.\] If we can prove that the closure of \(H\) under \(\lVert \cdot \rVert_{\infty}\) contains \(C_c,\) then the proof will be complete by applying Proposition 4.11 to (4.11).

What remains, therefore, is to prove that the closure of \(H\) under \(\lVert \cdot \rVert_{\infty}\) contains \(C_c.\) To that end, choose \(\varphi \in C_c\) and estimate \[\begin{aligned} \lVert g_\sigma * \varphi - \varphi \rVert_\infty &= \sup_x \biggl\lvert \int \frac{1}{\sigma \sqrt{2 \pi}} \, \mathrm e^{-\frac{y^2}{2 \sigma^2}} \bigl(\varphi(x - y) - \varphi(x)\bigr)\, \mathrm dy \biggr\rvert \\ &= \sup_x \biggl\lvert \int \frac{1}{\sqrt{2 \pi}} \, \mathrm e^{-\frac{y^2}{2}} \bigl(\varphi(x - \sigma y) - \varphi(x)\bigr)\, \mathrm dy \biggr\rvert\,. \end{aligned}\] Now let \(\varepsilon> 0\) and choose \(K > 0\) such that \[\int_{\lvert y \rvert > K} \frac{1}{\sqrt{2 \pi}} \, \mathrm e^{-\frac{y^2}{2}} \, \mathrm dy \leqslant\frac{\varepsilon}{\lVert \varphi \rVert_\infty}\,.\] Splitting the \(y\)-integration into \(\lvert y \rvert \leqslant K\) and \(\lvert y \rvert > K,\) we conclude that \[\lVert g_\sigma * \varphi - \varphi \rVert_\infty \leqslant \sup_x \biggl\lvert \int_{\lvert y \rvert \leqslant K} \frac{1}{\sqrt{2 \pi}} \, \mathrm e^{-\frac{y^2}{2}} \bigl(\varphi(x - \sigma y) - \varphi(x)\bigr)\, \mathrm dy \biggr\rvert + 2 \varepsilon\,.\] On the domain of integration, \(\lvert \sigma y \rvert\) is bounded by \(\sigma K,\) so that by uniform continuity of \(\varphi\) the first term on the right-hand side tends to zero as \(\sigma \to 0.\) Since \(\varepsilon > 0\) was arbitrary, we conclude that \(\lVert g_\sigma * \varphi - \varphi \rVert_\infty \to 0\) as \(\sigma \to 0,\) so that \(\varphi\) lies in the closure of \(H.\) This concludes the proof.

4.4 The central limit theorem

The central limit theorem is, together with the law of large numbers, one of the two most fundamental results in probability. It states that the sum of a large number of independent identically distributed random variables has approximately a Gaussian distribution, no matter what the distribution of these variables is. This provides at least a partial theoretical justification for the ubiquity of the Gaussian distribution in probability and statistics. It is also the first instance of a remarkable phenomenon in probability and statistical physics called universality: if you take a complicated system made up of many small parts, the behaviour of the system on large scales is universal in the sense that it does not depend on the details of the individual parts. In this instance, the universal behaviour is the Gaussian distribution of the sum, no matter the distribution of the individual random variables.

Let \(X_1, X_2, \dots\) be a sequence of independent identically distributed real-valued random variables in \(L^1.\) The strong law of large numbers states that \[\frac{1}{n}(X_1 + \cdots + X_n) \longrightarrow \mathbb{E}[X_1]\] almost surely as \(n \to \infty.\) It is natural to ask how fast this convergence takes place, i.e. what is the typical size, or scale, of \(\frac{1}{n}(X_1 + \cdots + X_n) - \mathbb{E}[X_1],\) as a function of \(n.\)

For \(X_1 \in L^2,\) the answer is easy. Indeed, since \[\mathbb{E}\bigl[(X_1 + \cdots + X_n - n \mathbb{E}[X_1])^2\bigr] = \mathop{\mathrm{Var}}(X_1 + \cdots + X_n) = n \mathop{\mathrm{Var}}(X_1)\,,\] we find that \[\tag{4.12} \frac{1}{\sqrt{n}} (X_1 + \cdots + X_n - n \mathbb{E}[X_1])\] is typically of order one (since the expectation of its square is equal to \(\mathop{\mathrm{Var}}(X_1),\) which does not depend on \(n\)).
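
This scaling is easy to see in a small simulation (distribution, sample sizes and number of repetitions below are arbitrary choices): the empirical variance of (4.12) stays of order one as \(n\) grows.

```python
import numpy as np

rng = np.random.default_rng(2)

for n in [100, 1_000, 10_000]:
    # 1,000 independent copies of the rescaled sum (4.12) for Exp(1) variables,
    # for which E[X_1] = 1 and Var(X_1) = 1.
    X = rng.exponential(size=(1_000, n))
    S = (X.sum(axis=1) - n * 1.0) / np.sqrt(n)
    print(n, S.var())     # stays close to Var(X_1) = 1 for every n
```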

The central limit theorem is a more precise version of this observation, as it even identifies the limiting law of (4.12).

Proposition 4.19 • Central limit theorem

Let \(X_1, X_2, \dots\) be independent identically distributed random variables in \(L^2,\) with variance \(\sigma^2.\) Then, as \(n \to \infty,\) the quantity (4.12) converges in law to a Gaussian random variable with mean zero and variance \(\sigma^2.\)

Proof. Using the technology of characteristic functions developed in the previous section, the proof is remarkably straightforward. First, without loss of generality we may suppose that \(\mathbb{E}[X_1] = 0\) (otherwise just replace \(X_n\) with \(X_n - \mathbb{E}[X_n]\)).

We shall use that for any random variable \(X \in L^2\) we have \[\tag{4.13} \Phi_X(\xi) = 1 + \mathrm i\xi \mathbb{E}[X] - \frac{1}{2} \xi^2 \mathbb{E}[X^2] + o(\xi^2)\] as \(\xi \to 0.\) To show (4.13), we differentiate under the expectation, using that \(X \in L^2,\) to obtain \[\Phi_X'(\xi) = \mathrm i\, \mathbb{E}[X \, \mathrm e^{\mathrm i\xi X}]\,,\] and differentiating again yields \[\Phi_X''(\xi) = - \mathbb{E}[X^2 \, \mathrm e^{\mathrm i\xi X}]\,.\] Note that differentiating inside the expectation is allowed since \(X \in L^2.\) By Taylor’s theorem, we therefore have \[\begin{aligned} \Phi_X(\xi) &= 1 + \mathrm i\, \mathbb{E}[X] \, \xi - \int_0^\xi \, \mathbb{E}[X^2 \, \mathrm e^{\mathrm it X}] \, (\xi - t) \, \mathrm dt \\ &= 1 + \mathrm i\, \mathbb{E}[X] \, \xi - \frac{1}{2} \xi^2 \mathbb{E}[X^2] - \int_0^\xi \, \mathbb{E}[X^2 \, (\mathrm e^{\mathrm it X} - 1)] \, (\xi - t) \, \mathrm dt\,. \end{aligned}\] The expectation under the last integral tends to zero as \(t \to 0,\) by the dominated convergence theorem. Hence, the whole integral is \(o(\xi^2),\) and we obtain (4.13).

With \(Z_n :=\frac{X_1 + \cdots + X_n}{\sqrt{n}}\) we have, by independence of the variables \(X_1, \dots, X_n,\) \[\Phi_{Z_n}(\xi) = \mathbb{E}\biggl[\exp \biggl(\mathrm i\xi \frac{X_1 + \cdots + X_n}{\sqrt{n}}\biggr)\biggr] = \mathbb{E}[\exp(\mathrm i\xi X_1 / \sqrt{n})]^n = \Phi_{X_1}(\xi / \sqrt{n})^n\,.\] By (4.13), we therefore get, for any \(\xi \in \mathbb{R},\) \[\Phi_{Z_n}(\xi) = \biggl(1 - \frac{\sigma^2 \xi^2}{2 n} + o\biggl(\frac{\xi^2}{n}\biggr)\biggr)^n \longrightarrow \mathrm e^{-\frac{\sigma^2}{2} \xi^2}\] as \(n \to \infty.\) The claim now follows from Propositions 4.15 and 4.18.
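
To see the theorem in action, here is a quick simulation (with my own choice of distribution and sizes): it compares the empirical characteristic function of the rescaled sum (4.12) for Uniform\((0,1)\) variables, whose variance is \(1/12,\) with the Gaussian limit \(\mathrm e^{-\xi^2/24}.\)

```python
import numpy as np

rng = np.random.default_rng(3)
n, copies = 500, 20_000
sigma2 = 1.0 / 12.0                          # variance of Uniform(0, 1)

# 'copies' independent realisations of the rescaled sum (4.12).
U = rng.random((copies, n))
Z = (U.sum(axis=1) - n * 0.5) / np.sqrt(n)   # E[X_1] = 1/2 for Uniform(0, 1)

xi = np.linspace(-4, 4, 9)
phi_emp = np.exp(1j * np.outer(xi, Z)).mean(axis=1)   # empirical characteristic function
phi_gauss = np.exp(-sigma2 * xi**2 / 2)               # limiting characteristic function
print(np.max(np.abs(phi_emp - phi_gauss)))            # small (Monte Carlo error)
```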
