4.5 The central limit theorem
Another, perhaps more pragmatic, justification is that, if one does not know the distribution of the random variable under consideration, one has no choice but to guess, and the Gaussian is a particularly convenient guess. Even if this guess is not correct, in many applications the Gaussian is a good enough approximation.
As a consequence, some very complicated systems admit a remarkably simple emergent effective description on large scales, even though the full analysis of their individual components is hopelessly intractable. An example is the derivation of the emergent laws of hydrodynamics from a microscopic theory of matter. This idea is also famously at the core of Isaac Asimov’s Foundation trilogy.
Let \(X_1, X_2, \dots\) be a sequence of independent identically distributed real-valued random variables in \(L^1.\) The law of large numbers states that \[\frac{1}{n}(X_1 + \cdots + X_n) \longrightarrow \mathbb{E}[X_1]\] almost surely as \(n \to \infty.\) It is natural to ask how fast this convergence takes place, i.e. what is the typical size, or scale, of \(\frac{1}{n}(X_1 + \cdots + X_n) - \mathbb{E}[X_1],\) as a function of \(n.\)
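Before turning to the question of the rate, here is a quick numerical illustration of the law of large numbers itself (this is not part of the argument; the Exp(1) distribution and the sample sizes are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample means of i.i.d. Exp(1) variables; E[X_1] = 1, so the running
# average should approach 1 as n grows (law of large numbers).
for n in [10**2, 10**4, 10**6]:
    x = rng.exponential(scale=1.0, size=n)
    print(n, x.mean())
```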
For \(X_1 \in L^2,\) the answer is easy. Indeed, since \[\mathbb{E}\bigl[(X_1 + \cdots + X_n - n \mathbb{E}[X_1])^2\bigr] = \mathop{\mathrm{Var}}(X_1 + \cdots + X_n) = n \mathop{\mathrm{Var}}(X_1)\,,\] we find that \[\tag{4.16} \frac{1}{\sqrt{n}} (X_1 + \cdots + X_n - n \mathbb{E}[X_1])\] is typically of order one (since the expectation of its square is equal to \(\mathop{\mathrm{Var}}(X_1),\) which does not depend on \(n\)).
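The same scale can be observed empirically. The following minimal sketch (again with Exp(1) samples as an arbitrary choice, so that \(\mathbb{E}[X_1] = \mathop{\mathrm{Var}}(X_1) = 1\)) estimates the variance of (4.16) over many repetitions; it should stay close to \(\mathop{\mathrm{Var}}(X_1)\) for every \(n\):

```python
import numpy as np

rng = np.random.default_rng(1)

# Variance of (X_1 + ... + X_n - n E[X_1]) / sqrt(n) over many repetitions.
# For X_i ~ Exp(1) we have E[X_1] = 1 and Var(X_1) = 1, so the printed
# values should stay close to 1 for every n.
n_trials = 10_000
for n in [10, 100, 1000]:
    samples = rng.exponential(scale=1.0, size=(n_trials, n))
    z = (samples.sum(axis=1) - n * 1.0) / np.sqrt(n)
    print(n, z.var())
```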
The central limit theorem is a more precise version of this observation, as it even identifies the limiting law of (4.16).
Let \(X_1, X_2, \dots\) be independent identically distributed random variables in \(L^2,\) with variance \(\sigma^2.\) Then, as \(n \to \infty,\) the quantity (4.16) converges in law to a Gaussian random variable with mean zero and variance \(\sigma^2.\)
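The convergence in law can also be seen numerically. Here is a short sketch (not part of the proof; the Exp(1) distribution, so \(\mathbb{E}[X_1] = \sigma^2 = 1\), and the sample sizes are illustrative choices) comparing the empirical distribution function of (4.16) with the distribution function of a centred Gaussian:

```python
import math
import numpy as np

rng = np.random.default_rng(2)

# Empirical CDF of Z_n = (X_1 + ... + X_n - n E[X_1]) / sqrt(n) versus the
# CDF of N(0, sigma^2).  Here X_i ~ Exp(1), so E[X_1] = sigma^2 = 1.
n, n_trials = 500, 20_000
samples = rng.exponential(scale=1.0, size=(n_trials, n))
z = (samples.sum(axis=1) - n) / np.sqrt(n)

def normal_cdf(t, sigma=1.0):
    return 0.5 * (1.0 + math.erf(t / (sigma * math.sqrt(2.0))))

for t in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(t, (z <= t).mean(), normal_cdf(t))  # empirical vs limiting CDF
```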
Proof. Using the technology of characteristic functions developed in the previous section, we can give a remarkably straightforward proof. First, without loss of generality we may suppose that \(\mathbb{E}[X_1] = 0\) (otherwise just replace \(X_n\) with \(X_n - \mathbb{E}[X_n]\)).
Here we recall the “little-o” notation for a complex-valued function \(f\) and a nonnegative function \(g\): “\(f(\xi) = o(g(\xi))\) as \(\xi \to 0\)” means that \(\lim_{\xi \to 0} \frac{f(\xi)}{g(\xi)} = 0\); informally: “\(f\) is much smaller than \(g\)”. Contrast this with the “big-O” notation: “\(f(\xi) = O(g(\xi))\)” means that \(\frac{\lvert f(\xi) \rvert}{g(\xi)} \leqslant C\) for some constant \(C\) independent of \(\xi\); informally: “\(f\) is not much larger than \(g\)”.
With \(Z_n := \frac{X_1 + \cdots + X_n}{\sqrt{n}}\) we have, by independence of the variables \(X_1, \dots, X_n,\) \[\Phi_{Z_n}(\xi) = \mathbb{E}\biggl[\exp \biggl(\mathrm i\xi \frac{X_1 + \cdots + X_n}{\sqrt{n}}\biggr)\biggr] = \mathbb{E}[\exp(\mathrm i\xi X_1 / \sqrt{n})]^n = \Phi_{X_1}(\xi / \sqrt{n})^n\,.\] By (4.17), we therefore get, for any \(\xi \in \mathbb{R},\) \[\Phi_{Z_n}(\xi) = \biggl(1 - \frac{\sigma^2 \xi^2}{2 n} + o\biggl(\frac{\xi^2}{n}\biggr)\biggr)^n \longrightarrow \mathrm e^{-\frac{\sigma^2}{2} \xi^2}\] as \(n \to \infty.\) The claim now follows from Propositions 4.20 and 4.23.
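The key limit in the proof can also be checked numerically. In the sketch below (an illustration only), we take \(X_1 = E - 1\) with \(E\) exponential of rate one, so that \(\mathbb{E}[X_1] = 0,\) \(\sigma^2 = 1,\) and the characteristic function is \(\Phi_{X_1}(\xi) = \mathrm e^{-\mathrm i \xi}/(1 - \mathrm i \xi)\):

```python
import numpy as np

# Numerical check of Phi_{Z_n}(xi) = Phi_{X_1}(xi / sqrt(n))^n
#   -> exp(-sigma^2 * xi^2 / 2).
# Here X_1 = E - 1 with E ~ Exp(1), so E[X_1] = 0, sigma^2 = 1, and
# Phi_{X_1}(xi) = exp(-i*xi) / (1 - i*xi).
def phi_x1(xi):
    return np.exp(-1j * xi) / (1.0 - 1j * xi)

xi = 1.5
for n in [10, 100, 10_000]:
    print(n, phi_x1(xi / np.sqrt(n)) ** n)   # approaches exp(-xi^2 / 2)
print("limit", np.exp(-xi**2 / 2))
```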
This is the end of this course. I hope you enjoyed it!
Now you know all of the fundamentals of probability. If you liked what you learned (as I hope!), you are fully equipped to go on and learn about more advanced topics such as Markov chains and martingales.