4 Convergence of random variables
In this chapter we study the convergence of random variables in detail. We shall study the most important notions of convergence: almost surely, in probability, in \(L^p,\) and in law.
4.1 Notions of convergence
Let \((X_n)_{n \in \mathbb{N}^*}\) and \(X\) be random variables with values in \(\mathbb{R}.\) In this section we wish to understand different notions of the convergence \(X_n \to X\) and the logical implications between them.
Recall that we have already seen two notions of convergence:
\(X_n \overset{\text{a.s.}}{\longrightarrow}X\) if \(\mathbb{P}(\lim_n X_n = X) = 1.\)
\(X_n \overset{L^p}{\longrightarrow}X\) if \(\lim_n \mathbb{E}[\lvert X_n - X \rvert^p] = 0.\)
The following definition is very useful, and specific to probability.
The random variables \(X_n\) converge in probability to \(X,\) denoted \(X_n \overset{\mathbb{P}}{\longrightarrow}X,\) if for all \(\varepsilon> 0\) we have \[\lim_n \mathbb{P}(\lvert X_n - X \rvert > \varepsilon) = 0\,.\]
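As a simple illustration, suppose that \(X_n\) takes the value \(n\) with probability \(\frac1n\) and the value \(0\) otherwise. Then for any \(\varepsilon> 0\) we have \[\mathbb{P}(\lvert X_n - 0 \rvert > \varepsilon) \leqslant\mathbb{P}(X_n = n) = \frac{1}{n} \longrightarrow 0\,,\] so that \(X_n \overset{\mathbb{P}}{\longrightarrow}0,\) although \(\mathbb{E}[\lvert X_n - 0 \rvert] = 1\) for all \(n,\) so that \(X_n\) does not converge to \(0\) in \(L^1.\)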
It is often useful to observe that this notion of convergence is metrisable, i.e. it arises from a metric on the space of all random variables (identified up to almost sure equality).
Let \(\mathcal L^0\) be the space of random variables on \((\Omega, \mathcal A, \mathbb{P})\) with values in \(\mathbb{R},\) and let \(L^0 :=\mathcal L^0 / \sim,\) where \(\sim\) is the equivalence relation defined by \(X \sim Y\) if and only if \(X = Y\) almost surely. For \(X,Y \in L^0\) we define \[d(X,Y) :=\mathbb{E}[\lvert X - Y \rvert \wedge 1]\,.\]
The space \((L^0, d)\) is a complete metric space, and \(X_n \overset{\mathbb{P}}{\longrightarrow}X\) if and only if \(d(X_n, X) \to 0.\)
Proof. It is easy to check that \(d\) is a metric.
Let us now verify that \(X_n \overset{\mathbb{P}}{\longrightarrow}X\) implies \(d(X_n, X) \to 0.\) Suppose that \(X_n \overset{\mathbb{P}}{\longrightarrow}X\) and choose an arbitrary \(\varepsilon\in (0,1].\) Then \[\begin{gathered} d(X_n, X) = \mathbb{E}[\lvert X_n - X \rvert \wedge 1] = \mathbb{E}[\lvert X_n - X \rvert \, \mathbf 1_{\lvert X_n - X \rvert \leqslant\varepsilon}] + \mathbb{E}[(\lvert X_n - X \rvert \wedge 1) \, \mathbf 1_{\lvert X_n - X \rvert > \varepsilon}] \\ \leqslant\varepsilon+ \mathbb{P}(\lvert X_n - X \rvert > \varepsilon) \longrightarrow \varepsilon\,, \end{gathered}\] by assumption. Hence \(\limsup_n d(X_n, X) \leqslant\varepsilon,\) and since \(\varepsilon\in (0,1]\) was arbitrary, we conclude that \(d(X_n, X) \to 0.\)
Conversely, suppose that \(d(X_n, X) \to 0.\) Then for all \(\varepsilon\in (0,1]\) we have \(\{\lvert X_n - X \rvert > \varepsilon\} = \{\lvert X_n - X \rvert \wedge 1 > \varepsilon\},\) so that, by Chebyshev’s inequality, \[\mathbb{P}(\lvert X_n - X \rvert > \varepsilon) \leqslant\frac{1}{\varepsilon} \mathbb{E}[\lvert X_n - X \rvert \wedge 1] \rightarrow 0\,,\] i.e. \(X_n \overset{\mathbb{P}}{\longrightarrow}X.\)
All that remains, therefore, is to show that the metric space \((L^0,d)\) is complete. To that end, let \((X_n)\) be a Cauchy sequence for \(d(\cdot, \cdot).\) Choose a subsequence \(Y_k = X_{n_k}\) such that \(d(Y_k, Y_{k+1}) \leqslant 2^{-k}.\) We then use the Borel-Cantelli lemma (see also Remark 3.18) with \[\mathbb{E}\Biggl[\sum_{k = 1}^\infty (\lvert Y_{k+1} - Y_k \rvert \wedge 1)\Biggr] \leqslant\sum_{k = 1}^\infty 2^{-k} < \infty\,,\] so that \[\sum_{k = 1}^\infty (\lvert Y_{k+1} - Y_k \rvert \wedge 1) < \infty \quad \text{a.s.}\,,\] which implies \[\sum_{k = 1}^\infty \lvert Y_{k+1} - Y_k \rvert < \infty \quad \text{a.s.}\] (indeed, almost surely \(\lvert Y_{k+1} - Y_k \rvert \wedge 1 \to 0,\) so that \(\lvert Y_{k+1} - Y_k \rvert = \lvert Y_{k+1} - Y_k \rvert \wedge 1\) for all large enough \(k\)). Defining \[X :=Y_1 + \sum_{k = 1}^\infty (Y_{k+1} - Y_k)\] on the almost sure event where the series converges (and \(X :=0,\) say, on its complement), we therefore have \(Y_k \overset{\text{a.s.}}{\longrightarrow} X\) as \(k \to \infty.\) Hence, \[d(Y_k, X) = \mathbb{E}[\lvert Y_k - X \rvert \wedge 1] \longrightarrow 0\] as \(k \to \infty,\) by dominated convergence. Finally, since \((X_n)\) is Cauchy and \(d(Y_k, X) = d(X_{n_k}, X) \to 0,\) the triangle inequality \(d(X_n, X) \leqslant d(X_n, X_{n_k}) + d(X_{n_k}, X)\) yields \(d(X_n, X) \to 0\) as \(n \to \infty.\)
The argument in the preceding proof of completeness illustrates a general and important fact from probability and measure theory: convergence in probability does not in general imply almost sure convergence, but it does so provided that we restrict ourselves to a suitable subsequence. This is made precise in the following proposition.
Let \(X_n, X\) be random variables with values in \(\mathbb{R}.\)
If \(X_n \overset{\text{a.s.}}{\longrightarrow} X\) or \(X_n \overset{L^p}{\longrightarrow} X\) then \(X_n \overset{\mathbb{P}}{\longrightarrow} X.\)
If \(X_n \overset{\mathbb{P}}{\longrightarrow} X\) then there exists a subsequence \((X_{n_k})\) such that \(X_{n_k} \overset{\text{a.s.}}{\longrightarrow} X.\)
Proof. Part (ii) was already established in the proof of Proposition 4.3. For part (i), if \(X_n \overset{\text{a.s.}}{\longrightarrow} X\) then \(\mathbb{P}(\lvert X_n - X \rvert > \varepsilon) = \mathbb{E}[\mathbf 1_{\lvert X_n - X \rvert > \varepsilon}] \to 0\) by dominated convergence, and if \(X_n \overset{L^p}{\longrightarrow} X\) then \(\mathbb{P}(\lvert X_n - X \rvert > \varepsilon) \leqslant\frac{1}{\varepsilon^p} \mathbb{E}[\lvert X_n - X \rvert^p] \to 0\) for any \(\varepsilon> 0.\)
In Proposition 4.4 (ii), it is in general necessary to take a subsequence; see Remark 3.26. (In this example, after taking a subsequence we can ensure that \(\sum_{n} b_n < \infty.\))
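Another standard illustration is the following. Take \(\Omega = [0,1)\) equipped with the Lebesgue measure and, writing each \(n \in \mathbb{N}^*\) uniquely as \(n = 2^k + j\) with \(k \geqslant 0\) and \(0 \leqslant j < 2^k,\) define \[X_n :=\mathbf 1_{[j 2^{-k},\, (j+1) 2^{-k})}\,.\] Then \(\mathbb{P}(\lvert X_n \rvert > \varepsilon) = 2^{-k} \longrightarrow 0\) for every \(\varepsilon\in (0,1),\) so that \(X_n \overset{\mathbb{P}}{\longrightarrow}0;\) but every \(\omega \in [0,1)\) belongs to infinitely many of the intervals \([j 2^{-k}, (j+1) 2^{-k}),\) so that \(\limsup_n X_n(\omega) = 1\) and \(X_n\) does not converge to \(0\) almost surely. The subsequence \(X_{2^k} = \mathbf 1_{[0, 2^{-k})}\) does converge to \(0\) almost surely.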
4.2 Convergence in law
In this section we introduce the final notion of convergence of random variables in this course. We fix the dimension \(d \in \mathbb{N}^*\) throughout. We denote by \(C_b \equiv C_b(\mathbb{R}^d)\) the space of bounded continuous real-valued functions on \(\mathbb{R}^d.\)
Let \(\mu_n,\) \(n \in \mathbb{N}^*,\) and \(\mu\) be probability measures on \(\mathbb{R}^d.\) We say that \(\mu_n\) converges weakly to \(\mu,\) denoted by \(\mu_n \overset{\mathrm w}{\longrightarrow} \mu,\) if \[\int \varphi \, \mathrm d\mu_n \longrightarrow \int \varphi \, \mathrm d\mu \,, \qquad \forall \varphi \in C_b\,.\]
Let \(X_n,\) \(n \in \mathbb{N}^*,\) and \(X\) be random variables with values in \(\mathbb{R}^d.\) We say that \(X_n\) converges in law, or in distribution, to \(X,\) denoted by \(X_n \overset{\mathrm d}{\longrightarrow}X,\) if \[\mathbb{P}_{X_n} \overset{\mathrm w}{\longrightarrow} \mathbb{P}_X\,.\] Explicitly, this means that \[\mathbb{E}[\varphi(X_n)] \longrightarrow \mathbb{E}[\varphi(X)] \,, \qquad \forall \varphi \in C_b\,.\]
The convergence in law \(\overset{\mathrm d}{\longrightarrow}\) is very different in nature from the other modes of convergence \(\overset{\text{a.s.}}{\longrightarrow},\) \(\overset{\mathbb{P}}{\longrightarrow},\) \(\overset{L^p}{\longrightarrow}\) that we have seen up to now: it only pertains to the laws of the random variables. In particular, the random variables \(X_n\) and \(X\) can all be defined on different probability spaces. Moreover, the limit is (trivially) not unique: if \(X\) and \(Y\) are different random variables with the same law and \(X_n \overset{\mathrm d}{\longrightarrow}X\) then clearly also \(X_n \overset{\mathrm d}{\longrightarrow}Y.\) (In contrast, the limit of weak convergence of probability measures is unique.)
If \(a_n \to a\) then \(\delta_{a_n} \overset{\mathrm w}{\longrightarrow} \delta_a,\) since \(\int \varphi \, \mathrm d\delta_{a_n} = \varphi(a_n) \to \varphi(a)\) for every \(\varphi \in C_b,\) by continuity of \(\varphi.\)
If the law of \(X_n\) is uniform on \(\{\frac1n, \frac2n, \dots, \frac{n}{n}\}\) and the law of \(X\) is Lebesgue measure on \([0,1],\) then \(X_n \overset{\mathrm d}{\longrightarrow}X\) (by the Riemann sum approximation of integrals of continuous functions).
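Indeed, for any \(\varphi \in C_b\) we have \[\mathbb{E}[\varphi(X_n)] = \frac{1}{n} \sum_{k = 1}^n \varphi\biggl(\frac{k}{n}\biggr) \longrightarrow \int_0^1 \varphi(x) \, \mathrm dx = \mathbb{E}[\varphi(X)]\,,\] since the left-hand side is a Riemann sum of the continuous function \(\varphi\) on \([0,1].\)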
Let \(\mu\) be a probability measure on \(\mathbb{R}\) and define the scaling function \(s^\eta(x) :=\eta x\) for \(\eta > 0.\) Then \(s^\eta_* \mu \overset{\mathrm w}{\longrightarrow}\delta_0\) as \(\eta \to 0.\) To show this, take a function \(\varphi \in C_b\) and write, using the change of variables \(x = s^\eta(y),\) \[\int \varphi(x) \, s^\eta_* \mu(\mathrm dx) = \int \varphi(s^\eta(y)) \, \mu(\mathrm dy) = \int \varphi(\eta y) \, \mu(\mathrm dy) \to \varphi(0)\] as \(\eta \to 0,\) by dominated convergence.
In the important special case where \(\mu(\mathrm dx) = p(x) \, \mathrm dx\) has a density \(p,\) we have \[s^\eta_* \mu(\mathrm dx) = \frac{1}{\eta} p\biggl(\frac{x}{\eta}\biggr) \, \mathrm dx\,.\] The right-hand side is usually known as an approximate delta function. Such functions play a very important role in analysis. One such application is given in the Fourier inversion formula in Section 4.3.
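For instance, if \(\mu\) is the standard Gaussian measure on \(\mathbb{R}\) with density \(p(x) = \frac{1}{\sqrt{2\pi}} \mathrm e^{-x^2/2},\) then \(s^\eta_* \mu\) is the centred Gaussian measure with variance \(\eta^2,\) whose density \[\frac{1}{\eta} p\biggl(\frac{x}{\eta}\biggr) = \frac{1}{\eta \sqrt{2\pi}} \, \mathrm e^{-x^2/(2\eta^2)}\] concentrates around \(0\) as \(\eta \to 0.\)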
If \(X_n \overset{\mathbb{P}}{\longrightarrow} X\) then \(X_n \overset{\mathrm d}{\longrightarrow}X.\)
Proof. We proceed by contradiction and suppose that \(X_n \overset{\mathbb{P}}{\longrightarrow} X\) but \(X_n\) does not converge to \(X\) in law. The latter means that there exists \(\varphi \in C_b\) such that \(\mathbb{E}[\varphi(X_n)] \not\to \mathbb{E}[\varphi(X)].\) Hence, there exists a subsequence \((n_k)\) and \(\varepsilon> 0\) such that \[\tag{4.1} \bigl\lvert \mathbb{E}[\varphi(X_{n_k})] - \mathbb{E}[\varphi(X)] \bigr\rvert \geqslant\varepsilon\] for all \(k.\) Moreover, by Proposition 4.4 (ii), there exists a further subsequence \((n_{k_l})\) such that \(X_{n_{k_l}} \to X\) a.s. as \(l \to \infty.\) But by dominated convergence, we have \[\lvert \mathbb{E}[\varphi(X_{n_{k_l}})] - \mathbb{E}[\varphi(X)] \rvert \to 0\] as \(l \to \infty,\) in contradiction to (4.1).
The reverse implication of Proposition 4.9 is false. Worse: if \(X_n \overset{\mathrm d}{\longrightarrow}X\) then the very statement \(X_n \overset{\mathbb{P}}{\longrightarrow} X\) is in general meaningless! This is because \(X_n \overset{\mathrm d}{\longrightarrow}X\) does not imply that \(X_n\) and \(X\) are all defined on the same probability space, while \(X_n \overset{\mathbb{P}}{\longrightarrow} X\) requires all random variables to be defined on the same probability space (see Remark 4.7). Even when all random variables are defined on the same probability space, it is easy to think of counterexamples. For example, let \(X\) have a Bernoulli law with parameter \(p = 1/2\) and set \(X_n :=1 - X\) for all \(n.\) Then clearly \(\mathbb{P}_{X_n} = \mathbb{P}_X,\) so that \(X_n \overset{\mathrm d}{\longrightarrow}X,\) but because \(\lvert X - X_n \rvert = 1\) a.s., clearly \(X_n\) does not converge to \(X\) in probability.
However, if \(X_n \overset{\mathrm d}{\longrightarrow}a\) for some constant \(a\) then the implication \(X_n \overset{\mathbb{P}}{\longrightarrow} a\) does hold. To show this, let \(\varepsilon> 0\) and define the continuous bounded function \[\varphi(x) :=\frac{\lvert x - a \rvert}{\varepsilon} \wedge 1\,.\] (Plot this function!) Since \(\varphi \geqslant 0\) everywhere and \(\varphi = 1\) on the set \(\{\lvert x - a \rvert > \varepsilon\},\) we have \[\mathbb{P}(\lvert X_n - a \rvert > \varepsilon) = \mathbb{E}[\mathbf 1_{\lvert X_n - a \rvert > \varepsilon}] \leqslant\mathbb{E}[\varphi(X_n)] \to \varphi(a) = 0\] as \(n \to \infty,\) by the assumption that \(X_n \overset{\mathrm d}{\longrightarrow}a.\)
We recall that the support of a function \(f \,\colon\mathbb{R}^d \to \mathbb{R}\) is the set \(\mathop{\mathrm{supp}}f :=\overline{\{x \in \mathbb{R}^d \,\colon f(x) \neq 0\}}.\) Since the support is closed by definition, the condition that \(\mathop{\mathrm{supp}}f\) be compact simply means that it is bounded. We denote by \(C_c \equiv C_c(\mathbb{R}^d)\) the space of continuous real-valued functions on \(\mathbb{R}^d\) with compact support.
Let \(H \subset C_c\) be such that the closure of \(H\) under \(\lVert \cdot \rVert_\infty\) contains \(C_c.\) Let \(\mu_n\) and \(\mu\) be probability measures on \(\mathbb{R}^d.\) Then the following are equivalent.
\(\mu_n \overset{\mathrm w}{\longrightarrow}\mu\) (i.e. \(\forall \varphi \in C_b,\) \(\int \varphi \, \mathrm d\mu_n \to \int \varphi \, \mathrm d\mu\)).
\(\forall \varphi \in C_c,\) \(\int \varphi \, \mathrm d\mu_n \to \int \varphi \, \mathrm d\mu.\)
\(\forall \varphi \in H,\) \(\int \varphi \, \mathrm d\mu_n \to \int \varphi \, \mathrm d\mu.\)
Proof. The implications (i)\(\Rightarrow\)(ii) and (i)\(\Rightarrow\)(iii) are obvious. We shall show (ii)\(\Rightarrow\)(i) and (iii)\(\Rightarrow\)(ii).
To show (ii)\(\Rightarrow\)(i), suppose (ii). Let \(\varphi \in C_b.\) Choose a sequence \(f_k \in C_c\) such that \(0 \leqslant f_k \leqslant 1\) and \(f_k \uparrow 1\) as \(k \to \infty\) (you can take for instance \(f_k(x) = (1 - \lvert x/k \rvert)_+\)). Then we telescope \[\begin{aligned} \int \varphi \, \mathrm d\mu_n - \int \varphi \, \mathrm d\mu &= \int \varphi \, \mathrm d\mu_n - \int \varphi f_k \, \mathrm d\mu_n \\ &\quad+ \int \varphi f_k \, \mathrm d\mu_n - \int \varphi f_k \, \mathrm d\mu \\ &\quad+ \int \varphi f_k \, \mathrm d\mu - \int \varphi \, \mathrm d\mu\,, \end{aligned}\] and estimate each line on the right-hand side separately.
For any \(k \in \mathbb{N}^*,\) the second line tends to zero as \(n \to \infty,\) by assumption (ii) since \(\varphi f_k \in C_c.\)
For any \(k \in \mathbb{N}^*,\) the first line is estimated in absolute value by \[\lVert \varphi \rVert_\infty \biggl(1 - \int f_k \, \mathrm d\mu_n\biggr) \underset{n \to \infty}{\longrightarrow} \lVert \varphi \rVert_\infty \biggl(1 - \int f_k \, \mathrm d\mu\biggr)\,,\] where we again used (ii) since \(f_k \in C_c.\)
The third line is estimated in absolute value by \[\lVert \varphi \rVert_\infty \biggl(1 - \int f_k \, \mathrm d\mu\biggr)\,.\]
Putting everything together, we conclude that for any \(k \in \mathbb{N}^*\) we have \[\limsup_{n \to \infty} \biggl\lvert \int \varphi \, \mathrm d\mu_n - \int \varphi \, \mathrm d\mu \biggr\rvert \leqslant 2 \lVert \varphi \rVert_\infty \biggl(1 - \int f_k \, \mathrm d\mu\biggr)\,.\] Since \(k\) was arbitrary, we may let \(k \to \infty,\) and the right-hand side then tends to zero by dominated convergence, since \(f_k \uparrow 1\) and \(\mu\) is a probability measure. This concludes the proof of (ii)\(\Rightarrow\)(i).
To show (iii)\(\Rightarrow\)(ii), suppose (iii). Let \(\varphi \in C_c.\) Choose a sequence \(\varphi_k \in H\) such that \(\lVert \varphi_k - \varphi \rVert_{\infty} \to 0\) as \(k \to \infty.\) Then for any \(k \in \mathbb{N}^*\) we have, again by telescoping, \[\begin{aligned} &\limsup_{n \to \infty} \biggl\lvert \int \varphi \, \mathrm d\mu_n - \int \varphi \, \mathrm d\mu \biggr\rvert \\ &\quad \leqslant \limsup_{n \to \infty} \Biggl(\biggl\lvert \int \varphi \, \mathrm d\mu_n - \int \varphi_k \, \mathrm d\mu_n \biggr\rvert + \biggl\lvert \int \varphi_k \, \mathrm d\mu_n - \int \varphi_k \, \mathrm d\mu \biggr\rvert + \biggl\lvert \int \varphi_k \, \mathrm d\mu - \int \varphi \, \mathrm d\mu \biggr\rvert\Biggr) \\ &\quad \leqslant 2 \lVert \varphi - \varphi_k \rVert_\infty \underset{k \to \infty}{\longrightarrow} 0\,, \end{aligned}\] where we used that for any \(k \in \mathbb{N}^*,\) the middle term on the second line tends to zero as \(n \to \infty\) by (iii). This concludes the proof of (iii)\(\Rightarrow\)(ii).