5 Markov chains and random walks

Markov chains are some of the most useful and fundamental objects in probability theory. They constitute the paradigm of a random dynamical system, and as such have innumerable applications from physics to biology and economics. Informally, a Markov chain is a stochastic process, i.e. a random variable depending on time, in which the law of the future of the process depends only on the present and not on the whole past. In other words, the process has no memory: knowing its entire history gives no more useful information for predicting the future than knowing just its value today.

Before moving on, let us briefly unwrap the notion of a stochastic process. In this course, time is always discrete, i.e. an integer \(n \in \mathbb{N}.\) (Think \(n\) labelling days starting from some arbitrary starting point.) A stochastic process is a family of random variables \((X_n)_{n \in \mathbb{N}}\) taking values in some set \(S\) (which can be, for instance, \(\mathbb{R}^d,\) \(\mathbb{Z}^d,\) or some finite set). In other words, a stochastic process on the probability space \(\Omega\) is a function \[X \,\colon\mathbb{N}\times \Omega \to S\,,\] such that \(X_n(\cdot)\) is measurable for each \(n \in \mathbb{N}.\) It is helpful to note that a stochastic process can be thought of in two different ways: for fixed \(n \in \mathbb{N},\) as a random variable \(X_n \,\colon \Omega \to S\); and for fixed \(\omega \in \Omega,\) as a trajectory \(n \mapsto X_n(\omega),\) i.e. a random path through \(S.\)

5.1 Definition and basic properties

Throughout this chapter, we fix a set \(S,\) which we always assume to be discrete, i.e. finite or countable. A prominent example of the latter is \(S = \mathbb{Z}^d.\)

Definition 5.1

  1. A stochastic process \((X_n)_{n \in \mathbb{N}}\) is a Markov chain, or a Markov process, if for any \(n \in \mathbb{N}\) and \(x_0, \dots, x_{n}, y \in S\) we have \[\tag{5.1} \mathbb{P}(X_{n+1} = y \mid X_n = x_n, \dots, X_0 = x_0) = \mathbb{P}(X_{n+1} = y \mid X_{n} = x_n)\,.\]

  2. The Markov chain is homogeneous if the function \[(x, y) \mapsto \mathbb{P}(X_{n+1} = y \mid X_n = x)\] does not depend on \(n \in \mathbb{N}.\)

Throughout this chapter, we shall always and without further mention suppose that \((X_n)\) is a homogeneous Markov chain.

Definition 5.1 says informally that, conditioned on the past \(X_0, \dots, X_{n-1}, X_n,\) the law of the future \(X_{n+1}\) depends only on the present \(X_n.\) The homogeneous property says that this law does not depend on the time \(n.\)

Definition 5.2

We define the transition matrix \(Q \,\colon S \times S \to [0,1]\) of the chain \((X_n)\) through \[Q(x,y) :=\mathbb{P}(X_1 = y \mid X_0 = x)\,.\]

By homogeneity, we have \(Q(x,y) = \mathbb{P}(X_{n+1} = y \mid X_n = x)\) for all \(n \in \mathbb{N}.\) The number \(Q(x,y)\) is therefore the probability of going from \(x\) to \(y\) in one time step of the chain.

The following remark follows immediately from Definition 5.2.

Remark 5.3

The transition matrix \(Q\) satisfies the following properties.

  1. For all \(x,y \in S\) we have \(Q(x,y) \in [0,1].\)

  2. For all \(x \in S\) we have \(\sum_{y \in S} Q(x,y) = 1.\)

Definition 5.4

A matrix \(Q\) satisfying the properties (i) and (ii) from Remark 5.3 is called stochastic.
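
For a finite state space, the properties (i) and (ii) of Remark 5.3 amount to a simple numerical check. Here is a minimal sketch in Python with NumPy (the function name `is_stochastic` and the tolerance are our own choices):

```python
import numpy as np

def is_stochastic(Q, tol=1e-12):
    """Check that all entries lie in [0, 1] and that every row sums to 1."""
    Q = np.asarray(Q, dtype=float)
    entries_ok = np.all((Q >= 0.0) & (Q <= 1.0))
    rows_ok = np.allclose(Q.sum(axis=1), 1.0, atol=tol)
    return bool(entries_ok and rows_ok)

print(is_stochastic([[0.9, 0.1], [0.5, 0.5]]))  # True
print(is_stochastic([[0.9, 0.2], [0.5, 0.5]]))  # False: first row sums to 1.1
```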

Proposition 5.5

Let \(Q\) be the transition matrix of the chain \((X_n).\) Then \(Q\) and the law of \(X_0\) fully determine the law of the entire process \((X_n),\) through the formula \[\tag{5.2} \mathbb{P}(X_0 = x_0, X_1 = x_1, \dots, X_n = x_n) = \mathbb{P}(X_0 = x_0) Q(x_0, x_1) \cdots Q(x_{n-1}, x_n)\,.\]

Proof. By the definition of conditional probability and by Definition 5.1, we get \[\begin{aligned} &\mspace{-20mu} \mathbb{P}(X_n = x_n, \dots, X_0 = x_0) \\ &= \mathbb{P}(X_n = x_n \mid X_{n-1} = x_{n-1}, \dots, X_0 = x_0) \, \mathbb{P}(X_{n-1} = x_{n-1}, \dots, X_0 = x_0) \\ &= Q(x_{n-1}, x_n) \, \mathbb{P}(X_{n-1} = x_{n-1}, \dots, X_0 = x_0)\,. \end{aligned}\] Repeatedly applying the same argument to the second term on the right-hand side yields the claim.

Definition 5.6

The law of \(X_0\) is called the initial distribution of the chain \((X_n).\)

Remark 5.7

Conversely, given a probability measure \(\mu\) on \(S\) and a stochastic matrix \(Q,\) can we construct a Markov chain with initial distribution \(\mu\) and transition matrix \(Q\)? The answer is yes.

For any finite time \(N \in \mathbb{N},\) we can define the law of the vector \((X_0, \dots, X_N)\) as in (5.2): \[\mathbb{P}(X_0 = x_0, X_1 = x_1, \dots, X_N = x_N) :=\mu(x_0) Q(x_0, x_1) \cdots Q(x_{N-1}, x_N)\,.\] Because \(Q\) is stochastic, the right-hand side is indeed a probability measure on \(S^{N+1},\) and moreover for any \(n \leqslant N\) the relation (5.2) holds. Hence, for any \(n \leqslant N - 1,\) we have \[\begin{aligned} \mathbb{P}(X_{n+1} = y \mid X_0 = x_0, \dots, X_n = x_n) &= \frac{\mathbb{P}(X_{n+1} = y, X_n = x_n, \dots, X_0 = x_0)}{\mathbb{P}(X_{n} = x_{n}, \dots, X_0 = x_0)} \\ &= \frac{\mu(x_0) Q(x_0, x_1) \cdots Q(x_{n-1}, x_n) Q(x_n, y)}{\mu(x_0) Q(x_0, x_1) \cdots Q(x_{n-1}, x_n)} \\ &= Q(x_n, y)\,. \end{aligned}\] Therefore, the process \(X_0, X_1, \dots, X_N\) satisfies (5.1) up to time \(N.\) This construction can be extended to infinite times with a bit more work (e.g. using Kolmogorov’s extension theorem), which we shall not go into here.
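
The factorisation (5.2) also tells us how to simulate such a chain: draw \(X_0\) from \(\mu,\) and then, given \(X_n,\) draw \(X_{n+1}\) from the row \(Q(X_n, \cdot).\) Here is a minimal sketch for a finite state space, with states labelled \(0, \dots, \lvert S \rvert - 1\) (the function name `sample_chain` is our own choice):

```python
import numpy as np

def sample_chain(mu, Q, N, rng=None):
    """Sample a path (X_0, ..., X_N) with initial distribution mu
    and transition matrix Q, following the factorisation (5.2)."""
    rng = np.random.default_rng() if rng is None else rng
    mu, Q = np.asarray(mu), np.asarray(Q)
    path = np.empty(N + 1, dtype=int)
    path[0] = rng.choice(len(mu), p=mu)                  # X_0 ~ mu
    for n in range(N):
        path[n + 1] = rng.choice(len(mu), p=Q[path[n]])  # X_{n+1} ~ Q(X_n, .)
    return path

print(sample_chain([0.5, 0.5], [[0.9, 0.1], [0.4, 0.6]], N=10))
```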

Summarizing, one can equivalently consider

  1. a Markov chain (as in Definition 5.1), or

  2. its initial distribution and its transition matrix.

The pair in (ii) is often easier to work with.

Example 5.8

Consider the following very simple weather model. Let \(p,q \in [0,1].\) Suppose that a day is either dry (D) or rainy (R). If today is rainy, then with probability \(p\) tomorrow is rainy. If today is dry, then with probability \(q\) tomorrow is dry.

This specifies the transition matrix \(Q\) on the state space \(\{\mathrm R, \mathrm D\}\) through \[Q(\mathrm R, \mathrm R) = p \,, \qquad Q(\mathrm R, \mathrm D) = 1 - p\,, \qquad Q(\mathrm D, \mathrm D) = q\,, \qquad Q(\mathrm D, \mathrm R) = 1 - q\,.\]

It is very convenient to interpret \(Q \in [0,1]^{S \times S}\) literally as a matrix and to use the usual notation of matrix multiplication, according to the following definition.

Definition 5.9

Let \(Q\) be a stochastic matrix on \(S.\)

  1. For \(n \in \mathbb{N}\) we define the matrix \(Q^n \in [0,1]^{S \times S}\) recursively through \(Q^0(x,y) :=\delta_{xy},\) \(Q^1(x,y) :=Q(x,y),\) and \[Q^{n+1}(x,y) :=\sum_{z \in S} Q^n(x,z) Q(z,y)\,.\]

  2. For a bounded function \(f \,\colon S \to \mathbb{R}\) we define \(Qf(x) :=\sum_{y \in S} Q(x,y) f(y)\) .

  3. For a probability measure \(\mu \,\colon S \to [0,1]\) we define \(\mu Q(x) :=\sum_{y \in S} \mu(y) Q(y,x)\) .

Hence, according to the conventions of linear algebra, we interpret \(Q\) as an \(S \times S\) matrix, \(f\) as an \(S\)-dimensional column vector, and \(\mu\) as an \(S\)-dimensional row vector.

Note that if \(\mu\) is a probability measure then so is \(\mu Q^n\) for any \(n \in \mathbb{N}.\) Moreover, summing (5.2) over the intermediate states \(x_1, \dots, x_{n-1} \in S\) yields \[\mathbb{P}(X_n = x_n \mid X_0 = x_0) = Q^n(x_0, x_n)\,.\] Thus, the matrix \(Q^n\) has the interpretation of the \(n\)-step transition matrix of the Markov chain.

For a Markov chain \((X_n)\) with initial distribution \(\mu\) and transition matrix \(Q,\) in matrix notation we can for instance compute \[\begin{gathered} \mathbb{E}[f(X_n)] = \sum_{x_n \in S} f(x_n) \mathbb{P}(X_n = x_n) = \sum_{x_n \in S} \sum_{x_0 \in S} f(x_n) \mathbb{P}(X_n = x_n \mid X_0 = x_0) \mathbb{P}(X_0 = x_0) \\ = \sum_{x_n \in S} \sum_{x_0 \in S} f(x_n) Q^n(x_0, x_n) \mu(x_0) = \mu Q^n f\,. \end{gathered}\]
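
On a finite state space this identity is a plain matrix computation, which can be cross-checked by simulation. A minimal self-contained sketch (the numerical values of \(\mu,\) \(Q,\) \(f,\) and \(n\) below are made up for illustration):

```python
import numpy as np

mu = np.array([0.3, 0.7])                   # initial distribution (row vector)
Q = np.array([[0.9, 0.1],
              [0.4, 0.6]])                  # transition matrix
f = np.array([1.0, -2.0])                   # f as a column vector
n = 5

exact = mu @ np.linalg.matrix_power(Q, n) @ f   # mu Q^n f

# Monte Carlo cross-check: simulate the chain and average f(X_n)
rng = np.random.default_rng(0)
trials, total = 50_000, 0.0
for _ in range(trials):
    x = rng.choice(2, p=mu)                 # X_0 ~ mu
    for _ in range(n):
        x = rng.choice(2, p=Q[x])           # X_{k+1} ~ Q(X_k, .)
    total += f[x]

print(exact, total / trials)                # the two values should agree closely
```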

Example 5.10 • Example 5.8 continued

For the Markov chain from Example 5.8 we can use the matrix notation to write \[Q = \begin{pmatrix} p & 1 - p \\ 1 - q & q \end{pmatrix}\,,\] as an \(\{\mathrm R, \mathrm D\} \times \{\mathrm R, \mathrm D\}\) matrix.
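
With this matrix at hand, \(n\)-step forecasts are just matrix powers: \(Q^n(\mathrm R, \mathrm R)\) is the probability that day \(n\) is rainy given that day \(0\) is rainy. A short sketch (the values of \(p\) and \(q\) are made up):

```python
import numpy as np

p, q = 0.6, 0.8                       # hypothetical persistence probabilities
Q = np.array([[p, 1 - p],             # state order: (R, D)
              [1 - q, q]])

for n in (1, 2, 7, 30):
    Qn = np.linalg.matrix_power(Q, n)
    print(n, Qn[0, 0])                # Q^n(R, R): rain on day n given rain on day 0
```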

We conclude this section with a few more examples.

Example 5.11

(Random walk on \(\mathbb{Z}^d\)). Let \(d \in \mathbb{N}^*\) and let \(\nu\) be a probability measure on \(\mathbb{Z}^d.\) Define the stochastic matrix \(Q(x,y) :=\nu(y - x).\) A Markov chain with transition matrix \(Q\) is called a random walk on \(\mathbb{Z}^d.\) The interpretation is that at each step of the walk, the chain takes a random step from its current location, with the law of the step being given by \(\nu.\) The random walk can also be explicitly written as a sum \[\tag{5.3} X_n = X_0 + \sum_{i = 1}^n Z_i\,,\] where \(Z_1, Z_2, \dots\) are independent random variables with law \(\nu,\) independent of \(X_0.\) Indeed, to verify Definition 5.1, we simply note that (5.3) satisfies \[\begin{aligned} \mathbb{P}(X_{n+1} = y \mid X_n = x_n, \dots, X_0 = x_0) &= \mathbb{P}(X_n + Z_{n+1} = y \mid X_n = x_n, \dots, X_0 = x_0) \\ &= \mathbb{P}(Z_{n+1} = y-x_n) \\ &= Q(x_n,y)\,, \end{aligned}\] by independence of \(Z_{n+1}\) and \((X_0, Z_1, \dots, Z_n).\)

If \[\tag{5.4} \nu(x) = \frac{1}{2d} \mathbf 1_{\lvert x \rvert = 1}\,,\] then the walk is called simple (the walk jumps to the nearest neighbours with equal probability).
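
The representation (5.3) makes the simple random walk easy to simulate: generate the i.i.d. steps \(Z_i\) and take cumulative sums. A minimal sketch, started at \(X_0 = 0\) (the function name is ours):

```python
import numpy as np

def simple_random_walk(d, n, rng=None):
    """Simulate X_0, ..., X_n for the simple random walk on Z^d started at 0,
    using the representation X_n = X_0 + Z_1 + ... + Z_n of (5.3)."""
    rng = np.random.default_rng() if rng is None else rng
    # each step Z_i picks a coordinate axis and a sign uniformly, as in (5.4)
    axes = rng.integers(0, d, size=n)
    signs = rng.choice([-1, 1], size=n)
    steps = np.zeros((n, d), dtype=int)
    steps[np.arange(n), axes] = signs
    return np.vstack([np.zeros((1, d), dtype=int), steps.cumsum(axis=0)])

print(simple_random_walk(d=2, n=10))  # an (n+1) x d array of positions
```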

Example 5.12 • Knight’s random walk

Random walks can take place in a more general setting than the lattice \(\mathbb{Z}^d.\) As an example, let \(S = \{1, \dots, 8\}^2\) be a chessboard. For each square \(x \in S\) denote by \(K(x) \subset S\) the set of squares of the board that can be reached by a single move of a knight from \(x.\) Then we can define the “knight’s random walk” as the Markov chain with transition matrix \[Q(x,y) = \frac{1}{\lvert K(x) \rvert} \mathbf 1_{y \in K(x)}\,.\] Thus, at each step, the knight moves uniformly at random to any square it can reach.
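
For concreteness, here is a sketch building \(K(x)\) and the knight’s transition matrix, and checking that \(Q\) is stochastic:

```python
import numpy as np

# the eight possible knight moves
MOVES = [(1, 2), (2, 1), (-1, 2), (-2, 1), (1, -2), (2, -1), (-1, -2), (-2, -1)]

def K(x):
    """Squares of the board {1,...,8}^2 reachable from x in one knight move."""
    return [(x[0] + a, x[1] + b) for a, b in MOVES
            if 1 <= x[0] + a <= 8 and 1 <= x[1] + b <= 8]

squares = [(i, j) for i in range(1, 9) for j in range(1, 9)]
index = {x: k for k, x in enumerate(squares)}

Q = np.zeros((64, 64))
for x in squares:
    for y in K(x):
        Q[index[x], index[y]] = 1 / len(K(x))

print(np.allclose(Q.sum(axis=1), 1.0))  # True: every row sums to 1
print(len(K((1, 1))), len(K((4, 4))))   # corner: 2 moves; centre: 8 moves
```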

Example 5.13 • Simple random walk on a general graph

The previous example can be greatly generalised as follows. Let \((S, E)\) be a connected graph on the vertex set \(S\) such that the degree \(D_x :=\lvert \{e \in E \,\colon x \in e\} \rvert\) of each vertex \(x \in S\) is finite (such a graph is called locally finite). Define the stochastic matrix \(Q\) on \(S \times S\) through \[Q(x,y) :=\frac{1}{D_x} \mathbf 1_{\{x,y\} \in E}\,.\] Here, at each step the walk moves uniformly at random along an incident edge.
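
A minimal sketch constructing this \(Q\) for a finite graph given as an edge list (the helper name `srw_matrix` is ours):

```python
import numpy as np

def srw_matrix(vertices, edges):
    """Transition matrix of the simple random walk on a finite graph (S, E),
    given as a vertex list and a list of undirected edges {x, y}."""
    index = {v: k for k, v in enumerate(vertices)}
    degree = {v: 0 for v in vertices}
    for x, y in edges:
        degree[x] += 1
        degree[y] += 1
    Q = np.zeros((len(vertices), len(vertices)))
    for x, y in edges:
        Q[index[x], index[y]] = 1 / degree[x]
        Q[index[y], index[x]] = 1 / degree[y]
    return Q

# the 4-cycle: from each vertex the walk jumps to either neighbour with prob. 1/2
print(srw_matrix([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3), (3, 0)]))
```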

Example 5.14 • Ehrenfest model of diffusion

Here is a primitive model of diffusion named after its inventors, Tatiana and Paul Ehrenfest. Suppose that we have a container holding \(N\) gas molecules. The container is split into two halves that are connected by a small hole. Occasionally molecules travel from one half to the other. We are interested in the number \(x\) of molecules in the left half. We take the following, rather naive, model of diffusion: at each time step we choose one molecule uniformly at random and move it to the other side of the container.

Thus, we set \(S = \{0,1,\dots, N\}\) and define the transition matrix \[Q(x,y) = \begin{cases} \frac{N - x}{N} & \text{if } y = x+1 \\ \frac{x}{N} & \text{if } y = x - 1 \\ 0 & \text{otherwise}\,. \end{cases}\]
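
A short sketch building this transition matrix and simulating the chain; starting from an empty left half, the molecule count quickly settles to fluctuate around \(N/2\) (the value \(N = 10\) is made up):

```python
import numpy as np

N = 10                                    # number of molecules (made-up value)
Q = np.zeros((N + 1, N + 1))
for x in range(N + 1):
    if x < N:
        Q[x, x + 1] = (N - x) / N         # a molecule from the right half moves left
    if x > 0:
        Q[x, x - 1] = x / N               # a molecule from the left half moves right

rng = np.random.default_rng(0)
x, path = 0, [0]                          # start with an empty left half
for _ in range(50):
    x = rng.choice(N + 1, p=Q[x])
    path.append(int(x))
print(path)                               # fluctuates around N/2 = 5
```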

5.2 The Markov property

In this section we introduce a result, often simply referred to as the Markov property, which will be our main tool in studying Markov chains. It is a generalisation of the condition (5.1) from Definition 5.1.

Proposition 5.15 • Markov property

Let \(n,m \in \mathbb{N}.\) Let \(f\) be a nonnegative function on \(S^{m+1}\) and \(A \subset S^{n+1}.\) Then for any \(x \in S\) we have \[\mathbb{E}[f(X_n, \dots, X_{n+m}) \mid X_n = x, (X_0, \dots, X_n) \in A] = \mathbb{E}[f(X_0, \dots, X_m) \mid X_0 = x]\,.\]

In words, the Markov property says that the law of the entire future of the process, conditioned on the entire past and being at \(x\) today, is the same as the law of a process simply starting at \(x\) at time zero. In other words, what happened before the present does not matter. This is a strong manifestation of the memoryless and homogeneity properties of Markov chains.

Proof of Proposition 5.15. For \(x_0, \dots, x_{n-1} \in S,\) we get from Definition 5.1 \[\begin{aligned} &\mathbb{E}[f(X_n, \dots, X_{n+m}) \mid X_0 = x_0, \dots, X_{n-1} = x_{n - 1}, X_n = x] \\ &= \sum_{y_1, \dots, y_m \in S} f(x, y_1, \dots, y_m) \\ &\quad \times \mathbb{P}(X_{n+m} = y_m, \dots, X_{n+1} = y_1 \mid X_0 = x_0, \dots, X_{n-1} = x_{n - 1}, X_n = x) \\ &= \sum_{y_1, \dots, y_m \in S} f(x, y_1, \dots, y_m) \, \mathbb{P}(X_{n+m} = y_m, \dots, X_{n+1} = y_1 \mid X_n = x) \\ &= \mathbb{E}[f(X_0, \dots, X_m) \mid X_0 = x]\,. \end{aligned}\] Therefore, \[\begin{aligned} &\mathbb{E}[f(X_n, \dots, X_{n+m}) \mathbf 1_{X_n = x, (X_0, \dots, X_n) \in A}] \\ &= \sum_{x_0, \dots, x_{n-1} \in S} \mathbf 1_{(x_0, \dots, x_{n-1}, x) \in A} \, \mathbb{E}[f(X_n, \dots, X_{n+m}) \mathbf 1_{X_0 = x_0, \dots, X_{n-1} = x_{n-1}, X_n = x}] \\ &= \sum_{x_0, \dots, x_{n-1} \in S} \mathbf 1_{(x_0, \dots, x_{n-1}, x) \in A} \, \mathbb{E}[f(X_n, \dots, X_{n+m}) \mid X_0 = x_0, \dots, X_{n-1} = x_{n-1}, X_n = x] \\ &\quad \times \mathbb{P}(X_0 = x_0, \dots, X_{n-1} = x_{n-1}, X_n = x) \\ &= \sum_{x_0, \dots, x_{n-1} \in S} \mathbf 1_{(x_0, \dots, x_{n-1}, x) \in A} \, \mathbb{E}[f(X_0, \dots, X_m) \mid X_0 = x] \\ &\quad \times \mathbb{P}(X_0 = x_0, \dots, X_{n-1} = x_{n-1}, X_n = x) \\ &= \mathbb{E}[f(X_0, \dots, X_m) \mid X_0 = x] \, \mathbb{P}(X_n = x, (X_0, \dots, X_n) \in A)\,, \end{aligned}\] and the claim follows after dividing by \(\mathbb{P}(X_n = x, (X_0, \dots, X_n) \in A).\)
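
To make Proposition 5.15 concrete, here is a small empirical check on a two-state chain (all numerical values are made up): we condition on a past event \(A\) together with \(X_n = x\) and compare the empirical law of \(X_{n+m}\) with that of \(X_m\) for a chain started at \(x.\)

```python
import numpy as np

rng = np.random.default_rng(1)
Q = np.array([[0.9, 0.1],
              [0.4, 0.6]])               # made-up two-state transition matrix
mu = np.array([0.5, 0.5])
n, m, x = 3, 2, 0                        # condition on X_3 = 0, look at X_5

def run(start_dist, steps):
    """One sample path of length `steps`, started from `start_dist`."""
    s = rng.choice(2, p=start_dist)
    out = [s]
    for _ in range(steps):
        s = rng.choice(2, p=Q[s])
        out.append(s)
    return out

# left-hand side: among paths with X_0 = 1 (the past event A) and X_n = x,
# record the empirical law of X_{n+m}
lhs = [path[n + m] for path in (run(mu, n + m) for _ in range(50_000))
       if path[0] == 1 and path[n] == x]

# right-hand side: chains started at x and run for m steps
rhs = [run(np.eye(2)[x], m)[m] for _ in range(20_000)]

print(np.mean(lhs), np.mean(rhs))        # both estimate P(X_m = 1 | X_0 = x)
```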

The Markov property can be rewritten in the following form, which is sometimes useful. At a first reading, I suggest that you skip over it and return to it when you read the proof of Proposition 5.45 where it is used.

Corollary 5.16

Let \(n,m \in \mathbb{N}.\) Let \(f\) be a nonnegative function on \(S^{m+1}\) and \(g\) a nonnegative function on \(S^{n+1}.\) Then for any \(x \in S\) we have \[\begin{gathered} \mathbb{E}[f(X_n, \dots, X_{n+m}) \, g(X_0, \dots, X_n) \mid X_n = x] \\ = \mathbb{E}[f(X_0, \dots, X_m) \mid X_0 = x] \, \mathbb{E}[g(X_0, \dots, X_n) \mid X_n = x]\,. \end{gathered}\]

Proof. Abbreviating \(x_n :=x,\) we obtain from Proposition 5.15 that the left-hand side of the claim equals \[\begin{aligned} &\mspace{-10mu}\sum_{x_0, \dots, x_{n-1} \in S} \, g(x_0, \dots, x_n) \, \mathbb{E}[f(X_n, \dots, X_{n+m}) \, \mathbf 1_{X_0 = x_0} \cdots \mathbf 1_{X_{n-1} = x_{n-1}} \mid X_n = x_n] \\ &= \sum_{x_0, \dots, x_{n-1} \in S} g(x_0, \dots, x_{n}) \, \mathbb{E}[f(X_n, \dots, X_{n+m}) \mid X_0 = x_0, \dots, X_n = x_n] \\ &\qquad \times \mathbb{P}(X_0 = x_0, \dots, X_{n-1} = x_{n-1} \mid X_n = x_n) \\ &= \sum_{x_0, \dots, x_{n-1} \in S} g(x_0, \dots, x_{n}) \, \mathbb{E}[f(X_0, \dots, X_{m}) \mid X_0 = x_n] \\ &\qquad \times \mathbb{P}(X_0 = x_0, \dots, X_{n-1} = x_{n-1} \mid X_n = x_n) \\ &= \mathbb{E}[f(X_0, \dots, X_{m}) \mid X_0 = x_n] \, \mathbb{E}[g(X_0, \dots, X_n) \mid X_n = x_n]\,. \end{aligned}\]
