Lecture Notes

Asymptotic Memoryless of Markov Chains

Posted by Wanyu Zhang on February 6, 2026

This is the first post in a series of notes taken from the Stochastic Systems class (MS&E 321) taught by Professor Peter Glynn. This series will be updated as the course progresses.

A fundamental property of Markov chains is that their long-run behavior “forgets” the initial state. More precisely, for a finite, irreducible, aperiodic Markov chain with transition matrix $P$, the $n$-step transition probabilities $P^n(x,y)$ converge to a stationary distribution $\pi(y)$ at a geometric rate. The key idea behind the proof is a clean algebraic decomposition $P^m = \delta \Lambda + (1-\delta)Q$, which separates a “memoryless” rank-1 component $\Lambda$ from a remainder $Q$. This decomposition admits a natural probabilistic interpretation via coupling, and extends to non-stationary transitions, infinite state spaces, and continuous state spaces through Doeblin’s condition.

Setup

Consider a Markov chain (MC) with:

Finite state space: $\lvert S\rvert = d < \infty$
Stationary transition matrix $P$
$P$ is irreducible and aperiodic

We begin with a technical lemma that connects irreducibility and aperiodicity to a uniform positivity condition on $P^m$.

Lemma 1. Let $P$ be the transition matrix of an irreducible Markov chain on a finite state space $S$. Then $P$ is aperiodic if and only if there exists $m \geq 1$ such that $P^m(x,y) > 0$ for all $x, y \in S$.

Remark. This lemma guarantees the existence of $m$ such that $P^m \gg 0$ (all entries strictly positive), which is the starting point of the convergence analysis below. We will revisit the necessity of finiteness in Lemma 1 at the end of this note.

Convergence Analysis

We first consider the case $m = 1$, i.e., $P \gg 0$.

1. Decomposition of $P$

Let $\Lambda$ be a stochastic matrix ($\Lambda \mathbf{e} = \mathbf{e}$) with identical rows, i.e., $\Lambda = \begin{bmatrix} \lambda \\ \vdots \\ \lambda \end{bmatrix}$ where $\lambda\ \mathbf{e}= 1$.

Since $P \gg 0$, there exists $\delta \in (0,1)$ such that $P \geq \delta \Lambda$. Define

\[P = \delta \Lambda + (1 - \delta) Q, \qquad Q = \frac{P - \delta \Lambda}{1 - \delta}.\]

Then $Q \geq 0$, and $Q$ is stochastic: $Q\mathbf{e} = \frac{P\mathbf{e} - \delta \Lambda\mathbf{e}}{1 - \delta}= \mathbf{e}$.

Remark. The parameter $\delta$ measures how much of $P$ can be “explained” by the memoryless component $\Lambda$. A larger $\delta$ means faster mixing.

2. Computing powers of $P$

A key property: $R\Lambda = \Lambda$ for any stochastic matrix $R$, since $R\Lambda = R \mathbf{e} \lambda = \mathbf{e} \lambda = \Lambda$.

Expanding $P^2$:

\[\begin{aligned} P^2 &= P(\delta \Lambda + (1 - \delta)Q) \\ &= \delta \Lambda + (1 - \delta)PQ \\ &= \delta \Lambda + (1 - \delta)(\delta \Lambda + (1 - \delta)Q)Q \\ &= \delta \Lambda + (1 - \delta)\delta \Lambda Q + (1 - \delta)^2 Q^2 \end{aligned}\]

By induction, the general formula is:

\[P^n = \sum_{j=0}^{n-1}\delta(1 - \delta)^{j}\Lambda Q^{j} + (1-\delta)^n Q^n.\]

As $n \to \infty$, the remainder $(1-\delta)^n Q^n \to 0$, and:

\[P^n \to \sum_{j=0}^{\infty} \delta(1 - \delta)^j \Lambda Q^j \quad \triangleq \quad \Pi.\]

3. Structure of $\Pi$

$\Pi$ is stochastic: since $Q^j\mathbf{e} = \mathbf{e}$ for all $j$, we have $\Lambda Q^j\mathbf{e} = \Lambda\mathbf{e} = \mathbf{e}$, and the geometric series preserves row sums.
$\Pi$ has identical rows: since $\Lambda Q^j= \mathbf{e} (\lambda Q^j)$, every term in the series has identical rows.

Therefore $\Pi$ is a rank-1 stochastic matrix whose common row is the stationary distribution $\pi$.

4. Rate of convergence

Define the matrix operator norm induced by the $\ell_\infty$ vector norm. For any stochastic matrix $P$, $\lVert P\rVert = 1$. Then:

\[\lVert P^n - \Pi\rVert = O\!\left((1 - \delta)^n\right) \qquad \text{(rate of mixing)}\]

5. Generalization to $m > 1$

By Lemma 1, for a finite irreducible aperiodic chain, there exists $m$ such that $P^m \gg 0$. Applying the $m=1$ argument to $P^m$ yields

\[P^{m\,k} \to \Pi := \sum_{j=0}^\infty \delta(1-\delta)^j \Lambda Q^j, \qquad \lVert\,P^{m\,k}-\Pi\,\rVert = O\!\left((1-\delta)^k\right).\]

For general $n$, write $n=km+r$ with $0\le r<m$. Then $P^n = (P^m)^k P^r$. Since $\Pi$ has identical rows, $\Pi P^r = \Pi$, so:

\[\lVert P^n-\Pi\rVert = \lVert(P^{mk}-\Pi)P^r\rVert \leq \lVert P^{mk}-\Pi\rVert= O\!\left((1-\delta)^k\right).\]

Generalizations and Doeblin’s Condition

The convergence analysis relies entirely on the decomposition $P^m = \delta \Lambda + (1 - \delta) Q$. This decomposition admits a natural probabilistic interpretation and extends to several broader settings.

Probabilistic Interpretation (Coupling). At each block of $m$ steps, with probability $\delta$, the chain “forgets” its past and transitions according to the memoryless kernel $\Lambda$; with probability $1-\delta$, it follows the residual kernel $Q$. The number of blocks until the first “reset” is geometric with parameter $\delta$, which directly gives the exponential mixing rate.

Non-Stationary Transitions. Transition matrices $P(1), P(2), \ldots$ may vary over time, so we do not expect a stationary equilibrium. However, if a uniform lower bound holds:

\[P(i, x, y) \geq \delta \lambda(y), \quad \forall\, x \in S,\; i \geq 1,\]

then each $P(i) = \delta \Lambda + (1 - \delta) Q_i$, and the same telescoping argument applies:

\[\begin{aligned} P(1)P(2) &= P(1)\bigl(\delta \Lambda + (1 - \delta)Q_2\bigr) = \delta \Lambda + (1 - \delta) P(1) Q_2 \\ &= \delta \Lambda + (1 - \delta)\delta \Lambda Q_2 + (1 - \delta)^2 Q_1 Q_2. \end{aligned}\]

In general, $\lVert P(1) \cdots P(n) - \text{(rank-1)}\rVert \to 0$ at a geometric rate, even without stationarity.

Infinite Discrete State Space. For $\lvert S\rvert = \infty$, Lemma 1 may fail (see discussion below), so the decomposition $P^m \gg 0$ cannot be guaranteed. One must instead directly assume Doeblin’s condition: there exist $m$, $\delta > 0$, and a probability measure $\lambda$ such that $P^m(x, \cdot) \geq \delta \lambda(\cdot)$ for all $x \in S$.

Continuous State Space. Doeblin’s condition generalizes naturally: assume there exist $m$, $\delta > 0$, and a probability measure $\lambda$ on $(S, \mathcal{B})$ such that

\[\mathbb{P}_x(X_m \in B) \geq \delta \lambda(B), \quad \forall\, x \in S,\; B \in \mathcal{B}.\]

Then $\lvert\mathbb{P}_x(X_n \in B) - \pi(B)\rvert = O(r^n)$ for some $r < 1$, where $\pi$ is the unique stationary measure.

Discussion on Lemma 1

Recall Lemma 1: $P$ is irreducible and aperiodic on a finite state space if and only if $\exists\, m$ such that $P^m(x,y) > 0$ for all $x,y$.

This equivalence relies critically on finiteness of the state space. For infinite state spaces, an irreducible aperiodic chain may have no $m$ for which $P^m \gg 0$.

Counterexample: Birth-death chain. Consider the chain on $\lbrace 0, 1, 2, \ldots\rbrace$ with nearest-neighbor transitions. Starting from state $s$, in any finite number of steps $m$, the chain can reach at most state $s + m$, so $P^m(s, s+m+1) = 0$ for all $m$. Hence no single $m$ makes $P^m$ uniformly positive, even though the chain is irreducible and aperiodic.

This is precisely why Doeblin’s condition must be imposed as an additional assumption in the infinite state space setting.

Asymptotic Memoryless of Markov Chains

Setup

Convergence Analysis

Generalizations and Doeblin’s Condition

Discussion on Lemma 1

CATALOG

FEATURED TAGS