1

Random Walks in Markov Chains: Part I
Data Sciences Course, Jan 2014

Ramesh Hariharan

2

A Random Walk Puzzle

  1. Given an undirected cycle graph of size $n$

Ramesh Hariharan

3

A Random Walk Puzzle

  1. Given an undirected cycle graph of size $n$
  2. Start at some arbitrary node $X_0$

Ramesh Hariharan

4

A Random Walk Puzzle

  1. Given an undirected cycle graph of size $n$
  2. Start at some arbitrary node $X_0$
  3. Repeat $X_{i+1} = \left\{\begin{array}{ll} X_{i}+1 & \mbox{w.p. } \frac{1}{4}\\ X_{i}-1 & \mbox{w.p. } \frac{1}{4} \\ X_i & \mbox{w.p. } \frac{1}{2} \end{array} \right\}$

Ramesh Hariharan

5

A Random Walk Puzzle

  1. Given an undirected cycle graph of size $n$
  2. Start at some arbitrary node $X_0$
  3. Repeat $X_{i+1} = \left\{\begin{array}{ll} X_{i}+1 & \mbox{w.p. } \frac{1}{4}\\ X_{i}-1 & \mbox{w.p. } \frac{1}{4} \\ X_i & \mbox{w.p. } \frac{1}{2} \end{array} \right\}$
  4. After $k$ steps, for large enough $k$, what is the distribution of $X_k$?

Ramesh Hariharan

6

A Random Walk Puzzle

  1. Given an undirected cycle graph of size $n$
  2. Start at some arbitrary node $X_0$
  3. Repeat $X_{i+1} = \left\{\begin{array}{ll} X_{i}+1 & \mbox{w.p. } \frac{1}{4}\\ X_{i}-1 & \mbox{w.p. } \frac{1}{4} \\ X_i & \mbox{w.p. } \frac{1}{2} \end{array} \right\}$
  4. After $k$ steps, for large enough $k$, what is the distribution of $X_k$?
  5. Is it uniform or non-uniform? Does it depend upon the start? When does it converge?

Ramesh Hariharan
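
The puzzle is easy to probe empirically. Below is a minimal simulation sketch (added here, not part of the original slides; the values of `n`, `k` and the trial count are arbitrary choices) that runs many independent lazy walks on the $n$-cycle and prints the empirical distribution of $X_k$.

```python
import random
from collections import Counter

def lazy_cycle_walk(n, k, start=0):
    """Run one lazy random walk on the n-cycle for k steps and return X_k."""
    x = start
    for _ in range(k):
        u = random.random()
        if u < 0.25:
            x = (x + 1) % n
        elif u < 0.5:
            x = (x - 1) % n
        # with probability 1/2, stay put
    return x

n, k, trials = 8, 200, 100_000
counts = Counter(lazy_cycle_walk(n, k) for _ in range(trials))
for state in range(n):
    print(state, counts[state] / trials)     # every frequency approaches 1/n
```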

7

Random Walks for Random Sampling

  1. Easy instances of Random Sampling
    • Pick a subset of a given set uniformly at random (u.a.r)
    • Pick a random point within an n-dim hypersphere u.a.r

Ramesh Hariharan

8

Random Walks for Random Sampling

  1. Easy instances of Random Sampling
    • Pick a subset of a given set uniformly at random (u.a.r)
    • Pick a random point within an n-dim hypersphere u.a.r
  2. Harder instances of Random Sampling: may need Random Walks
    • Pick a random subset of a given set u.a.r from among those subsets for which the total sum of items is within a given limit $k$
    • Pick a point inside an n-dim body u.a.r
    • Pick a spanning tree or matching in a graph u.a.r

Ramesh Hariharan
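
For the "easy" instances, direct sampling suffices. The sketch below is an illustration added here (not from the slides): a uniformly random subset via independent coin flips, and a uniform point in an $n$-dimensional ball via the standard Gaussian-direction-plus-radius trick, which is assumed rather than derived here.

```python
import math
import random

def random_subset(items):
    """Uniformly random subset: include each element independently w.p. 1/2."""
    return {x for x in items if random.random() < 0.5}

def random_point_in_ball(n, radius=1.0):
    """Uniform point in an n-dim ball: Gaussian direction, radius scaled by U^(1/n)."""
    g = [random.gauss(0.0, 1.0) for _ in range(n)]
    length = math.sqrt(sum(v * v for v in g))
    r = radius * random.random() ** (1.0 / n)
    return [r * v / length for v in g]

print(random_subset({1, 2, 3, 4}))
print(random_point_in_ball(3))
```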

9

Random Walks for Spanning Trees

[Figure: spanning trees $T_1$, $T_2$, $T_3$ with probabilities $\frac{1}{|E|-|V|+1}$]

  1. Pick a random subset of edges; repeat until you get a spanning tree?

Ramesh Hariharan

10

Random Walks for Spanning Trees

[Figure: spanning trees $T_1$, $T_2$, $T_3$ with probabilities $\frac{1}{|E|-|V|+1}$]

  1. Pick a random subset of edges; repeat until you get a spanning tree?
  2. Or start with an arbitrary spanning tree

Ramesh Hariharan

11

Random Walks for Spanning Trees

[Figure: spanning trees $T_1$, $T_2$, $T_3$ with probabilities $\frac{1}{|E|-|V|+1}$]

  1. Pick a random subset of edges; repeat until you get a spanning tree?
  2. Or start with an arbitrary spanning tree
  3. Make a random exchange

Ramesh Hariharan

12

Random Walks for Spanning Trees

[Figure: spanning trees $T_1$, $T_2$, $T_3$ with probabilities $\frac{1}{|E|-|V|+1}$]

  1. Pick a random subset of edges; repeat until you get a spanning tree?
  2. Or start with an arbitrary spanning tree
  3. Make a random exchange
  4. Repeat several times (how many?)

Ramesh Hariharan

13

Random Walks for Spanning Trees

[Figure: spanning trees $T_1$, $T_2$, $T_3$ with probabilities $\frac{1}{|E|-|V|+1}$]

  1. Pick a random subset of edges; repeat until you get a spanning tree?
  2. Or start with an arbitrary spanning tree
  3. Make a random exchange
  4. Repeat several times (how many?)
  5. Return what you get

Ramesh Hariharan

14

Random Walks for Spanning Trees

[Figure: spanning trees $T_1$, $T_2$, $T_3$ with probabilities $\frac{1}{|E|-|V|+1}$]

Ramesh Hariharan

  1. Pick a random subset of edges; repeat until you get a spanning tree?
  2. Or start with an arbitrary spanning tree
  3. Make a random exchange
  4. Repeat several times (how many?)
  5. Return what you get
  6. Is every spanning tree equally likely?
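
One plausible way to implement "make a random exchange" (an assumption on my part, not spelled out on the slide) is: add a uniformly random non-tree edge and delete a uniformly random edge of the cycle it closes; deleting the edge just added means staying put. The sketch below runs that walk on a small arbitrary graph so the "is every spanning tree equally likely?" question can be checked empirically.

```python
import random
from collections import Counter, defaultdict

def norm(u, v):
    return (u, v) if u < v else (v, u)

def path_edges(tree, u, v):
    """Edges on the unique u-v path inside the spanning tree `tree`."""
    adj = defaultdict(list)
    for a, b in tree:
        adj[a].append(b)
        adj[b].append(a)
    parent = {u: None}
    stack = [u]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                stack.append(y)
    path = []
    while v != u:
        path.append(norm(v, parent[v]))
        v = parent[v]
    return path

def exchange_step(edges, tree):
    """Add a random non-tree edge, drop a random edge of the cycle it closes."""
    added = random.choice([e for e in edges if e not in tree])
    cycle = path_edges(tree, *added) + [added]
    dropped = random.choice(cycle)
    if dropped == added:
        return tree                          # lazy move: keep the current tree
    return (tree - {dropped}) | {added}

# Small test graph: a 4-cycle plus one chord; it has 8 spanning trees.
edges = [norm(*e) for e in [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]]
tree = frozenset({(0, 1), (1, 2), (2, 3)})   # an arbitrary starting tree
counts = Counter()
for _ in range(200_000):
    tree = exchange_step(edges, tree)
    counts[tree] += 1
for t, c in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(sorted(t), c)                      # roughly equal counts <=> near-uniform
```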
15

Approximate Counting via Random Sampling

  1. Given a mixture of red and blue balls, with reds being a $p$ fraction, estimate $p$.

Ramesh Hariharan

16

Approximate Counting via Random Sampling

  1. Given a mixture of red and blue balls, with reds being a $p$ fraction, estimate $p$.
  2. Sample $n$ balls uniformly at random (with replacement).

Ramesh Hariharan

17

Approximate Counting via Random Sampling

  1. Given a mixture of red and blue balls, with reds being a $p$ fraction, estimate $p$.
  2. Sample $n$ balls uniformly at random (with replacement).
  3. $X_i=1$ if the $i$th ball is red.

Ramesh Hariharan

18

Approximate Counting via Random Sampling

  1. Given a mixture of red and blue balls, with reds being a $p$ fraction, estimate $p$.
  2. Sample $n$ balls uniformly at random (with replacement).
  3. $X_i=1$ if the $i$th ball is red.
  4. $E(\sum X_i)=np$

Ramesh Hariharan

19

Approximate Counting via Random Sampling

  1. Given a mixture of red and blue balls, with reds being a $p$ fraction, estimate $p$.
  2. Sample $n$ balls uniformly at random (with replacement).
  3. $X_i=1$ if the $i$th ball is red.
  4. $E(\sum X_i)=np$
  5. $Var(\sum X_i)=np(1-p)$

Ramesh Hariharan

20

Approximate Counting via Random Sampling

  1. Given a mixture of red and blue balls, with reds being a $p$ fraction, estimate $p$.
  2. Sample $n$ balls uniformly at random (with replacement).
  3. $X_i=1$ if the $i$th ball is red.
  4. $E(\sum X_i)=np$
  5. $Var(\sum X_i)=np(1-p)$
  6. $Pr(|\frac{\sum X_i}{n}-p|\geq \epsilon p)\leq 2e^{-\epsilon^2 \Theta({np})}$

Ramesh Hariharan

21

Approximate Counting via Random Sampling

  1. Given a mixture of red and blue balls, with reds being a $p$ fraction, estimate $p$.
  2. Sample $n$ balls uniformly at random (with replacement).
  3. $X_i=1$ if the $i$th ball is red.
  4. $E(\sum X_i)=np$
  5. $Var(\sum X_i)=np(1-p)$
  6. $Pr(|\frac{\sum X_i}{n}-p|\geq \epsilon p)\leq 2e^{-\epsilon^2 \Theta({np})}$
  7. $n=\Theta(\frac{1}{\epsilon^2p})$ ensures a $1\pm\epsilon$ estimate with probability at least, say, 0.9

Ramesh Hariharan
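
A small numerical check of the sample-size claim (a sketch added here, not from the slides; the constant hidden in $\Theta(\cdot)$ is an arbitrary choice):

```python
import random

def estimate_fraction(p, eps, const=4, trials=1000):
    """Draw n = const/(eps^2 p) samples per trial; report how often the
    empirical fraction lands within a (1 +/- eps) factor of p."""
    n = int(const / (eps ** 2 * p))
    good = 0
    for _ in range(trials):
        reds = sum(random.random() < p for _ in range(n))
        if abs(reds / n - p) <= eps * p:
            good += 1
    return n, good / trials

print(estimate_fraction(p=0.05, eps=0.1))    # the success rate should exceed 0.9
```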

22

Approximate Counting via Random Sampling (contd)

  1. Given a convex body (specified as a membership oracle), estimate its volume.

Ramesh Hariharan

23

Approximate Counting via Random Sampling (contd)

  1. Given a convex body (specified as a membership oracle), estimate its volume.
  2. Sample randomly from a bounding box of known volume.

Ramesh Hariharan

24

Approximate Counting via Random Sampling (contd)

  1. Given a convex body (specified as a membership oracle), estimate its volume.
  2. Sample randomly from a bounding box of known volume.
  3. What fraction of the time does the sampled point lie inside the body?

Ramesh Hariharan

25

Approximate Counting via Random Sampling (contd)

  1. Given a convex body (specified as a membership oracle), estimate its volume.
  2. Sample randomly from a bounding box of known volume.
  3. What fraction of the time does the sampled point lie inside the body?
  4. A sample of size $O(\frac{1}{\epsilon^2 p})$ can estimate the fractional volume within a $1\pm \epsilon$ factor.

Ramesh Hariharan

26

Approximate Counting via Random Sampling (contd)

  1. Given a convex body (specified as a membership oracle), estimate its volume.
  2. Sample randomly from a bounding box of known volume.
  3. What fraction of the time does the sampled point lie inside the body?
  4. A sample of size $O(\frac{1}{\epsilon^2 p})$ can estimate the fractional volume within a $1\pm \epsilon$ factor.
    • Problem: $p$ can be very small

Ramesh Hariharan
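
A sketch (not from the slides) of the bounding-box estimator for one concrete body, the unit ball sampled from the cube $[-1,1]^n$; it also shows how quickly $p$ collapses as the dimension grows.

```python
import math
import random

def ball_volume_estimate(n, samples=200_000):
    """Estimate the volume of the unit n-ball by rejection from the cube [-1,1]^n."""
    hits = 0
    for _ in range(samples):
        point = [random.uniform(-1.0, 1.0) for _ in range(n)]
        if sum(x * x for x in point) <= 1.0:
            hits += 1
    p = hits / samples                       # fraction of the cube inside the ball
    return p, p * 2.0 ** n                   # the cube has volume 2^n

for n in (2, 5, 10, 15):
    p, vol = ball_volume_estimate(n)
    exact = math.pi ** (n / 2) / math.gamma(n / 2 + 1)
    print(n, round(p, 6), round(vol, 4), round(exact, 4))
# p decays rapidly with n: by n = 15 almost no samples land inside the ball.
```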

27

Approximate Counting via Iterative Random Sampling

[Figure: nested balls $B_1$, $B_2$, $B_3$]

  1. Take balls $B_1..B_r$ of decreasing radius

Ramesh Hariharan

28

Approximate Counting via Iterative Random Sampling

[Figure: nested balls $B_1$, $B_2$, $B_3$]

  1. Take balls $B_1..B_r$ of decreasing radius
  2. $ B_r\subset K, K\subset B_1$ and successive ball radii are close

Ramesh Hariharan

29

Approximate Counting via Iterative Random Sampling

[Figure: nested balls $B_1$, $B_2$, $B_3$]

  1. Take balls $B_1..B_r$ of decreasing radius
  2. $ B_r\subset K, K\subset B_1$ and successive ball radii are close
  3. Estimate $\frac{Vol(K\cap B_i)}{Vol(K\cap B_{i-1})}$ approximately by random sampling from ${K\cap B_{i-1}}$

Ramesh Hariharan

30

Approximate Counting via Iterative Random Sampling

[Figure: nested balls $B_1$, $B_2$, $B_3$]

  1. Take balls $B_1..B_r$ of decreasing radius
  2. $ B_r\subset K, K\subset B_1$ and successive ball radii are close
  3. Estimate $\frac{Vol(K\cap B_i)}{Vol(K\cap B_{i-1})}$ approximately by random sampling from ${K\cap B_{i-1}}$
  4. $\frac{Vol(K\cap B_r)}{Vol(K\cap B_{r-1})}\cdots \frac{Vol(K\cap B_2)}{Vol(K\cap B_{1})} = \frac{Vol(B_r)}{Vol(K)} $

Ramesh Hariharan

31

Approximate Counting via Iterative Random Sampling

[Figure: nested balls $B_1$, $B_2$, $B_3$]

  1. Take balls $B_1..B_r$ of decreasing radius
  2. $ B_r\subset K, K\subset B_1$ and successive ball radii are close
  3. Estimate $\frac{Vol(K\cap B_i)}{Vol(K\cap B_{i-1})}$ approximately by random sampling from ${K\cap B_{i-1}}$
  4. $\frac{Vol(K\cap B_r)}{Vol(K\cap B_{r-1})}\cdots \frac{Vol(K\cap B_2)}{Vol(K\cap B_{1})} = \frac{Vol(B_r)}{Vol(K)} $
  5. If each ratio is approximated to a $1\pm \epsilon$ factor, then the total is approximated to a $(1\pm \epsilon)^{r-1}$ factor

Ramesh Hariharan

32

Approximate Counting via Iterative Random Sampling (Contd)

[Figure: nested balls $B_1$, $B_2$, $B_3$]

  1. Estimate the number $\#G$ of spanning trees in a graph $G$ by sampling spanning trees at random

Ramesh Hariharan

33

Approximate Counting via Iterative Random Sampling (Contd)

[Figure: nested balls $B_1$, $B_2$, $B_3$]

  1. Estimate the number $\#G$ of spanning trees in a graph $G$ by sampling spanning trees at random
  2. Create a series of graphs $G=G_1\ldots G_n$, adding one edge at a time

Ramesh Hariharan

34

Approximate Counting via Iterative Random Sampling (Contd)

[Figure: nested balls $B_1$, $B_2$, $B_3$]

  1. Estimate the number $\#G$ of spanning trees in a graph $G$ by sampling spanning trees at random
  2. Create a series of graphs $G=G_1\ldots G_n$, adding one edge at a time
  3. $G_n$ is the complete graph, for which the number of spanning trees is known

Ramesh Hariharan

35

Approximate Counting via Iterative Random Sampling (Contd)

[Figure: nested balls $B_1$, $B_2$, $B_3$]

  1. Estimate the number $\#G$ of spanning trees in a graph $G$ by sampling spanning trees at random
  2. Create a series of graphs $G=G_1\ldots G_n$, adding one edge at a time
  3. $G_n$ is the complete graph, for which the number of spanning trees is known
  4. Use random sampling from $G_{i+1}$ to estimate $\frac{\#G_i}{\#G_{i+1}}$

Ramesh Hariharan

36

Approximate Counting via Iterative Random Sampling (Contd)

[Figure: nested balls $B_1$, $B_2$, $B_3$]

  1. Estimate the number $\#G$ of spanning trees in a graph $G$ by sampling spanning trees at random
  2. Create a series of graphs $G=G_1\ldots G_n$, adding one edge at a time
  3. $G_n$ is the complete graph, for which the number of spanning trees is known
  4. Use random sampling from $G_{i+1}$ to estimate $\frac{\#G_i}{\#G_{i+1}}$
  5. Then use $\#G= \prod_{i=1}^{n-1} \frac{\#G_i}{\#G_{i+1}} \times \#G_n$

Ramesh Hariharan

37

Approximate Counting via Iterative Random Sampling (Contd)

[Figure: nested balls $B_1$, $B_2$, $B_3$]

Ramesh Hariharan

  1. Estimate the number $\#G$ of spanning trees in a graph $G$ by sampling spanning trees at random
  2. Create a series of graphs $G=G_1\ldots G_n$, adding one edge at a time
  3. $G_n$ is the complete graph, for which the number of spanning trees is known
  4. Use random sampling from $G_{i+1}$ to estimate $\frac{\#G_i}{\#G_{i+1}}$
  5. Then use $\#G= \prod_{i=1}^{n-1} \frac{\#G_i}{\#G_{i+1}} \times \#G_n$
  6. Note: Cleverer methods to count exactly are known, via the determinant of the Laplacian
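
Below is a sketch of the telescoping estimator (added here, not from the slides). For clarity it uses a brute-force uniform sampler over the spanning trees of tiny graphs as a stand-in for the random-walk sampler the lecture is building toward; the ratio $\frac{\#G_i}{\#G_{i+1}}$ is estimated as the fraction of uniform spanning trees of $G_{i+1}$ that avoid the edge added to form $G_{i+1}$.

```python
import random
from itertools import combinations

def is_spanning_tree(n, tree_edges):
    """Union-find check that n-1 edges form a spanning tree on n vertices."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in tree_edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True

def spanning_trees(n, edges):
    return [t for t in combinations(edges, n - 1) if is_spanning_tree(n, t)]

def estimate_tree_count(n, edges, samples=20_000):
    """Telescoping estimate of #G, peeling K_n back down to G one edge at a time."""
    all_edges = [(i, j) for i in range(n) for j in range(i + 1, n)]
    missing = [e for e in all_edges if e not in edges]
    estimate = float(n ** (n - 2))           # Cayley's formula: #spanning trees of K_n
    current = list(all_edges)                # current graph G_{i+1}, starts as K_n
    for e in missing:
        trees = spanning_trees(n, current)   # brute-force stand-in for a uniform sampler
        hits = sum(e not in random.choice(trees) for _ in range(samples))
        estimate *= hits / samples           # estimates #G_i / #G_{i+1}
        current.remove(e)                    # G_{i+1} -> G_i
    return estimate

n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4), (0, 2)]   # a 5-cycle plus a chord
print("telescoping estimate:", estimate_tree_count(n, edges))
print("exact count         :", len(spanning_trees(n, edges)))
```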
38

Random Walks in Markov Chains

  1. A collection of states $1\ldots M$

Ramesh Hariharan

39

Random Walks in Markov Chains

Ramesh Hariharan

  1. A collection of states $1\ldots M$
  2. And probabilistic transitions between states
    • Prob of transitioning from state $i$ to state $j$ is $p_{ij}$
    • Independent of the history prior to reaching state $i$
    • Of course, $\sum_j p_{ij}=1$

40

Random Walks in Markov Chains

Ramesh Hariharan

  1. A collection of states $1\ldots M$
  2. And probabilistic transitions between states
    • Prob of transitioning from state $i$ to state $j$ is $p_{ij}$
    • Independent of the history prior to reaching state $i$
    • Of course, $\sum_j p_{ij}=1$
  3. A random walk follows these transitions for $k$ steps

41

Random Walks in Markov Chains

Ramesh Hariharan

  1. A collection of states $1\ldots M$
  2. And probabilistic transitions between states
    • Prob of transitioning from state $i$ to state $j$ is $p_{ij}$
    • Independent of the history prior to reaching state $i$
    • Of course, $\sum_j p_{ij}=1$
  3. A random walk follows these transitions for $k$ steps
  4. What is the probability of ending up in a particular state?
    • Does it depend on where you start? On $k$?

42

The Transition Matrix for a Markov Chain

Ramesh Hariharan

  1. $P =\begin{pmatrix}
     p_{1,1} & p_{1,2} & \cdots & p_{1,M} \\
     p_{2,1} & p_{2,2} & \cdots & p_{2,M} \\
     \vdots & \vdots & \ddots & \vdots \\
     p_{M,1} & p_{M,2} & \cdots & p_{M,M}
     \end{pmatrix}$
43

The Transition Matrix for a Markov Chain

Ramesh Hariharan

  1. $P =\begin{pmatrix}
     p_{1,1} & p_{1,2} & \cdots & p_{1,M} \\
     p_{2,1} & p_{2,2} & \cdots & p_{2,M} \\
     \vdots & \vdots & \ddots & \vdots \\
     p_{M,1} & p_{M,2} & \cdots & p_{M,M}
     \end{pmatrix}$
  2. All row sums are 1
44

The Transition Matrix for a Markov Chain

Ramesh Hariharan

  1. $P =\begin{pmatrix}
     p_{1,1} & p_{1,2} & \cdots & p_{1,M} \\
     p_{2,1} & p_{2,2} & \cdots & p_{2,M} \\
     \vdots & \vdots & \ddots & \vdots \\
     p_{M,1} & p_{M,2} & \cdots & p_{M,M}
     \end{pmatrix}$
  2. All row sums are 1
  3. If you start with an initial prob distribution vector $v$ and take one random walk step, the new distribution is $vP$
45

The Transition Matrix for a Markov Chain

Ramesh Hariharan

  1. $P =\begin{pmatrix}
     p_{1,1} & p_{1,2} & \cdots & p_{1,M} \\
     p_{2,1} & p_{2,2} & \cdots & p_{2,M} \\
     \vdots & \vdots & \ddots & \vdots \\
     p_{M,1} & p_{M,2} & \cdots & p_{M,M}
     \end{pmatrix}$
  2. All row sums are 1
  3. If you start with an initial prob distribution vector $v$ and take one random walk step, the new distribution is $vP$
  4. After $k$ steps, it is $vP^k$
46

The Transition Matrix for a Markov Chain

Ramesh Hariharan

  1. $P =\begin{pmatrix}
     p_{1,1} & p_{1,2} & \cdots & p_{1,M} \\
     p_{2,1} & p_{2,2} & \cdots & p_{2,M} \\
     \vdots & \vdots & \ddots & \vdots \\
     p_{M,1} & p_{M,2} & \cdots & p_{M,M}
     \end{pmatrix}$
  2. All row sums are 1
  3. If you start with an initial prob distribution vector $v$ and take one random walk step, the new distribution is $vP$
  4. After $k$ steps, it is $vP^k$
  5. Convergence? To what?
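
The matrix view is easy to play with numerically. A sketch (not from the slides; the 3-state chain below is an arbitrary example):

```python
import numpy as np

# An arbitrary 3-state transition matrix; each row sums to 1.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4]])

v = np.array([1.0, 0.0, 0.0])        # start deterministically in state 1

for k in (1, 2, 5, 20, 50):
    print(k, v @ np.linalg.matrix_power(P, k))
# v P^k settles down to the same vector regardless of v: the stationary distribution.
```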
47

Norm Definitions

Ramesh Hariharan

  1. For a vector $v$ (of dimension $M$)
    • $|v|=\sum_i |v_i|$
    • $||v||=\sqrt{\sum_i v_i^2}$
    • $|v|^2\leq M||v||^2$ by Jensen's Inequality
48

The Stationary Distribution of a Markov Chain

  1. If the underlying graph is strongly connected, then there exists a unique distribution vector $\pi$ such that $\pi P=\pi$

Ramesh Hariharan

49

The Stationary Distribution of a Markov Chain

Ramesh Hariharan

  1. If the underlying graph is strongly connected, then there exists a unique distribution vector $\pi$ such that $\pi P=\pi$
  2. Proof
    • $P-I$ is not full rank because $(P-I)\boldsymbol{1}^T=0$
50

The Stationary Distribution of a Markov Chain

Ramesh Hariharan

  1. If the underlying graph is strongly connected, then there exists a unique distribution vector $\pi$ such that $\pi P=\pi$
  2. Proof
    • $P-I$ is not full rank because $(P-I)\boldsymbol{1}^T=0$
    • $P-I$ has rank $M-1$ (proof needed, and uses strong connectivity)
51

The Stationary Distribution of a Markov Chain

Ramesh Hariharan

  1. If the underlying graph is strongly connected, then there exists a unique distribution vector $\pi$ such that $\pi P=\pi$
  2. Proof
    • $P-I$ is not full rank because $(P-I)\boldsymbol{1}^T=0$
    • $P-I$ has rank $M-1$ (proof needed, and uses strong connectivity)
    • (Left) Null space of $P-I$ is 1-dimensional
52

The Stationary Distribution of a Markov Chain

Ramesh Hariharan

  1. If the underlying graph is strongly connected, then there exists a unique distribution vector $\pi$ such that $\pi P=\pi$
  2. Proof
    • $P-I$ is not full rank because $(P-I)\boldsymbol{1}^T=0$
    • $P-I$ has rank $M-1$ (proof needed, and uses strong connectivity)
    • (Left) Null space of $P-I$ is 1-dimensional
    • There is a unique vector $x$ in this space satisfying $\sum_i |x_i|=1$
53

The Stationary Distribution of a Markov Chain

Ramesh Hariharan

  1. If the underlying graph is strongly connected, then there exists a unique distribution vector $\pi$ such that $\pi P=\pi$
  2. Proof
    • $P-I$ is not full rank because $(P-I)\boldsymbol{1}^T=0$
    • $P-I$ has rank $M-1$ (proof needed, and uses strong connectivity)
    • (Left) Null space of $P-I$ is 1-dimensional
    • There is a unique vector $x$ in this space satisfying $\sum_i |x_i|=1$
    • The entries of this vector are non-negative (proof needed)
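
Computationally, the statement translates directly into a recipe: $\pi$ is the left eigenvector of $P$ for eigenvalue 1 (equivalently, it spans the left null space of $P-I$), rescaled to sum to 1. A numpy sketch (not from the slides; the 3-state chain is an arbitrary example):

```python
import numpy as np

# An arbitrary strongly connected 3-state chain (rows sum to 1).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4]])

eigvals, eigvecs = np.linalg.eig(P.T)        # left eigenvectors of P
idx = np.argmin(np.abs(eigvals - 1.0))       # pick the eigenvalue-1 eigenvector
pi = np.real(eigvecs[:, idx])
pi = pi / pi.sum()                           # rescale so the entries sum to 1

print("pi        :", pi)
print("pi P      :", pi @ P)                 # same vector again
print("pi >= 0   :", bool(np.all(pi >= 0)))
```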
54

The Stationary Distribution of a Markov Chain (Contd)

  1. Proof that $P-I$ has rank $M-1$

Ramesh Hariharan

55

The Stationary Distribution of a Markov Chain (Contd)

  1. Proof that $P-I$ has rank $M-1$
  2. Suppose not: then the (right) null space is at least 2-d

Ramesh Hariharan

56

The Stationary Distribution of a Markov Chain (Contd)

  1. Proof that $P-I$ has rank $M-1$
  2. Suppose not: then the (right) null space is at least 2-d
  3. $\boldsymbol{1}^T$ is in the (right) null space

Ramesh Hariharan

57

The Stationary Distribution of a Markov Chain (Contd)

  1. Proof that $P-I$ has rank $M-1$
  2. Suppose not: then the (right) null space is at least 2-d
  3. $\boldsymbol{1}^T$ is in the (right) null space
  4. Then there exists a vector $x$ so $x\cdot \boldsymbol{1}^T=0$ and $(P-I)x^T=0$

Ramesh Hariharan

58

The Stationary Distribution of a Markov Chain (Contd)

  1. Proof that $P-I$ has rank $M-1$
  2. Suppose not: then the (right) null space is at least 2-d
  3. $\boldsymbol{1}^T$ is in the (right) null space
  4. Then there exists a vector $x$ so $x\cdot \boldsymbol{1}^T=0$ and $(P-I)x^T=0$
  5. Let $x_{max}=\max_i x_i$ and $S=\{i|x_i=x_{max}\}$; not all vertices are in $S$
  6. By strong connectivity, there exists $i\in S$, $j\not\in S$ with $p_{ij}>0$ (so $x_j < x_{max}=x_i$)

Ramesh Hariharan

59

The Stationary Distribution of a Markov Chain (Contd)

  1. Proof that $P-I$ has rank $M-1$
  2. Suppose not: then the (right) null space is at least 2-d
  3. $\boldsymbol{1}^T$ is in the (right) null space
  4. Then there exists a vector $x$ so $x\cdot \boldsymbol{1}^T=0$ and $(P-I)x^T=0$
  5. Let $x_{max}=\max_i x_i$ and $S=\{i|x_i=x_{max}\}$; not all vertices are in $S$
  6. By strong connectivity, there exists $i\in S$, $j\not\in S$ with $p_{ij}>0$ (so $x_j < x_{max}=x_i$)
  7. $(Px)_i=\sum_k p_{ik}x_k$ is a convex combination of the $x_k$'s

Ramesh Hariharan

60

The Stationary Distribution of a Markov Chain (Contd)

Ramesh Hariharan

  1. Proof that $P-I$ has rank $M-1$
  2. Suppose not: then the (right) null space is at least 2-d
  3. $\boldsymbol{1}^T$ is in the (right) null space
  4. Then there exists a vector $x$ so $x\cdot \boldsymbol{1}^T=0$ and $(P-I)x^T=0$
  5. Let $x_{max}=\max_i x_i$ and $S=\{i|x_i=x_{max}\}$; not all vertices are in $S$
  6. By strong connectivity, there exists $i\in S$, $j\not\in S$ with $p_{ij}>0$ (so $x_j < x_{max}=x_i$)
  7. $(Px)_i=\sum_k p_{ik}x_k$ is a convex combination of the $x_k$'s
  8. So $(Px)_i=\sum_k p_{ik}x_k< x_{max}=x_i$, a contradiction

61

The Stationary Distribution of a Markov Chain (Contd)

  1. Proof that if $\pi(P-I)=0$ then $\pi$ has only non-negative entries

Ramesh Hariharan

62

The Stationary Distribution of a Markov Chain (Contd)

Ramesh Hariharan

  1. Proof that if $\pi(P-I)=0$ then $\pi$ has only non-negative entries
    • Suppose there is a $\pi$ with entries of both signs satisfying $\pi P=\pi$ and scaled to satisfy $|\pi|=1$
63

The Stationary Distribution of a Markov Chain (Contd)

Ramesh Hariharan

  1. Proof that if $\pi(P-I)=0$ then $\pi$ has only non-negative entries
    • Suppose there is a $\pi$ with entries of both signs satisfying $\pi P=\pi$ and scaled to satisfy $|\pi|=1$
    • $\pi P^k=\pi$ for all $k$
64

The Stationary Distribution of a Markov Chain (Contd)

Ramesh Hariharan

  1. Proof that if $\pi(P-I)=0$ then $\pi$ has only non-negative entries
    • Suppose there is a $\pi$ with entries of both signs satisfying $\pi P=\pi$ and scaled to satisfy $|\pi|=1$
    • $\pi P^k=\pi$ for all $k$
    • For large enough $k$, $\sum_{i=0}^k \frac{P^i}{k}$ is a complete graph on account of strong connectivity; let $q_{ij}$ denote its transitions
65

The Stationary Distribution of a Markov Chain (Contd)

Ramesh Hariharan

  1. Proof that if $\pi(P-I)=0$ then $\pi$ has only non-negative entries
    • Suppose there is a $\pi$ with entries of both signs satisfying $\pi P=\pi$ and scaled to satisfy $|\pi|=1$
    • $\pi P^k=\pi$ for all $k$
    • For large enough $k$, $\sum_{i=0}^k \frac{P^i}{k}$ is a complete graph on account of strong connectivity; let $q_{ij}$ denote its transitions
    • $1=|\pi|=|\pi \sum_{i=0}^k \frac{P^i}{k}|=\sum_j |\sum_i \pi_i q_{ij}|<\sum_j \sum_i |\pi_i| q_{ij}\\=\sum_i (|\pi_i| \sum_j q_{ij})=\sum_i (|\pi_i|)=1 $
66

The Stationary Distribution of a Markov Chain (Contd)

Ramesh Hariharan

  1. Proof that if $\pi(P-I)=0$ then $\pi$ has only non-negative entries
    • Suppose there is a $\pi$ with entries of both signs satisfying $\pi P=\pi$ and scaled to satisfy $|\pi|=1$
    • $\pi P^k=\pi$ for all $k$
    • For large enough $k$, $\sum_{i=0}^k \frac{P^i}{k}$ is a complete graph on account of strong connectivity; let $q_{ij}$ denote its transitions
    • $1=|\pi|=|\pi \sum_{i=0}^k \frac{P^i}{k}|=\sum_j |\sum_i \pi_i q_{ij}|<\sum_j \sum_i |\pi_i| q_{ij}\\=\sum_i (|\pi_i| \sum_j q_{ij})=\sum_i (|\pi_i|)=1 $
    • Contradiction.
67

Finding the Stationary Distribution

  1. Is $\pi=\lim_{k\rightarrow \infty} vP^k$ a candidate for an arbitrary initial distribution $v$?

Ramesh Hariharan

68

Finding the Stationary Distribution

  1. Is $\pi=\lim_{k\rightarrow \infty} vP^k$ a candidate for an arbitrary initial distribution $v$?
  2. I.e., is $\pi P=\pi$?

Ramesh Hariharan

69

Finding the Stationary Distribution

  1. Is $\pi=\lim_{k\rightarrow \infty} vP^k$ a candidate for an arbitrary initial distribution $v$?
  2. I.e., is $\pi P=\pi$?
  3. In other words, is $\lim_{k\rightarrow \infty} |v(P^{k+1}-P^k)| =0$?

Ramesh Hariharan

70

Finding the Stationary Distribution

  1. Is $\pi=\lim_{k\rightarrow \infty} vP^k$ a candidate for an arbitrary initial distribution $v$?
  2. I.e., is $\pi P=\pi$?
  3. In other words, is $\lim_{k\rightarrow \infty} |v(P^{k+1}-P^k)| =0$?
  4. I.e., can one just start with an arbitrary distribution and then take an infinitely long walk to identify the stationary distribution?

Ramesh Hariharan

71

Finding the Stationary Distribution

  1. Is $\pi=\lim_{k\rightarrow \infty} vP^k$ a candidate for an arbitrary initial distribution $v$?
  2. I.e., is $\pi P=\pi$?
  3. In other words, is $\lim_{k\rightarrow \infty} |v(P^{k+1}-P^k)| =0$?
  4. I.e., can one just start with an arbitrary distribution and then take an infinitely long walk to identify the stationary distribution?
  5. Periodicity is a problem.

[Figure: two-state chain with transition probabilities $1$, illustrating periodicity]

Ramesh Hariharan

72

Finding the Stationary Distribution despite Periodicity

  1. Aggregate over a long enough history

Ramesh Hariharan

73

Finding the Stationary Distribution despite Periodicity

  1. Aggregate over a long enough history
  2. How about $\pi=\lim_{k\rightarrow \infty} \sum_{i=0}^k \frac{vP^i}{k}$?

Ramesh Hariharan

74

Finding the Stationary Distribution despite Periodicity

  1. Aggregate over a long enough history
  2. How about $\pi=\lim_{k\rightarrow \infty} \sum_{i=0}^k \frac{vP^i}{k}$?
  3. $\lim_{k\rightarrow \infty} |\sum_{i=0}^k \frac{v(P^{i+1}-P^i)}{k}|\\ = \lim_{k\rightarrow \infty} |\frac{v(P^{k+1}-P^0)}{k}| \\ \leq \lim_{k\rightarrow \infty} \frac{2}{k}=0$

[Figure: two-state chain with transition probabilities $1$, illustrating periodicity]

Ramesh Hariharan
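
The periodic two-state chain makes the contrast concrete: $vP^k$ oscillates forever, while the running average settles at the stationary distribution. A sketch (not from the slides):

```python
import numpy as np

P = np.array([[0.0, 1.0],            # the periodic two-state chain from the figure
              [1.0, 0.0]])
v = np.array([1.0, 0.0])

dist = v.copy()                      # holds v P^i
running_sum = np.zeros(2)
for i in range(20):
    running_sum += dist
    if (i + 1) % 5 == 0:
        print("i =", i, " v P^i =", dist, " average so far =", running_sum / (i + 1))
    dist = dist @ P
# v P^i flips between (1,0) and (0,1) forever; the running average tends to (1/2, 1/2).
```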

75

Random Sampling from the Stationary Distribution $\pi$

  1. Start from an arbitrary state (described by probability vector $v$)

Ramesh Hariharan

76

Random Sampling from the Stationary Distribution $\pi$

  1. Start from an arbitrary state (described by probability vector $v$)
  2. Random walk for $k$ steps

Ramesh Hariharan

77

Random Sampling from the Stationary Distribution $\pi$

  1. Start from an arbitrary state (described by probability vector $v$)
  2. Random walk for $k$ steps
  3. Pick a time step $i$ uniformly at random from 0 to $k$

Ramesh Hariharan

78

Random Sampling from the Stationary Distribution $\pi$

  1. Start from an arbitrary state (described by probability vector $v$)
  2. Random walk for $k$ steps
  3. Pick a time step $i$ uniformly at random from 0 to $k$
  4. Return the state visited at time step $i$

Ramesh Hariharan

79

Random Sampling from the Stationary Distribution $\pi$

  1. Start from an arbitrary state (described by probability vector $v$)
  2. Random walk for $k$ steps
  3. Pick a time step $i$ uniformly at random from 0 to $k$
  4. Return the state visited at time step $i$
  5. Probability of picking state $s$ is $\sum_{i=0}^k \frac{(vP^i)_s}{k}$

Ramesh Hariharan

80

Random Sampling from the Stationary Distribution $\pi$

  1. Start from an arbitrary state (described by probability vector $v$)
  2. Random walk for $k$ steps
  3. Pick a time step $i$ uniformly at random from 0 to $k$
  4. Return the state visited at time step $i$
  5. Probability of picking state $s$ is $\sum_{i=0}^k \frac{(vP^i)_s}{k}$
  6. $\sum_{i=0}^k \frac{(vP^i)_s}{k} \rightarrow \pi_s$ as $k\rightarrow \infty$

Ramesh Hariharan

81

Random Sampling from the Stationary Distribution $\pi$

Ramesh Hariharan

  1. Start from an arbitrary state (described by probability vector $v$)
  2. Random walk for $k$ steps
  3. Pick a time step $i$ uniformly at random from 0 to $k$
  4. Return the state visited at time step $i$
  5. Probability of picking state $s$ is $\sum_{i=0}^k \frac{(vP^i)_s}{k}$
  6. $\sum_{i=0}^k \frac{(vP^i)_s}{k} \rightarrow \pi_s$ as $k\rightarrow \infty$
  7. How fast? How large does $k$ need to be for $|\sum_{i=0}^k \frac{vP^i}{k}-\pi|\leq 2\epsilon$?
82

Removing Periodicity

  1. Add self loops (say prob $\frac{1}{2}$)

Ramesh Hariharan

83

Removing Periodicity

  1. Add self loops (say prob $\frac{1}{2}$)
  2. $\frac{P+I}{2}$ instead of $P$

Ramesh Hariharan

84

Removing Periodicity

  1. Add self loops (say prob $\frac{1}{2}$)
  2. $\frac{P+I}{2}$ instead of $P$
  3. Does $\pi=\lim_{k\rightarrow \infty} v(\frac{P+I}{2})^k$ work now for arbitrary starting $v$?

Ramesh Hariharan

85

Removing Periodicity

  1. Add self loops (say prob $\frac{1}{2}$)
  2. $\frac{P+I}{2}$ instead of $P$
  3. Does $\pi=\lim_{k\rightarrow \infty} v(\frac{P+I}{2})^k$ work now for arbitrary starting $v$?
  4. $\lim_{k\rightarrow \infty} |v(\frac{P+I}{2})^{k+1}-v(\frac{P+I}{2})^k|=0$?

Ramesh Hariharan

86

Removing Periodicity

  1. Add self loops (say prob $\frac{1}{2}$)
  2. $\frac{P+I}{2}$ instead of $P$
  3. Does $\pi=\lim_{k\rightarrow \infty} v(\frac{P+I}{2})^k$ work now for arbitrary starting $v$?
  4. $\lim_{k\rightarrow \infty} |v(\frac{P+I}{2})^{k+1}-v(\frac{P+I}{2})^k|=0$?
  5. We will see indirect proofs on subsequent slides

[Figure: chain with self-loops of probability $\frac{1}{2}$ added]

Ramesh Hariharan


87

Proving Convergence via the Coupling Method

  1. Run two copies $X,Y$ of the chain in parallel

Ramesh Hariharan

88

Proving Convergence via the Coupling Method

  1. Run two copies $X,Y$ of the chain in parallel
  2. $X$ starts from an arbitrary distribution

Ramesh Hariharan

89

Proving Convergence via the Coupling Method

  1. Run two copies $X,Y$ of the chain in parallel
  2. $X$ starts from an arbitrary distribution
  3. $Y$ starts from the stationary distribution

Ramesh Hariharan

90

Proving Convergence via the Coupling Method

  1. Run two copies $X,Y$ of the chain in parallel
  2. $X$ starts from an arbitrary distribution
  3. $Y$ starts from the stationary distribution
  4. $X$ and $Y$ are coupled; moves in one influence moves in the other

Ramesh Hariharan

91

Proving Convergence via the Coupling Method

  1. Run two copies $X,Y$ of the chain in parallel
  2. $X$ starts from an arbitrary distribution
  3. $Y$ starts from the stationary distribution
  4. $X$ and $Y$ are coupled; moves in one influence moves in the other
  5. But if you ignore the other copy, then each copy does what it normally would!

Ramesh Hariharan

92

Proving Convergence via the Coupling Method

  1. Run two copies $X,Y$ of the chain in parallel
  2. $X$ starts from an arbitrary distribution
  3. $Y$ starts from the stationary distribution
  4. $X$ and $Y$ are coupled; moves in one influence moves in the other
  5. But if you ignore the other copy, then each copy does what it normally would!
  6. Can we design a coupling so that $Pr(X_t=Y_t)$ increases with time?

Ramesh Hariharan

93

The Coupling Rule

  1. Let $X_t=i$, $Y_t=j$

Ramesh Hariharan

94

The Coupling Rule

  1. Let $X_t=i$, $Y_t=j$
  2. If $i=j$, then $X_{t+1}=Y_{t+1}$ ($X$ makes the same move as $Y$). Otherwise:
    • with prob. $\frac{1}{2}$: $\left\{\begin{array}{ll} X_{t+1}=i & \\ Y_{t+1}=k & \mbox{ w.p } {p_{jk}} \end{array} \right\}$
    • with prob. $\frac{1}{2}$: $\left\{\begin{array}{ll} X_{t+1}=k & \mbox{ w.p } {p_{ik}}\\ Y_{t+1}=j & \end{array} \right\}$

Ramesh Hariharan

95

The Coupling Rule

  1. Let $X_t=i$, $Y_t=j$
  2. If $i=j$, then $X_{t+1}=Y_{t+1}$ ($X$ makes the same move as $Y$). Otherwise:
    • with prob. $\frac{1}{2}$: $\left\{\begin{array}{ll} X_{t+1}=i & \\ Y_{t+1}=k & \mbox{ w.p } {p_{jk}} \end{array} \right\}$
    • with prob. $\frac{1}{2}$: $\left\{\begin{array}{ll} X_{t+1}=k & \mbox{ w.p } {p_{ik}}\\ Y_{t+1}=j & \end{array} \right\}$
  3. Note $Pr(X_{t+1}=k|X_t=i)=\left\{\begin{array}{ll} \frac{p_{ik}}{2} & \mbox{ if } k\not=i \\ \frac{1+p_{ii}}{2} & \mbox{ if } k=i \end{array}\right\}$, as desired; ditto for $Y$

Ramesh Hariharan
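
A simulation sketch of the coupling rule for one concrete chain, the lazy walk on an $n$-cycle (not from the slides; $n$, the horizon and the trial count are arbitrary). When the copies differ, a fair coin decides which copy takes a $P$-step while the other stays; once they meet, they make the same lazy move together, so each copy marginally follows $\frac{P+I}{2}$.

```python
import random

def p_step(x, n):
    """One step of the base chain P: simple random walk on the n-cycle."""
    return (x + random.choice((-1, 1))) % n

def coupled_step(x, y, n):
    if x == y:
        if random.random() < 0.5:            # the copies have met: same lazy move
            return x, y
        z = p_step(x, n)
        return z, z
    if random.random() < 0.5:
        return x, p_step(y, n)               # X stays, Y takes a P-step
    return p_step(x, n), y                   # X takes a P-step, Y stays

n, horizon, trials = 10, 200, 20_000
meet = [0] * (horizon + 1)
for _ in range(trials):
    x, y = 0, random.randrange(n)            # Y drawn from the stationary (uniform) distribution
    for t in range(horizon + 1):
        if x == y:
            meet[t] += 1
        x, y = coupled_step(x, y, n)
for t in (0, 25, 50, 100, 200):
    print(t, meet[t] / trials)               # Pr(X_t = Y_t) climbs towards 1
```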

96

Analysing the Coupling Rule to show Convergence

  1. Let $\Delta$ be the diameter of the underlying graph (max path length between any two nodes; recall strong connectivity)

Ramesh Hariharan

97

Analysing the Coupling Rule to show Convergence

  1. Let $\Delta$ be the diameter of the underlying graph (max path length between any two nodes; recall strong connectivity)
  2. If $X_t=Y_t$, then $X_{t'}=Y_{t'}, \forall t'>t$

Ramesh Hariharan

98

Analysing the Coupling Rule to show Convergence

  1. Let $\Delta$ be the diameter of the underlying graph (max path length between any two nodes; recall strong connectivity)
  2. If $X_t=Y_t$, then $X_{t'}=Y_{t'}, \forall t'>t$
  3. Suppose $X_t\not=Y_t$

Ramesh Hariharan

99

Analysing the Coupling Rule to show Convergence

Ramesh Hariharan

  1. Let $\Delta$ be the diameter of the underlying graph (max path length between any two nodes; recall strong connectivity)
  2. If $X_t=Y_t$, then $X_{t'}=Y_{t'}, \forall t'>t$
  3. Suppose $X_t\not=Y_t$
    • Then with some non-zero probability, $X_{t+\Delta}=Y_{t+\Delta}$; Why?
100

Analysing the Coupling Rule to show Convergence

  1. Let $\Delta$ be the diameter of the underlying graph (max path length between any two nodes; recall strong connectivity)
  2. If $X_t=Y_t$, then $X_{t'}=Y_{t'}, \forall t'>t$
  3. Suppose $X_t\not=Y_t$
    • Then with some non-zero probability, $X_{t+\Delta}=Y_{t+\Delta}$; Why?
    • There is at least one coupled trajectory of length at most $\Delta$ with non-zero probability which keeps $Y$ at $Y_t$ (the self loop) and moves $X_t$ to $Y_t$ (strong connectivity)

Ramesh Hariharan

101

Analysing the Coupling Rule to show Convergence (Contd)

  1. $Pr(X_{t+\Delta}= Y_{t+\Delta})= Pr(X_{t}=Y_{t})\cdot 1 + Pr(X_{t}\not =Y_{t})\cdot r$

Ramesh Hariharan

102

Analysing the Coupling Rule to show Convergence (Contd)

  1. $Pr(X_{t+\Delta}= Y_{t+\Delta})= Pr(X_{t}=Y_{t})\cdot 1 + Pr(X_{t}\not =Y_{t})\cdot r$
  2. Here, $0 < r\leq 1$ and $r$ does not depend on $t$. Rewriting:

Ramesh Hariharan

103

Analysing the Coupling Rule to show Convergence (Contd)

  1. $Pr(X_{t+\Delta}= Y_{t+\Delta})= Pr(X_{t}=Y_{t})\cdot 1 + Pr(X_{t}\not =Y_{t})\cdot r$
  2. Here, $0 < r\leq 1$ and $r$ does not depend on $t$. Rewriting:
  3. $Pr(X_{t+\Delta}= Y_{t+\Delta})- Pr(X_{t}=Y_{t}) = (1-Pr(X_{t}=Y_{t}))\cdot r$

Ramesh Hariharan

104

Analysing the Coupling Rule to show Convergence (Contd)

  1. $Pr(X_{t+\Delta}= Y_{t+\Delta})= Pr(X_{t}=Y_{t})\cdot 1 + Pr(X_{t}\not =Y_{t})\cdot r$
  2. Here, $0 < r\leq 1$ and $r$ does not depend on $t$. Rewriting:
  3. $Pr(X_{t+\Delta}= Y_{t+\Delta})- Pr(X_{t}=Y_{t}) = (1-Pr(X_{t}=Y_{t}))\cdot r$
  4. $Pr(X_t=Y_t)= 1-e^{\frac{t \cdot \ln{(1-r)}}{\Delta}}$

Ramesh Hariharan

105

Analysing the Coupling Rule to show Convergence (Contd)

  1. $Pr(X_{t+\Delta}= Y_{t+\Delta})= Pr(X_{t}=Y_{t})\cdot 1 + Pr(X_{t}\not =Y_{t})\cdot r$
  2. Here, $0 < r\leq 1$ and $r$ does not depend on $t$. Rewriting:
  3. $Pr(X_{t+\Delta}= Y_{t+\Delta})- Pr(X_{t}=Y_{t}) = (1-Pr(X_{t}=Y_{t}))\cdot r$
  4. $Pr(X_t=Y_t)= 1-e^{\frac{t \cdot \ln{(1-r)}}{\Delta}}$
  5. $\lim_{t\rightarrow \infty} Pr(X_{t}=Y_{t})=1$

Ramesh Hariharan

106

How Fast does the Coupling Rule Converge?

  1. $Pr(X_t=Y_t)= 1-e^{\frac{t \cdot \ln (1-r)}{\Delta}}$

Ramesh Hariharan

107

How Fast does the Coupling Rule Converge?

  1. $Pr(X_t=Y_t)= 1-e^{\frac{t \cdot \ln (1-r)}{\Delta}}$
  2. At $t>\frac{\Delta \ln(\epsilon)}{\ln(1-r)} \sim -\frac{\Delta \ln(\epsilon)}{r} $, $Pr(X_t=Y_t)>1-\epsilon$

Ramesh Hariharan

108

How Fast does the Coupling Rule Converge?

Ramesh Hariharan

  1. $Pr(X_t=Y_t)= 1-e^{\frac{t \cdot \ln (1-r)}{\Delta}}$
  2. At $t>\frac{\Delta \ln(\epsilon)}{\ln(1-r)} \sim -\frac{\Delta \ln(\epsilon)}{r} $, $Pr(X_t=Y_t)>1-\epsilon$
  3. Often
    • the number of states $M$ is exponential in some parameter $n$
    • diameter $\Delta$ is polynomial in $n$
109

How Fast does the Coupling Rule Converge?

Ramesh Hariharan

  1. $Pr(X_t=Y_t)= 1-e^{\frac{t \cdot \ln (1-r)}{\Delta}}$
  2. At $t>\frac{\Delta \ln(\epsilon)}{\ln(1-r)} \sim -\frac{\Delta \ln(\epsilon)}{r} $, $Pr(X_t=Y_t)>1-\epsilon$
  3. Often
    • the number of states $M$ is exponential in some parameter $n$
    • diameter $\Delta$ is polynomial in $n$
    • if (a big if) $r$ is not too small, i.e., at least inverse polynomial in $n$
110

How Fast does the Coupling Rule Converge?

Ramesh Hariharan

  1. $Pr(X_t=Y_t)= 1-e^{\frac{t \cdot \ln (1-r)}{\Delta}}$
  2. At $t>\frac{\Delta \ln(\epsilon)}{\ln(1-r)} \sim -\frac{\Delta \ln(\epsilon)}{r} $, $Pr(X_t=Y_t)>1-\epsilon$
  3. Often
    • the number of states $M$ is exponential in some parameter $n$
    • diameter $\Delta$ is polynomial in $n$
    • if (a big if) $r$ is not too small, i.e., at least inverse polynomial in $n$
    • then the time to get to $Pr(X_t=Y_t)>1-\epsilon$ is polynomial in $n$
111

Does a good Coupling imply closeness to Stationary Distribution?

  1. $Pr(X_t=Y_t)>1-\epsilon \implies \sum_i |Pr(X_t=i)-\pi_i|\leq 2\epsilon$

Ramesh Hariharan

112

Does a good Coupling imply closeness to Stationary Distribution?

  1. $Pr(X_t=Y_t)>1-\epsilon \implies \sum_i |Pr(X_t=i)-\pi_i|\leq 2\epsilon$
    • $\sum_i |Pr(X_t=i)-\pi_i| \\ = 2\sum_{i\in S=\{i|Pr(X_t=i)>\pi_i\}} (Pr(X_t=i)-\pi_i) \\ = 2\sum_{i\in S} (\sum_j Pr(X_t=i \land Y_t=j)-\pi_i)\\ \leq 2\epsilon + 2\sum_{i\in S} (Pr(X_t=i \land Y_t=i)-\pi_i)\\ \leq 2\epsilon + 2\sum_{i\in S} (Pr(Y_t=i)-\pi_i)\\ =2\epsilon$ (since $Pr(Y_t=i)=\pi_i$, as $Y$ runs in the stationary distribution)

Ramesh Hariharan

113

An Algebraic look at Convergence (or Mixing)

  1. Assume $P$ is strongly connected, aperiodic, so $\pi= \lim_{k\rightarrow \infty} vP^k$ for any starting distribution $v$

Ramesh Hariharan

114

An Algebraic look at Convergence (or Mixing)

Ramesh Hariharan

  1. Assume $P$ is strongly connected, aperiodic, so $\pi= \lim_{k\rightarrow \infty} vP^k$ for any starting distribution $v$
  2. Write $v=\pi+\delta$ where $\sum_i \delta_i=0$
    • $|\delta|=\sum_i |\delta_i|\leq 2$ and $||\delta||^2={\sum_i \delta_i^2}\leq 2$
115

An Algebraic look at Convergence (or Mixing)

Ramesh Hariharan

  1. Assume $P$ is strongly connected, aperiodic, so $\pi= \lim_{k\rightarrow \infty} vP^k$ for any starting distribution $v$
  2. Write $v=\pi+\delta$ where $\sum_i \delta_i=0$
    • $|\delta|=\sum_i |\delta_i|\leq 2$ and $||\delta||^2={\sum_i \delta_i^2}\leq 2$
  3. $vP^k=\pi P^k+\delta P^k=\pi+\delta P^k$
116

An Algebraic look at Convergence (or Mixing)

Ramesh Hariharan

  1. Assume $P$ is strongly connected, aperiodic, so $\pi= \lim_{k\rightarrow \infty} vP^k$ for any starting distribution $v$
  2. Write $v=\pi+\delta$ where $\sum_i \delta_i=0$
    • $|\delta|=\sum_i |\delta_i|\leq 2$ and $||\delta||^2={\sum_i \delta_i^2}\leq 2$
  3. $vP^k=\pi P^k+\delta P^k=\pi+\delta P^k$
  4. $|\delta P^k|=\sum_j |\sum_i \delta_i q_{ij}|<\sum_j \sum_i |\delta_i| q_{ij}=|\delta|$
117

An Algebraic look at Convergence (or Mixing)

Ramesh Hariharan

  1. Assume $P$ is strongly connected, aperiodic, so $\pi= \lim_{k\rightarrow \infty} vP^k$ for any starting distribution $v$
  2. Write $v=\pi+\delta$ where $\sum_i \delta_i=0$
    • $|\delta|=\sum_i |\delta_i|\leq 2$ and $||\delta||^2={\sum_i \delta_i^2}\leq 2$
  3. $vP^k=\pi P^k+\delta P^k=\pi+\delta P^k$
  4. $|\delta P^k|=\sum_j |\sum_i \delta_i q_{ij}|<\sum_j \sum_i |\delta_i| q_{ij}=|\delta|$
  5. Mixing rate depends on how fast $|\delta P^k|$ vanishes.
118

Some facts on Eigenvalues of $P$

  1. $\pi P=\pi$, so $\pi$ is the (unique, modulo scaling, left) eigenvector for $P$ with eigenvalue 1

Ramesh Hariharan

119

Some facts on Eigenvalues of $P$

  1. $\pi P=\pi$, so $\pi$ is the (unique, modulo scaling, left) eigenvector for $P$ with eigenvalue 1
  2. If $\lambda\not=1$ is a (possibly complex) eigenvalue, then $|\lambda|<1$ and the corresponding eigenvector $y$ satisfies $\sum_i y_i=0$

Ramesh Hariharan

120

Some facts on Eigenvalues of $P$

Ramesh Hariharan

  1. $\pi P=\pi$, so $\pi$ is the (unique, modulo scaling, left) eigenvector for $P$ with eigenvalue 1
  2. If $\lambda\not=1$ is a (possibly complex) eigenvalue, then $|\lambda|<1$ and the corresponding eigenvector $y$ satisfies $\sum_i y_i=0$
    • $\lambda y\boldsymbol{1}^T = yP\boldsymbol{1}^T=y\boldsymbol{1}^T \\ \implies y\boldsymbol{1}^T=0 $
121

Some facts on Eigenvalues of $P$

Ramesh Hariharan

  1. $\pi P=\pi$, so $\pi$ is the (unique, modulo scaling, left) eigenvector for $P$ with eigenvalue 1
  2. If $\lambda\not=1$ is a (possibly complex) eigenvalue, then $|\lambda|<1$ and the corresponding eigenvector $y$ satisfies $\sum_i y_i=0$
    • $\lambda y\boldsymbol{1}^T = yP\boldsymbol{1}^T=y\boldsymbol{1}^T \\ \implies y\boldsymbol{1}^T=0 $
    • $|\lambda^k y|=|yP^k|=\sum_j |\sum_i y_i q_{ij}|<\sum_j \sum_i |y_i| q_{ij}=|y|\\ \implies |\lambda|<1$
122

Diagonalizability of $P$

Ramesh Hariharan

  1. Suppose $P$ is diagonalizable
123

Diagonalizability of $P$

Ramesh Hariharan

  1. Suppose $P$ is diagonalizable
    • $P=R\Lambda L$
124

Diagonalizability of $P$

Ramesh Hariharan

  1. Suppose $P$ is diagonalizable
    • $P=R\Lambda L$
    • $\Lambda$ is a diagonal matrix of the eigenvalues of $P$
125

Diagonalizability of $P$

Ramesh Hariharan

  1. Suppose $P$ is diagonalizable
    • $P=R\Lambda L$
    • $\Lambda$ is a diagonal matrix of the eigenvalues of $P$
    • $LR=I$
126

Diagonalizability of $P$

Ramesh Hariharan

  1. Suppose $P$ is diagonalizable
    • $P=R\Lambda L$
    • $\Lambda$ is a diagonal matrix of the eigenvalues of $P$
    • $LR=I$
    • So $L$ and $R$ are full rank
127

Diagonalizability of $P$

Ramesh Hariharan

  1. Suppose $P$ is diagonalizable
    • $P=R\Lambda L$
    • $\Lambda$ is a diagonal matrix of the eigenvalues of $P$
    • $LR=I$
    • So $L$ and $R$ are full rank
    • And $P^k = (R\Lambda L)^k = R\Lambda^k L$
128

Diagonalizability of $P$

Ramesh Hariharan

  1. Suppose $P$ is diagonalizable
    • $P=R\Lambda L$
    • $\Lambda$ is a diagonal matrix of the eigenvalues of $P$
    • $LR=I$
    • So $L$ and $R$ are full rank
    • And $P^k = (R\Lambda L)^k = R\Lambda^k L$
    • Rows of $L$ are the left eigenvectors of $P$
129

Diagonalizability of $P$

  1. Suppose $P$ is diagonalizable
    • $P=R\Lambda L$
    • $\Lambda$ is a diagonal matrix of the eigenvalues of $P$
    • $LR=I$
    • So $L$ and $R$ are full rank
    • And $P^k = (R\Lambda L)^k = R\Lambda^k L$
    • Rows of $L$ are the left eigenvectors of $P$
    • Columns of $R$ are the right eigenvectors of $P$

Ramesh Hariharan

130

Diagonalizability of $P$ (contd)

Ramesh Hariharan

  1. Since $L$ is full rank, $\delta=vL=\sum_i v_iL_i$ for some coefficient vector $v$
131

Diagonalizability of $P$ (contd)

Ramesh Hariharan

  1. Since $L$ is full rank, $\delta=vL=\sum_i v_iL_i$ for some coefficient vector $v$
    • $L_i$'s are rows of $L$
132

Diagonalizability of $P$ (contd)

Ramesh Hariharan

  1. Since $L$ is full rank, $\delta=vL=\sum_i v_iL_i$ for some coefficient vector $v$
    • $L_i$'s are rows of $L$
    • $L_1=\pi$
133

Diagonalizability of $P$ (contd)

Ramesh Hariharan

  1. Since $L$ is full rank, $\delta=vL=\sum_i v_iL_i$ for some coefficient vector $v$
    • $L_i$'s are rows of $L$
    • $L_1=\pi$
    • Row sums of $L_2\ldots L_M$ are 0
134

Diagonalizability of $P$ (contd)

Ramesh Hariharan

  1. Since $L$ is full rank, $\delta=vL=\sum_i v_iL_i$ for some coefficient vector $v$
    • $L_i$'s are rows of $L$
    • $L_1=\pi$
    • Row sums of $L_2\ldots L_M$ are 0
    • $\sum_i \delta_i=0$
135

Diagonalizability of $P$ (contd)

Ramesh Hariharan

  1. Since $L$ is full rank, $\delta=vL=\sum_i v_iL_i$ for some coefficient vector $v$
    • $L_i$'s are rows of $L$
    • $L_1=\pi$
    • Row sums of $L_2\ldots L_M$ are 0
    • $\sum_i \delta_i=0$
    • $\implies v_1=0$
136

Eigenvalues and Mixing Rate I

Ramesh Hariharan

  1. Recall, we are interested in upper bounding $|\delta P^k|$, where
    • $|\delta|,||\delta||^2\leq 2$
    • $\sum_i \delta_i=0$
    • $\delta=vL$, $v_1=0$
    • $P=R\Lambda L$, where $LR=I$
137

Eigenvalues and Mixing Rate I

Ramesh Hariharan

  1. Recall, we are interested in upper bounding $|\delta P^k|$, where
    • $|\delta|,||\delta||^2\leq 2$
    • $\sum_i \delta_i=0$
    • $\delta=vL$, $v_1=0$
    • $P=R\Lambda L$, where $LR=I$
  2. $\delta P^k=vL(R\Lambda L)^k=vLR\Lambda^kL=v\Lambda^k L $
138

Eigenvalues and Mixing Rate I

  1. Recall, we are interested in upper bounding $|\delta P^k|$, where
    • $|\delta|,||\delta||^2\leq 2$
    • $\sum_i \delta_i=0$
    • $\delta=vL$, $v_1=0$
    • $P=R\Lambda L$, where $LR=I$
  2. $\delta P^k=vL(R\Lambda L)^k=vLR\Lambda^kL=v\Lambda^k L $
  3. $||\delta P^k||^2= ||v\Lambda^k L||^2 $

Ramesh Hariharan

139

Eigenvalues and Mixing Rate II

  1. If the left eigenvectors of $P$ were orthonormal ($LL^T=I$)

Ramesh Hariharan

140

Eigenvalues and Mixing Rate II

  1. If the left eigenvectors of $P$ were orthonormal ($LL^T=I$)
    • then $||\delta P^k||^2 = ||v\Lambda^k L||^2= v\Lambda^{2k} v^T $

Ramesh Hariharan

141

Eigenvalues and Mixing Rate II

  1. If the left eigenvectors of $P$ were orthonormal ($LL^T=I$)
    • then $||\delta P^k||^2 = ||v\Lambda^k L||^2= v\Lambda^{2k} v^T $
    • Since $v_1=0$ and ordering eigenvalues of $P$ so $1>|\lambda_2|\geq |\lambda_3|\cdots \geq |\lambda_M|$, we get

Ramesh Hariharan

142

Eigenvalues and Mixing Rate II

  1. If the left eigenvectors of $P$ were orthonormal ($LL^T=I$)
    • then $||\delta P^k||^2 = ||v\Lambda^k L||^2= v\Lambda^{2k} v^T $
    • Since $v_1=0$ and ordering eigenvalues of $P$ so $1>|\lambda_2|\geq |\lambda_3|\cdots \geq |\lambda_M|$, we get
      • $||\delta P^k||^2=v\Lambda^{2k}v^T \\ \leq |\lambda_2|^{2k} vv^T \\ =|\lambda_2|^{2k} (vL)(vL)^T \\ =|\lambda_2|^{2k}||\delta||^2 \leq 2|\lambda_2|^{2k} $

Ramesh Hariharan

143

Eigenvalues and Mixing Rate III

  1. Recall, $||\delta P^k||^2=||v\Lambda^k L||^2 $ and $v_1=0$

Ramesh Hariharan

144

Eigenvalues and Mixing Rate III

  1. Recall, $||\delta P^k||^2=||v\Lambda^k L||^2 $ and $v_1=0$
  2. More generally, if $LDL^T=I$ for some diagonal matrix $D$ with min entry $d_{min}>0$ and max entry $d_{max}$

Ramesh Hariharan

145

Eigenvalues and Mixing Rate III

Ramesh Hariharan

  1. Recall, $||\delta P^k||^2=||v\Lambda^k L||^2 $ and $v_1=0$
  2. More generally, if $LDL^T=I$ for some diagonal matrix $D$ with min entry $d_{min}>0$ and max entry $d_{max}$
    • for any vector $x$, $\frac{1}{d_{max}} ||xLD^\frac{1}{2}||^2 \leq ||xL||^2 \leq \frac{1}{d_{min}} ||xLD^\frac{1}{2}||^2 $
146

Eigenvalues and Mixing Rate III

Ramesh Hariharan

  1. Recall, $||\delta P^k||^2=||v\Lambda^k L||^2 $ and $v_1=0$
  2. More generally, if $LDL^T=I$ for some diagonal matrix $D$ with min entry $d_{min}>0$ and max entry $d_{max}$
    • for any vector $x$, $\frac{1}{d_{max}} ||xLD^\frac{1}{2}||^2 \leq ||xL||^2 \leq \frac{1}{d_{min}} ||xLD^\frac{1}{2}||^2 $
    • then $||\delta P^k||^2 = ||v\Lambda^k L||^2 \leq \frac{1}{d_{min}} ||v\Lambda^k LD^\frac{1}{2}||^2 = \frac{1}{d_{min}} (v\Lambda^k LD^\frac{1}{2})(v\Lambda^k LD^\frac{1}{2})^T$
147

Eigenvalues and Mixing Rate III

Ramesh Hariharan

  1. Recall, $||\delta P^k||^2=||v\Lambda^k L||^2 $ and $v_1=0$
  2. More generally, if $LDL^T=I$ for some diagonal matrix $D$ with min entry $d_{min}>0$ and max entry $d_{max}$
    • for any vector $x$, $\frac{1}{d_{max}} ||xLD^\frac{1}{2}||^2 \leq ||xL||^2 \leq \frac{1}{d_{min}} ||xLD^\frac{1}{2}||^2 $
    • then $||\delta P^k||^2 = ||v\Lambda^k L||^2 \leq \frac{1}{d_{min}} ||v\Lambda^k LD^\frac{1}{2}||^2 = \frac{1}{d_{min}} (v\Lambda^k LD^\frac{1}{2})(v\Lambda^k LD^\frac{1}{2})^T$
    • $= \frac{1}{d_{min}} v\Lambda^{2k} v^T \leq \frac{1}{d_{min}} |\lambda_2|^{2k} vv^T= \frac{1}{d_{min}} |\lambda_2|^{2k}||vLD^\frac{1}{2}||^2 $
148

Eigenvalues and Mixing Rate III

  1. Recall, $||\delta P^k||^2=||v\Lambda^k L||^2 $ and $v_1=0$
  2. More generally, if $LDL^T=I$ for some diagonal matrix $D$ with min entry $d_{min}>0$ and max entry $d_{max}$
    • for any vector $x$, $\frac{1}{d_{max}} ||xLD^\frac{1}{2}||^2 \leq ||xL||^2 \leq \frac{1}{d_{min}} ||xLD^\frac{1}{2}||^2 $
    • then $||\delta P^k||^2 = ||v\Lambda^k L||^2 \leq \frac{1}{d_{min}} ||v\Lambda^k LD^\frac{1}{2}||^2 = \frac{1}{d_{min}} (v\Lambda^k LD^\frac{1}{2})(v\Lambda^k LD^\frac{1}{2})^T$
    • $= \frac{1}{d_{min}} v\Lambda^{2k} v^T \leq \frac{1}{d_{min}} |\lambda_2|^{2k} vv^T= \frac{1}{d_{min}} |\lambda_2|^{2k}||vLD^\frac{1}{2}||^2 $
    • $ \leq \frac{d_{max}}{d_{min}} |\lambda_2|^{2k}||vL||^2= \frac{d_{max}}{d_{min}} |\lambda_2|^{2k} ||\delta||^2 \leq 2 \frac{d_{max}}{d_{min}} |\lambda_2|^{2k}$

Ramesh Hariharan

149

Eigenvalues and Mixing Rate IV

  1. ${||\delta P^k||^2}\leq 2\frac{d_{max}}{d_{min}} |\lambda_2|^{2k}$

Ramesh Hariharan

150

Eigenvalues and Mixing Rate IV

  1. ${||\delta P^k||^2}\leq 2\frac{d_{max}}{d_{min}} |\lambda_2|^{2k}$
  2. $|\delta P^k|^2 \leq M ||\delta P^k||^2$, by Jensen's Inequality

Ramesh Hariharan

151

Eigenvalues and Mixing Rate IV

  1. ${||\delta P^k||^2}\leq 2\frac{d_{max}}{d_{min}} |\lambda_2|^{2k}$
  2. $|\delta P^k|^2 \leq M ||\delta P^k||^2$, by Jensen's Inequality
  3. $|\delta P^k|^2 \leq 2M\frac{d_{max}}{d_{min}} |\lambda_2|^{2k} $

Ramesh Hariharan

152

Eigenvalues and Mixing Rate IV

  1. ${||\delta P^k||^2}\leq 2\frac{d_{max}}{d_{min}} |\lambda_2|^{2k}$
  2. $|\delta P^k|^2 \leq M ||\delta P^k||^2$, by Jensen's Inequality
  3. $|\delta P^k|^2 \leq 2M\frac{d_{max}}{d_{min}} |\lambda_2|^{2k} $
  4. $|\delta P^k|\leq \sqrt{2M\frac{d_{max}}{d_{min}}} |\lambda_2|^{k} $

Ramesh Hariharan

153

Eigenvalues and Mixing Rate IV

  1. ${||\delta P^k||^2}\leq 2\frac{d_{max}}{d_{min}} |\lambda_2|^{2k}$
  2. $|\delta P^k|^2 \leq M ||\delta P^k||^2$, by Jensen's Inequality
  3. $|\delta P^k|^2 \leq 2M\frac{d_{max}}{d_{min}} |\lambda_2|^{2k} $
  4. $|\delta P^k|\leq \sqrt{2M\frac{d_{max}}{d_{min}}} |\lambda_2|^{k} $
  5. For ${|\delta P^k|}\leq 2\epsilon$, it suffices to set $$k= \frac{\log(\frac{2\epsilon}{\sqrt{2M }}\frac{\sqrt{d_{min}}}{\sqrt{ d_{max} }})}{\log |\lambda_2|} \leq \frac{\log(\frac{\sqrt{M}}{\sqrt{2}\epsilon} \frac{\sqrt{d_{max}}}{\sqrt{ d_{min} }} )}{1-\lambda_2} $$

Ramesh Hariharan

154

Diagonalizability of $P$

  1. If $P$ is symmetric, then it is diagonalizable

Ramesh Hariharan

155

Diagonalizability of $P$

  1. If $P$ is symmetric, then it is diagonalizable
    • The stationary distribution is uniform

Ramesh Hariharan

156

Diagonalizability of $P$

  1. If $P$ is symmetric, then it is diagonalizable
    • The stationary distribution is uniform
    • Left and right eigenvectors are the same ($L=R^T$, $P=L^T\Lambda L$)

Ramesh Hariharan

157

Diagonalizability of $P$

  1. If $P$ is symmetric, then it is diagonalizable
    • The stationary distribution is uniform
    • Left and right eigenvectors are the same ($L=R^T$, $P=L^T\Lambda L$)
    • Eigenvalues are real

Ramesh Hariharan

158

Diagonalizability of $P$

  1. If $P$ is symmetric, then it is diagonalizable
    • The stationary distribution is uniform
    • Left and right eigenvectors are the same ($L=R^T$, $P=L^T\Lambda L$)
    • Eigenvalues are real
    • Eigenvectors are orthonormal ($LL^T=I$)

Ramesh Hariharan

159

Diagonalizability of $P$

  1. If $P$ is symmetric, then it is diagonalizable
    • The stationary distribution is uniform
    • Left and right eigenvectors are the same ($L=R^T$, $P=L^T\Lambda L$)
    • Eigenvalues are real
    • Eigenvectors are orthonormal ($LL^T=I$)
    • Eigenvectors form an orthonormal basis for the entire $M$-dimensional space

Ramesh Hariharan

160

Diagonalizability of $P$

  1. If $P$ is symmetric, then it is diagonalizable
    • The stationary distribution is uniform
    • Left and right eigenvectors are the same ($L=R^T$, $P=L^T\Lambda L$)
    • Eigenvalues are real
    • Eigenvectors are orthonormal ($LL^T=I$)
    • Eigenvectors form an orthonormal basis for the entire $M$-dimensional space
  2. Is symmetry a practical condition?

Ramesh Hariharan

161

Is $P$ Diagonalizable in General?

Ramesh Hariharan

  1. Example
     $P =\frac{1}{12}\begin{pmatrix}
     7 & 2 & 3 \\
     4 & 5 & 3 \\
     3 & 3 & 6 \\
     \end{pmatrix}$
  2. $\lambda_1=1$, $\lambda_2=\lambda_3=\frac{3}{12}$
  3. $y_1=\begin{pmatrix}\frac{11}{27} & \frac{7}{27} & \frac{9}{27}\end{pmatrix}$, $y_2=y_3=\begin{pmatrix}1 & -1 & 0\end{pmatrix}$
  4. $P-\lambda_1 I,P-\lambda_2 I$ both have rank 2, nullity 1
  5. $z=\begin{pmatrix}0 & 1 & -1\end{pmatrix}$ is not in the span of $y_2,y_3$
  6. Needs generalized eigenvectors, beyond scope here!
162

Reversibility: A Special Diagonalizable Case of $P$

  1. $\pi_i p_{ij} = \pi_j p_{ji}$ (detailed balance or reversibility)

Ramesh Hariharan

163

Reversibility: A Special Diagonalizable Case of $P$

  1. $\pi_i p_{ij} = \pi_j p_{ji}$ (detailed balance or reversibility)
    • Aside: Any vector $x$ such that $x_ip_{ij}=x_jp_{ji}$ for all $i,j$ satisfies $x=\pi$

Ramesh Hariharan

164

Reversibility: A Special Diagonalizable Case of $P$

  1. $\pi_i p_{ij} = \pi_j p_{ji}$ (detailed balance or reversibility)
    • Aside: Any vector $x$ such that $x_ip_{ij}=x_jp_{ji}$ for all $i,j$ satisfies $x=\pi$
  2. The probability of moving from $i$ to $j$ is the same as that of moving from $j$ to $i$

Ramesh Hariharan

165

Reversibility: A Special Diagonalizable Case of $P$

  1. $\pi_i p_{ij} = \pi_j p_{ji}$ (detailed balance or reversibility)
    • Aside: Any vector $x$ such that $x_ip_{ij}=x_jp_{ji}$ for all $i,j$ satisfies $x=\pi$
  2. The probability of moving from $i$ to $j$ is the same as that of moving from $j$ to $i$
  3. Consider $D$, diagonal matrix with entries $D_{ii}=\frac{1}{\pi_i}$

Ramesh Hariharan

166

Reversibility: A Special Diagonalizable Case of $P$

  1. $\pi_i p_{ij} = \pi_j p_{ji}$ (detailed balance or reversibility)
    • Aside: Any vector $x$ such that $x_ip_{ij}=x_jp_{ji}$ for all $i,j$ satisfies $x=\pi$
  2. The probability of moving from $i$ to $j$ is the same as that of moving from $j$ to $i$
  3. Consider $D$, diagonal matrix with entries $D_{ii}=\frac{1}{\pi_i}$
    • $D^{-\frac{1}{2}} P D^{\frac{1}{2}}$ is symmetric with the same eigenvalues as $P$

Ramesh Hariharan

167

Reversibility: A Special Diagonalizable Case of $P$

  1. $\pi_i p_{ij} = \pi_j p_{ji}$ (detailed balance or reversibility)
    • Aside: Any vector $x$ such that $x_ip_{ij}=x_jp_{ji}$ for all $i,j$ satisfies $x=\pi$
  2. The probability of moving from $i$ to $j$ is the same as that of moving from $j$ to $i$
  3. Consider $D$, diagonal matrix with entries $D_{ii}=\frac{1}{\pi_i}$
    • $D^{-\frac{1}{2}} P D^{\frac{1}{2}}$ is symmetric with the same eigenvalues as $P$
    • Entries are $\sqrt{\frac{\pi_i}{\pi_j}} p_{ij}=\sqrt{\frac{\pi_j}{\pi_i}} p_{ji} $

Ramesh Hariharan

168

Reversibility: A Special Diagonalizable Case of $P$

  1. $\pi_i p_{ij} = \pi_j p_{ji}$ (detailed balance or reversibility)
    • Aside: Any vector $x$ such that $x_ip_{ij}=x_jp_{ji}$ for all $i,j$ satisfies $x=\pi$
  2. The probability of moving from $i$ to $j$ is the same as that of moving from $j$ to $i$
  3. Consider $D$, diagonal matrix with entries $D_{ii}=\frac{1}{\pi_i}$
    • $D^{-\frac{1}{2}} P D^{\frac{1}{2}}$ is symmetric with the same eigenvalues as $P$
    • Entries are $\sqrt{\frac{\pi_i}{\pi_j}} p_{ij}=\sqrt{\frac{\pi_j}{\pi_i}} p_{ji} $
    • Eigenvector corresponding to eigenvalue 1 is $v_i=\sqrt{\pi_i}$

Ramesh Hariharan

169

A Property of Reversible Chains

  • By symmetry
    • $D^{-\frac{1}{2}}PD^{\frac{1}{2}}=L_1^T\Lambda L_1$, where $L_1L_1^T=I$

Ramesh Hariharan

170

A Property of Reversible Chains

  • By symmetry
    • $D^{-\frac{1}{2}}PD^{\frac{1}{2}}=L_1^T\Lambda L_1$, where $L_1L_1^T=I$
  • So $P= D^{\frac{1}{2}}(L_1^T \Lambda L_1)D^{-\frac{1}{2}}=(L_1D^{\frac{1}{2}})^T\Lambda (L_1D^{-\frac{1}{2}})$

Ramesh Hariharan

171

A Property of Reversible Chains

  • By symmetry
    • $D^{-\frac{1}{2}}PD^{\frac{1}{2}}=L_1^T\Lambda L_1$, where $L_1L_1^T=I$
  • So $P= D^{\frac{1}{2}}(L_1^T \Lambda L_1)D^{-\frac{1}{2}}=(L_1D^{\frac{1}{2}})^T\Lambda (L_1D^{-\frac{1}{2}})$
    • So $L = L_1D^{-\frac{1}{2}} $

Ramesh Hariharan

172

A Property of Reversible Chains

  • By symmetry
    • $D^{-\frac{1}{2}}PD^{\frac{1}{2}}=L_1^T\Lambda L_1$, where $L_1L_1^T=I$
  • So $P= D^{\frac{1}{2}}(L_1^T \Lambda L_1)D^{-\frac{1}{2}}=(L_1D^{\frac{1}{2}})^T\Lambda (L_1D^{-\frac{1}{2}})$
    • So $L = L_1D^{-\frac{1}{2}} $
    • And $R=(L_1D^{\frac{1}{2}})^T$

Ramesh Hariharan

173

A Property of Reversible Chains

  • By symmetry
    • $D^{-\frac{1}{2}}PD^{\frac{1}{2}}=L_1^T\Lambda L_1$, where $L_1L_1^T=I$
  • So $P= D^{\frac{1}{2}}(L_1^T \Lambda L_1)D^{-\frac{1}{2}}=(L_1D^{\frac{1}{2}})^T\Lambda (L_1D^{-\frac{1}{2}})$
    • So $L = L_1D^{-\frac{1}{2}} $
    • And $R=(L_1D^{\frac{1}{2}})^T$
    • Then

Ramesh Hariharan

174

A Property of Reversible Chains

  • By symmetry
    • $D^{-\frac{1}{2}}PD^{\frac{1}{2}}=L_1^T\Lambda L_1$, where $L_1L_1^T=I$
  • So $P= D^{\frac{1}{2}}(L_1^T \Lambda L_1)D^{-\frac{1}{2}}=(L_1D^{\frac{1}{2}})^T\Lambda (L_1D^{-\frac{1}{2}})$
    • So $L = L_1D^{-\frac{1}{2}} $
    • And $R=(L_1D^{\frac{1}{2}})^T$
    • Then
      • $LR=LDL^T=L_1L_1^T=I$

Ramesh Hariharan

175

A Property of Reversible Chains

  • By symmetry
    • $D^{-\frac{1}{2}}PD^{\frac{1}{2}}=L_1^T\Lambda L_1$, where $L_1L_1^T=I$
  • So $P= D^{\frac{1}{2}}(L_1^T \Lambda L_1)D^{-\frac{1}{2}}=(L_1D^{\frac{1}{2}})^T\Lambda (L_1D^{-\frac{1}{2}})$
    • So $L = L_1D^{-\frac{1}{2}} $
    • And $R=(L_1D^{\frac{1}{2}})^T$
    • Then
      • $LR=LDL^T=L_1L_1^T=I$
    • $k\leq \frac{\log(\frac{\sqrt{M}}{\sqrt{2}\epsilon} \frac{\sqrt{\pi_{max}}}{\sqrt{ \pi_{min} }} )}{1-\lambda_2}$

Ramesh Hariharan

176

Reversible Markov Chains

  1. Example: Undirected graph with uniform distribution on outgoing edges, i.e., $p_{ij}=\frac{1}{d_i}, \forall \mbox{ edges } i\rightarrow j$

Ramesh Hariharan

177

Reversible Markov Chains

Ramesh Hariharan

  1. Example: Undirected graph with uniform distribution on outgoing edges, i.e., $p_{ij}=\frac{1}{d_i}, \forall \mbox{ edges } i\rightarrow j$
  2. $\pi_i \propto d_i$
    • $\pi_j=\sum_i \pi_i p_{ij}\propto \sum_i d_i \frac{1}{d_i} = d_j$
178

Reversible Markov Chains

  1. Example: Undirected graph with uniform distribution on outgoing edges, i.e., $p_{ij}=\frac{1}{d_i}, \forall \mbox{ edges } i\rightarrow j$
  2. $\pi_i \propto d_i$
    • $\pi_j=\sum_i \pi_i p_{ij}\propto \sum_i d_i \frac{1}{d_i} = d_j$
  3. Mixing time is
     $$\frac{\log(\frac{\sqrt{M}}{\sqrt{2}\epsilon} \frac{\sqrt{d_{max}}}{\sqrt{ d_{min} }} )}{1-\lambda_2}$$

Ramesh Hariharan

179

Reversible Markov Chains

  1. Example: Undirected graph with uniform distribution on outgoing edges, i.e., $p_{ij}=\frac{1}{d_i}, \forall \mbox{ edges } i\rightarrow j$
  2. $\pi_i \propto d_i$
    • $\pi_j=\sum_i \pi_i p_{ij}\propto \sum_i d_i \frac{1}{d_i} = d_j$
  3. Mixing time is
     $$\frac{\log(\frac{\sqrt{M}}{\sqrt{2}\epsilon} \frac{\sqrt{d_{max}}}{\sqrt{ d_{min} }} )}{1-\lambda_2}$$
  4. How do we lower bound $1-\lambda_2$?

Ramesh Hariharan
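
A numpy sketch (not from the slides) for an arbitrary small undirected graph: build $p_{ij}=1/d_i$, then check that $\pi_i\propto d_i$ is stationary and that detailed balance holds.

```python
import numpy as np

# An arbitrary small undirected graph, as a symmetric adjacency matrix.
A = np.array([[0, 1, 1, 0, 1],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 1],
              [0, 0, 1, 0, 1],
              [1, 0, 1, 1, 0]], dtype=float)

deg = A.sum(axis=1)
P = A / deg[:, None]                  # p_ij = 1/d_i on edges, 0 otherwise

pi = deg / deg.sum()                  # the claimed stationary distribution
print("pi P == pi :", np.allclose(pi @ P, pi))

flows = pi[:, None] * P               # flows[i, j] = pi_i * p_ij
print("reversible :", np.allclose(flows, flows.T))
```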

180

Solution to the Random Walk Puzzle

  1. Symmetric $p_{ij}=p_{ji}$

Ramesh Hariharan

181

Solution to the Random Walk Puzzle

  1. Symmetric $p_{ij}=p_{ji}$
  2. $\pi_i=\frac{1}{M}$, so uniform distribution in the limit

Ramesh Hariharan

182

Solution to the Random Walk Puzzle

  1. Symmetric $p_{ij}=p_{ji}$
  2. $\pi_i=\frac{1}{M}$, so uniform distribution in the limit
  3. Eigenvectors=$\begin{pmatrix}\omega^0 & \omega^1 &\cdots & \omega^{M-1} \ \end{pmatrix}$
    • $\omega$ is one of the $M$th roots of 1

Ramesh Hariharan

183

Solution to the Random Walk Puzzle

  1. Symmetric $p_{ij}=p_{ji}$
  2. $\pi_i=\frac{1}{M}$, so uniform distribution in the limit
  3. Eigenvectors=$\begin{pmatrix}\omega^0 & \omega^1 &\cdots & \omega^{M-1} \ \end{pmatrix}$
    • $\omega$ is one of the $M$th roots of 1
  4. Eigenvalues $\lambda_j= \frac{1}{2}(1+\cos\frac{2\pi (j-1)}{M})$

Ramesh Hariharan

184

Solution to the Random Walk Puzzle

  1. Symmetric $p_{ij}=p_{ji}$
  2. $\pi_i=\frac{1}{M}$, so uniform distribution in the limit
  3. Eigenvectors=$\begin{pmatrix}\omega^0 & \omega^1 &\cdots & \omega^{M-1} \ \end{pmatrix}$
    • $\omega$ is one of the $M$th roots of 1
  4. Eigenvalues $\lambda_j= \frac{1}{2}(1+\cos\frac{2\pi (j-1)}{M})$
  5. $\lambda_2=\frac{1}{2}(1+\cos\frac{2\pi}{M}) \sim 1-\frac{\pi^2}{M^2}$

Ramesh Hariharan

185

Solution to the Random Walk Puzzle

  1. Symmetric $p_{ij}=p_{ji}$
  2. $\pi_i=\frac{1}{M}$, so uniform distribution in the limit
  3. Eigenvectors=$\begin{pmatrix}\omega^0 & \omega^1 &\cdots & \omega^{M-1} \ \end{pmatrix}$
    • $\omega$ is one of the $M$th roots of 1
  4. Eigenvalues $\lambda_j= \frac{1}{2}(1+\cos\frac{2\pi (j-1)}{M})$
  5. $\lambda_2=\frac{1}{2}(1+\cos\frac{2\pi}{M}) \sim 1-\frac{\pi^2}{M^2}$
  6. Mixing time $\frac{\log\frac{\sqrt{M}}{\sqrt{2}\epsilon}}{1-\lambda_2}=\frac{M^2}{\pi^2}\log\frac{\sqrt{M}}{\sqrt{2}\epsilon} $

Ramesh Hariharan
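
A final numerical check (a sketch, not from the slides): build the lazy-walk matrix for an $M$-cycle, confirm the eigenvalue formula and the $\lambda_2\approx 1-\frac{\pi^2}{M^2}$ approximation, and watch $vP^k$ flatten towards uniform.

```python
import numpy as np

M = 16
P = np.zeros((M, M))
for i in range(M):
    P[i, i] = 0.5                            # stay with probability 1/2
    P[i, (i + 1) % M] = 0.25                 # step right with probability 1/4
    P[i, (i - 1) % M] = 0.25                 # step left with probability 1/4

eigvals = np.sort(np.real(np.linalg.eigvals(P)))[::-1]
print("lambda_2            :", eigvals[1])
print("(1 + cos(2*pi/M))/2 :", 0.5 * (1 + np.cos(2 * np.pi / M)))
print("1 - pi^2/M^2        :", 1 - np.pi ** 2 / M ** 2)

v = np.zeros(M)
v[0] = 1.0                                   # start at a single node
for k in (10, 100, 500):
    dist = v @ np.linalg.matrix_power(P, k)
    print(k, "max deviation from uniform:", np.abs(dist - 1.0 / M).max())
```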
