Massive Data Problems
Sampling for Massive Matrices
Data Sciences Course, Jan 2015

Ramesh Hariharan

# Principal Component Analysis

• Given a collection of points in $d$-dimensional space; $d$ is large

• Find the 1-dimensional linear subspace (line through the origin) that maximizes point spread

• More generally, find the $r$-dimensional linear subspace that maximizes point spread

• Here spread is defined as the sum of squares of the projections of the points onto the linear subspace

• Let $s_1,\ldots,s_n$ be the given points, each a $d$-dim vector

• Let $R$ denote the $n\times d$ matrix whose rows are $s_1,\ldots,s_n$

• Let $x$ be a unit column vector denoting a line through the origin

• $(Rx)^T Rx$ denotes the sum of squares of projections

• We want the $x$ that maximizes this quantity

# Principal Components And Eigenvectors

• $(Rx)^T Rx=x^T (R^T R) x$ is maximized by an eigenvector corresponding to the largest eigenvalue of $R^TR$

• Note $R^TR$ is a real, symmetric $d\times d$ matrix, so its eigenvalues are real and non-negative, and its eigenvectors form an orthonormal basis

• Any unit vector $x$ can be written as $x=\sum_{i=1}^d \alpha_i v_i$, where the $v_i$ are the unit eigenvectors of $R^T R$ in decreasing order of eigenvalue $e_i$

• $\sum_{i=1}^d \alpha_i^2=1$; further, for a spherically symmetric random $x$, $|\alpha_1|\geq \frac{0.5}{\sqrt{d}}$ with 62% probability
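The 62% claim is easy to check with a quick Monte Carlo sketch (NumPy; the dimension $d$ and the trial count are arbitrary choices, not from the deck). By spherical symmetry, $\alpha_1$ has the same distribution as the first coordinate of a random unit vector:

```python
import numpy as np

# Estimate Pr(|alpha_1| >= 0.5/sqrt(d)) for a spherically symmetric
# random unit vector x in d dimensions; by symmetry alpha_1 is just
# the first coordinate of x.
rng = np.random.default_rng(0)
d, trials = 100, 20000
x = rng.standard_normal((trials, d))
x /= np.linalg.norm(x, axis=1, keepdims=True)   # make each row a unit vector
alpha1 = np.abs(x[:, 0])
prob = np.mean(alpha1 >= 0.5 / np.sqrt(d))
print(f"Pr(|alpha_1| >= 0.5/sqrt(d)) ~ {prob:.3f}")   # close to 0.62
```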

# Finding the Eigenvectors of $R^TR$

• $(R^TR)x=(R^TR)\sum_{i=1}^d \alpha_i v_i=\sum_{i=1}^d \alpha_i e_i v_i$

• $(R^TR)^l x= (R^TR)^l\sum_{i=1}^d \alpha_i v_i=\sum_{i=1}^d \alpha_i e^l_i v_i$

• Start with a random $x$, apply $R^TR$ repeatedly, normalizing to unit length each time

• Assuming a unique largest eigenvalue, as $l\rightarrow \infty$ the normalized $(R^TR)^l x\rightarrow v_1$
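The power iteration described above can be sketched as follows (NumPy; the iteration count and the toy data matrix are illustrative assumptions):

```python
import numpy as np

def top_eigenvector(R, num_iters=100, seed=0):
    """Power iteration for the largest eigenvector of R^T R.

    Starts from a random unit vector, applies R^T R repeatedly,
    renormalizing to unit length each time; returns the eigenvalue
    estimate (Rayleigh quotient) and the unit eigenvector.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(R.shape[1])
    x /= np.linalg.norm(x)
    for _ in range(num_iters):
        x = R.T @ (R @ x)          # apply R^T R without forming it explicitly
        x /= np.linalg.norm(x)     # renormalize to a unit vector
    return x @ (R.T @ (R @ x)), x

R = np.array([[3.0, 1.0], [1.0, 2.0], [0.0, 1.0]])  # toy 3x2 data matrix
e1, v1 = top_eigenvector(R)
```

Applying $R^T(Rx)$ instead of precomputing $R^TR$ keeps each step at $O(nd)$, which matters once sampling enters the picture.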

# Convergence Rate

• $|(R^TR)^lx - v_1|^2 =(\frac{\alpha_1 e_1^{l}}{\sqrt{\sum_{i=1}^d \alpha^2_i e_i^{2l}}}-1)^2+\frac{\sum_{i=2}^d \alpha_i^2e_i^{2l}}{{\sum_{i=1}^d \alpha^2_i e_i^{2l}}} = 2-2\frac{\alpha_1 e_1^{l}}{\sqrt{\sum_{i=1}^d \alpha^2_i e_i^{2l}}}$

• $\sum_{i=1}^d \alpha^2_i e_i^{2l}\leq \alpha_1^2e_1^{2l}+(1-\alpha_1^2)e_2^{2l}\leq \alpha_1^2e_1^{2l}(1+(\frac{1}{\alpha_1^2}-1) \frac{e_2^{2l}}{e_1^{2l}})$

• So $|(R^TR)^lx - v_1|^2\leq 2 (1-\frac{1}{\sqrt{1+(\frac{1}{\alpha_1^2}-1) \frac{e_2^{2l}}{e_1^{2l}}}})\leq(\frac{1}{\alpha_1^2}-1) \frac{e_2^{2l}}{e_1^{2l}} \leq (4d-1) \frac{e_2^{2l}}{e_1^{2l}}$

• For $l>\frac{1}{2} \log(\frac{4d-1}{\epsilon^2})/\log(\frac{e_1}{e_2})$, $|(R^TR)^lx - v_1|\leq \epsilon$

# Time Complexity

• Computing $R^TR$ takes $d\times n\times d$ steps

• Each powering step takes $d\times d$ steps, so the time taken is $O(d^2n+d^2\log\frac{d}{\epsilon^2}/\log\frac{e_1}{e_2})$

• What if $n$ is really large? Can one reduce the dependence on $n$?

• Sampling? Uniform? If not, which points should we prefer?

• Intuitively, prefer points far away from the origin.

# Sampling for Principal Component Analysis

• Sample each point $s_i$ with probability $p_i$; hopefully only a few points will be sampled

• We will also need to weight each point by $1/\sqrt{p_i}$. Why?

• Note $R^TR = \sum_{i=1}^n s_i s_i^T$

• Expectation: $\sum_{i=1}^n p_i \frac{s_i}{\sqrt{p_i}} \frac{s_i^T}{\sqrt{p_i}} = R^TR$ is what we want

• Variance?
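The expectation identity above is a pure algebraic fact and can be checked numerically for any choice of $p_i$ (a sketch; the toy matrix is an assumption, not from the deck):

```python
import numpy as np

R = np.array([[3.0, 1.0], [1.0, 2.0], [0.0, 1.0]])   # toy data: rows are s_i
norms2 = np.sum(R**2, axis=1)
p = norms2 / norms2.sum()          # sampling probabilities, here p_i ∝ |s_i|^2

# E[sum_i S_i S_i^T] = sum_i p_i (s_i/sqrt(p_i))(s_i/sqrt(p_i))^T = R^T R
expectation = sum(p_i * np.outer(s / np.sqrt(p_i), s / np.sqrt(p_i))
                  for p_i, s in zip(p, R))
assert np.allclose(expectation, R.T @ R)
```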

# Variance and the Frobenius Norm

• Random variable $S_i=\frac{1}{\sqrt{p_i}}s_i$ with probability $p_i$, and $S_i=0$ otherwise

• $E(\sum_{i=1}^n S_i S^T_i) = R^TR$

• Define the variance as the sum of the variances of the individual matrix entries

• Let $||M||$, the Frobenius norm of matrix $M$, be $\sqrt{\sum_{ij} M_{ij}^2}$

• $Var(\sum_{i=1}^n S_i S^T_i)=E[||\sum_{i=1}^n S_iS_i^T - R^TR||^2]$

# Frobenius Norm Properties

• $||M||^2=Tr(MM^T)$

• Note $Tr(M+N)=Tr(M)+Tr(N)$ and $Tr(AB)=Tr(BA)$, $Tr(ABC)=Tr(CAB)$

• If $M=\sum_i v_i v^T_i$ for vectors $v_i$, then $||M||^2=\sum_i |v_i|^4 + 2\sum_{i\lt i'}(v_i^Tv_{i'})^2$

• For symmetric matrices $M,N$, $||M-N||^2=||M||^2+||N||^2-2Tr(MN)$

• If $M$ is a random symmetric matrix with $E(M)=N$, then $E(Tr(MN))=||N||^2$ and $E[||M-N||^2]=E[||M||^2]-||N||^2$
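These identities are easy to sanity-check numerically (a sketch with arbitrary random matrices; sizes and seed are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
M = A + A.T                        # random symmetric matrix
B = rng.standard_normal((4, 4))
N = B + B.T
C = rng.standard_normal((4, 4))

fro2 = lambda M: np.sum(M**2)      # squared Frobenius norm ||M||^2

# ||M||^2 = Tr(M M^T)
assert np.isclose(fro2(M), np.trace(M @ M.T))
# Tr(ABC) = Tr(CAB)
assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))
# For symmetric M, N: ||M - N||^2 = ||M||^2 + ||N||^2 - 2 Tr(MN)
assert np.isclose(fro2(M - N), fro2(M) + fro2(N) - 2 * np.trace(M @ N))

# For M = sum_i v_i v_i^T: ||M||^2 = sum_i |v_i|^4 + 2 sum_{i<i'} (v_i . v_i')^2
vs = rng.standard_normal((3, 4))
M2 = sum(np.outer(v, v) for v in vs)
rhs = (sum((v @ v) ** 2 for v in vs)
       + 2 * sum((vs[i] @ vs[j]) ** 2 for i in range(3) for j in range(i + 1, 3)))
assert np.isclose(fro2(M2), rhs)
```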

# An Expression for the Variance

• $E[||\sum_{i=1}^n S_iS_i^T - R^TR||^2]=E[||\sum_i S_iS^T_i||^2]-||R^TR||^2$

• $||\sum_i S_iS_i^T||^2=\sum_i |S_i|^4 + 2\sum_{i\lt i'}(S_i^TS_{i'})^2$

• $E[||\sum_i S_iS_i^T||^2]=\sum_i p_i|\frac{s_i}{\sqrt{p_i}}|^4 + 2\sum_{i\lt i'} p_ip_{i'}(\frac{s_i^T}{\sqrt{p_i}}\frac{s_{i'}}{\sqrt{p_{i'}}})^2$

• $||R^TR||^2= \sum_i |s_i|^4 + 2\sum_{i\lt i'}(s_i^Ts_{i'})^2$

• $E[||\sum_{i=1}^n S_iS_i^T - R^TR||^2]=\sum_i (\frac{1}{p_i}-1)|s_i|^4$
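For small $n$ the closed-form variance can be verified exactly by enumerating all $2^n$ sampling outcomes (a sketch with an assumed 3-point toy example):

```python
import itertools
import numpy as np

S = np.array([[3.0, 1.0], [1.0, 2.0], [0.0, 1.0]])   # rows s_i, n = 3
norms2 = np.sum(S**2, axis=1)
p = norms2 / norms2.sum()
target = S.T @ S                                     # R^T R

# Enumerate every subset of sampled points: each s_i is kept with
# probability p_i (independently) and contributes outer(s_i, s_i)/p_i
# when kept, i.e. (s_i/sqrt(p_i))(s_i/sqrt(p_i))^T.
variance = 0.0
for keep in itertools.product([0, 1], repeat=len(S)):
    prob = np.prod([pi if k else 1 - pi for pi, k in zip(p, keep)])
    est = sum(k * np.outer(s, s) / pi for k, pi, s in zip(keep, p, S))
    variance += prob * np.sum((est - target) ** 2)

closed_form = np.sum((1 / p - 1) * norms2**2)        # sum_i (1/p_i - 1)|s_i|^4
assert np.isclose(variance, closed_form)
```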

# Choosing Sampling Probabilities

• Minimize variance by minimizing $\sum_i (\frac{1}{p_i}-1)|s_i|^4$

• Using calculus (holding the expected sample count $\sum_i p_i$ fixed), the minimizer is $p_i \propto |s_i|^2$

• With $p_i=\frac{|s_i|^2}{\sum_j |s_j|^2}$, the variance is $\leq (\sum_i|s_i|^2)^2$
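The calculus step can be spelled out with a Lagrange multiplier: minimize the variable part $\sum_j \frac{|s_j|^4}{p_j}$ subject to the expected number of sampled points $\sum_j p_j$ being fixed.

$$\frac{\partial}{\partial p_i}\left(\sum_j \frac{|s_j|^4}{p_j}+\lambda\sum_j p_j\right)=-\frac{|s_i|^4}{p_i^2}+\lambda=0 \quad\Rightarrow\quad p_i=\frac{|s_i|^2}{\sqrt{\lambda}}\propto |s_i|^2$$

With the normalization $p_i=\frac{|s_i|^2}{\sum_j |s_j|^2}$, the variance becomes $\sum_i(\frac{1}{p_i}-1)|s_i|^4=(\sum_i |s_i|^2)^2-\sum_i |s_i|^4\leq(\sum_i |s_i|^2)^2$.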

# Reducing Variance by Averaging Multiple Trials

• Choose $S_i=\frac{s_i}{\sqrt{p_i}}$ with prob. $p_i$ and the 0 vector otherwise; calculate the sampling matrix $X_1=\sum_i S_i S_i^T$

• Repeat the above $k$ times and take $X=\frac{\sum_{l=1}^k X_l}{k}$

• $E(X)=R^TR$

• $Var(X)=E[||X - R^TR||^2]\leq \frac{(\sum_i|s_i|^2)^2}{k}$
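The averaged estimator can be sketched as follows (NumPy; the toy matrix, $k$, and the seed are illustrative assumptions):

```python
import numpy as np

def sampled_gram(R, k, seed=0):
    """Average of k independent sampling trials approximating R^T R.

    Each trial keeps row s_i independently with probability
    p_i = |s_i|^2 / sum_j |s_j|^2 and reweights it by 1/sqrt(p_i).
    """
    rng = np.random.default_rng(seed)
    norms2 = np.sum(R**2, axis=1)
    p = norms2 / norms2.sum()
    d = R.shape[1]
    X = np.zeros((d, d))
    for _ in range(k):
        keep = rng.random(len(R)) < p              # sample each point
        W = R[keep] / np.sqrt(p[keep])[:, None]    # reweighted sampled rows
        X += W.T @ W                               # this trial's X_l
    return X / k

R = np.array([[3.0, 1.0], [1.0, 2.0], [0.0, 1.0]])
X = sampled_gram(R, k=500)
```

Each $X_l$ is a sum of outer products, so $X$ stays symmetric positive semi-definite, and by the variance bound its Frobenius distance from $R^TR$ shrinks like $1/\sqrt{k}$.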

# Tail Bound

• By Chebyshev's inequality, $Pr(||X-R^TR||\geq \Delta)\leq \frac{E[||X - R^TR||^2]}{\Delta^2} \leq \frac{(\sum_i|s_i|^2)^2}{k\Delta^2}$

• $Pr(||X-R^TR||\geq \frac{c\sum_i |s_i|^2}{\sqrt{k}})\leq \frac{1}{c^2}$

• $X$ is obtained from $k$ trials, each sampling just 1 point in expectation, so the expected time to compute $X$ is $O(d^2k)$ instead of $O(d^2n)$

• So $X$ is quite close to $R^TR$; what happens if we find the largest eigenvector of $X$ instead of $R^TR$? How close are they to each other?

# Closeness Of Eigenvalues

• Let the eigenvalues (eigenvectors) of $X$ be $a_1 (x_1),a_2 (x_2),\ldots$ and those of $R^TR$ be $e_1 (v_1), e_2 (v_2),\ldots$, both in decreasing order

• Thm: $||X-R^TR||^2\geq \sum_i (a_i-e_i)^2$

• The sum of squared projection lengths of the points $s_i$ onto $v_1$ is $|Rv_1|^2=v_1^TR^TRv_1=e_1$

• Likewise it is $a_1$ in the sampled case

• $|a_1-e_1|\leq \frac{c\sum_i |s_i|^2}{\sqrt{k}}$ with prob. $\geq 1-\frac{1}{c^2}$

# Proof of Theorem

• Thm: $||X-R^TR||^2\geq \sum_i (a_i-e_i)^2$

• $X=\sum_i a_i x_i x_i^T$ and $R^TR=\sum_i e_i v_i v_i^T$

• The $x_i$'s are orthonormal, as are the $v_i$'s

• $||X||^2=||\sum_i a_i x_i x_i^T||^2=\sum_i a_i^2$ and $||R^TR||^2=||\sum_i e_i v_i v_i^T||^2=\sum_i e_i^2$

• $||X-R^TR||^2=||X||^2+||R^TR||^2-2Tr(XR^TR)=\sum_i a_i^2+\sum_i e_i^2-2Tr(XR^TR)$

# Proof of Theorem

• $Tr(XR^TR)=Tr((\sum_i a_i x_i x_i^T)(\sum_{i'} e_{i'} v_{i'} v_{i'}^T))$

• $= Tr(\sum_{i,i'} a_ie_{i'} x_ix_i^T v_{i'} v_{i'}^T)$

• $= Tr(\sum_{i,i'} a_ie_{i'} v_{i'}^Tx_ix_i^T v_{i'} )$

• $= \sum_{i,i'} a_ie_{i'} (v_{i'}\cdot x_i)^2$

• This is largest when $x_i=v_i$ for all $i$, so that $v_{i'}\cdot x_i=0$ for all $i'\not=i$; hence $Tr(XR^TR)\leq \sum_i a_ie_i$, and therefore $||X-R^TR||^2\geq \sum_i a_i^2+\sum_i e_i^2-2\sum_i a_ie_i=\sum_i (a_i-e_i)^2$
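The theorem (a form of the Hoffman-Wielandt inequality for symmetric matrices) can be tested numerically on arbitrary symmetric matrices (a sketch; the matrices and seed are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
N = A @ A.T                        # plays the role of R^T R (symmetric PSD)
P = rng.standard_normal((5, 5))
X = N + 0.1 * (P + P.T)            # a symmetric perturbation of N

e = np.sort(np.linalg.eigvalsh(N))[::-1]   # e_1 >= e_2 >= ...
a = np.sort(np.linalg.eigvalsh(X))[::-1]   # a_1 >= a_2 >= ...

# ||X - N||^2 >= sum_i (a_i - e_i)^2, both spectra sorted in decreasing order
assert np.sum((X - N) ** 2) >= np.sum((a - e) ** 2) - 1e-9
```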
