
Massive Data Problems
Sampling for Massive Matrices
Data Sciences Course, Jan 2015
Ramesh Hariharan
Principal Component Analysis
- Given a collection of points in $d$-dimensional space; $d$ is large
- Find the 1-dimensional linear subspace (line through the origin) that maximizes point spread
- More generally, find the $r$-dimensional linear subspace that maximizes point spread
- Spread is defined as the sum of squares of the projections of the points onto the linear subspace
Principal Component Analysis
- Let $s_1,\ldots,s_n$ be the given points, each a $d$-dim vector
- Let $R$ denote the $n\times d$ matrix whose rows are $s_1,\ldots,s_n$
- Let $x$ be a unit column vector denoting a line through the origin
- $(Rx)^T Rx$ denotes the sum of squares of projections
- We want the $x$ that maximizes this quantity (a small numerical sketch follows)
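As a concrete illustration, the spread along a candidate direction is just $|Rx|^2$. A minimal numpy sketch on made-up data (the names `R` and `x` simply mirror the notation above):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 50
R = rng.standard_normal((n, d))   # rows are the points s_1, ..., s_n

x = rng.standard_normal(d)
x /= np.linalg.norm(x)            # unit vector: a line through the origin

proj = R @ x                      # signed projection lengths of all points onto x
spread = proj @ proj              # (Rx)^T (Rx) = sum of squared projections
print(spread)
```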
Principal Components And Eigenvectors
- $(Rx)^T Rx=x^T (R^T R) x$ is maximized by an eigenvector corresponding to the largest eigenvalue of $R^TR$ (a quick numerical check follows)
- Note $R^TR$ is a real, symmetric $d\times d$ matrix, so its eigenvalues are real and non-negative, and its eigenvectors form an orthonormal basis
- Any unit vector $x$ can be written as $x=\sum_{i=1}^d \alpha_i v_i$, where the $v_i$ are the unit eigenvectors of $R^T R$ in decreasing order of eigenvalue $e_i$
- $\sum_{i=1}^d \alpha_i^2=1$; further, for a spherically symmetric random $x$, $|\alpha_1|\geq \frac{0.5}{\sqrt{d}}$ with probability about 62%
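A quick numerical check of this claim (a sketch only; the matrix is random toy data): the quadratic form $x^T(R^TR)x$ at random unit vectors never exceeds the largest eigenvalue, which is attained at the corresponding eigenvector.

```python
import numpy as np

rng = np.random.default_rng(1)
R = rng.standard_normal((500, 20))
A = R.T @ R                              # real, symmetric, positive semi-definite

evals, evecs = np.linalg.eigh(A)         # eigh: ascending eigenvalues, orthonormal eigenvectors
e1, v1 = evals[-1], evecs[:, -1]         # largest eigenvalue and its eigenvector

X = rng.standard_normal((20, 1000))
X /= np.linalg.norm(X, axis=0)           # 1000 random unit vectors (columns)
quad = np.einsum('ij,ik,kj->j', X, A, X) # x^T A x for each column x
print(quad.max() <= e1 + 1e-9)           # never exceeds the top eigenvalue
print(np.isclose(v1 @ A @ v1, e1))       # and v_1 attains it
```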
Finding the Eigenvectors of $R^TR$
- $(R^TR)x=(R^TR)\sum_{i=1}^d \alpha_i v_i=\sum_{i=1}^d \alpha_i e_i v_i$
- $(R^TR)^l x= (R^TR)^l\sum_{i=1}^d \alpha_i v_i=\sum_{i=1}^d \alpha_i e^l_i v_i$
- Power iteration: start with a random $x$, apply $R^TR$ repeatedly, normalizing to unit length after each step (see the sketch below)
- Assuming a unique largest eigenvalue, as $l\rightarrow \infty$ the normalized vector $(R^TR)^l x\rightarrow v_1$ (up to sign)
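A minimal power-iteration sketch along these lines (assuming numpy; the toy data and function name are mine, and this is not tuned for production use):

```python
import numpy as np

def top_eigenvector(A, num_iters=200, seed=0):
    """Power iteration: repeatedly apply A to a random vector, renormalizing each time."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(num_iters):
        x = A @ x
        x /= np.linalg.norm(x)            # normalize to unit length after each application
    return x

R = np.random.default_rng(2).standard_normal((1000, 30))
A = R.T @ R
v1_est = top_eigenvector(A)
e1_est = v1_est @ A @ v1_est              # Rayleigh quotient estimates the top eigenvalue
print(e1_est, np.linalg.eigvalsh(A)[-1])  # the two values should agree closely
```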
Convergence Rate
- Writing $(R^TR)^lx$ for the normalized iterate (and taking $\alpha_1>0$ w.l.o.g.), $|(R^TR)^lx - v_1|^2 =(\frac{\alpha_1 e_1^{l}}{\sqrt{\sum_{i=1}^d \alpha^2_i e_i^{2l}}}-1)^2+\frac{\sum_{i=2}^d \alpha_i^2e_i^{2l}}{{\sum_{i=1}^d \alpha^2_i e_i^{2l}}} = 2-2\frac{\alpha_1 e_1^{l}}{\sqrt{\sum_{i=1}^d \alpha^2_i e_i^{2l}}}$
- $\sum_{i=1}^d \alpha^2_i e_i^{2l}\leq \alpha_1^2e_1^{2l}+(1-\alpha_1^2)e_2^{2l}\leq \alpha_1^2e_1^{2l}(1+(\frac{1}{\alpha_1^2}-1) \frac{e_2^{2l}}{e_1^{2l}})$
- So $|(R^TR)^lx - v_1|^2\leq 2 (1-\frac{1}{\sqrt{1+(\frac{1}{\alpha_1^2}-1) \frac{e_2^{2l}}{e_1^{2l}}}})\leq(\frac{1}{\alpha_1^2}-1) \frac{e_2^{2l}}{e_1^{2l}} \leq (4d-1) \frac{e_2^{2l}}{e_1^{2l}}$, using $|\alpha_1|\geq \frac{0.5}{\sqrt{d}}$ from the previous slide
- For $l>\frac{1}{2} \log(\frac{4d-1}{\epsilon^2})/\log(\frac{e_1}{e_2})$, $|(R^TR)^lx - v_1|\leq \epsilon$ (illustrative numbers below)
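To get a feel for the bound, plugging in illustrative numbers (the values of $d$, $\epsilon$, and the eigenvalue gap are made up):

```python
import numpy as np

d, eps = 10_000, 1e-3
for gap in (2.0, 1.1, 1.01):                       # gap = e1 / e2
    l = 0.5 * np.log((4 * d - 1) / eps**2) / np.log(gap)
    print(f"e1/e2 = {gap}: about {int(np.ceil(l))} iterations suffice")
```

The dependence on $d$ and $\epsilon$ is only logarithmic; the eigenvalue gap $e_1/e_2$ is what really controls the iteration count.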
Time Complexity
- Computing $R^TR$ takes $d\times n\times d$ steps
- Each powering step takes $d\times d$ steps, so the total time is $O(d^2n+d^2\log\frac{d}{\epsilon^2}/\log\frac{e_1}{e_2})$
- What if $n$ is really large; can one reduce the dependence on $n$?
- Sampling? Uniform? If not, which points should we prefer?
- Intuitively, prefer points far away from the origin
Sampling for Principal Component Analysis
- Sample each point $s_i$ with probability $p_i$; hopefully only a few points will be sampled
- We will also need to weight each sampled point by $1/\sqrt{p_i}$. Why?
- Note $ R^TR = \sum_{i=1}^n s_i s_i^T$
- Expectation: $\sum_{i=1}^n p_i \frac{s_i}{\sqrt{p_i}} \frac{s_i^T}{\sqrt{p_i}} = R^TR $ is what we want (see the simulation sketch below)
- Variance?
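A small simulation illustrating why the $1/\sqrt{p_i}$ reweighting makes the sampled sum an unbiased estimate of $R^TR$ (toy data and arbitrary probabilities; a sketch only):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 200, 5
R = rng.standard_normal((n, d))
p = rng.uniform(0.05, 0.5, size=n)            # any sampling probabilities in (0, 1]

def sampled_estimate():
    keep = rng.random(n) < p                  # include point i with probability p_i
    S = R[keep] / np.sqrt(p[keep])[:, None]   # reweight each kept point by 1/sqrt(p_i)
    return S.T @ S                            # sum of S_i S_i^T over the sampled points

est = np.mean([sampled_estimate() for _ in range(20_000)], axis=0)
rel_err = np.max(np.abs(est - R.T @ R)) / np.max(np.abs(R.T @ R))
print(rel_err)                                # close to zero: the estimator is unbiased
```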
Variance and the Frobenius Norm
- Random variable $S_i=\frac{1}{\sqrt{p_i}}s_i$ with probability $p_i$, and the zero vector otherwise
- $E(\sum_{i=1}^n S_i S^T_i) = R^TR$
- Define the variance as the sum of the variances of the individual matrix entries
- Let $||M||$, the Frobenius norm of matrix $M$, be $\sqrt{\sum_{ij} M_{ij}^2}$
- $Var(\sum_{i=1}^n S_i S^T_i)=E[||\sum_{i=1}^n S_iS_i^T - R^TR||^2]$
Frobenius Norm Properties
- $||M||^2=Tr(MM^T)$
- Note $Tr(M+N)=Tr(M)+Tr(N)$, $Tr(AB)=Tr(BA)$, and $Tr(ABC)=Tr(CAB)$
- If $M=\sum_i v_i v^T_i$ for vectors $v_i$, then $||M||^2=\sum_i |v_i|^4 + 2\sum_{i\lt i'}(v_i^Tv_{i'})^2$
- For symmetric matrices $M,N$, $||M-N||^2=||M||^2+||N||^2-2Tr(MN)$
- If $M$ is a random symmetric matrix with $E(M)=N$, then $E(Tr(MN))=||N||^2$ and $E[||M-N||^2]=E[||M||^2]-||N||^2$
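These identities are easy to sanity-check numerically (random vectors only; nothing here is specific to the PCA setting):

```python
import numpy as np

rng = np.random.default_rng(4)
V = rng.standard_normal((6, 4))                  # rows are vectors v_1, ..., v_6
M = sum(np.outer(v, v) for v in V)               # M = sum_i v_i v_i^T

frob_sq = np.sum(M**2)                           # ||M||^2 = sum of squared entries
print(np.isclose(frob_sq, np.trace(M @ M.T)))    # ||M||^2 = Tr(M M^T)

G = V @ V.T                                      # Gram matrix: G[i, i'] = v_i^T v_{i'}
norms4 = np.sum(np.diag(G)**2)                   # sum_i |v_i|^4
cross = np.sum(np.triu(G, 1)**2)                 # sum_{i < i'} (v_i^T v_{i'})^2
print(np.isclose(frob_sq, norms4 + 2 * cross))   # matches the expansion above
```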
An Expression for the Variance
- $E[||\sum_{i=1}^n S_iS_i^T - R^TR||^2]=E[||\sum_i S_iS^T_i||^2]-||R^TR||^2$
- $||\sum_i S_iS_i^T||^2=\sum_i |S_i|^4 + 2\sum_{i\lt i'}(S_i^TS_{i'})^2$
- $E[||\sum_i S_iS_i^T||^2]=\sum_i p_i|\frac{s_i}{\sqrt{p_i}}|^4 + 2\sum_{i\lt i'} p_ip_{i'}(\frac{s_i^T}{\sqrt{p_i}}\frac{s_{i'}}{\sqrt{p_{i'}}})^2$
- $||R^TR||^2= \sum_i |s_i|^4 + 2\sum_{i\lt i'}(s_i^Ts_{i'})^2$
- $E[||\sum_{i=1}^n S_iS_i^T - R^TR||^2]=\sum_i (\frac{1}{p_i}-1)|s_i|^4$
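The closed form $\sum_i (\frac{1}{p_i}-1)|s_i|^4$ can be checked against a direct simulation (toy data; a sketch only):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 50, 4
R = rng.standard_normal((n, d))
p = rng.uniform(0.1, 0.9, size=n)
norms2 = np.sum(R**2, axis=1)                    # |s_i|^2

predicted = np.sum((1.0 / p - 1.0) * norms2**2)  # sum_i (1/p_i - 1) |s_i|^4

def one_trial():
    keep = rng.random(n) < p
    S = R[keep] / np.sqrt(p[keep])[:, None]
    return np.sum((S.T @ S - R.T @ R) ** 2)      # ||sum_i S_i S_i^T - R^T R||^2 (Frobenius, squared)

simulated = np.mean([one_trial() for _ in range(50_000)])
print(predicted, simulated)                      # the two values should be close
```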
Choosing Sampling Probabilities
- Minimize variance by minimizing $\sum_i (\frac{1}{p_i}-1)|s_i|^4$
- Using calculus (see the derivation sketched below):
  - $p_i \propto |s_i|^2$
  - Variance: $\leq (\sum_i|s_i|^2)^2$
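The calculus step, sketched (assuming the natural constraint of a fixed expected sample size, normalized here to $\sum_i p_i = 1$; the slides do not state the constraint explicitly): minimizing $\sum_i \frac{|s_i|^4}{p_i}$ subject to $\sum_i p_i = 1$ via a Lagrange multiplier $\lambda$ gives $-\frac{|s_i|^4}{p_i^2}+\lambda=0$, i.e. $p_i=\frac{|s_i|^2}{\sqrt{\lambda}}\propto |s_i|^2$. With $p_i=\frac{|s_i|^2}{\sum_j |s_j|^2}$, the variance becomes $\sum_i \frac{|s_i|^4}{p_i}-\sum_i|s_i|^4=(\sum_i|s_i|^2)^2-\sum_i|s_i|^4\leq (\sum_i|s_i|^2)^2$.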
Reducing Variance by Averaging Multiple Trials
- Choose $S_i=\frac{s_i}{\sqrt{p_i}}$ with prob. $p_i$ and the zero vector otherwise; calculate the sampling matrix $X_1=\sum_i S_i S_i^T$
- Repeat the above $k$ times and take $X=\frac{\sum_{l=1}^k X_l}{k}$
- $E(X)=R^TR$
- $Var(X)=E[||X - R^TR||^2]\leq \frac{(\sum_i|s_i|^2)^2}{k}$
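Putting the pieces together, a sketch of the full sampled estimator (length-squared probabilities, $k$ independent trials averaged; the function name and data are mine):

```python
import numpy as np

def sampled_gram(R, k, rng):
    """Average of k independently sampled, reweighted approximations to R^T R."""
    n, d = R.shape
    norms2 = np.sum(R**2, axis=1)
    p = norms2 / norms2.sum()                 # length-squared probabilities, summing to 1
    X = np.zeros((d, d))
    for _ in range(k):
        keep = rng.random(n) < p              # each trial keeps about 1 point in expectation
        S = R[keep] / np.sqrt(p[keep])[:, None]
        X += S.T @ S
    return X / k

rng = np.random.default_rng(6)
R = rng.standard_normal((5000, 20))
X = sampled_gram(R, k=2000, rng=rng)
print(np.linalg.norm(X - R.T @ R) / np.linalg.norm(R.T @ R))   # relative Frobenius error
```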
Tail Bound
- By Chebyshev's inequality, $Pr(||X-R^TR||\geq \Delta)\leq \frac{E[||X - R^TR||^2]}{\Delta^2} \leq \frac{(\sum_i|s_i|^2)^2}{k\Delta^2}$
- Equivalently, $Pr(||X-R^TR||\geq \frac{c\sum_i |s_i|^2}{\sqrt{k}})\leq \frac{1}{c^2}$
- $X$ is obtained from $k$ trials, each sampling just 1 point in expectation, so the expected time to compute $X$ is $O(d^2k)$ instead of $O(d^2n)$
- So $X$ is quite close to $R^TR$; what happens if we find the largest eigenvector of $X$ instead of $R^TR$? How close are they to each other?
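Rearranged, the bound tells us how many trials suffice for a target accuracy (a back-of-the-envelope reading, not a statement from the slides): to get $||X-R^TR||\leq \epsilon\sum_i|s_i|^2$ with probability at least $1-\delta$, take $c=\frac{1}{\sqrt{\delta}}$ and $k\geq \frac{c^2}{\epsilon^2}=\frac{1}{\delta\epsilon^2}$, a number of trials independent of $n$.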
Closeness Of Eigenvalues
- Let the eigenvalues (eigenvectors) of $X$ be $a_1 (x_1),a_2 (x_2),\ldots$ and those of $R^TR$ be $e_1 (v_1), e_2 (v_2),\ldots$, both in decreasing order
- Thm: $||X-R^TR||^2\geq \sum_i (a_i-e_i)^2$ (a numerical illustration follows)
- The sum of squared projection lengths of the points $s_i$ on $v_1$ is $|Rv_1|^2=v_1^TR^TRv_1=e_1$
- Likewise, it is $a_1$ in the sampled case
- So, by the theorem and the tail bound, $|a_1-e_1|\leq \frac{c\sum_i |s_i|^2}{\sqrt{k}}$ with probability $\geq 1-\frac{1}{c^2}$
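A quick numerical illustration of the theorem (a sketch only; the perturbation below is an arbitrary symmetric stand-in for the sampled matrix $X$, since the theorem holds for any symmetric perturbation):

```python
import numpy as np

rng = np.random.default_rng(7)
R = rng.standard_normal((2000, 15))
A = R.T @ R

X = A + 0.01 * np.linalg.norm(A) * rng.standard_normal((15, 15))
X = (X + X.T) / 2                                   # keep the perturbed matrix symmetric

a = np.sort(np.linalg.eigvalsh(X))[::-1]            # eigenvalues of X, decreasing
e = np.sort(np.linalg.eigvalsh(A))[::-1]            # eigenvalues of R^T R, decreasing
print(np.sum((a - e)**2) <= np.sum((X - A)**2) + 1e-6)   # Thm: sum_i (a_i - e_i)^2 <= ||X - R^T R||^2
```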
Proof of Theorem
- Thm: $||X-R^TR||^2\geq \sum_i (a_i-e_i)^2$
- $X=\sum_i a_i x_i x_i^T$ and $R^TR=\sum_i e_i v_i v_i^T$
- The $x_i$'s are orthonormal, as are the $v_i$'s
- $||X||^2=||\sum_i a_i x_i x_i^T||^2=\sum_i a_i^2$ and $||R^TR||^2=||\sum_i e_i v_i v_i^T||^2=\sum_i e_i^2$
- $||X-R^TR||^2=||X||^2+||R^TR||^2-2Tr(XR^TR)=\sum_i a_i^2+\sum_i e_i^2-2Tr(XR^TR)$
Proof of Theorem (continued)
- $Tr(XR^TR)=Tr((\sum_i a_i x_i x_i^T)(\sum_{i'} e_{i'} v_{i'} v_{i'}^T) )$
- $= Tr(\sum_{i,i'} a_ie_{i'} x_ix_i^T v_{i'} v_{i'}^T)$
- $= Tr(\sum_{i,i'} a_ie_{i'} v_{i'}^Tx_ix_i^T v_{i'} )$
- $= \sum_{i,i'} a_ie_{i'} (v_{i'}\cdot x_i)^2 $
- With both sequences sorted in decreasing order, this is largest when $x_i=v_i$ (so $v_{i'}\cdot x_i=0$ for all $i'\not=i$), giving $Tr(XR^TR)\leq \sum_i a_ie_i$ and hence $||X-R^TR||^2\geq \sum_i a_i^2+\sum_i e_i^2-2\sum_i a_ie_i=\sum_i (a_i-e_i)^2$