Title: Sketching as a Tool for Numerical Linear Algebra
1. Sketching as a Tool for Numerical Linear Algebra
- David Woodruff
- IBM Almaden
2. Talk Outline
- Regression
  - Exact Regression Algorithms
  - Sketching to speed up Least Squares Regression
  - Sketching to speed up Least Absolute Deviation (l1) Regression
- Low Rank Approximation
  - Sketching to speed up Low Rank Approximation
- Recent Results and Open Questions
  - M-Estimators and robust regression
  - CUR decompositions
3. Regression
- Linear Regression
  - Statistical method to study linear dependencies between variables in the presence of noise.
- Example: Ohm's law $V = R \cdot I$
  - Find the linear function that best fits the data
4. Regression
- Standard Setting
  - One measured variable b
  - A set of predictor variables $a_1, \ldots, a_d$
  - Assumption: $b = x_1 a_1 + \cdots + x_d a_d + x_0 + e$
  - e is assumed to be noise, and the $x_i$ are model parameters we want to learn
  - Can assume $x_0 = 0$
- Now consider n observations of b
5. Regression analysis
- Matrix form
  - Input: an $n \times d$ matrix A and a vector $b = (b_1, \ldots, b_n)$; n is the number of observations, d is the number of predictor variables
  - Output: x so that Ax and b are close
- Consider the over-constrained case, when $n \gg d$
- Assume that A has full column rank
6. Regression analysis
- Least Squares Method
  - Find x that minimizes $\|Ax-b\|_2^2 = \sum_i (b_i - \langle A_i, x \rangle)^2$
  - $A_i$ is the i-th row of A
  - Certain desirable statistical properties
  - Closed form solution: $x = (A^T A)^{-1} A^T b$
- Method of least absolute deviation (l1-regression)
  - Find x that minimizes $\|Ax-b\|_1 = \sum_i |b_i - \langle A_i, x \rangle|$
  - Cost is less sensitive to outliers than least squares
  - Can solve via linear programming
- Time complexities are at least $nd^2$; we want better! (A minimal sketch of the exact least squares solver appears below.)
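To ground the closed-form solution, here is a minimal numpy sketch; the problem sizes and noise level are illustrative assumptions, not from the talk:

```python
import numpy as np

# Synthetic over-constrained instance: n >> d (sizes are illustrative).
rng = np.random.default_rng(0)
n, d = 1000, 10
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Closed-form least squares solution x = (A^T A)^{-1} A^T b.
x_normal_eq = np.linalg.solve(A.T @ A, A.T @ b)

# np.linalg.lstsq (QR/SVD based) computes the same minimizer more stably.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(x_normal_eq - x_lstsq))  # essentially zero
```

Both routines cost on the order of $nd^2$ time, which is the baseline the sketching methods below aim to beat.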
7. Talk Outline
- Regression
  - Exact Regression Algorithms
  - Sketching to speed up Least Squares Regression
  - Sketching to speed up Least Absolute Deviation (l1) Regression
- Low Rank Approximation
  - Sketching to speed up Low Rank Approximation
- Recent Results and Open Questions
  - M-Estimators and robust regression
  - CUR decompositions
8. Sketching to solve least squares regression
- How to find an approximate solution x to $\min_x \|Ax-b\|_2$?
- Goal: output x' for which $\|Ax'-b\|_2 \le (1+\epsilon) \min_x \|Ax-b\|_2$ with high probability
- Draw S from a $k \times n$ random family of matrices, for a value $k \ll n$
- Compute SA and Sb
- Output the solution x' to $\min_x \|(SA)x - (Sb)\|_2$ (see the sketch-and-solve demo below)
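A minimal sketch-and-solve demo with a Gaussian S; the constant in k and the problem sizes are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, eps = 5000, 20, 0.5
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + rng.standard_normal(n)

# Gaussian sketch with k = O(d / eps^2) rows; the constant 4 is arbitrary.
k = int(4 * d / eps**2)
S = rng.standard_normal((k, n)) / np.sqrt(k)

x_opt, *_ = np.linalg.lstsq(A, b, rcond=None)         # exact solution
x_sk, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)  # sketched solution

# Ratio of costs should be at most roughly 1 + eps.
print(np.linalg.norm(A @ x_sk - b) / np.linalg.norm(A @ x_opt - b))
```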
9. How to choose the right sketching matrix S?
- Recall: output the solution x' to $\min_x \|(SA)x - (Sb)\|_2$
- Lots of matrices work
- S can be a $d/\epsilon^2 \times n$ matrix of i.i.d. Normal random variables
- Computing SA may be slow
10. How to choose the right sketching matrix S?
- S is a Johnson-Lindenstrauss Transform
- $S = PHD$
  - D is a diagonal matrix with random +1, -1 entries on the diagonal
  - H is the Hadamard transform
  - P just chooses a random (small) subset of the rows of HD
- SA can be computed much faster (a sketch of the construction follows)
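A dense illustration of the $S = PHD$ construction, assuming n is a power of 2; a real implementation would apply a fast Walsh-Hadamard transform rather than the explicit $n \times n$ matrix built here:

```python
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(2)
n, d, k = 1024, 10, 200   # n must be a power of 2 for scipy's hadamard
A = rng.standard_normal((n, d))

D = rng.choice([-1.0, 1.0], size=n)          # D: random diagonal signs
H = hadamard(n) / np.sqrt(n)                 # H: normalized Hadamard transform
rows = rng.choice(n, size=k, replace=False)  # P: random subset of k rows

# SA = P H D A, rescaled so sketched squared norms are unbiased.
SA = np.sqrt(n / k) * (H @ (D[:, None] * A))[rows]
```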
11. Even faster sketching matrices [CW]
- CountSketch matrix
- Define a $k \times n$ matrix S, for $k = O(d^2/\epsilon^2)$
- S is really sparse: a single randomly chosen non-zero entry ($\pm 1$) per column (see the sketch below)
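CountSketch never needs S materialized; applying it is one pass over the rows of A. A minimal sketch (the bucket count k is an illustrative choice):

```python
import numpy as np

def countsketch(A, k, rng):
    """Apply a k x n CountSketch matrix to A in O(nnz(A)) time.

    Column i of S has a single nonzero: a random sign in a random row.
    So SA just adds each (signed) row of A into one of k buckets."""
    n = A.shape[0]
    buckets = rng.integers(0, k, size=n)     # target row for each input row
    signs = rng.choice([-1.0, 1.0], size=n)  # random sign per input row
    SA = np.zeros((k, A.shape[1]))
    np.add.at(SA, buckets, signs[:, None] * A)
    return SA

rng = np.random.default_rng(3)
A = rng.standard_normal((10000, 10))
SA = countsketch(A, k=500, rng=rng)
```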
12. Simpler and Sharper Proofs [MM, NN, N]
- Let $B = [A, b]$ be an $n \times (d+1)$ matrix
- Let U be an orthonormal basis for the columns of B
- Suffices to show $\|SUx\|_2 = 1 \pm \epsilon$ for all unit x
  - Implies $\|S(Ax-b)\|_2 = (1 \pm \epsilon)\,\|Ax-b\|_2$ for all x
- SU is a $(d+1)^2/\epsilon^2 \times (d+1)$ matrix
- Suffices to show $\|U^T S^T S U - I\|_2 \le \|U^T S^T S U - I\|_F \le \epsilon$
- Matrix product result: $\|C S^T S D - CD\|_F^2 \le \frac{1}{\#\mathrm{rows}(S)}\,\|C\|_F^2\,\|D\|_F^2$
- Set $C = U^T$ and $D = U$. Then $\|U\|_F^2 = d+1$ and $\#\mathrm{rows}(S) = (d+1)^2/\epsilon^2$
- $\|SBx\|_2 = (1 \pm \epsilon)\,\|Bx\|_2$ for all x: S is called a subspace embedding (verified numerically below)
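A quick numerical check of the subspace-embedding condition for a Gaussian S (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, eps = 2000, 5, 0.5
B = rng.standard_normal((n, d + 1))
U, _ = np.linalg.qr(B)               # orthonormal basis for the columns of B

k = int((d + 1)**2 / eps**2)         # number of rows of S
S = rng.standard_normal((k, n)) / np.sqrt(k)

SU = S @ U
# ||U^T S^T S U - I||_2 <= eps implies ||SUx||_2 = 1 +/- eps for unit x.
print(np.linalg.norm(SU.T @ SU - np.eye(d + 1), 2))
```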
13. Talk Outline
- Regression
  - Exact Regression Algorithms
  - Sketching to speed up Least Squares Regression
  - Sketching to speed up Least Absolute Deviation (l1) Regression
- Low Rank Approximation
  - Sketching to speed up Low Rank Approximation
- Recent Results and Open Questions
  - M-Estimators and robust regression
  - CUR decompositions
14. Sketching to solve l1-regression
- How to find an approximate solution x to $\min_x \|Ax-b\|_1$?
- Goal: output x' for which $\|Ax'-b\|_1 \le (1+\epsilon) \min_x \|Ax-b\|_1$ with high probability
- Natural attempt: draw S from a $k \times n$ random family of matrices, for a value $k \ll n$
- Compute SA and Sb
- Output the solution x' to $\min_x \|(SA)x - (Sb)\|_1$
- Turns out this does not work
15. Sketching to solve l1-regression [SW]
- Why doesn't outputting the solution x' to $\min_x \|(SA)x-(Sb)\|_1$ work?
- We don't know of $k \times n$ matrices S with small k for which, if x' is the solution to $\min_x \|(SA)x-(Sb)\|_1$, then $\|Ax'-b\|_1 \le (1+\epsilon) \min_x \|Ax-b\|_1$ with high probability
- Instead, can find an S so that $\|Ax'-b\|_1 \le (d \log d) \min_x \|Ax-b\|_1$
- S is a matrix of i.i.d. Cauchy random variables
- Property: $\|Ax-b\|_1 \le \|S(Ax-b)\|_1 \le (d \log d)\,\|Ax-b\|_1$
16. Cauchy random variables
- Cauchy random variables are not as nice as Normal (Gaussian) random variables
- They don't have a mean, and they have infinite variance
- The ratio of two independent Normal random variables is Cauchy
- 1-stability: if a and b are scalars and $C_1$ and $C_2$ are independent Cauchys, then $aC_1 + bC_2 \sim (|a| + |b|)\,C$ for a Cauchy C (see the demo below)
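A small demo of both facts; it compares medians of absolute values, since the median of |Cauchy| is 1 while a Cauchy has no mean to compare:

```python
import numpy as np

rng = np.random.default_rng(5)
m = 200000

def cauchy(size):
    # Ratio of two independent standard Normals is standard Cauchy.
    return rng.standard_normal(size) / rng.standard_normal(size)

# 1-stability: a*C1 + b*C2 is distributed as (|a| + |b|) * C.
a, b = 2.0, -3.0
combo = a * cauchy(m) + b * cauchy(m)

# median(|Cauchy|) = 1, so median(|combo|) should be close to |a|+|b| = 5.
print(np.median(np.abs(combo)))
```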
17. Sketching to solve l1-regression
- Main Idea: Let $B = [A, b]$. Compute a QR-factorization of SB
  - Q has orthonormal columns and $QR = SB$
- $BR^{-1}$ is a well-conditioning of B:
  - $\sum_{i=1}^{d} \|BR^{-1}e_i\|_1 \le \sum_{i=1}^{d} \|SBR^{-1}e_i\|_1 \le (d \log d)^{1/2} \sum_{i=1}^{d} \|SBR^{-1}e_i\|_2 \le d\,(d \log d)^{1/2}$
  - $\|x\|_1 \ge \|x\|_2 = \|SBR^{-1}x\|_2 \le \|SBR^{-1}x\|_1 \le (d \log d)\,\|BR^{-1}x\|_1$, so $\|BR^{-1}x\|_1$ is never much smaller than $\|x\|_2$
- These two properties make importance sampling work! (A sketch of the conditioning step follows.)
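A minimal sketch of the conditioning step; the factor of 10 in the Cauchy sketch size is an arbitrary illustrative constant:

```python
import numpy as np

rng = np.random.default_rng(6)
n, d = 5000, 5
B = rng.standard_normal((n, d))

# Cauchy sketch with O(d log d) rows; the factor 10 is an arbitrary choice.
S = rng.standard_cauchy((10 * d, n))

Q, R = np.linalg.qr(S @ B)      # QR = SB, Q has orthonormal columns
W = B @ np.linalg.inv(R)        # W = B R^{-1}: the well-conditioned basis

# Column 1-norms of W stay modest (the first property above); the row
# 1-norms of W become the importance-sampling scores on the next slides.
print(np.abs(W).sum(axis=0))
```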
18. Importance Sampling
- Want to estimate $\sum_{i=1}^n y_i$ by sampling, for $y_i \ge 0$
- Suppose we sample $y_i$ with probability $p_i$
- $T = \sum_{i=1}^n \delta(y_i \text{ sampled}) \cdot y_i/p_i$
- $E[T] = \sum_{i=1}^n p_i \cdot y_i/p_i = \sum_{i=1}^n y_i$
- $\mathrm{Var}[T] \le \sum_{i=1}^n p_i \cdot (y_i/p_i)^2 \le \left(\sum_{i=1}^n y_i\right) \cdot \max_i\, y_i/p_i$
- Bound $\max_i\, y_i/p_i$ by $\epsilon^2 \sum_{i=1}^n y_i$
- For us, $y_i = |(Ax-b)_i|$, and this holds if $p_i \ge \|e_i^T BR^{-1}\|_1 \cdot \mathrm{poly}(d/\epsilon)$! (A demo of the estimator follows.)
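A minimal demo of the unbiased estimator T; the data and the sampling budget are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100000
y = rng.random(n) ** 2          # nonnegative values to sum

# Sample each i independently with probability p_i proportional to y_i
# (capped at 1); the estimator T sums y_i / p_i over the sampled i.
budget = 2000
p = np.minimum(1.0, budget * y / y.sum())
sampled = rng.random(n) < p
T = np.sum(y[sampled] / p[sampled])

print(y.sum(), T)   # T is unbiased and concentrates around the true sum
```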
19. Importance Sampling
- To get a bound for all x, use Bernstein's inequality and a net argument
- Sample poly(d/ε) rows of $BR^{-1}$, where the i-th row is sampled with probability proportional to its 1-norm
- T is a diagonal matrix with $T_{i,i} = 0$ if row i is not sampled, and $T_{i,i} = 1/\Pr[\text{row } i \text{ sampled}]$ otherwise
- $\|TBx\|_1 = (1 \pm \epsilon)\,\|Bx\|_1$ for all x
- Solve the regression problem on the (reweighted) samples! (An end-to-end sketch follows.)
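An end-to-end sketch under the assumptions above: Cauchy sketch, QR conditioning, row sampling by 1-norm, then an LP on the reweighted sample. The sampling budget of 300 and the LP formulation are illustrative choices, not the talk's exact parameters:

```python
import numpy as np
from scipy.optimize import linprog

def l1_regression(A, b):
    """Solve min_x ||Ax - b||_1 as an LP: min sum(t), -t <= Ax - b <= t."""
    n, d = A.shape
    c = np.concatenate([np.zeros(d), np.ones(n)])
    A_ub = np.block([[A, -np.eye(n)], [-A, -np.eye(n)]])
    b_ub = np.concatenate([b, -b])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (d + n))
    return res.x[:d]

rng = np.random.default_rng(8)
n, d = 2000, 4
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_cauchy(n)  # heavy tails

# Well-conditioned basis of B = [A, b] via a Cauchy sketch and QR.
B = np.column_stack([A, b])
S = rng.standard_cauchy((10 * (d + 1), n))
_, R = np.linalg.qr(S @ B)
W = B @ np.linalg.inv(R)

# Sample row i with probability proportional to its 1-norm (capped at 1),
# reweight kept rows by 1/p_i, and solve l1-regression on the sample.
scores = np.abs(W).sum(axis=1)
p = np.minimum(1.0, 300 * scores / scores.sum())
keep = rng.random(n) < p
x_hat = l1_regression(A[keep] / p[keep, None], b[keep] / p[keep])
print(x_hat)
```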
20. Sketching to solve l1-regression [MM]
- The most expensive operation is computing SA, where S is the matrix of i.i.d. Cauchy random variables
- All other operations are in the smaller space
- Can speed this up by choosing S as follows (the construction appears as a figure in the slides, not reproduced here)
21. Further sketching improvements [WZ]
- Can show you need fewer sampled rows in later steps if you instead choose S as follows
- Instead of a diagonal of Cauchy random variables, choose a diagonal of reciprocals of exponential random variables
- Uses the max-stability of exponentials [Andoni]: $\max_i\, y_i/e_i \sim \|y\|_1/e$, where the $e_i$ and $e$ are standard exponential random variables
- For recent work on fast sampling-based algorithms, see Richard's talk!
22. Talk Outline
- Regression
  - Exact Regression Algorithms
  - Sketching to speed up Least Squares Regression
  - Sketching to speed up Least Absolute Deviation (l1) Regression
- Low Rank Approximation
  - Sketching to speed up Low Rank Approximation
- Recent Results and Open Questions
  - M-Estimators and robust regression
  - CUR decompositions
23. Low rank approximation
- A is an $n \times d$ matrix
  - Typically well-approximated by a low rank matrix
  - E.g., only high rank because of noise
- Want to output a rank-k matrix A', so that $\|A - A'\|_F \le (1+\epsilon)\,\|A - A_k\|_F$, w.h.p., where $A_k = \mathrm{argmin}_{\text{rank-}k\ B}\, \|A - B\|_F$
- (For a matrix C, $\|C\|_F = (\sum_{i,j} C_{i,j}^2)^{1/2}$)
24. Solution to low-rank approximation [S]
- Given the $n \times d$ input matrix A
- Compute SA using a sketching matrix S with $k \ll n$ rows. SA takes random linear combinations of the rows of A
- [Figure: the tall matrix A is compressed to the short, wide matrix SA]
- Project the rows of A onto SA, then find the best rank-k approximation to the projected points inside of SA (a sketch of the whole pipeline follows)
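A minimal end-to-end sketch of this pipeline with a Gaussian S; the sketch size m = 4k and the noise level are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(9)
n, d, k = 2000, 300, 10
# Low-rank signal plus a little noise.
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d)) \
    + 0.01 * rng.standard_normal((n, d))

m = 4 * k                                    # sketch size, O(k/eps) in theory
S = rng.standard_normal((m, n)) / np.sqrt(m)
SA = S @ A                                   # random combinations of rows of A

# Project the rows of A onto the row span of SA ...
Q, _ = np.linalg.qr(SA.T)                    # orthonormal basis of rowspan(SA)
P = A @ Q                                    # coordinates of each row in it

# ... and take the best rank-k approximation inside that subspace.
U, s, Vt = np.linalg.svd(P, full_matrices=False)
A_prime = (U[:, :k] * s[:k]) @ Vt[:k] @ Q.T

# Compare against the true best rank-k approximation A_k (via full SVD).
U0, s0, Vt0 = np.linalg.svd(A, full_matrices=False)
best = np.linalg.norm(A - (U0[:, :k] * s0[:k]) @ Vt0[:k], 'fro')
print(np.linalg.norm(A - A_prime, 'fro') / best)   # close to 1
```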
25. Low Rank Approximation Idea
- S can be a matrix of i.i.d. Normals
- S can be a Fast Johnson-Lindenstrauss Matrix
- S can be a CountSketch matrix
- Regression problem: $\min_X \|A_k X - A\|_F$
  - Solution is $X = I$, and the minimum is $\|A_k - A\|_F$
  - This is a generalized regression problem!
- If S is a subspace embedding for the column space of $A_k$, and also if for any matrices B, C: $\|B S^T S C - BC\|_F^2 \le \frac{1}{\#\mathrm{rows}(S)}\,\|B\|_F^2\,\|C\|_F^2$
- Then if X' is the minimizer of $\min_X \|S A_k X - SA\|_F$, then $\|A_k X' - A\|_F \le (1+\epsilon) \min_X \|A_k X - A\|_F = (1+\epsilon)\,\|A_k - A\|_F$
- But the minimizer $X' = (SA_k)^{-} SA$ is in the row span of SA!
26. Caveat: projecting the points onto SA is slow
- Current algorithm:
  1. Compute SA (easy)
  2. Project each of the rows of A onto SA
  3. Find the best rank-k approximation of the projected points inside of the rowspace of SA (easy)
- Bottleneck is step 2
- [CW]: Turns out you can approximate the projection
  - Sketching for generalized regression again: $\min_X \|X(SA) - A\|_F^2$ (an illustrative right-sketch appears below)
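One way to realize this, sketched under assumptions: compress the columns with a second sketching matrix R, solve the much smaller problem $\min_X \|X(SA)R - AR\|_F$, and use that X in place of the exact projection. This is an illustration of the idea, not the exact [CW] construction:

```python
import numpy as np

rng = np.random.default_rng(10)
n, d, k = 2000, 500, 10
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d)) \
    + 0.01 * rng.standard_normal((n, d))

m = 4 * k
S = rng.standard_normal((m, n)) / np.sqrt(m)
SA = S @ A

# Right sketch R compresses the d columns; solving the small regression
# min_X ||X (SA) R - A R||_F avoids projecting all n rows exactly.
r = 8 * m
R = rng.standard_normal((d, r)) / np.sqrt(r)
X = np.linalg.lstsq((SA @ R).T, (A @ R).T, rcond=None)[0].T   # n x m

# Rank-k approximation of X @ SA, whose rows lie in rowspan(SA).
U, s, Vt = np.linalg.svd(X @ SA, full_matrices=False)
A_prime = (U[:, :k] * s[:k]) @ Vt[:k]
print(np.linalg.norm(A - A_prime, 'fro'))
```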
27. Talk Outline
- Regression
  - Exact Regression Algorithms
  - Sketching to speed up Least Squares Regression
  - Sketching to speed up Least Absolute Deviation (l1) Regression
- Low Rank Approximation
  - Sketching to speed up Low Rank Approximation
- Recent Results and Open Questions
  - M-Estimators and robust regression
  - CUR decompositions
28. M-Estimators and Robust Regression
- Solve $\min_x \|Ax - b\|_M$
  - $M: \mathbb{R} \to \mathbb{R}_{\ge 0}$
  - $\|y\|_M = \sum_{i=1}^n M(y_i)$
- Least squares and l1-regression are special cases
- Huber function, given a parameter c:
  - $M(y) = y^2/(2c)$ for $|y| \le c$
  - $M(y) = |y| - c/2$ otherwise
- Enjoys the smoothness properties of l2 and the robustness properties of l1 (see the sketch below)
- [CW15]: For M-estimators with at least linear and at most quadratic growth, can get an O(1)-approximation in $O(\mathrm{nnz}(A)) + \mathrm{poly}(d)$ time
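A direct transcription of the Huber function from the slide; the cost helper is our own addition for evaluating $\|Ax-b\|_M$:

```python
import numpy as np

def huber(y, c):
    """Huber function: quadratic near zero (like l2), linear in the tails (like l1)."""
    y = np.abs(y)
    return np.where(y <= c, y**2 / (2 * c), y - c / 2)

def huber_cost(A, x, b, c):
    """||Ax - b||_M = sum_i M((Ax - b)_i) for the Huber M-estimator."""
    return huber(A @ x - b, c).sum()
```

The two branches agree at |y| = c (both equal c/2), which is the smoothness the slide refers to.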
29. CUR Decompositions
- [BW14]: Can find a CUR decomposition in $O(\mathrm{nnz}(A) \log n) + n \cdot \mathrm{poly}(k/\epsilon)$ time, with $O(k/\epsilon)$ columns, $O(k/\epsilon)$ rows, and $\mathrm{rank}(U) = k$
30. Open Questions
- Recent monograph in NOW Publishers: D. Woodruff, Sketching as a Tool for Numerical Linear Algebra
- Other types of low rank approximation:
  - (Spectral) How quickly can we find a rank-k matrix A' so that $\|A - A'\|_2 \le (1+\epsilon)\,\|A - A_k\|_2$, w.h.p., where $A_k = \mathrm{argmin}_{\text{rank-}k\ B}\, \|A - B\|_2$?
  - (Robust) How quickly can we find a rank-k matrix A' so that $\|A - A'\|_1 \le (1+\epsilon)\,\|A - A_k\|_1$, w.h.p., where $A_k = \mathrm{argmin}_{\text{rank-}k\ B}\, \|A - B\|_1$?
- For other questions regarding Schatten norms and communication-efficiency, see the reference above.
Thanks!