1
Sketching as a Tool for Numerical Linear Algebra
  • David Woodruff
  • IBM Almaden

2
Talk Outline
  • Regression
  • Exact Regression Algorithms
  • Sketching to speed up Least Squares Regression
  • Sketching to speed up Least Absolute Deviation
    (l1) Regression
  • Low Rank Approximation
  • Sketching to speed up Low Rank Approximation
  • Recent Results and Open Questions
  • M-Estimators and robust regression
  • CUR decompositions

3
Regression
  • Linear Regression
  • Statistical method to study linear dependencies
    between variables in the presence of noise.
  • Example: Ohm's law, V = R·I
  • Find the linear function that best fits the data

4
Regression
  • Standard Setting
  • One measured variable b
  • A set of predictor variables a₁, …, a_d
  • Assumption: b = x₀ + a₁x₁ + … + a_d·x_d + ε
  • ε is assumed to be noise and the xᵢ are model parameters we want to learn
  • Can assume x₀ = 0
  • Now consider n observations of b
5
Regression analysis
  • Matrix form
  • Input: an n × d matrix A and a vector b = (b₁, …, b_n); n is the number of observations and d is the number of predictor variables
  • Output: x so that Ax and b are close
  • Consider the over-constrained case, when n ≫ d
  • Assume that A has full column rank

6
Regression analysis
  • Least Squares Method
  • Find x that minimizes ‖Ax−b‖₂² = Σᵢ (bᵢ − ⟨Aᵢ, x⟩)²
  • Aᵢ is the i-th row of A
  • Certain desirable statistical properties
  • Closed form solution: x = (AᵀA)⁻¹Aᵀb
  • Method of least absolute deviation (l1-regression)
  • Find x that minimizes ‖Ax−b‖₁ = Σᵢ |bᵢ − ⟨Aᵢ, x⟩|
  • Cost is less sensitive to outliers than least squares
  • Can solve via linear programming
  • Time complexities are at least n·d², we want better! (A minimal numerical sketch of both methods follows.)
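To make the two objectives concrete, here is a minimal numerical sketch (not from the talk): the least-squares closed form, and l1-regression via its standard linear-programming reformulation. The data, problem sizes, and numpy/scipy routines are illustrative choices.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Least squares: closed form x = (A^T A)^{-1} A^T b.
# (np.linalg.lstsq is the numerically preferred route in practice.)
x_ls = np.linalg.solve(A.T @ A, A.T @ b)

# l1-regression as a linear program:
# minimize sum(t) subject to -t <= Ax - b <= t, variables z = [x (d), t (n)].
c = np.concatenate([np.zeros(d), np.ones(n)])
A_ub = np.block([[A, -np.eye(n)], [-A, -np.eye(n)]])
b_ub = np.concatenate([b, -b])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (d + n))
x_l1 = res.x[:d]
```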

7
Talk Outline
  • Regression
  • Exact Regression Algorithms
  • Sketching to speed up Least Squares Regression
  • Sketching to speed up Least Absolute Deviation
    (l1) Regression
  • Low Rank Approximation
  • Sketching to speed up Low Rank Approximation
  • Recent Results and Open Questions
  • M-Estimators and robust regression
  • CUR decompositions

8
Sketching to solve least squares regression
  • How to find an approximate solution x to min_x ‖Ax−b‖₂?
  • Goal: output x' for which ‖Ax'−b‖₂ ≤ (1+ε)·min_x ‖Ax−b‖₂ with high probability
  • Draw S from a k × n random family of matrices, for a value k ≪ n
  • Compute S·A and S·b
  • Output the solution x' to min_x ‖(SA)x−(Sb)‖₂ (a toy implementation follows)
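A toy sketch-and-solve implementation, assuming a Gaussian sketching matrix; the constant in k = O(d/ε²) is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, eps = 100_000, 10, 0.5
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + rng.standard_normal(n)

# Gaussian sketch with k = O(d / eps^2) rows (constant chosen ad hoc).
k = int(4 * d / eps**2)
S = rng.standard_normal((k, n)) / np.sqrt(k)

SA, Sb = S @ A, S @ b                       # compressed k x d problem
x_sketch = np.linalg.lstsq(SA, Sb, rcond=None)[0]
x_exact = np.linalg.lstsq(A, b, rcond=None)[0]

# The sketched residual should be within a (1 + eps) factor of optimal.
print(np.linalg.norm(A @ x_sketch - b) / np.linalg.norm(A @ x_exact - b))
```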

9
How to choose the right sketching matrix S?
  • Recall: output the solution x' to min_x ‖(SA)x−(Sb)‖₂
  • Lots of matrices work
  • S is a d/ε² × n matrix of i.i.d. Normal random variables
  • Computing S·A may be slow

10
How to choose the right sketching matrix S?
  • S is a Johnson-Lindenstrauss Transform
  • S = P·H·D
  • D is a diagonal matrix with ±1 entries on the diagonal
  • H is the Hadamard transform
  • P just chooses a random (small) subset of rows of H·D
  • S·A can be computed much faster (a compact sketch follows)
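A compact sketch of S = PHD (a subsampled randomized Hadamard transform), assuming n is a power of 2; the helper names and row count are mine, not from the talk.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform along axis 0 (length a power of 2)."""
    x = x.copy()
    n, h = x.shape[0], 1
    while h < n:
        for i in range(0, n, 2 * h):
            a, b = x[i:i + h].copy(), x[i + h:i + 2 * h].copy()
            x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return x / np.sqrt(n)                       # orthonormal scaling

def srht_sketch(A, k, rng):
    """Apply S = P*H*D to A: random signs (D), Hadamard (H), row sample (P)."""
    n = A.shape[0]
    signs = rng.choice([-1.0, 1.0], size=n)
    HDA = fwht(signs[:, None] * A)              # O(n log n) per column
    rows = rng.choice(n, size=k, replace=False)
    return np.sqrt(n / k) * HDA[rows]           # rescale for unbiasedness

rng = np.random.default_rng(2)
A = rng.standard_normal((4096, 8))
SA = srht_sketch(A, k=256, rng=rng)
```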

11
Even faster sketching matrices [CW]
  • CountSketch matrix
  • Define a k × n matrix S, for k = O(d²/ε²)
  • S is really sparse: a single randomly chosen non-zero entry (a random ±1) per column
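A minimal CountSketch application running in O(nnz(A)) time; A is dense here, but the same scatter-add works row by row on sparse input. The constant in k is illustrative.

```python
import numpy as np

def countsketch(A, k, rng):
    """Apply a CountSketch matrix S (k x n) to A in O(nnz(A)) time.
    Each column of S has one nonzero: a random sign in a random row."""
    n = A.shape[0]
    buckets = rng.integers(0, k, size=n)        # hash each input row
    signs = rng.choice([-1.0, 1.0], size=n)
    SA = np.zeros((k, A.shape[1]))
    np.add.at(SA, buckets, signs[:, None] * A)  # scatter-add signed rows
    return SA

rng = np.random.default_rng(3)
d = 6
A = rng.standard_normal((10_000, d))
SA = countsketch(A, k=40 * d**2, rng=rng)       # k = O(d^2/eps^2), eps ~ 1
```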

12
Simpler and Sharper Proofs [MM, NN, N]
  • Let B = [A, b] be an n × (d+1) matrix
  • Let U be an orthonormal basis for the columns of B
  • Suffices to show ‖SUx‖₂ = 1 ± ε for all unit x
  • Implies ‖S(Ax−b)‖₂ = (1 ± ε)·‖Ax−b‖₂ for all x
  • SU is a (d+1)²/ε² × (d+1) matrix
  • Suffices to show ‖UᵀSᵀSU − I‖₂ ≤ ‖UᵀSᵀSU − I‖_F ≤ ε
  • Matrix product result: E‖CSᵀSD − CD‖_F² ≤ (1/# rows of S)·‖C‖_F²·‖D‖_F²
  • Set C = Uᵀ and D = U. Then ‖U‖_F² = (d+1) and (# rows of S) = (d+1)²/ε²

If ‖SBx‖₂ = (1±ε)·‖Bx‖₂ for all x, S is called a subspace embedding (a quick numerical sanity check follows)
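A quick numerical check of the subspace-embedding condition, using a Gaussian S for simplicity; both norms of UᵀSᵀSU − I should come out on the order of ε.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, eps = 20_000, 4, 0.5
B = rng.standard_normal((n, d + 1))
U, _ = np.linalg.qr(B)                       # orthonormal basis for col(B)

k = int((d + 1) ** 2 / eps**2)               # row count from the slide
S = rng.standard_normal((k, n)) / np.sqrt(k)

E = (S @ U).T @ (S @ U) - np.eye(d + 1)
print(np.linalg.norm(E, 2), np.linalg.norm(E, 'fro'))   # both roughly <= eps
```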
13
Talk Outline
  • Regression
  • Exact Regression Algorithms
  • Sketching to speed up Least Squares Regression
  • Sketching to speed up Least Absolute Deviation
    (l1) Regression
  • Low Rank Approximation
  • Sketching to speed up Low Rank Approximation
  • Recent Results and Open Questions
  • M-Estimators and robust regression
  • CUR decompositions

14
Sketching to solve l1-regression
  • How to find an approximate solution x to min_x ‖Ax−b‖₁?
  • Goal: output x' for which ‖Ax'−b‖₁ ≤ (1+ε)·min_x ‖Ax−b‖₁ with high probability
  • Natural attempt: draw S from a k × n random family of matrices, for a value k ≪ n
  • Compute S·A and S·b
  • Output the solution x' to min_x ‖(SA)x−(Sb)‖₁
  • Turns out this does not work

15
Sketching to solve l1-regression [SW]
  • Why doesn't outputting the solution x' to min_x ‖(SA)x−(Sb)‖₁ work?
  • Don't know of k × n matrices S with small k for which, if x' is the solution to min_x ‖(SA)x−(Sb)‖₁, then ‖Ax'−b‖₁ ≤ (1+ε)·min_x ‖Ax−b‖₁ with high probability
  • Instead can find an S so that ‖Ax'−b‖₁ ≤ (d log d)·min_x ‖Ax−b‖₁
  • S is a matrix of i.i.d. Cauchy random variables
  • Property: ‖Ax−b‖₁ ≤ ‖S(Ax−b)‖₁ ≤ (d log d)·‖Ax−b‖₁ (a small illustration follows)
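A small illustration of the Cauchy sketch, assuming i.i.d. standard Cauchy entries; the row count and the 1/k normalization are illustrative rather than the paper's exact choices, so only the rough bounded-distortion behavior is meant to show through.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 50_000, 4
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

# Dense Cauchy sketch; the theory takes k = O(d log d) rows with a suitable
# normalization, so both choices below are merely illustrative.
k = 20 * d
S = rng.standard_cauchy((k, n)) / k

y = A @ rng.standard_normal(d) - b
print(np.linalg.norm(y, 1), np.linalg.norm(S @ y, 1))   # same order of magnitude
```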

16
Cauchy random variables
  • Cauchy random variables are not as nice as Normal (Gaussian) random variables
  • They don't have a mean and have infinite variance
  • The ratio of two independent Normal random variables is Cauchy

If a and b are scalars and C1 and C2 are independent Cauchys, then a·C1 + b·C2 ~ (|a| + |b|)·C for a Cauchy C (1-stability; a quick empirical check follows)
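A quick empirical check of both facts, comparing distributions with a two-sample Kolmogorov-Smirnov statistic (small values mean the samples look alike):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
m = 200_000

# Ratio of two independent standard normals is standard Cauchy.
ratio = rng.standard_normal(m) / rng.standard_normal(m)

# 1-stability: a*C1 + b*C2 is distributed as (|a| + |b|)*C.
a, b = 2.0, -3.0
combo = a * rng.standard_cauchy(m) + b * rng.standard_cauchy(m)
scaled = (abs(a) + abs(b)) * rng.standard_cauchy(m)

print(stats.ks_2samp(ratio, rng.standard_cauchy(m)).statistic)
print(stats.ks_2samp(combo, scaled).statistic)
```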
17
Sketching to solve l1-regression
  • Main Idea: Let B = [A, b]. Compute a QR-factorization of SB
  • Q has orthonormal columns and QR = SB
  • B·R⁻¹ is a well-conditioning of B
  • Σ_{i=1..d} ‖BR⁻¹eᵢ‖₁ ≤ Σ_{i=1..d} ‖SBR⁻¹eᵢ‖₁ ≤ (d log d)^½ · Σ_{i=1..d} ‖SBR⁻¹eᵢ‖₂ = d·(d log d)^½ (since SBR⁻¹ = Q has unit-norm columns)
  • For all x: ‖x‖_∞ ≤ ‖x‖₂ = ‖SBR⁻¹x‖₂ ≤ ‖SBR⁻¹x‖₁ ≤ (d log d)·‖BR⁻¹x‖₁
  • These two properties make importance sampling work!

18
Importance Sampling
  • Want to estimate Σ_{i=1..n} yᵢ by sampling, for yᵢ ≥ 0
  • Suppose we sample yᵢ with probability pᵢ
  • T = Σ_{i=1..n} δ(yᵢ sampled)·yᵢ/pᵢ
  • E[T] = Σ_{i=1..n} pᵢ·yᵢ/pᵢ = Σ_{i=1..n} yᵢ
  • Var[T] ≤ Σ_{i=1..n} pᵢ·(yᵢ/pᵢ)² = Σ_{i=1..n} yᵢ²/pᵢ ≤ (Σ_{i=1..n} yᵢ)·max_i yᵢ/pᵢ
  • Bound max_i yᵢ/pᵢ by ε²·(Σ_{i=1..n} yᵢ), so that Var[T] ≤ ε²·(Σ_{i=1..n} yᵢ)²
  • For us, yᵢ = |(Ax−b)ᵢ|, and this holds if pᵢ ≥ ‖eᵢᵀBR⁻¹‖₁·poly(d/ε)! (A minimal demonstration of the estimator follows.)
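A minimal demonstration of the estimator T, assuming heavy-tailed nonnegative yᵢ and sampling probabilities proportional to yᵢ (capped at 1); the sampling budget is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000
y = np.abs(rng.standard_cauchy(n))       # nonnegative, heavy-tailed values
total = y.sum()

# Sample index i with probability p_i proportional to y_i (capped at 1);
# T = sum over sampled i of y_i / p_i is an unbiased estimator of the sum.
budget = 2_000
p = np.minimum(1.0, budget * y / total)
keep = rng.random(n) < p
T = (y[keep] / p[keep]).sum()
print(total, T)                          # T concentrates around the true sum
```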

19
Importance Sampling
  • To get a bound for all x, use Bernstein's inequality and a net argument
  • Sample poly(d/ε) rows of BR⁻¹, where the i-th row is sampled proportional to its 1-norm
  • T is a diagonal matrix with T_{i,i} = 0 if row i is not sampled, otherwise T_{i,i} = 1/Pr[row i sampled]
  • ‖TBx‖₁ = (1 ± ε)·‖Bx‖₁ for all x
  • Solve the regression problem on the (reweighted) samples! (A schematic of this sampling step follows.)
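A schematic of the sampling step, assuming R has already been obtained from a QR-factorization of the sketched matrix SB; the function name, row counts, and constants are hypothetical.

```python
import numpy as np

def l1_sample(B, R, budget, rng):
    """Sample rows of B proportional to the row 1-norms of B @ inv(R),
    reweighting kept rows by 1/p_i (the diagonal matrix T from the slide)."""
    BRinv = np.linalg.solve(R.T, B.T).T          # B @ R^{-1}, no explicit inverse
    w = np.abs(BRinv).sum(axis=1)                # row 1-norms (sampling scores)
    p = np.minimum(1.0, budget * w / w.sum())
    keep = rng.random(B.shape[0]) < p
    return B[keep] / p[keep, None]               # reweighted sample of rows

rng = np.random.default_rng(8)
n, d = 30_000, 5
B = rng.standard_normal((n, d + 1))
S = rng.standard_cauchy((20 * (d + 1), n))       # Cauchy sketch, size illustrative
_, R = np.linalg.qr(S @ B)
TB = l1_sample(B, R, budget=3_000, rng=rng)      # solve l1-regression on TB
```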

20
Sketching to solve l1-regression [MM]
  • Most expensive operation is computing S·A, where S is the matrix of i.i.d. Cauchy random variables
  • All other operations are in the smaller space
  • Can speed this up by choosing S to be a sparse, CountSketch-like matrix whose non-zero entries are Cauchy random variables

21
Further sketching improvements [WZ]
  • Can show you need fewer sampled rows in later steps if, instead of a diagonal of Cauchy random variables, S is chosen with a diagonal of reciprocals of exponential random variables
  • Uses max-stability of exponentials [Andoni]: max_i yᵢ/eᵢ ~ ‖y‖₁/e, for i.i.d. exponentials eᵢ and an exponential e

For recent work on fast sampling-based algorithms, see Richard's talk!

22
Talk Outline
  • Regression
  • Exact Regression Algorithms
  • Sketching to speed up Least Squares Regression
  • Sketching to speed up Least Absolute Deviation
    (l1) Regression
  • Low Rank Approximation
  • Sketching to speed up Low Rank Approximation
  • Recent Results and Open Questions
  • M-Estimators and robust regression
  • CUR decompositions

23
Low rank approximation
  • A is an n × d matrix
  • Typically well-approximated by a low rank matrix
  • E.g., only high rank because of noise
  • Want to output a rank-k matrix A', so that ‖A−A'‖_F ≤ (1+ε)·‖A−A_k‖_F, w.h.p., where A_k = argmin_{rank-k matrices B} ‖A−B‖_F
  • (For a matrix C, ‖C‖_F = (Σ_{i,j} C_{i,j}²)^{1/2})

24
Solution to low-rank approximation [S]
  • Given n × d input matrix A
  • Compute S·A using a sketching matrix S with k ≪ n rows. S·A takes random linear combinations of rows of A
  • [Figure: the tall n × d matrix A is compressed to the short k × d matrix SA]
  • Project the rows of A onto the row space of SA, then find the best rank-k approximation to the projected points inside that row space (a toy implementation follows)
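A toy implementation of this pipeline, assuming a Gaussian sketch and an exact projection (a later slide replaces the exact projection with an approximate one); all sizes are illustrative.

```python
import numpy as np

def sketch_low_rank(A, k, sketch_rows, rng):
    """Rank-k approximation: sketch the rows of A, project A onto
    rowspace(SA), then take the best rank-k matrix inside that subspace."""
    S = rng.standard_normal((sketch_rows, A.shape[0]))
    Q = np.linalg.qr((S @ A).T)[0]           # orthonormal basis of rowspace(SA)
    P = A @ Q                                # project rows of A onto it
    U, s, Vt = np.linalg.svd(P, full_matrices=False)
    return (U[:, :k] * s[:k]) @ (Vt[:k] @ Q.T)

rng = np.random.default_rng(9)
A = rng.standard_normal((2000, 60)) @ rng.standard_normal((60, 300))
A += 0.01 * rng.standard_normal(A.shape)     # low rank plus noise
A_prime = sketch_low_rank(A, k=20, sketch_rows=80, rng=rng)

s = np.linalg.svd(A, compute_uv=False)       # compare to the optimal error
print(np.linalg.norm(A - A_prime, 'fro'), np.sqrt((s[20:] ** 2).sum()))
```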

25
Low Rank Approximation Idea
  • S can be matrix of i.i.d. Normals
  • S can be a Fast Johnson Lindenstrauss Matrix
  • S can be a CountSketch matrix
  • Regression problem minX Ak X AF
  • Solution is X I, and minimum is Ak AF
  • This is a generalized regression problem!
  • If S is a subspace embedding for column space of
    Ak and also if for any matrices B, C,
  • BSTSC BCF2 1/( rows of S) BF2 CF2
  • Then if X is the minimizer to minX SAkX SAF
    , then
  • Ak X AF (1e) minX AkX-AF (1e)
    Ak-AF
  • But minimizer X (SAk)- SA is in the row span
    of SA!

26
Caveat: projecting the points onto SA is slow
  • Current algorithm:
  • 1. Compute S·A (easy)
  • 2. Project each of the rows of A onto the row space of SA
  • 3. Find the best rank-k approximation of the projected points inside the row space of SA (easy)
  • Bottleneck is step 2
  • [CW] Turns out you can approximate the projection
  • Sketching for generalized regression again: min_X ‖X·(SA) − A‖_F²

27
Talk Outline
  • Regression
  • Exact Regression Algorithms
  • Sketching to speed up Least Squares Regression
  • Sketching to speed up Least Absolute Deviation
    (l1) Regression
  • Low Rank Approximation
  • Sketching to speed up Low Rank Approximation
  • Recent Results and Open Questions
  • M-Estimators and robust regression
  • CUR decompositions

28
M-Estimators and Robust Regression
  • Solve min_x ‖Ax−b‖_M
  • M: ℝ → ℝ≥0
  • ‖y‖_M = Σ_{i=1..n} M(yᵢ)
  • Least squares and l1-regression are special cases
  • Huber function, given a parameter c:
  • M(y) = y²/(2c) for |y| ≤ c
  • M(y) = |y| − c/2 otherwise
  • Enjoys the smoothness properties of l2 and the robustness properties of l1 (a small evaluation sketch follows)

[CW15] For M-estimators with at least linear and at most quadratic growth, can get an O(1)-approximation in nnz(A)·poly(d) time
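A minimal evaluation sketch of the Huber cost, directly from the definition above; the data and the choice c = 1 are illustrative.

```python
import numpy as np

def huber(y, c):
    """Huber M-estimator: quadratic near zero, linear in the tails."""
    y = np.abs(y)
    return np.where(y <= c, y**2 / (2 * c), y - c / 2)

def m_norm(A, b, x, c):
    """||Ax - b||_M for the Huber function (coordinate-wise sum of M)."""
    return huber(A @ x - b, c).sum()

rng = np.random.default_rng(10)
A = rng.standard_normal((1000, 5))
b = A @ np.ones(5) + rng.standard_normal(1000)
b[:10] += 100.0                          # a few gross outliers
print(m_norm(A, b, np.ones(5), c=1.0))   # grows linearly, not quadratically,
                                         # in the outliers' magnitude
```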
29
CUR Decompositions
[BW14] Can find a CUR decomposition in O(nnz(A)·log n) + n·poly(k/ε) time, with O(k/ε) columns, O(k/ε) rows, and rank(U) = k
30
Open Questions
  • Recent monograph in NOW Publishers: D. Woodruff, "Sketching as a Tool for Numerical Linear Algebra"
  • Other types of low rank approximation:
  • (Spectral) How quickly can we find a rank-k matrix A' so that ‖A−A'‖₂ ≤ (1+ε)·‖A−A_k‖₂, w.h.p., where A_k = argmin_{rank-k matrices B} ‖A−B‖₂?
  • (Robust) How quickly can we find a rank-k matrix A' so that ‖A−A'‖₁ ≤ (1+ε)·‖A−A_k‖₁, w.h.p., where A_k = argmin_{rank-k matrices B} ‖A−B‖₁?
  • For other questions regarding Schatten norms and communication-efficiency, see the monograph above. Thanks!