Large-Scale Sparse Logistic Regression
1
Large-Scale Sparse Logistic Regression
  • Jieping Ye
  • Arizona State University
  • Joint work with Jun Liu and Jianhui Chen

2
  • Prediction: disease or not
  • Confidence (probability)
  • Identification of informative features

Sparse Logistic Regression
3
Logistic Regression
  • Logistic Regression (LR) has been applied to
  • Document classification (Brzezinski, 1999)
  • Natural language processing (Jurafsky and Martin,
    2000)
  • Computer vision (Friedman et al., 2000)
  • Bioinformatics (Liao and Chin, 2007)
  • Regularization is commonly applied to reduce
    overfitting and obtain a robust classifier. Two
    well-known regularizations are
  • L2-norm regularization (Minka, 2007)
  • L1-norm regularization (Koh et al., 2007)

4
Sparse Logistic Regression
  • L1-norm regularization leads to sparse logistic
    regression (SLR)
  • Simultaneous feature selection and classification
  • Enhanced model interpretability
  • Improved classification performance
  • Applications
  • M.-Y. Park and T. Hastie, Penalized Logistic
    Regression for Detecting Gene Interactions.
    Biostatistics, 2008.
  • T. Wu et al. Genome-wide Association Analysis by
    Lasso Penalized Logistic Regression.
    Bioinformatics, 2009.

5
Large-Scale Sparse Logistic Regression
  • Many applications involve data of large
    dimensionality
  • The MRI images used in the Alzheimer's Disease study
    contain more than 1 million voxels (features)
  • Major Challenge
  • How to scale sparse logistic regression to
    large-scale problems?

6
The Proposed Lassplore Algorithm
  • Lassplore (LArge-Scale SParse LOgistic
    REgression) is a first-order method
  • Each iteration of Lassplore involves only
    matrix-vector multiplications
  • Scales to large-size problems
  • Efficient for sparse data
  • Lassplore achieves the optimal convergence rate
    among all first-order methods

7
Outline
  • Logistic Regression
  • Sparse Logistic Regression
  • Lassplore
  • Experiments

8
Logistic Regression (1)
  • The logistic regression model is given by
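In standard form (notation assumed here: feature vector x, weight vector w, bias c, and class label y in {-1, +1}):

$$\text{Prob}(y \mid \mathbf{x}) \;=\; \frac{1}{1 + \exp\bigl(-y\,(\mathbf{w}^{\top}\mathbf{x} + c)\bigr)}$$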

9
Logistic Regression (2)
Without regularization, maximizing the likelihood tends
to overfit the training data.
10
L1-ball Constrained Logistic Regression
  • Favorable properties
  • Obtaining a sparse solution
  • Performing feature selection and classification
    simultaneously
  • Improving classification performance
  • How to solve the L1-ball constrained optimization
    problem? (stated below)
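The problem in question, in a standard formulation (notation assumed: m training pairs (x_i, y_i) with y_i in {-1, +1}, and L1-ball radius z):

$$\min_{\mathbf{w},\,c}\ \frac{1}{m}\sum_{i=1}^{m}\log\Bigl(1 + \exp\bigl(-y_i(\mathbf{w}^{\top}\mathbf{x}_i + c)\bigr)\Bigr) \quad \text{subject to} \quad \|\mathbf{w}\|_1 \le z$$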

11
Gradient Method for Sparse Logistic Regression
Let us consider gradient descent for solving the
optimization problem.
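With the L1-ball constraint, each iteration is a gradient step on the smooth logistic loss f followed by a Euclidean projection back onto the ball (the standard projected-gradient update, with step size 1/L_k):

$$\mathbf{w}_{k+1} \;=\; \pi_{\{\|\cdot\|_1 \le z\}}\Bigl(\mathbf{w}_k - \tfrac{1}{L_k}\,\nabla f(\mathbf{w}_k)\Bigr)$$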
12
Euclidean Projection onto the L1-Ball
The Euclidean projection onto the L1-ball (Duchi
et al., 2008) is a building block, and it can be
solved in linear time (Liu and Ye, 2009).
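For illustration, here is the simple sort-based O(n log n) variant from Duchi et al. (2008); the expected linear-time method of Liu and Ye (2009) is more involved. The helper name `project_l1_ball` is ours:

```python
import numpy as np

def project_l1_ball(v, z):
    """Euclidean projection of v onto the L1-ball {w : ||w||_1 <= z}."""
    if np.abs(v).sum() <= z:
        return v.copy()                      # already feasible
    u = np.sort(np.abs(v))[::-1]             # magnitudes, descending
    css = np.cumsum(u)
    idx = np.arange(1, len(u) + 1)
    rho = np.nonzero(u - (css - z) / idx > 0)[0][-1]
    theta = (css[rho] - z) / (rho + 1.0)     # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)
```

The projection reduces to soft-thresholding at a data-dependent level theta, which is what produces exact zeros in the solution.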
13
Gradient Method vs. Nesterov's Method (1)
Convergence rates
Nesterov's method achieves the lower-complexity
bound of smooth optimization for first-order
black-box methods, and is thus an optimal method.
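For smooth convex f with an L-Lipschitz gradient, the standard rates (up to constant factors, with D = ||x_0 - x*||) are:

$$f(\mathbf{x}_k) - f^{\ast} = O\Bigl(\tfrac{L D^2}{k}\Bigr) \ \text{(gradient descent)}, \qquad f(\mathbf{x}_k) - f^{\ast} = O\Bigl(\tfrac{L D^2}{k^2}\Bigr) \ \text{(Nesterov's method)}$$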
14
Gradient Method vs. Nesterov's Method (2)
  • The theoretical number of iterations (up to a
    constant factor) for achieving an accuracy of
    10^-8
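Plugging the target accuracy epsilon = 10^-8 into the rates above gives a rough comparison:

$$k_{\text{GD}} = O\Bigl(\tfrac{LD^2}{\varepsilon}\Bigr) = O(10^{8}\,LD^2), \qquad k_{\text{Nesterov}} = O\Bigl(\sqrt{\tfrac{LD^2}{\varepsilon}}\Bigr) = O(10^{4}\,\sqrt{LD^2})$$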

15
Characteristics of Lassplore
  • First-order black-box oracle-based method
  • At each iteration, we only need to evaluate
    the function value and gradient
  • Utilizes Nesterov's method (Nesterov, 2003)
  • Global convergence rate of O(1/k^2) for the
    general case
  • Linear convergence rate for the strongly
    convex case
  • An adaptive line search scheme
  • The step size is allowed to increase
    during the iterations
  • This line search scheme is applicable to
    general smooth convex optimization
    (a minimal code sketch follows)
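A minimal FISTA-style sketch of these ingredients, not the paper's exact scheme: `f`, `grad_f`, and `project` are assumed callables (average logistic loss, its gradient, and the L1-ball projection above), and the Lipschitz estimate L both grows (backtracking) and shrinks (the 0.9 factor), mimicking the adaptive tuning:

```python
import numpy as np

def accel_proj_gradient(f, grad_f, project, w0, L0=1.0, max_iter=200):
    """Accelerated projected gradient with an adaptive line search."""
    w_old, w = w0.copy(), w0.copy()
    L = L0
    t_old, t = 1.0, 1.0
    for _ in range(max_iter):
        L = max(L * 0.9, 1e-12)                      # let step size 1/L grow
        s = w + ((t_old - 1.0) / t) * (w - w_old)    # Nesterov search point
        g, fs = grad_f(s), f(s)
        while True:                                  # backtracking test
            w_new = project(s - g / L)
            d = w_new - s
            if f(w_new) <= fs + g @ d + 0.5 * L * (d @ d):
                break                                # sufficient decrease
            L *= 2.0                                 # shrink step, retry
        w_old, w = w, w_new
        t_old, t = t, 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
    return w
```

Each iteration costs one gradient plus a few function values, i.e., a handful of matrix-vector products, which is what makes the approach scale to large, sparse data.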

16
Key Components and Settings
  • Previous schemes for choosing the step-size parameter
  • Nesterovs constant scheme (Nesterov, 2003)
  • Nemirovskis line search scheme (Nemirovski, 1994)

17
Previous Line Search Schemes
  • Nesterov's constant scheme (Nesterov, 2003)
  • The step-size parameter is set to the constant
    value L, the Lipschitz constant of the gradient
    of the function g(.)
  • The iteration count depends on the condition
    number C
  • Nemirovski's line search scheme (Nemirovski,
    1994)
  • The parameter is allowed to increase, and is
    upper-bounded by 2L
  • The parameter is identical for every function
    g(.)

18
Proposed Line Search Scheme
  • Characteristics
  • The parameter can be adaptively tuned (increased
    and decreased) and is upper-bounded by 2L
  • The parameter is adapted to each function g(.)
  • It preserves the optimal convergence rate
    (see the paper for the technical proof)
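The acceptance test underlying all three schemes is the standard sufficient-decrease condition (a common formulation, with s_k the current search point and L_k the current parameter value):

$$g(\mathbf{w}_{k+1}) \;\le\; g(\mathbf{s}_k) + \langle \nabla g(\mathbf{s}_k),\, \mathbf{w}_{k+1} - \mathbf{s}_k\rangle + \tfrac{L_k}{2}\,\|\mathbf{w}_{k+1} - \mathbf{s}_k\|^2$$

If the test fails, L_k is increased and the step recomputed; the proposed scheme additionally lets L_k decrease between iterations.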

19
Related Work
  • Y. Nesterov. Gradient methods for minimizing
    composite objective function (Technical Report
    2007/76).
  • S. Becker, J. Bobin, and E. J. Candès. NESTA: a
    fast and accurate first-order method for sparse
    recovery. 2009.
  • A. Beck and M. Teboulle. A fast iterative
    shrinkage-thresholding algorithm for linear
    inverse problems. SIAM Journal on Imaging
    Sciences, 2, 183-202, 2009.
  • K.-C. Toh and S. Yun. An accelerated proximal
    gradient algorithm for nuclear norm regularized
    least squares problems. Preprint, National
    University of Singapore, March 2009.
  • S. Ji and J. Ye. An Accelerated Gradient Method
    for Trace Norm Minimization. The Twenty-Sixth
    International Conference on Machine Learning,
    2009.

20
Experiments: Data Sets
21
Comparison of the Line Search Schemes
Comparison of the proposed adaptive scheme (Adap)
with the scheme proposed by Nemirovski (Nemi).
(Figure: objective value over iterations.)
22
Pathwise Solutions: Warm Start vs. Cold Start
23
Comparison with ProjectionL1 (Schmidt et al.,
2007)
24
Comparison with ProjectionL1 (Schmidt et al.,
2007)
25
Comparison with l1-logreg (Koh et al., 2007)
26
Drosophila Gene Expression Image Analysis
Drosophila embryogenesis is divided into 17
developmental stages (1-17).
27
Sparse Logistic Regression Application (2)
28
Summary
  • The Lassplore algorithm for sparse logistic
    regression
  • First-order black-box method
  • Optimal convergence rate
  • Adaptive line search scheme
  • Future work
  • Apply the proposed approach to other mixed-norm
    regularized optimization problems
  • Biological image analysis

29
The Lassplore Package
http://www.public.asu.edu/~jye02/Software/lassplore/
30
Thank you!