Title: Large-Scale Sparse Logistic Regression
1. Large-Scale Sparse Logistic Regression
- Jieping Ye
- Arizona State University
- Joint work with Jun Liu and Jianhui Chen
2. Sparse Logistic Regression
- Prediction: disease or not
- Confidence (probability)
- Identify informative features
3. Logistic Regression
- Logistic Regression (LR) has been applied to
- Document classification (Brzezinski, 1999)
- Natural language processing (Jurafsky and Martin, 2000)
- Computer vision (Friedman et al., 2000)
- Bioinformatics (Liao and Chin, 2007)
- Regularization is commonly applied to reduce overfitting and obtain a robust classifier. Two well-known regularizations are
- L2-norm regularization (Minka, 2007)
- L1-norm regularization (Koh et al., 2007)
4. Sparse Logistic Regression
- L1-norm regularization leads to sparse logistic regression (SLR)
- Simultaneous feature selection and classification
- Enhanced model interpretability
- Improved classification performance
- Applications
- M.-Y. Park and T. Hastie. Penalized Logistic Regression for Detecting Gene Interactions. Biostatistics, 2008.
- T. Wu et al. Genome-wide Association Analysis by Lasso Penalized Logistic Regression. Bioinformatics, 2009.
5. Large-Scale Sparse Logistic Regression
- Many applications involve data of large dimensionality
- The MRI images used in Alzheimer's disease studies contain more than 1 million voxels (features)
- Major challenge
- How to scale sparse logistic regression to large-scale problems?
6. The Proposed Lassplore Algorithm
- Lassplore (LArge-Scale SParse LOgistic REgression) is a first-order method
- Each iteration of Lassplore involves only matrix-vector multiplications
- Scales to large-size problems
- Efficient for sparse data
- Lassplore achieves the optimal convergence rate among all first-order methods
7. Outline
- Logistic Regression
- Sparse Logistic Regression
- Lassplore
- Experiments
8. Logistic Regression (1)
- The logistic regression model is given by
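The standard logistic regression model (a reconstruction; the weight vector w and bias c are assumed symbol names):

$$\operatorname{Prob}(y \mid \mathbf{x}) = \frac{1}{1 + \exp\!\left(-y\,(\mathbf{w}^{\top}\mathbf{x} + c)\right)}, \qquad y \in \{-1, +1\}$$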
9. Logistic Regression (2)
- A regularization term is added to the logistic loss to reduce overfitting.
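A reconstruction of the standard L2-regularized logistic regression objective (the regularization parameter λ and sample count m are assumed symbol names):

$$\min_{\mathbf{w},\,c}\ \frac{1}{m}\sum_{i=1}^{m}\log\!\left(1 + \exp\!\left(-y_i(\mathbf{w}^{\top}\mathbf{x}_i + c)\right)\right) + \frac{\lambda}{2}\,\|\mathbf{w}\|_2^2$$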
10. L1-Ball Constrained Logistic Regression
- Favorable properties
- Obtaining a sparse solution
- Performing feature selection and classification simultaneously
- Improving classification performance
- How to solve the L1-ball constrained optimization problem?
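The standard L1-ball constrained version of the problem above (a reconstruction; the ball radius s is an assumed symbol name):

$$\min_{\mathbf{w},\,c}\ \frac{1}{m}\sum_{i=1}^{m}\log\!\left(1 + \exp\!\left(-y_i(\mathbf{w}^{\top}\mathbf{x}_i + c)\right)\right) \quad \text{subject to} \quad \|\mathbf{w}\|_1 \le s$$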
11. Gradient Method for Sparse Logistic Regression
Let us consider gradient descent for solving the optimization problem.
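A minimal sketch of the projected gradient iteration, assuming a fixed step size and omitting the bias term for brevity; `project_l1_ball` is defined on the next slide, and none of the names below come from the Lassplore package itself:

```python
import numpy as np

def logistic_loss_grad(w, X, y):
    """Average logistic loss and its gradient; labels y are in {-1, +1}."""
    z = y * (X @ w)
    loss = np.mean(np.log1p(np.exp(-z)))
    # d/dw of mean log(1 + exp(-y_i x_i^T w)) = -mean(y_i * sigmoid(-z_i) * x_i)
    grad = -(X.T @ (y / (1.0 + np.exp(z)))) / len(y)
    return loss, grad

def projected_gradient(X, y, s, step, n_iter=100):
    """Gradient step, then Euclidean projection onto {w : ||w||_1 <= s}."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        _, g = logistic_loss_grad(w, X, y)
        w = project_l1_ball(w - step * g, s)  # projection defined on the next slide
    return w
```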
12. Euclidean Projection onto the L1-Ball
The Euclidean projection onto the L1-ball (Duchi et al., 2008) is a key building block, and it can be computed in linear time (Liu and Ye, 2009).
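A sketch of the projection itself. This is the standard sort-based O(n log n) routine of Duchi et al. (2008); the linear-time method of Liu and Ye (2009) is more involved and is not reproduced here:

```python
import numpy as np

def project_l1_ball(v, s):
    """Euclidean projection of v onto the L1-ball {w : ||w||_1 <= s}."""
    u = np.abs(v)
    if u.sum() <= s:
        return v.copy()  # v is already inside the ball
    # Find the soft-threshold theta from the sorted magnitudes.
    mu = np.sort(u)[::-1]
    cssv = np.cumsum(mu)
    rho = np.nonzero(mu * np.arange(1, len(u) + 1) > (cssv - s))[0][-1]
    theta = (cssv[rho] - s) / (rho + 1.0)
    # Shrink magnitudes by theta and restore signs.
    return np.sign(v) * np.maximum(u - theta, 0.0)
```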
13. Gradient Method vs. Nesterov's Method (1)
Convergence rates:
Nesterov's method achieves the lower-complexity bound of smooth optimization by first-order black-box methods, and is thus an optimal method.
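For reference, the standard rates for minimizing a smooth convex g with L-Lipschitz gradient are

$$g(\mathbf{x}_k) - g^{*} = O\!\left(\frac{L}{k}\right) \ \text{(gradient descent)}, \qquad g(\mathbf{x}_k) - g^{*} = O\!\left(\frac{L}{k^{2}}\right) \ \text{(Nesterov's method)},$$

and $\Omega(1/k^{2})$ is the matching lower bound for first-order black-box methods.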
14. Gradient Method vs. Nesterov's Method (2)
- The theoretical number of iterations (up to a constant factor) for achieving an accuracy of 10^-8
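Working the two rates out with constants dropped, reaching accuracy $\epsilon = 10^{-8}$ requires roughly

$$k \approx \frac{1}{\epsilon} = 10^{8} \ \text{iterations (gradient descent)}, \qquad k \approx \frac{1}{\sqrt{\epsilon}} = 10^{4} \ \text{iterations (Nesterov's method)}.$$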
15. Characteristics of the Lassplore Algorithm
- First-order black-box oracle-based method
- At each iteration, we only need to evaluate the function value and the gradient
- Utilizes Nesterov's method (Nesterov, 2003); the update is sketched below
- Global convergence rate of O(1/k^2) for the general case
- Linear convergence rate for the strongly convex case
- An adaptive line search scheme
- The step size is allowed to increase during the iterations
- This line search scheme is applicable to general smooth convex optimization
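In its standard form (a sketch; the paper's exact parameterization of β_k and L_k may differ), Nesterov's scheme computes a search point from the two previous iterates and then takes a projected gradient step:

$$\mathbf{s}_k = \mathbf{x}_k + \beta_k(\mathbf{x}_k - \mathbf{x}_{k-1}), \qquad \mathbf{x}_{k+1} = \pi_{C}\!\left(\mathbf{s}_k - \frac{1}{L_k}\nabla g(\mathbf{s}_k)\right),$$

where $\pi_{C}$ denotes the Euclidean projection onto the L1-ball.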
16. Key Components and Settings
- Previous schemes for choosing the step-size parameter:
- Nesterov's constant scheme (Nesterov, 2003)
- Nemirovski's line search scheme (Nemirovski, 1994)
17. Previous Line Search Schemes
- Nesterov's constant scheme (Nesterov, 2003)
- The step-size parameter is set to a constant value L, the Lipschitz constant of the gradient of g(·)
- The scheme is dependent on the condition number C
- Nemirovski's line search scheme (Nemirovski, 1994)
- The step-size parameter is allowed to increase, and is upper-bounded by 2L
- The scheme is identical for every function g(·)
18. Proposed Line Search Scheme
- Characteristics
- The step-size parameter is allowed to be adaptively tuned (both increasing and decreasing) and is upper-bounded by 2L
- The tuning is dependent on the specific function g(·)
- It preserves the optimal convergence rate (see the paper for the technical proof); a sketch follows below
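A minimal sketch of how an adaptive step-size rule of this kind can be combined with the accelerated projected update, reusing `logistic_loss_grad` and `project_l1_ball` from the earlier slides. The halving/doubling factors and the FISTA-style momentum below are generic choices, not necessarily the paper's exact scheme:

```python
import numpy as np

def accelerated_l1ball_lr(X, y, s, w0=None, L0=1.0, n_iter=200):
    """Nesterov-style accelerated projected gradient with an adaptive
    Lipschitz estimate L: L may decrease between iterations and is
    increased until a sufficient-decrease condition holds."""
    w = np.zeros(X.shape[1]) if w0 is None else np.asarray(w0, float).copy()
    w_prev = w.copy()
    L, t, t_prev = L0, 1.0, 1.0
    for _ in range(n_iter):
        beta = (t_prev - 1.0) / t                 # momentum coefficient
        sk = w + beta * (w - w_prev)              # search point
        f_s, g_s = logistic_loss_grad(sk, X, y)
        L = max(L / 2.0, L0)                      # adaptive: allow L to decrease
        while True:
            w_new = project_l1_ball(sk - g_s / L, s)
            d = w_new - sk
            f_new, _ = logistic_loss_grad(w_new, X, y)
            # Accept once the quadratic upper model of the loss holds at w_new.
            if f_new <= f_s + g_s @ d + 0.5 * L * (d @ d):
                break
            L *= 2.0                              # otherwise increase L and retry
        w_prev, w = w, w_new
        t_prev, t = t, (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
    return w
```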
19. Related Work
- Y. Nesterov. Gradient Methods for Minimizing Composite Objective Function. Technical Report 2007/76, 2007.
- S. Becker, J. Bobin, and E. J. Candès. NESTA: A Fast and Accurate First-Order Method for Sparse Recovery. 2009.
- A. Beck and M. Teboulle. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences, 2:183-202, 2009.
- K.-C. Toh and S. Yun. An Accelerated Proximal Gradient Algorithm for Nuclear Norm Regularized Least Squares Problems. Preprint, National University of Singapore, March 2009.
- S. Ji and J. Ye. An Accelerated Gradient Method for Trace Norm Minimization. The Twenty-Sixth International Conference on Machine Learning, 2009.
20. Experiments: Data Sets
21. Comparison of the Line Search Schemes
Comparison of the proposed adaptive scheme (Adap) with the scheme proposed by Nemirovski (Nemi).
[Plot: objective value for Adap and Nemi.]
22. Pathwise Solutions: Warm Start vs. Cold Start
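Only the title survives on this slide, but the warm-start idea is simple to sketch: sweep the ball radius from small to large and start each solve at the previous solution (a cold start would restart from zeros each time). This reuses the hypothetical `accelerated_l1ball_lr` from the earlier sketch:

```python
import numpy as np

def pathwise_solutions(X, y, radii):
    """Solve for an increasing sequence of L1-ball radii,
    warm-starting each problem at the previous solution."""
    path, w = [], np.zeros(X.shape[1])
    for s in sorted(radii):
        w = accelerated_l1ball_lr(X, y, s, w0=w)  # warm start
        path.append(w.copy())
    return path
```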
23. Comparison with ProjectionL1 (Schmidt et al., 2007)
24. Comparison with ProjectionL1 (Schmidt et al., 2007)
25. Comparison with l1-logreg (Koh et al., 2007)
26. Drosophila Gene Expression Image Analysis
Drosophila embryogenesis is divided into 17 developmental stages (1-17).
27. Sparse Logistic Regression: Application (2)
28. Summary
- The Lassplore algorithm for sparse logistic regression
- First-order black-box method
- Optimal convergence rate
- Adaptive line search scheme
- Future work
- Apply the proposed approach to other mixed-norm regularized optimization problems
- Biological image analysis
29. The Lassplore Package
http://www.public.asu.edu/~jye02/Software/lassplore/
30. Thank you!