Sparse, Flexible and Efficient Modeling using L1-Regularization

1
Sparse, Flexible and Efficient Modeling using
L1-Regularization
  • Saharon Rosset and Ji Zhu

2
Contents
  1. Idea
  2. Algorithm
  3. Results

3
  • Part 1: Idea

4
Introduction
  • Setting: training data (x_i, y_i), i = 1..n; the fitted model depends implicitly on the training data
  • Linear model (using basis functions φ_j): f(x) = Σ_j w_j φ_j(x)
  • Model fit: ŵ(λ) = argmin_w L(w) + λJ(w), with loss L(w) = Σ_i l(y_i, f(x_i)) and penalty J(w)

5
Introduction
  • Problem: how to choose the weight λ of the regularization?
  • Answer: find ŵ(λ) for all λ ∈ [0, ∞)
  • Can this be done efficiently (time, memory)?
  • Yes, if we impose restrictions on the solution path ŵ(λ) (see the sketch below)
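
As a concrete illustration, here is a minimal sketch on synthetic data using scikit-learn's lasso_path; it approximates the full path on a grid of λ values, whereas the method presented here traces the path exactly:

```python
# Minimal sketch (synthetic data): compute the entire L1 path in one call.
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 300))        # n = 50 samples, 300 features
w_true = np.zeros(300)
w_true[:3] = 2.0                          # sparse ground truth
y = X @ w_true + 0.5 * rng.standard_normal(50)

alphas, coefs, _ = lasso_path(X, y)       # coefs has shape (n_features, n_alphas)
nnz = (np.abs(coefs) > 1e-10).sum(axis=0) # sparsity along the path
for a, k in list(zip(alphas, nnz))[::20]:
    print(f"lambda = {a:8.4f}   non-zero components = {k}")
```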

6
Restrictions
  • The solution path ŵ(λ) shall be piecewise linear in λ
  • What impact does this have on L(w) and J(w)?
  • Can we still solve real-world problems?

7
Restrictions
  • The path derivative ∂ŵ(λ)/∂λ must be piecewise constant in λ
  • This holds when L(w) is quadratic in w
  • and J(w) is linear in w

8
Quadratic Loss Functions
  • squared-error loss in regression
  • (Huberized) hinge loss for classification (SVM); both are sketched below
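
A minimal sketch of both losses as functions of the residual/margin r. The Huberized form of the hinge (quadratic near the margin, linear in the tail) is the one used later on the NIPS slides; the knot t = -1 is an assumption, not fixed by the slides:

```python
import numpy as np

def squared_loss(r):
    """Squared-error loss of the residual r = y - f(x): quadratic everywhere."""
    return r ** 2

def huberized_hinge(r, t=-1.0):
    """Huberized hinge loss of the margin r = y * f(x): zero for r > 1,
    quadratic on [t, 1], linear below the knot t (continuously differentiable)."""
    r = np.asarray(r, dtype=float)
    out = np.where(r > 1.0, 0.0, (1.0 - r) ** 2)
    return np.where(r < t, (1.0 - t) ** 2 + 2.0 * (1.0 - t) * (t - r), out)
```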

9
Linear Penalty Functions
  • L1 penalty J(w) = ||w||_1 = Σ_j |w_j| is piecewise linear in w
  • Sparseness property: many components of ŵ(λ) are exactly zero (one-dimensional sketch below)
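
The sparseness property is easiest to see in one dimension: the proximal operator of the L1 penalty is the soft-threshold, which sets every coefficient with |z| ≤ λ exactly to zero. A minimal sketch:

```python
import numpy as np

def soft_threshold(z, lam):
    """Solution of min_w 0.5*(w - z)**2 + lam*|w| (the prox of the L1 penalty).
    Every |z| <= lam is mapped exactly to zero -- the source of sparsity."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

print(soft_threshold(np.array([-2.0, -0.3, 0.1, 1.5]), lam=0.5))
# -> [-1.5 -0.   0.   1. ]
```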

10
Bet on Sparseness
  • 50 samples with 300 independent Gaussian
    variables
  • Row 1: 3 non-zero variables
  • Row 2: 30 non-zero variables
  • Row 3: 300 non-zero variables
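
A hedged sketch of this experiment's setup; the coefficient values and noise level are assumptions, and LassoCV/RidgeCV stand in for the L1 and L2 fits compared in the original figure:

```python
# 50 samples, 300 independent Gaussian variables, and 3 / 30 / 300
# non-zero coefficients per row of the figure.
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV

rng = np.random.default_rng(1)
n, p = 50, 300
for k in (3, 30, 300):
    w = np.zeros(p)
    w[:k] = rng.standard_normal(k)        # assumed coefficient values
    X = rng.standard_normal((n, p))
    y = X @ w + rng.standard_normal(n)    # assumed noise level
    X_test = rng.standard_normal((1000, p))
    y_test = X_test @ w
    for name, model in (("lasso (L1)", LassoCV(cv=5)), ("ridge (L2)", RidgeCV())):
        mse = np.mean((model.fit(X, y).predict(X_test) - y_test) ** 2)
        print(f"k = {k:3d}   {name}: test MSE = {mse:7.2f}")
```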

11
  • Part 2: Algorithm

12
Linear Toolbox
  • Write the loss per observation as l(r) = a(r)·r² + b(r)·r + c(r)
  • a(r), b(r) and c(r): piecewise constant coefficients
  • Regression: r = y − w^T φ(x) (residual)
  • Classification: r = y · w^T φ(x) (margin); worked coefficients below
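
To make the representation concrete, the piecewise-constant coefficients of the Huberized hinge loss (knot t = -1 assumed, as before) can be read off by expanding each piece:

```python
def huberized_hinge_abc(r, t=-1.0):
    """Piecewise-constant coefficients (a, b, c) with l(r) = a*r**2 + b*r + c,
    obtained by expanding the Huberized hinge loss on each of its three pieces."""
    if r > 1.0:
        return 0.0, 0.0, 0.0                    # flat region: zero loss
    if r >= t:
        return 1.0, -2.0, 1.0                   # (1 - r)**2 = r**2 - 2r + 1
    return 0.0, -2.0 * (1.0 - t), 1.0 - t * t   # linear tail below the knot

# Sanity check at r = -2 (with t = -1 the direct formula gives 8.0):
a, b, c = huberized_hinge_abc(-2.0)
assert abs(a * 4.0 + b * (-2.0) + c - 8.0) < 1e-12
```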

13
Optimization Problem
  • ŵ(λ) = argmin_w Σ_i l(r_i) + λ||w||_1

14
Algorithm Initialization
  • start at t = 0 with w = 0
  • determine the initial set of non-zero components (the active set)
  • compute the starting direction

15
Algorithm Loop
  • follow the direction until one of the following happens:
  • a new component is added to the active set
  • a non-zero component vanishes (hits zero)
  • a knot is hit (a discontinuity of a(r), b(r), c(r))

16
Algorithm Loop
  • update the direction after each event
  • check the stopping criterion (a homotopy sketch follows below)
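
For squared loss, the loop above is exactly the LARS-lasso homotopy; a sketch using scikit-learn's lars_path, whose returned alphas are the knots between the linear pieces:

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 300))
w_true = np.zeros(300)
w_true[:3] = 2.0
y = X @ w_true + 0.1 * rng.standard_normal(50)

# method="lasso" lets variables enter *and* leave the active set --
# the two event types above; the alphas are the knots between pieces.
alphas, active, coefs = lars_path(X, y, method="lasso")
print(f"{len(alphas) - 1} linear pieces, final active set size {len(active)}")
```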

17
  • Part 3: Results

18
NIPS Results
  • General procedure:
  • pre-selection (univariate t-statistic; sketch below)
  • algorithm loss function: Huberized hinge loss
  • find the best λ based on a validation dataset
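
A sketch of the pre-selection step; t_statistic_preselect is a hypothetical helper (not from the slides) that ranks features by a per-feature Welch t-statistic, assuming binary labels in {-1, +1}:

```python
import numpy as np

def t_statistic_preselect(X, y, k):
    """Keep the k features with the largest absolute two-sample (Welch)
    t-statistic; assumes binary labels y in {-1, +1}."""
    pos, neg = X[y == 1], X[y == -1]
    mean_diff = pos.mean(axis=0) - neg.mean(axis=0)
    se = np.sqrt(pos.var(axis=0, ddof=1) / len(pos)
                 + neg.var(axis=0, ddof=1) / len(neg) + 1e-12)
    return np.argsort(-np.abs(mean_diff / se))[:k]

rng = np.random.default_rng(3)
X = rng.standard_normal((300, 20000))        # Dexter-sized toy data
y = rng.choice([-1, 1], size=300)
keep = t_statistic_preselect(X, y, k=1152)   # 1152 as on the Dexter slide
print(X[:, keep].shape)                      # (300, 1152)
```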

19
NIPS Results
  • Dexter dataset
  • m = 300 samples, n = 20,000 features; pre-selection reduces this to n = 1152
  • the solution path consists of 452 linear pieces
  • the optimum has 120 non-zero components

20
NIPS Results
  • Not very happy with the results
  • working with the original variables: simple linear model with L1 regularization for feature selection

21
Conclusion
  • theory and practice
  • limited to linear classifiers
  • other extensions: Regularization Path for the SVM (L2)