Additive Models and Trees - PowerPoint PPT Presentation

Provided by: Nila1

Category:

more less

Transcript and Presenter's Notes

Title: Additive Models and Trees

1
Additive Models and Trees

Lecture Notes for CMPUT 466/551
Nilanjan Ray

Principal source: Department of Statistics, CMU
2
Topics to cover

GAM: Generalized Additive Models
CART: Classification and Regression Trees
MARS: Multivariate Adaptive Regression Splines

3
Generalized Additive Models
What is GAM?
$E(Y \mid X_1, \ldots, X_p) = \alpha + f_1(X_1) + f_2(X_2) + \cdots + f_p(X_p)$
The functions $f_j$ are, in general, smoothing functions such as splines,
kernel functions, linear functions, and so on. Each function can be
different, e.g., $f_1$ can be linear and $f_2$ a natural spline.
Compare GAM with linear basis expansions (Ch. 5 of HTF). Similarities?
Dissimilarities? Any similarity (in principle) with the Naïve Bayes model?
4
Smoothing Functions in GAM

Non-parametric functions (linear smoothers)
Smoothing splines (basis expansion)
Simple k-nearest neighbor (raw moving average)
Locally weighted average using kernel weighting
Local linear regression, local polynomial regression
Linear functions
Functions of more than one variable (interaction terms)
Example
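The two simplest smoothers in the list above can be sketched in a few lines. This is a minimal illustration, assuming one-dimensional inputs and a Gaussian kernel for the kernel-weighted average; the function names and the `k` and `bandwidth` parameters are hypothetical choices, not taken from the slides.

```python
import math


def knn_smoother(x, y, x0, k):
    """Raw moving average: mean of y over the k training points nearest to x0."""
    nearest = sorted(range(len(x)), key=lambda i: abs(x[i] - x0))[:k]
    return sum(y[i] for i in nearest) / k


def kernel_smoother(x, y, x0, bandwidth):
    """Locally weighted average with Gaussian kernel weights centered at x0."""
    weights = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 4.0, 9.0, 16.0]
print(knn_smoother(xs, ys, 2.0, k=3))              # averages y at x = 1, 2, 3
print(kernel_smoother(xs, ys, 2.0, bandwidth=1.0))
```

Both are linear smoothers: each fitted value is a weighted sum of the observed `y` values, with weights that depend only on the `x` values.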

5
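Learning GAM: Backfitting

Backfitting algorithm
Initialize: $\hat\alpha = \frac{1}{N}\sum_{i=1}^{N} y_i$, $\hat f_j \equiv 0$ for all $j$
Cycle $j = 1, 2, \ldots, p, \ldots, 1, 2, \ldots, p, \ldots$ ($m$ cycles):
$\hat f_j \leftarrow S_j\left[\{y_i - \hat\alpha - \sum_{k \neq j} \hat f_k(x_{ik})\}_{i=1}^{N}\right]$
$\hat f_j \leftarrow \hat f_j - \frac{1}{N}\sum_{i=1}^{N} \hat f_j(x_{ij})$
where $S_j$ is the smoother used for the $j$-th predictor
Until the functions $\hat f_j$ change less than a prespecified threshold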
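The backfitting loop can be sketched as follows. This is a minimal version, assuming a Gaussian-kernel smoother stands in for every $S_j$ (any linear smoother could be swapped in); `smooth`, `backfit`, `bandwidth`, `tol` and `max_cycles` are hypothetical names chosen for illustration.

```python
import math


def smooth(x, r, bandwidth):
    """Smoother S_j: Gaussian-kernel weighted average of r evaluated at each x_i."""
    fitted = []
    for x0 in x:
        w = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
        fitted.append(sum(wi * ri for wi, ri in zip(w, r)) / sum(w))
    return fitted


def backfit(X, y, bandwidth=0.5, tol=1e-6, max_cycles=100):
    """Fit y ~ alpha + f_1(x_1) + ... + f_p(x_p) by backfitting.

    X is a list of N rows with p columns. Returns (alpha, f), where f[j][i]
    is the fitted value of f_j at observation i.
    """
    n, p = len(y), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    alpha = sum(y) / n                    # initialize: alpha = mean of y
    f = [[0.0] * n for _ in range(p)]     # initialize: every f_j = 0
    for _ in range(max_cycles):
        max_change = 0.0
        for j in range(p):                # cycle j = 1, 2, ..., p
            # Partial residuals: remove alpha and every other fitted function
            partial = [y[i] - alpha - sum(f[k][i] for k in range(p) if k != j)
                       for i in range(n)]
            new_fj = smooth(cols[j], partial, bandwidth)
            mean_fj = sum(new_fj) / n
            new_fj = [v - mean_fj for v in new_fj]    # recenter f_j to mean zero
            max_change = max(max_change,
                             max(abs(a - b) for a, b in zip(new_fj, f[j])))
            f[j] = new_fj
        if max_change < tol:              # functions changed less than threshold
            break
    return alpha, f


X = [[i / 10, (7 * i % 20) / 10] for i in range(20)]
y = [row[0] + row[1] ** 2 for row in X]
alpha, f = backfit(X, y)
fitted = [alpha + f[0][i] + f[1][i] for i in range(len(y))]
```

The modularity mentioned later in the summary shows up here: replacing `smooth` with a spline or local-linear smoother changes nothing else in the loop.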

6
Backfitting: Points to Ponder
Computational advantage? Convergence? How to choose the fitting functions?
7
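Example: Generalized Logistic Regression Model
$\log\frac{\mu(X)}{1 - \mu(X)} = \alpha + f_1(X_1) + \cdots + f_p(X_p)$, where $\mu(X) = \Pr(Y = 1 \mid X)$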
8
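Additive Logistic Regression: Backfitting
Fitting logistic regression (p. 99)
1. Initialize $\beta$ (e.g., $\beta = 0$).
2. Iterate:
a. Compute $\eta_i = x_i^T\beta$, $p_i = 1/(1 + e^{-\eta_i})$ and the working response $z_i = \eta_i + \frac{y_i - p_i}{p_i(1 - p_i)}$.
b. Compute the weights $w_i = p_i(1 - p_i)$.
c. Using weighted least squares, fit a linear model to $z_i$ with weights $w_i$, giving new estimates $\hat\beta$.
3. Continue step 2 until convergence.
Fitting additive logistic regression (p. 262)
1. Set $\hat\alpha = \log\frac{\bar y}{1 - \bar y}$, where $\bar y = \mathrm{ave}(y_i)$, and $\hat f_j \equiv 0$ for all $j$.
2. Define $\hat\eta_i = \hat\alpha + \sum_j \hat f_j(x_{ij})$ and $\hat p_i = 1/(1 + e^{-\hat\eta_i})$. Iterate:
a. Construct the working target $z_i = \hat\eta_i + \frac{y_i - \hat p_i}{\hat p_i(1 - \hat p_i)}$.
b. Construct the weights $w_i = \hat p_i(1 - \hat p_i)$.
c. Using the weighted backfitting algorithm, fit an additive model to $z_i$ with weights $w_i$, giving new estimates $\hat\alpha$, $\hat f_j$ for all $j$.
3. Continue step 2 until convergence.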
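The right-hand algorithm (local scoring) can be sketched as below. This is a minimal version under several assumptions: a Gaussian-kernel smoother whose kernel weights are multiplied by $w_i$ plays the role of the weighted smoother, a fixed number of outer iterations replaces the convergence test, and the weights are clamped away from zero as a numerical safeguard. All function names and parameters are hypothetical.

```python
import math


def weighted_smooth(x, r, w, bandwidth):
    """Gaussian-kernel smoother in which observation i also carries weight w[i]."""
    out = []
    for x0 in x:
        k = [wi * math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi, wi in zip(x, w)]
        out.append(sum(ki * ri for ki, ri in zip(k, r)) / sum(k))
    return out


def weighted_backfit(cols, z, w, alpha, f, bandwidth, cycles=10):
    """Step 2(c): weighted backfitting of the working target z."""
    n, p, sw = len(z), len(cols), sum(w)
    for _ in range(cycles):
        # Intercept: weighted mean of z after removing the current functions
        alpha = sum(w[i] * (z[i] - sum(f[k][i] for k in range(p))) for i in range(n)) / sw
        for j in range(p):
            partial = [z[i] - alpha - sum(f[k][i] for k in range(p) if k != j)
                       for i in range(n)]
            fj = weighted_smooth(cols[j], partial, w, bandwidth)
            center = sum(wi * v for wi, v in zip(w, fj)) / sw
            f[j] = [v - center for v in fj]   # keep each f_j at weighted mean zero
    return alpha, f


def local_scoring(X, y, bandwidth=0.5, iters=15):
    """Additive logistic regression; returns the fitted probabilities p_i."""
    n, p = len(y), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    ybar = sum(y) / n
    alpha = math.log(ybar / (1 - ybar))       # step 1: alpha = logit(ybar)
    f = [[0.0] * n for _ in range(p)]         # step 1: every f_j = 0
    for _ in range(iters):                    # step 2, fixed number of passes
        eta = [alpha + sum(f[j][i] for j in range(p)) for i in range(n)]
        prob = [1.0 / (1.0 + math.exp(-e)) for e in eta]
        w = [max(q * (1 - q), 1e-10) for q in prob]              # 2(b) weights
        z = [eta[i] + (y[i] - prob[i]) / w[i] for i in range(n)]  # 2(a) working target
        alpha, f = weighted_backfit(cols, z, w, alpha, f, bandwidth)
    eta = [alpha + sum(f[j][i] for j in range(p)) for i in range(n)]
    return [1.0 / (1.0 + math.exp(-e)) for e in eta]


# Toy data: the chance of y = 1 rises with the first predictor; the second is noise
X = [[i / 10, (7 * i % 20) / 10] for i in range(20)]
y = [0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1]
probs = local_scoring(X, y)
```

Replacing `weighted_backfit` with a weighted least-squares solve for a linear predictor would turn this loop into the left-hand IRLS algorithm.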
9
SPAM Detection via Additive Logistic Regression

Input variables (predictors):
48 quantitative variables: the percentage of words in the email that match a given word, e.g., business, address, internet, etc.
6 quantitative variables: the percentage of characters in the email that match a given character, such as ch;, ch(, etc.
The average length of uninterrupted sequences of capital letters
The length of the longest uninterrupted sequence of capital letters
The total length of uninterrupted sequences of capital letters
Output variable: SPAM (1) or Email (0)
The $f_j$'s are taken as cubic smoothing splines

10
SPAM Detection Results
Sensitivity: probability of predicting spam given that the true state is spam
Specificity: probability of predicting email given that the true state is email
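The two definitions can be computed directly from predicted and true labels. A minimal sketch, assuming the 1 = spam, 0 = email coding from the previous slide; the labels below are made-up toy values, not the results reported on the slide.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) with 1 = spam and 0 = email."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # spam predicted as spam
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # spam predicted as email
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)   # email predicted as email
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # email predicted as spam
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels, not the slide's results
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

Here 3 of the 4 spam messages are caught (sensitivity 0.75) and 5 of the 6 emails pass (specificity 5/6).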
11
GAM Summary

A useful, flexible extension of linear models
The backfitting algorithm is simple and modular
Interpretability of the predictors (input variables) is not obscured
Not suitable for very large data mining applications (why?)

12
CART

Overview
Principle behind it: divide and conquer
Partition the feature space into a set of rectangles
For simplicity, use recursive binary partitions
Fit a simple model (e.g., a constant) in each rectangle
Classification and Regression Trees (CART)
Regression trees
Classification trees
Popular in medical applications

13
CART

An example (regression case)

14
Basic Issues in Tree-based Methods

How to grow a tree?
How large should we grow the tree?

15
Regression Trees

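Partition the space into $M$ regions $R_1, R_2, \ldots, R_M$ and model the response as a constant $c_m$ in each region:
$f(x) = \sum_{m=1}^{M} c_m I(x \in R_m)$, where the sum-of-squares choice is $\hat c_m = \mathrm{ave}(y_i \mid x_i \in R_m)$

Note that this is still an additive model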
16
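Regression Trees: Grow the Tree

The best partition minimizes the sum of squared errors $\sum_i (y_i - f(x_i))^2$
Finding the global minimum is computationally infeasible
Greedy algorithm: at each level, choose the splitting variable $j$ and split point $s$ as
$\min_{j,s}\left[\min_{c_1}\sum_{x_i \in R_1(j,s)}(y_i - c_1)^2 + \min_{c_2}\sum_{x_i \in R_2(j,s)}(y_i - c_2)^2\right]$, where $R_1(j,s) = \{X \mid X_j \le s\}$ and $R_2(j,s) = \{X \mid X_j > s\}$
The greedy algorithm makes the tree unstable
An error made at an upper level is propagated to the lower levels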
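One greedy step of the criterion above can be sketched as an exhaustive search over variables and observed split points. This is a minimal illustration; the function names and the toy data are hypothetical. The inner minimizations over $c_1$ and $c_2$ are solved by the region means.

```python
def sse(values):
    """Sum of squared errors around the mean (the mean is the minimizing constant)."""
    c = sum(values) / len(values)
    return sum((v - c) ** 2 for v in values)


def best_split(X, y):
    """Return (j, s, total SSE) of the best binary split R1 = {X_j <= s}, R2 = {X_j > s}."""
    n, p = len(y), len(X[0])
    best = (None, None, float("inf"))
    for j in range(p):
        # Candidate split points: observed values, except the largest (keeps R2 non-empty)
        for s in sorted(set(row[j] for row in X))[:-1]:
            left = [y[i] for i in range(n) if X[i][j] <= s]
            right = [y[i] for i in range(n) if X[i][j] > s]
            total = sse(left) + sse(right)
            if total < best[2]:
                best = (j, s, total)
    return best


X = [[1, 5], [2, 3], [3, 8], [4, 1], [5, 7], [6, 2]]
y = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
j, s, total = best_split(X, y)
```

Growing a full tree would apply `best_split` recursively to each resulting region, which is exactly where an early poor split propagates downward.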

17
Regression Trees: How large should we grow the tree?

Trade-off between bias and variance
Very large tree: overfits (low bias, high variance)
Small tree: low variance, high bias; might not capture the structure
Strategies
1. Split only when the split decreases the error (usually short-sighted: a seemingly worthless split may lead to a very good split below it)
2. Cost-complexity pruning (preferred)

18
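Regression Trees: Pruning

Cost-complexity pruning
Pruning: collapsing some internal nodes of a large tree $T_0$
Cost complexity: $C_\alpha(T) = \sum_{m=1}^{|T|} N_m Q_m(T) + \alpha|T|$, where $|T|$ is the number of terminal nodes, $N_m = \#\{x_i \in R_m\}$ and $Q_m(T) = \frac{1}{N_m}\sum_{x_i \in R_m}(y_i - \hat c_m)^2$
For each $\alpha$, find the subtree $T_\alpha \subseteq T_0$ minimizing $C_\alpha(T)$ by weakest link pruning (p. 270, HTF):
Successively collapse the internal node that produces the smallest per-node increase in $\sum_m N_m Q_m(T)$
Choose the best tree (i.e., the best $\alpha$) from this tree sequence by cross-validation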
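Weakest link pruning can be sketched on a toy tree. This is a minimal version under assumptions: each node is a dict whose `"sse"` entry is $N_m Q_m$ for that node if it were a leaf, and internal nodes also have `"left"` and `"right"` children. The node format, function names and numbers are hypothetical, and the cross-validation step that picks $\alpha$ is not shown.

```python
import copy


def leaves_and_error(node):
    """(number of terminal nodes, summed leaf SSE) of the subtree rooted at node."""
    if "left" not in node:
        return 1, node["sse"]
    n_left, e_left = leaves_and_error(node["left"])
    n_right, e_right = leaves_and_error(node["right"])
    return n_left + n_right, e_left + e_right


def internal_nodes(node):
    """Yield every non-terminal node of the tree."""
    if "left" in node:
        yield node
        yield from internal_nodes(node["left"])
        yield from internal_nodes(node["right"])


def link_strength(node):
    """Increase in error per removed terminal node if this node is collapsed."""
    n_leaves, subtree_error = leaves_and_error(node)
    return (node["sse"] - subtree_error) / (n_leaves - 1)


def weakest_link_sequence(tree):
    """Return [(alpha, |T|, error)] for the nested sequence of pruned subtrees."""
    tree = copy.deepcopy(tree)                 # leave the caller's tree intact
    sequence = [(0.0,) + leaves_and_error(tree)]
    while "left" in tree:                      # until only the root remains
        weakest = min(internal_nodes(tree), key=link_strength)
        alpha = link_strength(weakest)
        del weakest["left"], weakest["right"]  # collapse it into a leaf
        sequence.append((alpha,) + leaves_and_error(tree))
    return sequence


tree = {"sse": 100.0,
        "left": {"sse": 30.0, "left": {"sse": 5.0}, "right": {"sse": 10.0}},
        "right": {"sse": 20.0}}
for alpha, size, error in weakest_link_sequence(tree):
    print(alpha, size, error)
```

Each printed $\alpha$ is the value at which the corresponding collapse starts to pay off in $C_\alpha(T)$; cross-validation then chooses among these subtrees.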

19
Classification Trees

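Classify the observations in node $m$ to the majority class in the node: $k(m) = \arg\max_k \hat p_{mk}$
$\hat p_{mk} = \frac{1}{N_m}\sum_{x_i \in R_m} I(y_i = k)$ is the proportion of observations of class $k$ in node $m$
Define the impurity of a node as one of:
Misclassification error: $1 - \hat p_{mk(m)}$
Entropy: $-\sum_{k} \hat p_{mk}\log \hat p_{mk}$
Gini index: $\sum_{k} \hat p_{mk}(1 - \hat p_{mk})$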
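The three impurity measures and the majority-class rule translate directly into code. A minimal sketch, assuming the natural logarithm for the entropy and a node represented simply as the list of class labels it contains; the function names are hypothetical.

```python
import math
from collections import Counter


def proportions(labels):
    """p_mk: proportion of each class k among the labels in node m."""
    n = len(labels)
    return {k: c / n for k, c in Counter(labels).items()}


def majority_class(labels):
    p = proportions(labels)
    return max(p, key=p.get)


def misclassification_error(labels):
    return 1.0 - max(proportions(labels).values())


def gini_index(labels):
    return sum(q * (1.0 - q) for q in proportions(labels).values())


def cross_entropy(labels):
    # Only classes present in the node appear, so every q > 0 and log(q) is defined
    return -sum(q * math.log(q) for q in proportions(labels).values())


node = ["spam", "spam", "spam", "email"]
print(majority_class(node), misclassification_error(node),
      gini_index(node), cross_entropy(node))
```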

20
Classification Trees
Node impurity measures versus class proportion for a 2-class problem

Entropy and Gini are more sensitive to changes in the node class proportions than the misclassification rate
To grow the tree, use entropy or Gini
To prune the tree, use the misclassification rate (or any other measure)
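A small two-class example shows what "more sensitive" means. The sketch below uses the (400, 400) example discussed in HTF: one split gives children (300, 100) and (100, 300), the other gives (200, 400) and a pure node (200, 0). The function names are hypothetical, and entropy uses the natural logarithm.

```python
import math


def node_impurities(n1, n2):
    """(misclassification, Gini, entropy) of a node with n1 and n2 observations per class."""
    n = n1 + n2
    p = [n1 / n, n2 / n]
    misclass = 1.0 - max(p)
    gini = sum(q * (1.0 - q) for q in p)
    entropy = -sum(q * math.log(q) for q in p if q > 0)   # 0 * log 0 treated as 0
    return misclass, gini, entropy


def split_impurity(children):
    """Child-node impurities weighted by the fraction of observations in each child."""
    total = sum(n1 + n2 for n1, n2 in children)
    result = [0.0, 0.0, 0.0]
    for n1, n2 in children:
        for i, value in enumerate(node_impurities(n1, n2)):
            result[i] += (n1 + n2) / total * value
    return tuple(result)


split_a = split_impurity([(300, 100), (100, 300)])
split_b = split_impurity([(200, 400), (200, 0)])   # second child is pure
print("misclassification:", split_a[0], split_b[0])
print("Gini:", split_a[1], split_b[1])
print("entropy:", split_a[2], split_b[2])
```

Both splits have misclassification rate 0.25, so that measure cannot tell them apart, while Gini and entropy both prefer the split that produces a pure node.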

21
Tree-based Methods Discussions

Categorical Predictors
Problem: consider splits of subtree $t$ into $t_L$ and $t_R$ based on a categorical predictor $x$ with $q$ possible values: there are $2^{q-1} - 1$ ways!
Solution: treat the categorical predictor as ordered by, say, the proportion of class 1
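The ordering trick reduces $2^{q-1} - 1$ candidate splits to $q - 1$. A minimal sketch, assuming a 0/1 outcome; the function names and the color data are hypothetical. For a two-class outcome this ordering is known to contain the optimal split under Gini or entropy.

```python
from collections import defaultdict


def number_of_binary_partitions(q):
    """Ways to split q unordered categories into two non-empty groups."""
    return 2 ** (q - 1) - 1


def ordered_splits(x, y):
    """Order categories by their proportion of class 1, then return the q - 1
    splits that respect this ordering, as (left categories, right categories)."""
    count, ones = defaultdict(int), defaultdict(int)
    for xi, yi in zip(x, y):
        count[xi] += 1
        ones[xi] += yi                     # y is coded 0/1
    order = sorted(count, key=lambda c: ones[c] / count[c])
    return [(set(order[:k]), set(order[k:])) for k in range(1, len(order))]


x = ["red", "red", "green", "blue", "blue", "green", "yellow"]
y = [1, 0, 0, 1, 1, 0, 1]
print(number_of_binary_partitions(4))     # all possible splits for q = 4
for left, right in ordered_splits(x, y):  # only q - 1 = 3 candidates
    print(sorted(left), sorted(right))
```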

22
Tree-based Methods Discussions

Linear Combination Splits
Split the node based on $\sum_j a_j X_j \le s$
Improves the predictive power
Hurts interpretability
Instability of Trees
Inherited from the hierarchical nature
Bagging (Section 8.7 of HTF) can reduce the variance

23
Bootstrap Trees
Construct B trees from B bootstrap samples: the bootstrap trees
24
Bootstrap Trees
25
Bagging the Bootstrap Trees
$\hat f_{\mathrm{bag}}(x) = \frac{1}{B}\sum_{b=1}^{B} \hat f^{*b}(x)$, where $\hat f^{*b}(x)$ is computed from the $b$-th bootstrap sample, in this case a tree
Bagging reduces the variance of the original tree by aggregation
26
Bagged Tree Performance
Majority vote
Average
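Bagging with the two aggregation rules can be sketched as follows. To keep it short, this assumes depth-1 regression trees (stumps) on a single predictor as the base learner rather than full trees; `fit_stump`, `bag`, `bagged_average`, `bagged_vote`, `B` and `seed` are hypothetical names.

```python
import random
from collections import Counter


def fit_stump(x, y):
    """Depth-1 regression tree on one feature: best threshold s and the two leaf means."""
    best = None
    for s in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= s]
        right = [yi for xi, yi in zip(x, y) if xi > s]
        cl, cr = sum(left) / len(left), sum(right) / len(right)
        err = sum((v - cl) ** 2 for v in left) + sum((v - cr) ** 2 for v in right)
        if best is None or err < best[0]:
            best = (err, s, cl, cr)
    if best is None:                      # all x identical: fall back to a constant
        c = sum(y) / len(y)
        return lambda x0: c
    _, s, cl, cr = best
    return lambda x0: cl if x0 <= s else cr


def bag(x, y, B=50, seed=0):
    """Fit one stump per bootstrap sample (n draws with replacement)."""
    rng = random.Random(seed)
    n = len(x)
    trees = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    return trees


def bagged_average(trees, x0):
    """Regression: average the B predictions."""
    return sum(t(x0) for t in trees) / len(trees)


def bagged_vote(classifiers, x0):
    """Classification: majority vote over the B predicted classes."""
    return Counter(c(x0) for c in classifiers).most_common(1)[0][0]


x = [i / 10 for i in range(20)]
y = [0.0] * 10 + [1.0] * 10
trees = bag(x, y, B=50, seed=1)
print(bagged_average(trees, 0.0), bagged_average(trees, 0.95), bagged_average(trees, 1.9))
```

Near the true change point the individual stumps disagree about the threshold, and averaging them yields a smoother, lower-variance prediction than any single stump.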
27
MARS

In multi-dimensional splines, the number of basis functions grows exponentially: the curse of dimensionality
A partial remedy is a greedy forward search algorithm:
Create a simple basis-construction dictionary
Construct basis functions on the fly
Choose the best-fit basis function at each step
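In the standard MARS formulation (HTF Section 9.4), the dictionary consists of reflected pairs of hinge functions $(X_j - t)_+$ and $(t - X_j)_+$, with a knot $t$ at each observed value $x_{ij}$ of each predictor. The sketch below only builds that candidate collection; the greedy forward step that picks the best-fitting pair is not shown, and the function names are hypothetical.

```python
def reflected_pair(j, t):
    """Hinge functions (x_j - t)_+ and (t - x_j)_+ with knot t on predictor j."""
    return (lambda row: max(row[j] - t, 0.0),
            lambda row: max(t - row[j], 0.0))


def mars_dictionary(X):
    """Candidate collection: one reflected pair per predictor j and observed value x_ij."""
    candidates = []
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            candidates.append((j, t) + reflected_pair(j, t))
    return candidates


X = [[1.0, 10.0], [2.0, 20.0], [3.0, 20.0]]
dictionary = mars_dictionary(X)
print(len(dictionary), "reflected pairs,", 2 * len(dictionary), "basis functions")
j, t, pos, neg = dictionary[1]                 # knot at 2.0 on the first predictor
print(j, t, pos([3.0, 0.0]), neg([3.0, 0.0]))  # (3 - 2)_+ and (2 - 3)_+
```

With distinct values the collection grows only linearly, as $2Np$ functions, which is what makes the greedy search feasible.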

28
Basis functions