1
Identifying Feature Relevance Using a Random Forest
  • Jeremy Rogers, Steve Gunn

2
Overview
  • What is a Random Forest?
  • Why do Relevance Identification?
  • Estimating Feature Importance with a Random
    Forest
  • Node Complexity Compensation
  • Employing Feature Relevance
  • Extension to Feature Selection

3
Random Forest
  • Combination of base learners using Bagging
  • Uses CART-based decision trees
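A minimal sketch of this combination, assuming scikit-learn's
DecisionTreeClassifier as the CART-style base learner and a synthetic
dataset; both are illustrative choices, not details from the slides.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Illustrative dataset; the slides do not specify one.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
rng = np.random.default_rng(0)

trees = []
for _ in range(25):
    # Bagging: fit each tree on a bootstrap sample drawn with replacement.
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Combine the base learners by majority vote.
votes = np.mean([t.predict(X) for t in trees], axis=0)
pred = (votes > 0.5).astype(int)
print("training accuracy:", (pred == y).mean())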

4
Random Forest (cont...)
  • Optimises split using Information Gain
  • Selects a feature at random to perform each split
  • The implicit feature selection of CART is removed
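A sketch of one such split: the feature is drawn uniformly at random and
only the threshold is optimised by information gain. The function names
and the threshold search over unique values are illustrative assumptions.

import numpy as np

def entropy(y):
    # Binary entropy (in bits) of a 0/1 label vector.
    p = np.bincount(y, minlength=2) / len(y)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def information_gain(y, left):
    # IG of partitioning the labels y by the boolean mask `left`.
    n, nl = len(y), left.sum()
    if nl == 0 or nl == n:
        return 0.0
    h_children = (nl / n) * entropy(y[left]) + ((n - nl) / n) * entropy(y[~left])
    return entropy(y) - h_children

def random_feature_split(X, y, rng):
    # Choose the split feature at random (no implicit feature selection),
    # then pick the threshold with the highest information gain.
    j = rng.integers(X.shape[1])
    thresholds = np.unique(X[:, j])[:-1]
    ig, t = max(((information_gain(y, X[:, j] <= t), t) for t in thresholds),
                default=(0.0, None))
    return j, t, ig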

5
Feature Relevance Ranking
  • Analyse features individually
  • Measures of correlation to the target
  • A feature is relevant if P(Y | Xi) ≠ P(Y)

Assumes no feature interaction, so it fails to identify relevant
features in the parity problem, as the sketch below shows.
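In the 2-bit XOR problem below, each feature on its own leaves the class
distribution unchanged, so any univariate correlation measure scores
both features as irrelevant.

import numpy as np
from itertools import product

# 2-bit parity (XOR): individually, each feature tells us nothing about
# the target, so univariate relevance measures fail here.
X = np.array(list(product([0, 1], repeat=2)))
y = X[:, 0] ^ X[:, 1]

for j in range(2):
    # P(y=1 | xj=0), P(y=1 | xj=1) and P(y=1) are all 0.5.
    print(j, y[X[:, j] == 0].mean(), y[X[:, j] == 1].mean(), y.mean())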
6
Feature Relevance Subset Methods
  • Use implicit feature selection of decision tree
    induction
  • Wrapper methods
  • Subset search methods
  • Identifying Markov Blankets
  • A feature is relevant if P(Y | Xi, S) ≠ P(Y | S) for some subset S
    of the remaining features

7
Relevance Identification using Average
Information Gain
  • Can identify feature interaction
  • Reliability is dependent upon node composition
  • Irrelevant features give non-zero relevance
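A rough sketch of relevance scoring by average information gain: it
averages the optimised IG over random (feature, bootstrap) draws as a
stand-in for recording the IG at every node of an actual forest. The
dataset and loop sizes are illustrative assumptions.

import numpy as np
from sklearn.datasets import make_classification

def entropy(y):
    # Binary entropy (bits) of a 0/1 label vector.
    p = np.bincount(y, minlength=2) / len(y)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def best_ig(x, y):
    # Best information gain over split positions along one feature.
    n, h = len(y), entropy(y)
    ys = y[np.argsort(x)]
    return max(h - (k / n) * entropy(ys[:k]) - ((n - k) / n) * entropy(ys[k:])
               for k in range(1, n))

# Illustrative dataset; the slides do not specify one.
X, y = make_classification(n_samples=60, n_features=8, n_informative=3,
                           random_state=0)
rng = np.random.default_rng(0)
totals = np.zeros(X.shape[1])
counts = np.zeros(X.shape[1])
for _ in range(800):
    j = rng.integers(X.shape[1])                # random feature, as in the RF
    idx = rng.integers(0, len(X), size=len(X))  # bootstrap sample
    totals[j] += best_ig(X[idx, j], y[idx])
    counts[j] += 1
print("average IG per feature:", np.round(totals / np.maximum(counts, 1), 3))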

8
Node Complexity Compensation
  • Some nodes are easier to split
  • Requires each sample to be weighted by some
    measure of node complexity
  • Data projected onto a one-dimensional space
  • For binary classification, complexity is based on the number of
    unique arrangements of the class labels

9
Unique and Non-Unique Arrangements
  • Some arrangements are reflections of one another (non-unique)
  • Some arrangements are symmetrical about their centre (unique)
10
Node Complexity Compensation (cont)
Au: the number of unique arrangements
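Au can be computed by brute force from the definitions on the previous
slide: enumerate the arrangements of i positives among n positions and
merge each arrangement with its reflection. The helper below is a
hypothetical illustration, not the slides' formula.

from itertools import combinations

def unique_arrangements(n, i):
    # Count arrangements of i positives among n positions on the 1-D
    # projection, merging each arrangement with its reflection.
    seen = set()
    for pos in combinations(range(n), i):
        s = tuple(1 if k in pos else 0 for k in range(n))
        seen.add(min(s, s[::-1]))
    return len(seen)

# n=4, i=2: six raw arrangements collapse to four unique ones
# (two reflection pairs plus two symmetrical arrangements).
print(unique_arrangements(4, 2))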
11
Information Gain Density Functions
  • Node complexity compensation improves the measure of average IG
  • The effect is visible when examining the IG
    density functions for each feature
  • These are constructed by building a forest and
    recording the frequencies of IG values achieved
    by each feature
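The density functions themselves are just normalised histograms of the
recorded IG values. The sketch below uses synthetic stand-in samples
purely to show the construction; real values would be collected during
forest building as described above.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in IG samples for one relevant and one irrelevant
# feature; real values would be recorded at every split of the forest.
ig_relevant = 0.5 * rng.beta(2, 5, size=5000)
ig_irrelevant = 0.5 * rng.beta(1, 20, size=5000)

# A density function is a normalised histogram of the recorded values.
bins = np.linspace(0.0, 0.5, 26)
for name, vals in (("relevant", ig_relevant), ("irrelevant", ig_irrelevant)):
    density, _ = np.histogram(vals, bins=bins, density=True)
    print(name, np.round(density[:5], 2), "...")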

12
Information Gain Density Functions
  • RF used to construct 500 trees on an artificial
    dataset
  • IG density functions recorded for each feature

13
Employing Feature Relevance
  • Feature Selection
  • Feature Weighting
  • Random Forest uses a Feature Sampling
    distribution to select each feature.
  • The distribution can be altered in two ways (see the sketch after
    this list)
  • Parallel: updated during forest construction
  • Two-stage: fixed prior to forest construction
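As a sketch of the sampling step, assuming hypothetical relevance-derived
weights over five features:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical relevance-derived weights for five features; a uniform
# vector recovers the standard Random Forest behaviour.
weights = np.array([0.30, 0.05, 0.40, 0.05, 0.20])
weights = weights / weights.sum()

# Each split now draws its feature from this distribution
# instead of uniformly at random.
split_features = rng.choice(len(weights), size=10, p=weights)
print(split_features)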

14
Parallel
  • Control update rate using confidence intervals.
  • Assume Information Gain values follow a normal distribution.

The statistic t = (x̄ − μ) / (s / √n) has a Student's t distribution
with n − 1 degrees of freedom.
Maintain the most uniform sampling distribution that lies within the
confidence bounds.
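A sketch of the confidence bound computation with SciPy's Student's t
quantiles; the IG sample values are hypothetical placeholders.

import numpy as np
from scipy import stats

# Hypothetical IG values observed for one feature during construction.
ig = np.array([0.12, 0.08, 0.15, 0.10, 0.09, 0.14, 0.11, 0.13])

n = len(ig)
mean = ig.mean()
se = ig.std(ddof=1) / np.sqrt(n)        # standard error of the mean
t_crit = stats.t.ppf(0.975, df=n - 1)   # two-sided 95% quantile
print(f"mean IG = {mean:.3f}, "
      f"95% CI = ({mean - t_crit * se:.3f}, {mean + t_crit * se:.3f})")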
15
Convergence Rates (figure)
16
Results
  • 90% of data used for training, 10% for testing
  • Forests of 100 trees were tested and averaged
    over 100 trials

17
Irrelevant Features
  • Average IG is the mean of a non-negative sample.
  • The expected IG of an irrelevant feature is therefore non-zero.
  • Performance is degraded when there is a high
    proportion of irrelevant features.
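This non-zero expectation can be checked by Monte Carlo: arrange the
labels at random (mimicking an irrelevant feature's 1-D projection) and
measure the IG of the optimised split. Sizes and trial counts below are
illustrative.

import numpy as np

rng = np.random.default_rng(0)

def entropy(y):
    # Binary entropy (bits) of a 0/1 label vector.
    p = np.bincount(y, minlength=2) / len(y)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def best_split_ig(y):
    # IG of the best split position on a 1-D arrangement of labels.
    n, h = len(y), entropy(y)
    return max(h - (k / n) * entropy(y[:k]) - ((n - k) / n) * entropy(y[k:])
               for k in range(1, n))

# Labels arranged at random mimic an irrelevant feature's projection,
# yet the optimised split still achieves positive IG on average.
n, i = 20, 10
labels = np.array([1] * i + [0] * (n - i))
trials = [best_split_ig(rng.permutation(labels)) for _ in range(2000)]
print("expected IG of an irrelevant feature:",
      round(float(np.mean(trials)), 4))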

18
Expected Information Gain
nL: number of examples in the left descendant
iL: number of positive examples in the left descendant
19
Expected Information Gain
(Equation legend: number of positive examples; number of negative
examples.)
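One way to compute this expectation, assuming the positives are placed
uniformly at random so that iL follows a hypergeometric distribution for
a fixed split position. This assumption is consistent with, but not
stated on, the slides.

from math import comb, log2

def h(i, n):
    # Binary entropy (bits) of i positives among n examples.
    if n == 0 or i == 0 or i == n:
        return 0.0
    p = i / n
    return -p * log2(p) - (1 - p) * log2(1 - p)

def expected_ig(n, i, n_left):
    # Expected IG of a split sending n_left examples to the left child
    # when the i positives are placed at random: iL is hypergeometric.
    total = 0.0
    for i_left in range(max(0, n_left - (n - i)), min(i, n_left) + 1):
        p = comb(i, i_left) * comb(n - i, n_left - i_left) / comb(n, n_left)
        ig = (h(i, n) - (n_left / n) * h(i_left, n_left)
              - ((n - n_left) / n) * h(i - i_left, n - n_left))
        total += p * ig
    return total

print(expected_ig(n=20, i=10, n_left=10))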
20
Bounds on Expected Information Gain
  • The lower bound is given by
  • The upper bound can be approximated as

21
Bounds for Irrelevant Features
  • 100 trees built on an artificial dataset
  • Average IG recorded and bounds calculated

22
Friedman dataset: comparison of FS and CFS (figure)
23
Simple dataset: comparison of FS and CFS (figure)
24
Results
  • 90% of data used for training, 10% for testing
  • Forests of 100 trees were tested and averaged
    over 100 trials
  • 100 trees constructed for feature evaluation in
    each trial

25
Summary
  • Node complexity compensation improves the measure of feature
    relevance by examining node composition
  • The feature sampling distribution can be updated using confidence
    intervals to control the update rate
  • Irrelevant features can be removed by calculating their expected
    performance