Feature Selection on TimeSeries Cab Data - PowerPoint PPT Presentation

1
Feature Selection on Time-Series Cab Data
  • Yingkit (Keith) Chow

2
Contents
  • Introduction
  • Features Considered
  • FCBF (Filter-type feature selection)
  • FCBF-PCA (my variation)
  • Conclusion

3
All Features Considered
  • Features
  • Each time sample consists of the following
    features
  • Day of Week, Time of Day (1st two features)
  • taxis(t, 6:9), taxis(t-1, 6:9), ..., taxis(t-5, 6:9)
  • 6:9 indexes the columns of the matrix taxis that
    hold the counts of cabs entering with meter off,
    entering with meter on, exiting with meter off,
    and exiting with meter on
  • Not all features here will be relevant to
    classifying whether a game is present.
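
The feature layout on this slide can be sketched in Python as follows. This is a minimal illustration, not the author's code: the function name `build_features`, its parameters, and the assumption that the index refers to 1-based columns 6 through 9 of the taxis matrix (0-based positions 5 to 8 here) are all introduced for the example.

```python
def build_features(day_of_week, time_of_day, taxis, n_lags=5, cols=(5, 6, 7, 8)):
    """Build one feature row per time sample t >= n_lags.

    taxis is a list of rows. cols are 0-based column positions, assumed to
    correspond to the slide's 1-based indices 6..9. Each output row holds
    day of week, time of day, then the four cab counts at t, t-1, ..., t-n_lags.
    """
    rows = []
    for t in range(n_lags, len(taxis)):
        row = [day_of_week[t], time_of_day[t]]
        for lag in range(n_lags + 1):
            row.extend(taxis[t - lag][c] for c in cols)
        rows.append(row)
    return rows
```

With the default five lags this gives 2 + 6 x 4 = 26 features per time sample; the first five samples are dropped because they lack a full lag history.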

4
Fast Correlation-Based Filter
  • Algorithm
  • Keep features that are relevant to the class C,
    i.e. SU(i, C) > threshold,
  • where SU is symmetric uncertainty and will be
    described in the next slide
  • Remove redundant features by comparing the features
    that remain after the first step
  • Remove feature j if SU(i, j) > SU(j, C) for some
    more relevant feature i
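
The two steps above can be sketched in Python as follows. This is a minimal sketch that assumes the SU values have already been computed; the function name `fcbf` and the inputs `su_class` and `su_pair` are hypothetical names chosen for the example. It uses the strict inequality written on the slide.

```python
def fcbf(su_class, su_pair, threshold):
    """Return indices of selected features, most relevant first.

    su_class[i]   : SU between feature i and the class C
    su_pair[i][j] : SU between features i and j
    """
    # Step 1: keep relevant features and order them by decreasing SU(i, C).
    candidates = sorted(
        (i for i, s in enumerate(su_class) if s > threshold),
        key=lambda i: su_class[i],
        reverse=True,
    )
    selected = []
    # Step 2: walk from most to least relevant. Feature j is dropped if any
    # already selected (more relevant) feature i has SU(i, j) > SU(j, C).
    for j in candidates:
        if all(su_pair[i][j] <= su_class[j] for i in selected):
            selected.append(j)
    return selected
```

Only features that survive can eliminate others, so a feature that was itself removed as redundant never causes a later removal.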

5
Equations [1]
  • Information Gain (IG)
  • $IG(X \mid Y) = H(X) - H(X \mid Y)$, where H is entropy
  • Symmetric Uncertainty (SU)
  • $SU(X, Y) = 2\, IG(X \mid Y) / (H(X) + H(Y))$
  • SU is used instead of IG because it compensates
    for IG's bias toward features with more values and
    normalizes the result to the range [0, 1] [1]
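
As a concrete illustration of these definitions, the quantities can be computed for discrete sequences with the Python standard library. This is a sketch; the function names are chosen for the example, and returning 0 when both variables are constant is a convention assumed here, not something stated on the slide.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """Shannon entropy H(X) in bits of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def conditional_entropy(xs, ys):
    """H(X|Y) = sum over y of P(y) * H(X | Y = y)."""
    n = len(xs)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def symmetric_uncertainty(xs, ys):
    """SU(X, Y) = 2 * IG(X|Y) / (H(X) + H(Y))."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0  # both variables are constant (assumed convention)
    info_gain = hx - conditional_entropy(xs, ys)
    return 2.0 * info_gain / (hx + hy)
```

Continuous features such as cab counts must be discretized into bins before these functions apply, which is where the number of bins enters.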

6
FCBF
  • Classifier: MATLAB classify (linear)
  • Number of bins: 96
  • Threshold: 0.01
  • Accuracy: 91.9%

7
Choice of Number Bins
  • Num Bins = 96: results shown in the previous slide
    (in the plot, red is the ground truth for games
    and blue is my classification)
  • Num Bins = 20
  • Accuracy: 58.6%
  • Here the algorithm breaks down and selects only
    feature 2, the time of day. The blue trace is
    periodic here: the same time segment is classified
    as a game every day.
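
To make the role of the bin count concrete, here is a minimal Python sketch of equal-width binning. The slides do not say which binning scheme was used, so equal-width bins and the function name `discretize` are assumptions made for illustration.

```python
def discretize(values, num_bins):
    """Map each value to an equal-width bin index in 0..num_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / num_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), num_bins - 1) for v in values]
```

For a time-of-day variable with 96 distinct values, 96 bins keep every value separate, while 20 bins merge neighboring time slots together. The entropies that feed SU are computed on these bin indices, so they change with the bin count.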

8
FCBF - PCA
  • FCBF compares individual features with each other
  • We can use PCA to try to capture a group of
    features. For example, one eigenvector might
    capture the shape of the number of cabs entering
    with meters on just before a game, or the increase
    in the number of cabs entering with meters off
    prior to the end of a game.
  • Example shown in the next slide

9
Cab Traffic Behavior
  • Before Start of Game
  • Cab On Enter and Cab Off Exit are high
  • Towards End of Game
  • Cab Off Enter and Cab On Exit are high

10
FCBF-PCA
  • Classifier: MATLAB classify (linear)
  • Number of bins: 20
  • Threshold: 0.01
  • Accuracy: 92.9%
  • Note that the features here are projections onto
    the eigenvectors, not the original feature
    dimensions
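
The projection step mentioned in the note can be sketched with NumPy as follows. This is an illustrative sketch, not the author's pipeline: the function name `pca_project` and the use of an SVD of the centered data are assumptions, and the slides do not say how many components were kept.

```python
import numpy as np


def pca_project(X, n_components):
    """Project the rows of X onto the top principal components.

    Returns (scores, components). Each row of components is an eigenvector
    of the covariance matrix, ordered by decreasing variance.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each feature
    # The rows of Vt are the eigenvectors of the covariance matrix of Xc.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T  # coordinates in the eigenvector basis
    return scores, components
```

Under this reading, the columns of `scores` would take the place of the original features: each is discretized and passed through the relevance and redundancy tests.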

11
Conclusions
  • The choice of the number of bins has an enormous
    impact on performance (possibly due to the 96
    discrete values of the time-of-day variable)
  • FCBF-PCA was less susceptible to the choice of
    numBins (10, 20, and 100 bins all resulted in
    approximately 91% accuracy)

12
Future Work
  • Currently using labels of game or not game.
  • I'll try to make it work for detecting the first
    sample of a game, with another classifier to detect
    the last sample of a game, since the mid-game
    generally has entirely different characteristics
    from the beginning and end of a game. However, I
    might be limited by the number of samples.
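
One way the start and end targets could be derived from the existing game or not-game labels is sketched below in Python. The function name `game_boundaries` and the 0/1 label encoding are assumptions made for the example.

```python
def game_boundaries(labels):
    """Given 0/1 game labels per time sample, mark first and last game samples."""
    n = len(labels)
    first = [0] * n
    last = [0] * n
    for t, lab in enumerate(labels):
        if not lab:
            continue
        # A game sample with no game sample before it starts a game.
        if t == 0 or not labels[t - 1]:
            first[t] = 1
        # A game sample with no game sample after it ends a game.
        if t == n - 1 or not labels[t + 1]:
            last[t] = 1
    return first, last
```

Each game contributes only one positive sample to each target, which illustrates the concern about being limited by the number of samples.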

13
Questions
  • I'm not currently in NYC, so please send questions
    or comments to
  • yingkit.chow_at_gmail.com

14
Citations
  • [1] Lei Yu and Huan Liu, "Feature Selection for
    High-Dimensional Data: A Fast Correlation-Based
    Filter Solution", ICML (2003)
  • [2] Lei Yu and Huan Liu, "Efficient Feature
    Selection via Analysis of Relevance and
    Redundancy", Journal of Machine Learning Research 5
    (2004)