OBSTree Analysis of Handwritten Digits

1
OBSTree Analysis of Handwritten Digits
  • Using OBSTree to identify handwritten digits
    collected for zip code recognition.
  • Atina Dunlap Brooks (adbrook2_at_stat.ncsu.edu)
  • Jacqueline Hughes-Oliver (hughesol_at_stat.ncsu.edu)
  • North Carolina State University

2
Outline
  • OBSTree method
  • Zip code dataset
  • Results
  • Interpretation
  • Completeness penalty

3
OBSTree: Optimal Bit String Tree
  • Classification tree
  • Branches on sets of explanatory variables,
    not individual explanatory variables
  • Computationally intensive
  • Split search
      • Stochastic search for explanatory variables
      • Exhaustive search for values (2^m)
      • Exhaustive test to trim variable set (2^m)
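
The exhaustive value search can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the set of m variables has already been chosen (the stochastic part is omitted), and the scoring rule (purity first, then number of matches) and the name `best_bit_string` are hypothetical.

```python
from collections import Counter
from itertools import product

def best_bit_string(X, y, variables):
    """Try all 2^m value patterns for the chosen variables; keep the best one."""
    best = None
    for pattern in product((0, 1), repeat=len(variables)):
        # Responses of the rows whose chosen variables equal this bit string.
        matched = [label for row, label in zip(X, y)
                   if all(row[v] == bit for v, bit in zip(variables, pattern))]
        if not matched:
            continue
        label, count = Counter(matched).most_common(1)[0]
        score = (count / len(matched), count)  # assumed criterion: purity, then size
        if best is None or score > best[0]:
            best = (score, pattern, label, len(matched))
    return best

# Toy data (hypothetical): class "a" has variables 0 and 1 both set.
X = [[1, 1, 0], [1, 1, 1], [1, 1, 0],
     [1, 0, 1], [0, 1, 0], [0, 0, 1]]
y = ["a", "a", "a", "b", "b", "b"]

score, pattern, label, n_matched = best_bit_string(X, y, variables=[0, 1])
print(pattern, label, n_matched)  # (1, 1) a 3
```

With m variables the loop visits 2^m patterns, which is why the search is only feasible for small variable sets.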

4
Example
  • 1 responses
      • X1=1, X2=1
  • 2 responses
      • X3=1, X4=1
  • 0 responses
      • unstructured

5
Traditional Trees
  • 1s found
      • X1=1, X2=1
  • 2s NOT found
      • confounded
  • Tree (class counts at each node)
      • Root: 6 1s, 4 2s, 6 0s
          • X1=1: 6 1s, 2 2s, 3 0s
              • X2=1: 6 1s
              • X2=0: 3 0s, 2 2s
          • X1=0: 3 0s, 2 2s

6
OBSTree
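  • 1s found
      • X1=1, X2=1
  • 2s found
      • X3=1, X4=1
  • Tree (class counts at each node)
      • Root: 6 1s, 4 2s, 6 0s
          • X1=1, X2=1: 6 1s
          • Otherwise: 4 2s, 6 0s
              • X3=1, X4=1: 4 2s
              • Otherwise: 6 0s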
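
The contrast between the two trees can be reproduced on a small data set. The 16 rows below are hypothetical, constructed only so that the class counts match the slide (6 ones, 4 twos, 6 zeros); the helper names `split` and `counts` are illustrative.

```python
from collections import Counter

# Hypothetical rows: (x1, x2, x3, x4, response).
rows = (
    [(1, 1, 0, 0, 1)] * 6      # 1s: X1=1 and X2=1
    + [(1, 0, 1, 1, 2)] * 2    # 2s: X3=1 and X4=1, X1 and X2 vary
    + [(0, 1, 1, 1, 2)]
    + [(0, 0, 1, 1, 2)]
    + [(1, 0, 0, 1, 0)] * 3    # 0s: neither pair is fully set
    + [(0, 1, 1, 0, 0)] * 2
    + [(0, 0, 0, 0, 0)]
)

def split(data, rule):
    """Partition rows by whether every (index, value) pair in the rule matches."""
    hit = [r for r in data if all(r[i] == v for i, v in rule)]
    miss = [r for r in data if not all(r[i] == v for i, v in rule)]
    return hit, miss

def counts(data):
    """Class counts keyed by response value."""
    return dict(Counter(r[-1] for r in data))

# Set-based splits: each branch tests two variables at once.
branch1, rest = split(rows, [(0, 1), (1, 1)])   # X1=1, X2=1
branch2, rest = split(rest, [(2, 1), (3, 1)])   # X3=1, X4=1
print(counts(branch1), counts(branch2), counts(rest))

# Single-variable split on X1: the 2s end up on both sides.
x1_yes, x1_no = split(rows, [(0, 1)])
print(counts(x1_yes), counts(x1_no))
```

The set-based splits isolate the 1s and then the 2s, while the split on X1 alone leaves two 2s on each side.
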
7
Algorithm Modifications
  • Originally developed for finding QSARs
    (quantitative structure-activity relationships)
  • For non-QSAR
  • C code (speed)
  • Balanced multi-class
  • Starting point
  • Tie breakers
  • Depth selection
  • Penalty function

8
USPS Zip Code Dataset
  • 256 covariates (16x16)
  • Training: 7291 observations
  • 10 responses
      0: 1194    5: 556
      1: 1005    6: 664
      2:  731    7: 645
      3:  658    8: 542
      4:  652    9: 644
  • Test: 2007 observations
  • 10 responses
      0: 359     5: 160
      1: 264     6: 170
      2: 198     7: 147
      3: 166     8: 166
      4: 200     9: 177
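
As a quick consistency check, the per-digit counts can be summed and turned into class shares. The dictionary names are illustrative; the numbers are the ones listed on the slide.

```python
train = {0: 1194, 1: 1005, 2: 731, 3: 658, 4: 652,
         5: 556, 6: 664, 7: 645, 8: 542, 9: 644}
test = {0: 359, 1: 264, 2: 198, 3: 166, 4: 200,
        5: 160, 6: 170, 7: 147, 8: 166, 9: 177}

print(sum(train.values()), sum(test.values()))  # 7291 2007

# Share of each digit in the training set, largest first.
n_train = sum(train.values())
for digit, n in sorted(train.items(), key=lambda kv: -kv[1]):
    print(digit, f"{100 * n / n_train:.1f}%")
```

The classes are not equally frequent: 0 and 1 together make up roughly 30% of the training set.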

9
Binary Conversion
  • OBSTree requires binary variables
  • Grayscale values in [-1, 1]
  • Converted to 0/1
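
A minimal sketch of such a conversion is shown below. The slide does not state the cutoff, so the threshold of 0.0 and the function name `binarize` are assumptions.

```python
def binarize(pixels, threshold=0.0):
    """Map grayscale values in [-1, 1] to 0/1 (threshold is an assumption)."""
    return [1 if p > threshold else 0 for p in pixels]

sample = [-1.0, -0.35, 0.0, 0.2, 1.0]
print(binarize(sample))  # [0, 0, 0, 1, 1]
```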

10
Training Branches 1-3
  • Branch 1: Present 40,72,168,216; Absent 7,27,117,124,171,230
      • 806 1s
  • Branch 2: Present 59,116,230; Absent 105,121,137,169,193
      • 557 0s
  • Branch 3: Present 22,24,26; Absent 88,105,116,136,198,201,212,230
      • 315 7s
  • 160 more branches

11
Branch 1
  • 806 1s
  • Present: 40,72,168,216
  • Absent: 7,27,117,124,171,230
  • Examples
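
A branch rule of this kind can be checked against a binarized image as sketched below. The function name `matches` is hypothetical, and treating the listed numbers as 1-based positions in the flattened 16x16 image is an assumption about the slide's numbering.

```python
def matches(pixels, present, absent):
    """True if every `present` pixel is 1 and every `absent` pixel is 0.

    Indices are taken as 1-based positions in the flattened image (assumed).
    """
    return (all(pixels[i - 1] == 1 for i in present)
            and all(pixels[i - 1] == 0 for i in absent))

PRESENT = [40, 72, 168, 216]
ABSENT = [7, 27, 117, 124, 171, 230]

# Hypothetical image: only the "present" pixels are switched on.
image = [0] * 256
for i in PRESENT:
    image[i - 1] = 1
print(matches(image, PRESENT, ABSENT))   # True

image[117 - 1] = 1                       # switch on one "absent" pixel
print(matches(image, PRESENT, ABSENT))   # False
```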

12
Why So Many Branches?
  • Examples from Branch 6
  • Examples from Branch 17

13
Training Confusion Matrix
  • Misclassified: 17 (0.23%)
  • Depth: 163 branches

14
Test Confusion Matrix
  • Misclassified: 302 (15.05%)
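
The reported percentages follow directly from the counts and the data set sizes (7291 training, 2007 test observations). The sketch below also shows how the same rate is read off a confusion matrix; the 3x3 matrix is hypothetical, since the matrices themselves are not in the transcript.

```python
def error_rate(misclassified, total):
    """Misclassification rate as a percentage."""
    return 100.0 * misclassified / total

def error_from_confusion(matrix):
    """Percentage of off-diagonal entries in a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return error_rate(total - correct, total)

print(f"{error_rate(17, 7291):.2f}")    # 0.23  (training)
print(f"{error_rate(302, 2007):.2f}")   # 15.05 (test)

# Hypothetical 3-class confusion matrix: rows = truth, columns = prediction.
toy = [[5, 1, 0],
       [0, 4, 1],
       [1, 0, 6]]
print(f"{error_from_confusion(toy):.2f}")  # 16.67
```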

15
Method Comparison
  • Human [1]: 2.5%
  • CART [2]: 17%
  • C4.5 [3]: 16%
  • OBSTree: 15.1%
  • Random Forest [4]: 6.5%
  • Neural Net [5]: 5.1%
  • SVM [6]: 4.2%
  • Note: first 149 nodes are pure

16
Completeness Penalty
  • Entropy
  • n: number of observations in node
  • n_k: number of observations of class k in node
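
The penalty formula itself is not in the transcript, so only its entropy ingredient is sketched here, using the standard node entropy computed from n and n_k. The base-2 logarithm and the function name `node_entropy` are assumptions.

```python
from math import log2

def node_entropy(class_counts):
    """Entropy of a node from its per-class counts n_k, with n = sum of n_k."""
    n = sum(class_counts)
    # (n_k / n) * log2(n / n_k) equals -(n_k / n) * log2(n_k / n).
    return sum((nk / n) * log2(n / nk) for nk in class_counts if nk > 0)

print(node_entropy([806]))       # pure node: 0.0
print(node_entropy([8, 8]))      # evenly mixed two-class node: 1.0
print(node_entropy([916, 1]))    # nearly pure node: small positive value
```

A pure node has zero entropy, and a node with a single stray observation, such as 916 1s and 1 4, stays close to zero.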

17
Branches 1-3: Completeness Penalty (ct = 0.75)
  • Present 40,72,168,216; Absent 22,27,117,124,171,230
      • 916 1s, 1 4 (before penalty: 806 1s)
  • Present 24; Absent 72,88,103,105,119,136,149,151,198,213,220,229
      • 454 7s, 2 9s (before penalty: 315 7s)
  • Present 76,116,230; Absent 105,121,136,169
      • 574 0s (before penalty: 557 0s)
  • 147 more branches

18
Branch 1: Completeness Penalty (ct = 0.75)
  • 916 1s, 1 4
  • Present: 40,72,168,216
  • Absent descriptors: 22,27,117,124,171,230
  • Examples

19
4 Misclassified as 1
20
Training Confusion Matrix - Completeness Penalty
  • Misclassified: 25 (0.34%)
  • Depth: 150 branches

21
Test Confusion Matrix - Completeness Penalty
  • Misclassified: 286 (14.25%)

22
Method Comparison
  • Human [1]: 2.5%
  • CART [2]: 17%
  • C4.5 [3]: 16%
  • OBSTree: 15.1%
  • OBSTree with penalty: 14.3%
  • Random Forest [4]: 6.5%
  • Neural Net [5]: 5.1%
  • SVM [6]: 4.2%

23
Acknowledgements
  • The authors wish to acknowledge the work and
    discussions with Ke Zhang, Stan Young, Jaijun
    Liu, and Haojun Ouyang from NC State University
    who were invaluable in performing this research.

24
Thank You