Learning CRFs with Hierarchical Features: An Application to Go
1
Learning CRFs with Hierarchical Features: An Application to Go
  • Scott Sanner
  • Thore Graepel
  • Ralf Herbrich
  • Tom Minka

University of Toronto / Microsoft Research
(with thanks to David Stern and Mykel Kochenderfer)
2
The Game of Go
  • Started about 4000 years ago in ancient China
  • About 60 million players worldwide
  • 2 players: Black and White
  • Board: 19x19 grid
  • Rules:
  • Turn: one stone is placed on a vertex
  • Capture: stones with no liberties are removed
  • Aim: gather territory by surrounding it

3
Territory Prediction
  • Goal: predict the territory distribution given the board position
  • How to predict territory?
  • Could use simulated play:
  • Monte Carlo averaging is an excellent estimator
  • Avg. 180 moves/turn, >100 moves/game → costly (a rollout sketch follows this list)
  • We learn to directly predict territory
  • Learn from expert data
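As a point of reference for the Monte Carlo baseline mentioned above, here is a minimal Python sketch of territory estimation by averaging random playouts. The board interface (vertices, copy, legal_moves, play, is_over, final_ownership) is a hypothetical one assumed for illustration, not the authors' API.

    import random

    def estimate_territory(board, playouts=100):
        """Average final ownership over random playouts.

        Assumes a hypothetical board object with vertices(), copy(),
        legal_moves(player), play(move, player), is_over(), and
        final_ownership() -> {vertex: +1 for Black, -1 for White}.
        """
        totals = {v: 0.0 for v in board.vertices()}
        for _ in range(playouts):
            sim, player = board.copy(), "black"
            while not sim.is_over():
                moves = sim.legal_moves(player)
                if not moves:
                    break
                sim.play(random.choice(moves), player)
                player = "white" if player == "black" else "black"
            for v, owner in sim.final_ownership().items():
                totals[v] += owner
        # Mean ownership in [-1, +1] approximates net territory per vertex.
        return {v: s / playouts for v, s in totals.items()}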

4
Talk Outline
  • Hierarchical pattern features
  • Independent pattern-based classifiers
  • Best way to combine features?
  • CRF models
  • Coupling factor model (w/ patterns)
  • Best training / inference approximation to circumvent intractability?
  • Evaluation and Conclusions

5
Hierarchical Patterns
  • Centered on a single position
  • Exact configuration of stones
  • Fixed match region (template)
  • 8 nested templates
  • 3.8 million patterns mined (a lookup sketch follows this list)
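A rough Python sketch of how nested template lookups around a vertex could work; the diamond-shaped templates, the edge marker, and the key encoding are illustrative assumptions, not the exact 8 templates or the mining pipeline from the paper.

    # Nested templates of increasing radius around the center vertex
    # (illustrative shapes only).
    TEMPLATES = [
        [(dx, dy) for dx in range(-r, r + 1) for dy in range(-r, r + 1)
         if abs(dx) + abs(dy) <= r]
        for r in range(1, 9)          # 8 nested sizes
    ]

    def point_state(stones, x, y, size=19):
        """'B'/'W' for stones, '.' for empty, 'E' for off-board points."""
        if not (0 <= x < size and 0 <= y < size):
            return 'E'
        return stones.get((x, y), '.')

    def pattern_keys(stones, x, y):
        """One hashable key per nested template, centered on (x, y).

        Each key can be looked up in a table of patterns mined from
        expert games to retrieve that pattern's territory statistics.
        """
        keys = []
        for offsets in TEMPLATES:
            config = tuple(point_state(stones, x + dx, y + dy)
                           for dx, dy in offsets)
            keys.append((len(offsets), config))
        return keys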

6
Models
(a) Independent pattern-based classifiers
(b) CRF
(c) Pattern CRF (a factorization sketch follows below)
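To fix notation for (b) and (c): a generic grid CRF over territory labels y given board position x factorizes into unary and coupling terms as below. The symbols are a sketch of the model families named above, not the paper's exact parameterization.

    p(y \mid x) = \frac{1}{Z(x)}
        \prod_{i \in V} \psi_i(y_i, x)
        \prod_{(i,j) \in E} \phi_{ij}(y_i, y_j, x)

Here V is the set of board vertices, E the set of grid edges, \psi_i a unary factor driven by the patterns matched at vertex i, and \phi_{ij} a coupling factor between neighboring vertices; the independent model (a) keeps only the unary factors.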
7
Independent Pattern-based Classifiers
8
Inference and Training
  • Up to 8 pattern sizes may match at any vertex
  • Which pattern to use?
  • Smallest pattern
  • Largest pattern
  • Or, combine all patterns:
  • Logistic regression (see the sketch after this list)
  • Bayesian model averaging
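One plausible reading of the logistic-regression option, sketched in Python: each matching template size contributes a log-odds feature, and a learned weight per size combines them into a single territory probability. The feature encoding and names are assumptions for illustration.

    import math

    def combine_patterns(pattern_logits, weights, bias=0.0):
        """Logistic-regression combination of per-template predictions.

        pattern_logits[k] is the log-odds of Black territory from the
        size-k pattern classifier, or None if no size-k pattern matched;
        weights[k] is a learned weight for that template size.
        """
        z = bias
        for k, logit in enumerate(pattern_logits):
            if logit is not None:          # skip sizes with no matching pattern
                z += weights[k] * logit
        return 1.0 / (1.0 + math.exp(-z))  # P(vertex is Black territory)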

9
Bayesian Model Averaging
  • Bayesian approach to combining models
  • Now examine the model weight
  • Each model must apply to all of the data! (the standard decomposition is written out after this list)
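Written out in generic notation (not the slide's), Bayesian model averaging predicts by weighting each candidate model M_i by its posterior given the expert data D:

    p(y \mid x, D) = \sum_i p(y \mid x, M_i)\, p(M_i \mid D),
    \qquad
    p(M_i \mid D) \propto p(D \mid M_i)\, p(M_i)

The second factor is the model weight examined above; because p(D | M_i) is a likelihood over all of the data, every candidate model must be able to score every training example, which motivates the tree models on the next slide.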

10
Hierarchical Tree Models
  • Arrange patterns into decision trees T_i
  • Model T_i provides predictions on all data

11
CRF & Pattern CRF
12
Inference and Training
  • Inference
  • Exact is slow for 19x19 grids
  • Loopy BP is faster
  • but biased
  • Sampling is unbiased
  • but slower than Loopy BP
  • Training
  • Max likelihood requires inference!
  • Other approximate methods

13
Pseudolikelihood
  • Standard log-likelihood
  • Edge-based pseudo log-likelihood (both objectives are written out after this list)
  • Then inference during training is purely local
  • Long-range effects are captured in the data
  • Note: only valid for training in the presence of fully labeled data
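For concreteness, the two objectives can be written in generic CRF notation (not copied from the slides), with parameters \theta, factor features f_a, and a fully labeled board (x, y):

    \ell(\theta) = \log p_\theta(y \mid x)
                 = \sum_a \theta \cdot f_a(y_a, x) - \log Z_\theta(x)

    \ell_{\mathrm{PL}}(\theta)
        = \sum_{(i,j) \in E} \log p_\theta\big(y_i, y_j \mid y_{V \setminus \{i,j\}}, x\big)

Each edge term only requires normalizing over the joint settings of (y_i, y_j), so no global inference is needed during training; the conditioning on observed neighbors is also why the objective is only defined when the training labels are complete.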

14
Local Training
  • Piecewise (the local objective is written out after this list)
  • Shared Unary Piecewise
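As a reminder of what piecewise training optimizes, here is the usual locally normalized objective in generic notation (not the slides'): each factor a is trained as if it were its own independently normalized model.

    \ell_{\mathrm{PW}}(\theta)
        = \sum_a \Big[ \theta \cdot f_a(y_a, x)
          - \log \sum_{y'_a} \exp\big( \theta \cdot f_a(y'_a, x) \big) \Big]

In the shared-unary variant, roughly, each edge's piece also includes the unary factors at its endpoints, so unary evidence is shared across pieces rather than trained in isolation.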

15
Evaluation
16
Models & Algorithms
  • Model & algorithm specification:
  • Model / Training (/ Inference, if not obvious)
  • Models & algorithms evaluated:
  • Indep / Smallest, Largest Pattern
  • Indep / BMA-Tree Uniform, Exp
  • Indep / Log Regr
  • CRF / ML Loopy BP (/ Swendsen-Wang)
  • Pattern CRF / Pseudolikelihood (Edge)
  • Pattern CRF / (S. U.) Piecewise
  • Monte Carlo

17
Training Time
  • Approximate time for various models and
    algorithms to reach convergence

18
Inference Time
  • Average time to evaluate a territory prediction for various models and algorithms on a 19x19 board

19
Performance Metrics
  • Vertex Error (classification error)
  • Net Error (score error)
  • Log Likelihood (model fit); a computation sketch of all three metrics follows this list
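A small Python sketch of how the three metrics could be computed for one board; the exact definitions (in particular Net Error and the per-vertex likelihood) are one reading of the slide, not the paper's formulas.

    import math

    def metrics(pred, truth):
        """Territory-prediction metrics for one board.

        pred[v]  = predicted P(vertex v is Black territory),
        truth[v] = 1 if v is Black territory else 0.
        """
        n = len(truth)
        # Vertex Error: fraction of vertices whose most probable label is wrong.
        vertex_error = sum((pred[v] >= 0.5) != bool(truth[v]) for v in truth) / n
        # Net Error: error in the predicted net score (Black minus White territory).
        predicted_net = sum(2 * pred[v] - 1 for v in truth)
        true_net = sum(2 * truth[v] - 1 for v in truth)
        net_error = abs(predicted_net - true_net)
        # Log Likelihood: average log probability assigned to the true labels.
        eps = 1e-12
        log_lik = sum(math.log(max(pred[v] if truth[v] else 1 - pred[v], eps))
                      for v in truth) / n
        return vertex_error, net_error, log_lik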

20
Performance Tradeoffs I
21
Why is Vertex Error better for CRFs?
  • Coupling factors help realize stable
    configurations
  • Compare previous unary-only independent model to
    unary and coupling model
  • Independent models make inconsistent predictions
  • Loopy BP smoothes these predictions (but too
    much?)

(Figure panels: BMA-Tree Model vs. Coupling Model with Loopy BP)
22
Why is Net Error worse for CRFs?
  • Use sampling to examine bias of Loopy BP
  • Unbiased inference in the limit
  • Can run over all test data but still too costly
    for training
  • Smoothing gets rid of local inconsistencies
  • But errors reinforce each other!

(Figure panels: Loopy Belief Propagation vs. Swendsen-Wang Sampling)
23
Bias of Local Training
  • Problems with Piecewise training
  • Very biased when used in conjunction with Loopy
    BP
  • Predictions good (low Vertex Error), just
    saturated
  • Accounts for poor Log Likelihood & Net Error

(Figure panels: ML Trained vs. Piecewise Trained)
24
Performance Tradeoffs II
25
Conclusions
  • Two general messages
  • (1) CRFs vs. Independent Models
  • Pattern CRFs should theoretically be better
  • However, time cost is high
  • Can save time with approximate training /
    inference
  • But then CRFs may perform worse than independent classifiers, depending on the metric
  • (2) For Independent Models
  • Problem of choosing appropriate neighborhood can
    be finessed by Bayesian model averaging

26
Thank you! Questions?