Transcript and Presenter's Notes

Title: AAAI Presentation on "ELR" -- 2002


Slide 1
Structure Extension to Logistic Regression: Discriminative Parameter Learning of Belief Net Classifiers
Russell Greiner (University of Alberta) and Wei Zhou (University of Waterloo)
greiner@cs.ualberta.ca
  • If the goal is
  • Generative (learn distribution):
  • B(ML) = argmax_B (1/|S|) Σ_i ln P_B(c_i, e_i)
  • Discriminative (learn classifier):
  • B* = argmin_B err(B) = argmin_B Σ_i δ(c_i ≠ h_B(e_i))
  • ≈ B(MCL) = argmax_B (1/|S|) Σ_i ln P_B(c_i | e_i)
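
A minimal sketch of the two objectives on a toy naive Bayes (the parameter values, dataset, and helper names below are illustrative assumptions, not from the paper):

    import math

    # Toy naive Bayes over a binary class c and two binary attributes;
    # all numbers here are made up for illustration.
    theta = {"pc": 0.6,                        # P(c=1)
             "pe": {0: [0.2, 0.7],             # pe[c][j] = P(ej=1 | c)
                    1: [0.8, 0.4]}}

    def joint(c, e, th):
        # P_B(c, e) under the naive Bayes factorization
        p = th["pc"] if c == 1 else 1 - th["pc"]
        for j, ej in enumerate(e):
            q = th["pe"][c][j]
            p *= q if ej == 1 else 1 - q
        return p

    def cond(c, e, th):
        # P_B(c | e) = P_B(c, e) / sum_c' P_B(c', e)
        return joint(c, e, th) / (joint(0, e, th) + joint(1, e, th))

    sample = [(1, (1, 0)), (0, (0, 1)), (1, (1, 1))]   # (c_i, e_i) pairs

    # Generative score: average log-likelihood, maximized by B(ML).
    ll = sum(math.log(joint(c, e, theta)) for c, e in sample) / len(sample)
    # Discriminative score: average log conditional likelihood, maximized by B(MCL).
    lcl = sum(math.log(cond(c, e, theta)) for c, e in sample) / len(sample)
    print(ll, lcl)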

[Figure: the true distribution P(c, e) over class C and attributes E1, E2, E3, ..., En, beside a learned belief net B with its conditional probability tables]

Belief Net B = ⟨V, A, Θ⟩
  • Nodes V (Variables)
  • Arcs A (Dependencies)
  • Parameters Θ (Conditional probabilities)

Learner's task, ideally:
  • Distribution: minimize KL(truth, B)
  • Classifier: performer h_B(e) should match truth on each ⟨c, e⟩;
    minimize err(B) = Σ_⟨c,e⟩ P(c, e) · δ(c ≠ h_B(e))
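
A minimal sketch of these two evaluation criteria (the "truth" and model tables below are made-up numbers for illustration):

    import math

    # Hypothetical "truth" and model B as joint distributions over (c, e),
    # with one binary attribute e; all numbers are invented for this example.
    truth = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
    model = {(0, 0): 0.35, (0, 1): 0.15, (1, 0): 0.25, (1, 1): 0.25}

    # Distribution view: KL(truth, B) = sum_x truth(x) ln(truth(x) / B(x)).
    kl = sum(p * math.log(p / model[x]) for x, p in truth.items())

    # Classifier view: h_B(e) = argmax_c B(c, e); err(B) is the truth's
    # probability mass on instances that h_B misclassifies.
    def h_B(e):
        return max((0, 1), key=lambda c: model[(c, e)])

    err = sum(p for (c, e), p in truth.items() if h_B(e) != c)
    print(kl, err)
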
  • Computational Complexity
  • NP-hard to find the values Θ that optimize LCL(Θ),
    even restricted to All_γ⟨G, Θ⟩ for γ = O(1/N)
  • Sample Complexity
  • Given structure G = ⟨V, A⟩, let Θ*_{G,γ} be the LCL-optimal
    parameters in All_γ⟨G, Θ⟩
  • For any ε, δ > 0, let Θ̃ be the parameters that optimize LCL
    for a sample S of sufficient size (the exact bound, polynomial
    in the number of parameters K, 1/ε, and ln(1/δ), is in the paper)
  • Then, with probability at least 1-δ, LCL(Θ̃) is within ε of
    LCL(Θ*_{G,γ})
  • Our specific task
  • Given
  • Structure (nodes, arcs; not parameters)
  • Labeled data sample
  • Find parameters Θ that maximize LCL(Θ)

All_γ⟨G, Θ⟩ ≡ { Θ ∈ ParamFor(G) | ∀ θ_{d|f} ∈ Θ, θ_{d|f} ≥ γ }


Proof:
Θ*_{G,γ} = argmax { LCL(Θ) : Θ ∈ All_γ⟨G, Θ⟩ }
[Figure: proof construction with nodes X1, X2, ..., XN, C1, ..., CR, and D2, D3, ..., DK; G has ≤ K parameters over |V| = N variables]
  • Notes
  • Similar bounds when dealing with err(Θ), as with LCL(Θ)
  • [Dasgupta, 1997] proves a bound on the number of complete
    tuples sufficient wrt likelihood
  • Same O(.) as our bound, ignoring ln²(.) and ln³(.) terms
  • The γ is unavoidable here (unlike the likelihood case [ATW91])

[Figure: naive Bayes structure, class C with arcs to attributes E1, E2, ..., Ek. OFE example: 3 of 5 records have C=1, so θ_{C=1} = 3/5]
  • Other Algorithms
  • When given complete data
  • Compare to OFE (Observed Frequency Estimate)
  • Trivial algorithm; maximizes likelihood
  • When given incomplete data
  • EM (Expectation Maximization)
  • APN [BKRK97]: hillclimb in (unconditional) likelihood
  • Relation to Logistic Regression
  • ELR on Naïve Bayes structure ≡ standard Logistic Regression
  • ELR deals with arbitrary structures, incomplete data
  • How to HillClimb?
  • Not by changing θ_{d|f} directly, as constraints
  • a. θ_{d|f} ≥ 0
  • b. Σ_d θ_{d|f} = 1
  • So use softmax terms β_{d|f}: climb along the β_{d|f}s!
    (see the sketch after this list)
  • Need derivative ∂LCL/∂β_{d|f}
  • Optimizations
  • Initialize using OFE values (not random) to plug in parameters
  • Line-search, conjugate gradient ([Minka, 2001] confirms these
    effective for Logistic Regression)
  • Derivative = 0 when D and F are d-separated from E and C,
    and so can be ignored!
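A sketch of the softmax trick and the hill-climb (assumptions: a naive Bayes with binary class and attributes; a numerical central-difference gradient stands in for the paper's closed-form derivative, and line search / conjugate gradient are omitted):

    import math

    # Softmax reparameterization: unconstrained betas always yield a valid
    # CPT row, so hill-climbing needs no explicit constraint handling:
    # (a) theta >= 0 and (b) each row sums to 1 hold by construction.
    def softmax(betas):
        m = max(betas)                           # shift for numerical stability
        exps = [math.exp(b - m) for b in betas]
        z = sum(exps)
        return [x / z for x in exps]

    # Log conditional likelihood for a naive Bayes with binary class c and
    # binary attributes; beta["c"] parameterizes P(c), beta["e"][c][j]
    # parameterizes P(ej | c). All names here are illustrative.
    def lcl(beta, sample):
        pc = softmax(beta["c"])
        total = 0.0
        for c, e in sample:
            joint = [pc[0], pc[1]]
            for k in (0, 1):
                for j, ej in enumerate(e):
                    joint[k] *= softmax(beta["e"][k][j])[ej]
            total += math.log(joint[c] / (joint[0] + joint[1]))
        return total

    # Hill-climb every beta via a numerical central-difference gradient.
    # (The paper instead derives the gradient in closed form and speeds
    # the climb up with line search / conjugate gradient.)
    def climb(beta, sample, lr=0.5, steps=200, h=1e-5):
        rows = [beta["c"]] + [row for k in (0, 1) for row in beta["e"][k]]
        for _ in range(steps):
            for row in rows:
                for i in range(len(row)):
                    row[i] += h
                    up = lcl(beta, sample)
                    row[i] -= 2 * h
                    down = lcl(beta, sample)
                    row[i] += h                  # restore original value
                    row[i] += lr * (up - down) / (2 * h)

    sample = [(1, (1, 0)), (0, (0, 1)), (1, (1, 1)), (0, (0, 0))]
    beta = {"c": [0.0, 0.0], "e": {k: [[0.0, 0.0], [0.0, 0.0]] for k in (0, 1)}}
    print("LCL before:", round(lcl(beta, sample), 3))
    climb(beta, sample)
    print("LCL after:", round(lcl(beta, sample), 3))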

  • ELR Learning Algorithm
  • Input
  • Structure
  • Labeled data sample
  • Output
  • Parameters Θ
  • Goal: find Θ that maximizes LCL(Θ)
  • As NP-hard: Hillclimb!
  • Change each θ_{d|f} to improve LCL
  • How??


[Figure: OFE worked example on a small network over C, D, E1, E2, F1, F2.
2 of the 3 C=1 records have E1=1, so θ_{E1=1|C=1} = 2/3;
shown: θ_{C=1} = 3/5, θ_{E1=1|C=1} = 2/3, θ_{E1=1|C=0} = 2/3]
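
A minimal OFE sketch matching the figure's counts (the five records are hypothetical, chosen only to reproduce the 3/5 and 2/3 above):

    # Five complete records (c, e1), chosen so that 3 of 5 have c=1 and
    # 2 of those 3 have e1=1, reproducing the figure's numbers.
    records = [(1, 1), (1, 1), (1, 0), (0, 1), (0, 0)]

    # OFE: every CPT entry is just an observed relative frequency.
    n_c1 = sum(1 for c, _ in records if c == 1)
    theta_c1 = n_c1 / len(records)                          # P(C=1) = 3/5
    n_e1c1 = sum(1 for c, e1 in records if c == 1 and e1 == 1)
    theta_e1_given_c1 = n_e1c1 / n_c1                       # P(E1=1|C=1) = 2/3
    print(theta_c1, theta_e1_given_c1)                      # 0.6 0.666...
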
Slide 2
Empirical Results
NaïveBayes Structure
TAN Structure
  • NaïveBayes Structure
  • Attributes independent, given Class
  • Complete Data
  • Every attribute of every instance specified
  • 25 Datasets
  • 23 from UCI (continuous and discrete attributes)
  • 2 from SelectiveNB study
  • (used by [FGG96])

[Figure: naive Bayes structure, class C with arcs to attributes E1, E2, ..., Ek]
  • TAN structure
  • Link from Class node to each attribute
  • Tree-structure connecting attributes
  • Permits dependencies between attributes
  • Efficient learning alg + classification alg
  • Works well in practice [FGG97]
  • TAN can deal with dependent attributes; NB cannot
  • but ELR is designed to help classify; OFE is not
  • NB does poorly on CORRAL
  • artificial dataset, a function of 4 attributes
  • In general, NB+ELR ≈ TAN+OFE
  • TAN+ELR did perfectly on CORRAL!
  • TAN+ELR ≈ NB+ELR
  • TAN+ELR > TAN+OFE (p < 0.025)
  • Chess domain
  • ELR-OFE
  • Initialize params using OFE values
  • Then run ELR
  • All 25 Domains
  • Below y=x ⇒ NB+ELR better than NB+OFE
  • Bars are 1 standard deviation
  • ⇒ ELR better than OFE! (p < 0.005)

Complete data
Missing Data
Correctness of Structure
  • Compare NB+ELR to NB+OFE wrt
  • increasingly non-NB data
  • So far, each dataset complete
  • includes value of
  • every attribute in each instance
  • Now some omissions
  • Omit values of attributes
  • w/ prob 0.25
  • Missing Completely at Random
    (see the sketch after this list)
  • OFE works only with COMPLETE data
  • Given INCOMPLETE data
  • EM (Expectation Maximization)
  • APN (Adaptive Probabilistic Networks [BKRK97])
  • Experiments using
  • NaïveBayes, TAN
  • Why does ELR work so well
  • vs OFE (complete data)
  • vs EM / APN (incomplete data)
  • for fixed simple structure (NB, TAN)?
  • Generative Learner (OFE/APN/EM)
  • very constrained by structure
  • So if structure is wrong, cannot do well!
  • Discriminative Learner (ELR)
  • not as constrained!
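
A minimal sketch of the MCAR omission step described above (None marks a missing attribute value; only the 0.25 rate comes from the slide):

    import random

    def mcar_omit(instance, rate=0.25, rng=random):
        # Drop each attribute value independently with prob `rate`;
        # None marks "missing". The mechanism ignores the values
        # themselves, i.e. Missing Completely At Random.
        return [None if rng.random() < rate else v for v in instance]

    print(mcar_omit([1, 0, 1, 1]))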

[Figure: generating models 0, 1, 2 over class C and attributes E1, E2, E3, E4, with increasingly non-NB dependencies; 25% MCAR omissions]
  • P(C) = 0.9; P(Ei|C) = 0.2, P(Ei|¬C) = 0.8
  • then P(Ei|E1) = 1.0, P(Ei|¬E1) = 0.0 when joined, for models 2, 3, ...
    (see the sketch after this list)
  • Measured Classification Error
  • k = 5, 400 records
  • NB+ELR better than NB+EM,
  • NB+APN
  • (p < 0.025)
  • TAN+ELR ≈ TAN+EM ≈ TAN+APN
  • TAN algorithm problematic with incomplete data
  • Future work
  • Now assume fixed structure
  • Learn STRUCTURE discriminatively as well
  • NP-hard to learn LCL-optimal parameters with
  • arbitrary structure
  • incomplete data
  • What is the complexity given complete data? a simple structure?
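
A sketch of how such data might be generated (the probabilities are from the slide; which and how many attributes are "joined" to E1 in each model is our assumption for illustration):

    import random

    def gen(model, rng):
        # One record from the synthetic family: class C, then four
        # attributes, some of which copy E1 in the later models.
        c = 1 if rng.random() < 0.9 else 0                 # P(C) = 0.9
        p = 0.2 if c == 1 else 0.8                         # P(Ei|C)=0.2, P(Ei|~C)=0.8
        e = [1 if rng.random() < p else 0 for _ in range(4)]
        n_joined = {1: 0, 2: 2, 3: 3}[model]               # per-model count (assumed)
        for i in range(1, 1 + n_joined):
            e[i] = e[0]                                    # P(Ei|E1)=1, P(Ei|~E1)=0
        return c, e

    rng = random.Random(1)
    print([gen(m, rng) for m in (1, 2, 3)])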

Summary of Results
Other Studies
Analysis
  • OFE guaranteed to find parameters
  • optimal wrt likelihood
  • for structure G
  • If G incorrect
  • optimal-for-G is bad wrt true distribution
  • ⇒ wrong answers to queries
  • ELR not as constrained by G
  • can do well, even when structure incorrect!
  • ELR useful, as structure often incorrect
  • to avoid overfitting
  • constrained set of structures (NB, TAN, ...)
  • See Discriminative vs Generative learning
  • Complete Data
  • Incomplete data
  • Nearly correct structure
  • Given data
  • Use PowerConstructor [CG02, CG99] to build
    structure
  • Use OFE vs ELR to find parameters
  • For Chess

TAN+ELR > TAN+OFE
NB+ELR > NB+OFE
Insert fig 2b from paper!
  • Contributions
  • Motivate/Describe
  • discriminative learning for BN-parameters
  • Complexity of task (NP-hard; poly sample size)
  • Algorithm for task, ELR
  • complete or incomplete data
  • arbitrary structures
  • soft-max version, optimizations, ...
  • Empirical results showing ELR works
  • study to show why
  • Clearly a good idea
  • should be used for Classification Tasks!
  • ELR was relatively slow
  • ≈ 0.5 sec/iteration for small data; minutes for
    large data
  • much slower than OFE
  • ≈ APN/EM
  • same alg for complete/incomplete data
  • ELR used unoptimized JAVA code
  • Correct structure, incomplete data
  • Consider Alarm [BSCC89] structure (and params)
  • 36 nodes, 47 links, 505 params
  • Multiple queries
  • 8 vars as pool of query vars
  • 16 other vars as pool of evidence vars
  • Each query: 1 query var; each evidence var included w/ prob ½,
    so expect ≈ 16/2 = 8 evidence vars (see the sketch after this list)
  • NOTE: different query var for different
    queries! (Like multi-task learning)
  • Results
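A sketch of the query generator described above (pool sizes from the slide; the variable names are placeholders, not actual Alarm node names):

    import random

    QUERY_POOL = [f"Q{i}" for i in range(8)]       # 8 candidate query vars
    EVIDENCE_POOL = [f"V{i}" for i in range(16)]   # 16 candidate evidence vars

    def make_query(rng):
        # One query: a single query variable, plus each evidence variable
        # included independently with prob 1/2, so about 16/2 = 8 expected.
        q = rng.choice(QUERY_POOL)
        evidence = [v for v in EVIDENCE_POOL if rng.random() < 0.5]
        return q, evidence

    print(make_query(random.Random(0)))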

TradeOff
  • Most BN-learners
  • Spend LOTS of time learning structure
  • Little time learning parameters
  • Why not
  • Use SIMPLE (quick-to-learn) structure
  • Focus computational effort on getting good
    parameters

[Figure: naive Bayes structure, class C with arcs to attributes E1, E2, ..., Ek]
  • Lots of work on learning BNs; most is Generative
    learning
  • Some discriminative learners, but most
  • learn STRUCTURE discriminatively
  • then parameters generatively!
  • See also Logistic Learning
  • [GGS97] learns params discriminatively, but
  • different queries, L2-norm (not LCL)
  • needed 2 types of data-samples, ...

Insert fig 6c from paper! [Figure axes: θ_{C=1} and θ_{E1=1|C=1}]