1. Online Max-Margin Weight Learning with Markov Logic Networks
- Tuyen N. Huynh and Raymond J. Mooney
Machine Learning Group, Department of Computer Science, The University of Texas at Austin
StarAI 2010, July 12, 2010
2. Outline
- Motivation
- Background
- Markov Logic Networks
- Primal-dual framework
- New online learning algorithm for structured prediction
- Experiments
- Citation segmentation
- Search query disambiguation
- Conclusion
3. Motivation
- Most existing weight learning methods for MLNs operate in the batch setting
- Need to run inference over all the training examples in each iteration
- Usually take a few hundred iterations to converge
- Cannot fit all the training examples in memory
- → Conventional solution: online learning
4. Background
5. Markov Logic Networks (MLNs) [Richardson & Domingos, 2006]
- An MLN is a weighted set of first-order formulas
- A larger weight indicates a stronger belief that the clause should hold
- Probability of a possible world x (a truth assignment to all ground atoms):

  P(X = x) = \frac{1}{Z} \exp\left( \sum_i w_i \, n_i(x) \right)

  where n_i(x) is the number of true groundings of formula i in x and Z is the normalizing constant
- Example clauses:
  2.5   Center(i,c) => InField(Ftitle,i,c)
  1.2   InField(f,i,c) ∧ Next(j,i) ∧ ¬HasPunc(c,i) => InField(f,j,c)
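As a minimal worked instance of this formula (the grounding counts n_1(x), n_2(x) are illustrative, not from the slides): if a world x satisfies n_1(x) groundings of the first clause and n_2(x) groundings of the second, then

  P(X = x) = \frac{1}{Z} \exp\left( 2.5 \, n_1(x) + 1.2 \, n_2(x) \right)

so each additional satisfied grounding of the first clause multiplies the unnormalized probability by e^{2.5}.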
6. Existing discriminative weight learning methods for MLNs
- Maximize the conditional log-likelihood (CLL) [Singla & Domingos, 2005; Lowd & Domingos, 2007; Huynh & Mooney, 2008]
- Maximize the margin, i.e., the log ratio between the probability of the correct label and the closest incorrect one [Huynh & Mooney, 2009] (see the identity below)
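A one-line identity behind the margin criterion (standard MLN algebra, shown for orientation rather than as the paper's exact notation): because the correct labeling y and the closest incorrect labeling ŷ share the same normalizer,

  \log \frac{P(y \mid x)}{P(\hat{y} \mid x)} = \mathbf{w}^\top \left( \mathbf{n}(x, y) - \mathbf{n}(x, \hat{y}) \right)

where \mathbf{n}(x, y) is the vector of true-grounding counts of the clauses, so the margin is linear in the weights.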
7. Online learning
8. Primal-dual framework [Shalev-Shwartz et al., 2006]
- A general, recent framework for deriving low-regret online algorithms
- Rewrite the regret bound as an optimization problem (called the primal problem), then consider the dual of that primal problem
- A condition that guarantees an increase in the dual objective at each step
- → Incremental-Dual-Ascent (IDA) algorithms, e.g., subgradient methods (sketched below)
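A minimal sketch of one such subgradient update for the structured hinge loss; the interfaces joint_features and loss_augmented_argmax are assumptions for illustration, not the paper's code:

import numpy as np

def subgradient_step(w, x, y_true, joint_features, loss_augmented_argmax, eta):
    """One online subgradient step on the structured hinge loss.

    joint_features(x, y): joint feature vector phi(x, y) as a numpy array
    loss_augmented_argmax(w, x, y_true): the most violating labeling
        argmax_y [ w . phi(x, y) + label_loss(y_true, y) ]
    eta: learning rate for this round
    """
    y_hat = loss_augmented_argmax(w, x, y_true)
    # Subgradient of the hinge term: phi(x, y_hat) - phi(x, y_true);
    # it is the zero vector when the true labeling is itself the argmax.
    g = joint_features(x, y_hat) - joint_features(x, y_true)
    return w - eta * g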
9. Primal-dual framework (cont.)
- Proposed a new class of IDA algorithms called Coordinate-Dual-Ascent (CDA) algorithms
- The CDA update rule only optimizes the dual w.r.t. the last dual variable
- A closed-form solution of the CDA update rule → CDA algorithms have the same computational cost as subgradient methods but increase the dual objective more at each step → converge to the optimal value faster
10. Primal-dual framework (cont.)
11. CDA algorithms for max-margin structured prediction
12. Max-margin structured prediction
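The formulas on this slide did not survive extraction; the standard max-margin structured objective being referred to has the form (shown for orientation, not as the paper's exact notation):

  \min_{\mathbf{w}} \; \frac{\lambda}{2} \|\mathbf{w}\|^2 + \sum_t \max_{y} \left[ \rho(y_t, y) - \mathbf{w}^\top \left( \phi(x_t, y_t) - \phi(x_t, y) \right) \right]

where \phi is the joint feature map (clause grounding counts in an MLN), \rho is the label loss, and the inner max is the structured hinge loss.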
13. Steps for deriving new CDA algorithms
1. Define the regularization and loss functions
2. Find the conjugate functions
3. Derive a closed-form solution for the CDA update rule
14. Step 1: Define the regularization and loss functions
- Label loss function
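The definitions on this slide were lost in extraction; per slide 23, the label loss used in the experiments is the Hamming loss, which for labelings y, y' of n ground query atoms is

  \rho(y, y') = \sum_{i=1}^{n} \mathbf{1}[y_i \neq y'_i]

i.e., the number of ground atoms on which the two labelings disagree.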
15. Step 1: Define the regularization and loss functions (cont.)
16. Step 2: Find the conjugate functions
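As background for this step (standard convex analysis, not recovered slide content): the Fenchel conjugate of a function f is

  f^*(\boldsymbol{\theta}) = \sup_{\mathbf{w}} \left( \langle \mathbf{w}, \boldsymbol{\theta} \rangle - f(\mathbf{w}) \right)

For example, the l2 regularizer f(\mathbf{w}) = \frac{\lambda}{2} \|\mathbf{w}\|^2 has conjugate f^*(\boldsymbol{\theta}) = \frac{1}{2\lambda} \|\boldsymbol{\theta}\|^2.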
17. Step 2: Find the conjugate functions (cont.)
18. Step 3: Derive a closed-form solution for the CDA update rule
- Optimization problem
- Solution
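The optimization problem and its solution were lost from this slide; since the deck identifies CDA1 with PA1 (slide 22), a passive-aggressive-style sketch is a reasonable reconstruction (assumed notation, with \Delta\phi_t = \phi(x_t, y_t) - \phi(x_t, \hat{y}_t) for the most violating labeling \hat{y}_t):

  \mathbf{w}_{t+1} = \arg\min_{\mathbf{w}} \; \frac{1}{2} \|\mathbf{w} - \mathbf{w}_t\|^2 + C \xi
  \quad \text{s.t.} \quad \mathbf{w}^\top \Delta\phi_t \ge \rho(y_t, \hat{y}_t) - \xi, \;\; \xi \ge 0

with the closed-form solution

  \mathbf{w}_{t+1} = \mathbf{w}_t + \tau_t \, \Delta\phi_t, \qquad
  \tau_t = \min\left( C, \; \frac{\left[ \rho(y_t, \hat{y}_t) - \mathbf{w}_t^\top \Delta\phi_t \right]_+}{\|\Delta\phi_t\|^2} \right)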
19. CDA algorithms for max-margin structured prediction
20. Experiments
21. Citation segmentation
- CiteSeer dataset [Lawrence et al., 1999; Poon & Domingos, 2007]
- 1,563 citations, divided into 4 research topics
- Each citation is segmented into 3 fields: Author, Title, Venue
- Used the simplest MLN in [Poon & Domingos, 2007]
- Similar to a linear-chain CRF
- Example clause:
  Next(j,i) ∧ !HasPunc(c,i) ∧ InField(c,f,i) => InField(c,f,j)
22. Experimental setup
- Systems compared:
- MM: the max-margin weight learner for MLNs in the batch setting [Huynh & Mooney, 2009]
- 1-best MIRA [Crammer et al., 2005]
- Subgradient [Ratliff et al., 2007]
- CDA1/PA1
- CDA2
23. Experimental setup (cont.)
- 4-fold cross-validation
- Metric: CiteSeer micro-average F1 at the token level
- Used exact MPE inference (integer linear programming) for all online algorithms and approximate MPE inference (LP relaxation) for the batch one
- Used Hamming loss as the label loss function
24. Average F1
25. Average training time in minutes
26. Microsoft web search query dataset
- Used the cleaned-up dataset created by [Mihalkova & Mooney, 2009]
- Contains thousands of search sessions in which an ambiguous query was asked
- Goal: disambiguate each search query based on previous related search sessions
- Used the 3 MLNs proposed in [Mihalkova & Mooney, 2009]
27. Experimental setup
- Systems compared:
- Contrastive Divergence (CD) [Hinton, 2002], used in [Mihalkova & Mooney, 2009]
- 1-best MIRA
- Subgradient
- CDA1/PA1
- CDA2
- Metric:
- Mean Average Precision (MAP): how close the relevant results are to the top of the rankings (defined below)
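For reference, the standard definition behind this metric (not recovered slide content): the average precision of one ranked result list is the mean of the precision values at the ranks where relevant results appear, and MAP averages this over the query set Q with relevant-result sets R_q:

  \mathrm{MAP} = \frac{1}{|Q|} \sum_{q \in Q} \frac{1}{|R_q|} \sum_{r \in R_q} \mathrm{Prec@rank}(r)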
28. MAP scores
29. Conclusion
- Derived CDA algorithms for max-margin structured prediction
- Same computational cost as existing online algorithms, but increase the dual objective more at each step
- Experimental results on two real-world problems show that the new algorithms generally achieve better accuracy and more consistent performance
30. Thank you!
Questions?