Title: UMass and Learning for CALO
1 UMass and Learning for CALO
- Andrew McCallum
- Information Extraction Synthesis Laboratory
- Department of Computer Science
- University of Massachusetts
2 Outline
- CC-Prediction
  - Learning in the wild from user email usage
- DEX
  - Learning in the wild from user corrections, as well as from KB records filled by other CALO components
- Rexa
  - Learning in the wild from user corrections to coreference, propagating constraints in a Markov-Logic-like system that scales to 20 million objects
- Several new topic models
  - Discover interesting, useful structure without the need for supervision, learning from newly arrived data on the fly
3 CC Prediction Using Various Exponential Family Factor Graphs
- Learning to keep an organization connected and avoid stove-piping.
- First steps toward ad-hoc team creation.
- Learning in the wild from users' CC behavior, and from other parts of the CALO ontology.
4 Graphical Models for Email
- Compute P(y|x) for CC prediction
[Figure: factor graph of the email model. Legend: function, random variable, plate of N replications. The recipient of the email, y, connects to body words x_b (plate N_b), subject words x_s (plate N_s), and other recipients x_r (plate N_r - 1), inside a plate over the N_r recipients.]
The graph describes the joint distribution of random variables in terms of the product of local functions.
Email model: N_b words in the body, N_s words in the subject, N_r recipients.
- Local functions facilitate system engineering
through modularity
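A minimal sketch of how such a factor graph could score candidate CC recipients, assuming a log-linear form: each local function is exp of a weight looked up in a hypothetical table keyed by (candidate, observed item), and P(y|x) normalizes the product of local functions over the candidates. The tables, names, and weights are illustrative assumptions, not the CALO implementation.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical log-potential tables keyed by (candidate recipient, observed item).
Weights = Dict[Tuple[str, str], float]


def log_score(candidate: str, body: List[str], subject: List[str],
              others: List[str], w_body: Weights, w_subj: Weights,
              w_rcpt: Weights) -> float:
    """Sum of log local functions: one factor per body word, subject word,
    and other recipient, each linking that observation to the candidate y."""
    s = sum(w_body.get((candidate, tok), 0.0) for tok in body)
    s += sum(w_subj.get((candidate, tok), 0.0) for tok in subject)
    s += sum(w_rcpt.get((candidate, r), 0.0) for r in others)
    return s


def cc_posterior(candidates: List[str], body: List[str], subject: List[str],
                 others: List[str], w_body: Weights, w_subj: Weights,
                 w_rcpt: Weights) -> Dict[str, float]:
    """P(y | x): normalize the product of local functions over all candidates."""
    scores = {c: log_score(c, body, subject, others, w_body, w_subj, w_rcpt)
              for c in candidates}
    top = max(scores.values())  # subtract the max for numerical stability
    z = sum(math.exp(v - top) for v in scores.values())
    return {c: math.exp(v - top) / z for c, v in scores.items()}


probs = cc_posterior(
    ["alice", "bob"],
    body=["budget", "review"], subject=["budget"], others=["carol"],
    w_body={("alice", "budget"): 1.0},
    w_subj={("alice", "budget"): 0.5},
    w_rcpt={("bob", "carol"): 0.5},
)
```

Because each observation contributes its own factor, new kinds of evidence can be added as new weight tables without touching the others, which is the modularity point above.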
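5 Document Models
- Models may include relational attributes
[Figure: factor graph of a document model. The author of the document, y, connects to variables for the title, abstract, body, co-authors (plate N_a - 1), and references, with plates N_b, N_s, N_r, N_t over the other fields, inside a plate over the N_a authors.]
- We can optimize P(y|x) for classification performance and P(x|y) for model interpretability and parameter transfer (to other models)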
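To make the distinction concrete: one joint model P(x, y) yields both conditionals, so the same parameters can be scored by P(y|x) (classification) or by P(x|y) (how well each label explains the data). The tiny joint table below, over a topic word x and an author label y, is hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical joint distribution P(x, y): x = a topic word, y = an author.
joint: Dict[Tuple[str, str], float] = {
    ("ml", "prof_a"): 0.30, ("ml", "prof_b"): 0.10,
    ("db", "prof_a"): 0.15, ("db", "prof_b"): 0.45,
}


def p_y_given_x(y: str, x: str) -> float:
    """P(y|x) = P(x, y) / sum over y' of P(x, y')."""
    px = sum(p for (xi, _), p in joint.items() if xi == x)
    return joint[(x, y)] / px


def p_x_given_y(x: str, y: str) -> float:
    """P(x|y) = P(x, y) / sum over x' of P(x', y)."""
    py = sum(p for (_, yi), p in joint.items() if yi == y)
    return joint[(x, y)] / py


print(p_y_given_x("prof_a", "ml"), p_x_given_y("ml", "prof_a"))
```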
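6 CC Prediction and Relational Attributes
[Figure: the email factor graph extended with relational attributes. The target recipient, y, connects to body words x_b (N_b), subject words x_s (N_s), other recipients x_r (N_r - 1), recipient relations x_r, and thread relations x_tr (N_tr), inside a plate over the N_r recipients.]
- Thread relations, e.g., was a given recipient ever included on this email thread?
- Recipient relationships, e.g., does one of the other recipients report to the target recipient?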
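The two example relations can be computed as binary features for each candidate recipient. In this sketch the `Email` type, the thread-history map, and the `manager_of` org-chart lookup are hypothetical stand-ins for whatever the KB actually provides.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Email:
    thread_id: str
    recipients: List[str]  # recipients already on this message


def relational_features(target: str, email: Email,
                        thread_history: Dict[str, Set[str]],
                        manager_of: Dict[str, str]) -> Dict[str, float]:
    """Binary relational features for one candidate CC recipient.

    thread_history: thread id -> everyone ever included on that thread.
    manager_of: person -> their manager (hypothetical org-chart lookup)."""
    others = [r for r in email.recipients if r != target]
    return {
        # Thread relation: was the target ever included on this thread?
        "on_thread": float(target in thread_history.get(email.thread_id, set())),
        # Recipient relation: does another recipient report to the target?
        "other_reports_to_target": float(any(manager_of.get(r) == target for r in others)),
    }


feats = relational_features(
    "dana",
    Email(thread_id="t1", recipients=["erin", "frank"]),
    thread_history={"t1": {"dana", "erin"}},
    manager_of={"erin": "dana"},
)
```

Each feature would feed one more local function into the factor graph, so adding a relation does not require changing the existing factors.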
7 CC-Prediction: Learning in the Wild
- As documents are added to Rexa, models of authors' expertise grow
- As DEX obtains more contact information and keywords, organizational relations emerge
- Model parameters can be adapted on-line
- Priors on parameters can be used to transfer learned information between models
- New relations can be added on-line
- Modular model construction and intelligent model optimization enable these goals
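One common way to combine on-line adaptation with prior-based transfer is a Gaussian prior centred on previously learned weights, so new data moves the model only as far as the evidence warrants. The sketch below takes one gradient step for binary logistic regression ("CC this person or not") under such a prior; the learning rate, prior variance, and feature names are hypothetical, and this is not necessarily the estimator used in CALO.

```python
import math
from typing import Dict


def online_update(w: Dict[str, float], prior_mean: Dict[str, float],
                  x: Dict[str, float], y: int,
                  lr: float = 0.1, sigma2: float = 1.0) -> None:
    """One gradient-ascent step on log P(y|x) + log N(w; prior_mean, sigma2*I).

    w is updated in place; prior_mean holds weights transferred from an
    earlier model (features it lacks get prior mean 0)."""
    z = sum(w.get(f, 0.0) * v for f, v in x.items())
    p = 1.0 / (1.0 + math.exp(-z))  # current P(y=1 | x)
    grad = {
        f: (y - p) * x.get(f, 0.0) - (w.get(f, 0.0) - prior_mean.get(f, 0.0)) / sigma2
        for f in set(w) | set(prior_mean) | set(x)
    }
    for f, g in grad.items():
        w[f] = w.get(f, 0.0) + lr * g


prior = {"same_dept": 2.0}  # hypothetical weight learned in another model
w = dict(prior)
online_update(w, prior, {"same_dept": 1.0, "on_thread": 1.0}, y=1)
```

For simplicity the prior term is applied in full at every step; in batch training it is usually scaled by the number of examples. A new relation is handled by simply introducing a new feature key, whose prior mean defaults to 0.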
8 CC Prediction: Upcoming work on Multi-Conditional Learning
- A discriminatively-trained topic model, discovering low-dimensional representations for transfer learning and improved regularization and generalization.
9 Objective Functions for Parameter Estimation
[Equations: traditional objective; new, multi-conditional objective.]
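The objectives are commonly written in forms like the following. This is a sketch, not a copy of the lost slide equations: the weights α and β are hyperparameters introduced here, and the two conditionals are the ones named on the Document Models slide. Traditional training maximizes the joint or the conditional log-likelihood; a multi-conditional objective maximizes a weighted sum of several conditional log-likelihoods of the same model.

```latex
\begin{aligned}
\mathcal{O}_{\text{joint}}(\theta) &= \sum_i \log P_\theta(\mathbf{x}_i, y_i) \\
\mathcal{O}_{\text{cond}}(\theta)  &= \sum_i \log P_\theta(y_i \mid \mathbf{x}_i) \\
\mathcal{O}_{\text{MC}}(\theta)    &= \sum_i \big[\, \alpha \log P_\theta(y_i \mid \mathbf{x}_i)
                                      + \beta \log P_\theta(\mathbf{x}_i \mid y_i) \,\big]
\end{aligned}
```

With β = 0 this reduces to conditional training; the β term acts as a data-dependent regularizer that keeps the parameters interpretable as a model of the inputs.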
10 Multi-Conditional Learning (Regularization)
McCallum, Pal, Wang, 2006
11 Multi-Conditional Mixtures
12 Predictive Random Fields: mixture of Gaussians on synthetic data
McCallum, Wang, Pal, 2005
[Figure panels: data, classified by color; generatively trained; multi-conditional; conditionally trained (Jebara 1998).]
13 Multi-Conditional Mixtures vs. Harmonium on a document retrieval task
McCallum, Wang, Pal, 2005
[Figure legend: multi-conditional (multi-way conditionally trained); conditionally trained (to predict class labels); Harmonium, joint, with class labels and words; Harmonium, joint, with words, no labels.]
14 DEX
- Beginning with a review of previous work,
- then new work on record extraction,
- with the ability to leverage new KBs in the wild, and for transfer
15 System Overview
[Figure: system overview diagram with components Email, names, WWW, and CRF.]
16 An Example
To: Andrew McCallum mccallum_at_cs.umass.edu
Subject: ...
First Name: Andrew
Middle Name: Kachites
Last Name: McCallum
JobTitle: Associate Professor
Company: University of Massachusetts
Street Address: 140 Governors Dr.
City: Amherst
State: MA
Zip: 01003
Company Phone: (413) 545-1323
Links: Fernando Pereira, Sam Roweis, ...
Key Words: Information extraction, social network, ...
Search for new people
17 Summary of Results
Example keywords extracted:
Person               Keywords
William Cohen        Logic programming, text categorization, data integration, rule learning
Daphne Koller        Bayesian networks, relational models, probabilistic models, hidden variables
Deborah McGuinness   Semantic web, description logics, knowledge representation, ontologies
Tom Mitchell         Machine learning, cognitive states, learning apprentice, artificial intelligence
Contact info and name extraction performance (25 fields):
Model   Token Acc.   Field Prec.   Field Recall   Field F1
CRF     94.50        85.73         76.33          80.76
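As a quick consistency check, field F1 is the harmonic mean of field precision and recall, and the reported values agree:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


print(round(f1(85.73, 76.33), 2))  # 80.76, matching the CRF row
```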
- Expert finding: when solving some task, find friends-of-friends with relevant expertise. Avoid stove-piping in large orgs by automatically suggesting collaborators. Given a task, automatically suggest the right team for the job. (Hiring aid!)
- Social network analysis: understand the social structure of your organization. Suggest structural changes for improved efficiency.
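Friends-of-friends expert finding can be sketched as a bounded breadth-first search over the social network, keeping people whose extracted keywords match the task. The graph, keyword sets, and two-hop limit below are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def find_experts(graph: Dict[str, Set[str]], expertise: Dict[str, Set[str]],
                 start: str, topic: str, max_hops: int = 2) -> List[str]:
    """Return people within max_hops of start whose keywords contain topic,
    nearest first (breadth-first order)."""
    seen = {start}
    queue = deque([(start, 0)])
    found: List[str] = []
    while queue:
        person, hops = queue.popleft()
        if person != start and topic in expertise.get(person, set()):
            found.append(person)
        if hops == max_hops:
            continue  # do not expand beyond friends-of-friends
        for friend in sorted(graph.get(person, set())):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    return found


graph = {"me": {"ann", "bo"}, "ann": {"me", "cy"}, "bo": {"me"},
         "cy": {"ann", "dee"}, "dee": {"cy"}}
expertise = {"ann": {"crf"}, "cy": {"information extraction"},
             "dee": {"information extraction"}}
print(find_experts(graph, expertise, "me", "information extraction"))
```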
18 Importance of accurate DEX fields in IRIS
- Information about people (contact information, email, affiliation, job title, expertise, ...) is key to answering many CALO questions, both directly and as supporting input to higher-level questions.
19 Learning Field Compatibilities in DEX
Professor Jane Smith
University of California
209-555-5555
Professor Smith chairs the Computer Science Department. She hails from Boston, her administrative assistant
John Doe
Administrative Assistant
University of California
209-444-4444
20 Learning Field Compatibilities in DEX
Extracted Record
Name: Jane Smith, John Doe
JobTitle: Professor, Administrative Assistant
Company: U of California
Department: Computer Science
Phone: 209-555-5555, 209-444-4444
City: Boston
[Figure: the extracted values grouped into two records. Record 1: Jane Smith, University of California, 209-555-5555, Computer Science, Professor, Boston. Record 2: John Doe, Administrative Assistant, University of California, 209-444-4444.]
21 Learning Field Compatibilities in DEX
- 35% error reduction over transitive closure
- Qualitatively better than the heuristic approach
- Mine knowledge bases from other parts of IRIS to learn compatibility rules among fields
  - "Professor" job title co-occurs with "University" company
  - Area code / city compatibility
  - "Senator" job title co-occurs with "Washington, D.C." location
- In the wild
  - As the user adds new fields and makes corrections, DEX learns from this KB data
- Transfer learning
  - between departments/industries
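To see why pairwise "same record" decisions plus transitive closure can fail, and how field compatibilities help, here is a small sketch on the Jane Smith / John Doe example. The pairwise links and the incompatibility rule (a record holds one name and one phone) are hypothetical stand-ins for learned compatibility scores, and the greedy merge is an illustration, not DEX's actual model.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Field = Tuple[str, str]  # (field type, value)
Pair = Tuple[Field, Field]


def transitive_closure(nodes: List[Field], same: List[Pair]) -> Set[FrozenSet[Field]]:
    """Union every pair judged 'same record' (simple union-find)."""
    parent = {n: n for n in nodes}

    def find(n: Field) -> Field:
        while parent[n] != n:
            n = parent[n]
        return n

    for a, b in same:
        parent[find(a)] = find(b)
    groups: Dict[Field, Set[Field]] = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return {frozenset(g) for g in groups.values()}


def incompatible(a: Field, b: Field) -> bool:
    """Hypothetical learned rule: a record holds one name and one phone."""
    return a[0] == b[0] and a[0] in {"name", "phone"} and a[1] != b[1]


def compatible_merge(nodes: List[Field], same: List[Pair]) -> Set[FrozenSet[Field]]:
    """Greedy: merge two groups only if no cross pair is incompatible."""
    group: Dict[Field, FrozenSet[Field]] = {n: frozenset([n]) for n in nodes}
    for a, b in same:
        ga, gb = group[a], group[b]
        if ga == gb or any(incompatible(x, y) for x in ga for y in gb):
            continue
        merged = ga | gb
        for n in merged:
            group[n] = merged
    return set(group.values())


jane, john = ("name", "Jane Smith"), ("name", "John Doe")
p1, p2 = ("phone", "209-555-5555"), ("phone", "209-444-4444")
uc = ("company", "University of California")
nodes = [jane, john, p1, p2, uc]
same = [(jane, p1), (john, p2), (jane, uc), (john, uc)]
```

Transitive closure chains both people into one record through the shared company; the compatibility check blocks that merge and keeps two records.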
22 Rexa: A knowledge base of publications, grants, people, their expertise, topics, and inter-connections
- Learning for information extraction and coreference
- Incrementally leveraging multiple sources of information for improved coreference
- Gathering information about people's expertise, and co-author and citation relations
- First a tour of Rexa, then slides about learning
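23 Previous Systems
24 Previous Systems
[Figure: entity-relationship diagram: Research Paper, with a Cites relation.]
25 More Entities and Relations
[Figure: entity-relationship diagram: Research Paper, Person, Grant, University, Venue, Groups, and Expertise, with a Cites relation.]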
38 Learning in Rexa
- Extraction, coreference
  - In the wild: re-adjusting the KB after corrections from a user
- Also, learning research topics/expertise, and their interconnections
39 (Linear Chain) Conditional Random Fields
Lafferty, McCallum, Pereira 2001 (500 citations)
Undirected graphical model, trained to maximize the conditional probability of the output sequence given the input sequence.
[Figure: the same model drawn as a finite state model and as a graphical model. Output sequence (FSM states) y_{t-1}, y_t, y_{t+1}, y_{t+2}, y_{t+3}, ... labeled OTHER PERSON OTHER ORG TITLE, over the input sequence (observations) x_{t-1}, x_t, x_{t+1}, x_{t+2}, x_{t+3}: "said Jones a Microsoft VP".]
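The standard linear-chain CRF defines P(y|x) = exp(Σ_t s(y_{t-1}, y_t, x, t)) / Z(x), where s is a weighted sum of feature functions and Z(x) sums the same exponentiated score over every possible label sequence. Z(x) is computed with the forward algorithm. The sketch below collapses the features into hypothetical per-position emission scores and a label-transition table (a common simplification) for "said Jones a Microsoft VP", and checks the forward result against brute-force enumeration.

```python
import itertools
import math
from typing import List, Sequence


def logsumexp(vals: Sequence[float]) -> float:
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def sequence_score(labels: Sequence[int], emit: List[List[float]],
                   trans: List[List[float]]) -> float:
    """Unnormalized log score: emissions at every position plus transitions."""
    s = emit[0][labels[0]]
    for t in range(1, len(labels)):
        s += trans[labels[t - 1]][labels[t]] + emit[t][labels[t]]
    return s


def log_partition(emit: List[List[float]], trans: List[List[float]]) -> float:
    """Forward algorithm: alpha[j] = log-sum of scores of prefixes ending in label j."""
    n = len(emit[0])
    alpha = list(emit[0])
    for t in range(1, len(emit)):
        alpha = [logsumexp([alpha[i] + trans[i][j] for i in range(n)]) + emit[t][j]
                 for j in range(n)]
    return logsumexp(alpha)


def log_prob(labels: Sequence[int], emit, trans) -> float:
    """log P(y|x) = score(y, x) - log Z(x)."""
    return sequence_score(labels, emit, trans) - log_partition(emit, trans)


# Labels: 0=OTHER, 1=PERSON, 2=ORG, 3=TITLE. Scores are hypothetical.
emit = [[2.0, 0.1, 0.1, 0.1],   # said
        [0.2, 2.5, 0.3, 0.1],   # Jones
        [2.0, 0.0, 0.0, 0.1],   # a
        [0.1, 0.2, 2.2, 0.3],   # Microsoft
        [0.3, 0.1, 0.2, 2.0]]   # VP
trans = [[0.5, 0.3, 0.3, 0.3], [0.4, -1.0, 0.0, 0.0],
         [0.2, 0.0, 0.1, 0.6], [0.2, 0.0, 0.0, 0.1]]

all_seqs = list(itertools.product(range(4), repeat=len(emit)))
brute = logsumexp([sequence_score(y, emit, trans) for y in all_seqs])
```

Training maximizes the sum of log_prob over labeled sequences; the forward pass makes that tractable, since it costs O(T·K²) instead of enumerating all K^T sequences.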
40 IE from Research Papers
McCallum et al. 1999
41 IE from Research Papers
Model                              Field-level F1   Reference
Hidden Markov Models (HMMs)        75.6             Seymore, McCallum, Rosenfeld, 1999
Support Vector Machines (SVMs)     89.7             Han, Giles, et al., 2003
Conditional Random Fields (CRFs)   93.9             Peng, McCallum, 2004
CRFs vs. SVMs: about 40% reduction in error
(Word-level accuracy is >99%.)
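The "about 40%" follows from treating 100 - F1 as the remaining error and asking what fraction of it the CRF removes relative to the SVM:

```python
def relative_error_reduction(f1_old: float, f1_new: float) -> float:
    """Fraction of the remaining error (100 - F1) removed by the new model."""
    old_err, new_err = 100.0 - f1_old, 100.0 - f1_new
    return (old_err - new_err) / old_err


print(f"{relative_error_reduction(89.7, 93.9):.1%}")  # 40.8% (SVM -> CRF)
```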
42 Joint segmentation and co-reference
Extraction from and matching of research paper citations.
[Figure: joint model over two citation strings, with variables for the observed citation text (o), segmentation (s), citation attributes (c), co-reference decisions (y), and database field values (p), plus world knowledge. Example citations:
  Laurel, B. Interface Agents: Metaphors with Character, in The Art of Human-Computer Interface Design, B. Laurel (ed), Addison-Wesley, 1990.
  Brenda Laurel. Interface Agents: Metaphors with Character, in Laurel, The Art of Human-Computer Interface Design, 355-366, 1990.]
- 35% reduction in co-reference error by using segmentation uncertainty.
- 6-14% reduction in segmentation error by using co-reference.
Inference: variant of Iterated Conditional Modes
Wellner, McCallum, Peng, Hay, UAI 2004; see also Marthi, Milch, Russell, 2003; Besag, 1986
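Iterated Conditional Modes is greedy coordinate ascent: set each variable in turn to the value that maximizes the model score with all other variables held fixed, and stop when a full sweep changes nothing. The generic sketch below uses a toy model with two segmentation variables and one co-reference variable; the integer scores are hypothetical, and this is not the specific variant used by Wellner et al.

```python
from typing import Callable, Dict, Sequence

Assignment = Dict[str, int]


def icm(domains: Dict[str, Sequence[int]], init: Assignment,
        score: Callable[[Assignment], float], max_sweeps: int = 100) -> Assignment:
    """Set each variable in turn to its best value given all the others;
    stop when a full sweep changes nothing (a local optimum)."""
    assign = dict(init)
    for _ in range(max_sweeps):
        changed = False
        for var, values in domains.items():
            best = max(values, key=lambda v: score({**assign, var: v}))
            if score({**assign, var: best}) > score(assign):
                assign[var] = best
                changed = True
        if not changed:
            break
    return assign


def toy_score(a: Assignment) -> float:
    """Hypothetical scores for two citation segmentations and one coref decision."""
    s = 10 if a["s1"] == 1 else 0   # citation 1's text favors segmentation 1
    s += 4 if a["s2"] == 0 else 0   # citation 2 weakly favors segmentation 0
    s += 8 if a["y"] == 1 else 0    # string similarity suggests co-reference
    s += 10 if a["y"] == 1 and a["s1"] == a["s2"] else 0  # co-referent citations segment alike
    return s


result = icm({"s1": [0, 1], "s2": [0, 1], "y": [0, 1]},
             {"s1": 0, "s2": 0, "y": 0}, toy_score)
```

Here the co-reference decision ends up pulling citation 2's segmentation into agreement with citation 1, the kind of interaction behind the segmentation gains reported above.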
43 Rexa: Learning in the Wild from User Feedback
- Coreference will never be perfect.
- Rexa allows users to enter corrections to coreference decisions.
- Rexa then uses this feedback to
  - re-consider other inter-related parts of the KB
  - automatically make further error corrections by propagating constraints
- (Our coreference system uses underlying ideas very much like Markov Logic, and scales to 20 million mention objects.)
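A minimal sketch of how one user correction can propagate: keep the machine's pairwise merge proposals, add user must-link and cannot-link constraints, and re-cluster greedily, so that a single "these differ" correction also changes decisions it did not directly mention. The mention ids, scores, and greedy procedure are hypothetical; Rexa's Markov-Logic-like model is not reproduced here.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def cluster(mentions: Iterable[str],
            machine_links: List[Tuple[float, str, str]],
            must: List[Tuple[str, str]],
            cannot: List[Tuple[str, str]]) -> Set[FrozenSet[str]]:
    """Greedy re-clustering: user must-links first, then machine links by
    descending score, skipping any merge that would join a cannot-link pair."""
    group: Dict[str, FrozenSet[str]] = {m: frozenset([m]) for m in mentions}

    def try_merge(a: str, b: str) -> None:
        ga, gb = group[a], group[b]
        if ga == gb:
            return
        if any((x in ga and y in gb) or (x in gb and y in ga) for x, y in cannot):
            return
        merged = ga | gb
        for m in merged:
            group[m] = merged

    for a, b in must:
        try_merge(a, b)
    for _, a, b in sorted(machine_links, reverse=True):
        try_merge(a, b)
    return set(group.values())


mentions = ["m1", "m2", "m3", "m4"]
links = [(0.9, "m1", "m2"), (0.8, "m2", "m3"), (0.7, "m3", "m4")]
before = cluster(mentions, links, must=[], cannot=[])
after = cluster(mentions, links, must=[], cannot=[("m1", "m3")])
```

The user only said m1 and m3 differ, yet m4 also moves away from m1 and m2, because its link to m3 is now reconsidered under the new constraint.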
44 Finding Topics in 1 million CS papers
200 topics and their keywords, automatically discovered.
45 Topical Transfer
Citation counts from one topic to another.
Map producers and consumers
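Given a topic per paper and the citation graph, topic-to-topic citation counts fall out directly, and producers and consumers can be read off the cross-topic totals. The sketch assigns each paper its single most likely topic, a simplification since papers are really topic mixtures; the papers and citations are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def topic_transfer(topic_of: Dict[str, str],
                   citations: List[Tuple[str, str]]) -> Counter:
    """Count citations from the citing paper's topic to the cited paper's topic."""
    return Counter((topic_of[src], topic_of[dst]) for src, dst in citations)


def producers_consumers(counts: Counter) -> Tuple[Counter, Counter]:
    """Cross-topic citations received (producer) and made (consumer) per topic."""
    produced, consumed = Counter(), Counter()
    for (src_topic, dst_topic), n in counts.items():
        if src_topic != dst_topic:
            produced[dst_topic] += n
            consumed[src_topic] += n
    return produced, consumed


topic_of = {"p1": "ML", "p2": "ML", "p3": "NLP", "p4": "Vision"}
citations = [("p3", "p1"), ("p4", "p1"), ("p4", "p2"), ("p2", "p1"), ("p1", "p3")]
counts = topic_transfer(topic_of, citations)
produced, consumed = producers_consumers(counts)
```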
46 Topical Diversity
Find the topics that are cited by many other topics, measuring diversity of impact: the entropy of the topic distribution among papers that cite this paper (this topic).
[Figure: examples of low-diversity and high-diversity topics.]
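The diversity measure can be sketched as the Shannon entropy of the pooled topic distribution of the citing papers. Forming that distribution by averaging the citing papers' topic mixtures is an assumption made here, and the mixtures are hypothetical.

```python
import math
from typing import List


def citing_topic_entropy(citing_mixtures: List[List[float]]) -> float:
    """Entropy (nats) of the average topic distribution of the citing papers."""
    n_topics = len(citing_mixtures[0])
    avg = [sum(m[k] for m in citing_mixtures) / len(citing_mixtures)
           for k in range(n_topics)]
    total = sum(avg)
    return -sum((p / total) * math.log(p / total) for p in avg if p > 0)


low = citing_topic_entropy([[1.0, 0, 0, 0], [0.9, 0.1, 0, 0]])      # cited within one topic
high = citing_topic_entropy([[1.0, 0, 0, 0], [0, 1.0, 0, 0],
                             [0, 0, 1.0, 0], [0, 0, 0, 1.0]])       # cited from all four
```

Entropy is maximal (log of the number of topics) when citations come evenly from every topic.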
47 Some New Work on Topic Models
- Robustly capturing topic correlations
  - Pachinko Allocation Model
- Capturing phrases in topic-specific ways
  - Topical N-Gram Model
48 Pachinko Machine
49 Pachinko Allocation Model
Li, McCallum, 2005
[Figure: model structure, not the graphical model. A directed acyclic graph with root node 11; second-level nodes 21, 22 (distributions over distributions over topics...); third-level nodes 31, 32, 33 (distributions over topics: mixtures, representing topic correlations); fourth-level nodes 41-45 (distributions over words, like LDA topics); leaves word1-word8.]
Some interior nodes could contain one multinomial, used for all documents (i.e., a very peaked Dirichlet).
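A minimal generative sketch of a four-level PAM (root, super-topics, sub-topics, words): per document, the root and each super-topic draw a multinomial over their children from a Dirichlet, and each word is generated by walking root, then super-topic, then sub-topic, and sampling from that sub-topic's fixed word distribution. The sizes, the single symmetric Dirichlet shared by all super-topics, and the vocabulary are simplifying assumptions.

```python
import random
from typing import List


def dirichlet(alpha: List[float], rng: random.Random) -> List[float]:
    """Sample from a Dirichlet by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(n_words: int, n_super: int, n_sub: int,
                      topic_words: List[List[float]], vocab: List[str],
                      alpha_root: float = 1.0, alpha_super: float = 0.5,
                      seed: int = 0) -> List[str]:
    """Four-level PAM sketch: root -> super-topic -> sub-topic -> word.

    topic_words[k] is sub-topic k's (fixed) distribution over vocab."""
    rng = random.Random(seed)
    # Per-document multinomials: root over super-topics, each super-topic over sub-topics.
    theta_root = dirichlet([alpha_root] * n_super, rng)
    theta_super = [dirichlet([alpha_super] * n_sub, rng) for _ in range(n_super)]
    words = []
    for _ in range(n_words):
        s = rng.choices(range(n_super), weights=theta_root)[0]
        k = rng.choices(range(n_sub), weights=theta_super[s])[0]
        words.append(rng.choices(vocab, weights=topic_words[k])[0])
    return words


vocab = [f"word{i}" for i in range(1, 9)]
# Five sub-topics, each putting equal weight on two of the eight words.
topic_words = [[1 if j in (k, k + 3) else 0 for j in range(8)] for k in range(5)]
doc = generate_document(20, 2, 5, topic_words, vocab)
```

Because each super-topic has its own mixture over sub-topics, sub-topics that tend to appear together share a parent, which is how PAM represents topic correlations.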
50 Topic Coherence Comparison
LDA, 20 topics ("models, estimation, stopwords"): models model parameters distribution bayesian probability estimation data gaussian methods likelihood em mixture show approach paper density framework approximation markov
LDA, 100 topics ("estimation, some junk"): estimation likelihood maximum noisy estimates mixture scene surface normalization generated measurements surfaces estimating estimated iterative combined figure divisive sequence ideal
Example super-topic:
  33: input hidden units function number
  27: estimation bayesian parameters data methods
  24: distribution gaussian markov likelihood mixture
  11: exact kalman full conditional deterministic
  1: smoothing predictive regularizers intermediate slope
51 Topic Correlations in PAM
5000 research paper abstracts, from across all of CS
Numbers on edges are the super-topics' Dirichlet parameters.
52 Likelihood Comparison
53 Want to Model Trends over Time
- Is the prevalence of a topic growing or waning?
- A pattern may appear only briefly
  - Capture its statistics in a focused way
  - Don't confuse it with patterns elsewhere in time
- How do roles, groups, and influence shift over time?
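54 Topics over Time (TOT)
Wang, McCallum 2006
[Figure: TOT graphical model. For each of D documents, a multinomial over topics is drawn from a Dirichlet. For each of the N_d tokens, a topic index z selects a word w from that topic's multinomial over words (T topics, Dirichlet prior) and a timestamp t from that topic's Beta over time (T topics, uniform prior).]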
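Each TOT topic has a Beta distribution over timestamps, which requires times in (0, 1). One simple way to fit a topic's Beta from the timestamps of the tokens currently assigned to it is the method of moments, sketched below; using this estimator, the rescaling scheme, and the epsilon margin are assumptions for illustration.

```python
from statistics import fmean, pvariance
from typing import List, Tuple


def rescale(times: List[float], lo: float, hi: float, eps: float = 1e-3) -> List[float]:
    """Map timestamps in [lo, hi] into the open interval (0, 1) required by a Beta."""
    return [eps + (1 - 2 * eps) * (t - lo) / (hi - lo) for t in times]


def beta_moments(ts: List[float]) -> Tuple[float, float]:
    """Method-of-moments Beta fit: match the sample mean m and variance v.

    Requires v > 0 and v < m * (1 - m)."""
    m, v = fmean(ts), pvariance(ts)
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common


a, b = beta_moments([0.3, 0.7])
```

A topic whose tokens cluster in a narrow time window gets a small variance and therefore large, sharply peaked Beta parameters, which is how TOT captures briefly appearing patterns.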
55 State of the Union Address
208 addresses delivered between January 8, 1790 and January 29, 2002.
- To increase the number of documents, we split the addresses into paragraphs and treated them as documents. One-line paragraphs were excluded. Stopwords were removed.
- 17,156 documents
- 21,534 words
- 669,425 tokens
Our scheme of taxation, by means of which this
needless surplus is taken from the people and put
into the public Treasury, consists of a tariff
or duty levied upon importations from abroad and
internal-revenue taxes levied upon the
consumption of tobacco and spirituous and malt
liquors. It must be conceded that none of the
things subjected to internal-revenue
taxation are, strictly speaking, necessaries.
There appears to be no just complaint of this
taxation by the consumers of these articles, and
there seems to be nothing so well able to bear
the burden without hardship to any portion of the
people.
1910
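The preprocessing described above (paragraphs as documents, one-line paragraphs dropped, stopwords removed) might look like the sketch below. It assumes paragraphs are separated by blank lines and takes "one-line" literally as a paragraph occupying a single line of text; the short stopword list is a placeholder for a real one.

```python
import re
from typing import List

STOPWORDS = {"the", "of", "and", "to", "a", "in", "is", "this", "it", "be"}  # placeholder list


def address_to_documents(text: str) -> List[List[str]]:
    """Split one address into paragraph-documents of lowercase, stopped tokens."""
    docs = []
    for para in re.split(r"\n\s*\n", text.strip()):
        lines = [ln for ln in para.splitlines() if ln.strip()]
        if len(lines) <= 1:  # exclude one-line paragraphs
            continue
        tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", para.lower())
        docs.append([t for t in tokens if t not in STOPWORDS])
    return docs


sample = ("Fellow citizens.\n\n"
          "Our scheme of taxation consists of a tariff\n"
          "and internal-revenue taxes.\n\n"
          "Thank you.")
docs = address_to_documents(sample)
```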
56 Comparing TOT against LDA
57 Topic Distributions Conditioned on Time
[Figure: NIPS vol. 1-14; topic mass (vertical height) plotted against time.]
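Plots of topic mass over time can be read off a fitted model by counting, in each time slice, the share of tokens assigned to each topic. The (volume, topic) token assignments below are hypothetical; with TOT they would come from the sampler's topic assignments.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def topic_mass_by_time(assignments: List[Tuple[int, str]]) -> Dict[int, Dict[str, float]]:
    """For each time slice, the fraction of tokens assigned to each topic."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for when, topic in assignments:
        counts[when][topic] += 1
    return {when: {topic: n / sum(c.values()) for topic, n in c.items()}
            for when, c in sorted(counts.items())}


mass = topic_mass_by_time([(1, "topic_a"), (1, "topic_a"), (1, "topic_b"),
                           (14, "topic_b"), (14, "topic_b"), (14, "topic_a")])
```

Stacking each topic's share per slice gives the vertical heights in such a plot.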