Title: Modelling Human Thematic Fit Judgments
1 Modelling Human Thematic Fit Judgments
- IGK Colloquium
- 3/2/2005
- Ulrike Padó
2 Overview
- (Very) quick introduction to my framework
- Testing the Semantic Module
- Different input corpora
- Smoothing
- Comparing the Semantic Module to standard
selectional preference methods
3 Modelling Semantic Processing
- General idea: build a
- probabilistic
- large-scale
- broad-coverage
- model of syntactic and semantic sentence processing
4 Semantic Processing
- Assign thematic roles on the basis of co-occurrence statistics from semantically annotated corpora
- Corpus-based frequency estimates of
- Semantic Subcategorisation (probability of seeing the role with the verb)
- Selectional Preferences (probability of seeing the argument head in a role, given the verb frame; both estimates are sketched below)
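As a minimal illustration of these two estimates, here is a toy sketch with invented counts and function names (the frame conditioning is omitted for brevity; this is not the actual system):

    from collections import Counter

    # Toy semantically annotated corpus: (verb, role, argument head) triples.
    triples = [
        ("eat", "Agent", "man"), ("eat", "Patient", "apple"),
        ("eat", "Patient", "bread"), ("eat", "Agent", "dog"),
    ]

    verb_counts = Counter(v for v, _, _ in triples)
    role_counts = Counter((v, r) for v, r, _ in triples)
    head_counts = Counter(triples)

    def p_role(role, verb):
        """Semantic subcategorisation: P(role | verb)."""
        return role_counts[(verb, role)] / verb_counts[verb]

    def p_head(head, role, verb):
        """Selectional preference: P(head | role, verb)."""
        return head_counts[(verb, role, head)] / role_counts[(verb, role)]

    # Thematic fit of "apple" as Patient of "eat": product of the two estimates.
    print(p_role("Patient", "eat") * p_head("apple", "Patient", "eat"))  # 0.25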
5 Testing the Semantic Module
- Evaluate just the thematic fit of verbs and argument phrases
- Evaluation
- Correlate predictions with human judgments (sketch below)
- Role labelling (prefer the correct role)
- Try
- Different input corpora
- Smoothing
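The ρ values reported later are rank correlations between model scores and human ratings; assuming Spearman's ρ (the slides only write ρ), the evaluation step looks roughly like this, with invented numbers:

    from scipy.stats import spearmanr

    # Hypothetical model fit scores and mean human plausibility ratings
    # for the same verb-argument-role data points.
    model_scores = [0.25, 0.10, 0.02, 0.40, 0.07]
    human_ratings = [6.1, 4.3, 1.8, 6.5, 3.0]

    rho, p = spearmanr(model_scores, human_ratings)
    print(f"rho = {rho:.2f}, p = {p:.3f}")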
6 Training Data
- Frequency counts from
- the PropBank (ca. 3000 verb types)
- Very specific domain
- Relatively flat, syntax-based annotation
- FrameNet (ca. 1500 verb types)
- Deep semantic annotation: frames code situations, grouping verbs that describe similar events together with their arguments
- Extracted from a balanced corpus
- Skewed sample through frame-wise annotation
7 Development/Test Data
- Development: 60 verb-argument pairs from McRae et al. (1998)
- Two judgments for each data point: Agent/Patient
- Used to determine the optimal parameters of the clustering (number of clusters, smoothing)
- Test: 50 verb-argument pairs, 100 data points
8 Sparse Data
- Raw frequencies are sparse
- 1 (Dev)/2 (Test) pairs seen in PropBank
- 0 (Dev)/2 (Test) pairs seen in FrameNet
- Use semantic classes as a level of abstraction: class-based smoothing
9 Smoothing
- Reconstruct probabilities for unseen data
- Smoothing by verb and noun classes
- Count class members instead of word tokens (see the sketch below)
- Compare two alternatives
- Hand-constructed classes
- Induced verb classes (clustering)
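A minimal sketch of the class-based idea, assuming flat word-to-class mappings (all names and classes are invented): counts are collected over classes instead of word tokens, so an unseen word inherits the estimate of its class.

    from collections import Counter

    # Invented flat mappings from words to semantic classes.
    noun_class = {"apple": "FOOD", "bread": "FOOD", "mango": "FOOD", "man": "PERSON"}
    verb_class = {"eat": "INGEST", "devour": "INGEST"}

    # Toy triples, mapped onto classes before counting.
    triples = [("eat", "Patient", "apple"), ("eat", "Patient", "bread"),
               ("devour", "Patient", "apple"), ("eat", "Agent", "man")]
    class_triples = [(verb_class[v], r, noun_class[h]) for v, r, h in triples]

    class_counts = Counter(class_triples)
    role_counts = Counter((vc, r) for vc, r, _ in class_triples)

    def smoothed_p_head(head, role, verb):
        """P(head class | role, verb class): nonzero even for unseen words."""
        vc, hc = verb_class[verb], noun_class[head]
        return class_counts[(vc, role, hc)] / role_counts[(vc, role)]

    # "mango" never occurs in the corpus, but its class FOOD does:
    print(smoothed_p_head("mango", "Patient", "eat"))  # 3/3 = 1.0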
10 Hand-constructed Verb and Noun Classes
- WordNet: use the top-level ontology and synsets as noun classes
- VerbNet: use the top-level classes for verbs
- Presumably correct and reliable
- Result: no significant correlations with human data for either training corpus
11 Induced Verb Classes
- Automatically cluster verbs (sketched below)
- Group by similarities of argument heads, paths from argument to verb, frame, role labels
- Determine the optimal number of clusters and the parameters of the clustering algorithm on the development set
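A rough sketch of this step, assuming bag-of-features vectors per verb and k-means clustering (the slides do not name the algorithm; all feature names and data are invented):

    from sklearn.cluster import KMeans
    from sklearn.feature_extraction import DictVectorizer

    # Invented per-verb counts over argument heads, frames, and role labels.
    verb_features = {
        "eat":    {"head=apple": 3, "frame=Ingestion": 4, "role=Patient": 4},
        "devour": {"head=apple": 1, "frame=Ingestion": 2, "role=Patient": 2},
        "sleep":  {"frame=Sleep": 5, "role=Sleeper": 5},
    }

    verbs = list(verb_features)
    X = DictVectorizer().fit_transform([verb_features[v] for v in verbs])

    # The number of clusters is a free parameter, tuned on the development set.
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    print(dict(zip(verbs, labels)))  # e.g. {'eat': 0, 'devour': 0, 'sleep': 1}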
12 Induced Classes (PB/FN)

                        Data points covered   ρ / Significance
  Raw data (PB)         2                     - / -
  Raw data (FN)         2                     - / -
  All Arguments (PB)    59                    ns
  All Arguments (FN)    12                    ρ = 0.55 / p < 0.05
  Just NPs (PB)         48                    ns
  Just NPs (FN)         16                    ρ = 0.56 / p < 0.05
13 Results
- Hand-built classes do not work (with this amount of data)
- Module achieves reliable correlations with FN data
- Important result for the overall feasibility of my model
14 Adding Noun Classes (PB/FN)

                                  Data points covered   ρ / Significance
  Raw data (PB)                   2                     - / -
  Raw data (FN)                   2                     - / -
  PB, all args, noun classes      4                     ρ = 1 / p < 0.01
  FN, just NPs, noun classes      18                    ρ = 0.63 / p < 0.01
15 Results
- Hand-built classes do not work (with this amount of data)
- Module achieves reliable correlations with FN data
- Adding noun classes helps a little further
16 Comparison with Selectional Preference Methods
- Have established that our system reliably predicts human data
- How do we do in comparison to standard computational linguistics methods?
17 Selectional Preference Methods
- Clark & Weir (2002)
- Add data points by finding the topmost class in WN that still reliably mirrors the target word frequency
- Resnik (1996)
- Quantify the contribution of each WN class to the overall preference strength of the verb (formula restated below)
- Both rely on WN noun classes, no verb class smoothing
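For reference, Resnik's measure as standardly defined in the literature (restated here, not taken from the slides): the preference strength S(v) of a verb is the KL divergence between the class distribution it selects and the prior, and the selectional association of a class c is that class's share of the strength:

    S(v) = \sum_{c} P(c \mid v) \, \log \frac{P(c \mid v)}{P(c)}

    A(v, c) = \frac{1}{S(v)} \, P(c \mid v) \, \log \frac{P(c \mid v)}{P(c)}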
18 Selectional Preference Methods (PB/FN)

                      Data points covered   ρ / Significance      Labelling (Cov % / Acc %)
  Sem. Module 1       18                    ρ = 0.63 / p < 0.01   38 / 47.4
  Sem. Module 2       16                    ρ = 0.56 / p < 0.05   30 / 60
  Clark & Weir (PB)   72                    ns                    84 / 50
  Clark & Weir (FN)   23                    ns                    36 / 50
  Resnik (PB)         75                    ns                    74 / 48.6
  Resnik (FN)         46                    ns                    50 / 48
19 Results
- Too little input data
- No results for the selectional preference models
- Small coverage for the Semantic Module
- Semantic Module manages to make predictions all the same
- Relies on verb clusters: verbs are less sparse than nouns in small corpora
- Annotate a larger corpus with FN roles
20 Annotating the BNC
- Annotate a large, balanced corpus: the BNC
- More data points for verbs covered in FN
- More verb coverage (though purely syntactic annotation for unknown verbs)
- Results
- Annotation relatively sensible and reliable for non-FN verbs
- Frame-wise annotation in FN causes problems for FN verbs