Title: Automatic Acquisition of Paradigmatic Relations using Iterated Co-occurrences
1Automatic Acquisition of Paradigmatic Relations
using Iterated Co-occurrences
- Chris Biemann, Stefan Bordag, Uwe Quasthoff
- University of Leipzig, NLP Department
- LREC 2004, Learning Acquisition (II), 27th of
May 2004
2Sets of Words
- Our goal is the automatic extension of homogeneous word sets, e.g. WordNet synsets or small subtrees of some hierarchy
- We collect methods and apply them, eventually in combination
- Thought experiment: the computer as associator
  - Input: some example concepts
  - Detection of the relation
  - Output: additional instances
  - This can be done semi-supervised
- Necessary:
  - a very large text corpus
  - features
  - methods
3Statistical Co-occurrences
- Co-occurrence: the occurrence of two or more words within a well-defined unit of information (sentence, nearest neighbors)
- Significant co-occurrences reflect relations between words
- Significance measure (log-likelihood), computed from:
  - k: the number of sentences containing a and b together
  - ab: (number of sentences with a) × (number of sentences with b)
  - n: the total number of sentences in the corpus
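The text names the inputs k, ab and n but not the formula itself. As a minimal sketch only, one way to turn these quantities into a score is the negative log-probability of seeing k joint sentences under a Poisson model with expected count ab/n. Whether this matches the authors' exact measure is an assumption, and the function name and counts below are hypothetical.

```python
import math

def poisson_significance(k: int, a: int, b: int, n: int) -> float:
    """Negative log-probability of k joint sentences under a Poisson model.

    Assumption: the expected joint count is lam = a * b / n. This is an
    illustrative stand-in, not necessarily the measure used on the slide.
    """
    lam = a * b / n
    # -ln(e^-lam * lam^k / k!) = lam - k*ln(lam) + ln(k!)
    return lam - k * math.log(lam) + math.lgamma(k + 1)

# Hypothetical counts: a occurs in 10 sentences, b in 20, corpus of 100 sentences.
near_expected = poisson_significance(k=2, a=10, b=20, n=100)
frequent_pair = poisson_significance(k=10, a=10, b=20, n=100)
print(near_expected, frequent_pair)
```

With these counts the expected number of joint sentences is 2, so a pair seen together 10 times scores much higher than a pair seen together twice.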
4Iterating Co-occurrences
- (Sentence-based) co-occurrences of first order: words that co-occur significantly often together in sentences
- Co-occurrences of second order: words that co-occur significantly often in collocation sets of first order
- Co-occurrences of n-th order: words that co-occur significantly often in collocation sets of (n-1)th order
- When calculating a higher order, the significance values of the preceding order are not relevant. A co-occurrence set consists of the N highest ranked co-occurrences of a word.
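The last point can be sketched as follows: each word keeps only its N best-ranked partners, and only set membership (not the score) is passed on to the next order. The function name, the pair scores and the value of N are assumptions chosen for illustration.

```python
from collections import defaultdict

def top_n_sets(scores, n_best):
    """Build each word's co-occurrence set from its n_best highest-scored partners.

    scores maps an unordered word pair (w1, w2) to a significance value.
    The returned sets carry no scores, only membership.
    """
    partners = defaultdict(list)
    for (w1, w2), sig in scores.items():
        partners[w1].append((sig, w2))
        partners[w2].append((sig, w1))
    return {
        word: {other for _, other in sorted(cands, reverse=True)[:n_best]}
        for word, cands in partners.items()
    }

# Hypothetical significance values for a few word pairs.
scores = {("dog", "barking"): 9.1, ("dog", "bite"): 4.2,
          ("dog", "yelp"): 7.5, ("cat", "bite"): 3.0}
sets = top_n_sets(scores, n_best=2)
print(sets["dog"])
```

Here "dog" keeps "barking" and "yelp" and drops "bite", its lowest-ranked partner.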
5Constructed Example I
Ord 1    dog      terrier  cat      mouse    barking  bite     yelp
dog               -        -        -        X        x        X
terrier  -                 -        -        x        x        X
cat      -        -                 x        -        x        -
mouse    -        -        X                 -        x        -
barking  X        X        -        -                 -        -
bite     X        X        x        x        -                 -
yelp     x        x        -        -        -        -

Ord 2    dog      terrier  cat      mouse    barking  bite     yelp
dog               3        1        1        -        -        -
terrier  3                 1        1        -        -        -
cat      1        1                 1        -        -        -
mouse    1        1        1                 -        1        -
barking  -        -        -        -                 2        2
bite     -        -        -        1        2                 2
yelp     -        -        -        -        2        2
6Constructed Example II
Ord 2    dog      terrier  cat      mouse    barking  bite     yelp
dog               x        -        -        -        -        -
terrier  x                 -        -        -        -        -
cat      -        -                 -        -        -        -
mouse    -        -        -                 -        -        -
barking  -        -        -        -                 x        x
bite     -        -        -        -        x                 x
yelp     -        -        -        -        x        x

Ord 3    dog      terrier  cat      mouse    barking  bite     yelp
dog               -        -        -        -        -        -
terrier  -                 -        -        -        -        -
cat      -        -                 -        -        -        -
mouse    -        -        -                 -        -        -
barking  -        -        -        -                 1        1
bite     -        -        -        -        1                 1
yelp     -        -        -        -        1        1
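The Ord 3 counts can be reproduced from the Ord 2 marks by counting, for each pair of words, how many members their co-occurrence sets share. This is a minimal sketch, assuming the raw shared-member count stands in for significance as it appears to in this constructed example; the names are illustrative.

```python
from itertools import combinations

# Ord 2 co-occurrence sets read off the marks in the table (x = member).
ord2 = {
    "dog": {"terrier"}, "terrier": {"dog"},
    "cat": set(), "mouse": set(),
    "barking": {"bite", "yelp"},
    "bite": {"barking", "yelp"},
    "yelp": {"barking", "bite"},
}

def shared_counts(sets):
    """Count shared set members for every unordered pair of words."""
    return {
        (w1, w2): len(sets[w1] & sets[w2])
        for w1, w2 in combinations(sorted(sets), 2)
    }

ord3 = shared_counts(ord2)
for pair, count in ord3.items():
    if count:
        print(pair, count)
```

Only the three pairs among barking, bite and yelp share a member (one each). Dog and terrier point only at each other, so they share nothing and drop out, as in the table.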
7Properties of Iterated Co-occurrences
- After some iterations the sets remain more or less stable
- The sets are somewhat semantically homogeneous
- Sometimes they have nothing to do with the reference word
- Calculations were performed up to the 10th order
- Example: top 20 NB-collocations of 10th order for erklärte [explained]: sagte, schwärmte, lobt, schimpfte, meinte, jubelte, lobte, resümierte, schwärmt, Reinhard Heß, ärgerte, kommentierte, urteilte, analysierte, bilanzierte, freute, freute sich, Bundestrainer, freut, gefreut [said, enthused, praises, grumbled, meant, was jubilant, praised, summarized, dreamt, Reinhard Hess, annoyed, commentated, judged, analyzed, balanced, made happy, was pleased, coach of the national team, is pleased, been pleased]
8Mapping co-occurrences to graphs
- For all words having co-occurrences, form nodes in a graph.
- Connect them all by edges; initialize each edge weight with 0.
- For every co-occurrence of two words in a sentence, increase the edge weight by the significance.
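The mapping can be sketched as a dictionary of edge weights. The `significance` callback, the toy corpus and the constant weight of 1.0 are assumptions for illustration. Edges that never receive weight are simply left out, which is equivalent to keeping them at 0.

```python
from collections import defaultdict
from itertools import combinations

def build_graph(sentences, significance):
    """Accumulate edge weights over all word pairs that share a sentence.

    significance is a hypothetical callback (w1, w2) -> float.
    Edge keys are word pairs in alphabetical order.
    """
    weights = defaultdict(float)
    for sentence in sentences:
        for w1, w2 in combinations(sorted(set(sentence)), 2):
            weights[(w1, w2)] += significance(w1, w2)
    return dict(weights)

# Toy corpus; a constant significance of 1.0 is an assumption.
corpus = [["dog", "barking"], ["dog", "bite"], ["dog", "barking", "yelp"]]
graph = build_graph(corpus, lambda w1, w2: 1.0)
print(graph)
```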
9First Iteration Step
- The two black nodes A and B get connected in this step if there are many nodes C which are connected to both A and B
- The more Cs, the higher the weight of the new edge
[Figure legend: existing connection, new connection]
10Second Iteration Step
- The two black nodes A and B get connected in this step if there are many (dark grey) nodes Ds which are connected to both A and B.
- The connections between the nodes Ds and the nodes A and B were constructed because of (light grey) nodes Es and Fs, respectively.
[Figure: black nodes A and B, bridging nodes Ds, Es and Fs; legend: former connection, existing connection, new connection]
11Collapsing bridging nodes
- The upper bound for path length in iteration n is 2^n.
- However, some of the bridging nodes collapse, giving rise to self-sustaining clusters of arbitrary path length, which are invariant under iteration.
[Figure: the upper 5 nodes form an invariant cluster; A and B are being absorbed by this cluster]
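How a cluster absorbs outside nodes and then stays invariant can be illustrated on a toy graph. This is a sketch under stated assumptions: edges are unweighted, two nodes are connected in the next order if and only if they share at least `min_common` neighbours, and the graph (a 5-clique plus two outside nodes A and B) is invented to resemble the figure, not taken from it.

```python
from itertools import combinations

def iterate_once(adj, min_common):
    """Connect two nodes iff they share at least min_common neighbours."""
    new_adj = {node: set() for node in adj}
    for u, v in combinations(adj, 2):
        if len(adj[u] & adj[v]) >= min_common:
            new_adj[u].add(v)
            new_adj[v].add(u)
    return new_adj

def iterate_until_stable(adj, min_common, max_steps=10):
    """Repeat the iteration step until the graph stops changing."""
    for step in range(1, max_steps + 1):
        nxt = iterate_once(adj, min_common)
        if nxt == adj:
            return adj, step - 1  # number of steps that changed the graph
        adj = nxt
    return adj, max_steps

# Hypothetical toy graph: a 5-clique c1..c5; A and B each touch three of its nodes.
clique = ["c1", "c2", "c3", "c4", "c5"]
adj = {c: set(clique) - {c} for c in clique}
adj["A"] = {"c1", "c2", "c3"}
adj["B"] = {"c3", "c4", "c5"}
for outsider in ("A", "B"):
    for c in adj[outsider]:
        adj[c].add(outsider)

stable, steps = iterate_until_stable(adj, min_common=2)
print(steps, {node: sorted(nbrs) for node, nbrs in stable.items()})
```

In this toy setting A and B first become connected to every clique node and then to each other. After that, further iterations leave the resulting seven-node cluster unchanged.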
12Examples of Iterated Co-occurrences
13Intersection of Co-occurrence Sets: resolving ambiguity
[Figure: co-occurrence sets of the reference words Herz-Bube (jack of hearts), Becker and Stich; word groups are listed as they appear in the figure. Stich can mean a trick in card games or the tennis player Michael Stich.]
Herz-Bube
Becker
bedient - folgenden - gereizt - Karo-Buben - Karo-Dame - Karo-König - Karte - Karten - Kreuz-Ass - Kreuz-Dame - Kreuz-Hand - Kreuz-König - legt - Mittelhand - Null ouvert - Pik - Pik-Ass - Pik-Dame - schmiert - Skat - spielt - Spielverlauf - sticht - übernimmt - zieht
Agassi - Australian Open - Bindewald - Boris - Break - Chang - Dickhaut - gewann - Ivanisevic - Kafelnikow - Kiefer - Komljenovic - Leimen - Matchball - Michael Stich - Monte Carlo - Prinosil - Sieg - Spiel - spielen - Steeb - Teamchef
Stich
Achtelfinale - Aufschlag - Boris Becker - Daviscup - Doppel - DTB - Edberg - Finale - Graf - Haas - Halbfinale - Match - Pilic - Runde - Sampras - Satz - Tennis - Turnier - Viertelfinale - Weltrangliste - Wimbledon
Alleinspieler - Herz - Herz-Dame - Herz-König - Hinterhand - Karo - Karo-As - Karo-Bube - Kreuz-As - Kreuz-Bube - Pik-As - Pik-Bube - Pik-König - Vorhand
Becker - Courier - Einzel - Elmshorn - French Open - Herz-As - ins - Kafelnikow - Karbacher - Krajicek - Kreuz-As - Kreuz-Bube - Michael Stich - Mittelhand - Pik-As - Pik-Bube - Pik-König
Stich
14Example: NB-collocations of 2nd order for warm, kühl, kalt
- Intersection and filtering for adjectives of the collocation sets for warm, kühl, kalt [warm, cool, cold] results in: abgekühlt, aufgeheizt, eingefroren, erhitzt, erwärmt, gebrannt, gelagert, heiß, heruntergekühlt, verbrannt, wärmer [cooled down, heated, frozen, heated up, warmed, burned, stored, hot, down-cooled, burned, warmer]
- The emotional reading abweisend [repelling] for kühl, kalt is eliminated
15Detection of X-onyms: synonyms, antonyms, (co-)hyponyms...
- Idea: the intersection of the co-occurrence sets of two X-onyms as reference words should contain X-onyms
- Lexical ambiguity of one reference word does not deteriorate the result set
- Method:
  - detect the word class of the reference words
  - calculate co-occurrences for the reference words
  - filter the co-occurrences w.r.t. the word class of the reference words (by means of POS tags)
  - intersect the co-occurrence sets
  - output the result
- Ranking can be realized over the significance values of the co-occurrences
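The method steps can be sketched in a few lines. Everything concrete here is an assumption: the `cooc` and `pos_tag` lookups stand in for corpus-derived resources, the toy entries are invented, and ranking by summed significance is only one possible reading of the last bullet.

```python
def xonym_candidates(ref_words, cooc, pos_tag):
    """Intersect POS-filtered co-occurrence sets of the reference words.

    cooc: word -> {co-occurring word: significance}; pos_tag: word -> tag.
    Assumes all reference words share the word class of the first one.
    """
    word_class = pos_tag[ref_words[0]]
    filtered = [
        {w: sig for w, sig in cooc[ref].items() if pos_tag.get(w) == word_class}
        for ref in ref_words
    ]
    common = set.intersection(*(set(f) for f in filtered)) - set(ref_words)
    # Rank by summed significance across reference words (an assumption).
    return sorted(common, key=lambda w: -sum(f[w] for f in filtered))

# Toy data, invented for illustration only.
pos_tag = {"warm": "ADJ", "kalt": "ADJ", "heiß": "ADJ", "wärmer": "ADJ",
           "Wasser": "NOUN", "abweisend": "ADJ"}
cooc = {
    "warm": {"heiß": 8.0, "wärmer": 5.0, "Wasser": 9.0},
    "kalt": {"heiß": 6.0, "wärmer": 7.5, "Wasser": 4.0, "abweisend": 3.0},
}
result = xonym_candidates(["warm", "kalt"], cooc, pos_tag)
print(result)
```

In the toy data the noun is removed by the POS filter, and "abweisend", which co-occurs with only one reference word, is removed by the intersection.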
16Mini-Evaluation
- Experiments for different data sources, NB-collocations of 2nd and 3rd order
- Fraction of X-onyms in TOP 5 higher than in TOP 10 → ranking method makes sense
- Intersection of 2nd-order and 3rd-order collocations almost always empty → different orders exhibit different relations
- Satisfactory quantity, more through larger corpora
- Quality not precise enough for unsupervised extension
17Word Sets for Thesaurus Expansion
- Application: thesaurus expansion
- Start set: warm, kalt [warm, cold]; result set: heiß, wärmer, kälter, erwärmt, gut, heißer, hoch, höher, niedriger, schlecht, frei [hot, warmer, colder, warmed, good, hotter, high, higher, lower, bad, free]
- Start set: gelb, rot [yellow, red]; result set: blau, grün, schwarz, grau, bunt, leuchtend, rötlich, braun, dunkel, rotbraun, weiß [blue, green, black, grey, colorful, bright, reddish, brown, dark, red-brown, white]
- Start set: Mörder, Killer [murderer, killer]; result set: Täter, Straftäter, Verbrecher, Kriegsverbrecher, Räuber, Terroristen, Mann, Mitglieder, Männer, Attentäter [offender, delinquent, criminal, war criminal, robber, terrorists, man, members, men, assassin]
18More Examples in English
- Intersection of N2-Order collocation sets
19Questions?