Automatic Acquisition of Paradigmatic Relations using Iterated Co-occurrences

1
Automatic Acquisition of Paradigmatic Relations
using Iterated Co-occurrences
  • Chris Biemann, Stefan Bordag, Uwe Quasthoff
  • University of Leipzig, NLP Department
  • LREC 2004, Learning Acquisition (II), 27th of
    May 2004

2
Sets of Words
  • Our goal is the automatic extension of homogeneous
    word sets, e.g. WordNet synsets or small subtrees
    of some hierarchy
  • We collect methods and apply them, possibly in
    combination
  • Thought experiment: the computer as associator
      - Input: some example concepts
      - Detection of the relation
      - Output: additional instances
    This can be done semi-supervised
  • Necessary:
      - a very large text corpus
      - features
      - methods

3
Statistical Co-occurrences
  • Co-occurrence: the occurrence of two or more words
    within a well-defined unit of information
    (sentence, nearest neighbors)
  • Significant co-occurrences reflect relations
    between words
  • Significance measure (log-likelihood), where
      - k is the number of sentences containing a and
        b together
      - ab is (number of sentences with a) × (number
        of sentences with b)
      - n is the total number of sentences in the
        corpus
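
The formula itself did not survive in this transcript. As a rough illustration only, the sketch below computes the standard Dunning-style log-likelihood ratio from the same three quantities; this is an assumption and may differ from the variant the authors used. The function name `log_likelihood` and its parameter names `a` and `b` (sentence counts for each word) are hypothetical.

```python
import math

def log_likelihood(k, a, b, n):
    """Dunning-style log-likelihood ratio (G^2) for a 2x2 contingency table.

    k: sentences containing both words
    a, b: sentences containing the first / the second word
    n: total number of sentences in the corpus
    """
    observed = [k, a - k, b - k, n - a - b + k]
    # Expected counts if the two words were independent; note a*b/n.
    expected = [a * b / n, a * (n - b) / n, (n - a) * b / n, (n - a) * (n - b) / n]
    # Cells with an observed count of 0 contribute nothing (0 * log 0 := 0).
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
```

The score is 0 when k equals the count expected under independence (ab/n) and grows as k moves away from it, so in practice only pairs with k above ab/n would be treated as significant co-occurrences.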

4
Iterating Co-occurrences
  • (sentence-based) co-occurrences of first order:
    words that co-occur significantly often together
    in sentences
  • co-occurrences of second order: words that
    co-occur significantly often in collocation sets
    of first order
  • co-occurrences of n-th order: words that co-occur
    significantly often in collocation sets of
    (n-1)th order
  • When calculating a higher order, the significance
    values of the preceding order are not relevant. A
    co-occurrence set consists of the N highest
    ranked co-occurrences of a word.

5
Constructed Example I
Ord 1    dog  terrier  cat  mouse  barking  bite  yelp
dog           -        -    -      X        x     X
terrier  -             -    -      x        x     X
cat      -    -             x      -        x     -
mouse    -    -        X           -        x     -
barking  X    X        -    -               -     -
bite     X    X        x    x      -              -
yelp     x    x        -    -      -        -

Ord 2    dog  terrier  cat  mouse  barking  bite  yelp
dog           3        1    1      -        -     -
terrier  3             1    1      -        -     -
cat      1    1             1      -        -     -
mouse    1    1        1           -        1     -
barking  -    -        -    -               2     2
bite     -    -        -    1      2              2
yelp     -    -        -    -      2        2
6
Constructed Example II
Ord 2    dog  terrier  cat  mouse  barking  bite  yelp
dog           x        -    -      -        -     -
terrier  x             -    -      -        -     -
cat      -    -             -      -        -     -
mouse    -    -        -           -        -     -
barking  -    -        -    -               x     x
bite     -    -        -    -      x              x
yelp     -    -        -    -      x        x

Ord 3    dog  terrier  cat  mouse  barking  bite  yelp
dog           -        -    -      -        -     -
terrier  -             -    -      -        -     -
cat      -    -             -      -        -     -
mouse    -    -        -           -        -     -
barking  -    -        -    -               1     1
bite     -    -        -    -      1              1
yelp     -    -        -    -      1        1
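
The two constructed examples can be replayed with a small sketch. It treats every co-occurrence set as a "sentence" and counts, for each word pair, in how many sets both words appear. The function `next_order` and its `min_count` parameter are hypothetical, and a raw count threshold stands in for the significance measure; the threshold of 2 is an assumption inferred from which pairs are marked in the second Ord 2 table.

```python
from collections import Counter
from itertools import combinations

def next_order(sets, min_count=1):
    """Count, for each word pair, the co-occurrence sets containing both words."""
    counts = Counter()
    for members in sets.values():
        for a, b in combinations(sorted(members), 2):
            counts[(a, b)] += 1
    result = {w: {} for w in sets}
    for (a, b), c in counts.items():
        if c >= min_count:  # simplification: raw count instead of significance
            result.setdefault(a, {})[b] = c
            result.setdefault(b, {})[a] = c
    return result

# First-order sets read off the Ord 1 table.
order1 = {
    "dog": {"barking", "bite", "yelp"},
    "terrier": {"barking", "bite", "yelp"},
    "cat": {"mouse", "bite"},
    "mouse": {"cat", "bite"},
    "barking": {"dog", "terrier"},
    "bite": {"dog", "terrier", "cat", "mouse"},
    "yelp": {"dog", "terrier"},
}

order2 = next_order(order1, min_count=2)
order3 = next_order({w: set(n) for w, n in order2.items()})
```

Under these assumptions dog and terrier share three first-order neighbors, the group barking, bite, yelp is linked pairwise in the second order, and in the third order only that group remains linked while dog and terrier no longer connect to anything. The sketch is not claimed to match every cell of the unthresholded Ord 2 table.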
7
Properties of Iterated Co-occurrences
  • after some iterations the sets remain more or
    less stable
  • the sets are somewhat semantically homogeneous
  • sometimes they have nothing to do with the
    reference word
  • calculations were performed up to the 10th order
  • Example: top 20 NB-collocations of 10th order
    for erklärte [explained]: sagte, schwärmte,
    lobt, schimpfte, meinte, jubelte, lobte,
    resümierte, schwärmt, Reinhard Heß, ärgerte,
    kommentierte, urteilte, analysierte, bilanzierte,
    freute, freute sich, Bundestrainer, freut,
    gefreut [said, enthused, praises, grumbled,
    opined, was jubilant, praised, summarized,
    enthuses, Reinhard Hess, annoyed, commented,
    judged, analyzed, took stock, made happy, was
    pleased, coach of the national team, is pleased,
    been pleased]

8
Mapping co-occurrences to graphs
  • For all words having co-occurrences, form nodes
    in a graph.
  • Connect them all by edges, initialize edge weight
    with 0
  • For every co-occurrence of two words in a
    sentence, increase edge weight by significance

9
First Iteration Step
  • The two black nodes A and B get connected in this
    step if there are many nodes C that are
    connected to both A and B
  • The more Cs, the higher the weight of the new edge

[Figure legend: existing connection, new connection]
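
One way to write this step down, as a sketch: the notation below is not from the slides. It assumes that the new weight is simply the number of shared neighbors, with E_n the edge set after n steps and M_n its 0/1 adjacency matrix.

```latex
w_{n+1}(A, B)
  \;=\; \bigl|\{\, C : \{A, C\} \in E_n \ \wedge\ \{C, B\} \in E_n \,\}\bigr|
  \;=\; \bigl(M_n^{2}\bigr)_{AB},
  \qquad A \neq B
```

Under this reading, A and B are joined in E_{n+1} when w_{n+1}(A, B) reaches some chosen threshold, and a weighted variant would replace the count by a significance value.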
10
Second Iteration Step
  • The two black nodes A and B get connected in this
    step if there are many (dark grey) nodes D that
    are connected to both A and B.
  • The connections between the nodes D and the
    nodes A and B were constructed because of the
    (light grey) nodes E and F, respectively

[Figure: nodes A and B, dark grey nodes Ds, light grey nodes Es and Fs; legend: former connection, existing connection, new connection]
11
Collapsing bridging nodes
  • Upper bound for path length in iteration n is 2^n.
  • However, some of the bridging nodes collapse,
    giving rise to self-sustaining clusters of
    arbitrary path length, which are invariant under
    iteration.

[Figure: the upper 5 nodes form an invariant cluster; A and B are being absorbed by this cluster]
12
Examples of Iterated Co-occurrences
13
Intersection of Co-occurrence Sets: resolving ambiguity
Herz-Bube
Becker
bedient - folgenden - gereizt - Karo-Buben -
Karo-Dame - Karo-König - Karte - Karten -
Kreuz-Ass - Kreuz-Dame - Kreuz-Hand -
Kreuz-König - legt - Mittelhand - Null ouvert
- Pik - Pik-Ass - Pik-Dame - schmiert - Skat
- spielt - Spielverlauf - sticht - übernimmt -
zieht -
Agassi - Australian Open - Bindewald - Boris
- Break - Chang - Dickhaut - - gewann -
Ivanisevic - Kafelnikow - Kiefer - Komljenovic
- Leimen - Matchball - Michael Stich - Monte
Carlo - Prinosil - Sieg - Spiel - spielen -
Steeb - Teamchef
Stich
Achtelfinale - Aufschlag - Boris Becker -
Daviscup - Doppel - DTB Edberg - Finale -
Graf - Haas - Halbfinale - Match - Pilic -
Runde - Sampras - Satz - Tennis - Turnier -
Viertelfinale - Weltrangliste - Wimbledon
Alleinspieler - Herz - Herz-Dame - Herz-König
- Hinterhand - Karo - Karo-As - Karo-Bube -
Kreuz-As - Kreuz-Bube - Pik-As - Pik-Bube -
Pik-König - Vorhand -
Becker - Courier - Einzel - Elmshorn - French
Open - Herz-As - ins - Kafelnikow - Karbacher
- Krajicek - Kreuz-As - Kreuz-Bube - Michael
Stich - Mittelhand - Pik-As - Pik-Bube -
Pik-König
Stich
14
Example: NB-collocations of 2nd order for warm, kühl,
kalt
  • Intersection of the collocation sets for warm,
    kühl, kalt [warm, cool, cold], filtered for
    adjectives, results in: abgekühlt, aufgeheizt,
    eingefroren, erhitzt, erwärmt, gebrannt,
    gelagert, heiß, heruntergekühlt, verbrannt,
    wärmer [cooled down, heated, frozen, heated up,
    warmed up, burned, stored, hot, cooled down,
    burned, warmer]
  • the emotional reading abweisend [repelling] of
    kühl, kalt is eliminated

15
Detection of X-onyms: synonyms, antonyms,
(co-)hyponyms, ...
  • Idea: the intersection of the co-occurrence sets
    of two X-onyms used as reference words should
    contain X-onyms
  • lexical ambiguity of one reference word does not
    deteriorate the result set
  • Method:
      - detect the word class of the reference words
      - calculate co-occurrences for the reference
        words
      - filter the co-occurrences w.r.t. the word
        class of the reference words (by means of
        POS tags)
      - perform the intersection of the co-occurrence
        sets
      - output the result
  • ranking can be realized over the significance
    values of the co-occurrences
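
The method steps can be sketched as follows. Everything here is illustrative: the function `xonym_candidates`, the POS tag names, the toy words, and the significance values are hypothetical, and ranking by summed significance is just one possible reading of the last bullet.

```python
def xonym_candidates(ref_words, cooc, pos):
    """cooc: {word: {co-occurring word: significance}}, pos: {word: POS tag}."""
    # Step 1: the reference words must agree on one word class.
    classes = {pos[w] for w in ref_words}
    if len(classes) != 1:
        raise ValueError("reference words must share one word class")
    (word_class,) = classes
    # Steps 2-3: take each reference word's co-occurrences, keep only that class.
    filtered = [
        {w: s for w, s in cooc[r].items() if pos.get(w) == word_class}
        for r in ref_words
    ]
    # Step 4: intersect the filtered sets (reference words themselves excluded).
    common = set.intersection(*(set(f) for f in filtered)) - set(ref_words)
    # Step 5: rank by summed significance (assumed ranking), ties alphabetically.
    return sorted(common, key=lambda w: (-sum(f[w] for f in filtered), w))

# Hypothetical toy data, not taken from the corpus used in the talk.
cooc = {
    "warm": {"heiß": 9.0, "kalt": 8.0, "Wetter": 7.0, "wärmer": 5.0},
    "kalt": {"heiß": 6.0, "warm": 8.0, "abweisend": 4.0, "wärmer": 3.0, "Wetter": 2.0},
}
pos = {"warm": "ADJ", "kalt": "ADJ", "heiß": "ADJ", "wärmer": "ADJ",
       "abweisend": "ADJ", "Wetter": "NOUN"}
candidates = xonym_candidates(["warm", "kalt"], cooc, pos)
```

In this toy data, abweisend occurs only with kalt, so the intersection drops it, and Wetter is removed by the word-class filter.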

16
Mini-Evaluation
  • Experiments on different data sources, using
    NB-collocations of 2nd and 3rd order
  • fraction of X-onyms in the top 5 is higher than
    in the top 10 → the ranking method makes sense
  • intersection of 2nd-order and 3rd-order
    collocations is almost always empty → different
    orders exhibit different relations
  • quantity is satisfactory; larger corpora would
    yield more
  • quality is not precise enough for unsupervised
    extension

17
Word Sets for Thesaurus Expansion
  • Application: thesaurus expansion
  • start set: warm, kalt [warm, cold]
    result set: heiß, wärmer, kälter, erwärmt, gut,
    heißer, hoch, höher, niedriger, schlecht, frei
    [hot, warmer, colder, warmed, good, hotter, high,
    higher, lower, bad, free]
  • start set: gelb, rot [yellow, red]
    result set: blau, grün, schwarz, grau, bunt,
    leuchtend, rötlich, braun, dunkel, rotbraun, weiß
    [blue, green, black, grey, colorful, bright,
    reddish, brown, dark, red-brown, white]
  • start set: Mörder, Killer [murderer, killer]
    result set: Täter, Straftäter, Verbrecher,
    Kriegsverbrecher, Räuber, Terroristen, Mann,
    Mitglieder, Männer, Attentäter [offender,
    delinquent, criminal, war criminal, robber,
    terrorists, man, members, men, assassin]

18
More Examples in English
  • Intersection of N2-Order collocation sets

19
Questions?
  • THANK YOU !