1
8. Lexical Acquisition
  • Manning & Schütze, Foundations of Statistical
    NLP

2
Contents
  • Lexical Acquisition
  • Evaluation Measures
  • Verb Subcategorization
  • Attachment Ambiguity
  • Selectional Preferences
  • Semantic Similarity
  • The Role of Lexical Acquisition in Statistical NLP

3
Lexical Acquisition
  • Acquisition of more complex syntactic and
    semantic properties of words (than in chapter 5)
  • Selectional preferences
  • Subcategorization frames
  • Semantic categorization
  • Develop algorithms and statistical techniques
    for filling the holes in existing machine-readable
    dictionaries (MRDs) by looking at the occurrence
    patterns of words in corpora

4
Evaluation Measures
  • The ultimate demonstration of success is showing
    improved performance on a task
  • Precision and recall
  • Precision = tp / (tp + fp)
  • Recall = tp / (tp + fn)
  • F measure = 2PR / (P + R), the harmonic mean of
    precision P and recall R
  • Fallout = fp / (fp + tn)
  • A measure of how hard it is to build a system
    that produces few false positives
  • ROC curve: the recall/fallout tradeoff
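
A minimal sketch of these measures in Python; the counts in
the example are invented:

    def precision(tp, fp):
        return tp / (tp + fp)

    def recall(tp, fn):
        return tp / (tp + fn)

    def f_measure(tp, fp, fn):
        # harmonic mean of precision and recall
        p, r = precision(tp, fp), recall(tp, fn)
        return 2 * p * r / (p + r)

    def fallout(fp, tn):
        return fp / (fp + tn)

    # Hypothetical system output: 50 tp, 10 fp, 25 fn, 915 tn
    print(precision(50, 10))      # 0.833...
    print(recall(50, 25))         # 0.666...
    print(f_measure(50, 10, 25))  # 0.740...
    print(fallout(10, 915))       # 0.0108...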

5
Verb Subcategorization
  • Verbs subcategorize for different syntactic
    categories
  • Verbs express their semantic arguments with
    different syntactic means
  • Subcategorization frame: a particular set of
    syntactic categories that a verb can appear with
    (examples in Table 8.2)
  • Donated a large sum of money to the church
  • Gave the church a large sum of money

6
Some subcategorization frames (Table 8.2)
7
Verb Subcategorization
  • Verb subcategorization frame helps parsing
  • She told the man where Peter grew up.
  • She found the place where Peter grew up.
  • Unfortunately, most dictionaries do not contain
    information on subcategorization frames
  • Do not cover all subcategorization frames
  • Do not have quantitative information
  • Acquisition of subcategorization information from
    corpora is necessary
  • Cope with the productivity of language
  • Supplement dictionaries

8
Verb Subcategorization
  • Learning algorithm proposed by Brent (1993):
    Lerner
  • Suppose we want to decide, based on corpus
    evidence, whether verb v takes frame f. Lerner
    makes this decision in two steps
  • Cues: define a regular pattern which indicates
    the presence of the frame with high certainty
  • Hypothesis testing: initially assume that the
    frame is not appropriate for the verb (null
    hypothesis H0); this hypothesis is rejected if the
    cues indicate with high probability that H0 is
    wrong
  • Cues are regular patterns used to find a
    subcategorization frame
  • Cue for frame NP (direct object):
    (OBJ | SUBJ_OBJ | CAP) (PUNC | CC)
  • a. I greet Peter,
  • b. I came Thursday, before the storm started
    (the cue also fires here, a false positive; see
    the sketch below)
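
A rough sketch of this cue detection; the pronoun lists and
the punctuation/conjunction sets below are illustrative
assumptions, not Brent's exact lexicon:

    # Cue for frame NP: (OBJ | SUBJ_OBJ | CAP) (PUNC | CC)
    OBJ = {"me", "him", "her", "us", "them"}       # object pronouns
    SUBJ_OBJ = {"you", "it"}                       # subject-or-object pronouns
    PUNC = {",", ".", ";", ":", "?", "!"}
    CC = {"if", "before", "after", "because", "while"}

    def is_cap(tok):
        return tok[:1].isupper()

    def cue_np(tokens, i):
        """True if the two tokens after position i (the verb) match the cue."""
        if i + 2 >= len(tokens):
            return False
        first, second = tokens[i + 1], tokens[i + 2]
        return ((first.lower() in OBJ or first.lower() in SUBJ_OBJ
                 or is_cap(first))
                and (second in PUNC or second.lower() in CC))

    print(cue_np("I greet Peter ,".split(), 1))                     # True
    print(cue_np("I came Thursday , before the storm".split(), 1))  # True (false positive)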

9
Verb Subcategorization
  • Hypothesis testing
  • If p_E < α, then we reject H0 (permit f_j as a
    frame of v_i)
  • Precision: close to 100% (when α = 0.02);
    Recall: 47% to 100%

p_E = Σ_{r=m}^{n} C(n, r) ε_j^r (1 − ε_j)^(n − r), where
n: number of times v_i occurs; m = C(v_i, c_j): number of
times v_i occurs with cue c_j; v_i(f_j) = 0: verb v_i does
not permit frame f_j; ε_j: error rate for cue c_j
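
A sketch of this test using scipy's binomial distribution;
the verb count, cue count, and error rate below are invented:

    from scipy.stats import binom

    def brent_test(n, m, eps_j, alpha=0.02):
        """p_E = P(cue fires >= m times in n occurrences | H0)."""
        p_e = binom.sf(m - 1, n, eps_j)   # P(X >= m) under H0
        return p_e, p_e < alpha           # reject H0 -> permit the frame

    # Verb seen 200 times, cue fired 10 times, cue error rate 2%
    p_e, permits = brent_test(200, 10, 0.02)
    print(p_e, permits)   # small p_E -> the verb permits the frame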
10
Verb Subcategorization
  • Manning's (1993) method
  • Use a tagger and run the cue detection on the
    output of the tagger
  • How reliable a cue is does not matter!
  • Even an unreliable indicator can help to
    determine the subcategorization frame of a verb
    reliably, if it occurs often enough
  • Allowing low-reliability cues and additional cues
    based on tagger output increases the number of
    cues significantly
  • Result sample: Table 8.3
  • 3 errors:
  • 2 PPs: bridge between, retire in
  • Assigning the frame NP to remark ("And .. same
    problems," Mr. Smith remarked)
  • Precision: 90% (complete set of 40 verbs);
    Recall: 43%

11
Attachment Ambiguity
  • Moscow sent more than 100,000 soldiers into
    Afghanistan
  • How to solve this?
  • Simple model
  • Uses lexical preferences (lexical statistics):
    co-occurrence counts between v and prep, and
    between n and prep
  • Ignores the bias for noun attachment in cases
    where a preposition is equally compatible with
    the verb and the noun
  • Chrysler confirmed that it would end its troubled
    venture with Maserati

12
Attachment Ambiguity
  • Hindle and Rooth (1993)
  • Event space: Vt N ... PP (a transitive verb, its
    object noun, and a following PP)
  • Only model the behavior of the first PP
  • Determine attachment counts from an unlabeled
    corpus
  • Build an initial model by counting all
    unambiguous cases
  • She sent him into the nursery to gather up his
    toys (obvious attachment)
  • Apply the initial model to all ambiguous cases
    and assign them to the appropriate count if the
    likelihood ratio λ exceeds a threshold (a sketch
    of λ follows below)
  • Divide the remaining ambiguous cases evenly
    between the counts
  • P = 80%, R = 100%; P = 91.7%, R = 55.2% (with a
    λ threshold of 3.0)
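
A sketch of the likelihood ratio, assuming the formulation
λ(v, n, p) = log2[ P(VA_p=1|v) · P(NA_p=0|n) / P(NA_p=1|n) ]
with maximum-likelihood count estimates; all counts below are
invented:

    import math

    def attachment_lambda(c_vp, c_v, c_np, c_n):
        """c_vp = C(v, p), c_v = C(v), c_np = C(n, p), c_n = C(n)."""
        p_va = c_vp / c_v      # P(verb takes a PP headed by p)
        p_na = c_np / c_n      # P(noun takes a PP headed by p)
        return math.log2(p_va * (1 - p_na) / p_na)

    # "sent ... soldiers into ...": hypothetical counts
    lam = attachment_lambda(c_vp=86, c_v=300, c_np=1, c_n=400)
    print(lam)   # large positive -> attach the PP to the verb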

13
Attachment Ambiguity
  • Limitations of the models
  • Only consider the identity of the preposition,
    the noun, and the verb
  • Sometimes other information is important (e.g.,
    the noun inside the PP)
  • I examined the man with a stethoscope
  • I examined the man with a broken leg
  • Consider only the most basic case of a PP
    immediately after an NP object, modifying either
    the immediately preceding n or v
  • The board approved its acquisition by Royal
    Trustco Ltd. of Toronto for $27 a share at
    its monthly meeting
  • Other attachment issues
  • Attachment ambiguity in noun compounds
  • Door bell manufacturer: left-branching
  • Woman aid worker: right-branching

14
Attachment Ambiguity
  • A large proportion of PPs exhibit indeterminacy
    with respect to attachment
  • We have not signed a settlement agreement with
    them
  • Motivates us to explore new ways of determining
    the contribution a PP makes to the meaning of a
    sentence
  • Suggests that it may not be a good idea to
    require that PP meaning always be mediated
    through an NP or a VP, as current syntactic
    formalisms do

15
Selectional Preference
  • Selectional restriction
  • Verbs prefer arguments of a particular type
  • Ex) Objects of the verb eat tend to be food items
  • Subjects of bark tend to be dogs
  • A preference, not a rule
  • Why is it important?
  • Infer the meaning of unknown words
  • Susan had never eaten a fresh durian before.
  • Rank the possible parses of a sentence
  • Give higher scores to parses where the verb has
    natural arguments
  • (Here, we'll consider only the case of
    verb-direct object)

16
Selectional Preference
  • Resnik's model
  • Two notions
  • Selectional preference strength S(v)
  • Measures how strongly the verb constrains its
    direct object
  • The KL divergence between the prior distribution
    of direct objects and the distribution of direct
    objects of the verb we are trying to characterize

S(v) = D( P(C|v) || P(C) ) = Σ_c P(c|v) log ( P(c|v) / P(c) ), where
P(C): overall probability distribution of noun classes;
P(C|v): probability distribution of noun classes in the
direct object position of v
17
Selectional Preference Strength (based on
hypothetical data)
18
Selectional Preference
  • Selectional association A(v, c)
  • Association between a verb and a class
  • The proportion that its summand contributes to
    the overall preference strength S(v):
    A(v, c) = P(c|v) log ( P(c|v) / P(c) ) / S(v)
  • Association strength for a noun: take A(v, c) of
    the class c that maximizes it among the classes
    the noun belongs to
  • Example
  • A(eat, food) = 1.08; A(find, action) = −0.13

19
Selectional Preference
  • Estimating the probability P(c|v) = P(v, c) / P(v),
    with P(v) = C(v) / N and
    P(v, c) ≈ (1/N) Σ_{n ∈ words(c)} C(v, n) / classes(n)
  • N: total number of verb-object pairs in the
    corpus
  • words(c): set of all nouns in class c
  • classes(n): number of noun classes that contain n
    as a member
  • C(v, n): number of verb-object pairs with v as
    the verb and n as the head of the object NP
  • Distributing C(v, n) evenly over n's classes
    bypasses the problem of disambiguating nouns (see
    the toy sketch below)
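
A toy end-to-end sketch of this model; the noun-class
inventory and the verb-object pairs are invented (Resnik uses
WordNet classes), and the prior P(c) is estimated from the
same pairs:

    import math
    from collections import defaultdict

    CLASSES = {
        "food":   {"cake", "banana", "durian"},
        "people": {"man", "nurse"},
    }

    def classes_of(n):
        return [c for c, members in CLASSES.items() if n in members]

    def estimate(pairs):
        # pairs: (verb, head-of-object-NP) tokens from a corpus
        N = len(pairs)
        c_v = defaultdict(int)      # C(v)
        p_vc = defaultdict(float)   # P(v, c), distributing C(v, n) over n's classes
        for v, n in pairs:
            c_v[v] += 1
            for c in classes_of(n):
                p_vc[(v, c)] += 1.0 / (N * len(classes_of(n)))
        # prior P(c): marginalize P(v, c) over verbs, then renormalize
        p_c = defaultdict(float)
        for (v, c), p in p_vc.items():
            p_c[c] += p
        z = sum(p_c.values())
        p_c = {c: p / z for c, p in p_c.items()}
        return c_v, p_vc, p_c, N

    def preference(v, c_v, p_vc, p_c, N):
        # S(v) = sum_c P(c|v) log2(P(c|v) / P(c)); A(v, c) = summand / S(v)
        p_v = c_v[v] / N
        s, terms = 0.0, {}
        for c in p_c:
            p_cv = p_vc.get((v, c), 0.0) / p_v
            if p_cv > 0:
                terms[c] = p_cv * math.log2(p_cv / p_c[c])
                s += terms[c]
        return s, {c: t / s for c, t in terms.items()}

    pairs = [("eat", "cake"), ("eat", "banana"), ("eat", "durian"),
             ("see", "man"), ("see", "cake"), ("see", "nurse")]
    c_v, p_vc, p_c, N = estimate(pairs)
    print(preference("eat", c_v, p_vc, p_c, N))   # S(eat) and A(eat, c) per class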

20
Selectional Preference
  • Association strength distinguishing a verb's
    plausible and implausible objects (actual data):
    Table 8.6
  • Left half: typical objects
  • Right half: atypical objects
  • In most cases, association strength A(v, n) is a
    good predictor of object typicality
  • Most errors the model makes are due to the fact
    that it performs a form of disambiguation, by
    choosing the highest A(v, c) for A(v, n)
  • Implicit object alternation prediction
  • Mike ate the cake.
  • Mike ate.
  • The more constraints a verb puts on its object,
    the more likely it is to permit the
    implicit-object construction
  • Selectional preference strength is seen as the
    more basic phenomenon which explains the
    occurrence of implicit objects as well as
    association strength

21
Semantic Similarity
  • Acquisition of meaning
  • The final goal of lexical acquisition
  • But how do we represent meaning (so that it can
    be operationally used by an automatic system)?
  • Semantic similarity
  • Automatically acquiring a relative measure of how
    similar a new word is to known words is much
    easier than determining what the meaning actually
    is
  • Most often used for generalization, under the
    assumption that semantically similar words behave
    similarly
  • Also used for query expansion: astronaut →
    cosmonaut
  • Used for k nearest neighbors classification

22
Semantic Similarity
  • Notions of semantic similarity
  • Extension of synonymy or near-synonymy
  • Dwelling / abode
  • Two words from the same semantic domain or topic
  • Doctor, nurse, fever
  • Contextually interchangeable words (Miller and
    Charles 1991)
  • Words similar to the appropriate sense (for
    ambiguous words)
  • Litigation / suit (the legal sense, not clothes)
  • Similarity measures
  • Vector space measures
  • Probabilistic measures

23
Semantic Similarity
  • Vector space measures
  • Conceptualize semantic similarity geometrically:
    the words whose semantic similarity is to be
    computed are represented as vectors in a
    multi-dimensional space
  • Document-word matrix (Figure 8.3)
  • Words are similar if they occur in the same
    documents
  • Word-word matrix (Figure 8.4)
  • Words are similar when they co-occur with the
    same words
  • Modifier-head matrix (Figure 8.5)
  • Heads are similar when they are modified by the
    same modifiers
  • Different spaces get at different types of
    semantic similarity
  • Document-word and word-word spaces capture
    topical similarity
  • The modifier-head space captures a more
    fine-grained similarity
  • Similarity of rows
  • In matrix A (document-word): document similarity
  • In matrix C (modifier-head): modifier similarity

24
Semantic Similarity
  • Similarity measures (made precise) for binary
    vectors, viewing each vector as the set X of
    dimensions where it is 1
  • Matching coefficient: |X ∩ Y|
  • Dice coefficient: 2|X ∩ Y| / (|X| + |Y|)
  • Takes into account the length of the vectors and
    the total number of non-zero entries
  • Jaccard (or Tanimoto) coefficient:
    |X ∩ Y| / |X ∪ Y|
  • Penalizes a small number of shared entries more
    than the Dice coefficient does
  • Overlap coefficient: |X ∩ Y| / min(|X|, |Y|)
  • Cosine: |X ∩ Y| / sqrt(|X| · |Y|)
  • Penalizes less in cases where the number of
    non-zero entries is very different
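
The same measures as Python set operations, treating each
binary vector as the set of dimensions where it is 1; the
context sets in the example are invented:

    import math

    def matching(x, y): return len(x & y)
    def dice(x, y):     return 2 * len(x & y) / (len(x) + len(y))
    def jaccard(x, y):  return len(x & y) / len(x | y)
    def overlap(x, y):  return len(x & y) / min(len(x), len(y))
    def cosine(x, y):   return len(x & y) / math.sqrt(len(x) * len(y))

    x = {"Soviet", "spacewalking", "orbit"}             # contexts of cosmonaut
    y = {"American", "spacewalking", "orbit", "NASA"}   # contexts of astronaut
    print(dice(x, y), jaccard(x, y), cosine(x, y))
    # 0.571..., 0.4, 0.577...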

25
Semantic Similarity
  • Real-valued vector spaces
  • A more powerful representation than binary
    vector space
  • Cosine for real-valued vectors:
    cos(x, y) = Σ x_i y_i / ( sqrt(Σ x_i²) sqrt(Σ y_i²) )
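
For real-valued vectors the cosine is the normalized dot
product; a minimal example:

    import math

    def cosine(x, y):
        dot = sum(a * b for a, b in zip(x, y))
        nx = math.sqrt(sum(a * a for a in x))
        ny = math.sqrt(sum(b * b for b in y))
        return dot / (nx * ny)

    print(cosine([1.0, 2.0, 0.0], [2.0, 4.0, 1.0]))  # ~0.976, nearly parallel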

26
Semantic Similarity
  • Table 8.8: cosine similarities computed for the
    NYT corpus
  • Word-by-word matrix (20,000 by 1,000)
  • Co-occurrence: two words occurring within 25
    words of each other
  • Summary
  • Vector space measures have been used in IR for a
    long time
  • Advantages
  • Intuitively simple and easy to visualize
  • Computationally efficient
  • Disadvantages
  • All operate on binary data except for the cosine
  • The cosine has its own problems
  • It assumes a Euclidean space
  • A Euclidean space is not a well-motivated choice
    if the vectors we are dealing with are vectors of
    probabilities or counts

27
Semantic Similarity
  • Probabilistic measures
  • Transform semantic similarity into the similarity
    of two probability distributions
  • Transform the matrices of counts in Figures 8.3,
    8.4 and 8.5 into matrices of conditional
    probabilities
  • Ex) C(American, astronaut) →
    P(American | astronaut) = 1/2 = 0.5

(Part of Figure 8.4)
28
Semantic Similarity
  • Measures of dissimilarity between probability
    distributions: Table 8.9 (Dagan et al. 1997b)
  • KL divergence: D(p || q) = Σ_i p_i log (p_i / q_i)
  • Measures how much information is lost if we
    assume distribution q when the true distribution
    is p
  • Practical problems
  • Becomes infinite when q_i = 0 and p_i ≠ 0
  • Asymmetric: D(p || q) ≠ D(q || p)
  • Information radius (IRad):
    IRad(p, q) = D(p || (p+q)/2) + D(q || (p+q)/2)
  • Measures how much information is lost if we
    describe the two words that correspond to p and q
    with their average distribution

29
Semantic Similarity
  • L1 (Manhattan) norm: L1(p, q) = Σ_i |p_i − q_i|
  • A measure of the expected proportion of events
    that are going to be different between the
    distributions p and q
  • Example
  • p1 = P(Soviet | cosmonaut) = 0.5
  • p2 = 0
  • p3 = P(spacewalking | cosmonaut) = 0.5
  • q1 = 0
  • q2 = P(American | astronaut) = 0.5
  • q3 = P(spacewalking | astronaut) = 0.5
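
A sketch of the three measures, applied to the cosmonaut /
astronaut distributions above:

    import math

    def kl(p, q):
        # D(p || q); infinite if some q_i = 0 where p_i > 0
        return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    def irad(p, q):
        m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
        return kl(p, m) + kl(q, m)

    def l1(p, q):
        return sum(abs(pi - qi) for pi, qi in zip(p, q))

    p = [0.5, 0.0, 0.5]   # cosmonaut: Soviet, American, spacewalking
    q = [0.0, 0.5, 0.5]   # astronaut
    print(irad(p, q))     # 1.0
    print(l1(p, q))       # 1.0
    # kl(p, q) is undefined/infinite here: q assigns 0 where p does not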

30
Semantic Similarity
  • Comparison between the three dissimilarity
    measures
  • Test: a selectional preference problem
  • Find the appropriate verb as a predicate for a
    given noun
  • Example: use the similarity of make and take to
    determine that make is the right verb to use with
    plans
  • Dagan et al. (1997b) show that IRad consistently
    performs better than KL and L1

(β, in the similarity transform sim(p, q) = 10^(−β · IRad(p, q)),
can be tuned for optimal performance)
31
The Role of Lexical Acquisition in Statistical NLP
  • Lexical acquisition plays a key role in
    statistical NLP
  • Reasons
  • The cost of building lexical resources manually
  • The need to collect quantitative information
    (humans are bad at this)
  • Many lexical resources were designed for human
    consumption
  • The inherent productivity of language
  • Lexical coverage
  • Sampson's (1989) analysis
  • Tested a 45,000-word corpus against a dictionary
    with 70,000 entries
  • 3% of tokens were not in the dictionary (Table
    8.10)
  • Half of the missing words were proper nouns
  • Lexical acquisition started to take center stage
    in the late '80s

32
The Role of Lexical Acquisition in Statistical NLP
  • Now and in the future
  • Look harder for sources of prior knowledge that
    can constrain the process of lexical acquisition
  • Linguistic theory can be a source of prior
    knowledge
  • Utilize encyclopedias, thesauri, gazetteers,
    collections of technical vocabulary, and any
    other reference work or database, in addition to
    dictionaries and text corpora