Unambiguous + Unlimited = Unsupervised: Using the Web for Natural Language Processing Problems - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Unambiguous + Unlimited = Unsupervised: Using the Web for Natural Language Processing Problems


1
Unambiguous + Unlimited = Unsupervised: Using the Web for Natural Language Processing Problems
  • Marti Hearst
  • School of Information, UC Berkeley
  • UCB Neyman Seminar
  • October 25, 2006

This research supported in part by NSF DBI-0317510
2
Natural Language Processing
  • The ultimate goal: write programs that read and understand stories and conversations.
  • This is too hard! Instead we tackle sub-problems.
  • There have been notable successes lately:
  • Machine translation is vastly improved
  • Speech recognition is decent in limited
    circumstances
  • Text categorization works with some accuracy

3
Automatic Help Desk Translation at MS
4
Why is text analysis difficult?
  • One reason: enormous vocabulary size.
  • The average English speaker's vocabulary is around 50,000 words,
  • Many of these can be combined with many others,
  • And they mean different things when they do!

5
How can a machine understand these differences?
  • Get the cat with the gloves.

6
How can a machine understand these differences?
  • Get the sock from the cat with the gloves.
  • Get the glove from the cat with the socks.

7
How can a machine understand these differences?
  • Decorate the cake with the frosting.
  • Decorate the cake with the kids.
  • Throw out the cake with the frosting.
  • Throw out the cake with the kids.

8
Why is this difficult?
  • Same syntactic structure, different meanings.
  • Natural language processing algorithms have to
    deal with the specifics of individual words.
  • Enormous vocabulary sizes.
  • The average English speaker's vocabulary is around 50,000 words,
  • Many of these can be combined with many others,
  • And they mean different things when they do!

9
How to tackle this problem?
  • The field was stuck for quite some time:
  • Hand-enter all semantic concepts and relations
  • A new approach started around 1990:
  • Get large text collections
  • Compute statistics over the words in those collections
  • There are many different algorithms.

10
Size Matters
  • Recent realization: bigger is better than smarter!
  • Banko and Brill '01, "Scaling to Very, Very Large Corpora for Natural Language Disambiguation," ACL

11
Example Problem
  • Grammar checker example
  • Which word to use?
  • <principal> vs. <principle>
  • Solution: use well-edited text and look at which words surround each use
  • I am in my third year as the principal of Anamosa
    High School.
  • School-principal transfers caused some upset.
  • This is a simple formulation of the quantum
    mechanical uncertainty principle.
  • Power without principle is barren, but principle
    without power is futile. (Tony Blair)

12
Using Very, Very Large Corpora
  • Keep track of which words are the neighbors of each spelling in well-edited text, e.g.
  • Principal: high school
  • Principle: rule
  • At grammar-check time, choose the spelling best predicted by the surrounding words (a minimal sketch follows below).
  • Surprising results:
  • Log-linear improvement even to a billion words!
  • Getting more data is better than fine-tuning algorithms!
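A minimal sketch of this idea in Python, assuming a plain-text well-edited corpus file and a whitespace tokenizer (both hypothetical placeholders); it counts the neighbors of each spelling in the corpus and then picks the spelling whose neighbors best match a new context:

    from collections import Counter

    CONFUSABLES = ("principal", "principle")

    def neighbor_counts(tokens, window=2):
        # Count words appearing within +/- window positions of each confusable spelling.
        counts = {w: Counter() for w in CONFUSABLES}
        for i, tok in enumerate(tokens):
            if tok in counts:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                counts[tok].update(t for j, t in enumerate(tokens[lo:hi], lo) if j != i)
        return counts

    def choose_spelling(context_words, counts):
        # Pick the spelling whose well-edited-text neighbors best match the context.
        return max(CONFUSABLES, key=lambda w: sum(counts[w][c] for c in context_words))

    # Hypothetical usage: build counts from a large edited corpus, then check a context.
    tokens = open("well_edited_corpus.txt").read().lower().split()
    counts = neighbor_counts(tokens)
    print(choose_spelling(["high", "school"], counts))   # expected: principal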

13
The Effects of LARGE Datasets
  • From Banko & Brill '01

14
How to Extend this Idea?
  • This is an exciting result
  • BUT relies on having huge amounts of text that
    has been appropriately annotated!

15
How to Avoid Manual Labeling?
  • Web as a baseline (Lapata & Keller '04, '05)
  • Main idea: apply web-determined counts to every problem imaginable.
  • Example: for t in {<principal>, <principle>}
  • Compute f(w-1, t, w+1)
  • The largest count wins

16
Web as a Baseline
  • Works very well in some cases:
  • machine translation candidate selection
  • article generation
  • noun compound interpretation
  • noun compound bracketing
  • adjective ordering
  • But lacking in others:
  • spelling correction
  • countability detection
  • prepositional phrase attachment
  • How to push this idea further?

Significantly better than the best supervised
algorithm.
Not significantly different from the best
supervised.
17
Using Unambiguous Cases
  • The trick: look for unambiguous cases to start
  • Use these to improve the results beyond what co-occurrence statistics indicate.
  • An early example:
  • Hindle and Rooth, "Structural Ambiguity and Lexical Relations," ACL '90, Computational Linguistics '93
  • Problem: prepositional phrase attachment
  • I eat/v spaghetti/n1 with/p a fork/n2.
  • I eat/v spaghetti/n1 with/p sauce/n2.
  • Question: does n2 attach to v or to n1?

18
Using Unambiguous Cases
  • How to do this with unlabeled data?
  • First try:
  • Parse some text into phrase structure
  • Then compute certain co-occurrences:
  • f(v, n1, p), f(n1, p), f(v, n1)
  • Problem: results not accurate enough
  • The trick: look for unambiguous cases
  • Spaghetti with sauce is delicious.  (pre-verbal)
  • I eat with a fork.  (no direct object)
  • Use these to improve the results beyond what
    co-occurrence statistics indicate.

19
Using Unambiguous Cases
  • Hindle & Rooth, final algorithm:
  • Parse text into phrase structure.
  • Create bigram counts (v, p) and (n1, p) as follows:
  • First, use unambiguous cases to populate the bigram table
  • Then, for the ambiguous cases:
  • Compute a Lexical Association score comparing (v, n1, p) to (n1, p, n2) (a rough sketch follows below)
  • If this is greater than a threshold, update the bigram table with the assumed attachment
  • Else split the score and assign it to both attachments
  • The bigram table is used for further computations of the Lexical Association score.
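A rough sketch of the association comparison, assuming the bigram tables have already been populated from unambiguous cases; the smoothing constants and the threshold here are illustrative simplifications of Hindle & Rooth's estimator, not the exact values they used:

    import math
    from collections import Counter

    # Hypothetical bigram tables filled from unambiguous cases:
    #   verb_p[(v, p)] / noun_p[(n, p)] = how often preposition p was seen with v / n
    verb_p, noun_p = Counter(), Counter()
    verb_total, noun_total = Counter(), Counter()

    def lexical_association(v, n1, p):
        # Log-ratio of how strongly p associates with the verb vs. with the noun.
        # Positive favors verb attachment, negative favors noun attachment.
        p_given_v = (verb_p[(v, p)] + 0.5) / (verb_total[v] + 1.0)
        p_given_n = (noun_p[(n1, p)] + 0.5) / (noun_total[n1] + 1.0)
        return math.log2(p_given_v / p_given_n)

    def attach(v, n1, p, threshold=2.0):
        score = lexical_association(v, n1, p)
        if abs(score) >= threshold:
            return "verb" if score > 0 else "noun"
        return "split"   # in the full algorithm the count is shared between both attachments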

20
Unambiguous + Unlimited = Unsupervised
  • Apply the unambiguous-case idea to the very, very large corpora idea
  • The potential of these approaches is not fully realized
  • Our work (with Preslav Nakov):
  • Structural ambiguity decisions:
  • PP-attachment
  • Noun compound bracketing
  • Coordination grouping
  • Semantic relation acquisition:
  • Hypernym (ISA) relations
  • Verbal relations between nouns
  • SAT analogy problems

21
Structural Ambiguity Problems
  • Apply the U U U idea to structural ambiguity
  • Noun compound bracketing
  • Prepositional Phrase attachment
  • Noun Phrase coordination
  • Motivation: BioText project
  • In eukaryotes, the key to transcriptional
    regulation of the Heat Shock Response is the Heat
    Shock Transcription Factor (HSF).
  • Open-labeled long-term study of the subcutaneous
    sumatriptan efficacy and tolerability in acute
    migraine treatment.
  • BimL protein interact with Bcl-2 or Bcl-XL, or
    Bcl-w proteins (Immuno-precipitation (anti-Bcl-2
    OR Bcl-XL or Bcl-w)) followed by Western blot
    (anti-EEtag) using extracts human 293T cells
    co-transfected with EE-tagged BimL and (bcl-2 or
    bcl-XL or bcl-w) plasmids)

22
Applying U U U to Structural Ambiguity
  • We introduce the use of (nearly) unambiguous features:
  • Surface features
  • Paraphrases
  • Combined with n-grams from very, very large corpora
  • Achieve state-of-the-art results without labeled
    examples.

23
Noun Compound Bracketing
  • (a) liver cell antibody (left
    bracketing)
  • (b) liver cell line (right
    bracketing)
  • In (a), the antibody targets the liver cell.
  • In (b), the cell line is derived from the liver.

24
Dependency Model
  • Right bracketing: w1 (w2 w3)
  • w2 w3 is a compound (modified by w1)
  • home health care
  • or: w1 and w2 independently modify w3
  • adult male rat
  • Left bracketing: (w1 w2) w3
  • only one modificational choice is possible
  • law enforcement officer

25
Related Work
  • Marcus (1980), Pustejovsky et al. (1993), Resnik (1993):
  • adjacency model: Pr(w1, w2) vs. Pr(w2, w3)
  • Lauer (1995):
  • dependency model: Pr(w1, w2) vs. Pr(w1, w3)
  • Keller & Lapata (2004):
  • use the Web
  • unigrams and bigrams
  • Girju et al. (2005):
  • supervised model
  • bracketing in context
  • requires WordNet senses to be given
  • Our approach:
  • Web as data
  • χ², n-grams
  • paraphrases
  • surface features

26
Our U U U Algorithm
  • Compute bigram estimates
  • Compute estimates from surface features
  • Compute estimates from paraphrases
  • Combine these scores with a voting algorithm to choose left or right bracketing (a minimal voting sketch follows below).
  • We use the same general approach for two other structural ambiguity problems.
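A minimal sketch of the combination step, assuming each information source has already produced a vote of "left", "right", or None (abstain); the function name and the tie-breaking default are illustrative assumptions, not the exact rule from the talk:

    from collections import Counter

    def majority_vote(votes, default="right"):
        # Combine per-source bracketing votes; abstentions (None) are ignored.
        tally = Counter(v for v in votes if v in ("left", "right"))
        if tally["left"] == tally["right"]:
            return default                      # fall back to an assumed prior on ties
        return "left" if tally["left"] > tally["right"] else "right"

    # Hypothetical usage: one vote per source (chi-squared, surface features, paraphrases).
    print(majority_vote(["left", "left", None]))   # -> "left"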

27
Computing Bigram Statistics
  • Dependency model, frequencies:
  • Compare #(w1, w2) to #(w1, w3)
  • Dependency model, probabilities:
  • Pr(left) = Pr(w1→w2 | w2) · Pr(w2→w3 | w3)
  • Pr(right) = Pr(w1→w3 | w3) · Pr(w2→w3 | w3)
  • So we compare Pr(w1→w2 | w2) to Pr(w1→w3 | w3)

28
Using n-grams to estimate probabilities
  • Using page hits as a proxy for n-gram counts
  • Pr(w1→w2 | w2) = #(w1, w2) / #(w2)
  • #(w2): word frequency; query for "w2"
  • #(w1, w2): bigram frequency; query for "w1 w2"
  • smoothed by 0.5
  • Use χ² to determine if w1 is associated with w2 (thus indicating left bracketing), and the same for w1 with w3 (a small sketch follows below)
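A small sketch of the estimate, assuming a hypothetical hit_count(query) helper that returns a search engine's page-hit count for an exact-phrase query (any n-gram count source would do); the 0.5 smoothing from the slide is applied to the numerator:

    def hit_count(query):
        # Hypothetical stub: return the number of page hits for an exact-phrase query.
        raise NotImplementedError("plug in a search API or an n-gram corpus here")

    def p_modifies(w_a, w_b):
        # Estimate Pr(w_a -> w_b | w_b): how often w_a modifies w_b when w_b occurs.
        return (hit_count(f'"{w_a} {w_b}"') + 0.5) / max(hit_count(f'"{w_b}"'), 1)

    def bracket(w1, w2, w3):
        # Dependency model: compare Pr(w1->w2 | w2) with Pr(w1->w3 | w3).
        return "left" if p_modifies(w1, w2) >= p_modifies(w1, w3) else "right"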

29
Association Models: χ² (Chi-Squared)
  • A = #(wi, wj)
  • B = #(wi) - #(wi, wj)
  • C = #(wj) - #(wi, wj)
  • D = N - (A + B + C)
  • N = 8 trillion (= A + B + C + D); a short χ² sketch follows below

(8 billion Web pages x 1,000 words per page)
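A short sketch of the χ² score over this 2x2 contingency table, using the standard formula for a 2x2 table with cells A, B, C, D as defined above (the count source, e.g. web hit counts, is assumed):

    def chi_squared(cooc, count_i, count_j, n=8e12):
        # chi^2 for the 2x2 table: cooc = #(wi, wj), count_i = #(wi), count_j = #(wj).
        a = cooc
        b = count_i - cooc
        c = count_j - cooc
        d = n - (a + b + c)
        num = n * (a * d - b * c) ** 2
        den = (a + b) * (a + c) * (b + d) * (c + d)
        return num / den if den else 0.0

    # A large chi^2 for ("law", "enforcement") and a small one for ("law", "officer")
    # would support left bracketing of "law enforcement officer".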
30
Our U U U Algorithm
  • Compute bigram estimates
  • Compute estimates from surface features
  • Compute estimates from paraphrases
  • Combine these scores with a voting algorithm to
    choose left or right bracketing.

31
Web-derived Surface Features
  • Authors often disambiguate noun compounds using
    surface markers, e.g.
  • amino-acid sequence → left
  • brain stem's cell → left
  • brain's stem cell → right
  • The enormous size of the Web makes these frequent
    enough to be useful.

32
Web-derived Surface Features: Dash (Hyphen)
  • Left dash:
  • cell-cycle analysis → left
  • Right dash:
  • donor T-cell → right
  • Double dash:
  • T-cell-depletion → unusable

33
Web-derived Surface Features: Possessive Marker
  • Attached to the first word:
  • brain's stem cell → right
  • Attached to the second word:
  • brain stem's cell → left
  • Combined features:
  • brain's stem-cell → right

34
Web-derived Surface Features: Capitalization
  • anycase - lowercase - uppercase:
  • Plasmodium vivax Malaria → left
  • plasmodium vivax Malaria → left
  • lowercase - uppercase - anycase:
  • brain Stem cell → right
  • brain Stem Cell → right
  • Disable this on:
  • Roman digits
  • Single-letter words, e.g., vitamin D deficiency

35
Web-derived Surface Features: Embedded Slash
  • Left embedded slash:
  • leukemia/lymphoma cell → right

36
Web-derived Surface Features: Parentheses
  • Single-word:
  • growth factor (beta) → left
  • (brain) stem cell → right
  • Two-word:
  • (growth factor) beta → left
  • brain (stem cell) → right

37
Web-derived Surface Features: Comma, Dot, Semicolon
  • Following the first word:
  • home. health care → right
  • adult, male rat → right
  • Following the second word:
  • health care, provider → left
  • lung cancer patients → left

38
Web-derived Surface Features: Dash to External Word
  • External word to the left:
  • mouse-brain stem cell → right
  • External word to the right:
  • tumor necrosis factor-alpha → left

39
Web-derived Surface Features: Problems and Solutions
  • Problem: search engines ignore punctuation in queries
  • "brain-stem cell" does not work as a query
  • Solution:
  • query for "brain stem cell"
  • obtain 1,000 document summaries
  • scan for the features in these summaries (a minimal sketch follows below)
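A minimal sketch of that scan, assuming a hypothetical fetch_snippets(query, n) helper that returns up to n result summaries; only the hyphen features are checked here, as an illustration:

    import re

    def fetch_snippets(query, n=1000):
        # Hypothetical stub: return up to n search-result summaries for the query.
        raise NotImplementedError("plug in a search API here")

    def hyphen_votes(w1, w2, w3):
        # Count left/right bracketing evidence from hyphenation in result snippets.
        query = f'"{w1} {w2} {w3}"'
        e1, e2, e3 = map(re.escape, (w1, w2, w3))
        left_dash = re.compile(rf"\b{e1}-{e2}\s+{e3}\b", re.I)    # e.g. cell-cycle analysis
        right_dash = re.compile(rf"\b{e1}\s+{e2}-{e3}\b", re.I)   # e.g. donor T-cell
        votes = {"left": 0, "right": 0}
        for snippet in fetch_snippets(query):
            votes["left"] += len(left_dash.findall(snippet))
            votes["right"] += len(right_dash.findall(snippet))
        return votes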

40
Other Web-derived Features: Possessive Marker
  • We can also query directly for possessives:
  • Yes, brain stems cell sort of works.
  • Search engines:
  • drop the possessive marker
  • but the s is kept
  • Still, we cannot query for brain stems cell.

41
Other Web-derived Features: Abbreviation
  • After the second word:
  • tumor necrosis (TN) factor → left
  • After the third word:
  • tumor necrosis factor (NF) → right
  • We query for, e.g., "tumor necrosis tn factor"
  • Problems:
  • Roman digits: IV, VI
  • States: CA
  • Short words: me

42
Other Web-derived Features: Concatenation
  • Consider "health care reform":
  • healthcare: 79,500,000
  • carereform: 269
  • healthreform: 812
  • Adjacency model:
  • healthcare vs. carereform
  • Dependency model:
  • healthcare vs. healthreform
  • Triples:
  • "healthcare reform" vs. "health carereform"

43
Other Web-derived Features: Using Google's * Operator
  • Each * allows a one-word wildcard
  • Single star:
  • health care * reform → left
  • health * care reform → right
  • More stars and/or reverse order:
  • care reform * health → right

44
Other Web-derived Features: Reorder
  • Reorders for "health care reform":
  • "care reform health" → right
  • "reform health care" → left

45
Other Web-derived Features: Internal Inflection Variability
  • Vary the inflection of the second word:
  • tyrosine kinase activation
  • tyrosine kinases activation

46
Other Web-derived Features: Switch the First Two Words
  • Predict right, if we can reorder
  • adult male rat as
  • male adult rat

47
Our U U U Algorithm
  • Compute bigram estimates
  • Compute estimates from surface features
  • Compute estimates from paraphrases
  • Combine these scores with a voting algorithm to
    choose left or right bracketing.

48
Paraphrases
  • The semantics of a noun compound is often made overt by a paraphrase (Warren, 1978)
  • Prepositional:
  • stem cells in the brain → right
  • cells from the brain stem → right
  • Verbal:
  • virus causing human immunodeficiency → left
  • Copula:
  • office building that is a skyscraper → right

49
Paraphrases
  • Lauer (1995), Keller & Lapata (2003), Girju et al. (2005) predict NC semantics by choosing the most likely preposition:
  • of, for, in, at, on, from, with, about, (like)
  • This can be problematic when more than one preposition is possible
  • In contrast:
  • we try to predict syntax, not semantics
  • we do not disambiguate, just add up all counts
  • cells in (the) bone marrow → left
  • cells from (the) bone marrow → left

50
Paraphrases
  • Prepositional paraphrases (a small query-generation sketch follows below):
  • We use 150 prepositions
  • Verbal paraphrases:
  • We use: associated with, caused by, contained in, derived from, focusing on, found in, involved in, located at/in, made of, performed by, preventing, related to, and used by/in/for.
  • Copula paraphrases:
  • We use is/was and that/which/who
  • Optional elements:
  • articles: a, an, the
  • quantifiers: some, every, etc.
  • pronouns: this, these, etc.
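A small sketch of generating prepositional paraphrase queries for a compound such as "brain stem cell"; the preposition list is a tiny illustrative subset of the 150 used, the optional articles follow the slide, and inflection is ignored here:

    from itertools import product

    PREPS = ["of", "in", "from", "for", "with"]   # illustrative subset of the 150 prepositions
    DETS = ["", "the ", "a "]                      # optional articles

    def paraphrase_queries(w1, w2, w3):
        # Yield (predicted bracketing, exact-phrase query) pairs whose hit counts vote.
        #   "w3 PREP (DET) w1 w2" -> left   (e.g. "cell from the brain stem")
        #   "w2 w3 PREP (DET) w1" -> right  (e.g. "stem cell in the brain")
        for p, d in product(PREPS, DETS):
            yield ("left",  f'"{w3} {p} {d}{w1} {w2}"')
            yield ("right", f'"{w2} {w3} {p} {d}{w1}"')

    for label, query in list(paraphrase_queries("brain", "stem", "cell"))[:4]:
        print(label, query)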

51
Our U U U Algorithm
  • Compute bigram estimates
  • Compute estimates from surface features
  • Compute estimates from paraphrases
  • Combine these scores with a voting algorithm to
    choose left or right bracketing.

52
Evaluation Datasets
  • Lauer set:
  • 244 noun compounds (NCs)
  • from Grolier's encyclopedia
  • inter-annotator agreement: 81.5%
  • Biomedical set:
  • 430 NCs
  • from MEDLINE
  • inter-annotator agreement: 88% (κ = .606)

53
Evaluation Experiments
  • Exact phrase queries
  • Limited to English
  • Inflections:
  • Lauer set: Carroll's morphological tools
  • Biomedical set: UMLS Specialist Lexicon

54
Co-occurrence Statistics
  • Lauer set
  • Bio set

55
Paraphrase and Surface Features Performance
  • Lauer Set
  • Biomedical Set

56
Individual Surface Features Performance: Bio
57
Individual Surface Features Performance: Bio
58
Results: Lauer
59
Results: Comparing with Others
60
Results: Bio
61
Results for Noun Compound Bracketing
  • Introduced search engine statistics that go beyond the n-gram (applicable to other tasks):
  • surface features
  • paraphrases
  • Obtained new state-of-the-art results on NC bracketing:
  • more robust than Lauer (1995)
  • more accurate than Keller & Lapata (2004)

62
Prepositional Phrase Attachment
  • Problem:
  • (a) Peter spent millions of dollars.  (noun attach)
  • (b) Peter spent time with his family.  (verb attach)
  • Which attachment for the quadruple (v, n1, p, n2)?
  • Results:
  • Much simpler than other algorithms
  • As good as or better than the best unsupervised approaches, and better than some supervised ones

63
Related Work
  • Supervised:
  • (Brill & Resnik, '94): transformation-based learning, WordNet classes, P=82%
  • (Ratnaparkhi et al., '94): maximum entropy, word classes (MI), P=81.6%
  • (Collins & Brooks, '95): back-off, P=84.5%
  • (Stetina & Makoto, '97): decision trees, WordNet, P=88.1%
  • (Toutanova et al., '04): morphology, syntax, WordNet, P=87.5%
  • Unsupervised:
  • (Hindle & Rooth, '93): partially parsed corpus, lexical associations over subsets of (v, n1, p), P=80%, R=80%
  • (Ratnaparkhi, '98): POS-tagged corpus, unambiguous cases for (v, n1, p) and (n1, p, n2), classifier, P=81.9%
  • (Pantel & Lin, '00): collocation database, dependency parser, large corpus (125M words), P=84.3%

Unsup. state-of-the-art
64
PP-attachment: Our Approach
  • Unsupervised
  • (v, n1, p, n2) quadruples, Ratnaparkhi test set
  • Google and MSN Search
  • Exact phrase queries
  • Inflections: WordNet 2.0
  • Adding determiners where appropriate
  • Models:
  • n-gram association models
  • Web-derived surface features
  • paraphrases

65
N-gram models
  • (i) Pr(p | n1) vs. Pr(p | v)
  • (ii) Pr(p, n2 | n1) vs. Pr(p, n2 | v)
  • I eat/v spaghetti/n1 with/p a fork/n2.
  • I eat/v spaghetti/n1 with/p sauce/n2.
  • Pr or # (frequency)
  • smoothing as in (Hindle & Rooth, '93)
  • back-off from (ii) to (i)
  • N-grams are unreliable if n1 or n2 is a pronoun.
  • MSN Search: no rounding of n-gram estimates

66
Web-derived Surface Features
(P = precision %, R = recall %)
  • Example features:
  • open the door / with a key → verb (100.00, 0.13)
  • open the door (with a key) → verb (73.58, 2.44)
  • open the door with a key → verb (68.18, 2.03)
  • open the door , with a key → verb (58.44, 7.09)
  • eat Spaghetti with sauce → noun (100.00, 0.14)
  • eat ? spaghetti with sauce → noun (83.33, 0.55)
  • eat , spaghetti with sauce → noun (65.77, 5.11)
  • eat spaghetti with sauce → noun (64.71, 1.57)
  • Summing achieves high precision, low recall.

(Verb-attachment feature counts are summed, noun-attachment feature counts are summed, and the two sums are compared.)
67
Paraphrases
  • v n1 p n2 →
  • v n2 n1 (noun)
  • v p n2 n1 (verb)
  • p n2 * v n1 (verb)
  • n1 p n2 v (noun)
  • v PRONOUN p n2 (verb)
  • BE n1 p n2 (noun)

68
Evaluation
  • Ratnaparkhi dataset:
  • 3097 test examples, e.g.:
  • prepare dinner for family → V
  • shipped crabs from province → V
  • n1 or n2 is a bare determiner: 149 examples
  • problem for unsupervised methods
  • left chairmanship of the → N
  • is the of kind → N
  • acquire securities for an → N
  • special symbols (%, /, etc.): 230 examples
  • problem for Web queries
  • buy % for 10 → V
  • beat S&P-down from → V
  • is 43%-owned by firm → N

69
Results
For prepositions other than OF (of → noun attachment).
Models in bold are combined in a majority vote.
Simpler, but not significantly different from 84.3% (Pantel & Lin, '00).
70
Noun Phrase Coordination
  • (Modified) real sentence:
  • The Department of Chronic Diseases and Health
    Promotion leads and strengthens global efforts to
    prevent and control chronic diseases or
    disabilities and to promote health and quality of
    life.

71
NC Coordination: Ellipsis
  • Ellipsis:
  • car and truck production
  • means car production and truck production
  • No ellipsis:
  • president and chief executive
  • All-way coordination:
  • Securities and Exchange Commission

72
NC Coordination: Ellipsis
  • Quadruple: (n1, c, n2, h)
  • Penn Treebank annotations:
  • ellipsis:
  • (NP car/NN and/CC truck/NN production/NN)
  • no ellipsis:
  • (NP (NP president/NN) and/CC (NP chief/NN executive/NN))
  • all-way: can be annotated either way
  • This is a problem a parser must deal with.

Collins' parser always predicts ellipsis, but other parsers (e.g., Charniak's) try to solve it.
73
Results: 428 examples from the Penn Treebank
74
Semantic Relation Detection
  • Goal: automatically augment a lexical database
  • Many potential relation types:
  • ISA (hypernymy/hyponymy)
  • Part-Of (meronymy)
  • Idea: find unambiguous contexts which (nearly) always indicate the relation of interest

75
Lexico-Syntactic Patterns
76
Lexico-Syntactic Patterns
77
Adding a New Relation
78
Semantic Relation Detection
  • Lexico-syntactic patterns:
  • Should occur frequently in text
  • Should (nearly) always suggest the relation of
    interest
  • Should be recognizable with little pre-encoded
    knowledge.
  • These patterns have been used extensively by
    other researchers.

79
Semantic Relation Detection
  • What relationship holds between two nouns?
  • olive oil: oil comes from olives
  • machine oil: oil used on machines
  • Assigning the meaning relations between these terms has been seen as a very difficult problem
  • Our solution:
  • Use clever queries against the web to figure out the relations.

80
Queries for Semantic Relations
  • Convert the noun-noun compound into a query of the form:
  • "noun2 that noun1"
  • oil that olive(s)
  • This returns search result snippets containing interesting verbs (a rough extraction sketch follows below).
  • In this case:
  • Come from
  • Be obtained from
  • Be extracted from
  • Made from
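A rough sketch of the extraction step, reusing the hypothetical fetch_snippets helper from the earlier bracketing sketch; it pulls out the words between "that" and the first noun as a candidate relation phrase (a real implementation would POS-tag and lemmatize the snippets):

    import re
    from collections import Counter

    def candidate_relations(noun1, noun2, snippets):
        # Count phrases linking noun2 back to noun1, as in "oil that comes from olives"
        # for the compound "olive oil" (noun1 = olive, noun2 = oil).
        pattern = re.compile(
            rf"\b{re.escape(noun2)} that ((?:\w+ ){{1,4}}?){re.escape(noun1)}s?\b",
            re.IGNORECASE)
        counts = Counter()
        for snippet in snippets:
            for phrase in pattern.findall(snippet):
                counts[phrase.strip().lower()] += 1
        return counts

    # Hypothetical usage for "olive oil":
    # snippets = fetch_snippets('"oil that" olives')
    # print(candidate_relations("olive", "oil", snippets).most_common(5))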

81
Uncovering Semantic Relations
  • More examples:
  • Migraine drug → treat, be used for, reduce, prevent
  • Wrinkle drug → treat, be used for, reduce, smooth
  • Printer tray → hold, come with, be folded, fit under, be inserted into
  • Student protest → be led by, be sponsored by, pit, be, be organized by

82
Application: SAT Analogy Problems
83
Tackling the SAT Analogy Problem
  • First, issue queries to find the relations (features) that hold between each word pair.
  • Compare the features for each answer pair to those of the question pair.
  • Weight the features with term counts and document counts.
  • Compare the weighted feature sets using the Dice coefficient.

84
Queries for SAT Analogy Problem
85
Extract Features from Retrieved Text
  • Verb:
  • The committee includes many members.
  • This is a committee, which includes many members.
  • This is a committee, including many members.
  • Verb + Preposition:
  • The committee consists of many members.
  • Preposition:
  • He is a member of the committee.
  • Coordinating conjunction:
  • the committee and its members

86
Most Frequent Features for committee member
87
TF.IDF Weighting
  • TF.IDF, classic
  • TF.IDF with add-one smoothing (standard forms are sketched below)
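The formulas themselves were images on the original slide; a standard formulation consistent with these labels (the exact smoothed variant used in the talk may differ) is, in LaTeX:

    % Classic TF.IDF weight of feature f for a word pair, with N word pairs in total:
    w(f) = \mathrm{tf}(f) \cdot \log \frac{N}{\mathrm{df}(f)}

    % A common add-one-smoothed variant of the document-frequency term:
    w(f) = \mathrm{tf}(f) \cdot \log \frac{N + 1}{\mathrm{df}(f) + 1}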

88
Similarity Measure: Dice Coefficient
  • Dice coefficient for sets
  • Dice coefficient extended to frequencies (both shown below)
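The slide showed these as images; in standard form (the frequency extension given here is the usual generalization via minimum counts, assumed rather than read off the slide), in LaTeX:

    \mathrm{Dice}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}

    \mathrm{Dice}(x, y) = \frac{2 \sum_f \min(x_f, y_f)}{\sum_f x_f + \sum_f y_f}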

89
SAT Results: Nouns Only
90
Conclusions
  • The enormous size of the web opens new
    opportunities for text analysis
  • There are many words, but they are more likely to
    appear together in a huge dataset
  • This allows us to do word-specific analysis
  • To counter the labeled-data roadblock, we start
    with unambiguous features that we can find
    naturally.
  • We've applied this to structural and semantic language problems.
  • These are stepping stones towards sophisticated
    language understanding.

91
Conclusions
  • Tapping the potential of very large corpora for unsupervised algorithms
  • Go beyond n-grams:
  • Surface features
  • Paraphrases
  • Results competitive with the best unsupervised algorithms
  • Results can rival supervised algorithms
  • Future work:
  • Unambiguous + Unlimited = Unsupervised
  • How to extend to other problems?

92
Thank you!
  • http://biotext.berkeley.edu
  • Supported in part by NSF DBI-0317510

93
What about Search?
  • Web search currently does not use very much language analysis.
  • Queries are very short (2.1 words on average), so most queries match many pages.
  • Improvements in ranking make use of the massive size of the web:
  • Anchor text (the words on links pointing to pages)
  • Which hits users clicked on (starting to use this)
  • As well as the structure of language:
  • Where query terms occur (title, etc.)
  • How close together query words occur

94
Using n-grams to make predictions
  • Say we are trying to distinguish:
  • (home health) care
  • home (health care)
  • Main idea: compare these co-occurrence probabilities:
  • "home health" vs.
  • "health care"

95
Using n-grams to make predictions
  • Use search engines' page hits as a proxy for n-gram counts
  • compare Pr(w1→w2 | w2) to Pr(w1→w3 | w3)
  • Pr(w1→w2 | w2) = #(w1, w2) / #(w2)
  • #(w2): word frequency; query for "w2"
  • #(w1, w2): bigram frequency; query for "w1 w2"

96
Probabilities: Why? (1)
  • Why should we use
  • (a) Pr(w1→w2 | w2), rather than
  • (b) Pr(w2→w1 | w1)?
  • Keller & Lapata (2004) calculate:
  • AltaVista queries:
  • (a) 70.49%
  • (b) 68.85%
  • British National Corpus:
  • (a) 63.11%
  • (b) 65.57%

97
Probabilities: Why? (2)
  • Why should we use
  • (a) Pr(w1→w2 | w2), rather than
  • (b) Pr(w2→w1 | w1)?
  • Maybe to introduce a bracketing prior.
  • Just like Lauer (1995) did.
  • But otherwise, no reason to prefer either one.
  • Do we need probabilities? (association is OK)
  • Do we need a directed model? (symmetry is OK)

98
Adjacency vs. Dependency (2)
  • Right bracketing: w1 (w2 w3)
  • w2 w3 is a compound (modified by w1), or
  • w1 and w2 independently modify w3
  • Adjacency model:
  • Is w2 w3 a compound?
  • (vs. w1 w2 being a compound)
  • Dependency model:
  • Does w1 modify w3?
  • (vs. w1 modifying w2)

99
Paraphrases: pattern (1)
  • v n1 p n2 → v n2 n1 (noun)
  • Can we turn "n1 p n2" into a noun compound "n2 n1"?
  • meet/v demands/n1 from/p customers/n2 →
  • meet/v the customer/n2 demands/n1
  • Problem: ditransitive verbs like give
  • gave/v an apple/n1 to/p him/n2 →
  • gave/v him/n2 an apple/n1
  • Solution:
  • no determiner before n1
  • determiner before n2 is required
  • the preposition cannot be "to"

100
Paraphrases: pattern (2)
  • v n1 p n2 → v p n2 n1 (verb)
  • If "p n2" is an indirect object of v, then it could be switched with the direct object n1.
  • had/v a program/n1 in/p place/n2 →
  • had/v in/p place/n2 a program/n1

Determiner before n1 is required to prevent n2
n1 from forming a noun compound.
101
Paraphrases: pattern (3)
  • v n1 p n2 → p n2 * v n1 (verb)
  • * indicates a wildcard position (up to three intervening words are allowed)
  • Looks for appositions where the PP has moved in front of the verb, e.g.:
  • I gave/v an apple/n1 to/p him/n2 →
  • to/p him/n2 I gave/v an apple/n1

102
Paraphrases: pattern (4)
  • v n1 p n2 → n1 p n2 v (noun)
  • Looks for appositions where "n1 p n2" has moved in front of v
  • shaken/v confidence/n1 in/p markets/n2 →
  • confidence/n1 in/p markets/n2 shaken/v

103
Paraphrases: pattern (5)
  • v n1 p n2 → v PRONOUN p n2 (verb)
  • n1 is a pronoun → verb (Hindle & Rooth, '93)
  • Pattern (5) substitutes n1 with a dative pronoun (him or her), e.g.:
  • put/v a client/n1 at/p odds/n2 →
  • put/v him at/p odds/n2

104
Paraphrases: pattern (6)
  • v n1 p n2 → BE n1 p n2 (noun)
  • BE is typically used with a noun attachment
  • Pattern (6) substitutes v with a form of "to be" (is or are), e.g.:
  • eat/v spaghetti/n1 with/p sauce/n2 →
  • is spaghetti/n1 with/p sauce/n2

105
Related Work
  • (Resnik, '99): similarity of form and meaning, conceptual association, decision tree, P=80%, R=100%
  • (Rus et al., '02): deterministic, rule-based bracketing in context, P=87.42%, R=71.05%
  • (Chantree et al., '05): distributional similarities from the BNC, Sketch Engine (frequencies, object/modifier, etc.), P=80.3%, R=53.8%

106
N-gram models
  • (n1, c, n2, h)
  • (i) #(n1, h) vs. #(n2, h)
  • (ii) #(n1, h) vs. #(n1, c, n2)

107
Surface Features
(As before, the feature counts favoring each reading are summed and the two sums are compared.)
108
Paraphrases
  • n1 c n2 h →
  • n2 c n1 h (ellipsis)
  • n2 h c n1 (NO ellipsis)
  • n1 h c n2 h (ellipsis)
  • n2 h c n1 h (ellipsis)