Deep Text Understanding with WordNet
1
Deep Text Understanding with WordNet
  • Christiane Fellbaum
  • Princeton University and
  • Berlin-Brandenburg Academy of Sciences

2
WordNet
  • What is WordNet and why is it interesting/useful?
  • A bit of history
  • WordNet for natural language processing/word
    sense disambiguation

3
What is WordNet?
  • A large lexical database, or electronic
    dictionary, developed and maintained at
    Princeton University
  • http://wordnet.princeton.edu
  • Includes most English nouns, verbs, adjectives,
    adverbs
  • Electronic format makes it amenable to automatic
    manipulation
  • Used in many Natural Language Processing
    applications (information retrieval, text mining,
    question answering, machine translation,
    AI/reasoning, ...)
  • Wordnets are built for many languages (including
    Danish!)

4
What's special about WordNet?
  • Traditional paper dictionaries are organized
    alphabetically: words that are found together (on
    the same page) are not related by meaning
  • WordNet is organized by meaning: words in close
    proximity are semantically similar
  • Human users and computers can browse WordNet and
    find words that are meaningfully related to their
    queries (somewhat like in a hyperdimensional
    thesaurus)
  • Meaning similarity can be measured and
    quantified to support Natural Language
    Understanding (see the sketch below)
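
The similarity claim can be made concrete with NLTK's WordNet interface. A minimal sketch I am adding (not part of the slides), assuming nltk is installed and its WordNet data downloaded:

    # Quantifying meaning similarity over the WordNet hypernym graph.
    # Setup (once): pip install nltk; then nltk.download('wordnet')
    from nltk.corpus import wordnet as wn

    car = wn.synset('car.n.01')
    bike = wn.synset('bicycle.n.01')
    fork = wn.synset('fork.n.01')

    # path_similarity returns a score in (0, 1]: the inverse of the
    # shortest hypernym-path length between the two synsets.
    print(car.path_similarity(bike))   # fairly high: both are vehicles
    print(car.path_similarity(fork))   # lower: only remote common ancestors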

5
A bit of history
  • Research in Artificial Intelligence (AI)
  • How do humans store and access knowledge about
    concepts?
  • Hypothesis: concepts are interconnected via
    meaningful relations
  • Knowledge about concepts is huge--must be stored
    in an efficient and economic fashion

6
A bit of history
  • Knowledge about concepts is computed on the fly
    via access to general concepts
  • E.g., we know that canaries fly because
  • birds fly and canaries are a kind of bird

7
A simple picture
  • animal (animate, breathes, has heart, ...)
  • bird (has feathers, flies, ...)
  • canary (yellow, sings nicely, ...)

8
  • Knowledge is stored at the highest possible node
    and inherited by lower (more specific) concepts
    rather than being multiply stored
  • Collins & Quillian (1969) measured reaction times
    to statements involving knowledge distributed
    across different levels

9
  • Do birds fly?
  • --short RT
  • Do canaries fly?
  • --longer RT
  • Do canaries have a heart?
  • --even longer RT
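
A toy model of this lookup (my illustration, not from the talk) makes the reaction-time pattern concrete: each property is stored once at the highest node, and lookup walks up IS-A links, so more hops loosely predicts a longer RT. The taxonomy and property sets below are the hypothetical ones from slide 7:

    # Collins & Quillian-style inheritance: store a property once, at the
    # highest node, and derive it for lower nodes by walking IS-A links.
    ISA = {"canary": "bird", "bird": "animal"}
    PROPS = {
        "animal": {"animate", "breathes", "has heart"},
        "bird": {"has feathers", "flies"},
        "canary": {"yellow", "sings nicely"},
    }

    def has_property(concept, prop):
        """Return (found, hops); hops models the RT differences above."""
        hops = 0
        while concept is not None:
            if prop in PROPS.get(concept, set()):
                return True, hops
            concept = ISA.get(concept)
            hops += 1
        return False, hops

    print(has_property("canary", "flies"))      # (True, 1): one hop to "bird"
    print(has_property("canary", "has heart"))  # (True, 2): two hops to "animal"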

10
  • Collins & Quillian's results are subject to
    criticism (reaction times to statements like "do
    canaries move?" are influenced by
    prototypicality, word frequency, and uneven semantic
    distance across levels)
  • But other evidence from psychological experiments
    confirms that humans organize knowledge about
    words and concepts by means of meaningful
    relations
  • Access to one concept activates related concepts
    in an outward-spreading (radial) fashion

11
A bit of history
  • But the idea inspired WordNet (1986), which
    asked
  • Can most/all of the lexicon be represented as a
    semantic network where words are interlinked by
    meaning?
  • If so, the result would be a semantic network (a
    graph)

12
WordNet
  • If the (English) lexicon can be represented as a
    semantic network, which are the relations that
    connect the nodes?

13
Whence the relations?
  • Inspection of association norms
    (stimulus: hand; response: finger, arm)
  • Classical ontology (Aristotle): IS-A
    (maple-tree), HAS-A (maple-leaves)
  • Co-occurrence patterns in texts (meaningfully
    related words are used together)

14
Relations: Synonymy
  • One concept is expressed by several different
    word forms
  • beat, hit, strike
  • car, motorcar, auto, automobile
  • big, large
  • Synonymy: a one-to-many mapping of meaning to form

15
Synonymy in WordNet
  • WordNet groups (roughly) synonymous,
    denotationally equivalent words into unordered
    sets of synonyms (synsets)
  • hit, beat, strike
  • big, large
  • queue, line
  • Each synset expresses a distinct meaning/concept

16
Polysemy
  • One word form expresses multiple meanings
  • Polysemy: a one-to-many mapping of form to meaning
  • table, tabular_array
  • table, piece_of_furniture
  • table, mesa
  • table, postpone
  • Note: the most frequent word forms are the most
    polysemous!

17
Polysemy in WordNet
  • A word form that appears in n synsets
  • is n-fold polysemous
  • table, tabular_array
  • table, piece_of_furniture
  • table, mesa
  • table, postpone
  • table is fourfold polysemous/has four senses
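
Counting senses is a one-liner with NLTK (an illustrative sketch, not part of the slides; note that current WordNet releases list more senses of "table" than the four shown here):

    from nltk.corpus import wordnet as wn

    senses = wn.synsets('table')     # all synsets containing "table"
    print(len(senses), 'senses')     # the word's degree of polysemy
    for s in senses:
        print(s.name(), '-', s.definition())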

18
Some WordNet stats
19
The Net part of WordNet
  • Synsets are the building blocks of the network
  • Synsets are interconnected via relations
  • Bi-directional arcs express semantic relations
  • Result: a large semantic network (graph)

20
Hypo-/hypernymy relates noun synsets
  • Relates more/less general concepts
  • Creates hierarchies, or trees
  •                vehicle
  •               /       \
  •  car, automobile     bicycle, bike
  •     /        \              \
  •  convertible  SUV      mountain bike
  • "A car is a kind of vehicle" <=> "The class of
    vehicles includes cars, bikes"
  • Hierarchies can have up to 16 levels

21
Hyponymy
  • Transitivity
  • A car is a kind of vehicle
  • An SUV is a kind of car
  • => An SUV is a kind of vehicle
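
In NLTK, this transitivity corresponds to taking the closure of the hypernym relation. A sketch (synset names are from current WordNet, not from the slide):

    from nltk.corpus import wordnet as wn

    car = wn.synset('car.n.01')
    # closure() follows hypernym links transitively, all the way up.
    ancestors = set(car.closure(lambda s: s.hypernyms()))
    # "vehicle" should appear among the transitive ancestors of "car".
    print(any('vehicle' in s.lemma_names() for s in ancestors))  # expect True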

22
Meronymy/holonymy (part-whole relation)
  •   car, automobile
  •         |
  •       engine
  •      /      \
  •  spark plug  cylinder
  • An engine has spark plugs
  • Spark plugs and cylinders are parts of an engine

23
Meronymy/Holonymy
  • Inheritance
  • A finger is part of a hand
  • A hand is part of an arm
  • An arm is part of a body
  • => A finger is part of a body
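
The same closure idea applies to part-whole links. An NLTK sketch (whether the chain actually reaches "body" depends on the links present in the WordNet release):

    from nltk.corpus import wordnet as wn

    hand = wn.synset('hand.n.01')
    print(hand.part_meronyms())      # parts of a hand, e.g. finger
    print(hand.part_holonyms())      # wholes a hand is part of, e.g. arm

    # Transitive "part of": follow part_holonyms upward from "finger".
    for whole in wn.synset('finger.n.01').closure(lambda s: s.part_holonyms()):
        print(whole.name())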

24
Structure of WordNet (Nouns)
25
WordNet Data Model
(diagram; recoverable content condensed below)
  • Three layers: the vocabulary of a language, concepts
    (records), and relations between concepts
  • bank: sense 1 = rec 12345 (financial institute);
    sense 2 = rec 54321 (side of a river)
  • fiddle, violin: sense 1 = rec 9876 (small string
    instrument), type-of rec 25876 (string instrument);
    rec 35576 (string of instrument) is part-of it
  • fiddler, violist: sense 2 = rec 65438 (musician
    playing violin), type-of rec 42654 (musician)
  • string: sense 1 = rec 35576 (string of instrument);
    sense 2 = rec 29551 (subatomic particle)
26
(No Transcript)
27
WordNet for Natural Language Processing
  • Challenge
  • get a computer to understand language
  • Information retrieval
  • Text mining
  • Document sorting
  • Machine translation

28
Natural Language Processing
  • Stemming, parsing currently at >90% accuracy
    level
  • Word sense discrimination (lexical
    disambiguation) still a major hurdle for
    successful NLP
  • Which sense is intended by the writer (relative
    to a dictionary)?
  • Best systems: 60% precision, 60% recall (but
    human inter-annotator agreement isn't perfect,
    either!)
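
For orientation, a classic baseline for this disambiguation task is the Lesk algorithm, which picks the sense whose gloss overlaps most with the context. NLTK ships a simplified version (a generic baseline, not the approach of this talk):

    from nltk.wsd import lesk

    context = 'the waiter set the dishes on the table'.split()
    sense = lesk(context, 'table', 'n')   # consider noun senses only
    print(sense, '-', sense.definition() if sense else 'no sense found')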

29
  • Understanding text beyond the word level
  • (joint work with Peter Clark and Jerry Hobbs)

30
Knowledge in text
  • Human language users routinely derive knowledge
    from text that is NOT expressed on the surface
  • Perhaps more knowledge is unexpressed than
    overtly expressed on the surface
  • Graesser (1981) estimates:
  • explicit:implicit info = 1:8

31
An example
  • Text: A soldier was killed in a gun battle
  • Inferences
  • Soldiers were fighting one another
  • The soldiers had guns with live ammunition
  • Multiple shots were fired
  • One soldier shot another soldier
  • The shot soldier died as a result of the injuries
    caused by the shot
  • The time interval between the fatal shot and the
    death was short

32
  • Humans use world knowledge to supplement word
    knowledge
  • (How) can such knowledge be encoded and harnessed
    by automatic systems?
  • Previous attempts (e.g., Cyc's microtheories):
  • --too few theories
  • --uneven coverage of world knowledge

33
Recognizing Textual Entailment
  • Task
  • Evaluate truth of hypothesis H given a text T
  • (T) A soldier was killed in a gun battle
  • (H) A soldier died
  • Answer may be yes/no/probably/...

34
RTE
  • Many automatic systems attempt RTE via lexical,
    syntactic matching algorithms (do the same words
    occur in T, H? do T, H have the same
    subject/object?)
  • Not deep language understanding

35
Our RTE test suite
  • 250 Text-Hypothesis pairs
  • for 50% of them, H is entailed by T
  • for the remaining 50%, H is not (necessarily)
    entailed
  • Focus on semantic interpretation

36
RTE test suite
  • Core of T statements came from newspaper texts
  • H statements were hand-coded
  • focus on general world knowledge

37
RTE test suite
  • Manually analyzed pairs
  • Distinguished, classified 19 types of knowledge
    among the T-H pairs
  • some partial overlap

38
Examples: Types of knowledge (in increasing order of
difficulty)
  • Lexical: relations among irregular forms of a
    single lemma, Named Entities vs. proper nouns
  • Lexical-semantic (paradigmatic): synonyms,
    hypernyms, meronyms, antonyms, metonymy,
    derivations
  • Syntagmatic: selectional preferences, telic roles
  • Propositional: cause-effect, preconditions
  • World knowledge/core theories (e.g., ambush
    entails concealment)

39
Overall approach (bag of tricks)
  • Initial text interpretation with language
    processing tools (Peter Clark et al.)
  • Compute subsumption among text fragments
  • WordNet augmentations

40
Text interpretation
  • First step: parsing (assign a structure to a
    sentence or phrase)
  • SAPIR parser (Harrison & Maxwell 1986)
  • SAPIR also produces a Logical Form (LF)

41
LFs
  • LF structures are trees generated by rules
    parallel to grammar rules
  • contain logic elements
  • nouns, verbs, adjs, prepositions represented as
    variables
  • LFs are parsed and have part-of-speech tags
  • LFs generate ground logical assertions

42
Example
  • LF for "A soldier was killed in a gun battle."
  • (DECL
  • ((VAR X1 "a" "soldier")
  • (VAR X2 "a" "battle" (NN "gun"
    "battle")))
  • (S (PAST) NIL "kill" ?X1 (PP "in" ?X2)))?

43
Logical assertions
  • logic for "A soldier was killed in a gun
    battle."
  • object(kill,soldier) in(kill,battle)
    modifier(battle,gun)?

44
  • Result: T, H in Logical Form

45
Matching sentences/fragments with subsumption
  • A basic reasoning operation
  • A person loves a person
  • subsumes
  • A man loves a woman
  • A set S1 of clauses subsumes another set S2 of
    clauses if each clause in S1 subsumes some member
    of S2
  • Similarly, a clause subsumes another clause if
    the arguments of the first subsume or match the
    arguments of the second
  • Argument (word) subsumption as in WordNet (X is a
    Y); see the sketch below
  • Matching synonyms
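
A minimal sketch of the argument (word) subsumption step, reconstructed with NLTK (my illustration, not the authors' matcher): X subsumes Y if some sense of X is identical to, or a hypernym ancestor of, some sense of Y.

    from nltk.corpus import wordnet as wn

    def subsumes(general, specific):
        """Does some noun sense of `general` subsume one of `specific`?"""
        for g in wn.synsets(general, pos=wn.NOUN):
            for s in wn.synsets(specific, pos=wn.NOUN):
                if g == s or g in s.closure(lambda x: x.hypernyms()):
                    return True
        return False

    print(subsumes('person', 'man'))    # expect True: man IS-A person
    print(subsumes('person', 'woman'))  # expect True: woman IS-A person
    print(subsumes('man', 'person'))    # expect False: direction matters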

46
Syntactic matching of predicates
  • --both are the same
  • --one is a predicate "of" or a modifier (my friend's
    car, the car of my friend)
  • --predicate's subject and "by" match (passives)

47
Lexical (word) matching
  • Words related by derivational morphology
    (destroy, destruction) are considered matches in
    conjunction with syntactic matches
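
WordNet records such derivational links explicitly; in NLTK they are attached to Lemma objects (illustrative sketch):

    from nltk.corpus import wordnet as wn

    for lemma in wn.lemmas('destruction', pos=wn.NOUN):
        for related in lemma.derivationally_related_forms():
            print(lemma.name(), '->', related.synset().name())
    # expect links from "destruction" into verb synsets of "destroy"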

48
  • Recognize as equivalent
  • the bomb destroyed the shrine
  • the destruction of the shrine by the bomb
  • But not
  • the destruction of the bomb by the shrine
  • a person attacks with a bomb
  • there is a bomb attack by a person

49
Benefits for text understanding/RTE
  • (T) Moore is a prolific writer
  • (H) Moore writes many books
  • Moore is the Agent of write

50
  • Exploiting word and world knowledge encoded in
    WordNet

51
Use of WordNet glosses
  • Glosses: definitions of the concept expressed by
    synset members
  • airplane, plane (an aircraft that has fixed
    wings and is powered by propellers or jets)
  • syntagmatic information, world knowledge
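
Glosses are directly accessible per synset in NLTK (sketch; the printed gloss should match the one quoted above):

    from nltk.corpus import wordnet as wn

    plane = wn.synset('airplane.n.01')
    print(plane.lemma_names())   # ['airplane', 'aeroplane', 'plane']
    print(plane.definition())    # the gloss quoted on this slide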

52
Translating glosses into First-Order Logic axioms
  • bridge, span (any structure that allows people
    or vehicles to cross an obstacle such as a river
    or canal...)
  • bridgeN1(x, y)
  •   <-> structureN1(x) & allowV1(x, e1) &
        crossV1(e1, z, y) & obstacleN2(y) &
        person/vehicle(z)
  • personN1(z) --> person/vehicle(z)
  • vehicleN1(z) --> person/vehicle(z)
  • riverN2(y) --> obstacleN2(y)
  • canalN3(y) --> obstacleN2(y)

53
  • The nouns, verbs, adjectives, adverbs in the LF
    glosses were manually disambiguated
  • Thus, each variable in the LFs was identified not
    just with a word form but with a form-meaning pair
    (a sense) in WordNet
  • LFs were generated for 110K glosses
  • Particular emphasis on CoreWordNet

54
How well do our tricks perform?
55
An example that works
  • Exploiting formally related words in WN
  • (T) go through licensing procedures
  • (H) go through licensing processes
  • Exploiting hyponymy (IS-A relation)
  • (T) Beverley served at WEDCOR
  • (H) Beverley worked at WEDCOR

56
More complex example that works
  • (T) Britain puts curbs on immigrant labor from
    Bulgaria
  • (H) Britain restricted workers from Bulgaria

57
Knowledge from WordNet
  • Synset with gloss: restrict, restrain,
    place_limits_on ("place restrictions on")
  • Synonymy: put = place, curb = limit
  • Morphosemantic link: labor - laborer
  • Hyponymy: laborer IS-A worker

58
Example that doesn't work
  • (T) The Philharmonic orchestra draws large crowds
  • (H) Large crowds were drawn to listen to the
    orchestra
  • WordNet tells us that:
  • orchestra: collection of musicians
  • musician: someone who plays a musical instrument
  • music: sound produced by musical instruments
  • listen: hear, perceive sound
  • But WN doesn't tell us that playing results in
    sound production and that there is a listener

59
Examples that don't work
  • The most fundamental knowledge that humans take
    for granted trips up automatic systems
  • Such knowledge is not explicitly taught to
    children
  • But it must be taught to machines!

60
Core theories (Jerry Hobbs)
  • Attempt to encode fundamental knowledge
  • Space, time, causality,...
  • Essential for reasoning
  • Not encoded in WordNet glosses

61
Core theories
  • Manually encoded
  • Axiomatized

62
Core theories
  • Composite entities (things made of other things,
    stuff)
  • Scalar notions (time, space, ...)
  • Change of state
  • Causality

63
Core theories
  • Examples of predications:
  • change(e1, e2)
  • changeFrom(e1)
  • changeTo(e2)

64
Core theories and WordNet
  • map core theories to Core WN synsets
  • encode meanings of synsets denoting events, event
    structure in terms of core theory predications

65
Examples
  • let(x, e) <-> not(cause(x, not(e)))
  • go, become, get ("he went wild")
  • go(x, e) <-> changeTo(e)
  • free(x, y) <-> cause(x, changeTo(free(y)))
  • (All words are linked to WN senses)

66
Example
  • The captors freed the hostages
  • The hostages were free
  • free: let(x, go(y, free(y)))
  •   <-> not(cause(x, not(changeTo(free(y)))))
  •   <-> cause(x, changeTo(free(y)))
  •   <-> free(x, y)

67
Preliminary evaluation
  • (What) does each component contribute to RTE?
  • For the 250 Text-Hypothesis pairs in our test
    suite

68
(No Transcript)
69
Conclusion
  • Way to go!
  • We deliberately exclude statistical similarity
    measures (this hurts our results)
  • Symbolic approach: aim at deep-level understanding

70
WordNet for Deeper Text Understanding
  • Axioms in Logical Form are useful for many other
    NL Understanding applications
  • E.g., automated question answering: translate Qs
    and As into logic representations
  • Logic representations enable reasoning (axioms
    can be fed into a reasoner/logic prover)

71
  • Thanks for your attention