The Chapters on Lexical Semantics from Jurafsky and Martin 2nd Ed PowerPoint PPT Presentation




1
The Chapters on Lexical Semantics from Jurafsky
and Martin (2nd Ed)
  • Chu-Ren Huang
  • CLCLP Pro-seminar, 11 December 2007

2
Notes on sources
  • PPT on the chapter is based on Professor Kathleen
    Ahrens's previous presentation
  • Supplementary Materials (ppt no. 8-28, and 45-47)
    are taken from Huang's previous work.

3
Word meaning
  • Unanalyzed symbols: CAT
  • Lexical semantics: linguistic study of word
    meaning
  • Question: what is a word?

4
Word
  • Lexeme: pairing of form and meaning
  • Form can be orthographic or phonological
  • Lexicon: finite set of lexemes
  • A lexeme is represented in a dictionary by a
    lemma (citation form)

5
Examples
  • Lexicon
  • Lexeme
  • Lemma/citation form
  • Wordform

6
Lemmatization
  • Mapping from a wordform to a lemma
  • Not deterministic
  • Based on context
  • E.g. found (past tense of find, or the verb found)
  • Part-of-speech specific: table has two lemmas,
    one a noun and one a verb
  • Lemmas may be longer than a word, e.g. catch up
  • Focus on lemma in this chapter, not on wordform
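A part-of-speech specific lemmatization table can be sketched as a lookup from wordform to (lemma, POS) analyses. This is a minimal Python sketch; the table contents and the function name `lemmatize` are hypothetical. Filtering by POS separates the noun and verb lemmas of table, but found remains ambiguous between find and the verb found, which is why the choice still depends on context.

```python
from __future__ import annotations

# Hypothetical toy lemma table: wordform -> list of (lemma, POS) analyses.
# A real system would derive these from a morphological lexicon.
LEMMA_TABLE: dict[str, list[tuple[str, str]]] = {
    "found": [("find", "VERB"), ("found", "VERB")],   # "I found it" vs. "to found a company"
    "table": [("table", "NOUN"), ("table", "VERB")],  # one noun lemma, one verb lemma
    "tables": [("table", "NOUN"), ("table", "VERB")],
    "sang": [("sing", "VERB")],
    "caught up": [("catch up", "VERB")],              # multiword lemma
}


def lemmatize(wordform: str, pos: str | None = None) -> list[tuple[str, str]]:
    """Return every (lemma, POS) analysis of a wordform, optionally filtered by POS.

    More than one analysis means the choice must still be made from context.
    """
    analyses = LEMMA_TABLE.get(wordform.lower(), [])
    if pos is None:
        return analyses
    return [analysis for analysis in analyses if analysis[1] == pos]


print(lemmatize("table", "NOUN"))   # POS alone resolves this one
print(lemmatize("found", "VERB"))   # still two candidates: context is needed
```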

7
When do we want a different dictionary entry?
  • For wordforms?
  • Or for lemmas?
  • Why?
  • Note: lemma is also used to mean separate
    senses instead of the citation form of the word;
    both uses may be found in the literature.

8
The Lexicon: Basic Concepts
  • The Lexicon
  • Dictionary
  • MRD: Machine-Readable Dictionary
  • Electronic Dictionary
  • The Mental Lexicon

9
What is in a Lexicon? Which parts belong to
lexical semantics?
  • Entries-lemmas
  • Phonological representation
  • POS / Grammatical Category
  • Lemmatization rules
  • Argument structure/Sub-Categorization Frames
  • Selectional Restrictions
  • Semantic Representation
  • Semantic Relations
  • Corpus Collocation? Frequency? Examples?
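One way to make the question concrete is to model a lexical entry as a record with one field per component in the list above. This is a minimal sketch; the class, its field names, and the choice of which fields count as "semantic" are assumptions for illustration, not a standard schema. Lemmatization rules are assumed to belong to the lexicon as a whole rather than to individual entries.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LexicalEntry:
    """A lexical entry with one field per component listed on the slide (names hypothetical)."""
    lemma: str
    phonology: str                                     # phonological representation
    pos: str                                           # POS / grammatical category
    subcat_frames: list[tuple[str, ...]] = field(default_factory=list)
    selectional_restrictions: dict[str, str] = field(default_factory=dict)
    senses: list[str] = field(default_factory=list)    # semantic representation
    relations: dict[str, list[str]] = field(default_factory=dict)
    corpus_info: dict[str, object] = field(default_factory=dict)  # collocations, frequency, examples

    # One possible answer to "which parts belong to lexical semantics?"
    SEMANTIC_FIELDS = ("selectional_restrictions", "senses", "relations")

    def semantic_part(self) -> dict[str, object]:
        return {name: getattr(self, name) for name in self.SEMANTIC_FIELDS}


eat = LexicalEntry(
    lemma="eat",
    phonology="/iːt/",
    pos="VERB",
    subcat_frames=[("NP",), ()],                       # transitive and intransitive uses
    selectional_restrictions={"Theme": "food, nutrient"},
    senses=["take in solid food"],
    relations={"hypernym": ["consume"]},
)
print(eat.semantic_part())
```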

10
Issues
  • What defines a lexical entry?
  • Lemmatised form
  • Traditional orthography (e.g. Chinese characters)
  • Modern orthography (e.g. Pinyin romanization)
  • POS
  • Form-to-Sense mapping
  • Is form dependent on sense or sense dependent on
    form?
  • Do lemmatisation guidelines dictate inherent
    sortal criteria, or vice versa?

11
Basic Notion: Lexical Entry / Lemma
  • Premise: The set of lemmas in a language is the
    result of optimal lemmatization.
  • Hypothesis: Lemmas are conceptual atoms which are
    morpho-syntactically autonomous.

12
Words, Stems, Affixes, and Clitics can all be
lemmas
  • English
  • be, student, red,
  • -s, -hood, -ness, -er, -ity etc.
  • Chinese
  • shi4 是, lao3shi1 老師, wan2mei3 完美,
  • -zhe3 者, -li4 力

13
Lexical Idiosyncrasies Occur at the Lemma Level
  • Head-dependent languages show morpho-phonemic
    idiosyncrasies at the word level
  • E.g. was, children, sheep, fought
  • Co-dependent languages show morpho-phonemic
    idiosyncrasies at the affix (or stem) level

14
Implications
  • Defining Lemma: how lemmas are defined must be
    clearly stated for each language.
  • Languages may have words and affixes, or stems
    and affixes, as their lexical lemmas
  • The Category of Affixes

15
Basic Notion: Orthography
  • Alphabetical Order is Orthographic Order
  • True or False?
  • Orthographic conventions also conventionalize
    lexical structure
  • What if a language conventionalizes and/or
    tolerates more than one orthography?

16
Orthographic Conventions are Code-Switches
  • Japanese
  • -Kanji
  • -Katakana
  • -Hiragana

17
Code-Switches Plus Code-Mixers
  • Chinese
  • -Simultaneous loan of word and orthography
  • IBM, ADSL,
  • -Loan word adopting loan orthography from a
    different source
  • LKK: to be old and senile, from Taiwanese
    lau-ko-ko

18
Code-Switches Plus Code-Mixers
  • Chinese
  • -Code-mixed within an entry
  • ?Q, a typical Chinese who is cynical and
    fatalistic, from a famous novel
  • K? to hit the book
  • C?? C-cup (as in a bra)
  • A? to ill-gain money, gained but not earned
  • ?Sir (Hong Kong Cantonese) a police officer

19
The Challenge
  • Information will be lost if one orthography is
    lost
  • Cannot be represented otherwise
  • Orthography encodes significant linguistic
    information.
  • For instance, all the code-mixed words are
    pronounced according to their orthography

20
Suggested Solution
  • At Lexicon Level
  • Orthographic conventions in the language must be
    described,
  • Lexicon structure conventionalized by the
    orthography (alphabetical order with alphabet
    sets identified, radical classification etc.)
  • This should be done regardless of the
    representation adopted in the lexicon (e.g.
    Pinyin Romanization for Chinese).
  • How word/lemma boundaries are marked by the
    orthography

21
Suggested Solution II
  • At Entry Level
  • Orthographic convention will be marked on each
    entry, including the possibility of code-mixed
    orthography
  • Unmarked default would be the dominant
    orthography stipulated for the whole lexicon.
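Entry-level marking can be sketched by tagging each character of an entry with its script and leaving the entry unmarked when it uses only the lexicon's dominant orthography. This sketch assumes Han is the default for a Chinese lexicon and classifies characters by their Unicode character names using Python's `unicodedata.name`; the script labels and function names are hypothetical.

```python
from __future__ import annotations

import unicodedata

# Assumed mapping from Unicode character-name prefixes to script labels.
# Compatibility ideographs, half-width kana, etc. fall through to "Other".
SCRIPT_PREFIXES = (
    ("CJK UNIFIED IDEOGRAPH", "Han"),
    ("HIRAGANA", "Hiragana"),
    ("KATAKANA", "Katakana"),
    ("LATIN", "Latin"),
)


def script_of(ch: str) -> str:
    name = unicodedata.name(ch, "")
    for prefix, script in SCRIPT_PREFIXES:
        if name.startswith(prefix):
            return script
    return "Other"


def scripts_in(entry: str) -> list[str]:
    """Distinct scripts used in an entry, in order of first appearance."""
    seen: list[str] = []
    for ch in entry:
        script = script_of(ch)
        if script not in seen:
            seen.append(script)
    return seen


def orthography_mark(entry: str, default: str = "Han") -> list[str] | None:
    """None means unmarked: the entry uses only the lexicon's dominant orthography."""
    scripts = scripts_in(entry)
    return None if scripts == [default] else scripts


for word in ["老師", "阿Q", "K書", "C罩杯", "IBM"]:
    print(word, orthography_mark(word))
```

Code-mixed entries such as 阿Q come out marked with both scripts, while a loan like IBM is marked as Latin only.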

22
The Forms and Senses of 背
  • (1) bei4 N. back (direction)
  • (2) bei4 N. back (body part)
  • (3) bei1 V. to carry (on the back)
  • (4) bei1 V. (metaphor) to carry/to be
    responsible for debts or obligations
  • (5) bei4 V. to memorize
  • (6) bei4 Prep. to do something behind sb's back, to
    do sth without letting sb know.
  • (7) bei4 ADJ. unlucky
  • (8) bei4 ADJ. to be hard of hearing (collocates
    with ear, 耳)
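These senses also show why the choice of what defines a lexical entry matters: keying lemmas by orthographic form alone, by form plus pronunciation, or by form, pronunciation and POS turns the same eight senses into one, two or five entries. The grouping function below is a hypothetical illustration, not a proposed lemmatization standard.

```python
from collections import defaultdict

# The eight senses of 背 listed on the slide, as (reading, POS, gloss).
SENSES = [
    ("bei4", "N", "back (direction)"),
    ("bei4", "N", "back (body part)"),
    ("bei1", "V", "to carry (on the back)"),
    ("bei1", "V", "to be responsible for debts or obligations"),
    ("bei4", "V", "to memorize"),
    ("bei4", "Prep", "to do something behind sb's back"),
    ("bei4", "ADJ", "unlucky"),
    ("bei4", "ADJ", "to be hard of hearing (with 'ear')"),
]


def lemmatize_by(key):
    """Group the senses into lexical entries using a chosen lemmatization criterion."""
    entries = defaultdict(list)
    for reading, pos, gloss in SENSES:
        entries[key(reading, pos)].append(gloss)
    return dict(entries)


by_form = lemmatize_by(lambda reading, pos: "背")                        # orthography only
by_reading = lemmatize_by(lambda reading, pos: ("背", reading))          # form + pronunciation
by_reading_pos = lemmatize_by(lambda reading, pos: ("背", reading, pos))  # + POS

print(len(by_form), len(by_reading), len(by_reading_pos))  # 1 2 5
```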

23
Irregular Orthographic Variations: A Challenge to
Lemmatisation
  • Variant forms 姐(a) and 姊(b) (for instance,
    <姐姐, 姊姊> 'elder sister' and <姐妹, 姊妹> 'sisters')
  • -However, 小姐(a) 'Miss' and 小姊(b) 'youngest elder
    sister' are not variants.
  • Variant forms 沈(a) and 沉(b) for 'to sink' with
    the pronunciation chen2.
  • -However, only 沈(a) is allowed for the shen3
    pronunciation that represents a family name, and
    only 沉(b) is allowed for the sense of 'heavy,
    loaded', also pronounced chen2.
  • Non-variants ?(a) and ?(b)
  • -They vary only in the context of
    <??, ??> 'Jesus', and not in any other lexical
    combination.

24
Is Sense Language Dependent? Examples I.
  • Phoenix
  • Feng4huang2 鳳凰
  • A bird in Egyptian mythology that lived in the
    desert for 500 years and then consumed itself by
    fire, later to rise renewed from its ashes.
    (AHDEL)
  • Feng4huang2 鳳凰
  • Phoenix
  • A bird in Chinese mythology that always showed up
    in a pair: the male feng4 and the female huang2.
    They symbolized love and marital bliss.

25
Is Sense Language Dependent? Examples II
  • bo2bo5 伯伯
  • uncle
  • An elder brother of one's father
  • shu2shu5 叔叔
  • uncle
  • A younger brother of one's father
  • jiu4jiu5 舅舅
  • uncle
  • A brother of one's mother
  • uncle
  • bo2bo5, shu2shu5, or jiu4jiu5

26
Suggested Solution
  • Directionality must be marked in a multilingual
    lexicon
  • -Adopt OLACMS
  • Subject.language: the language being described
  • Language: the language used in the description
  • In an English-to-Chinese lexicon, English will
    be the Subject.language, and Chinese will be the
    Language.
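Directionality can be sketched as a lexicon object that carries both OLAC-style language fields. Naively inverting the Chinese-to-English direction collapses bo2bo5, shu2shu5 and jiu4jiu5 under uncle, so the kinship distinctions survive only in the direction where Chinese is the Subject.language. The class and method names are hypothetical; the two field names merely mirror the slide's terminology.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BilingualLexicon:
    """A directional lexicon; the two language fields follow the OLAC distinction."""
    subject_language: str   # the language being described
    language: str           # the language used in the description
    entries: dict[str, list[str]] = field(default_factory=dict)

    def lookup(self, headword: str) -> list[str]:
        return self.entries.get(headword, [])

    def naive_inverse(self) -> BilingualLexicon:
        """Flip the direction by swapping headwords and equivalents."""
        inverse: dict[str, list[str]] = {}
        for headword, equivalents in self.entries.items():
            for equivalent in equivalents:
                inverse.setdefault(equivalent, []).append(headword)
        return BilingualLexicon(self.language, self.subject_language, inverse)


zh_en = BilingualLexicon("Chinese", "English", {
    "伯伯": ["uncle"],   # elder brother of one's father
    "叔叔": ["uncle"],   # younger brother of one's father
    "舅舅": ["uncle"],   # brother of one's mother
})
en_zh = zh_en.naive_inverse()
print(en_zh.subject_language, en_zh.lookup("uncle"))   # English ['伯伯', '叔叔', '舅舅']
```

A round trip from 伯伯 through uncle returns all three Chinese terms, which is the information loss that marking direction is meant to prevent.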

27
Is There Grammar in the Lexicon?
  • PS Rule
  • S → NP VP
  • VP → V NP PP
  • Predicate Argument Structure
  • put v. [_ NP PP[on]]
  • Lexical Rules and Meta-rules
  • Passivization
  • V → V-ed (morphology)
  • [_ NP X] → [_ X PP[by]]
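The passivization rule can be read as a function from one lexical entry to another. In this sketch, subcategorization frames are tuples of complement labels, the regular -ed participle is assumed unless an irregular one is supplied, and the by-phrase is added to the frame even though it is optional in English; `VerbEntry` and `passivize` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VerbEntry:
    form: str
    subcat: tuple[str, ...]   # complements after the verb, e.g. ("NP", "PP[on]")


def passivize(entry: VerbEntry, participle: str | None = None) -> VerbEntry:
    """Passive lexical rule: V -> V-ed and [_ NP X] -> [_ X PP[by]].

    The regular "-ed" participle is assumed unless an irregular one is given.
    """
    if not entry.subcat or entry.subcat[0] != "NP":
        raise ValueError(f"{entry.form!r} has no object NP to passivize")
    new_form = participle if participle is not None else entry.form + "ed"
    return VerbEntry(new_form, entry.subcat[1:] + ("PP[by]",))


put = VerbEntry("put", ("NP", "PP[on]"))
kick = VerbEntry("kick", ("NP",))
print(passivize(put, participle="put"))   # VerbEntry(form='put', subcat=('PP[on]', 'PP[by]'))
print(passivize(kick))                    # VerbEntry(form='kicked', subcat=('PP[by]',))
```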

28
Lexicon as Knowledge Base, or Lexical Semantics as
the anchor of all linguistic knowledge
  • A knowledge system perspective
  • Not a symbolic perspective where lexicon contains
    atoms to be inserted into PS rules
  • Lexical Knowledge contains the sum of all
    linguistic knowledge
  • And links to world knowledge
  • Unification/Lexicalist Approach to Grammar

29
Word senses
  • Word sense: the part of a lexeme (represented by a
    lemma) that represents word meaning
  • Word sense: a discrete representation of one
    aspect of the meaning of a word
  • Question: what does aspect mean?

30
Examples
  • Homonyms: unrelated senses, same orthographic
    form

31
Examples
  • Polysemy: related senses, same orthographic form

32
Examples
  • Metonymy: a subtype of the polysemy relation
  • One aspect of a concept refers to other aspects
    of the entity or to the entity itself

33
Examples
  • Zeugma: conjunction of antagonistic readings
  • He bought a paper and enjoyed it.
  • -no zeugma
  • Does Midwest Express serve breakfast and
    Philadelphia?
  • -zeugma

34
Examples
  • Homophones: same pronunciation, different
    spelling (too/two/to)

35
Examples
  • Homographs: same orthography, different
    pronunciations and different meanings (e.g. bass)
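The distinctions drawn on the preceding slides (same or different spelling, same or different pronunciation, related or unrelated senses) can be turned into a small decision procedure. Deciding relatedness is the hard part in practice; here it is faked with a hypothetical `meaning_field` label, and the bank and two/too examples and their rough IPA transcriptions are supplied for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    spelling: str
    pronunciation: str   # rough IPA, for illustration only
    meaning_field: str   # hypothetical stand-in for "are these senses related?"


def relation(a: Sense, b: Sense) -> str:
    """Classify two senses using the distinctions drawn on the slides."""
    same_spelling = a.spelling == b.spelling
    same_sound = a.pronunciation == b.pronunciation
    if same_spelling and same_sound:
        return "polysemy" if a.meaning_field == b.meaning_field else "homonymy"
    if same_spelling:
        return "homographs"
    if same_sound:
        return "homophones"
    return "different lexemes"


bank_institution = Sense("bank", "bæŋk", "finance")
bank_building = Sense("bank", "bæŋk", "finance")   # metonymy: the building for the institution
bank_river = Sense("bank", "bæŋk", "landform")
bass_fish = Sense("bass", "bæs", "fish")
bass_music = Sense("bass", "beɪs", "music")
two = Sense("two", "tuː", "number")
too = Sense("too", "tuː", "degree")

print(relation(bank_institution, bank_river))     # homonymy
print(relation(bank_institution, bank_building))  # polysemy
print(relation(bass_fish, bass_music))            # homographs
print(relation(two, too))                         # homophones
```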

36
Questions?
37
Defining senses
  • Sense relations (as used in WordNet)
  • Semantic primitives: create a small, finite set
    of atomic units of meaning and build each sense
    from these primitives (e.g. using semantic roles
    to exemplify the meanings of events)

38
Synonyms
  • Two senses are identical or nearly identical or
    are substitutable for one another without
    changing the meaning of the sentence (i.e. they
    have the same propositional reading)
  • Some senses of some lemmas are synonymous (cf.
    big and large)

39
Antonyms
  • Define a binary opposition (on/off)
  • At opposite ends of a scale (hot/cold)
  • Reversives: describe change or movement in
    opposite directions (rise/fall)
  • Similar or different? Antonyms share almost all
    aspects of their meaning, except their direction
    or position on a scale
  • What is your opinion?

40
Hyponyms and hypernyms
  • One sense is a hyponym of another sense if the
    first sense is more specific, denoting a subclass
    of the other (car and truck are hyponyms of vehicle)
  • Examples
  • Hypernyms/Superordinates
  • One sense is a hypernym of another sense if the
    first sense is superordinate to the second sense
    (vehicle is a hypernym or superordinate of car)
  • Examples

41
Other terminology
  • Ontology: a set of distinct objects resulting from
    the analysis of a domain or microworld
  • Taxonomy: a particular arrangement of the elements
    of an ontology into a tree-like class inclusion
    structure
  • Hypernymy-taxonymy-hyponymy
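A taxonomy in this sense can be sketched as a map from each sense to its immediate hypernym, so that hyponymy becomes reachability along hypernym links. The toy taxonomy below is hypothetical and, being tree-like, assumes one parent per sense and no cycles.

```python
from __future__ import annotations

# Hypothetical toy taxonomy: each sense points to its immediate hypernym.
# A tree is assumed (one parent per sense, no cycles).
HYPERNYM = {
    "car": "motor vehicle",
    "truck": "motor vehicle",
    "motor vehicle": "vehicle",
    "bicycle": "vehicle",
    "vehicle": "artifact",
    "artifact": "entity",
}


def hypernym_path(sense: str) -> list[str]:
    """Follow hypernym links from a sense up to the root of the taxonomy."""
    path = [sense]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def is_hyponym(a: str, b: str) -> bool:
    """True if a is a (direct or indirect) hyponym of b."""
    return a != b and b in hypernym_path(a)


print(hypernym_path("car"))   # ['car', 'motor vehicle', 'vehicle', 'artifact', 'entity']
```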

42
Semantic fields
  • Meronymy: part-whole relation (leg is a part of
    a chair)
  • Holonymy: whole-part relation (car is a holonym
    of wheel)
  • Binary relations between two senses
  • Semantic field: holistic relationship among
    entire sets of words from a single domain

43
Frame/model/script
  • FrameNet: a robust computational resource for
    frame-based knowledge
  • Each word is defined with respect to the frame,
    and shares aspects of meaning with other frame
    words
  • Questions: What are some possible frames? What
    advantages do frames and semantic relations in
    general have over semantic primitives? When would
    one be preferred over the other?

44
WordNet Database of Lexical Relations
  • Gloss: dictionary-style definition
  • Synset: list of synonyms for each sense (the WN
    primitive); each lexical entry in this synset can
    be used to express that particular concept
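A synset-based lexicon can be sketched by indexing each lemma to every synset it belongs to. Two lemmas are then synonyms in some sense when they share a synset, which captures the point that only some senses of big and large are synonymous. The synset IDs and glosses below are hypothetical and do not reproduce WordNet's data.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    synset_id: str            # hypothetical identifier, not a real WordNet ID
    lemmas: tuple[str, ...]   # synonyms, each able to express the concept
    gloss: str                # dictionary-style definition (paraphrased for illustration)


SYNSETS = [
    Synset("s1", ("big", "large"), "above average in size"),
    Synset("s2", ("big", "important"), "significant; of great consequence"),
    Synset("s3", ("car", "automobile", "auto"), "a motor vehicle with four wheels"),
]

# Index each lemma to every synset (sense) it appears in.
INDEX: dict[str, list[Synset]] = defaultdict(list)
for synset in SYNSETS:
    for lemma in synset.lemmas:
        INDEX[lemma].append(synset)


def senses_of(lemma: str) -> list[Synset]:
    return INDEX.get(lemma, [])


def synonymous(a: str, b: str) -> bool:
    """Two lemmas are synonyms (in some sense) if they share at least one synset."""
    return any(b in synset.lemmas for synset in senses_of(a))


print([s.gloss for s in senses_of("big")])
print(synonymous("big", "large"), synonymous("large", "important"))   # True False
```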

45
Wordnet as Linguistic Ontology
  • A wordnet-like lexicon is the linguistic ontology
    of a language.
  • It contains all the concepts and conceptual links
    that are linguistically defined in that language

46
Wordnet as Linguistic Ontology II
  • Senses are defined and differentiated
    intra-lingually
  • Ontologies vary from one language to another,
    just as they vary from one domain to another
  • Senses are organized differently in different
    languages
  • Translation equivalents are not necessarily
    synonyms

47
Wordnet as Linguistic Ontology III
  • To maintain the integrity of each linguistic
    ontology, sense (hence synset) must be defined
    solely based on monolingual evidence
  • Cross-lingual meaning correspondences are marked
    by lexical semantic relations
  • Huang et al. 2002, SemaNet workshop paper:
    "Cross-lingual Inference of Lexical Semantic
    Relations: A First Step Towards Population of
    Multilingual Wordnets"

48
Questions?
49
Event Participants
  • Semantic roles
  • Thematic roles: attempt to capture semantic
    similarities between similar types of actors
  • Agent: volitional causer
  • Theme: participant most directly affected by the action

50
Roles
  • Agent
  • Experiencer
  • Force
  • Theme
  • Result
  • Content
  • Instrument
  • Beneficiary
  • Source
  • Goal

51
Diathesis Alternations
  • A shallow semantic language allows us to make
    simple inferences that are not possible from the
    surface string of words
  • Agent Theme
  • Agent Theme Instrument
  • Instrument Theme
  • Theme
  • Theme Agent
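The role patterns above line up with the familiar break examples. The sketch below (sentences supplied for illustration) labels each constituent with its thematic role: the surface subject varies between John, the rock and the window, but the Theme is always the window, so a single rule ("the Theme of BREAK ends up broken") licenses the same inference from every realization. Function names are hypothetical.

```python
from __future__ import annotations

# Realizations of "break", each constituent labeled with its thematic role.
# Dicts keep insertion order, which here mirrors surface word order,
# so the first filler is the surface subject.
REALIZATIONS = [
    ("John broke the window.", {"Agent": "John", "Theme": "the window"}),
    ("John broke the window with a rock.",
     {"Agent": "John", "Theme": "the window", "Instrument": "a rock"}),
    ("The rock broke the window.", {"Instrument": "the rock", "Theme": "the window"}),
    ("The window broke.", {"Theme": "the window"}),
    ("The window was broken by John.", {"Theme": "the window", "Agent": "John"}),
]


def role_pattern(roles: dict[str, str]) -> str:
    """The sequence of roles in surface order, e.g. 'Agent Theme'."""
    return " ".join(roles)


def surface_subject(roles: dict[str, str]) -> str:
    return next(iter(roles.values()))


def what_got_broken(roles: dict[str, str]) -> str | None:
    """Shallow inference: the Theme of BREAK is the thing that ends up broken."""
    return roles.get("Theme")


for sentence, roles in REALIZATIONS:
    print(role_pattern(roles), "|", surface_subject(roles), "|", what_got_broken(roles))
```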

52
  • Verb alternations/Diathesis alternations:
    multiple argument structure realizations of a
    verb
  • Dative alternation
  • Kick him the ball.
  • Kick the ball to him.

53
Questions?
54
Problems with thematic roles
  • Difficult to come up with a standard set
  • Difficult to produce formal definitions esp.
    cross-linguistically

55
Instruments
  • Intermediary and enabling instruments
  • The new gadget opened the jar. (intermediary)
  • *The fork ate the sliced banana. (enabling)

56
Agents
  • Animate
  • Volitional
  • Sentient
  • Causal
  • Might not exhibit all properties

57
Generalized semantic roles
  • Proto-agent
  • Proto-patient: undergoes change of state, is
    causally affected by another participant, is
    stationary relative to other participants
  • Proto-roles are scalar
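Because proto-roles are scalar, Dowty's argument selection principle can be sketched as counting entailments: the argument with the most proto-agent properties is realized as subject, and the remaining argument with the most proto-patient properties as object. The property labels are abbreviations chosen here, and the tie-breaking behaviour is an artifact of the sketch rather than part of the principle.

```python
from __future__ import annotations

# Abbreviated labels for Dowty's proto-role entailments.
PROTO_AGENT = {"volitional", "sentient", "causes_change", "moves", "independent_existence"}
PROTO_PATIENT = {"changes_state", "incremental_theme", "causally_affected", "stationary",
                 "dependent_existence"}


def proto_scores(props: set[str]) -> tuple[int, int]:
    """Scalar scores: (proto-agent entailments, proto-patient entailments)."""
    return len(props & PROTO_AGENT), len(props & PROTO_PATIENT)


def select_subject_and_object(args: dict[str, set[str]]) -> tuple[str, str | None]:
    """Most proto-agent entailments -> subject; most proto-patient among the rest -> object.

    Ties go to the argument listed first; the principle itself allows either choice.
    """
    subject = max(args, key=lambda a: proto_scores(args[a])[0])
    rest = [a for a in args if a != subject]
    obj = max(rest, key=lambda a: proto_scores(args[a])[1]) if rest else None
    return subject, obj


john_breaks_window = {
    "the window": {"changes_state", "causally_affected"},
    "John": {"volitional", "sentient", "causes_change", "moves", "independent_existence"},
}
rock_breaks_window = {
    "the window": {"changes_state", "causally_affected"},
    "the rock": {"causes_change", "moves", "independent_existence"},
}
print(select_subject_and_object(john_breaks_window))   # ('John', 'the window')
print(select_subject_and_object(rock_breaks_window))   # ('the rock', 'the window')
```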

58
Verb specific semantic roles
  • FrameNet
  • PropBank
  • MARVS
  • PropBank: resource of sentences annotated with
    semantic roles
  • Arg0 is the proto-agent, Arg1 is the proto-patient
  • Allows commonalities across different sentences
    with the same verb
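The commonality PropBank captures can be sketched with three annotated sentences for one verb: Arg1 picks out the price of bananas whether it surfaces as object or as subject. The annotations are illustrative, and the span normalization (lowercasing and stripping a leading by) is a crude assumption made only so that spans can be compared.

```python
from __future__ import annotations

# Illustrative PropBank-style annotations for three sentences with the same verb.
# Arg0 = proto-agent, Arg1 = proto-patient.
ANNOTATIONS = [
    ("Big Fruit Co. increased the price of bananas.",
     {"Arg0": "Big Fruit Co.", "Arg1": "the price of bananas"}),
    ("The price of bananas was increased again by Big Fruit Co.",
     {"Arg1": "The price of bananas", "Arg0": "by Big Fruit Co."}),
    ("The price of bananas increased.",
     {"Arg1": "The price of bananas"}),
]


def normalize(span: str) -> str:
    """Crude normalization (an assumption): lowercase and drop a leading 'by '."""
    span = span.lower()
    return span[len("by "):] if span.startswith("by ") else span


def arg(roles: dict[str, str], label: str) -> str | None:
    span = roles.get(label)
    return normalize(span) if span is not None else None


# Arg1 stays constant although it is the object in the first sentence
# and the subject in the other two.
for sentence, roles in ANNOTATIONS:
    print(sentence, "| Arg0:", arg(roles, "Arg0"), "| Arg1:", arg(roles, "Arg1"))
```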

59
FrameNet
  • Allows commonalities across different verbs and
    between verbs and nouns
  • Frame: a script-like structure, which
    instantiates a set of frame-specific semantic
    roles called frame elements
  • Change_position_on_a_scale: the frame consists
    of words that indicate the change of an Item's
    position on a scale (Attribute) from a starting
    point (Initial value) to an end point (Final
    value)

60
Core Roles
  • ATTRIBUTE
  • DIFFERENCE
  • FINAL STATE
  • FINAL VALUE
  • INITIAL STATE
  • INITIAL VALUE
  • ITEM
  • VALUE RANGE

61
Non-core Roles
  • DURATION
  • SPEED
  • GROUP

62
Frames
  • Consist of verbs, nouns, adverbs
  • Frames can inherit generalizations from each
    other
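Frame-element annotation across parts of speech can be sketched as below: verbs (rose, fell) and a noun (increase) all evoke Change_position_on_a_scale, so an ITEM or FINAL VALUE can be collected regardless of the target's category. Frame-element names are written with underscores, the annotations are illustrative, and the data structures are hypothetical rather than FrameNet's own format.

```python
from __future__ import annotations

from dataclasses import dataclass

# Core and non-core frame elements from the slides, with underscores for spaces.
CORE = {"ATTRIBUTE", "DIFFERENCE", "FINAL_STATE", "FINAL_VALUE",
        "INITIAL_STATE", "INITIAL_VALUE", "ITEM", "VALUE_RANGE"}
NON_CORE = {"DURATION", "SPEED", "GROUP"}


@dataclass
class Annotation:
    target: str                 # the lexical unit evoking the frame
    pos: str                    # part of speech of the target
    elements: dict[str, str]    # frame element -> text span


ANNOTATIONS = [
    Annotation("rose", "verb", {"ITEM": "Oil", "ATTRIBUTE": "in price", "DIFFERENCE": "by 2%"}),
    Annotation("fell", "verb", {"ITEM": "Microsoft shares", "FINAL_VALUE": "to 7 5/8"}),
    Annotation("increase", "noun", {"INITIAL_VALUE": "from 9.5", "FINAL_VALUE": "to 14.3",
                                    "ITEM": "in dividends"}),
]


def invalid_elements(a: Annotation) -> set[str]:
    """Frame elements not defined for this frame (should be empty)."""
    return set(a.elements) - CORE - NON_CORE


def spans_for(element: str) -> list[tuple[str, str, str]]:
    """Collect one frame element across all targets, verbs and nouns alike."""
    return [(a.target, a.pos, a.elements[element]) for a in ANNOTATIONS if element in a.elements]


print(spans_for("ITEM"))
print(spans_for("FINAL_VALUE"))
```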

63
Selectional Restrictions
  • A kind of semantic type constraint that a verb
    imposes on the kind of concepts that are allowed
    to fill its argument roles
  • I want to eat someplace close. (ambiguous
    reading)
  • Theme of eating must be edible (selectional
    restriction)

64
Use event representations
  • Problems
  • Overkill
  • Presupposes a large knowledge base
  • Alternative: use WordNet to specify a WN synset
    rather than logical concepts
  • E.g. use the synset {food, nutrient} for the Theme of EAT
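The WordNet alternative can be sketched as a hypernym walk: a filler satisfies the Theme restriction of EAT if {food, nutrient} is reachable from it. A place such as a restaurant fails, which is why "I want to eat someplace close" is normally read with someplace close as a location rather than as the thing eaten. The toy hypernym links are hypothetical stand-ins for WordNet synsets.

```python
from __future__ import annotations

# Hypothetical toy hypernym links, standing in for WordNet synsets.
HYPERNYM = {
    "hamburger": "sandwich",
    "sandwich": "food, nutrient",
    "restaurant": "building",
    "building": "structure",
}

# Selectional restriction on the Theme of EAT, stated as a synset.
RESTRICTIONS = {("eat", "Theme"): "food, nutrient"}


def satisfies(verb: str, role: str, filler: str) -> bool:
    """True if the filler is the required synset or one of its hyponyms."""
    required = RESTRICTIONS.get((verb, role))
    if required is None:
        return True                      # no restriction on this role
    node: str | None = filler
    while node is not None:
        if node == required:
            return True
        node = HYPERNYM.get(node)
    return False


print(satisfies("eat", "Theme", "hamburger"))    # True
print(satisfies("eat", "Theme", "restaurant"))   # False
```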

65
Primitive Decomposition
  • A.k.a. componential analysis
  • Semantic features: symbols which represent
    primitive meaning
  • Conceptual Dependency: ten primitive predicates
  • Difficult to come up with primitives that can
    represent all different kinds of meaning.
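Componential analysis can be sketched with binary feature bundles; the man/woman/boy/girl set below is a classic textbook-style illustration, and the feature names are assumptions. It also shows the limitation noted above: the approach works for a tidy domain like this one, but most other words have no obvious feature bundle.

```python
# Hypothetical binary feature bundles in the classic componential-analysis style.
FEATURES = {
    "man":   {"HUMAN": True, "ADULT": True,  "MALE": True},
    "woman": {"HUMAN": True, "ADULT": True,  "MALE": False},
    "boy":   {"HUMAN": True, "ADULT": False, "MALE": True},
    "girl":  {"HUMAN": True, "ADULT": False, "MALE": False},
}


def contrast(a: str, b: str) -> list[str]:
    """Features on which two words differ."""
    return sorted(f for f in FEATURES[a] if FEATURES[a][f] != FEATURES[b].get(f))


def words_with(**spec: bool) -> list[str]:
    """Words whose feature bundle matches every feature in spec."""
    return [w for w, feats in FEATURES.items()
            if all(feats.get(f) == v for f, v in spec.items())]


print(contrast("man", "woman"))               # ['MALE']
print(words_with(HUMAN=True, ADULT=False))    # ['boy', 'girl']
```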

66
Metaphor
  • Refer to and reason about a concept or domain
    using words from a completely different domain
  • Question: how do we know if it's from a different
    domain?