Title: The Chapters on Lexical Semantics from Jurafsky and Martin 2nd Ed
1The Chapters on Lexical Semantics from Jurafsky and Martin (2nd Ed)
- Chu-Ren Huang
- CLCLP Pro-seminar, 11 December 2007
2Notes on sources
- PPT on the chapter is based on Professor Kathleen Ahrens's previous presentation
- Supplementary Materials (ppt no. 8-28, and 45-47) are taken from Huang's previous work.
3Word meaning
- Unanalyzed symbols: CAT
- Lexical semantics: linguistic study of word meaning
- Question: what is a word?
4Word
- Lexeme: pairing of form and meaning
- Form can be orthographic or phonological
- Lexicon: finite set of lexemes
- A lexeme is represented in a dictionary by a lemma (citation form)
5Examples
- Lexicon
- Lexeme
- Lemma/citation form
- Wordform
6Lemmatization
- Mapping from a wordform to a lemma
- Not deterministic
- Based on context
- e.g. found
- Part-of-speech specific: table has two lemmas, one a noun and one a verb
- Lemmas may be longer than a word, e.g. catch up
- Focus on lemma in this chapter, not on wordform
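The mapping from wordform to lemma can be sketched as a lookup keyed on both the wordform and its part of speech. This is a minimal sketch, not a real lemmatizer: the table contents, the Penn Treebank style tags, and the names LEMMA_TABLE, lemmatize and candidate_lemmas are all assumptions made for illustration.

```python
from __future__ import annotations

# Toy lemma table keyed by (wordform, POS tag). Entries and tags are
# illustrative assumptions, not taken from any real lexicon.
LEMMA_TABLE = {
    ("found", "VBD"): "find",          # past tense of "find"
    ("found", "VB"): "found",          # base form, as in "to found a company"
    ("tables", "NNS"): "table",        # the noun lemma
    ("tables", "VBZ"): "table",        # the verb lemma, same citation form
    ("caught up", "VBD"): "catch up",  # a lemma longer than one word
}


def lemmatize(wordform: str, pos: str) -> str:
    """Return the lemma for a wordform given its POS tag.

    Falls back to the lowercased wordform when the pair is unknown.
    """
    return LEMMA_TABLE.get((wordform.lower(), pos), wordform.lower())


def candidate_lemmas(wordform: str) -> set[str]:
    """All lemmas a wordform could map to when no context is available."""
    return {lemma for (form, _), lemma in LEMMA_TABLE.items()
            if form == wordform.lower()}
```

Without a POS tag, candidate_lemmas("found") yields both find and found, which is the sense in which lemmatization is not deterministic; the context (here reduced to a tag) picks one.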
7When do we want a different dictionary entry?
- For wordforms?
- Or for lemmas?
- Why?
- Note: Lemma is also used to mean separate senses instead of the citation form of the word; both uses may be found in the literature.
8The Lexicon: Basic Concepts
- The Lexicon
- Dictionary
- MRD: Machine Readable Dictionary
- Electronic Dictionary
- The Mental Lexicon
9What is in a Lexicon? Which parts belong to lexical semantics?
- Entries-lemmas
- Phonological representation
- POS / Grammatical Category
- Lemmatization rules
- Argument structure/Sub-Categorization Frames
- Selectional Restrictions
- Semantic Representation
- Semantic Relations
- Corpus Collocation? Frequency? Examples?
10Issues
- Which defines a lexical entry?
- Lemmatised form
- Traditional orthography (e.g. Chinese characters)
- Modern orthography (e.g. Pinyin romanization)
- POS
- Form-to-Sense mapping
- Is form dependent on sense or sense dependent on form?
- Do lemmatisation guidelines dictate inherent sortal criteria or vice versa?
11Basic Notion: Lexical Entry / Lemma
- Premise: The set of lemmas in a language is the result of optimal lemmatization.
- Hypothesis: Lemmas are conceptual atoms which are morpho-syntactically autonomous.
12Words, Stems, Affixes, and Clitics can all be lemmas
- English
- be, student, red,
- -s, -hood, -ness, -er, -ity etc.
- Chinese
- shi4 是, lao3shi1 老師, wan2mei3 完美,
- -zhe3 者, -li4 力
13Lexical Idiosyncrasies Occur at the lemma level
- Head-dependent languages show morpho-phonemic idiosyncrasies at word level
- E.g. was, children, sheep, fought,
- Co-dependent languages show morpho-phonemic idiosyncrasies at affix (or stem) level
14Implications
- Defining Lemma: How lemmas are defined must be clearly stated for each language.
- Languages may have word+affix or stem+affix as their lexical lemmas
- The Category of Affixes
15Basic Notion: Orthography
- Alphabetical Order is Orthographic Order
- True or False?
- Orthographic conventions also conventionalize lexical structure
- What if one language conventionalizes and/or tolerates more than one orthography?
16Orthographic Conventions are Code-Switches
- Japanese
- -Kanji
- -Katakana
- -Hiragana
17Code-Switches Plus Code-Mixers
- Chinese
- -Simultaneous loan of word and orthography
- IBM, ADSL,
- -Loan word adopting loan orthography from a different source
- LKK: to be old and senile, from Taiwanese lau-ko-ko
18Code-Switches Plus Code-Mixers
- Chinese
- -Code-mixed within an entry
- ?Q, a typical Chinese who is cynical and
fatalistic, from a famous novel - K? to hit the book
- C?? C-cup (as in a bra)
- A? to ill-gain money, gained but not earned
- ?Sir (Hong Kong Cantonese) a police officer
19The Challenge
- Information will be lost if one orthography is lost
- Cannot be represented otherwise
- Orthography encodes significant linguistic information.
- For instance, all the code-mixed words are pronounced according to their orthography
20Suggested Solution
- At Lexicon Level
- Orthographic conventions in the language must be described,
- Lexicon structure conventionalized by the orthography (alphabetical order with alphabet sets identified, radical classification etc.)
- This should be done regardless of the representation adopted in the lexicon (e.g. Pinyin Romanization for Chinese).
- How word/lemma boundaries are marked by the orthography
21Suggested Solution II
- At Entry Level
- Orthographic convention will be marked on each entry, including the possibility of code-mixed orthography
- Unmarked default would be the dominant orthography stipulated for the whole lexicon.
22The Forms and Senses of 背
- (1) bei4 N. back (direction)
- (2) bei4 N. back (body part)
- (3) bei1 V. to carry (on the back)
- (4) bei1 V. (metaphor) to carry/to be responsible for debts or obligations
- (5) bei4 V. to memorize
- (6) bei4 Prep. to do something behind sb's back, to do sth without letting sb know.
- (7) bei4 ADJ. unlucky
- (8) bei4 ADJ. to be hard of hearing (coll. w/ ear)
23Irregular Orthographic Variations: A challenge to lemmatisation
- Variant forms 姐(a) and 姊(b) (for instance, <姐姐, 姊姊> elder sister and <姐妹, 姊妹> sisters)
- However, 小姐(a) Miss and 小姊(b) youngest elder sister are not variants.
- Variant forms 沈(a) and 沉(b) for to sink, with the pronunciation chen2.
- However, only 沈(a) is allowed for the shen3 pronunciation that represents a family name, and only 沉(b) is allowed for the sense of heavy, loaded, also pronounced chen2.
- Non-variants 穌(a) and 蘇(b)
- They show variation only in the context of <耶穌, 耶蘇> Jesus, and not in any other lexical combination.
24Is Sense Language Dependent? Examples I.
- Phoenix
- Feng4huang2 鳳凰
- A bird in Egyptian mythology that lived in the desert for 500 years and then consumed itself by fire, later to rise renewed from its ashes. (AHDEL)
- Feng4huang2 鳳凰
- Phoenix
- A bird in Chinese mythology that always showed up in a pair: the male feng4 and the female huang2. They symbolized love and marital bliss.
25Is Sense Language Dependent? Examples II
- bo2bo5 伯伯
- uncle
- An elder brother of one's father
- shu2shu5 叔叔
- uncle
- A younger brother of one's father
- jiu4jiu5 舅舅
- uncle
- A brother of one's mother
- uncle
- bo2bo5 or shu2shu5, or jiu4jiu5
26Suggested Solution
- Directionality must be marked in multilingual lexicon
- -Adopt OLACMS
- Subject.language: the language being described
- Language: the language used in description
- In an English-to-Chinese lexicon, English will be the Subject.language, and Chinese will be the Language.
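One way to picture directionality is to store both OLACMS fields on every entry. The sketch below is an assumption-laden illustration: the class BilingualEntry, the helper lookup, the language codes "en" and "zh", and the sample entries (built from the uncle examples) are hypothetical and not part of OLACMS itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BilingualEntry:
    subject_language: str  # the language being described (Subject.language)
    language: str          # the language used in the description (Language)
    headword: str
    description: str


# Sample entries; "en" and "zh" are assumed language codes.
ENTRIES = [
    BilingualEntry("en", "zh", "uncle", "bo2bo5 or shu2shu5 or jiu4jiu5"),
    BilingualEntry("zh", "en", "bo2bo5", "uncle: an elder brother of one's father"),
    BilingualEntry("zh", "en", "shu2shu5", "uncle: a younger brother of one's father"),
    BilingualEntry("zh", "en", "jiu4jiu5", "uncle: a brother of one's mother"),
]


def lookup(entries, subject_language, headword):
    """Return the entries that describe `headword` of `subject_language`."""
    return [e for e in entries
            if e.subject_language == subject_language and e.headword == headword]
```

Because each entry records its direction, the one-to-many mapping from uncle and the many-to-one mapping back to it stay distinct instead of being merged into a single symmetric table.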
27Is There Grammar in Lexicon?
- PS Rule
- S → NP VP
- VP → V NP PP
- Predicate Argument Structure
- put v. _ NP PP[on]
- Lexical Rules and Meta-rules
- Passivization
- V → V-ed (morphology)
- _ NP X → _ X PP[by]
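The passivization rule can be read as a function from one lexical entry to another: drop the object NP from the front of the subcategorization frame and append a by-phrase. A minimal sketch, assuming frames are stored as tuples of category labels and the by-phrase is written PP[by]; VerbEntry and passivize are hypothetical names, and the participle is passed in by hand because the morphology (regular V-ed or irregular) is not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerbEntry:
    form: str
    frame: tuple  # complements after the verb slot "_", e.g. ("NP", "PP[on]")


def passivize(entry: VerbEntry, participle: str) -> VerbEntry:
    """Apply the lexical rule  _ NP X  ->  _ X PP[by]  to a verb entry."""
    if not entry.frame or entry.frame[0] != "NP":
        raise ValueError("rule applies only to frames that start with an object NP")
    # Remove the object NP, keep the remaining complements X, add the by-phrase.
    return VerbEntry(form=participle, frame=entry.frame[1:] + ("PP[by]",))


PUT = VerbEntry("put", ("NP", "PP[on]"))
PUT_PASSIVE = passivize(PUT, participle="put")  # "put" has an irregular participle
```

The point of the sketch is that the rule operates on lexical entries rather than on phrase structure rules, so the grammar of the passive lives in the lexicon.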
28Lexicon as KnowledgeBase, or Lexical Semantics as the anchor of all linguistic knowledge
- A knowledge system perspective
- Not a symbolic perspective where lexicon contains atoms to be inserted into PS rules
- Lexical Knowledge contains the sum of all linguistic knowledge
- And links to world knowledge
- Unification/Lexicalist Approach to Grammar
29Word senses
- Word sense: part of a lexeme (represented by a lemma) that represents word meaning
- Word sense: discrete representation of one aspect of a meaning of a word
- Question: what does aspect mean?
30Examples
- Homonyms: unrelated senses, same orthographic form
31Examples
- Polysemy: related senses, same orthographic form
32Examples
- Metonymy: subtype of polysemy relation
- One aspect of a concept refers to other aspects of the entity or to the entity itself
33Examples
- Zeugma: conjunction of antagonistic readings
- He bought a paper and enjoyed it.
- -no zeugma
- Does Midwest Express serve breakfast and Philadelphia?
34Examples
- Homophones: same pronunciation, different spelling (too/two/to)
35Examples
- Homographs: same orthography, different pronunciations and different meanings (e.g. bass)
36Questions?
37Defining senses
- Sense relations (as used in WordNet)
- Semantic primitives: create a small, finite set of atomic units of meaning and create each sense from these primitives (e.g. using semantic roles to exemplify meanings of events)
38Synonyms
- Two senses are identical or nearly identical or are substitutable for one another without changing the meaning of the sentence (i.e. they have the same propositional reading)
- Some senses of some lemmas are synonymous (cf. big and large)
39Antonyms
- Define a binary opposition (on/off)
- At opposite ends of a scale (hot/cold)
- Reversives: describe a change of movement or movement in opposite directions (rise/fall)
- Similar or different? Antonyms share almost all their aspects of meaning, except their direction or position on a scale
- What is your opinion?
40Hyponyms and hypernyms
- One sense is a hyponym of another sense if the first sense is more specific, denoting a subclass of the other (car/truck is a hyponym of vehicle)
- Examples
- Hypernyms/Superordinates
- One sense is a hypernym of another sense if the first sense is superordinate to the second sense (vehicle is a hypernym or superordinate of car)
- Examples
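The two definitions are inverses of each other and both are transitive, which a small class-inclusion tree makes concrete. A minimal sketch, assuming each sense has at most one direct hypernym; the HYPERNYM table, the extra node "artifact", and the function names are illustrative assumptions.

```python
# Direct hypernym of each sense. "artifact" is an assumed extra level
# added only to show transitivity.
HYPERNYM = {
    "car": "vehicle",
    "truck": "vehicle",
    "vehicle": "artifact",
}


def hypernym_chain(sense):
    """Return all hypernyms of a sense, nearest first."""
    chain = []
    while sense in HYPERNYM:
        sense = HYPERNYM[sense]
        chain.append(sense)
    return chain


def is_hyponym_of(specific, general):
    """True if `specific` denotes a subclass of `general` (transitively)."""
    return general in hypernym_chain(specific)


def is_hypernym_of(general, specific):
    """Hypernymy is simply the inverse of hyponymy."""
    return is_hyponym_of(specific, general)
```

Here is_hyponym_of("car", "vehicle") and is_hypernym_of("vehicle", "car") both hold, while car and truck stand in neither relation to each other: they are co-hyponyms.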
41Other terminology
- Ontology: set of distinct objects resulting from analysis of a domain or microworld
- Taxonomy: particular arrangement of the elements of an ontology into a tree-like class inclusion structure
- Hypernymy-taxonymy-hyponymy
42Semantic fields
- Meronymy: part-whole relation (leg is a part of a chair)
- Holonym: whole-part relation (car is a holonym of wheel)
- Binary relations between two senses
- Semantic field: holistic relationship among entire sets of words from a single domain
43Frame/model/script
- FrameNet: a robust computational resource for frame-based knowledge
- Each word is defined with respect to the frame, and shares aspects of meaning with other frame words
- Questions: What are some possible frames? What advantages do frames and semantic relations in general have over semantic primitives? When would one be preferred over the other?
44WordNet: Database of Lexical Relations
- Gloss: dictionary style definition
- Synset: list of synonyms for each sense (WN primitive); each lexical entry in this synset can be used to express that particular concept
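A synset can be pictured as a record that pairs a gloss with the set of lemmas able to express that one concept. The sketch below is a toy data structure, not the WordNet database or its API: the class Synset, the helper functions, and the glosses (written for illustration, not copied from WordNet) are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    lemmas: frozenset  # every lemma here can express the concept
    gloss: str         # dictionary style definition


# Illustrative data only; the glosses are not WordNet's own wording.
SYNSETS = [
    Synset(frozenset({"big", "large"}), "above average in size"),
    Synset(frozenset({"bass"}), "the lowest part of the musical range"),
    Synset(frozenset({"bass"}), "a kind of fish"),
]


def synsets_for(lemma, synsets=SYNSETS):
    """One synset per sense: a polysemous or homonymous lemma returns several."""
    return [s for s in synsets if lemma in s.lemmas]


def are_synonyms(a, b, synsets=SYNSETS):
    """Two lemmas are synonymous in some sense if they share a synset."""
    return any(a in s.lemmas and b in s.lemmas for s in synsets)
```

Note that synonymy is a relation between senses: are_synonyms only says that big and large share at least one synset, not that they are interchangeable everywhere.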
45Wordnet as Linguistic Ontology
- A wordnet-like lexicon is the linguistic ontology of a language.
- It contains all concepts and conceptual links which are linguistically defined in that language
46Wordnet as Linguistic Ontology II
- Senses are defined and differentiated intra-lingually
- Ontologies vary from one language to another, just as ontologies vary from one domain to another
- Senses are organized differently in different languages
- Translation equivalents are not necessarily synonyms
47Wordnet as Linguistic Ontology III
- To maintain the integrity of each linguistic ontology, sense (hence synset) must be defined solely based on monolingual evidence
- Cross-lingual meaning correspondences are marked by lexical semantic relations
- Huang et al. 2002, SemaNet workshop paper: Cross-lingual Inference of Lexical Semantic Relations: A First Step Towards Population of Multilingual Wordnets
48Questions?
49Event Participants
- Semantic roles
- Thematic roles: attempt to capture semantic similarities between similar types of actors
- Agent: volitional causer
- Theme: affected by action
50Roles
- Agent
- Experiencer
- Force
- Theme
- Result
- Content
- Instrument
- Beneficiary
- Source
- Goal
51Diathesis Alternations
- Shallow semantic language allows us to make simple inferences that aren't possible from a surface string of words
- Agent Theme
- Agent Theme Instrument
- Instrument Theme
- Theme
- Theme Agent
52
- Verb alternations/Diathesis alternations: multiple argument structure realizations of a verb
- Dative alternations
- Kick him the ball.
- Kick the ball to him.
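The dative alternation can be sketched as two surface realizations of the same role assignment. This is a toy illustration: the role labels Goal (for him) and Theme (for the ball) are an assumed analysis drawn from the role list, and the function names are hypothetical.

```python
def double_object(verb, roles):
    """Realize the roles as  V Goal Theme  (the double object construction)."""
    return f"{verb} {roles['Goal']} {roles['Theme']}."


def prepositional_dative(verb, roles):
    """Realize the same roles as  V Theme to Goal."""
    return f"{verb} {roles['Theme']} to {roles['Goal']}."


# One role assignment, two argument structure realizations.
ROLES = {"Goal": "him", "Theme": "the ball"}
REALIZATIONS = [
    double_object("Kick", ROLES),
    prepositional_dative("Kick", ROLES),
]
```

Both strings come from a single role dictionary, which is the inference a shallow semantic representation buys: the two sentences differ in surface order but not in who receives what.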
53Questions?
54Problems with thematic roles
- Difficult to come up with standard set
- Difficult to produce formal definitions, esp. cross-linguistically
55Instruments
- Intermediary and enabling instruments
- The new gadget opened the jar. (intermediary)
- The fork ate the sliced banana. (enabling)
56Agents
- Animate
- Volitional
- Sentient
- Causal
- Might not exhibit all properties
57Generalized semantic roles
- Proto-agent
- Proto-patient: undergoing change of state, causally affected by another participant, stationary relative to other participants
- scalar
58Verb specific semantic roles
- FrameNet
- PropBank
- MARVS
- PropBank: resource of sentences annotated with semantic roles
- Arg0 is proto-agent, Arg1 is proto-patient
- Allows commonalities across different sentences with the same verb
59FrameNet
- Allows commonalities across different verbs and between verbs and nouns
- Frame is a script-like structure, which instantiates a set of frame specific semantic roles called frame elements
- Change_position_on_a_scale: the frame consists of words that indicate the change of an Item's position on a scale (Attribute) from a starting point (Initial value) to an end point (Final value)
60Core Roles
- ATTRIBUTE
- DIFFERENCE
- FINAL STATE
- FINAL VALUE
- INITIAL STATE
- INITIAL VALUE
- ITEM
- VALUE RANGE
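The core roles listed above can be treated as the inventory against which a frame annotation is checked. A minimal sketch, assuming an annotation is a list of (role, text) pairs; the names CORE_ROLES and unknown_roles and the sample sentence are illustrative assumptions, and this is not the FrameNet data format.

```python
# Core frame elements of the change-of-position-on-a-scale frame,
# copied from the list above.
CORE_ROLES = frozenset({
    "ATTRIBUTE", "DIFFERENCE", "FINAL STATE", "FINAL VALUE",
    "INITIAL STATE", "INITIAL VALUE", "ITEM", "VALUE RANGE",
})


def unknown_roles(annotation):
    """Return the role labels in an annotation that are not core roles."""
    return [role for role, _text in annotation if role not in CORE_ROLES]


# An assumed example sentence: "Oil rose in price by 2%."
ANNOTATION = [
    ("ITEM", "Oil"),
    ("ATTRIBUTE", "in price"),
    ("DIFFERENCE", "by 2%"),
]
```

The verb rose itself carries no role label here; it is the word that evokes the frame, and the same role inventory would apply to a noun such as increase.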
61Non-core Roles
62Frames
- Consist of verbs, nouns, adverbs
- Frames can inherit generalizations from each
other
63Selectional Restrictions
- A kind of semantic type constraint that a verb imposes on the kind of concepts that are allowed to fill its argument roles
- I want to eat someplace close. (ambiguous reading)
- Theme of eating must be edible (selectional restriction)
64Use event representations
- Problems
- overkill
- Presupposes large knowledgebase
- Alternate: Use WordNet to specify a WN synset rather than logical concepts
- e.g. use synset {food, nutrient} for EAT
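Stating a restriction as a synset turns the check into a hypernym lookup: a filler is acceptable if the required synset appears among its ancestors. A minimal sketch with a hand-built hierarchy standing in for WordNet; the tables HYPERNYM and RESTRICTIONS, the single label "food" standing for the synset food, nutrient, and the function names are all assumptions.

```python
# Toy hypernym links standing in for WordNet; contents are assumed.
HYPERNYM = {
    "banana": "food",
    "hamburger": "food",
    "someplace": "location",
}

# Selectional restriction: the Theme of "eat" must fall under "food"
# (standing for the synset food, nutrient).
RESTRICTIONS = {("eat", "Theme"): "food"}


def ancestors(concept):
    """The concept itself followed by all of its hypernyms."""
    chain = [concept]
    while concept in HYPERNYM:
        concept = HYPERNYM[concept]
        chain.append(concept)
    return chain


def satisfies(verb, role, filler):
    """True if the filler meets the verb's restriction on that role."""
    required = RESTRICTIONS.get((verb, role))
    if required is None:
        return True  # no restriction recorded for this verb and role
    return required in ancestors(filler)
```

Under these assumptions satisfies("eat", "Theme", "someplace") fails, which is how the restriction rules out the reading of the ambiguous sentence in which someplace close is the thing eaten.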
65Primitive Decomposition
- A.k.a. componential analysis
- Semantic features: symbols which represent primitive meaning
- Conceptual Dependency: ten primitive predicates
- Difficult to come up with primitives that can
represent all different kinds of meaning.
66Metaphor
- Refer to and reason about a concept or domain using words from a completely different domain
- Question: how do we know if it's from a different domain?