Title: Lexicalization across Languages
1Lexicalization across Languages
- Robert Belvin - Fall 2005
2Surprise from Spanish
- The men walked across the street
- Los hombres cruzaron la calle caminando
- Los hombres caminaron a través de la calle
- (In Spanish, this version is used contrastively, to focus on walk)
- We ran down the stairs
- Bajamos las escaleras corriendo
- Corrimos hacia abajo por las escaleras
- (In Spanish, this version is used contrastively, to focus on run)
3Insights from Typological and Cognitive Linguistics
- Leonard Talmy (1985, etc.), Jackendoff, etc.
- Possible to isolate meaning elements, especially by cross-linguistic investigation (transparent morphology, diathesis)
- For a motion event verb: motion, path, figure, ground, manner, cause
- We can then examine which semantic elements are expressed by which lexical and syntactic (surface) elements
4Meaning-Surface elements not in 1-1 correspondence
- The relationship is mostly not one-to-one
- A combination of surface elements is sometimes required to express one semantic element
- French negative: ne ... pas
- Semitic definite article: def-N def-Adj
- al-rajulu (a)l-thariiyun (Mod. Std. Arabic)
- the-man the-wealthy: 'the wealthy man'
- More than one semantic element can be expressed by one lexical item (what we're looking at here). Talmy calls this lexicalization. (Note there is also a different NLP use, meaning to retrieve entries from the lexicon and associate them with tokens, as in POS tagging.)
5Typical Pattern for English
- Focusing on the verb root (vs. root + affixes)
- Most common type for Indo-European languages (also Chinese)
- Examples:
- BE_LOC: The lamp stood/lay/leaned on the table.
- MOVE/GO: The rock slid/rolled/bounced down the hill.
- I ran/limped/jumped/stumbled/groped my way down the stairs.
6Conceptual Semantics Representation
- Jackendoff (1983, 1990, etc.). Implemented in an MT system by Bonnie Dorr (1993, U. Maryland)
- (Capitalized words are metalanguage elements which happen to be in English)
- EVENT GO_LOC
- (THING<anim> __ , PATH TO LOCATION __ )
- MANNER WALKING
- Lexical Conceptual Structure (LCS) of verb root walk
7LCS Composition
- Combines with LCS for preposition across
- PATH ACROSS LOCATION
- and with LCSs for the nouns Joe and street
- THING<anim> Joe ; LOCATION street
8LCS Composition
- to yield
- EVENT GO_LOC
- (THING<anim> Joe , PATH ACROSS LOCATION street )
- MANNER WALKING
- Joe walked across the street
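The composition on the last two slides can be sketched in Python. This is only an illustration, not Dorr's implementation: the `LCS` dataclass, its field names and the `compose` helper are hypothetical, and unification is reduced to the single rule used here (the generic TO path of walk gives way to the more specific ACROSS).

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class LCS:
    """Toy LCS for a motion event (field names are assumptions)."""
    primitive: str                  # e.g. "GO_LOC"
    thing: Optional[str] = None     # figure, e.g. "Joe"
    path: str = "TO"                # "TO" is the generic default path
    location: Optional[str] = None  # ground, e.g. "street"
    manner: Optional[str] = None    # e.g. "WALKING"


def compose(root: LCS, thing: str, path: str, location: str) -> LCS:
    """Fill the open slots of a root LCS; only a generic TO path may be overridden."""
    if root.path not in ("TO", path):
        raise ValueError(f"path {path} does not unify with {root.path}")
    return replace(root, thing=thing, path=path, location=location)


WALK = LCS(primitive="GO_LOC", manner="WALKING")  # root LCS of "walk"
clcs = compose(WALK, thing="Joe", path="ACROSS", location="street")
print(clcs)
```

The manner element comes from the verb root and survives composition unchanged, which is the English packaging pattern described earlier.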
9Spanish prefers to incorporate Path
Talmy again, on Romance packaging of motion verbs
10Spanish LCS for motion verb
- In the Jackendoff version, the verb root "cruzar" is then
- EVENT GO_LOC
- (THING __ , PATH ACROSS LOCATION __ )
- Contrast with English walk
- EVENT GO_LOC
- (THING<anim> __ , PATH TO LOCATION __ )
- MANNER WALKING
- Generic TO Path unifies with any other Path element
11LCS Composition
- Adjoin the manner adverbial caminando
- MANNER WALKING
- and the location and figure expressions, to yield
- EVENT GO_LOC
- (THING<animate> José , PATH ACROSS LOCATION calle )
- MANNER WALKING
- José cruzó la calle caminando
12Other Languages Incorporate Figure
- Atsugewi (Hokan, Northern Calif.)
- We also have some examples in English (rain, spit, drool, etc.)
- But most Atsugewi motion verbs are of this type
- No attested languages incorporate motion + ground as the primary type; it is not clear why
13Criteria for Determining Language Type
- Talmy gives 3 criteria for determining a language's lexicalization type. The verb type should be:
- frequent in occurrence
- colloquial in style (not literary)
- pervasive (a wide range of semantic notions expressed)
- Thus, English has spit and cross and descend, but the majority type is like walk and run. Similarly, Spanish has caminar and correr, but the majority type is like cruzar or bajar.
14General Analysis Procedure
- Identify the main verb, retrieve its LCS, then fill required arguments by unifying with compatible LCS elements; adjoin any additional (non-argumental) phrases
- indicates obligatory argument
- walk: EVENT GO_LOC (THING __ , PATH TO LOCATION __ ) MANNER WALKING
- across: PATH ACROSS (LOCATION __ )
- joe: THING<anim> joe
- street: LOCATION street
15General Analysis Procedure
- Since across will unify with the default PATH (expressed as TO), it gets incorporated as an argument of the GO event
- EVENT GO_LOC
- (THING Joe , PATH ACROSS LOCATION street )
- MANNER WALKING
- This is the Composed LCS (CLCS; Dorr 93)
- (versus the Root LCS (RLCS), which is just the verb root's LCS)
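A rough sketch of the analysis procedure, assuming a toy lexicon of flat partial LCSs. The `LEXICON` entries, the key names and the `unify`/`analyze` helpers are all hypothetical; in particular, a real system would use syntax to decide which noun fills THING and which fills LOCATION, whereas here that is hard-coded in the lexicon. The only unification rule implemented is the one stated above: the generic TO path unifies with any other path.

```python
GENERIC_PATH = "TO"

# Toy lexicon (illustrative only): each word maps to a partial LCS.
LEXICON = {
    "walked":    {"event": "GO_LOC", "path": "TO", "manner": "WALKING"},
    "across":    {"path": "ACROSS"},
    "Joe":       {"thing": "JOE"},
    "street":    {"location": "STREET"},
    "cruzó":     {"event": "GO_LOC", "path": "ACROSS"},
    "caminando": {"manner": "WALKING"},
    "José":      {"thing": "JOE"},
    "calle":     {"location": "STREET"},
}


def unify(base, extra):
    """Merge two partial LCSs; values must agree unless one path is generic TO."""
    merged = dict(base)
    for key, value in extra.items():
        old = merged.get(key)
        if old is None or old == value:
            merged[key] = value
        elif key == "path" and GENERIC_PATH in (old, value):
            merged[key] = value if old == GENERIC_PATH else old
        else:
            raise ValueError(f"cannot unify {key}: {old} vs {value}")
    return merged


def analyze(tokens):
    """Find the main verb's root LCS, then unify every other known word into it."""
    entries = [LEXICON[t] for t in tokens if t in LEXICON]
    verbs = [e for e in entries if "event" in e]
    if len(verbs) != 1:
        raise ValueError("expected exactly one main verb")
    clcs = verbs[0]
    for entry in entries:
        if entry is not verbs[0]:
            clcs = unify(clcs, entry)
    return clcs


english = analyze("Joe walked across the street".split())
spanish = analyze("José cruzó la calle caminando".split())
print(english == spanish)
```

Under these assumptions both sentences compose to the same CLCS, even though English takes the manner from the verb root and Spanish takes it from the adjoined adverbial.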
16Lexical Selection in Generation
- Look for the match which covers the most LCS elements possible; cruzar can cover two elements
- CLCS:
- EVENT GO_LOC (THING joe , PATH ACROSS LOCATION street ) MANNER WALKING
- EVENT GO_LOC (THING __ , PATH ACROSS LOCATION __ ) ← LCS included as part of the lexical entry for cruzar
17Lexical Selection in Generation
- May have multiple matches; the verb caminar also covers two elements
- CLCS:
- EVENT GO_LOC (THING joe , PATH ACROSS LOCATION street ) MANNER WALKING
- EVENT GO_LOC (THING __ , PATH TO LOCATION __ ) MANNER WALKING ← LCS included as part of the lexical entry for caminar (the same LCS as English walk)
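The coverage idea can be sketched as a small scoring function. The verb entries below are hypothetical and list only the elements each root lexicalizes (the generic TO path is left out because it does not cover a specific path); the scoring scheme itself is an assumption, not the selection algorithm of any particular system.

```python
# CLCS elements to be expressed (argument fillers omitted for brevity).
CLCS = {"event": "GO_LOC", "path": "ACROSS", "manner": "WALKING"}

# Hypothetical Spanish verb entries.
VERBS = {
    "cruzar":  {"event": "GO_LOC", "path": "ACROSS"},
    "caminar": {"event": "GO_LOC", "manner": "WALKING"},
    "bajar":   {"event": "GO_LOC", "path": "DOWN"},
    "ir":      {"event": "GO_LOC"},
}


def coverage(entry, clcs):
    """Number of CLCS elements the entry expresses, or -1 if any element conflicts."""
    if any(clcs.get(key) != value for key, value in entry.items()):
        return -1
    return len(entry)


def best_matches(clcs, verbs):
    """Return every verb with the highest non-conflicting coverage."""
    scores = {verb: coverage(entry, clcs) for verb, entry in verbs.items()}
    top = max(scores.values())
    return sorted(verb for verb, score in scores.items() if score == top and score > 0)


print(best_matches(CLCS, VERBS))  # ['caminar', 'cruzar']
```

The tie between cruzar and caminar is returned as-is. A preference for the path verb in Spanish, in line with the typological pattern, would be one possible tie-breaker, but it is not implemented here.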
18Scaling Problems
- The familiar knowledge acquisition bottleneck
- An application with 500 verbs is a problem, and one with 5,000 is near impossible, though there are now large annotation projects which may provide some of this knowledge in machine-readable form
- Labor-intensive, with consistency problems
19Automate Acquisition of LCSs?
- A tool can lead the developer through a decision tree
- Is the core verb meaning stative or eventive?
- If eventive:
- Does the core meaning entail GO, STAY or INCHO?
- If GO:
- Choose the best PATH from a list of 12
- (ABOUT, ACROSS, ALONG, AWAY-FROM, DOWN, FROM, IN, TO, TOWARD, etc.)
- Is a causative alternation possible?
- Is the causative element CAUSE or LET?
- etc., etc.
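A minimal sketch of such a decision tree, assuming the developer's answers arrive as a dictionary. The answer keys, the `build_skeleton` function and the output strings are hypothetical; only the nine path names actually listed above are included, and the stative branch is left as a stub because the questions for it are not spelled out.

```python
# Only the paths named in the list above; the remaining ones are not given.
NAMED_PATHS = ["ABOUT", "ACROSS", "ALONG", "AWAY-FROM", "DOWN",
               "FROM", "IN", "TO", "TOWARD"]


def build_skeleton(answers):
    """Turn decision-tree answers into an LCS skeleton string."""
    if not answers["eventive"]:
        return "STATE ..."  # stative branch not detailed here
    primitive = answers["primitive"]
    if primitive == "GO":
        path = answers["path"]
        if path not in NAMED_PATHS:
            raise ValueError(f"unknown path: {path}")
        core = f"EVENT GO (THING, PATH {path})"
    elif primitive == "STAY":
        core = "EVENT STAY (THING, POSITION)"
    elif primitive == "INCHO":
        core = "EVENT INCHO (STATE)"
    else:
        raise ValueError(f"unknown primitive: {primitive}")
    # Optional causative wrapper: None, "CAUSE" or "LET".
    causative = answers.get("causative")
    if causative in ("CAUSE", "LET"):
        core = f"EVENT {causative} (THING, {core})"
    return core


print(build_skeleton({"eventive": True, "primitive": "GO", "path": "ACROSS"}))
```

A real tool would ask these questions interactively and would need many more branches; the point here is only that each answer narrows the set of candidate templates.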
20Automate Acquisition?
- Still problems with the labor-intensive aspect of development
- System was brittle
- Many other kinds of information are required in addition to the thematic structure (LCS representation)
- Problems may not be insurmountable: probabilistic methods seem to be a likely solution to the brittleness problem.
- Learning algorithms
- CMU automated elicitation method (Avenue project)
21Toward Automating the Acquisition Process
- Apply diagnostic tests, either via a human in the loop or by searching large corpora. Note that the hypothesis is that there's a finite (fairly small) number of templates (below is a large fraction of them)
- EVENT → INCHO (STATE)
- EVENT → GO (THING, PATH)
- EVENT → STAY (THING, POSITION)
- STATE → BE (THING, POSITION)
- STATE → ORIENT (THING, PATH)
- STATE → GO-EXTENT (THING, PATH)
- EVENT → LET (THING/EVENT, EVENT/STATE)
- EVENT → CAUSE-EXCHANGE (EVENT, EVENT)
- EVENT → CAUSE (THING/EVENT, EVENT/STATE)
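Because the template inventory is small, it can be written down as a table and used to check candidate structures. The encoding below (nested tuples, a `TEMPLATES` dictionary, a `category` function) is an assumption made for illustration; the primitives and argument types are the ones listed above.

```python
# primitive -> (resulting category, allowed categories for each argument)
TEMPLATES = {
    "INCHO":          ("EVENT", [{"STATE"}]),
    "GO":             ("EVENT", [{"THING"}, {"PATH"}]),
    "STAY":           ("EVENT", [{"THING"}, {"POSITION"}]),
    "BE":             ("STATE", [{"THING"}, {"POSITION"}]),
    "ORIENT":         ("STATE", [{"THING"}, {"PATH"}]),
    "GO-EXTENT":      ("STATE", [{"THING"}, {"PATH"}]),
    "LET":            ("EVENT", [{"THING", "EVENT"}, {"EVENT", "STATE"}]),
    "CAUSE-EXCHANGE": ("EVENT", [{"EVENT"}, {"EVENT"}]),
    "CAUSE":          ("EVENT", [{"THING", "EVENT"}, {"EVENT", "STATE"}]),
}


def category(node):
    """Return the category of a node, checking it against the templates.

    A node is either a leaf category name such as "THING", or a tuple
    (primitive, arg1, arg2, ...).
    """
    if isinstance(node, str):
        return node
    primitive, *args = node
    result, slots = TEMPLATES[primitive]
    if len(args) != len(slots):
        raise ValueError(f"{primitive} takes {len(slots)} argument(s)")
    for arg, allowed in zip(args, slots):
        if category(arg) not in allowed:
            raise ValueError(f"bad argument for {primitive}: {arg}")
    return result


# "The captain docked the boat": CAUSE (THING, GO (THING, PATH))
print(category(("CAUSE", "THING", ("GO", "THING", "PATH"))))
```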
22Automating the Acquisition Process--Diagnostics
- Determining if the verb root is eventive or stative
- Does it occur in present progressive constructions?
- John is walking to the store
- Does it occur in pseudo-cleft constructions?
- What John did was walk to the store
- Does it have only a habitual interpretation if used in the simple present tense?
- John walks to the store (every day)
- ??Oh look, John walks to the store
- If so, it's likely eventive
23Automating the Acquisition Process--Diagnostics
- Stative predicates pattern oppositely
- *John is knowing Spanish
- *What John did was know Spanish
- John knows Spanish (has a true present tense interpretation)
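With a human in the loop, the three diagnostics reduce to yes/no judgments that can be combined mechanically. The sketch below assumes a simple unanimity rule; the key names and the `classify` function are hypothetical, and a real tool would probably weight the tests rather than demand agreement.

```python
TESTS = (
    "progressive_ok",         # "John is walking to the store"
    "pseudo_cleft_ok",        # "What John did was walk to the store"
    "present_habitual_only",  # "John walks to the store (every day)"
)


def classify(judgments):
    """Classify a verb root from speaker judgments on the three tests."""
    votes = sum(bool(judgments[test]) for test in TESTS)
    if votes == len(TESTS):
        return "eventive"
    if votes == 0:
        return "stative"
    return "unclear"


walk = {"progressive_ok": True, "pseudo_cleft_ok": True, "present_habitual_only": True}
know = {"progressive_ok": False, "pseudo_cleft_ok": False, "present_habitual_only": False}
print(classify(walk), classify(know))  # eventive stative
```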
24Automating the Acquisition Process--Diagnostics
- Many other observable patterns can offer clues
- Levin alternations (English Verb Classes and Alternations)
- 11 different properties listed for this class of manner-of-motion verb
- Wide variation in path expressions allowed
- Induced Action alternation
- Locative preposition drop
- Locative inversion
- There-insertion
- Adjectival Passive
- etc.
- Taken alone, tests are inconclusive, but
effective in combination
25Automating the Acquisition Process--Diagnostics
- Certain patterns of occurrence are indicative of very specific LCS characteristics
- Adjectives as base for inchoative events like redden, sadden, darken, lessen
- EVENT INCHO (STATE BE THING __ , POSITION AT PROPERTY RED )
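As a rough illustration of how this pattern could be detected, the sketch below strips the -en suffix, undoes consonant doubling, and looks the result up in an adjective list. The tiny `ADJECTIVES` set, the function name and the output string format are assumptions; a real system would need a full lexicon and would have to handle verbs that merely end in -en.

```python
# Toy adjective list (illustrative only).
ADJECTIVES = {"red", "sad", "dark", "less"}


def inchoative_lcs(verb):
    """Return an INCHO skeleton if the verb looks like adjective + -en, else None."""
    if not verb.endswith("en"):
        return None
    stem = verb[:-2]
    candidates = [stem]
    if len(stem) >= 2 and stem[-1] == stem[-2]:
        candidates.append(stem[:-1])  # undo consonant doubling: redd -> red
    for base in candidates:
        if base in ADJECTIVES:
            prop = base.upper()
            return f"EVENT INCHO (STATE BE THING, POSITION AT PROPERTY {prop})"
    return None


for verb in ("redden", "sadden", "darken", "lessen", "listen"):
    print(verb, "->", inchoative_lcs(verb))
```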
26Automating the Acquisition Process--Diagnostics
- Certain patterns of occurrence are indicative of very specific LCS characteristics
- Causative alternations with animate subjects are very suggestive of CAUSE (T<anim> , E)
- The boat docked / The captain docked the boat
- Non-movable objects appearing to go somewhere (fictive motion): STATE GO-EXTENT (THING __ , PATH THING/LOCATION )
- The road went under the bridge
27Automating the Acquisition Process--Diagnostics
- Difference between Stative and Durative
- Durative predicates can be volitionally maintained; stative ones generally can't
- John deliberately sits in front of Bill versus *John is deliberately muddy
- Note that even though the verb sit is intuitively durative (extends over a period of time) and homogeneous (sitting doesn't entail a change of state), it is not truly stative, since it passes the tests for eventivity we noted earlier.
- In LCS terms, this translates to using EVENT STAY ... for durative events, and STATE BE ... for states.
- Characteristic of processes (imperfective paradox):
- John is running => John has run
- versus accomplishments or achievements (change of state events)
- John is building a house =/=> John has built a house
28Toward Automating the Acquisition Process
- Either search large corpora or employ native speakers with an appropriate interface, as done for automated acquisition of transfer rules in Carbonell et al. (2002), CMU Avenue project.
- Some diagnostic questions are impossible to answer fully automatically because we're looking for an absence of something (e.g. a pseudo-cleft of a verb hypothesized to be stative); the fact that it doesn't occur doesn't mean it can't occur.
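The corpus side of this can be sketched with a regular expression for the present/past progressive. The three-sentence corpus, the function name and the verdict strings are made up for illustration; the important design point is that a miss is reported as "unattested", not as "impossible".

```python
import re


def progressive_evidence(verb_ing, sentences):
    """Search for 'be + V-ing' and return a verdict plus the matching sentences."""
    pattern = re.compile(
        rf"\b(?:am|is|are|was|were)\s+{re.escape(verb_ing)}\b",
        re.IGNORECASE,
    )
    hits = [s for s in sentences if pattern.search(s)]
    verdict = "attested" if hits else "unattested (not proof of impossibility)"
    return verdict, hits


# Toy corpus (invented sentences).
corpus = [
    "John is walking to the store.",
    "John knows Spanish.",
    "Mary was walking home.",
]

print(progressive_evidence("walking", corpus))
print(progressive_evidence("knowing", corpus))
```

On a corpus this small an empty result says almost nothing; even on a large one, negative evidence would at best lower the probability of the eventive reading, which is where a human judgment or a probabilistic model would come in.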