1
Learning Language from its Perceptual Context
  • Ray Mooney
  • Department of Computer Sciences
  • University of Texas at Austin

Joint work with David Chen, Rohit Kate, and Yuk Wah Wong
2
Current State of Natural Language Learning
  • Most current state-of-the-art NLP systems are
    constructed by training on large supervised
    corpora.
  • Syntactic Parsing: Penn Treebank
  • Word Sense Disambiguation: SenseEval
  • Semantic Role Labeling: Propbank
  • Machine Translation: Hansards corpus
  • Constructing such annotated corpora is difficult,
    expensive, and time consuming.

3
Semantic Parsing
  • A semantic parser maps a natural-language
    sentence to a complete, detailed semantic
    representation: a logical form or meaning
    representation (MR).
  • For many applications, the desired output is
    immediately executable by another program.
  • Two application domains
  • GeoQuery: A Database Query Application
  • CLang: RoboCup Coach Language

4
GeoQuery: A Database Query Application
  • Query application for a U.S. geography database
    (Zelle & Mooney, 1996)

DataBase
5
CLang: RoboCup Coach Language
  • In the RoboCup Coach competition, teams compete to
    coach simulated soccer players.
  • The coaching instructions are given in a formal
    language called CLang

Simulated soccer field
6
Learning Semantic Parsers
  • Manually programming robust semantic parsers is
    difficult due to the complexity of the task.
  • Semantic parsers can be learned automatically
    from sentences paired with their logical form.

NL/MR Training Examples
Meaning Representation
7
Our Semantic-Parser Learners
  • CHILL + WOLFIE (Zelle & Mooney, 1996; Thompson &
    Mooney, 1999, 2003)
  • Separates parser-learning and semantic-lexicon
    learning.
  • Learns a deterministic parser using ILP
    techniques.
  • COCKTAIL (Tang & Mooney, 2001)
  • Improved ILP algorithm for CHILL.
  • SILT (Kate, Wong & Mooney, 2005)
  • Learns symbolic transformation rules for mapping
    directly from NL to MR.
  • SCISSOR (Ge & Mooney, 2005)
  • Integrates semantic interpretation into Collins'
    statistical syntactic parser.
  • WASP (Wong & Mooney, 2006, 2007)
  • Uses syntax-based statistical machine translation
    methods.
  • KRISP (Kate & Mooney, 2006)
  • Uses a series of SVM classifiers employing a
    string kernel to iteratively build semantic
    representations.

8
WASP: A Machine Translation Approach to Semantic
Parsing
  • Uses statistical machine translation techniques
  • Synchronous context-free grammars (SCFG) (Wu,
    1997; Melamed, 2004; Chiang, 2005)
  • Word alignments (Brown et al., 1993; Och & Ney,
    2003)
  • Hence the name: Word Alignment-based Semantic
    Parsing

13
A Unifying Framework for Parsing and Generation
Synchronous Parsing
Natural Languages
Semantic parsing
Compiling (Aho & Ullman, 1972)
Machine translation
Tactical generation
Formal Languages
14
Synchronous Context-Free Grammars (SCFG)
  • Developed by Aho & Ullman (1972) as a theory of
    compilers that combines syntax analysis and code
    generation in a single phase.
  • Generates a pair of strings in a single
    derivation.

15
Synchronous Context-Free Grammar: Production Rule
Natural language
Formal language
QUERY → What is CITY / answer(CITY)
16
Synchronous Context-Free Grammar Derivation
QUERY
QUERY
What is the capital of Ohio
answer(capital(loc_2(stateid('ohio'))))
STATE → Ohio / stateid('ohio')
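A minimal Python sketch of how a synchronous derivation produces the NL string and the MR string together. Only the QUERY and STATE rules appear on the slides; the two CITY rules, the token-list encoding, and the names `RULES`, `rewrite`, and `derive` are assumptions made for illustration.

```python
# Each rule is (left-hand side, NL right-hand side, MR right-hand side).
# The QUERY and STATE rules come from the slides; the CITY rules are assumed.
RULES = [
    ("QUERY", ["What", "is", "CITY"], ["answer(", "CITY", ")"]),
    ("CITY", ["the", "capital", "CITY"], ["capital(", "CITY", ")"]),
    ("CITY", ["of", "STATE"], ["loc_2(", "STATE", ")"]),
    ("STATE", ["Ohio"], ["stateid('ohio')"]),
]


def rewrite(tokens, lhs, rhs):
    """Replace the first occurrence of the nonterminal lhs with rhs."""
    i = tokens.index(lhs)
    return tokens[:i] + rhs + tokens[i + 1:]


def derive(rules, start="QUERY"):
    """Apply each rule to the NL side and the MR side in lockstep."""
    nl, mr = [start], [start]
    for lhs, nl_rhs, mr_rhs in rules:
        nl = rewrite(nl, lhs, nl_rhs)
        mr = rewrite(mr, lhs, mr_rhs)
    return " ".join(nl), "".join(mr)


nl_string, mr_string = derive(RULES)
print(nl_string)
print(mr_string)
```

Because every rule rewrites both sides at once, a single derivation yields the sentence and its meaning representation as a pair.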
17
Probabilistic Parsing Model
d1
NL tree: CITY → capital CITY; CITY → of STATE; STATE → Ohio
MR tree: capital ( CITY ); loc_2 ( STATE ); stateid ( 'ohio' )
STATE → Ohio / stateid('ohio')
18
Probabilistic Parsing Model
d2
NL tree: CITY → capital CITY; CITY → of RIVER; RIVER → Ohio
MR tree: capital ( CITY ); loc_2 ( RIVER ); riverid ( 'ohio' )
RIVER → Ohio / riverid('ohio')
19
Probabilistic Parsing Model
Rule weights (λ) along each derivation of "capital of Ohio":
d1: capital ( CITY ) 0.5 + loc_2 ( STATE ) 0.3 + stateid ( 'ohio' ) 0.5 = 1.3
d2: capital ( CITY ) 0.5 + loc_2 ( RIVER ) 0.05 + riverid ( 'ohio' ) 0.5 = 1.05
STATE → Ohio / stateid('ohio')
RIVER → Ohio / riverid('ohio')
Pr(d1 | capital of Ohio) = exp(1.3) / Z
Pr(d2 | capital of Ohio) = exp(1.05) / Z
Z: normalization constant
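The arithmetic on this slide can be sketched in Python. The sketch reads the weights as 0.5 + 0.3 + 0.5 = 1.3 for d1 and 0.5 + 0.05 + 0.5 = 1.05 for d2, and it assumes, purely for illustration, that d1 and d2 are the only derivations of the phrase, so Z sums over just those two. The names `WEIGHTS_D1`, `WEIGHTS_D2`, and `derivation_probs` are hypothetical.

```python
import math

# Rule weights along each derivation, as read off the slide.
WEIGHTS_D1 = [0.5, 0.3, 0.5]    # capital(CITY), loc_2(STATE), stateid('ohio')
WEIGHTS_D2 = [0.5, 0.05, 0.5]   # capital(CITY), loc_2(RIVER), riverid('ohio')


def derivation_probs(weight_lists):
    """Log-linear model: Pr(d) = exp(sum of rule weights in d) / Z."""
    scores = [math.exp(sum(weights)) for weights in weight_lists]
    z = sum(scores)  # assumption: these are the only competing derivations
    return [score / z for score in scores]


p_d1, p_d2 = derivation_probs([WEIGHTS_D1, WEIGHTS_D2])
print(f"Pr(d1) = {p_d1:.3f}, Pr(d2) = {p_d2:.3f}")
```

Under that assumption the state reading (d1) is preferred over the river reading (d2), since its total weight is higher.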
20
Overview of WASP
Unambiguous CFG of MRL
Lexical acquisition
Training set, (e,f)
Lexicon, L (an SCFG)
Parameter estimation
SCFG parameterized by λ
Training
Testing
Input sentence, e'
Output MR, f'
Semantic parsing
21
Tactical Generation
  • Can be seen as inverse of semantic parsing

The goalie should always stay in our half
Semantic parsing
((true) (do our 1 (pos (half our))))
22
Generation by Inverting WASP
  • Same synchronous grammar is used for both
    generation and semantic parsing.

Tactical generation
Semantic parsing
NL
MRL
QUERY → What is CITY / answer(CITY)
23
Learning Language from Perceptual Context
  • Children do not learn language from annotated
    corpora.
  • Neither do they learn language from just reading
    the newspaper, surfing the web, or listening to
    the radio.
  • Unsupervised language learning
  • DARPA Learning by Reading Program
  • The natural way to learn language is to perceive
    language in the context of its use in the
    physical and social world.
  • This requires inferring the meaning of utterances
    from their perceptual context.

24
Language Grounding
  • The meanings of many words are grounded in our
    perception of the physical world: red, ball, cup,
    run, hit, fall, etc.
  • Symbol Grounding: Harnad (1990)
  • Even many abstract words and meanings are
    metaphorical abstractions of terms grounded in
    the physical world: up, down, over, in, etc.
  • Lakoff and Johnson's Metaphors We Live By
  • It's difficult to put my ideas into words.
  • Most NLP work represents meaning without any
    connection to perception, circularly defining the
    meanings of words in terms of other words or
    meaningless symbols with no firm foundation.

25
Mary is on the phone
26
Ambiguous Supervision for Learning Semantic
Parsers
  • A computer system simultaneously exposed to
    perceptual contexts and natural language
    utterances should be able to learn the underlying
    language semantics.
  • We consider ambiguous training data of sentences
    associated with multiple potential MRs.
  • Siskind (1996) uses this type of referentially
    uncertain training data to learn the meanings of
    words.
  • Extracting meaning representations from
    perceptual data is a difficult unsolved problem.
  • Our system directly works with symbolic MRs.

32
Ambiguous Training Example
Ironing(Mommy, Shirt)
Carrying(Daddy, Bag)
Working(Sister, Computer)
Talking(Mary, Phone)
Sitting(Mary, Chair)
Mary is on the phone
33
Next Ambiguous Training Example
Ironing(Mommy, Shirt)
Working(Sister, Computer)
Talking(Mary, Phone)
???
Sitting(Mary, Chair)
Mommy is ironing a shirt
34
Ambiguous Supervision for Learning Semantic
Parsers (cont.)
  • Our model of ambiguous supervision corresponds to
    the type of data that will be gathered from a
    temporal sequence of perceptual contexts with
    occasional language commentary.
  • We assume each sentence has exactly one meaning
    in its perceptual context.
  • Recently extended to handle sentences with no
    meaning in their perceptual context.
  • Each meaning is associated with at most one
    sentence.

35
Sample Ambiguous Corpus
gave(daisy, clock, mouse)
ate(mouse, orange)
Daisy gave the clock to the mouse.
ate(dog, apple)
Mommy saw that Mary gave the hammer to the dog.
saw(mother, gave(mary, dog, hammer))
broke(dog, box)
The dog broke the box.
gave(woman, toy, mouse)
gave(john, bag, mouse)
John gave the bag to the mouse.
threw(dog, ball)
runs(dog)
The dog threw the ball.
saw(john, walks(man, dog))
Forms a bipartite graph
36
KRISPER: KRISP with EM-like Retraining
  • Extension of KRISP that learns from ambiguous
    supervision.
  • Uses an iterative EM-like self-training method to
    gradually converge on a correct meaning for each
    sentence.

37
KRISPER's Training Algorithm
1. Assume every possible meaning for a sentence
is correct
gave(daisy, clock, mouse)
ate(mouse, orange)
Daisy gave the clock to the mouse.
ate(dog, apple)
Mommy saw that Mary gave the hammer to the dog.
saw(mother, gave(mary, dog, hammer))
broke(dog, box)
The dog broke the box.
gave(woman, toy, mouse)
gave(john, bag, mouse)
John gave the bag to the mouse.
threw(dog, ball)
runs(dog)
The dog threw the ball.
saw(john, walks(man, dog))
39
KRISPER's Training Algorithm
2. Resulting NL-MR pairs are weighted and given
to KRISP
gave(daisy, clock, mouse)
1/2
ate(mouse, orange)
Daisy gave the clock to the mouse.
1/2
ate(dog, apple)
1/4
1/4
Mommy saw that Mary gave the hammer to the dog.
saw(mother, gave(mary, dog, hammer))
1/4
1/4
broke(dog, box)
1/5
1/5
1/5
The dog broke the box.
gave(woman, toy, mouse)
1/5
1/5
gave(john, bag, mouse)
1/3
1/3
John gave the bag to the mouse.
threw(dog, ball)
1/3
1/3
runs(dog)
1/3
The dog threw the ball.
1/3
saw(john, walks(man, dog))
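Steps 1 and 2 can be sketched as follows: every sentence is paired with each of its candidate MRs, and each pair gets weight 1/k, where k is the number of candidates for that sentence (matching the 1/2, 1/3, 1/4, 1/5 values on the slide). The candidate sets below are assumed for illustration, since the exact edges of the slide's graph are not recoverable from the transcript; `CANDIDATES` and `initial_weighted_pairs` are hypothetical names.

```python
from fractions import Fraction

# Assumed candidate sets; the slide's actual sentence-to-MR edges may differ.
CANDIDATES = {
    "Daisy gave the clock to the mouse.": [
        "gave(daisy, clock, mouse)",
        "ate(mouse, orange)",
    ],
    "The dog threw the ball.": [
        "threw(dog, ball)",
        "runs(dog)",
        "saw(john, walks(man, dog))",
    ],
}


def initial_weighted_pairs(candidates):
    """Treat every candidate MR as correct, weighting each pair by 1/k."""
    pairs = []
    for sentence, mrs in candidates.items():
        weight = Fraction(1, len(mrs))
        pairs.extend((sentence, mr, weight) for mr in mrs)
    return pairs


weighted_pairs = initial_weighted_pairs(CANDIDATES)
for sentence, mr, weight in weighted_pairs:
    print(f"{weight}  {sentence}  <->  {mr}")
```

These weighted pairs would then be handed to the base learner (KRISP on the slides) as its first, noisy training set.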
40
KRISPER's Training Algorithm
3. Estimate the confidence of each NL-MR pair
using the resulting trained parser
gave(daisy, clock, mouse)
ate(mouse, orange)
Daisy gave the clock to the mouse.
ate(dog, apple)
Mommy saw that Mary gave the hammer to the dog.
saw(mother, gave(mary, dog, hammer))
broke(dog, box)
The dog broke the box.
gave(woman, toy, mouse)
gave(john, bag, mouse)
John gave the bag to the mouse.
threw(dog, ball)
runs(dog)
The dog threw the ball.
saw(john, walks(man, dog))
41
KRISPER's Training Algorithm
4. Use maximum weighted matching on a bipartite
graph to find the best NL-MR pairs (Munkres,
1957)
gave(daisy, clock, mouse)
0.92
ate(mouse, orange)
Daisy gave the clock to the mouse.
0.11
ate(dog, apple)
0.32
0.88
Mommy saw that Mary gave the hammer to the dog.
saw(mother, gave(mary, dog, hammer))
0.22
0.24

broke(dog, box)
0.18
0.71
0.85
The dog broke the box.
gave(woman, toy, mouse)
0.14
0.95
gave(john, bag, mouse)
0.24
0.89
John gave the bag to the mouse.
threw(dog, ball)
0.33
0.97
runs(dog)
0.81
The dog threw the ball.
0.34
saw(john, walks(man, dog))
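A small Python sketch of step 4. The slides cite Munkres (1957); for brevity this sketch swaps in an exhaustive search, which finds the same maximum-weight one-to-one assignment on tiny graphs but does not scale. The confidence scores and the function name `best_matching` are hypothetical, chosen so that both sentences individually prefer the same MR.

```python
from itertools import permutations

S1 = "John gave the bag to the mouse."
S2 = "Daisy gave the clock to the mouse."

# Hypothetical parser confidences for candidate (sentence, MR) pairs.
SCORES = {
    (S1, "gave(john, bag, mouse)"): 0.89,
    (S1, "gave(daisy, clock, mouse)"): 0.24,
    (S2, "gave(john, bag, mouse)"): 0.90,
    (S2, "gave(daisy, clock, mouse)"): 0.80,
}


def best_matching(scores, sentences, mrs):
    """Exhaustively find the one-to-one assignment with maximum total score.

    Pairs missing from `scores` are not candidates and are never chosen.
    Assumes there are at least as many MRs as sentences.
    """
    best_total, best = float("-inf"), None
    for chosen in permutations(mrs, len(sentences)):
        pairs = list(zip(sentences, chosen))
        if any(pair not in scores for pair in pairs):
            continue
        total = sum(scores[pair] for pair in pairs)
        if total > best_total:
            best_total, best = total, dict(pairs)
    return best, best_total


MRS = ["gave(john, bag, mouse)", "gave(daisy, clock, mouse)"]
matching, total = best_matching(SCORES, [S1, S2], MRS)
print(matching, round(total, 2))
```

Picking the top-scoring MR per sentence would assign `gave(john, bag, mouse)` to both sentences; the matching enforces that each meaning goes to at most one sentence. For larger graphs, a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` with `maximize=True` is one option.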
43
KRISPER's Training Algorithm
5. Give the best pairs to KRISP in the next
iteration, and repeat until convergence
gave(daisy, clock, mouse)
ate(mouse, orange)
Daisy gave the clock to the mouse.
ate(dog, apple)
Mommy saw that Mary gave the hammer to the dog.
saw(mother, gave(mary, dog, hammer))
broke(dog, box)
The dog broke the box.
gave(woman, toy, mouse)
gave(john, bag, mouse)
John gave the bag to the mouse.
threw(dog, ball)
runs(dog)
The dog threw the ball.
saw(john, walks(man, dog))
44
Results on Ambig-ChildWorld Corpus
45
New Challenge: Learning to Be a Sportscaster
  • Goal: Learn from realistic data of natural
    language used in a representative context while
    avoiding difficult issues in computer perception
    (i.e., speech and vision).
  • Solution: Learn from textually annotated traces
    of activity in a simulated environment.
  • Example: Traces of games in the Robocup simulator
    paired with textual sportscaster commentary.

46
Grounded Language Learning in Robocup
Robocup Simulator
Sportscaster
Score!!!!
Score!!!!
47
Robocup Sportscaster Trace
Natural Language Commentary
Meaning Representation
badPass ( Purple1, Pink8 )
turnover ( Purple1, Pink8 )
Purple goalie turns the ball over to Pink8
kick ( Pink8)
pass ( Pink8, Pink11 )
Purple team is very sloppy today
kick ( Pink11 )
Pink8 passes the ball to Pink11
Pink11 looks around for a teammate
kick ( Pink11 )
ballstopped
kick ( Pink11 )
Pink11 makes a long pass to Pink8
pass ( Pink11, Pink8 )
kick ( Pink8 )
pass ( Pink8, Pink11 )
Pink8 passes back to Pink11
50
Robocup Sportscaster Trace
Natural Language Commentary
Meaning Representation
P6 ( C1, C19 )
P5 ( C1, C19 )
Purple goalie turns the ball over to Pink8
P1( C19 )
P2 ( C19, C22 )
Purple team is very sloppy today
P1 ( C22 )
Pink8 passes the ball to Pink11
Pink11 looks around for a teammate
P1 ( C22 )
P0
P1 ( C22 )
Pink11 makes a long pass to Pink8
P2 ( C22, C19 )
P1 ( C19 )
P2 ( C19, C22 )
Pink8 passes back to Pink11
51
Sportscasting Data
  • Collected human textual commentary for the 4
    Robocup championship games from 2001-2004.
  • Avg. events/game: 2,613
  • Avg. sentences/game: 509
  • Each sentence matched to all events within the
    previous 5 seconds.
  • Avg. MRs/sentence: 2.5 (min 1, max 12)
  • Manually annotated with correct matchings of
    sentences to MRs (for evaluation purposes only).
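The 5-second windowing rule above can be sketched in Python. The timestamps are invented for illustration (the transcript does not give them), the inclusive window boundaries are an assumption, and `candidate_mrs` is a hypothetical helper name.

```python
# (time in seconds, event MR); timestamps are hypothetical.
EVENTS = [
    (10.0, "badPass(Purple1, Pink8)"),
    (10.2, "turnover(Purple1, Pink8)"),
    (13.0, "kick(Pink8)"),
    (14.5, "pass(Pink8, Pink11)"),
    (21.0, "kick(Pink11)"),
]

# (time in seconds, commentary sentence); timestamps are hypothetical.
COMMENTS = [
    (12.0, "Purple goalie turns the ball over to Pink8"),
    (17.0, "Pink8 passes the ball to Pink11"),
]


def candidate_mrs(comment_time, events, window=5.0):
    """Return MRs of all events in the `window` seconds before the comment."""
    return [mr for t, mr in events
            if comment_time - window <= t <= comment_time]


ambiguous_examples = {
    sentence: candidate_mrs(t, EVENTS) for t, sentence in COMMENTS
}
for sentence, mrs in ambiguous_examples.items():
    print(sentence, "->", mrs)
```

Each sentence ends up paired with several candidate MRs, which is exactly the ambiguous supervision the learners have to resolve.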

52
WASPER
  • WASP with EM-like retraining to handle ambiguous
    training data.
  • Same augmentation as added to KRISP to create
    KRISPER.

53
KRISPER-WASP
  • First iteration of EM-like training produces very
    noisy training data (> 50% errors).
  • KRISP is better than WASP at handling noisy
    training data.
  • SVM prevents overfitting.
  • String kernel allows partial matching.
  • But KRISP does not support language generation.
  • First train KRISPER just to determine the best
    NL/MR matchings.
  • Then train WASP on the resulting unambiguously
    supervised data.

54
WASPER-GEN
  • In KRISPER and WASPER, the correct MR for each
    sentence is chosen based on maximizing the
    confidence of semantic parsing (NL→MR).
  • Instead, WASPER-GEN determines the best matching
    based on generation (MR→NL).
  • Score each potential NL/MR pair by using the
    currently trained WASP^-1 generator (WASP inverted
    for generation).
  • Compute the NIST MT score (an alternative to the
    BLEU score) between the generated sentence and
    the potential matching sentence.
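The selection rule described above can be sketched in Python. This is only an illustration: a crude unigram F1 overlap stands in for the NIST score, and the `GENERATED` table stands in for the output of a trained generator. Both, along with the function names, are assumptions rather than anything shown on the slides.

```python
def unigram_overlap(candidate, reference):
    """F1 over word sets; a crude stand-in for the NIST MT score."""
    cand = set(candidate.lower().split())
    ref = set(reference.lower().split())
    common = len(cand & ref)
    if common == 0:
        return 0.0
    precision, recall = common / len(cand), common / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentences a trained generator might produce for each MR.
GENERATED = {
    "kick(Pink8)": "Pink8 kicks the ball",
    "pass(Pink8, Pink11)": "Pink8 passes to Pink11",
}


def best_mr_by_generation(sentence, candidate_mrs, generated):
    """Pick the MR whose generated sentence best matches the comment."""
    return max(candidate_mrs,
               key=lambda mr: unigram_overlap(generated[mr], sentence))


chosen = best_mr_by_generation(
    "Pink8 passes the ball to Pink11", list(GENERATED), GENERATED)
print(chosen)
```

The point of the design is the direction of scoring: candidates are judged by how well MR→NL generation reproduces the observed comment, rather than by NL→MR parsing confidence.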

55
Strategic Generation
  • Generation requires not only knowing how to say
    something (tactical generation) but also what to
    say (strategic generation).
  • For automated sportscasting, one must be able to
    effectively choose which events to describe.

56
Example of Strategic Generation
pass ( purple7 , purple6 ) ballstopped kick (
purple6 ) pass ( purple6 , purple2 )
ballstopped kick ( purple2 ) pass ( purple2 ,
purple3 ) kick ( purple3 ) badPass ( purple3 ,
pink9 ) turnover ( purple3 , pink9 )
58
Learning for Strategic Generation
  • For each event type (e.g. pass, kick) estimate
    the probability that it is described by the
    sportscaster.
  • Requires NL/MR matching that indicates which
    events were described, but this is not provided
    in the ambiguous training data.
  • Use estimated matching computed by KRISPER,
    WASPER or WASPER-GEN.
  • Use a version of EM to determine the probability
    of mentioning each event type just based on
    strategic info.
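Given an estimated matching, the per-event-type probability can be sketched as a simple relative frequency. This is a minimal illustration and not the EM variant mentioned in the last bullet; the set of described events below is an assumed matching over the trace shown earlier, and the function names are hypothetical.

```python
from collections import Counter

# Event trace from the sportscaster example, in order.
EVENTS = [
    "badPass ( Purple1, Pink8 )", "turnover ( Purple1, Pink8 )",
    "kick ( Pink8 )", "pass ( Pink8, Pink11 )", "kick ( Pink11 )",
    "kick ( Pink11 )", "ballstopped", "kick ( Pink11 )",
    "pass ( Pink11, Pink8 )", "kick ( Pink8 )", "pass ( Pink8, Pink11 )",
]

# Assumed matching: indices of events that some sentence was matched to.
DESCRIBED = {1, 3, 8, 10}


def event_type(mr):
    """'pass ( Pink8, Pink11 )' -> 'pass'; 'ballstopped' -> 'ballstopped'."""
    return mr.split("(")[0].strip()


def comment_probabilities(events, described_indices):
    """Estimate Pr(commented | event type) as described / total per type."""
    total = Counter(event_type(e) for e in events)
    described = Counter(event_type(events[i]) for i in described_indices)
    return {etype: described[etype] / n for etype, n in total.items()}


probs = comment_probabilities(EVENTS, DESCRIBED)
print(probs)
```

On this toy trace, passes and turnovers are always described while kicks never are, so a sportscaster built on these estimates would comment on passes and skip kicks.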

59
Iterative Generation Strategy Learning (IGSL)
  • Directly estimates the likelihood of commenting
    on each event type from the ambiguous training
    data.
  • Uses self-training iterations to improve
    estimates (à la EM).
  • Uses events not associated with any NL as
    negative evidence for commenting on that event
    type.

60
Demo
  • Game clip commentated using WASPER-GEN with
    EM-based strategic generation, since this gave
    the best results for generation.
  • FreeTTS was used to synthesize speech from
    textual output.

61
Experimental Evaluation
  • Generated learning curves by training on all
    combinations of 1 to 3 games and testing on all
    games not used for training.
  • Baselines
  • Random Matching: WASP trained on a random choice
    of possible MR for each comment.
  • Gold Matching: WASP trained on the correct
    matching of MR for each comment.
  • Metrics
  • Precision: % of the system's annotations that are
    correct
  • Recall: % of gold-standard annotations correctly
    produced
  • F-measure: Harmonic mean of precision and recall
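The three metrics can be sketched in Python over sets of (sentence, MR) annotations. The toy annotations and the function name `precision_recall_f` are hypothetical.

```python
def precision_recall_f(system_pairs, gold_pairs):
    """Precision, recall, and F-measure over sets of annotations."""
    system, gold = set(system_pairs), set(gold_pairs)
    correct = len(system & gold)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical annotations: the system gets 2 of its 3 pairs right,
# and the gold standard contains 4 pairs.
SYSTEM = {("s1", "mrA"), ("s2", "mrB"), ("s3", "mrC")}
GOLD = {("s1", "mrA"), ("s2", "mrX"), ("s3", "mrC"), ("s4", "mrD")}

p, r, f = precision_recall_f(SYSTEM, GOLD)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```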

62
Evaluating Matching Accuracy
  • Measure how accurately various methods assign MRs
    to sentences in the ambiguous training data.
  • Use gold-standard matches to evaluate correctness.

63
Results on Matching
64
Evaluating Semantic Parsing
  • Measure how accurately learned parser maps
    sentences to their correct meanings in the test
    games.
  • Use the gold-standard matches to determine the
    correct MR for each sentence that has one.
  • Generated MR must exactly match gold-standard to
    count as correct.

65
Results on Semantic Parsing
66
Evaluating Tactical Generation
  • Measure how accurately NL generator produces
    English sentences for chosen MRs in the test
    games.
  • Use gold-standard matches to determine the
    correct sentence for each MR that has one.
  • Use NIST score to compare generated sentence to
    the one in the gold-standard.

67
Results on Tactical Generation
68
Evaluating Strategic Generation
  • In the test games, measure how accurately the
    system determines which perceived events to
    comment on.
  • Compare the subset of events chosen by the system
    to the subset chosen by the human annotator (as
    given by the gold-standard matching).

69
Results on Strategic Generation
70
Human Evaluation (Quasi Turing Test)
  • Asked 4 fluent English speakers to evaluate
    overall quality of sportscasts.
  • Randomly picked a 2-minute segment from each of
    the 4 games.
  • Each human judge evaluated 8 commented game
    clips, each of the 4 segments commented once by a
    human and once by the machine when tested on that
    game.
  • The 8 clips presented to each judge were shown in
    random counter-balanced order.
  • Judges were not told which ones were human or
    machine generated.

71
Human Evaluation Metrics
Score  English Fluency  Semantic Correctness  Sportscasting Ability
5      Flawless         Always                Excellent
4      Good             Usually               Good
3      Non-native       Sometimes             Average
2      Disfluent        Rarely                Bad
1      Gibberish        Never                 Terrible
72
Results on Human Evaluation
Commentator  English Fluency  Semantic Correctness  Sportscasting Ability
Human        3.94             4.25                  3.63
Machine      3.44             3.56                  2.94
Difference   -0.5             -0.69                 -0.69
73
Immediate Future Directions
  • Use strategic generation information to improve
    resolution of ambiguous training data.
  • Improve WASP's ability to handle noisy training
    data.
  • Improve simulated perception to extract more
    detailed and interesting symbolic facts from the
    simulator.

74
Machine Learning Research Direction
  • Learning from ambiguous/weak supervision
    (Siskind, 1996).
  • Multiple Instance Learning: assumes weak
    supervision for classification in the form of
    "positive bags" that contain at least one
    positive instance (Dietterich et al., 1997).
  • Learning with Structured Data: assumes inputs and
    outputs are complex data structures (e.g., strings
    or graphs) rather than simple vectors and class
    labels (Bakir et al., 2007).
  • Needed: "Structured Multiple Instance Learning,"
    where an input string is paired with a set of
    possible MRs, one of which is likely to be
    correct.

75
Longer Term Future Directions
  • Apply approach to learning situated language in a
    computer video-game environment (Gorniak & Roy,
    2005)
  • Teach game AIs how to talk to you!
  • Apply approach to captioned images or video using
    computer vision to extract objects, relations,
    and events from real perceptual data (Fleischman
    & Roy, 2007)

76
Blatant Talk Advertisement
  • Watch, Listen & Learn: Co-training on Captioned
    Images and Videos, S. Gupta, J. Kim, K. Grauman,
    and R. Mooney
  • Semi-Supervised Learning session, Thursday,
    11:40, R002

77
Conclusions
  • Current language learning work uses expensive,
    unrealistic training data.
  • We have developed language learning systems that
    can learn from sentences paired with an ambiguous
    perceptual environment.
  • We have evaluated this approach on learning to
    sportscast simulated Robocup games, where it
    learns to commentate games almost as well as
    humans.
  • Learning to connect language and perception is an
    important and exciting research problem.