Title: Natural Language Processing
1. Natural Language Processing
2. Aspects of language processing
- Word, lexicon: lexical analysis
  - Morphology, word segmentation
- Syntax
  - Sentence structure, phrases, grammar
- Semantics
  - Meaning
  - Execute commands
- Discourse analysis
  - Meaning of a text
  - Relationships between sentences (e.g. anaphora)
3. Applications
- Detect new words
- Language learning
- Machine translation
- NL interface
- Information retrieval
4. Brief history
- 1950s
  - Early MT: word translation + re-ordering
  - Chomsky's generative grammar
  - Bar-Hillel's argument (against fully automatic high-quality MT)
- 1960s-80s
  - Applications
    - BASEBALL: NL interface to search a database of baseball games
    - LUNAR: NL interface to search a database of lunar rock samples
    - ELIZA: simulation of a conversation with a psychoanalyst
    - SHRDLU: use NL to manipulate a blocks world
    - Message understanding: understand a newspaper article on terrorism
    - Machine translation
  - Methods
    - ATN (augmented transition networks): extended context-free grammar
    - Case grammar (agent, object, etc.)
    - DCG: Definite Clause Grammar
    - Dependency grammar: an element depends on another
- 1990s-now
  - Statistical methods
5. Classical symbolic methods
- Morphological analyzer
- Parser (syntactic analysis)
- Semantic analysis (transform into a logical form, semantic network, etc.)
- Discourse analysis
- Pragmatic analysis
6. Morphological analysis
- Goal: recognize the word and its category
- Using a dictionary of word categories
- Input: the form encountered in the text
- Morphological rules
  - Lemma+ed -> Lemma+e (verb in past form), e.g. loved -> love
  - ...
- Is the lemma in the dictionary? If yes, the transformation is possible
- Form -> a set of possible lemmas (a minimal sketch follows)
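A minimal Python sketch of this rule-plus-dictionary process; the tiny lexicon and the single "+ed -> +e" rule are illustrative assumptions, not part of the slides:

    # Illustrative lexicon: word -> category (an assumption for this sketch)
    LEXICON = {"love": "verb", "bake": "verb", "apple": "noun"}

    def analyze(form):
        """Map a surface form to a set of possible (lemma, category, feature)."""
        analyses = set()
        if form in LEXICON:                      # the form is itself a lemma
            analyses.add((form, LEXICON[form], "base"))
        if form.endswith("ed"):                  # rule: Lemma+ed -> Lemma+e
            lemma = form[:-2] + "e"              # "loved" -> "love"
            if LEXICON.get(lemma) == "verb":     # is the lemma in the dict.?
                analyses.add((lemma, "verb", "past"))
        return analyses

    print(analyze("loved"))   # {('love', 'verb', 'past')}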
7. Parsing (in DCG)
- Grammar rules:
  - s --> np, vp.
  - np --> det, noun.
  - np --> proper_noun.
  - vp --> v, np.
  - vp --> v.
- Lexicon:
  - det --> a.  det --> an.  det --> the.
  - noun --> apple.  noun --> orange.
  - proper_noun --> john.  proper_noun --> mary.
  - v --> eats.  v --> loves.
- E.g. "john eats an apple." parses bottom-up as (a parser sketch follows):
  - john: proper_noun; eats: v; an: det; apple: noun
  - proper_noun -> np; det + noun -> np
  - v + np -> vp
  - np + vp -> s
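A minimal recursive-descent sketch of this grammar in Python (the DCG itself would run directly in Prolog; the function names here are illustrative):

    GRAMMAR = {
        "det": ["a", "an", "the"],
        "noun": ["apple", "orange"],
        "proper_noun": ["john", "mary"],
        "v": ["eats", "loves"],
    }

    def parse_np(words, i):
        # np --> proper_noun.  |  np --> det, noun.
        if i < len(words) and words[i] in GRAMMAR["proper_noun"]:
            return i + 1
        if (i + 1 < len(words) and words[i] in GRAMMAR["det"]
                and words[i + 1] in GRAMMAR["noun"]):
            return i + 2
        return None

    def parse_vp(words, i):
        # vp --> v, np.  |  vp --> v.
        if i < len(words) and words[i] in GRAMMAR["v"]:
            j = parse_np(words, i + 1)
            return j if j is not None else i + 1
        return None

    def parse_s(words):
        # s --> np, vp.  Accept only if the whole sentence is consumed.
        i = parse_np(words, 0)
        return i is not None and parse_vp(words, i) == len(words)

    print(parse_s("john eats an apple".split()))   # True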
8. Semantic analysis
- E.g. "john eats an apple.":
  - john: proper_noun, semantic category person -> person:john
  - eats: v -> λy λx eat(x,y)
  - an apple: det + noun, semantic category apple -> apple
  - vp = v + np -> λx eat(x, apple)
  - s = np + vp -> eat(person:john, apple)
- Semantic categories come from an ontology, e.g.:
  - object -> animated | non-animated
  - animated -> person | animal; non-animated -> food
  - animal -> vertebrate | ...; food -> fruit | ...; fruit -> apple | ...
9. Parsing and semantic analysis
- Rules: syntactic rules or semantic rules
  - What component can be combined with what component?
  - What is the result of the combination?
- Categories
  - Syntactic categories: Verb, Noun, ...
  - Semantic categories: Person, Fruit, Apple, ...
- Analyses
  - Recognize the category of an element
  - See how different elements can be combined into a sentence
  - Problem: the choice is often not unique
10. Writing a semantic analysis grammar
- S(pred(obj)) -> NP(obj) VP(pred)
- VP(pred(obj)) -> Verb(pred) NP(obj)
- NP(obj) -> Name(obj)
- Name(John) -> John
- Name(Mary) -> Mary
- Verb(λy λx Loves(x,y)) -> loves
- (A runnable sketch of this grammar follows.)
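A minimal Python rendering of this grammar, using closures for the λ-terms; the fixed Name-Verb-Name sentence shape and the helper names are assumptions for illustration:

    # Lexicon with semantics attached, mirroring the rules above.
    NAMES = {"John": "John", "Mary": "Mary"}                  # Name(John) -> John, ...
    VERBS = {"loves": lambda y: lambda x: f"Loves({x},{y})"}  # λy λx Loves(x,y)

    def parse_s(words):
        """S(pred(obj)) -> NP(obj) VP(pred), assuming a Name Verb Name sentence."""
        subj, verb, obj = words
        vp = VERBS[verb](NAMES[obj])   # VP(pred(obj)) -> Verb(pred) NP(obj)
        return vp(NAMES[subj])         # S applies the VP predicate to the subject

    print(parse_s(["John", "loves", "Mary"]))   # Loves(John,Mary)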
11. Discourse analysis
- Anaphora
  - He hits the car with a stone. It bounces back.
- Understanding a text
  - Who/when/where/what are involved in an event?
  - How to connect the semantic representations of different sentences?
  - What is the cause of an event and what is the consequence of an action?
12. Pragmatic analysis
- Practical usage of language: what a sentence means in practice
  - Do you have time?
  - How do you do?
  - It is too cold to go outside!
13. Some success stories: ELIZA
- young woman: Men are all alike.
- eliza: In what way?
- young woman: They're always bugging us about something or other.
- eliza: Can you think of a specific example?
- young woman: Well, my boyfriend made me come here.
- eliza: Your boyfriend made you come here?
- young woman: He says I'm depressed much of the time.
- eliza: I am sorry to hear you are depressed.
- young woman: It's true. I'm unhappy.
- eliza: Do you think coming here will help you not to be unhappy?
- young woman: I need some help, that much seems certain.
- eliza: What would it mean to you if you got some help?
- young woman: Perhaps I could learn to get along with my mother.
- eliza: Tell me more about your family.
- young woman: My mother takes care of me.
14. ELIZA (cont'd)
- eliza: Who else in your family takes care of you?
- young woman: My father.
- eliza: Your father.
- young woman: You are like my father in some ways.
- eliza: What resemblance do you see?
- young woman: You are not very aggressive but I think you don't want me to notice that.
- eliza: What makes you think I am not very aggressive?
- young woman: You don't argue with me.
- eliza: Why do you think I don't argue with you?
- young woman: You are afraid of me.
- eliza: Does it please you to believe I am afraid of you?
- young woman: My father is afraid of everybody.
- eliza: What else comes to mind when you think of your father?
- young woman: Bullies.
- eliza: Does that have anything to do with the fact that your boyfriend made you come here?
15. Methods used in ELIZA
- Find a trigger in the input of the user (e.g. "father")
- Evoke a possible candidate pattern (e.g. "family" or "mother") (limited parsing)
- Compose a sentence by filling in the slots of the pattern (picking some elements from the user input)
- If no appropriate pattern is found, ask a general question, possibly related to the user input
- (A minimal sketch of this loop follows.)
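A minimal Python sketch of this trigger-and-template loop; the rules below are illustrative assumptions (the real ELIZA also swaps pronouns in the echoed fragment, which is omitted here):

    import random
    import re

    # Trigger patterns and response templates with one fillable slot.
    RULES = [
        (r"my (mother|father|family)", "Tell me more about your {0}."),
        (r"you are (.*)", "What makes you think I am {0}?"),
        (r"i am (.*)", "Why do you tell me you are {0}?"),
    ]
    GENERAL = ["Please go on.", "Can you think of a specific example?"]

    def respond(user_input):
        text = user_input.lower().strip(".!?")
        for pattern, template in RULES:
            m = re.search(pattern, text)
            if m:                                   # fill the slot of the pattern
                return template.format(*m.groups())
        return random.choice(GENERAL)               # fallback: general question

    print(respond("You are afraid of me."))
    # -> What makes you think I am afraid of me?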
16. RACTER: poem and prose composer
- Slowly I dream of flying. I observe turnpikes and streets
  studded with bushes. Coldly my soaring widens my awareness.
  To guide myself I determinedly start to kill my pleasure
  during the time that hours and milliseconds pass away. Aid me in this
  and soaring is formidable, do not and singing is unhinged.
- Side and tumble and fall among
  The dead. Here and there
  Will be found a utensil.
17. Success story: METEO (Environment Canada)
- Generates and translates METEO weather forecasts automatically, English <-> French
- Aujourd'hui, 26 novembre
  - Généralement nuageux. Vents du sud-ouest de 20 km/h avec rafales à 40 devenant légers cet après-midi. Températures stables près de plus 2.
- Ce soir et cette nuit, 26 novembre
  - Nuageux. Neige débutant ce soir. Accumulation de 15 cm. Minimum zéro.
- Today, 26 November
  - Mainly cloudy. Wind southwest 20 km/h gusting to 40 becoming light this afternoon. Temperature steady near plus 2.
- Tonight, 26 November
  - Cloudy. Snow beginning this evening. Amount 15 cm. Low zero.
18. Problems
- Ambiguity
  - Lexical/morphological: change (V, N), training (V, N), even (ADJ, ADV)
  - Syntactic: "Helicopter powered by human flies"
  - Semantic: "He saw a man on the hill with a telescope."
  - Discourse: anaphora, ...
- Classical solution
  - Use a later analysis to resolve the ambiguity of an earlier step
  - E.g. "He gives him the change." ("change" as verb does not work for parsing)
  - "He changes the place." ("change" as noun does not work for parsing)
  - However, "He saw a man on the hill with a telescope." has
    - multiple correct parsings
    - multiple correct semantic interpretations -> semantic ambiguity
    - Use contextual information to disambiguate (does a sentence in the text mention that he holds a telescope?)
19. Rules vs. statistics
- Rules and categories do not fit a sentence equally well
- Some are more likely in a language than others
- E.g.
  - hardcopy: noun or verb?
    - P(N | hardcopy) >> P(V | hardcopy)
  - the training
    - P(N | training, Det) > P(V | training, Det)
- Idea: use statistics to help
20. Statistical analysis to help solve ambiguity
- Choose the most likely solution
  - solution* = argmax_solution P(solution | word, context)
  - e.g. argmax_cat P(cat | word, context)
  - argmax_sem P(sem | word, context)
- Context varies widely (preceding word, following word, category of the preceding word, ...)
- How to obtain P(solution | word, context)?
  - From a training corpus
21. Statistical language modeling
- Goal: create a statistical model so that one can calculate the probability of a sequence of tokens s = w1, w2, ..., wn in a language
- General approach: training corpus -> probabilities of the observed elements -> P(s)
22. Probability of a sequence of words
- Elements to be estimated: by the chain rule,
  P(s) = P(w1) P(w2 | w1) ... P(wn | w1...wn-1) = ∏i P(wi | hi), where hi = w1 ... wi-1 is the history of wi
- If hi is too long, one cannot observe (hi, wi) in the training corpus, and (hi, wi) is hard to generalize
- Solution: limit the length of hi
23. n-grams
- Limit hi to the n-1 preceding words: P(wi | hi) ≈ P(wi | wi-n+1 ... wi-1)
- Most used cases:
  - Uni-gram: P(wi)
  - Bi-gram: P(wi | wi-1)
  - Tri-gram: P(wi | wi-2, wi-1)
24. A simple example (corpus: 10,000 words, 10,000 bi-grams)
- Uni-gram:
  - P(I, talk) = P(I) × P(talk) = 0.001 × 0.0008
  - P(I, talks) = P(I) × P(talks) = 0.001 × 0.0008
- Bi-gram:
  - P(I, talk) = P(I) × P(talk | I) = 0.008 × 0.2
  - P(I, talks) = P(I) × P(talks | I) = 0.008 × 0
- The uni-gram model cannot distinguish the two sequences; the bi-gram model correctly rules out "I talks"
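A minimal sketch of how such uni-gram/bi-gram probabilities are estimated by counting; the toy corpus is an illustrative assumption:

    from collections import Counter

    corpus = "i talk . you talk . i sing .".split()   # toy training corpus
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))

    def p_unigram(w):
        return unigrams[w] / len(corpus)

    def p_bigram(w2, w1):
        # MLE: P(w2 | w1) = count(w1 w2) / count(w1)
        return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

    print(p_bigram("talk", "i"))    # 0.5
    print(p_bigram("talks", "i"))   # 0.0: unseen bi-gram -> probability 0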
25. Estimation
- History: short <-> long
  - Modeling: coarse <-> refined
  - Estimation: easy <-> difficult
- Maximum likelihood estimation (MLE): P(wi | hi) = count(hi, wi) / count(hi)
- If (hi, wi) is not observed in the training corpus, P(wi | hi) = 0
  - P(they, talk) = P(they) P(talk | they) = 0 if (they talk) is never observed in the training data
- -> smoothing
26. Smoothing
- Goal: assign a low probability to words or n-grams not observed in the training corpus
- [Figure: probability P over words, comparing the MLE distribution with the smoothed one]
27. Smoothing methods
- What probability should an unseen n-gram receive?
- Change the frequency of occurrences:
  - Laplace smoothing (add-one): add 1 to every count (see the sketch below)
  - Good-Turing: change the frequency r to r* = (r + 1) × n_{r+1} / n_r
    - n_r = number of n-grams of frequency r
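A minimal add-one (Laplace) sketch on bi-gram counts; the toy corpus and the vocabulary-size handling are illustrative assumptions:

    from collections import Counter

    corpus = "i talk . you talk . i sing .".split()
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))

    def p_laplace(w2, w1):
        V = len(unigrams)                       # vocabulary size
        # (count(w1 w2) + 1) / (count(w1) + V): every bi-gram, seen or not,
        # now gets a non-zero probability.
        return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

    print(p_laplace("talks", "i"))   # > 0 although (i talks) was never observed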
28. Smoothing (cont'd)
- Combine a model with a lower-order model
  - Backoff (Katz): use the n-gram estimate if the n-gram was observed, otherwise back off to the (n-1)-gram model
  - Interpolation (Jelinek-Mercer): P(wi | wi-1) = λ P_bigram(wi | wi-1) + (1 - λ) P_unigram(wi) (a sketch follows)
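A minimal Jelinek-Mercer interpolation sketch; λ = 0.7 is an arbitrary assumption (in practice λ is tuned on held-out data):

    from collections import Counter

    corpus = "i talk . you talk . i sing .".split()
    uni = Counter(corpus)
    bi = Counter(zip(corpus, corpus[1:]))

    def p_interp(w2, w1, lam=0.7):
        # Mix the bi-gram model with the lower-order uni-gram model.
        p_bigram = bi[(w1, w2)] / uni[w1] if uni[w1] else 0.0
        p_unigram = uni[w2] / len(corpus)
        return lam * p_bigram + (1 - lam) * p_unigram

    print(p_interp("talk", "sing"))   # unseen bi-gram still gets uni-gram mass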
29. Examples of utilization
- Predict the next word
  - argmax_w P(w | previous words)
  - Used in input methods (predict the next letter/word on a cellphone); see the sketch below
- Use in machine-aided human translation
  - Given the source sentence
  - and the already-translated part,
  - predict the next translation word or phrase:
  - argmax_w P(w | previous translated words, source sentence)
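A minimal next-word prediction sketch under a bi-gram model; the toy corpus is again an assumption:

    from collections import Counter

    corpus = "i talk . you talk . i talk . i sing .".split()
    uni = Counter(corpus)
    bi = Counter(zip(corpus, corpus[1:]))

    def predict_next(w1):
        # argmax_w P(w | w1) under the bi-gram model
        followers = {w2: c / uni[w1] for (a, w2), c in bi.items() if a == w1}
        return max(followers, key=followers.get) if followers else None

    print(predict_next("i"))   # 'talk': seen twice after "i", vs. "sing" once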
30. Quality of a statistical language model
- Test a trained model on a test collection
  - Try to predict each word
  - The more precisely a model can predict the words, the better the model
- Perplexity (the lower, the better)
  - Given the model's P(wi | hi) and a test text of length N:
    PP = 2^(-(1/N) Σi log2 P(wi | hi))
  - The inverse of the geometric mean of the word probabilities: at each word, how many choices does the model propose?
  - Perplexity = 32 -> on average, 32 words could fit this position (a sketch follows)
31. State of the art
- Sufficient training data
  - The longer n is (n-gram), the lower the perplexity
- Limited data
  - When n is too large, perplexity increases
  - Data sparseness (sparsity)
- In much NLP research, 5-grams or 6-grams are used
- Google Books n-grams (up to 5-grams): https://books.google.com/ngrams
32. More than predicting words
- Speech recognition
  - Training corpus: signals + words
  - Probabilities: P(signal | word), P(word2 | word1)
  - Utilization: signals -> sequence of words
- Statistical tagging
  - Training corpus: words + tags (n, v, ...)
  - Probabilities: P(word | tag), P(tag2 | tag1)
  - Utilization: sentence -> sequence of tags
33. Example of utilization: speech recognition
- Speech recognition (simplified)
  - argmax_{w1...wn} P(w1...wn | s1...sn)
  - = argmax_{w1...wn} P(s1...sn | w1...wn) P(w1...wn)
  - ≈ argmax_{w1...wn} ∏i P(si | w1...wn) P(wi | wi-1)
  - ≈ argmax_{w1...wn} ∏i P(si | wi) P(wi | wi-1)
- Argmax -> Viterbi search
- Probabilities
  - P(signal | word): the same acoustics can fit several word strings, e.g. P(s | ice-cream) = P(s | I scream) = 0.8
  - P(word2 | word1): P(ice-cream | eat) > P(I scream | eat)
- Input: speech signals s1, s2, ..., sn
  - -> "I eat ice-cream." preferred over "I eat I scream."
34. Example of utilization: statistical tagging
- Statistical tagging
  - Training corpus: words + tags (e.g. Penn Treebank)
  - For w1, ..., wn:
    - argmax_{tag1...tagn} ∏i P(wi | tagi) P(tagi | tagi-1)
  - Probabilities
    - P(word | tag): P(change | noun) = 0.01, P(change | verb) = 0.015
    - P(tag2 | tag1): P(noun | det) >> P(verb | det)
  - Input: words w1, ..., wn
    - I give him the change.
    - pronoun verb pronoun det noun is preferred over pronoun verb pronoun det verb
- (A Viterbi sketch for this tagging model follows.)
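A minimal Viterbi sketch for this tagging model; all probability tables below are illustrative assumptions, not values trained on the Penn Treebank:

    P_word = {  # P(word | tag)
        ("i", "pronoun"): 0.20, ("give", "verb"): 0.01, ("him", "pronoun"): 0.05,
        ("the", "det"): 0.30, ("change", "noun"): 0.010, ("change", "verb"): 0.015,
    }
    P_tag = {  # P(tag2 | tag1); "<s>" marks the sentence start
        ("<s>", "pronoun"): 0.3, ("pronoun", "verb"): 0.4, ("verb", "pronoun"): 0.2,
        ("pronoun", "det"): 0.3, ("det", "noun"): 0.5, ("det", "verb"): 0.01,
    }
    TAGS = ["pronoun", "verb", "det", "noun"]

    def viterbi(words):
        # best[tag] = (probability of the best path ending in tag, that path);
        # unseen pairs get a tiny default probability instead of 0.
        best = {"<s>": (1.0, [])}
        for w in words:
            best = {
                t: max(((p * P_tag.get((prev, t), 1e-6) * P_word.get((w, t), 1e-6),
                         path + [t])
                        for prev, (p, path) in best.items()),
                       key=lambda x: x[0])
                for t in TAGS
            }
        return max(best.values(), key=lambda x: x[0])[1]

    print(viterbi("i give him the change".split()))
    # ['pronoun', 'verb', 'pronoun', 'det', 'noun']:
    # P(noun | det) >> P(verb | det), so "change" is tagged as a noun here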
35. Some improvements of the model
- Class model
  - Instead of estimating P(w2 | w1), estimate P(w2 | Class1)
  - e.g. P(me | take) vs. P(me | Verb)
  - More general model
  - Less data-sparseness trouble
- Skip model
  - Instead of P(wi | wi-1), allow P(wi | wi-k)
  - Allows longer dependencies to be considered
36. State of the art in POS tagging
- POS = part of speech (syntactic category)
- Statistical methods
  - Training based on an annotated corpus (text with tags annotated manually)
- Penn Treebank: a set of texts with manual annotations, http://www.cis.upenn.edu/~treebank/
37. Penn Treebank
- One can learn:
  - P(wi)
  - P(Tag | wi), P(wi | Tag)
  - P(Tag2 | Tag1), P(Tag3 | Tag1, Tag2)
  - ...
38. State of the art of MT
- Vauquois triangle (simplified): analysis climbs from words to syntax to semantics to a concept on the source-language side; generation descends the same levels on the target-language side
- [Figure: simplified Vauquois triangle with word, syntax, semantic, and concept levels for the source and target languages]
39. Triangle of Vauquois
- [Figure: the full Vauquois triangle]
40. State of the art of MT (cont'd)
- General approach
  - Word/term dictionary
  - Phrases
  - Syntax
  - Limited semantics to solve common ambiguities
- Typical example: Systran
41. Word/term level
- Choose one translation word
- Sometimes, use context to guide the selection of translation words
  - The boy grows -> grandir
  - grow potatoes -> cultiver
42. Phrase level
- Pomme de terre -> potato (not word-for-word "apple of earth")
- Find a needle in a haystack -> ???? (the phrase must be translated as a whole)
43. Statistical machine translation
- argmax_F P(F | E) = argmax_F P(E | F) P(F) / P(E)
  = argmax_F P(E | F) P(F)
- P(E | F): translation model
- P(F): language model, e.g. a trigram model
- More to come later on the translation model (a minimal scoring sketch follows)
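A minimal noisy-channel scoring sketch; both model tables are stubbed with made-up numbers (real systems estimate them from parallel and monolingual corpora):

    def translation_model(E, F):
        # P(E | F), stubbed for illustration
        return {"le chat dort": 0.40, "le chien dort": 0.05}.get(F, 1e-6)

    def language_model(F):
        # P(F), e.g. a trigram model, stubbed for illustration
        return {"le chat dort": 0.01, "le chien dort": 0.02}.get(F, 1e-6)

    def best_translation(E, candidates):
        # argmax_F P(E | F) * P(F); P(E) is constant and can be dropped
        return max(candidates,
                   key=lambda F: translation_model(E, F) * language_model(F))

    print(best_translation("the cat sleeps", ["le chat dort", "le chien dort"]))
    # -> le chat dort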
44. Summary
- Traditional NLP approaches: symbolic, grammar-based, ...
- More recent approaches: statistical
- For some applications, statistical approaches are better (tagging, speech recognition, ...)
- For some others, traditional approaches are better (MT)
- Trend: combine statistics with rules (grammar)
  - E.g.
    - Probabilistic Context-Free Grammar (PCFG)
    - Consider some grammatical connections in statistical approaches
- NLP is still a very difficult problem