Title: Natural Language Processing
1Natural-Language Processing
- Introduction
- Overview of Natural Language Processing
- Reading: Chapter 1 (Jurafsky & Martin)
2 2001: A Space Odyssey
- Dave Bowman: Open the pod bay doors, HAL.
- HAL: I'm sorry Dave, I'm afraid I can't do that.
- Sound analysis: Signal Processing
- Recognize the words in the sound: Speech Recognition
- Analyze the sentence into its parts: Syntax
- Understand the meaning of the words: Semantics
- Bring relevant sensor info: Information Retrieval
- World (spaceship) model: Inference
- Recognize that he's got a command: Speech Act
- Generate the response, with tone: NL Generation
- "I'm sorry Dave": Lexical Realization, contraction
3Definition of NLP
- Natural-language processing (NLP) systems are computer programs that process and use human languages in man-machine communication.
- Written vs. spoken language
- Understanding vs. generation vs. dialogue
- English vs. French vs. Japanese vs. (dialects?)
- Domain variation
4Language and Intelligence
- Turing test
- 3-participant game: a computer, a human, and a human judge
- Judge asks teletyped questions of the computer and the human.
- Computer's job is to act like a human.
- Human's job is to convince the judge that he's not the machine.
- Computer is judged intelligent if it can fool the judge.
- Judgment of intelligence is linked to appropriate answers to questions from the system.
5Applications Examples
- AskJeeves finds documents on the Web that are relevant to a user's query
- MS Dictation converts speech into text
- Systran, Babel translate Web pages to different languages
- The MS Word grammar checker detects and (sometimes) corrects grammatically incorrect sentences.
- MS English Query allows a person to query a database in English
6Other NLP Applications
- General applications include
- Machine translation
- Summarisation - SumTime!
- Speech dialogue, over the phone
- Database Query Interfaces
- Intelligent Tutoring Systems
- Speech Recognition
- Spoken Language Understanding
- Information Retrieval / Search Engines
- Categorization, Routing, Filtering, Summarization
- In general, mixed success in real-world applications
- Better understanding of language and language engineering needed!
7Related Disciplines
- Linguists
- How do words form phrases?
- What constrains the possible meanings?
- Psycholinguists
- How do people identify the structure of a sentence?
- Philosophers
- What is meaning, and how do words acquire it?
- How are the meanings of words combined into sentence meanings?
- Computer Science
- Artificial Intelligence: algorithms, models
- Statistics
- Do the analysis with the best expected chance of being correct
8The History of NLP 1940s - 1950s
- Development of formal language theory (Chomsky, Kleene, Backus).
- Formal characterization of classes of grammars (context-free, finite-state, ...)
- Association of languages with automata (FST)
- Probability theory: language recognizers as decoding through a noisy channel (Shannon).
- Artificial Intelligence established as a field (Dartmouth, 1956)
- LISP (McCarthy, 1959)
- Machine Translation.
9History 1940s - 1950s
- Machine Translation
- I must go home.
- -> Ich muss nach house gehen. (German)
- The spirit is willing, but the flesh is weak.
- -> The vodka is good, but the meat is rotten.
- A hydraulic ram
- -> water sheep
- Out of sight, out of mind.
- -> Blind, crazy.
10History 1960 - 1970
- THEMES
- Limited-domain applications: Engineering
- BASEBALL (Green, 1963): database query generation
- Ex: Who did the Yankees play on July 7?
- -> (OR (July_7 Yankees ?ANSWER) (July_7 Yankees ?ANSWER))
- STUDENT (Bobrow, 1968): algebra word problems
- Ex: John's age is two times Mary's age
- -> John_age = 2 * Mary_age
- ELIZA (Weizenbaum, 1966)
- Interactive dialogue between psychotherapist and patient
11ELIZA Sample Dialogue
- USER: Men are all alike.
- ELIZA: In what way?
- USER: They're always bugging us about something or other.
- ELIZA: Can you think of a specific example?
- USER: Well, my boyfriend made me come here.
- ELIZA: Your boyfriend made you come here.
- USER: He said I'm depressed much of the time.
- ELIZA: I am sorry to hear you are depressed.
- USER: It's true, I am unhappy.
- ELIZA: Do you think coming here will help you not to be unhappy?
12ELIZA Sample Dialogue
- USER: I need some help, that much seems certain.
- ELIZA: What would it mean to you if you got some help?
- USER: Perhaps I could learn to get along with my mother.
- ELIZA: Tell me more about your family.
- USER: My mother takes care of me.
- ELIZA: Who else in your family takes care of you?
- USER: My father.
- ELIZA: Your father.
- USER: You are like my father in some ways.
- ELIZA: What resemblance do you see?
13ELIZA Sample Dialogue
- USER: You are not very aggressive but I think you don't want me to notice that.
- ELIZA: What makes you think I am not very aggressive?
- USER: You don't argue with me.
- ELIZA: Why do you think I don't argue with you?
- USER: You are afraid of me!
- ELIZA: Does it please you to think I am afraid of you?
- USER: My father is afraid of everybody.
- ELIZA: What else comes to mind when you think of your father?
14 The Truth about ELIZA
Pattern Matching, Key Words and Rules
- Pattern: You are <X> me
- -> What makes you think I am <X> you?
- Pattern: <X> are you <Y>
- -> Would you prefer if I weren't <Y>?
- Pattern: <X> like <Y>
- -> In what way?
- -> What resemblance do you see?
If no pattern matched, default strings were printed: "Please go on", "Very interesting", or an echo of <X>.
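The keyword-and-pattern idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Weizenbaum's original program: the rule table `RULES`, the function `respond`, and the single default string are hypothetical names chosen for this example, and the regular expressions are one possible encoding of the three patterns listed above.

```python
import re

# Hypothetical rule table modeled on the slide's patterns: each rule pairs
# a regular expression with a response template. Rules are tried in order.
RULES = [
    (re.compile(r"\byou are (?P<x>.+) me\b", re.IGNORECASE),
     "What makes you think I am {x} you?"),
    (re.compile(r"\bare you (?P<y>.+)", re.IGNORECASE),
     "Would you prefer if I weren't {y}?"),
    (re.compile(r"\blike\b", re.IGNORECASE),
     "In what way?"),
]

# Assumption: one fixed default; the slide mentions several default strings.
DEFAULT = "Please go on."


def respond(utterance: str) -> str:
    """Return the response of the first rule whose pattern matches."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Fill the template with whatever the pattern captured.
            return template.format(**match.groupdict())
    return DEFAULT


if __name__ == "__main__":
    for line in ["You are afraid of me!",
                 "You are like my father in some ways.",
                 "My mother takes care of me."]:
        print("USER: ", line)
        print("ELIZA:", respond(line))
```

The sketch makes the slide's point concrete: nothing in the program models meaning. It only copies captured substrings into canned templates.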
15History 1970 - 1980
- THEMES
- Semantic information processing
- Strong Methods
- LUNAR (Woods, 1970)
- Augmented Transition Networks (ATNs)
- SHRDLU (Winograd, 1972)
- Procedural semantics: word definitions were actions executed via program segments
- MARGIE (Schank, 1975), SAM (Cullingford, 1978)
- PAM, Talespin (Meehan, 1976), POLITICS (Carbonell, 1979)
- Conceptual Dependency Theory, Scripts, Plans, Goals
16History 1980 - 1990
- THEMES
- General Processing Methods: Weak Methods
- KODIAK (Wilensky, 1986)
- Knowledge Representation Language (KRL)
- Massively Parallel Parsing (Waltz & Pollack, 1985)
- Marker Passing
- WIMP (Charniak, 1986), FAUSTUS (Norvig, 1987)
- Metaphor and Analogy
- (Carbonell, 1981; Zernik, 1987)
- Discourse Modelling (Grosz, Sidner, Hobbs, 1985)
- Role of the structure of a conversation, focus, speech acts.
17History 1990 - Present
- Statistical and corpus-based methods are dominant
- Part-of-speech tagging, parsing, word sense disambiguation, etc.
- Emphasis on very large corpora
- Large distributed text databases over the Internet
- Automated Knowledge Acquisition
- Software agents, bots roam over the Internet
- Deep semantics used only in limited domains
- Speech Recognition and Speech Generation widely used
- Some areas are starting to break through commercially
- Even text Machine Translation
18Layers of NLP
Dialog Management
Speech Acts
Pragmatics
Semantic Selection
Semantic Analysis
Syntactic Selection
Syntax Analysis
Lexical Realization
Morphology
Morphological Realiz.
Speech Generation
Phonetics
19Phonetics
- Requires knowledge of phonological patterns
- I'm enormously proud.
- I mean to make you proud.
- Phonetics
- sound signal <-> phonemes
- Speech recognition or character recognition: decomposition into words, segmentation of words into appropriate phones or letters
- Is a speech signal
- 1) I scream is delicious
- 2) Ice cream is delicious
20Phonetic Ambiguity
- Is a speech signal
- 1) I scream is delicious
- 2) Ice cream is delicious
- Linguistic theories
- (2) is grammatical, (1) isn't
- Stats
- "Ice cream is" occurs much more often than "I scream is"
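The statistical argument can be illustrated with word-trigram counts. The sketch below is a toy under stated assumptions: `TOY_CORPUS` is invented for this example (a real system would estimate counts from a very large corpus and would smooth them), and the helper names `trigram_counts` and `pick_transcription` are hypothetical.

```python
from collections import Counter

# Invented three-sentence corpus, for illustration only.
TOY_CORPUS = [
    "ice cream is delicious",
    "ice cream is cold",
    "i scream when i am scared",
]


def trigrams(sentence: str):
    """Yield the word trigrams of a sentence, lowercased."""
    words = sentence.lower().split()
    return zip(words, words[1:], words[2:])


def trigram_counts(corpus) -> Counter:
    """Count every word trigram that occurs in the corpus."""
    counts = Counter()
    for sentence in corpus:
        counts.update(trigrams(sentence))
    return counts


def pick_transcription(candidates, counts: Counter) -> str:
    """Prefer the candidate whose trigrams were seen most often."""
    return max(candidates, key=lambda c: sum(counts[t] for t in trigrams(c)))


if __name__ == "__main__":
    counts = trigram_counts(TOY_CORPUS)
    print(pick_transcription(
        ["I scream is delicious", "Ice cream is delicious"], counts))
```

Raw counts are enough to separate the two readings here. Unseen trigrams simply score zero, which is why practical systems use smoothed probabilities instead.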
21Morphology
22Morphological analysis
- Inflection
- duck + s -> N duck + plural s
- duck + s -> V duck + 3rd person s
- Spelling changes
- drop -> dropping
- hide -> hiding
- Derivation
- kind -> kindness
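A minimal sketch of this kind of analysis, assuming a tiny hand-written stem lexicon and suffix table. `STEMS`, `SUFFIXES`, `analyze`, and the feature labels are all hypothetical and cover only the slide's examples. The analyzer strips a suffix, undoes two spelling changes (final-e deletion and consonant doubling), and keeps every analysis licensed by the stem's category.

```python
# Hypothetical toy lexicon: stem -> possible categories.
STEMS = {"duck": ["N", "V"], "drop": ["V"], "hide": ["V"], "kind": ["Adj"]}

# Hypothetical suffix table: suffix -> {stem category: feature label}.
SUFFIXES = {
    "s": {"N": "plural", "V": "3rd person singular"},
    "ing": {"V": "progressive"},
    "ness": {"Adj": "derives noun"},
}


def analyze(word: str):
    """Return all (stem, category, feature) analyses of a word."""
    word = word.lower()
    analyses = []
    for suffix, features in SUFFIXES.items():
        if not word.endswith(suffix):
            continue
        base = word[: -len(suffix)]
        # Candidate stems: as is, or with a deleted final "e" restored
        # (hiding -> hide).
        candidates = [base, base + "e"]
        # Undo consonant doubling (dropping -> drop).
        if len(base) >= 2 and base[-1] == base[-2]:
            candidates.append(base[:-1])
        for stem in candidates:
            for category in STEMS.get(stem, []):
                if category in features:
                    analyses.append((stem, category, features[category]))
    return analyses


if __name__ == "__main__":
    for w in ["ducks", "dropping", "hiding", "kindness"]:
        print(w, "->", analyze(w))
```

Note that "ducks" receives two analyses, noun plural and verb third person. This is the ambiguity the inflection bullets describe, and the analyzer leaves it for later stages to resolve.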
23Lexicons
- Lexicons are databases of word information.
- Dictionary of NLP system
- A good lexicon is critical to performance
- the system with the bigger lexicon always wins
- An NLP system needs to know
- Spelling
- Category and subcategory
- Inflections (plurals, past, etc)
- What word corresponds to in DB or KB
- Statistical information
- maybe pronunciation
- probably not derivation
24Example Person
- Person
- Category: noun
- Subcategory: count noun
- Inflections: plural "people" (special)
- Database correspondence: person class.
- Semantics: concept HUMAN
- Statistical: frequency .03
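One way to picture such an entry is as a small record type. The sketch below is an assumption about layout, not a description of any particular NLP system: the class `LexiconEntry`, its field names, and the helper `plural` are hypothetical, and the field values are the ones listed on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class LexiconEntry:
    """Hypothetical record for one word in an NLP system's lexicon."""
    spelling: str
    category: str
    subcategory: str
    inflections: dict = field(default_factory=dict)  # irregular forms only
    db_class: str = ""       # what the word corresponds to in the DB or KB
    concept: str = ""        # semantic concept
    frequency: float = 0.0   # statistical information


PERSON = LexiconEntry(
    spelling="person",
    category="noun",
    subcategory="count noun",
    inflections={"plural": "people"},
    db_class="person",
    concept="HUMAN",
    frequency=0.03,
)


def plural(entry: LexiconEntry) -> str:
    """Use the stored irregular plural if present, else add a regular -s."""
    return entry.inflections.get("plural", entry.spelling + "s")


if __name__ == "__main__":
    print(plural(PERSON))
    print(plural(LexiconEntry("duck", "noun", "count noun")))
```

Storing only the special inflections keeps the lexicon small. Regular forms are left to a morphological rule, here the naive "add s" fallback.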
25Syntax
Syntax is how words can be put together to form correct sentences
- Determine what structural role each word plays
- Attachment
- What Phrases are subparts of other Phrases
- Correct
- I saw the man on the hill with a telescope
- Incorrect
- telescope hill the with on I the a man
26Syntactic Analysis
Association of a string with phrase-level constituents. Readying the string for semantic interpretation.
[S [NP I] [VP [V watched] [NP [det the] [N terrapin]]]]
27Syntactic Ambiguity
I made her duck
- I made the duckling belonging to her
- I forced her to lower her head
- I created the duck she owns
- By magic, I changed her into a duck
The computer parser, because it lacks knowledge, sees many more syntactic trees than humans do.
28Structural Ambiguity
Syntactic disambiguation
(1) [S [NP I] [VP [V made] [NP [det her] [N duck]]]]
(2) [S [NP I] [VP [V made] [NP her] [NP duck]]]
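To make the structural ambiguity concrete, the sketch below counts the parse trees of "I made her duck" under a toy context-free grammar. The grammar and lexicon are assumptions written for this example (`GRAMMAR`, `LEXICON`, and `count_parses` are hypothetical names). They license only the two verb-phrase shapes V NP and V NP NP, so the other readings of the sentence are out of scope.

```python
from functools import lru_cache

# Toy grammar: nonterminal -> list of right-hand sides (assumed for illustration).
GRAMMAR = {
    "S": [("NP", "VP")],
    "VP": [("V", "NP"), ("V", "NP", "NP")],
    "NP": [("Pro",), ("Det", "N"), ("N",)],
}

# Toy lexicon: word -> possible parts of speech.
LEXICON = {
    "i": {"Pro"},
    "made": {"V"},
    "her": {"Pro", "Det"},
    "duck": {"N", "V"},
}


def count_parses(sentence: str, start: str = "S") -> int:
    """Count the distinct parse trees of a sentence under the toy grammar."""
    words = tuple(sentence.lower().split())

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        """Number of ways `symbol` can derive words[i:j]."""
        total = 0
        if j - i == 1 and symbol in LEXICON.get(words[i], ()):
            total += 1
        for rhs in GRAMMAR.get(symbol, ()):
            total += count_seq(rhs, i, j)
        return total

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        """Number of ways the symbol sequence `rhs` can derive words[i:j]."""
        first, rest = rhs[0], rhs[1:]
        if not rest:
            return count(first, i, j)
        # Try every split point, leaving at least one word per remaining symbol.
        return sum(count(first, i, k) * count_seq(rest, k, j)
                   for k in range(i + 1, j - len(rest) + 1))

    return count(start, 0, len(words))


if __name__ == "__main__":
    print(count_parses("I made her duck"))  # 2 under this toy grammar
```

The two trees counted here are "her duck" as one noun phrase and "her" and "duck" as separate noun phrases. A broader grammar would yield more trees, and the parser has no knowledge with which to prefer one over another.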
29Semantics
Definition
- Concerns what words mean and how these meanings combine in sentences to form sentence meanings.
- This is the study of context-independent meaning: the meaning the sentence has regardless of the context in which it is used.
A rose is a rose is a rose. A dog is a dog.
- The ability of a word to refer to a class of objects in the world.
30Compositional Semantics
Proposition
- Experiencer: I (1st pers, sg)
- Predicate: Be (perc)
- pred: saw
- patient: the Terrapin
31Pragmatics
Definition
Concerns how the words and the sentence refer to objects and concepts in the context of the situation.
- Resolve pronoun reference
- John saw Bill walking with a brown bag; he was sure he was drunk again.
- Resolve time and location references: today, tomorrow, next week, in Tel-Aviv
- Do you have the time?
- Resolve deixis: this, that, here, there, yonder
- Put this here, please
- I am working in this station
32Pragmatics - Example
- Could you turn in your assignments now. (command)
- Could you finish the homework? (question, command)
- I couldn't decide how to catch the crook.
- Then I decided to spy on the crook with binoculars.
- the crook with binoculars.
- the crook with binoculars
- To my surprise, I found out he had them too. Then I knew
- to just follow the crook with binoculars.
33Speech Acts
Definition
Concerns what types of effect the speaker wants to make on the hearer (above and beyond semantics)
- The wife says
- The sink is full !
- Command
- Go away from here
- Question Request for information
- Do you have a watch?
- Request for action
- Can you bring me a cup of coffee?
- Inform
- The Train leaves in 15 minutes
34Discourse Structure
Definition
Concerns what order and relations are acceptable
in different types of multiple people discourse.
Taking turns, coherence, who can interrupt, order
of arguments
35Natural Language Generation
- Semantic Structure Selection
36Syntactic Selection
37Lexical Realization
- Selection of words to express the semantics in context
- Selection of words: Morphology
- Selection of reference words
- Morning star vs. Evening star
38Speech Generation
- Decide on Intonation and Emphasis
39Ambiguity
Many sentences are ambiguous at all levels.
The computer sees ambiguities we don't.
- Syntactic Ambiguity
- Time flies like an arrow
- I made her duck
- Semantic Ambiguity
- The spirit is willing but the body is weak
- Pragmatic Ambiguity
- You can't do that, it is very dangerous
40Research
- We will be discussing
- State-of-the-art systems which don't work very well
- Theories and models which are the best we can do but have many problems
- NLP is a research area!
42Resources for Natural Language Processing
- Dictionary
- Morphology and Spelling Rules
- Grammar Rules
- Semantic Interpretation Rules
- Discourse Interpretation
- Natural language processing involves
- learning or fashioning the rules for each component,
- embedding the rules in the relevant automaton, and
- using the automaton to efficiently process the input.