CSC 594 Topics in AI - PowerPoint PPT Presentation

About This Presentation
Title:

CSC 594 Topics in AI

Description:

Parsing Algorithms Top-down Parsing -- (top-down) derivation Bottom-up Parsing Chart Parsing Earley s Algorithm most efficient, O(n3) Left-corner Parsing ... – PowerPoint PPT presentation

Number of Views:82
Avg rating:3.0/5.0
Slides: 43
Provided by: nor4152
Category:
Tags: csc | algorithm | earley | topics

less

Transcript and Presenter's Notes

Title: CSC 594 Topics in AI


1
CSC 594 Topics in AI Applied Natural Language
Processing
  • Fall 2009/2010
  • 1. Introduction

2
What is NLP?
  • Natural Language Processing (NLP) is a field in
    Artificial Intelligence (AI) devoted to creating
    computers that use natural language as input
    and/or output.

3
NLP involves other disciplines..
  • Linguistics
  • NLP is also called Computational Linguistics
  • Psychology
  • Mathematics and Statistics
  • Information Theory
  • Computer Science

4
Machines that Can Speak
  • HAL 9000 in 2001 A Space Odyssey

5
Machines that Can Speak (cont.)
  • C3PO in Star Wars

6
But Still a Sci-Fi
  • However, as 2001 approached it became clear that
    2001's predictions in computer technology were
    far fetched. Natural language, lip reading,
    planning and plain common sense in computers were
    still the stuff of science fiction.HAL 9000 -
    Wikipedia

7
NLP is AI-complete
  • The most difficult problems in AI manifest
    themselves in human language phenomena.
  • Use of language is the touchstone of intelligent
    behavior.

8
Test for Intelligence Turing Test
  • Alan Turing (1950) proposed a test of a machine's
    capability to perform human-like conversation.
  • A human judge engages in a natural language
    conversation with two other parties, one a human
    and the other a machine if the judge cannot
    reliably tell which is which, then the machine is
    said to pass the test.

9
Early Conversational Programs
  • ELIZA (by Joseph Weizenbaum), 1966
  • A psychotherapist
  • No real understanding simple pattern-matching to
    respond to user input -gt canned responses

(my ?x depresses me) (why does your ?x depress
you) 5 (life ?x) (why do you say it ?x) 3 (I
could ?x) (you could ?x) 2 (because ?x) (that is
a good reason) 3 (?x) (tell me more) 0
10
gtgt Hello. How do you do. Please state your
problem. gtgt Men are all alike. In what way ? gtgt
They're alwas bugging us about something or
other. I'm not sure I understand you fully. gtgt
They're always bugging us about something or
other. Can you think of a specific example ? gtgt
Well, my boyfriend made me come here. Your
boyfriend made you come here ? gtgt He says I'm
depressed much of the time. I am sorry to hear
that you are depressed.
11
Recent Conversational ProgramsLoebner Prize
Chatterbots/Chatbots (1)
  • In 1990, Hugh Loebner started an annual Turing
    Test competition.
  • Conversational Programs are nowadays called
    Chatterbots (or Chatbots).
  • 100,000 to the first bot that judges cannot
    distinguish from a real human in a Turing test
    that includes text, visual, and auditory input.
  • The Prize dissolves once the 100,000 prize is
    won.

12
Loebner Prize Chatterbots/Chatbots (2)
  • Nobody has won the prize yet.

13
Real-World Applications of NLP
  • NLP can be stand-along applications or components
    embedded in other systems.
  • Major NLP applications include
  • Machine translation
  • Question answering
  • Summarization
  • Conversational agents

14
Commercial World
  • Lots of exciting stuff going on

Powerset
Source Jurafsky Martin Speech and Language
Processing
15
1. Machine Translation (MT)
  • One of the very earliest pursuits in computer
    science (after WWII).
  • Basic approaches
  • Inter-lingual (rule-based)
  • Direct translation (corpus-based) ? morepopular
    these days
  • Example Google Translate
  • MT engine (based on SYSTRAN system developed in
    EC).

16
2. Question Answering
  • Finds an answer (not a document) to a question
    typed in as a natural language sentence (not
    keywords).
  • Most systems can only answer simple, trivial
    pursuit type questions.
  • Example Ask.com
  • Some search engines perform limited, phrase-based
    QA, e.g. Google

17
3. Text Summarization
  • Create a summary of a text or texts.
  • Many difficult problems, including
  • Paraphrases
  • Anaphora (e.g.it, they)

18
4. Analyzing Web Documents
  • Recently there have been many NLP applications
    which analyze (not just retrieve) weblogs,
    discussion forums, message boards, user groups,
    and other forms of user generated media
  • Product marketing information
  • Political opinion tracking
  • Social network analysis
  • Buzz analysis (whats hot, what topics are people
    talking about right now).

Source Jurafsky Martin Speech and Language
Processing
19
5. NLP in IR
  • Query expansion
  • Add synonyms, related words to the query terms to
    improve search results.
  • Example Google AdWords tool
  • NOTE Stemming is NOT a NLP technique!!!

20
Source Marti Hearst, i256, at UC Berkeley
21
NLP Tasks
  • Those NLP applications require several NLP
    analyses
  • Word tokenization
  • Sentence boundary detection
  • Part-of-speech (POS) tagging
  • to identify the part-of-speech (e.g. noun, verb)
    of each word
  • Named Entity (NE) recognition
  • to identify proper nouns (e.g. names of person,
    location, organization domain terminologies)
  • Parsing
  • to identify the syntactic structure of a sentence
  • Semantic analysis
  • to derive the meaning of a sentence

22
Different Levels of Language Analysis
  • Phonology
  • Speech audio signal to phonemes
  • Morphology
  • Inflection (e.g. I, my, me eat, eats,
    ate, eaten)
  • Derivation (e.g. teach, teacher, nominate,
    nominee)
  • Syntax
  • Part-of-speech (noun, verb, adjective,
    preposition, etc.)
  • Phrase structure (e.g. noun phrase, verb phrase)
  • Semantics
  • Meaning of a word (e.g. book as a bound volume
    or an accounting ledger) or a sentence
  • Discourse
  • Meaning and inter-relation between sentences

23
Why is NLP so hard..?
  • Understanding natural languages is hard because
    of inherent ambiguity
  • Engineering NLP systems is also hard because
    of
  • Huge amount of data resources needed (e.g.
    grammar, dictionary, documents to extract
    statistics from)
  • Computational complexity (intractable) of
    analyzing a sentence

24
Ambiguity (1)
  • Get the cat with the gloves.

Source Marti Hearst, i256, at UC Berkeley
25
Ambiguity (2)
  • Find at least 5 meanings of this sentence
    I made her duck
  • I cooked waterfowl for her benefit (to eat)
  • I cooked waterfowl belonging to her
  • I created the (plaster?) duck she owns
  • I caused her to quickly lower her head or body
  • I waved my magic wand and turned her into
    undifferentiated waterfowl

Source Jurafsky Martin Speech and Language
Processing
26
Ambiguity (3)
  • Some ambiguous headlines
  • Juvenile Court to Try Shooting Defendant
  • Teacher Strikes Idle Kids
  • Kids Make Nutritious Snacks
  • Bush Wins on Budget, but More Lies Ahead
  • Hospitals are Sued by 7 Foot Doctors

Source Marti Hearst, i256, at UC Berkeley
27
Ambiguity is Pervasive
  • Phonetics
  • I mate or duck
  • Im eight or duck
  • Eye maid her duck
  • Aye mate, her duck
  • I maid her duck
  • Im aid her duck
  • I mate her duck
  • Im ate her duck
  • Im ate or duck
  • I mate or duck

Sound likeI made her duck
Source Jurafsky Martin Speech and Language
Processing
28
  • Lexical category (part-of-speech)
  • duck as a noun or a verb
  • Lexical Semantics (word meaning)
  • duck as an animal or a plaster duck statue
  • Compound nouns
  • e.g. dog food, Intelligent design scores
  • Syntactic ambiguity
  • But semantics can sometimes help disambiguate

29
Dealing with Ambiguity
  • Four possible approaches
  • Formal approaches -- Tightly coupled interaction
    among processing levels knowledge from other
    levels can help decide among choices at ambiguous
    levels.
  • Pipeline processing that ignores ambiguity as it
    occurs and hopes that other levels can eliminate
    incorrect structures.
  • Probabilistic approaches based on making the most
    likely choices
  • Dont do anything, maybe it wont matter

Source Jurafsky Martin Speech and Language
Processing
30
The Bottom Line
  • Complete NL Understanding (thus general
    intelligence) is impossible.
  • But we can make incremental progress.
  • Also we have made successes in limited domains.
  • But NLP is costly Lots of work and resources
    are needed, but the amount of return is sometimes
    not worth it.

31
Statistical Approaches to NLP
  • Get large text collections (corpora)
  • Compute statistics over the words in those
    collections
  • Surprising results
  • Getting more data is better than fine-tuning
    algorithms!

Banko Brill 01
Source Marti Hearst, i256, at UC Berkeley
32
Sentence Analysis
John ate the cake
Syntactic structure
Semantic structure
(ACTION ingest (ACTOR John-1) (PATIENT food))
33
Syntactic Parsing
  • The process of deriving the phrase structure of a
    sentence is called parsing.
  • The structure (often represented by a
    Context-Free parse tree) is based on the grammar.

34
Parsing Algorithms
  • Top-down Parsing -- (top-down) derivation
  • Bottom-up Parsing
  • Chart Parsing
  • Earleys Algorithm most efficient, O(n3)
  • Left-corner Parsing optimization of Earleys
  • and lots more

35
(Bottom-up) Chart Parsing
John ate the cake 0 1 2 3 4
Grammar
(11) reduce
(10) reduce
---
(9) reduce
(5) shift 2
---
(2) shift 2
(7) shift 2
(4) shift 2
---
(1) shift 1 John
(3) shift 1 ate
(6) shift 1 the
(8) shift 1 cake
0
1
2
3
4
36
Earleys Algorithm
John ate the cake 0 1 2 3 4
Grammar
(12) completor
(1) predictor
(11) completor
(2) scanner John
(3) predictor
(4) predictor
(10) completor
(6) completor
(7) predictor
(5) scanner ate
(9) scanner cake
(8) scanner the
37
Demo using my CF parser
38
Probabilistic Parsing
  • For ambiguous sentences, wed like to know which
    parse tree is more likely than others.
  • So we must assign probability to each parse tree
    but how?
  • A probability of a parse tree t is

    where r is a rule used in t.
  • and p(r) is obtained from a (annotated) corpus.

39
Partial Parsing
  • Parsing fails when the coverage of the grammar is
    not complete but its almost impossible to
    write out all legal syntax (without accepting
    ungrammatical sentences).
  • Wed like to at least get pieces even when full
    parsing fails.
  • Why not abandon full parsing and aim for partial
    parsing from the start

40
Semantic Analysis (1)
  • Derive the meaning of a sentence.
  • Often applied on the result of syntactic
    analysis.
  • John ate the cake.
  • NP V NP
  • ((action INGEST) syntactic verb
    (actor JOHN-01) syntactic subj
    (object FOOD)) syntactic obj
  • To do semantic analysis, we need a (semantic)
    dictionary (e.g. WordNet, http//www.cogsci.prince
    ton.edu/wn/).

41
Semantic Analysis (2)
  • Semantics is a double-edged sword
  • Can resolve syntactic ambiguity
  • I saw a man on the hill with a telescope
  • I saw a man on the hill with a hat
  • But introduces semantic ambiguity
  • She walked towards the bank
  • But in human sentence processing, we seem to
    resolve both types of ambiguities simultaneously
    (and in linear time as well)

42
Demo using my Unification parser
Write a Comment
User Comments (0)
About PowerShow.com