Overview of Natural Language Processing - PowerPoint PPT Presentation
1
Overview of Natural Language Processing
  • Advanced AI CSCE 976
  • Amy Davis
  • amydavis@cse.unl.edu

2
Outline
  • Common Applications
  • Dealing with Sentences
  • (and words)
  • Dealing with Discourses

3
Practical Applications
  • Machine translation
  • Database access
  • Information Retrieval
  • Query-answering
  • Text categorization
  • Summarization
  • Data extraction

4
Machine Translation
  • Proposals for mechanical translators of languages
    pre-date the invention of the digital computer
  • The first was a dictionary look-up system at
    Birkbeck College, London, in 1948
  • American interest, started by Warren Weaver, a
    WW2 code breaker, was popular during the Cold
    War but, alas, rather unsuccessful

5
Machine Translation Working Systems
  • TAUM-METEO: translates weather reports from
    English to French in Montreal. Works because the
    language used in the reports is stylized and regular.
  • Xerox Systran: translates Xerox manuals from
    English to all languages that Xerox deals in.
    Utilizes pre-edited texts

6
Machine Translation Difficulties
  • Need a big dictionary with grammar rules in both
    (or all) languages; large start-up cost
  • Direct word translation is often ambiguous
  • Lexicons (words that aren't in a dictionary, but
    are made of common parts)
  • (ex. Lebensversicherungsgesellschaftsangestellter,
    a life insurance company employee)
  • Ambiguity even in the primary language
  • Elements of language are different

7
Machine Translation Difficulties
  • Essentially requires a good understanding of the
    text, and finding a corresponding text in the
    target language that does a good job of
    describing the same (or a similar) situation.
  • Requires the computer to understand.

8
Machine Translation Successes
  • Limited domain
  • allows for limited vocabulary and grammar, easier
    disambiguation and understanding
  • Journal article: Church, K.W. and E.H. Hovy.
    1993. "Good Applications for Crummy Machine
    Translation." Machine Translation 8: 239-258
  • MAT
  • machine-aided translation, where a machine
    starts and a real person proof-reads for
    clarity (sometimes doesn't require bilingual
    people).

9
Example of MAT (page 692)
  • The extension of the coverage of the health
    services to the underserved or not served
    population of the countries of the region was the
    central goal of the Ten-Year Plan and probably
    that of greater scope and transcendence. Almost
    all the countries formulated the purpose of
    extending the coverage although could be
    appreciated a diversity of approaches for its
    attack, which is understandable in view of the
    different national policies that had acted in
    the configuration of the health systems of each
    one of the countries. (Translated by SPANAM;
    Vasconcellos and Leon, 1985)

10
Database Access
  • The first major success for NLP was in the area
    of database access
  • Natural Language Interfaces to Databases were
    developed to save mainframe operators the work of
    accessing data through complicated programs.

11
Database Access: Working Systems
  • LUNAR (by Woods for NASA, 1973)
  • allowed queries of chemical analysis data of
    lunar rock and soil samples brought back by
    Apollo missions
  • CHAT (Pereira, 1983)
  • allows queries of a geographical database

12
Database Access Difficulties
  • Limited Vocabulary
  • User must phrase the question correctly; the
    system doesn't understand everything
  • Context detection
  • allowing questions that implicitly refer to
    previous questions
  • Becomes a text-interpretation question

13
Database Access Conclusion
  • Worked well for a time
  • Now more information is stored in text, not in
    databases (ex. email, news, articles, books,
    encyclopedias, web pages)
  • The problem now is not to find information, it's
    to sort through the information that's available.

14
Information Retrieval
  • Now the main focus of Natural Language Processing
  • There are four types
  • Query answering
  • Text categorization
  • Text summary
  • Data extraction

15
Information Retrieval: The Task
  • Choose, from some set of documents, the ones
    that are related to my query
  • Ex. Internet search

16
Information RetrievalMethods
  • Boolean: (Natural AND Language) OR
    (Computational AND Linguistics)
  • too confusing for most users
  • Vector: assign different weights to each term in
    the query. Rank documents by distance from the
    query and report the ones that are close.
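The vector method can be sketched in a few lines of Python. This is a toy illustration only (the documents and function names are invented, and real systems use tf-idf weights rather than the raw word counts used here):

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    common = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank(query, docs):
    """Rank documents by cosine similarity to the query; drop non-matches."""
    q = Counter(query.lower().split())
    scored = [(cosine(q, Counter(d.lower().split())), d) for d in docs]
    return [d for score, d in sorted(scored, reverse=True) if score > 0]

docs = ["natural language processing overview",
        "computational linguistics methods",
        "cooking spaghetti with meatballs"]
print(rank("natural language", docs))  # only the first document matches
```

Documents with no term in common with the query score zero and are not reported, which is the "report ones that are close" behavior described above.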

17
Information Retrieval
  • Mostly implemented using simple statistical
    models on the words only
  • More advanced NLP techniques have not yielded
    significantly better results
  • Information in a text is mostly in its words

18
Text Categorization
  • Once upon a time this was done by humans
  • Computers are much better at it (and more
    consistent)
  • Best success for NLP so far (90% accuracy)
  • Much faster and more consistent than humans.
    Automated systems now perform most of the work.
  • NLP works better for TC than IR because
    categories are fixed.
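A word-statistics categorizer of the kind described above can be sketched with a Naive Bayes classifier over word counts; the slides don't name a specific algorithm, and the training texts and categories below are invented:

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Word-count Naive Bayes text categorizer with add-one smoothing."""
    def fit(self, labeled_docs):
        self.word_counts = defaultdict(Counter)  # per-category word counts
        self.cat_counts = Counter()              # documents per category
        self.vocab = set()
        for text, cat in labeled_docs:
            words = text.lower().split()
            self.word_counts[cat].update(words)
            self.cat_counts[cat] += 1
            self.vocab.update(words)
        return self

    def predict(self, text):
        total = sum(self.cat_counts.values())
        best, best_lp = None, float("-inf")
        for cat in self.cat_counts:
            # log prior + log likelihood of each word under this category
            lp = math.log(self.cat_counts[cat] / total)
            denom = sum(self.word_counts[cat].values()) + len(self.vocab)
            for w in text.lower().split():
                lp += math.log((self.word_counts[cat][w] + 1) / denom)
            if lp > best_lp:
                best, best_lp = cat, lp
        return best

nb = NaiveBayes().fit([
    ("stocks fell sharply today", "finance"),
    ("the market rallied on earnings", "finance"),
    ("the team won the game", "sports"),
    ("a late goal sealed the match", "sports"),
])
print(nb.predict("the market fell"))  # -> finance
```

Because the categories are fixed in advance, even this simple word-count model can separate them; that is why TC works better than open-ended IR.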

19
Text Summarization
  • Main task: understand the main meaning and
    describe it in a shorter way
  • Common systems: Microsoft
  • How:
  • Sentence/paragraph extraction (find the most
    important sentences/paragraphs and string them
    together for a summary)
  • Statistical methods are more common
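Sentence extraction can be sketched by scoring each sentence by the average document-wide frequency of its words and keeping the top scorers. A toy illustration (the sample text is invented; real systems also remove stopwords and use position cues):

```python
import re
from collections import Counter

def summarize(text, n=1):
    """Extract the n highest-scoring sentences, kept in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    def score(s):
        toks = re.findall(r"[a-z']+", s.lower())
        # average frequency of the sentence's words in the whole text
        return sum(freq[t] for t in toks) / max(len(toks), 1)
    chosen = set(sorted(sentences, key=score, reverse=True)[:n])
    return " ".join(s for s in sentences if s in chosen)

text = ("Parsing uses a grammar. A grammar defines sentence structure. "
        "Lexicons list words.")
print(summarize(text))  # -> Parsing uses a grammar.
```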

20
Data extraction
  • Goal: derive from text assertions to store in a
    database
  • Example: SCISOR (Jacobs and Rau, 1990)
  • Summarizes Dow Jones news stories and adds
    information to a database.

21
NLP Goals
  • Have (or feign) some understanding based on
    communication with Natural Language
  • In order to receive and send information in ways
    easily understandable by human users

22
How to get there
  • NLP applications are all similar in that they
    require some level of understanding.
  • Understand the query, understand the document,
    understand the data being communicated

23
Understanding Sentences Overview
  • Parsing and Grammar
  • How is a sentence composed?
  • Lexicons
  • How is a word composed?
  • Ambiguity

24
Parsing Requirements
  • Requires a defined Grammar
  • Requires a big dictionary (10K words)
  • Requires that sentences follow the grammar
    defined
  • Requires ability to deal with words not in
    dictionary

25
Parsing (from Section 22.4)
  • Goal
  • Understand a single sentence by syntax analysis
  • Methods
  • Bottom-up
  • Top-down
  • More efficient (and complicated) algorithm given
    in 23.2

26
A Parsing Example
  • Rules
    S → NP VP
    NP → Article N | Proper
    VP → Verb NP
    N → home | boy | store
    Proper → Betty | John
    Verb → go | give | see
    Article → the | an | a

The sentence: The boy went home.
27
A Parsing Example: The Answer
28
Lexicons
  • The current trend in parsing
  • Goal: figure out this word
  • Method:
  • Tokenize with morphological analysis
  • Inflectional, derivational, compound
  • Dictionary lookup on each token
  • Error recovery (spelling correction,
    domain-dependent cues)
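The tokenize-then-lookup method can be sketched as suffix stripping against a dictionary. The word list and suffix rules below are invented toy data; a real lexicon also handles derivational morphology and compounds:

```python
# Illustrative inflectional analysis: strip a common English suffix and
# look the candidate root up in a (toy) dictionary.
DICTIONARY = {"walk": "verb", "store": "noun", "happy": "adj", "go": "verb"}

# (suffix to strip, replacement to append) - toy rules, not exhaustive
SUFFIX_RULES = [
    ("ies", "y"), ("ing", ""), ("ing", "e"), ("ed", ""), ("ed", "e"),
    ("er", ""), ("est", ""), ("s", ""),
]

def analyze(word):
    """Return (root, part-of-speech, suffix) for a word, or None."""
    w = word.lower()
    if w in DICTIONARY:
        return w, DICTIONARY[w], ""
    for suffix, replacement in SUFFIX_RULES:
        if w.endswith(suffix):
            root = w[: -len(suffix)] + replacement
            if root in DICTIONARY:
                return root, DICTIONARY[root], suffix
    return None  # error recovery (e.g. spelling correction) would go here

print(analyze("walking"))  # ('walk', 'verb', 'ing')
print(analyze("stores"))   # ('store', 'noun', 's')
```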

29
Lexicons in Practice
  • 10,000 to 100,000 root word forms
  • Expensive to develop, not readily shared
  • WordNet (George Miller, Princeton)
  • clarity.princeton.edu

30
Ambiguity
  • More extensive language → more ambiguity
  • Disambiguation
  • task of finding the correct interpretation
  • Evidence
  • Syntactic
  • Lexical
  • Semantic
  • Metonymy
  • Metaphor

31
Disambiguation Tools
  • Syntax
  • modifiers (prepositions, adverbs) usually attach
    to nearest possible place
  • Lexical
  • probability of a word having a particular
    meaning, or being used in a particular way
  • Semantic
  • determine most likely meaning from context

32
Semantic Disambiguation: Examples with "with"
  • Sentence (relation)
  • I ate spaghetti with meatballs. (ingredient of
    spaghetti)
  • I ate spaghetti with salad. (side dish of
    spaghetti)
  • I ate spaghetti with abandon. (manner of
    eating)
  • I ate spaghetti with a fork. (instrument of
    eating)
  • I ate spaghetti with a friend. (accompanier of
    eating)
  • Disambiguation is probabilistic!
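The probabilistic nature of this disambiguation can be illustrated with co-occurrence counts: pick the relation that is most frequent for the noun that follows "with". The counts below are fabricated for illustration; real systems estimate such probabilities from large parsed corpora:

```python
# Fabricated counts of how often each noun fills each relation in
# "ate spaghetti with X" - a stand-in for corpus statistics.
COUNTS = {
    "meatballs": {"ingredient": 40, "instrument": 0,  "accompanier": 2},
    "fork":      {"ingredient": 0,  "instrument": 35, "accompanier": 1},
    "friend":    {"ingredient": 0,  "instrument": 0,  "accompanier": 25},
}

def relation(noun):
    """Most probable relation for 'with <noun>', with its probability."""
    counts = COUNTS[noun]
    total = sum(counts.values())
    best = max(counts, key=counts.get)
    return best, counts[best] / total

print(relation("fork"))
```

Nothing in the syntax distinguishes the five sentences; only the word statistics do, which is the sense in which disambiguation is probabilistic.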

33
More Disambiguation Tools
  • Metonymy
  • "Chrysler announced..." doesn't mean companies
    can talk.
  • Metaphor
  • "more is up": confidence has fallen, prices have
    sky-rocketed.

34
Beyond Sentences: Discourse Understanding
  • Sentences are nice, but...
  • Most communication takes place in the form of
    multiple sentences (discourses)
  • There's lots more to the world than parsing and
    grammar!

35
Discourse Understanding Goals
  • Correctly interpret sequences of sentences
  • Increase knowledge about world from discourse
    (learn)
  • Dependent on facts as well as new knowledge
    gained from discourse.

36
Discourse Understanding: An Example
  • John went to a fancy restaurant.
  • He was pleased and gave the waiter a big tip.
  • He spent $50.
  • What is a proper understanding of this discourse?
  • What is needed to have a proper understanding of
    this discourse?

37
General world knowledge
  • Restaurants serve meals, so a reason for going to
    a restaurant is to eat.
  • Fancy restaurants serve fancy meals; $50 is a
    typical price for a fancy meal. Paying and
    leaving a tip is customary after eating meals at
    restaurants.
  • Restaurants employ waiters.

38
General Structure of Discourse
  • "John went to a fancy restaurant. He was
    pleased..."
  • Describe some steps of a plan for a character
  • Leave out steps that can be easily inferred from
    other steps.
  • From the first sentence: John is in the
    eat-at-restaurant plan. Inference: the eat-meal
    step probably occurred even if it wasn't mentioned.

39
Syntax and Semantics
  • "...gave the waiter a big tip."
  • "the" is used for objects that have been
    mentioned before
  • OR
  • have been implicitly alluded to; in this case,
    by the eat-at-restaurant plan

40
Specific knowledge about situation
  • "He spent $50"
  • "He" is John.
  • Recipients of the $50 are the restaurant and the
    waiter.

41
Structure of coherent discourse
  • Discourses are composed of segments
  • Relations between segments
  • (more in Mann and Thompson, 1983)
  • (coherence relations)
  • Enablement
  • Evaluation
  • Causal
  • Elaboration
  • Explanation

42
Speaker Goals (Hobbs 1990)
  • The Speaker does 4 things
  • 1) wants to convey a message
  • 2) has a motivation or goal
  • 3) wants to make it easy for the hearer to
    understand.
  • 4) links new information to what hearer knows.

43
A Theory of Attention
  • Grosz and Sidner, 1986
  • Speaker's or hearer's attention is focused
  • Focus follows a stack model
  • Explains why order is important.
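The stack model can be sketched as a stack of focus spaces: each discourse segment pushes one, closing the segment pops it, and references are resolved against the stack from the top down. The discourse, entities, and class names below are illustrative, not Grosz and Sidner's formalism:

```python
# Minimal sketch of a stack model of attention: segments push focus
# spaces, closing a segment pops its space, and pronouns resolve against
# the remaining spaces from the top of the stack down.
class FocusStack:
    def __init__(self):
        self.stack = []

    def push_segment(self, entities):
        """Open a discourse segment with its salient entities."""
        self.stack.append(list(entities))

    def pop_segment(self):
        """Close the current segment; its entities leave focus."""
        self.stack.pop()

    def resolve(self, pronoun):
        """Return the first matching entity, topmost focus space first."""
        for space in reversed(self.stack):
            for name, pro in space:
                if pro == pronoun:
                    return name
        return None

focus = FocusStack()
focus.push_segment([("John", "he")])        # John went to a restaurant.
focus.push_segment([("the waiter", "he")])  # a digression about the waiter
focus.pop_segment()                         # digression closed
print(focus.resolve("he"))                  # -> John
```

Because popped entities are no longer candidates, the same sentences in a different order resolve differently, which is why order matters.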

44
Order is important
  • What's the difference?
  • Version 1: I visited Paris. I bought you some
    expensive cologne. Then I flew home. I went to
    Kmart. I bought some underwear.
  • Version 2: I visited Paris. Then I flew home. I
    went to Kmart. I bought you some expensive
    cologne. I bought some underwear.

45
Summary
  • NLP has practical applications, but none does a
    great job in an open-ended domain
  • Sentences are understood through grammar, parsing
    and lexicons
  • Choosing a good interpretation of a sentence
    requires evidence from many sources
  • Most interesting NLP comes in connected discourse
    rather than in isolated sentences

46
Current NLP Crowd
  • Originally, mostly mathematicians.
  • Now: computational linguists (linguists,
    statisticians, computer science folk).
  • Big names are Perrault, Hobbs, Pereira, Grosz,
    and Charniak

47
Current NLP conferences
  • Association for Computational Linguistics
  • COLING
  • EACL (European Chapter of the Association for
    Computational Linguistics)

48
USA Schools with NLP Grad.
  • Brown University
  • Buffalo, SUNY at
  • California at Berkeley, University of
  • California at Los Angeles, University of
  • Carnegie-Mellon University
  • Columbia University
  • Cornell University
  • Delaware, University of
  • Duke University
  • Georgetown University
  • Georgia, University of
  • Georgia Institute of Technology
  • Harvard University
  • Indiana University
  • Information Sciences Institute (ISI) at the
    University of Southern California
  • Johns Hopkins University
  • Massachusetts at Amherst, University of
  • Massachusetts Institute of Technology
  • Michigan, University of
  • New Mexico State University
  • New York University
  • Ohio State University
  • Pennsylvania, University of
  • Rochester, University of
  • Southern California, University of
  • Stanford University
  • Utah, University of
  • Wisconsin - Milwaukee, University of
  • Yale University
49
Current NLP Journals
  • Computational Linguistics
  • Journal of Natural Language Engineering (JNLE)
  • Machine Translation
  • Natural Language and Linguistic Theory

50
Industrial NLP Research Centers
  • AT&T Labs - Research
  • BBN Systems and Technologies Corporation
  • DFKI (German research center for AI)
  • General Electric R&D
  • IRST, Italy
  • IBM T.J. Watson Research, NY
  • Lucent Technologies Bell Labs, Murray Hill, NJ
  • Microsoft Research, Redmond, WA
  • MITRE
  • NEC Corporation
  • SRI International, Menlo Park, CA
  • SRI International, Cambridge, UK
  • Xerox, Palo Alto, CA
  • XRCE, Grenoble, France


52
Discourse comprehension
  • The procedure is actually quite simple. First
    you arrange things into different groups. Of
    course, one pile may be sufficient depending on
    how much there is to do. If you have to go
    somewhere else due to lack of facilities that is
    the next step, otherwise you are pretty well set.
    It is important not to overdo things. That is,
    it is better to do too few things at once than
    too many. In the short run this may not seem
    important but complications can easily arise. A
    mistake is expensive as well. At first the whole
    procedure will seem complicated. Soon however,
    it will become just another facet of life. It is
    difficult to foresee any end to the necessity of
    this task in the immediate future, but then one
    can never tell. After the procedure is completed
    one arranges the material into different groups
    again. Then they can get put into their
    appropriate places. Eventually they will be used
    once more and the whole cycle will have to be
    repeated. However, this is a part of life.

53
Now: What Do You Remember?
  • What are the four steps mentioned?
  • What step is left out?
  • What is the material mentioned?
  • What kind of mistake would be expensive?
  • Is it better to do too few or too many?
  • Why?

54
Oh Yeah --
  • The title of the discourse is "Washing Clothes"
  • Now, re-read, and see if the questions are
    easier. What does this say about discourse
    comprehension?