Natural Language Processing - PowerPoint PPT Presentation

Provided by: Emzy

1
Natural Language Processing
  • Artificial Intelligence
  • Seminar Project
  • Cristea Emilia, gr. 922

2
Introduction
  • Dave Bowman: "Open the pod bay doors, HAL."
    HAL: "I'm sorry Dave, I'm afraid I can't do
    that." (Stanley Kubrick and Arthur C. Clarke,
    screenplay of 2001: A Space Odyssey)
  • The HAL 9000 computer from Stanley Kubrick's film
    2001: A Space Odyssey is one of the most
    recognizable characters in 20th century cinema.
    HAL is an artificial agent capable of decision
    making, speaking and understanding English (and,
    at a crucial moment in the plot, even reading
    lips). Today it is clear that Arthur C. Clarke
    was a little too optimistic in predicting when
    such an entity would be available. But just how
    far off was he?
  • Minimally, such an agent would have to be capable
    of interacting with humans via language, which
    includes:
  • understanding humans through speech recognition
    and natural language understanding
  • communicating with humans through speech
    synthesis and natural language generation
  • It would also need to do information retrieval,
    information extraction and inference (drawing
    conclusions based on known facts).
  • Although these problems are far from being
    completely solved, much of the needed
    language-related technology is currently being
    developed (some is already available
    commercially). Solving problems of this kind is
    the main concern of the fields known as Natural
    Language Processing, Computational Linguistics
    and Speech Recognition and Synthesis.

3
What is NLP?
  • Natural language processing (NLP) is a subfield
    of artificial intelligence and computational
    linguistics. It studies the problems of automated
    generation and understanding of natural human
    languages.
  • Natural-language-generation systems convert
    information from computer databases into
    normal-sounding human language.
  • Natural-language-understanding systems convert
    samples of human language into more formal
    representations that are easier for computer
    programs to manipulate.

  • Computational linguistics is an interdisciplinary
    field dealing with the statistical and/or
    rule-based modeling of natural language from a
    computational perspective. This modeling is not
    limited to any particular field of linguistics.

4
  • Traditionally, computational linguistics was
    usually performed by computer scientists.
  • Recent research has shown that human language is
    much more complex than previously thought, so
    computational linguists often work as members of
    interdisciplinary teams.
  • In general, computational linguistics draws upon
    the involvement of linguists, computer
    scientists, experts in artificial intelligence,
    cognitive psychologists, mathematicians, and
    logicians, amongst others.

5
Applications of NLP
  • Natural Language Processing (NLP) is the use of
    computers to process written and spoken language
    for some practical, useful purpose:
  • to translate languages
  • to get information from the web or text data
    banks so as to answer questions
  • to carry on conversations with machines
  • These are only examples of major types of NLP,
    and there is also a huge range of lesser but
    interesting applications, e.g. getting a computer
    to decide if one newspaper story has been
    rewritten from another or constructing a summary
    for a certain text.
  • Language is the fabric of the web. The rapid
    growth of the Internet/WWW and the emergence of
    the information society pose exciting new
    challenges to language technology. Although the
    new media combine text, graphics, sound and
    movies, the whole world of multimedia information
    can only be structured, indexed and navigated
    through language.
  • For browsing, navigating, filtering and
    processing the information on the web, we need
    software that can get at the contents of
    documents. Language technology for content
    management is a necessary precondition for
    turning the wealth of digital information into
    collective knowledge.

6
Examples
  • E.g. information retrieval
  • E.g. summarization

7
More on NLP
  • Natural Language Processing (NLP) is both a
    modern computational technology and a method of
    investigating and evaluating claims about human
    language itself.
  • NLP normally has an emphasis on the role of
    knowledge representations, that is to say the
    need for representations of our knowledge of the
    world in order to understand human language with
    computers.
  • NLP is not simply applications but the core
    technical methods and theories that the major
    tasks above divide up into, such as Machine
    Learning techniques. This last is closer to
    Artificial Intelligence, and is an essential
    component of NLP if computers are to engage in
    realistic conversations: they must, like us, have
    an internal model of the humans they converse
    with.
  • NLP is challenging:
  • AI-complete: to solve NLP, you'd need to solve
    all of the problems in AI. Natural-language
    recognition seems to require extensive knowledge
    about the outside world and the ability to
    manipulate it.
  • Turing test: posits that engaging effectively
    in linguistic behavior is a sufficient condition
    for having achieved intelligence.

8
Problems in NLP
  • Limitations: In theory, natural-language
    processing is a very attractive method of
    human-computer interaction. Early systems such as
    SHRDLU, working in restricted "blocks worlds"
    with restricted vocabularies, worked extremely
    well, leading researchers to excessive optimism,
    which was soon lost when the systems were
    extended to more realistic situations with
    real-world ambiguity and complexity.
  • Concrete problems: The sentences "We gave the
    monkeys the bananas because they were hungry" and
    "We gave the monkeys the bananas because they
    were over-ripe" have the same surface grammatical
    structure. However, the pronoun "they" refers to
    the monkeys in one sentence and the bananas in
    the other, and it is impossible to tell which
    without knowledge of the properties of monkeys
    and bananas.
  • A string of words may be interpreted in different
    ways. For example, "Time flies like an arrow" and
    "Fruit flies like a banana" look alike on the
    surface, yet "flies" is most naturally read as a
    verb in the first and as part of the noun phrase
    "fruit flies" in the second.
  • The sentence "Colorless green ideas sleep
    furiously" is grammatically correct but
    nonsensical. Because such a sentence is unlikely
    ever to have appeared in any body of text, the
    linguist Noam Chomsky concluded that data-driven
    approaches will always suffer from a lack of
    data, and hence are doomed to failure (see the
    Statistical NLP slide).

9
Major Tasks in NLP
  • Automatic summarization
  • Foreign Language Reading Aid
  • Foreign Language Writing Aid
  • Information extraction
  • Information retrieval
  • Machine translation
  • Named entity recognition
  • Natural language generation
  • Optical Character Recognition
  • Question answering
  • Speech recognition
  • Spoken dialogue system
  • Text simplification
  • Text to speech
  • Text-proofing

10
Major Obstacle in NLP
  • Ambiguity! At all levels of analysis:
  • Phonetics and phonology: concerns how words are
    related to the sounds that realize them
    ("I scream" vs. "ice cream").
  • Morphology: concerns how words are constructed
    from sub-word units.
  • Syntax: concerns sentence structure. A different
    syntactic structure implies a different
    interpretation.
  • Semantics: concerns what words mean and how these
    meanings combine to form sentence meanings (e.g.
    "Jack invited Mary to the Halloween ball": a
    dance, or some big sphere with Halloween
    decorations?).
  • Discourse: concerns how the immediately preceding
    sentences affect the interpretation of the next
    sentence.

11
Statistical NLP
  • Statistical natural-language processing uses
    stochastic, probabilistic and statistical methods
    to resolve some of the difficulties discussed
    above, especially those which arise because
    longer sentences are highly ambiguous when
    processed with realistic grammars, yielding
    thousands or millions of possible analyses.
  • Methods for disambiguation often involve the use
    of corpora and Markov models. Statistical NLP
    comprises all quantitative approaches to
    automated language processing, including
    probabilistic modeling, information theory, and
    linear algebra. The technology for statistical
    NLP comes mainly from machine learning and data
    mining, both of which are fields of artificial
    intelligence that involve learning from data.
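
A rough illustration of how a corpus and a Markov model can help with disambiguation: the sketch below estimates a first-order Markov (bigram) model from a tiny made-up corpus and uses it to choose between the two readings "ice cream" and "I scream". The corpus, the function names `train_bigrams` and `log_prob`, and the use of add-one smoothing are all assumptions made for this example, not something taken from the slides.

```python
import math
from collections import Counter

# Tiny made-up corpus (assumption, for illustration only).
CORPUS = [
    "i like ice cream",
    "we like ice cream",
    "i scream when i am scared",
    "they like cold ice",
]


def train_bigrams(sentences):
    """Count unigrams and bigrams, padding each sentence with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Log-probability of a sentence under the bigram model.

    Add-one smoothing (an assumption of this sketch) keeps unseen
    bigrams from getting probability zero.
    """
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        numerator = bigrams[(prev, cur)] + 1
        denominator = unigrams[prev] + vocab_size
        total += math.log(numerator / denominator)
    return total


unigrams, bigrams = train_bigrams(CORPUS)
candidates = ["i like ice cream", "i like i scream"]
# Pick the candidate word sequence the model finds most probable.
best = max(candidates, key=lambda s: log_prob(s, unigrams, bigrams))
print(best)
```

With realistic amounts of text the same idea scales to ranking the many competing analyses of a long sentence, although practical systems use far larger corpora and better smoothing.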

12
Statistical NLP vs. Linguistics
  • We must not go overboard and mistakenly conclude
    that the successes of statistical NLP render
    linguistics irrelevant. The information and
    insight that linguists, psychologists, and others
    have gathered about language is invaluable in
    creating high-performance broad-domain language
    understanding systems.
  • Head-driven phrase structure grammar (HPSG)
    formalism is a way of analyzing natural language
    utterances that truly marries deep linguistic
    information with computer science mechanisms,
    such as unification and recursive data-types, for
    representing and propagating this information
    throughout the utterance's structure.
  • In sum, computational techniques and data-driven
    methods are now an integral part both of building
    systems capable of handling language in a
    domain-independent, flexible, and graceful way,
    and of improving our understanding of language
    itself.

13
Lexical semantics: WordNet
  • Handcrafted database of lexical relations.
  • Three separate databases: nouns; verbs;
    adjectives and adverbs.
  • Each database is a set of lexical entries
    (according to unique orthographic forms).
  • Set of senses associated with each entry
  • Sample entry
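
The structure described above (separate databases, entries keyed by orthographic form, a set of senses per entry) could be modeled roughly as follows. This is a hypothetical sketch, not WordNet's actual file format or API: the class `LexicalEntry`, the `databases` dictionary and the helper `add_entry` are invented for illustration, and the glosses are the dictionary definitions used in the Lesk example later in this presentation, not real WordNet glosses.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalEntry:
    form: str   # unique orthographic form, e.g. "pine"
    pos: str    # which database the entry lives in
    senses: list = field(default_factory=list)  # one gloss string per sense


# One database per part-of-speech group, each keyed by orthographic form.
databases = {"noun": {}, "verb": {}, "adjective/adverb": {}}


def add_entry(entry):
    """Store an entry in the database matching its part of speech."""
    databases[entry.pos][entry.form] = entry


# Illustrative glosses (assumption: borrowed from a dictionary example,
# not from WordNet itself).
add_entry(LexicalEntry("pine", "noun",
                       ["kind of evergreen tree with needle-shaped leaves"]))
add_entry(LexicalEntry("pine", "verb",
                       ["waste away through sorrow or illness"]))
add_entry(LexicalEntry("cone", "noun", [
    "solid body which narrows to a point",
    "something of this shape whether solid or hollow",
    "fruit of certain evergreen trees",
]))

# The same form can appear in more than one database.
for pos, database in databases.items():
    if "pine" in database:
        print(pos, database["pine"].senses)
```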

14
Word sense disambiguation
  • This is an NLP task with a long history but one
    which has come to prominence in recent years as a
    new, and very high level, application of
    empirical and machine learning methods in NLP.
    High levels of success have now been achieved
    with both small selections of words in a corpus
    and with the disambiguation of all content words.
    The task now has its own competition, SENSEVAL
    and has been extended to a range of languages.
  • Problem description: given a fixed set of senses
    associated with a lexical item, determine which
    of them applies to a particular instance of the
    lexical item.
  • Two fundamental approaches:
  • WSD occurs during semantic analysis, as a
    side-effect of the elimination of ill-formed
    semantic representations.
  • Stand-alone approach: WSD is performed
    independently of, and prior to, compositional
    semantic analysis. It makes minimal assumptions
    about what information will be available from
    other NLP processes.

15
Survey of WSD methods
  • In general terms, word sense disambiguation (WSD)
    involves the association of a given word in a
    text or discourse with a definition or meaning
    (sense) which is distinguishable from other
    meanings potentially attributable to that word.
    The task therefore necessarily involves two
    steps:
  • (1) the determination of all the different
    senses for every word relevant (at least) to the
    text or discourse under consideration, and
  • (2) a means to assign each occurrence of a word
    to the appropriate sense.
  • Much recent work on WSD relies on pre-defined
    senses for step (1), including:
  • a list of senses such as those found in everyday
    dictionaries
  • a group of features, categories, or associated
    words (e.g., synonyms, as in a thesaurus)
  • an entry in a transfer dictionary which includes
    translations in another language etc.
  • The precise definition of a sense is, however, a
    matter of considerable debate within the
    community. The variety of approaches to defining
    senses has raised recent concern about the
    comparability of much WSD work.

16
Dictionary-based approaches
  • Rely on machine-readable dictionaries (MRDs).
  • The initial implementation of this kind of
    approach is due to Michael Lesk (1986).
  • Given a word W to be disambiguated:
  • Retrieve all of the sense definitions, S, for
    W from the MRD.
  • Compare each s in S to the dictionary
    definitions of all the remaining words in the
    context.
  • Select the sense s with the most overlap with
    (the definitions of) these context words.
  • Example:
  • Word: cone
  • Context: pine cone
  • Sense definitions:
  • pine: 1. kind of evergreen tree with
    needle-shaped leaves; 2. waste away through
    sorrow or illness
  • cone: 1. solid body which narrows to a point;
    2. something of this shape whether solid or
    hollow; 3. fruit of certain evergreen trees
  • Accuracy of 50-70% on short samples of text from
    Pride and Prejudice.
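
The steps above can be sketched in Python. This is a minimal illustration of the overlap idea rather than Lesk's original implementation: the toy dictionary `MRD` holds only the definitions from the example, and the stop-word list, the crude plural stripping and the function names `content_words` and `lesk` are assumptions introduced here.

```python
# Toy machine-readable dictionary (assumption: only the example's definitions).
MRD = {
    "pine": [
        "kind of evergreen tree with needle-shaped leaves",
        "waste away through sorrow or illness",
    ],
    "cone": [
        "solid body which narrows to a point",
        "something of this shape whether solid or hollow",
        "fruit of certain evergreen trees",
    ],
}

# Hypothetical stop-word list so function words do not count as overlap.
STOP_WORDS = {"a", "of", "or", "to", "this", "which", "with", "through", "whether"}


def content_words(definition):
    """Lowercase, split on whitespace, drop stop words, strip a final 's'."""
    words = set()
    for token in definition.lower().split():
        if token in STOP_WORDS:
            continue
        # Crude normalization (assumption) so that "trees" matches "tree".
        if token.endswith("s") and len(token) > 3:
            token = token[:-1]
        words.add(token)
    return words


def lesk(word, context, mrd=MRD):
    """Return (1-based sense number, overlap size) for the best sense of word."""
    # Pool the definition words of all the other words in the context.
    context_bag = set()
    for other in context:
        if other == word:
            continue
        for definition in mrd.get(other, []):
            context_bag |= content_words(definition)

    # Choose the sense whose definition shares the most words with the pool.
    best_sense, best_overlap = None, -1
    for number, definition in enumerate(mrd[word], start=1):
        overlap = len(content_words(definition) & context_bag)
        if overlap > best_overlap:
            best_sense, best_overlap = number, overlap
    return best_sense, best_overlap


print(lesk("cone", ["pine", "cone"]))  # (3, 2): shares "evergreen" and "tree"
```

For "cone" in the context "pine cone", sense 3 wins because its definition shares "evergreen" and "tree(s)" with the first definition of "pine"; the same overlap also selects sense 1 of "pine".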

17
Machine learning approaches
  • Machine learning methods:
  • Supervised inductive learning
  • Bootstrapping
  • Unsupervised
  • Emphasis is on acquiring the knowledge needed for
    the task from data, rather than from human
    analysts.
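
As one possible instance of supervised inductive learning, the sketch below trains a naive Bayes classifier on a handful of sense-tagged contexts for the word "ball" and then labels a new context. The training sentences are made up for this example, the sense labels "dance" and "sphere" echo the Halloween ball example from the ambiguity slide, and the function names `train` and `classify` are hypothetical; real systems learn from much larger sense-tagged corpora.

```python
import math
from collections import Counter, defaultdict

# Made-up sense-tagged examples for the word "ball" (assumption).
TRAINING = [
    ("dance", "she wore a gown to the ball and danced all night"),
    ("dance", "the halloween ball had music and dancing"),
    ("sphere", "he kicked the ball over the fence"),
    ("sphere", "the child threw a red ball"),
]


def train(examples):
    """Collect per-sense counts of examples and of context words."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sense, context in examples:
        sense_counts[sense] += 1
        for word in context.split():
            word_counts[sense][word] += 1
            vocab.add(word)
    return sense_counts, word_counts, vocab


def classify(context, sense_counts, word_counts, vocab):
    """Return the sense with the highest naive Bayes score for the context."""
    total_examples = sum(sense_counts.values())
    scores = {}
    for sense, count in sense_counts.items():
        score = math.log(count / total_examples)  # log prior
        denominator = sum(word_counts[sense].values()) + len(vocab)
        for word in context.split():
            if word in vocab:  # words never seen in training are ignored
                # Add-one smoothed log likelihood of the context word.
                score += math.log((word_counts[sense][word] + 1) / denominator)
        scores[sense] = score
    return max(scores, key=scores.get)


model = train(TRAINING)
print(classify("jack invited mary to the halloween ball", *model))
```

The knowledge used to decide between senses comes entirely from the labeled examples, which is the point the slide makes about acquiring knowledge from data rather than from human analysts.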

18
WSD Applied: Dialog systems
  • A dialog system is a computer system intended to
    converse with a human, with a coherent structure.
    Dialog systems have employed text, speech,
    graphics, haptics, gestures and other modes for
    communication on both the input and output
    channel.
  • Dialog systems: welcome to the wonderful world of
    chat-bots
  • Jabberwacky: http://www.jabberwacky.com/
  • Joan, "Artificially Intelligent, speaking,
    videocentric Avatar": http://www.icogno.com/
  • Alice, winner of the 2004 Loebner Prize
    competition: http://www.alicebot.org/
  • Human: Name a philosopher.
    ALICE: Hans Moravec. :-)
    Human: Who is Hans Moravec?
    ALICE: He is a famous robotics expert at Carnegie
    Mellon.
    Human: Does he like the movie you mentioned?
    ALICE: Suppose he does.
    Human: What was the name of the movie?
    ALICE: It depends on the historical context.
  • Human: What is the meaning of life?
    ALICE: Actually, it's twenty-three, not
    forty-two.

19
Bibliography
  • Speech and Language Processing: An Introduction
    to Natural Language Processing, Computational
    Linguistics, and Speech Recognition. By Daniel
    Jurafsky and James H. Martin, Prentice Hall,
    2000.
  • Computational Linguistics: Models, Resources,
    Applications. By Igor Bolshakov and Alexander
    Gelbukh, Ciencia de la Computación, 2004.
  • "I'm sorry Dave, I'm afraid I can't do that":
    Linguistics, Statistics, and Natural Language
    Processing circa 2001. By Lillian Lee, Cornell
    University. In Computer Science: Reflections on
    the Field, Reflections from the Field, 2004.
  • Natural Language Processing, Cornell University:
    http://www.cs.cornell.edu/
  • The Stanford Natural Language Processing Group:
    http://nlp.stanford.edu/
  • Association for the Advancement of Artificial
    Intelligence (AAAI) (formerly the American
    Association for A.I.):
    http://www.aaai.org/home.html
  • Natural Language Processing Research Group at the
    University of Sheffield Department of Computer
    Science: http://nlp.shef.ac.uk
  • Open source NLP projects:
    http://opennlp.sourceforge.net/projects.html
  • Wikipedia: http://en.wikipedia.org/

20
  • "Hasta la vista, baby."
  • Terminator 2: Judgment Day