Introduction to Natural Language Processing - PowerPoint PPT Presentation

Slides: 45
Provided by: shirazi6

Transcript and Presenter's Notes

Title: Introduction to Natural Language Processing


1
Introduction to Natural Language Processing
  • Heshaam Faili
  • hfaili_at_ece.ut.ac.ir

2
Session Agenda
  • Artificial Intelligence
  • Natural Language Processing
  • History of NLP
  • Statistical NLP
  • Applications of NLP

3
AI Concepts and Definitions
  • Encompasses Many Definitions
  • AI Involves Studying Human Thought Processes
  • Representing Thought Processes on Machines

4
Artificial Intelligence
  • Behavior by a machine that, if performed by a
    human being, would be considered intelligent
  • The study of how to make computers do things at
    which, at the moment, people are better (Rich
    and Knight 1991)
  • A theory of how the human mind works (Mark Fox)

5
AI Objectives
  • Make machines smarter (primary goal)
  • Understand what intelligence is
  • Make machines more useful (practical purpose)
  • (Winston and Prendergast 1984)

6
Turing Test for Intelligence
  • A computer can be considered smart only when a
    human interviewer, conversing with both an
    unseen human being and an unseen computer,
    cannot determine which is which

7
Major AI Areas
  • Expert Systems
  • Natural Language Processing
  • Speech Understanding
  • Robotics and Sensory Systems
  • Computer Vision and Scene Recognition
  • Intelligent Computer-Aided Instruction
  • Neural Computing
  • Fuzzy Logic
  • Genetic Algorithms
  • Intelligent Software Agents

8
What is NLP?
  • Natural language is one of the fundamental
    aspects of human behavior.
  • It is one of the ultimate aims of human-computer
    communication.
  • Provide easy interaction with computers
  • Enable computers to understand text

9
Major Disciplines Studying Language
10
Interaction Level
  • The level at which computer and human interact.
  • Natural language is used to bring the
    interaction level closer to the human.

11
Other Titles
  • The most common titles, apart from Natural
    Language Processing, include:
  • Automatic Language Processing
  • Computational Linguistics
  • Natural Language Understanding

12
Computational Linguistics
  • This is the application of computers to the
    scientific study of human language.
  • This definition suggests that there are
    connections with Cognitive Science, that is to
    say, the study of how humans produce and
    understand language.

13
Computational Linguistics
  • Historically, Computational Linguistics has been
    associated with work in Generative Linguistics
    and formerly included the study of formal
    languages (e.g., finite state automata) and
    programming languages.

14
Natural Language Understanding
  • Distinguishes a particular approach to Natural
    Language Processing.
  • The people using this title tend to lay much
    emphasis on the meaning of the language being
    processed, in particular getting the computer to
    respond to the input in an apparently intelligent
    fashion.

15
Natural Language Understanding
  • At one time, those who belonged to the Natural
    Language Understanding camp avoided the use of
    any syntactic processing, but textbooks that bear
    this title now include significant sections on
    syntactic processing, which suggests that the
    edge of the title has been rather blunted. (For
    instance, see Allen (1987), Part 1.)

16
Motivation for NLP
  • Understand language analysis and generation
  • Communication
  • Language is a window to the mind
  • Data is in linguistic form
  • Data can be structured (table form),
    semi-structured (XML form), or unstructured
    (sentence form).

17
Language Processing
  • Level 1: Speech sound (Phonetics and Phonology)
  • Level 2: Words and their forms (Morphology,
    Lexicon)
  • Level 3: Structure of sentences (Syntax,
    Parsing)
  • Level 4: Meaning of sentences (Semantics)
  • Level 5: Meaning in context for a purpose
    (Pragmatics)
  • Level 6: Connected sentence processing in a
    larger body of text (Discourse)

18
Phonetics
  • Concerns processing or identifying:
  • Languages
  • Accents
  • Pauses
  • Word boundaries
  • Amplitude, Tone
  • Also includes background noise elimination
  • E.g. "I got up late" and "I got a plate" sound
    similar

19
Lexicon
  • Deals with vocabulary of words
  • Uses dictionaries, WordNet, etc.
  • Various levels of richness in a dictionary, e.g.
    tense, senses, usage, etc.
  • Resources: Princeton WordNet, EuroWordNet

20
Syntax
  • Involves parsing and understanding the
    grammatical structure of a sentence
  • Challenges:
  • Ungrammatical sentences
  • Word order: fixed or free
  • Word attachment and scope
  • e.g. "Old men and women were rescued."
  • Only old men, or old women too?
  • Prepositional phrase attachment
  • e.g. "I saw the boy with a telescope."
  • Does "with a telescope" attach to "saw" or to
    "the boy"?
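
The prepositional phrase attachment problem on this slide can be made concrete by counting parses. The following is a minimal sketch, not the presenter's method: the toy grammar and lexicon are assumptions invented for illustration, and the chart algorithm is standard CKY adapted to count derivations.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (an assumption, not from the slides).
# Keys are (left child, right child); values are possible parent labels.
BINARY_RULES = {
    ("NP", "VP"): ["S"],
    ("V", "NP"): ["VP"],
    ("VP", "PP"): ["VP"],   # PP modifies the verb phrase
    ("Det", "N"): ["NP"],
    ("NP", "PP"): ["NP"],   # PP modifies the noun phrase
    ("P", "NP"): ["PP"],
}
LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"], "a": ["Det"],
    "boy": ["N"], "telescope": ["N"], "with": ["P"],
}

def count_parses(words, root="S"):
    """Count the distinct parse trees of `words` with a CKY-style chart."""
    n = len(words)
    # chart[(i, j)][label] = number of ways to derive words[i:j] from label
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for label in LEXICON.get(word, []):
            chart[(i, i + 1)][label] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for left, left_count in list(chart[(i, k)].items()):
                    for right, right_count in list(chart[(k, j)].items()):
                        for parent in BINARY_RULES.get((left, right), []):
                            chart[(i, j)][parent] += left_count * right_count
    return chart[(0, n)][root]

sentence = "I saw the boy with a telescope".split()
print(count_parses(sentence))
```

Under this toy grammar the sentence has one parse per attachment site, so a parser alone cannot choose between them; that choice needs semantic or statistical information.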

21
Semantics
  • Concerned with meaning
  • Creates a structure for a sentence
  • Main verb is associated with agent, object,
    instrument, etc.
  • E.g. "I ate rice with a spoon." ("spoon" is the
    instrument)
  • Challenges:
  • Representation
  • Domain (straddles into pragmatics)
  • Constructing sentence meaning from individual
    word meanings

22
Pragmatics
  • Use of the sentence in a situation
  • Understanding user's intention
  • E.g. "Is that water?": the response differs at
    the dining table and in a chemistry lab
  • Applications: search engines tuned to user
    preferences

23
Discourse
  • Processing of connected text
  • Co-reference: two expressions in the utterance
    both refer to the same thing.
  • Examples:
  • Pronoun-to-noun binding: "John is sleeping. He
    is lazy." ("He" refers to John)
  • In an article: "George Bush", "Mr. Bush", "The
    President of United States", "The President"
  • General to specific: "Ferrari launched a new
    model. This car is much better than the previous
    one." ("Car" refers to the new model launched)

24
NLP History (1)
  • The first recognizable NLP application was a
    dictionary look-up system developed at Birkbeck
    College, London in 1948.

25
NLP History (2)
  • NLP from 1966-1980
  • Augmented Transition Networks
  • Case Grammar
  • Semantic representations
  • Conceptual Dependency
  • Semantic network
  • Procedural semantics

26
NLP History (3)
  • The key systems were:
  • LUNAR: A database interface system that used
    ATNs and Woods' Procedural Semantics.
  • LIFER/LADDER: One of the most impressive NLP
    systems. It was designed as a natural language
    interface to a database of information about US
    Navy ships.
  • NLP from 1980-1990: Grammar formalisms
  • NLP from 1990-2000: Multilinguality and
    multimodality
  • NLP from 2000-now: Statistical approaches and
    practical uses

27
Why NLP is Hard?
28
Why NLP is Hard?
29
Why NLP is Hard?
30
Why NLP is Hard?
31
Why NLP is Hard?
32
Basics of statistical NLP
  • Consider NLP problems as sequence labeling tasks
  • Amenable to machine learning (training and
    generalization)
  • In classical NLP, rules are obtained from
    linguists
  • In statistical NLP, probabilities are learnt
    from data

33
Noisy Channel Metaphor
  • Diagram labels: Speech, Text, Signal, Noisy
  • Example sentences: "I want food."; "It is cold
    today."

34
Data-Driven Approach
  • The issues in this approach are:
  • Corpora collection (coherent pieces of text)
  • Corpora cleaning: spelling, grammar, removal of
    strange characters
  • Annotation:
  • Named entity recognition
  • POS detection
  • Parsing
  • Meaning
  • Again, the biggest challenge is ambiguity.
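
The "strange characters removal" step of corpus cleaning might look like the sketch below. The function name `clean_text` and the exact cleaning policy are assumptions; a real pipeline would tune them to the corpus and language.

```python
import re
import unicodedata

def clean_text(raw):
    """Normalize Unicode, drop control characters, and collapse whitespace."""
    # NFKC folds compatibility forms (e.g. ligatures) into their plain equivalents.
    text = unicodedata.normalize("NFKC", raw)
    # Remove control characters (category Cc) except newline and tab.
    # Format characters (category Cf) are kept on purpose: some of them,
    # such as the zero-width non-joiner, are meaningful in Persian text.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    # Collapse every run of whitespace into a single space.
    return re.sub(r"\s+", " ", text).strip()

print(clean_text("I  want\x00 food.\x07\n"))
```

Spelling and grammar correction are separate, harder steps and are not attempted here.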

35
Sequence Labeling Tasks
  • In order of complexity:
  • Words: POS tagging, Named Entity
    Recognition (NER), sense disambiguation
  • Phrases: Chunking
  • Sentences: Bracketing
  • Paragraphs: Co-referencing

36
Examples of Levels
  • Example sentence: "The dog Bill went near cat
    Jack. It bit it."
  • POS tagging:
  • The dog Bill went near cat Jack. It bit it
  • DT  NN  NNP  VBD  PP   NN  NNP   PN VBD PN
  • NER:
  • <person-name>Bill</person-name>
  • <person-name>Jack</person-name>
  • Sense: using WordNet
  • dog, animal synset-id
  • synset-id assigned to each sense

37
Chunking
  • Tags: B = Beginning, I = Intermediate, E = End
  • (The dog Bill) went near (the cat Jack)
  •  B   I   E     BIE  BIE   B   I   E
  • It  bit it
  • BIE BIE BIE
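
The tagging scheme on this slide can be sketched as a small function that turns a list of chunks into per-word tags. The function name `bie_tags` is hypothetical, and giving a two-word chunk the tags B and E is an assumption, since the slide shows only one-word and three-word chunks.

```python
def bie_tags(chunks):
    """Map a list of chunks (each a list of words) to (word, tag) pairs."""
    tagged = []
    for chunk in chunks:
        if len(chunk) == 1:
            # A one-word chunk begins and ends at the same word.
            tagged.append((chunk[0], "BIE"))
            continue
        last = len(chunk) - 1
        for position, word in enumerate(chunk):
            if position == 0:
                tag = "B"
            elif position == last:
                tag = "E"
            else:
                tag = "I"
            tagged.append((word, tag))
    return tagged

chunks = [["The", "dog", "Bill"], ["went"], ["near"], ["the", "cat", "Jack"]]
for word, tag in bie_tags(chunks):
    print(f"{word}\t{tag}")
```

Once chunks are encoded this way, chunking becomes a word-by-word labeling problem like POS tagging.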

38
Parsing
(S (NP (DT the)
       (NP (N dog) (N Bill)))
   (VP (V went)
       (PP (P near)
           (NP the cat Jack))))
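
A parse tree like the one on this slide can be held in a nested data structure. The sketch below is one plausible reading of the slide's node labels (S, NP, VP, V, PP, DT, P, N), encoded as nested tuples; both the encoding and the helper names `leaves` and `bracketed` are assumptions made for illustration.

```python
# Each node is (label, child, child, ...); a leaf is a plain string.
TREE = (
    "S",
    ("NP", ("DT", "the"), ("NP", ("N", "dog"), ("N", "Bill"))),
    ("VP", ("V", "went"),
           ("PP", ("P", "near"), ("NP", "the", "cat", "Jack"))),
)

def leaves(node):
    """Return the words of the tree in left-to-right order."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words

def bracketed(node):
    """Render the tree in bracketed notation."""
    if isinstance(node, str):
        return node
    return "(" + " ".join([node[0]] + [bracketed(c) for c in node[1:]]) + ")"

print(" ".join(leaves(TREE)))
print(bracketed(TREE))
```
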
39
Higher Order Structures
  • Bracketing:
  • [S [NP] [VP [V] [PP [P] [NP]]]]
    [S [NP] [VP [V] [NP]]]
  • Co-referencing:
  • The dog Bill went near the cat Jack. It bit it
  • 1   2   3    4    5    6   7   8     9  10  11
  • References: 2<-9, 7<-11, 2<-3, 7<-8
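
The reference links on this slide (9 refers to 2, 11 to 7, 3 to 2, 8 to 7) can be grouped into co-reference clusters. The sketch below uses a small union-find structure; that is one convenient choice for merging links, not something the slides prescribe, and the function name `coref_clusters` is hypothetical.

```python
# Words numbered from 1, as on the slide (punctuation is not counted).
TOKENS = "The dog Bill went near the cat Jack It bit it".split()
# (antecedent, mention) pairs taken from the slide's reference list.
LINKS = [(2, 9), (7, 11), (2, 3), (7, 8)]

def coref_clusters(links):
    """Merge pairwise links into sorted clusters of word positions."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for antecedent, mention in links:
        root_a, root_m = find(antecedent), find(mention)
        if root_a != root_m:
            parent[root_m] = root_a

    groups = {}
    for position in list(parent):
        groups.setdefault(find(position), []).append(position)
    return sorted(sorted(group) for group in groups.values())

for cluster in coref_clusters(LINKS):
    print(cluster, [TOKENS[i - 1] for i in cluster])
```
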

40
Sequence labeling task is a classification task
Task: Classification
  • POS: word -> POS cat {NN, VBD, ...}
  • NER: word -> name cat {person, place, ...}
  • Sense: word -> sense-id {001 ... N}
  • Chunking: word -> {B, I, E}
  • Bracketing: sentence -> {has_tree, no_tree}
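
To make "sequence labeling is classification" concrete, the sketch below labels a sentence by calling a classifier once per word. The capitalization rule and the label `other` are hypothetical stand-ins; any trained classifier with the same call signature could replace the rule.

```python
def label_sequence(tokens, classify):
    """Label every token by calling classify(tokens, index) once per position."""
    return [classify(tokens, i) for i in range(len(tokens))]

def toy_name_classifier(tokens, i):
    # Hypothetical rule: a capitalized word that does not start a sentence
    # is treated as a person name; everything else is "other".
    word = tokens[i]
    starts_sentence = i == 0 or tokens[i - 1] in {".", "!", "?"}
    if word[0].isupper() and not starts_sentence:
        return "person"
    return "other"

tokens = "The dog Bill went near the cat Jack . It bit it".split()
labels = label_sequence(tokens, toy_name_classifier)
print([word for word, label in zip(tokens, labels) if label == "person"])
```

The same `label_sequence` loop serves POS tagging, sense tagging, or chunking; only the classifier and its label set change.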

41
Learning Algorithm
  • Knowledge Based
  • Rules
  • Decision Trees
  • Decision Lists
  • Statistical
  • Graphical models: HMM
  • Neural Networks
  • Support Vector Machines (SVM)
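
As an illustration of the HMM entry above, here is a minimal Viterbi decoder for POS tagging. All probabilities are hypothetical numbers chosen for the example, not estimates from any corpus, and the input is assumed to be non-empty. Production code would work with log probabilities to avoid underflow.

```python
STATES = ["DT", "NN", "VBD"]
# Hypothetical model parameters, invented for illustration.
START = {"DT": 0.8, "NN": 0.1, "VBD": 0.1}
TRANS = {
    "DT": {"DT": 0.05, "NN": 0.9, "VBD": 0.05},
    "NN": {"DT": 0.1, "NN": 0.3, "VBD": 0.6},
    "VBD": {"DT": 0.6, "NN": 0.3, "VBD": 0.1},
}
EMIT = {
    "DT": {"the": 1.0},
    "NN": {"dog": 0.5, "cat": 0.4, "bit": 0.1},
    "VBD": {"bit": 1.0},
}

def viterbi(words, states, start, trans, emit):
    """Return the most probable state sequence for a non-empty word list."""
    # best[t][s] = (probability of the best path ending in s at t, previous state)
    best = [{s: (start[s] * emit[s].get(words[0], 0.0), None) for s in states}]
    for word in words[1:]:
        column = {}
        for s in states:
            prob, prev = max((best[-1][p][0] * trans[p][s], p) for p in states)
            column[s] = (prob * emit[s].get(word, 0.0), prev)
        best.append(column)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))

print(viterbi("the dog bit".split(), STATES, START, TRANS, EMIT))
```

In this toy model "bit" may be a noun or a verb; the transition probability from NN to VBD is what tips the decision toward the verb reading.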

42
Applications
  • Machine Translation: different strategies
  • Systran: www.Systransoft.com
  • Google: Translate.google.com
  • Question Answering
  • MIT QA system (START): http://start.csail.mit.edu/
  • Summarization
  • Information Extraction
  • Spell Checking
  • Microsoft Spell Checker
  • Call centre
  • MT for SMS

43
NLP Laboratory
  • The first aim is to establish a virtual center
    for NLP-related research
  • Defining practical applications, especially for
    Persian:
  • POS tagger, spell checker, n-gram model, machine
    translation, NER, document classification,
    search engine, summarization
  • Defining several research projects
  • Sharing different resources and experiences
  • Laying the foundation of an NLP suite
  • Like TINA (MIT NLP suite)
  • Contact me with any request in the NLP domain
    (hfaili_at_ece.ut.ac.ir)

44
?