1
Introduction to Natural Language Processing (NLP)
  • Dekang Lin
  • Department of Computing Science
  • University of Alberta
  • lindek_at_cs.ualberta.ca

2
Outline
  • What is NLP?
  • Applications
  • Challenges
  • Linguistics Issues
  • Course Overview

3
Textbook
  • Daniel Jurafsky and James H. Martin, Speech and
    Language Processing, Prentice-Hall, 2000.
  • Note: errata are available on the website; please check them before
    reading each chapter.

4
What is Natural Language Processing?
  • Natural Language Processing: processing the information contained in
    natural language text.
  • Also known as Computational Linguistics (CL),
    Human Language Technology (HLT), and Natural Language
    Engineering (NLE).
  • Can machines understand human language?
  • Define "understand".
  • Understanding is the ultimate goal. However, one
    doesn't need to fully understand to be useful.

5
Why Study NLP?
  • A hallmark of human intelligence.
  • Text is the largest repository of human knowledge
    and is growing quickly.
  • emails, news articles, web pages, IM, scientific
    articles, insurance claims, customer complaint
    letters, transcripts of phone calls, technical
    documents, government documents, patent
    portfolios, court decisions, contracts, etc.
  • Are we reading any faster than before?

6
NLP Applications
  • Question answering
  • Who is the first Taiwanese president?
  • Text Categorization/Routing
  • e.g., customer e-mails.
  • Text Mining
  • Find everything that interacts with BRCA1.
  • Machine (Assisted) Translation
  • Language Teaching/Learning
  • Usage checking
  • Spelling correction
  • Is that just dictionary lookup?

7
(No Transcript)
8
Challenges in NLP: Ambiguity
  • Words or phrases can often be understood in
    multiple ways.
  • Teacher Strikes Idle Kids
  • Killer Sentenced to Die for Second Time in 10
    Years
  • They denied the petition for his release that was
    signed by over 10,000 people.
  • child abuse expert/child computer expert
  • Who does Mary love? (three-way ambiguous)

9
Probabilistic/Statistical Resolution of
Ambiguities
  • When there are ambiguities, choose the
    interpretation with the highest probability.
  • Example: how many times do people say
  • "Mary loves ..."
  • "the Mary love ..."
  • Which interpretation has the highest probability?
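A minimal sketch of the counting idea on this slide, assuming that the relative frequency of each surface pattern in a corpus stands in for the probability of the matching interpretation. The function names, the labels `subject_verb` and `noun_phrase`, and the four-sentence corpus are hypothetical, made up for illustration.

```python
import re


def count_pattern(corpus, pattern):
    """Count whole-word, case-insensitive occurrences of `pattern` in all sentences."""
    regex = re.compile(r"\b" + re.escape(pattern) + r"\b", re.IGNORECASE)
    return sum(len(regex.findall(sentence)) for sentence in corpus)


def pick_interpretation(corpus, evidence):
    """Estimate each interpretation's probability as the relative frequency of its
    evidence pattern; return the most probable label and all the estimates."""
    counts = {label: count_pattern(corpus, p) for label, p in evidence.items()}
    total = sum(counts.values())
    probs = {label: (n / total if total else 0.0) for label, n in counts.items()}
    return max(probs, key=probs.get), probs


if __name__ == "__main__":
    # Toy corpus (hypothetical); real estimates would need a large corpus.
    corpus = [
        "Mary loves John.",
        "Everyone knows Mary loves jazz.",
        "Mary loves long walks.",
        "They call it the Mary love story.",
    ]
    evidence = {
        "subject_verb": "Mary loves",    # "Mary" is the subject of "love"
        "noun_phrase": "the Mary love",  # "Mary love" is a noun phrase
    }
    best, probs = pick_interpretation(corpus, evidence)
    print(best, probs)  # subject_verb {'subject_verb': 0.75, 'noun_phrase': 0.25}
```

Counting a surface pattern is only a crude proxy for the probability of a reading, but it makes the "choose the highest" step concrete.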

10
Challenges in NLP: Variations
  • Syntactic Variations
  • I was surprised that Kim lost
  • It surprised me that Kim lost
  • That Kim lost surprised me.
  • The same meaning can be expressed in different
    ways
  • Who wrote The Language Instinct?
  • Steven Pinker, an MIT professor and author of The
    Language Instinct, ...

11
Subareas of Linguistics
  • Morphology
  • structures and patterns in words
  • analyzes how words are formed from minimal units
    of meaning, or morphemes, e.g., dogs/dog-s.
  • Syntax
  • structures and patterns in phrases
  • how phrases are formed by smaller phrases and
    words

12
Subareas of Linguistics
  • Semantics: the meaning of a word or phrase within
    a sentence
  • How to represent meaning?
  • Semantic network? Logic? Policy?
  • How to construct a meaning representation?
  • Is meaning compositional?
  • Pragmatics: structures and patterns in discourse
  • Co-reference resolution
  • Jane races Mary on weekends. She often beats her.
  • Implicatures
  • How many times do you go skating each week?
  • Speech acts
  • Do you know the time?

13
Morphology
  • Morphology is concerned with the internal make-up
    of words
  • Input: The fearsome cats attacked the foolish
    dog
  • Output: The fear-some cat-s attack-ed the
    fool-ish dog
  • Inflectional morphology
  • Does not change the grammatical category of
    words: cats/cat-s, attacked/attack-ed
  • Derivational morphology
  • May involve changes to grammatical categories:
    fearsome/fear-some, foolish/fool-ish
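A minimal sketch of the segmentation on this slide, assuming a hand-made stem lexicon and suffix table (both hypothetical and covering only this sentence). A suffix is split off only when the remainder is a known stem, and each suffix carries the inflectional/derivational label described above.

```python
# Hypothetical suffix inventory: suffix -> kind of morphology.
SUFFIXES = {
    "s": "inflectional",     # plural noun / 3rd person singular verb
    "ed": "inflectional",    # past tense
    "some": "derivational",  # noun -> adjective (fear -> fearsome)
    "ish": "derivational",   # noun -> adjective (fool -> foolish)
}
# Hypothetical stem lexicon.
STEMS = {"the", "fear", "cat", "attack", "fool", "dog"}


def segment(word):
    """Return (stem, suffix, kind); suffix and kind are None if no known suffix applies."""
    lower = word.lower()
    if lower in STEMS:
        return word, None, None
    # Try longer suffixes first.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        stem = lower[: -len(suffix)]
        if lower.endswith(suffix) and stem in STEMS:
            return word[: len(stem)], suffix, SUFFIXES[suffix]
    return word, None, None


def segment_sentence(sentence):
    """Mark morpheme boundaries with hyphens, word by word."""
    parts = []
    for word in sentence.split():
        stem, suffix, _ = segment(word)
        parts.append(f"{stem}-{suffix}" if suffix else stem)
    return " ".join(parts)


if __name__ == "__main__":
    print(segment_sentence("The fearsome cats attacked the foolish dog"))
    # The fear-some cat-s attack-ed the fool-ish dog
```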

14
Morphology Is Not as Easy as It May Seem to Be
  • Examples from Woods et al. (2000)
  • delegate (de leg ate) take the legs from
  • caress (car ess) female car
  • cashier (cashy er) more wealthy
  • lacerate (lace rate) speed of tatting
  • ratify (rat ify) infest with rodents
  • infantry (infant ry) childish behavior
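A toy illustration of how these joke analyses arise: a segmenter that only checks whether a word splits into a known stem plus a known suffix accepts every one of them. The mini-lexicon and suffix list are hypothetical, and the single spelling rule (a stem-final "y" written "i" before a suffix, as in cashy → cashi-er) is an assumption added so that "cashier" is covered.

```python
# Hypothetical lexicon and suffix list, chosen to reproduce the examples above.
STEMS = {"car", "rat", "infant", "cashy"}
SUFFIXES = {"ess", "ify", "ry", "er"}


def candidate_splits(word):
    """Yield every (stem, suffix) pair with the stem in STEMS and the rest in SUFFIXES.
    Applies one spelling rule: a stem-final 'y' is written 'i' before a suffix."""
    for i in range(1, len(word)):
        head, tail = word[:i], word[i:]
        if tail not in SUFFIXES:
            continue
        if head in STEMS:
            yield head, tail
        elif head.endswith("i") and head[:-1] + "y" in STEMS:
            yield head[:-1] + "y", tail


if __name__ == "__main__":
    for w in ["caress", "ratify", "infantry", "cashier"]:
        print(w, list(candidate_splits(w)))
```

Every split is "valid" as string matching; ruling them out needs knowledge of which stems and affixes actually combine.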

15
A Turkish Example (Oflazer and Guzey, 1994)
  • uygarlastiramayabileceklerimizdenmissinizcesine
  • uygar/civilized las/BECOME tir/CAUS ama/NEG
    yabil/POT ecek/FUT ler/3PL imiz/POSS-1PL den/ABL
    mis/NARR siniz/2PL cesine/AS-IF
  • an adverb meaning roughly (behaving) as if you
    were one of those whom we might not be able to
    civilize.
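A small consistency check on this example, assuming the slide's ASCII spelling without Turkish diacritics: the twelve morphemes should concatenate back to the surface word exactly, and each gloss item splits at its "/" into morpheme and tag. The helper names are hypothetical. Note that the check only passes if the first morpheme is "uygar", the letters the surface word actually begins with.

```python
# Surface word and morphemes from the slide, in ASCII spelling.
WORD = "uygarlastiramayabileceklerimizdenmissinizcesine"
MORPHEMES = ["uygar", "las", "tir", "ama", "yabil", "ecek",
             "ler", "imiz", "den", "mis", "siniz", "cesine"]


def parse_gloss(gloss):
    """Split a gloss such as 'uygar/civilized las/BECOME' into (morpheme, tag) pairs."""
    pairs = []
    for item in gloss.split():
        morpheme, tag = item.split("/", 1)
        pairs.append((morpheme, tag))
    return pairs


def check_segmentation(word, morphemes):
    """True if the morphemes, concatenated in order, spell exactly `word`."""
    return "".join(morphemes) == word


if __name__ == "__main__":
    print(check_segmentation(WORD, MORPHEMES))  # True
    print(parse_gloss("uygar/civilized las/BECOME tir/CAUS"))
```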

16
Why not just Use a Dictionary?
  • How many words are there in a language?
  • English: OED, 400K entries
  • Turkish: 600×10^6 forms
  • Finnish: 10^7 forms
  • New words are being invented all the time
  • e-mail
  • IM

17
Syntax is about Sentence Structures
  • Sentences have structures and are made up of
    constituents.
  • The constituents are phrases.
  • A phrase consists of a head and modifiers.
  • The category of the head determines the category
    of the phrase
  • e.g., a phrase headed by a noun is a noun phrase
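A minimal sketch of head projection, using hypothetical `Word` and `Phrase` classes and a small table from a head's part of speech to a phrase label. The phrase does not store its own category; it derives it from its head, so a phrase headed by a noun comes out as NP.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical mapping from a head's part of speech to the phrase it projects.
PHRASE_OF = {"N": "NP", "V": "VP", "A": "AP", "P": "PP"}


@dataclass
class Word:
    text: str
    pos: str  # part of speech, e.g. "N", "V", "A"


@dataclass
class Phrase:
    head: Word
    modifiers: List[Union[Word, "Phrase"]] = field(default_factory=list)

    @property
    def category(self) -> str:
        """The phrase's category is determined by the category of its head."""
        return PHRASE_OF[self.head.pos]


if __name__ == "__main__":
    kids = Phrase(head=Word("kids", "N"), modifiers=[Word("idle", "A")])
    vp = Phrase(head=Word("strikes", "V"), modifiers=[kids])
    print(kids.category, vp.category)  # NP VP
```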

18
Parsing
  • Analyze the structure of a sentence

19
Two parses of "Teacher strikes idle kids":
  • [S [NP [N Teacher]] [VP [V strikes] [NP [A idle] [N kids]]]]
  • [S [NP [N Teacher] [N strikes]] [VP [V idle] [NP [N kids]]]]
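A minimal sketch of how a parser can find both readings of "Teacher strikes idle kids", assuming a hypothetical five-rule toy grammar and four-word lexicon (not a real grammar of English). It enumerates every way to cover the sentence with an S, top-down, memoizing results per (category, start, end) span; words are lowercased.

```python
from functools import lru_cache

# Hypothetical toy lexicon: the parts of speech each word can take.
LEXICON = {
    "teacher": {"N"},
    "strikes": {"N", "V"},
    "idle": {"A", "V"},
    "kids": {"N"},
}

# Hypothetical toy grammar: category -> list of right-hand sides.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("N",), ("N", "N"), ("A", "N")],
    "VP": [("V", "NP")],
}


def parse(words):
    """Return every bracketed tree of category S that covers all of `words`."""
    words = tuple(w.lower() for w in words)

    @lru_cache(maxsize=None)
    def trees(cat, i, j):
        # All trees of category `cat` spanning words[i:j].
        found = []
        if j - i == 1 and cat in LEXICON.get(words[i], set()):
            found.append(f"[{cat} {words[i]}]")
        for rhs in GRAMMAR.get(cat, []):
            for children in sequences(rhs, i, j):
                found.append(f"[{cat} {' '.join(children)}]")
        return tuple(found)

    @lru_cache(maxsize=None)
    def sequences(cats, i, j):
        # All ways to cover words[i:j] with the categories in `cats`, in order,
        # each category taking at least one word.
        if not cats:
            return ((),) if i == j else ()
        found = []
        for k in range(i + 1, j - len(cats) + 2):
            for first in trees(cats[0], i, k):
                for rest in sequences(cats[1:], k, j):
                    found.append((first,) + rest)
        return tuple(found)

    return list(trees("S", 0, len(words)))


if __name__ == "__main__":
    for tree in parse("Teacher strikes idle kids".split()):
        print(tree)
    # [S [NP [N teacher]] [VP [V strikes] [NP [A idle] [N kids]]]]
    # [S [NP [N teacher] [N strikes]] [VP [V idle] [NP [N kids]]]]
```

Memoizing on spans is the same idea chart parsers such as CKY rely on; this toy version still lists whole trees, whose number can grow exponentially on highly ambiguous input.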
20
Syntax
  • Syntax is the study of the regularities and
    constraints of word order and phrase structure
  • How words are organized into phrases
  • How phrases are combined into larger phrases
    (including sentences).

21
Course Overview: Background Theories
  • Linguistics
  • Syntax
  • Binding theory
  • Probability and Information Theory
  • Markov model
  • Bayesian network
  • EM (expectation maximization)

22
Course Overview: Enabling Technologies
  • Stemming
  • Reduce detects, detected, detecting, and detect to
    the same form.
  • POS Tagging
  • Determine for each word whether it is a noun,
    adjective, verb, etc.
  • Parsing
  • sentence → parse tree
  • Word Sense Disambiguation
  • orange juice vs. orange coat
  • Learning from text
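A minimal suffix-stripping stemmer for the stemming example, assuming a hypothetical three-suffix rule list and a minimum stem length of three characters. It is not Porter's algorithm or any library's stemmer, only an illustration of reducing the four "detect" forms to one.

```python
# Hypothetical suffix list, checked longest first. Real stemmers such as
# Porter's algorithm use many more rules and conditions.
SUFFIXES = ("ing", "ed", "s")


def stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)]
    return w


if __name__ == "__main__":
    for w in ["detects", "detected", "detecting", "detect"]:
        print(w, "->", stem(w))  # each line ends with "detect"
```

The rules are deliberately crude: the same code turns "caress" into "cares". Porter's algorithm, for instance, has an explicit rule that leaves a final "ss" alone.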

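A common baseline for POS tagging gives each word the tag it carries most often in tagged training data. The sketch below assumes a tiny hand-made training set and a simplified tag set (DET, N, V, A, ADV), both hypothetical. Because it ignores context, it always gives "strikes" the same tag and so can only ever produce one reading of "Teacher strikes idle kids".

```python
from collections import Counter, defaultdict


def train_most_frequent_tag(tagged_sentences):
    """Learn, for each word, the tag it most often carries in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag(words, model, default="N"):
    """Tag each word with its most frequent training tag; unseen words get `default`."""
    return [(w, model.get(w.lower(), default)) for w in words]


if __name__ == "__main__":
    # Hypothetical training data, made up for illustration.
    train = [
        [("the", "DET"), ("teacher", "N"), ("strikes", "V"), ("the", "DET"), ("ball", "N")],
        [("the", "DET"), ("strikes", "N"), ("continue", "V")],
        [("the", "DET"), ("union", "N"), ("strikes", "V"), ("again", "ADV")],
        [("idle", "A"), ("kids", "N"), ("play", "V")],
    ]
    model = train_most_frequent_tag(train)
    print(tag("Teacher strikes idle kids".split(), model))
```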
23
Course Overview: Applications
  • Question Answering
  • Machine Translation
  • Text Mining/Information Extraction