Fall 2004

Provided by: rad75

1

EECS 595 / LING 541 / SI 661
Natural Language Processing
  • Fall 2004
  • Lecture Notes 1

2
Introduction
3
Course logistics
  • Instructor: Prof. Dragomir Radev
    (radev_at_umich.edu), Ph.D., Computer Science,
    Columbia University; formerly at IBM TJ Watson
    Research Center
  • Times: Tuesdays 1:10-3:55 PM, in 412 West Hall
  • Office hours: TBA, 3080 West Hall Connector

Course home page:
http://www.si.umich.edu/radev/NLP-fall2004
4
Example (from a famous movie)
Dave Bowman: Open the pod bay doors, HAL.
HAL: I'm sorry, Dave. I'm afraid I can't do that.
5
Example
I saw her fall
  • How many different interpretations does the above
    sentence have?

6
What is Natural Language Processing?
  • Natural Language Processing (NLP) is the study of
    the computational treatment of natural language.
  • NLP draws on research in Linguistics, Theoretical
    Computer Science, Mathematics and Statistics,
    Artificial Intelligence, Psychology, etc.

7
Linguistics
  • Knowledge about language
  • Phonetics and phonology - the study of sounds
  • Morphology - the study of word components
  • Syntax - the study of sentence and phrase
    structure
  • Lexical semantics - the study of the meanings of
    words
  • Compositional semantics - how to combine words
  • Pragmatics - how to accomplish goals
  • Discourse conventions - how to deal with units
    larger than utterances

8
Theoretical Computer Science
  • Automata
  • Deterministic and non-deterministic finite-state
    automata
  • Push-down automata
  • Grammars
  • Regular grammars
  • Context-free grammars
  • Context-sensitive grammars
  • Complexity
  • Algorithms
  • Dynamic programming

9
Mathematics and Statistics
  • Probabilities
  • Statistical models
  • Hypothesis testing
  • Linear algebra
  • Optimization
  • Numerical methods

10
Artificial Intelligence
  • Logic
  • First-order logic
  • Predicate calculus
  • Agents
  • Speech acts
  • Planning
  • Constraint satisfaction
  • Machine learning

11
Ambiguity
I saw her fall.
  • The categories of knowledge of language can be
    thought of as ambiguity-resolving components
  • How many different interpretations does the above
    sentence have?
  • How can each ambiguous piece be resolved?
  • Does speech input make the sentence even more
    ambiguous?

Time flies like an arrow.
12
http://edition.cnn.com/2004/WEATHER/09/03/hurricane.frances/index.html
Frances churns toward Florida
Hurricane center: Storm 'relentlessly lashing Bahamas'
Friday, September 3, 2004 Posted: 20:24 GMT (04:24 HKT)
MIAMI, Florida (CNN) -- Hurricane Frances moved slowly toward
Florida on Friday, and the National Hurricane
Center said it could gain intensity before making
landfall, possibly late Saturday. At 2 p.m. ET,
the Category 3 storm was centered near the
southern tip of Great Abaco in the Bahamas, 200
miles (321 kilometers) east-southeast of
Florida's lower east coast, according to the
National Hurricane Center. The storm was moving
toward the west-northwest at about 9 mph (15
kph). Its maximum sustained winds had dropped to
115 mph (185 kph), but forecasters said it still
is "a dangerous hurricane." Hurricanes are
classified as categories 1 to 5 on the
Saffir-Simpson hurricane scale. A Category 3
storm has sustained winds between 111 and 130 mph
(178 and 209 kph). The advisory said Frances was
likely to make landfall in Florida in about 36
hours. Hurricane-force winds extend 85 miles (140
kilometers) from the center of the storm, and
winds of tropical storm strength (39-73 mph)
extend outward up to 185 miles (295
kilometers). Because Frances is the size of Texas
-- more than twice as large as Hurricane Charley
three weeks ago -- its major winds and heavy rain
are expected to batter a large part of Florida
well before landfall. By Friday afternoon, parts
of Florida were experiencing wind gusts as high
as 39 mph -- the lower end of tropical-storm
intensity. Hurricane warnings are in effect for
much of Florida's eastern coastline. A hurricane
warning means hurricane conditions are expected
in the warning area within 24 hours. Storm surge
flooding of six to 14 feet above normal has been
reported in the storm's path, and the hurricane
center warned "rainfall amounts of seven to 12
inches -- locally as high as 20 inches -- are
possible in association with Frances." The
hurricane center bulletin said Frances was
"relentlessly lashing the central and western
Bahamas." A hurricane center official told CNN
the storm could spend two days moving across the
Florida Peninsula. Frances has weakened slightly
in the past few days, but the hurricane center
advisory warned that as it moves across the warm
waters of the Gulf Stream, "this could easily
lead to re-intensification." However, current
forecasts predict "a 100-knot hurricane at
landfall" -- meaning wind speeds of about 115
mph. Because steering currents are expected to
weaken further, Frances "will likely slow down on
its way to Florida. This could delay the landfall
a few more hours," the advisory said. "Numerical
guidance continues to bring the hurricane over
Florida during the next two to three
days." Florida Gov. Jeb Bush said Friday that the
state was taking all necessary steps to prepare
for the storm.
13
Florida Gov. Jeb Bush said Friday that the state
was taking all necessary steps to prepare for the
storm. "We are staging across -- some outside the
state and some inside the state -- a massive
response for this storm, and we're going to need
it," Bush said in a news conference. "There's
going to be a lot of work necessary to make sure
that the response is massive and immediate to
help people once this storm comes." He said he
has asked the governors of 17 states to waive
size and weight restrictions on trucks carrying
relief supplies. His brother, President Bush,
also offered support at a campaign rally Friday
morning in Pennsylvania. "Before I begin, I do
know you'll join me in offering our prayers and
best wishes to those in the path of Hurricane
Frances," the president said.
A hurricane the size of Texas
Florida ordered mandatory
evacuations in parts of 16 counties and voluntary
evacuations in five other counties. "If you are
on a barrier island or a low-lying area, and you
haven't left, now is the time to do so," Governor
Bush said. Florida officials said the evacuation
order covers 2.5 million people. Most of them
"are staying in their own community, which is
exactly what they should be doing," said Bush,
noting that low-lying areas were most at risk.
"They've made plans to be with a loved one or a
friend and they're not on the roads." People
looking to flee the region clogged highways
Thursday, but officials said Friday that traffic
had died down. "Overall we're very, very pleased
with evacuation procedures yesterday and
continuing through today," said Col. Chris
Knight, director of the Florida Highway Patrol.
"We have no problems this morning." The Red Cross
opened 82 shelters in Florida on Thursday and
about 21,000 people were in them by nightfall,
spokeswoman Carol Miller told CNN. The group also
set up eight reception centers along the highway
to help people who needed information,
directions, water and maps, she said. Miller said
the Red Cross was launching its largest-ever
response effort to a domestic natural
disaster. Airlines have canceled flights in and
out of some of the major airports in Florida and
the Caribbean, and are expected to adjust
schedules as weather patterns change throughout
the weekend.
Military preparations
Military officials are preparing to evacuate three commands as
Frances approaches. At MacDill Air Force Base in
Tampa, on Florida's Gulf Coast, a military team
is preparing to set up alternative headquarters
facilities for the U.S. Central Command and
Special Operations Command at the stadium used by
the Tampa Bay Buccaneers football team. Central
Command is responsible for running the wars in
Afghanistan and Iraq, while Special Operations
Command oversees 50,000 special operations
forces. Patrick Air Force Base, on the eastern
coast of Florida near Melbourne, was evacuated
Thursday, and the commander of a fighter wing
near Miami ordered aircraft moved out of the
hurricane's path. The naval air station at
Jacksonville also moved aircraft out of the
area. In Miami, the headquarters of the Southern
Command has closed. Command-and-control
operations are being performed, but they could be
moved to Davis-Monthan Air Force Base in Arizona.
14
The alphabet soup (NLP vs. CL vs. SP vs. HLT vs.
NLE)
  • NLP (Natural Language Processing)
  • CL (Computational Linguistics)
  • SP (Speech Processing)
  • HLT (Human Language Technology)
  • NLE (Natural Language Engineering)
  • Other areas of research: Speech and Text
    Generation, Speech and Text Understanding,
    Information Extraction, Information Retrieval,
    Dialogue Processing, Inference
  • Related areas: Spelling Correction, Grammar
    Correction, Text Summarization

15
Sample applications
  • Speech Understanding
  • Question Answering
  • Machine Translation
  • Text-to-speech Generation
  • Text Summarization
  • Dialogue Systems

16
Some demos
  • AT&T Labs Text-To-Speech
    (http://www.research.att.com/projects/tts/demo.html)
  • Babelfish (babelfish.altavista.com)
  • OneAcross (www.oneacross.com)
  • AskJeeves (www.ask.com)
  • IONaut (http://www.ionaut.com:8400)
  • NSIR (http://tangra.si.umich.edu/clair/NSIR/html/nsir.cgi)
  • AnswerBus (www.answerbus.com)
  • NewsInEssence (www.newsinessence.com)

21
The Turing Test
  • Alan Turing: the Turing test (language as a test
    for intelligence)
  • Three participants: a computer and two humans
    (one is an interrogator)
  • Interrogator's goal: to tell the machine and
    human apart
  • Machine's goal: to fool the interrogator into
    believing that a person is responding
  • Other human's goal: to help the interrogator
    reach his goal

Q: Please write me a sonnet on the topic of the
Forth Bridge.
A: Count me out on this one. I never could write
poetry.
Q: Add 34957 to 70764.
A: 105621 (after a pause)
22
Some brief history
  • Foundational insights (40s and 50s): automaton
    (Turing), probabilities, information theory
    (Shannon), formal languages (Backus and Naur),
    noisy channel and decoding (Shannon), first
    systems (Davis et al., Bell Labs)
  • Two camps (57-70): symbolic and stochastic.
    Transformation grammar (Harris, Chomsky),
    artificial intelligence (Minsky, McCarthy,
    Shannon, Rochester), automated theorem proving
    and problem solving (Newell and Simon);
    Bayesian reasoning (Mosteller and Wallace);
    corpus work (Kucera and Francis)

23
Some brief history
  • Four paradigms (70-83): stochastic (IBM),
    logic-based (Colmerauer, Pereira and Warren, Kay,
    Bresnan), NLU (Winograd, Schank, Fillmore),
    discourse modelling (Grosz and Sidner)
  • Empiricism and finite-state models redux (83-93):
    Kaplan and Kay (phonology and morphology), Church
    (syntax)
  • Late years (94-03): strong integration of
    different techniques, different areas (including
    speech and IR), probabilistic models, machine
    learning

24
The state of the art and the near-term future
  • World-Wide Web (WWW)
  • Sample scenarios
  • generate weather reports in two languages
  • teaching deaf people to speak
  • translate Web pages into different languages
  • speak to your appliances
  • find restaurants
  • answer questions
  • grade essays (?)
  • closed-captioning in many languages
  • automatic description of a soccer game

25
Structure of the course
  • Three major parts
  • Linguistic, mathematical, and computational
    background
  • Computational models of morphology, syntax,
    semantics, discourse, pragmatics
  • Applications: text generation, machine
    translation, information extraction, etc.
  • Three major goals
  • Learn the basic principles and theoretical issues
    underlying natural language processing
  • Learn techniques and tools used to develop
    practical, robust systems that can communicate
    with users in one or more languages
  • Gain insight into many open research problems in
    natural language

26
Readings
  • Speech and Language Processing (Daniel Jurafsky
    and James Martin), Prentice-Hall, 2000, ISBN
    0-13-095069-6
  • Handouts given in class
  • 1-2 chapters per week

Optional readings: Natural Language
Understanding by Allen; Foundations of
Statistical Natural Language Processing by
Manning and Schütze.
27
Grading
  • Four homework assignments (40%)
  • Midterm (15%)
  • Final project (20%)
  • Final exam (25%)
  • Additional requirements for SI 761

28
Assignments
  • (subject to change)
  • Finite-state modeling, part of speech tagging,
    and information extraction
  • Fsmtools/lextools/JMX (Bell Labs, Penn)
  • Tagging and parsing
  • Brill tagger/Charniak parser (JHU, Brown)
  • Machine translation
  • GIZA/Rewrite decoder (Aachen, JHU, ISI)
  • Text generation
  • FUF/Surge (Columbia)

29
Syllabus
Wk  Date     Topic                                           HW  HW due
1   9/7      Introduction (JM1)
             Linguistic Fundamentals
2   9/14     Regular Expressions and Automata (JM2)          1
3   9/21     Morphology and Finite-State Transducers (JM3)
             Word Classes and Part of Speech Tagging (JM8)
4   9/28     Context-Free Grammars for English (JM9)         2   1
             Parsing with Context-Free Grammars (JM10)
5   10/5     Features and Unification (JM11)
             Lexicalized and Probabilistic Parsing (JM12)
6   10/12    Natural Language Generation (JM20)              3   2
             Machine Translation (JM21, handout)
    10/19    NO CLASS
30
Syllabus
Wk  Date     Topic                                           HW  HW due
7   10/26    Midterm
8   11/2     Natural Language Generation (JM20) (cont'd)     4   3
             The Functional Unification Formalism (Handout)
9   11/9     Language and Complexity (JM13)
10  11/16    Representing Meaning (JM14)                         4
11  11/23    Semantic Analysis (JM15)
             Discourse (JM18)
12  11/30    Rhetorical Analysis (Handout)                       Project due
             Dialogue and Conversational Agents (JM19)
13  12/7-14  Project Presentations
31
Other meetings
  • CLAIR meeting
  • (TBA)
  • Artificial Intelligence Seminar
  • (Tuesdays 4-5:30)
  • STIET
  • (Thursdays 4-5:30)

32
Projects
Each student will be responsible for designing
and completing a research project that
demonstrates the ability to use concepts from the
class in addressing a practical problem. A
significant part of the final grade will depend
on the project assignment. Students can elect to
do a project on an assigned topic, or to select a
topic of their own. The final version of the
project will be put on the World Wide Web, and
will be defended in front of the class at the end
of the semester (procedure TBA). In some cases
(and only with the instructor's approval), students
may be allowed to work in pairs when the
project's scope is significant.
33
Sample projects
  • Noun phrase parser
  • Paraphrase identification
  • Question answering
  • NL access to databases
  • Named entity tagging
  • Rhetorical parsing
  • Anaphora resolution, entity crossreference
  • Document and sentence alignment
  • Using bioinformatics methods
  • Encyclopedia
  • Information extraction
  • Speech processing
  • Sentence normalization
  • Text summarization
  • Sentence compression
  • Definition extraction
  • Crossword puzzle generation
  • Prepositional phrase attachment
  • Machine translation
  • Generation
  • Semi-structured document parsing
  • Semantic analysis of short queries
  • User-friendly summarization
  • Number classification
  • Domain-specific PP attachment
  • Time-dependent fact extraction

34
Main research forums and other pointers
  • Conferences: ACL/NAACL, SIGIR, AAAI/IJCAI, ANLP,
    Coling, HLT, EACL/NAACL, AMTA/MT Summit,
    ICSLP/Eurospeech
  • Journals: Computational Linguistics, Natural
    Language Engineering, Information Retrieval,
    Information Processing and Management, ACM
    Transactions on Information Systems, ACM TALIP,
    ACM TSLP
  • University centers: Columbia, CMU, JHU, Brown,
    UMass, MIT, UPenn, USC/ISI, NMSU, Michigan,
    Maryland, Edinburgh, Cambridge, Saarland,
    Sheffield, and many others
  • Industrial research sites: IBM, SRI, BBN, MITRE,
    MSR, (AT&T, Bell Labs, PARC)
  • Startups: Language Weaver, Ask.com, LCC
  • The Anthology: http://www.aclweb.org/anthology

36
What this course is NOT
  • EECS 597 / LING 792 / SI 661: Language and
    Information, last taught in Fall of 2002,
    essentially an introduction to corpus-based and
    statistical NLP.
  • Topics covered: introduction to computational
    linguistics, information theory, data compression
    and coding, N-gram models, clustering,
    lexicography, collocations, text summarization,
    information extraction, question answering, word
    sense disambiguation, analysis of style, and
    other topics.
  • SI 760: Information Retrieval, last taught
    Winter 2003.
  • Topics covered: information need, IR models,
    documents, queries, query languages, relevance,
    retrieval evaluation, reference collections,
    query expansion and relevance feedback, indexing
    and searching, XML retrieval, language modeling
    approaches, crawling the Web, hyperlink analysis,
    measuring the Web, similarity and clustering,
    social network analysis for IR, hubs and
    authorities, PageRank and HITS, focused crawling,
    relevance transfer, question answering
  • An undergraduate Linguistics course such as Ling
    212: Intro to the Symbolic Analysis of Language
    or Ling 320: Programming for Linguistics and
    Language Studies

37
Linguistic Fundamentals
38
Syntactic categories
  • Substitution test



Nathalie likes {black | Persian | tabby | small} cats.
  • Open (lexical) and closed (functional) categories

Open: No-fly-zone, yadda yadda yadda
Closed: the, in
39
Morphology
The dog chased the yellow bird.
  • Parts of speech: eight (or so) general types
  • Inflection (number, person, tense)
  • Derivation (adjective-adverb, noun-verb)
  • Compounding (separate words or single word)
  • Part-of-speech tagging
  • Morphological analysis (prefix, root, suffix,
    ending)

40
Part of speech tags
From Church (1991) - 79 tags
Tag   Description
NN    singular noun
IN    preposition
AT    article
NP    proper noun
JJ    adjective
,     comma
NNS   plural noun
CC    conjunction
RB    adverb
VB    un-inflected verb
VBN   verb +en (taken, looked (passive, perfect))
VBD   verb +ed (took, looked (past tense))
CS    subordinating conjunction
41
Jabberwocky (Lewis Carroll)
  • 'Twas brillig, and the slithy toves
    Did gyre and gimble in the wabe;
    All mimsy were the borogoves,
    And the mome raths outgrabe.
    "Beware the Jabberwock, my son!
    The jaws that bite, the claws that catch!
    Beware the Jubjub bird, and shun
    The frumious Bandersnatch!"

42
Nouns
  • Nouns: dog, tree, computer, idea
  • Nouns vary in number (singular, plural), gender
    (masculine, feminine, neuter), case (nominative,
    genitive, accusative, dative)
  • Latin: filius (m), filia (f), filium
    (object); German: Mädchen
  • Clitics ('s)

43
Pronouns
  • Pronouns: she, ourselves, mine
  • Pronouns vary in person, gender, number, case (in
    English: nominative, accusative, possessive, 2nd
    possessive, reflexive)

Mary saw her in the mirror. Mary saw herself in
the mirror.
  • Anaphors: herself, each other

44
Determiners and adjectives
  • Articles: the, a
  • Demonstratives: this, that
  • Adjectives describe properties
  • Attributive and predicative adjectives
  • Agreement in gender, number
  • Comparative and superlative (derivative and
    periphrastic)
  • Positive form

45
Verbs
  • Actions, activities, and states (throw, walk,
    have)
  • English: four verb forms
  • tenses: present, past, future
  • other inflection: number, person
  • gerunds and infinitive
  • aspect: progressive, perfective
  • voice: active, passive
  • participles, auxiliaries
  • irregular verbs
  • French and Finnish: many more inflections than
    English

46
Other parts of speech
  • Adverbs, prepositions, particles
  • phrasal verbs (the plane took off, take it off)
  • particles vs. prepositions (she ran up a
    bill/hill)
  • Coordinating conjunctions: and, or, but
  • Subordinating conjunctions: if, because, that,
    although
  • Interjections: Ouch!

47
Phrase structure
  • Constraints on word order
  • Constituents: NP, PP, VP, AP
  • Phrase structure grammars

[S [NP [PN Spot]] [VP [V chased] [NP [Det a] [N bird]]]]
48
Phrase structure
  • Paradigmatic relationships (e.g., constituency)
  • Syntagmatic relationships (e.g., collocations)

[S [NP That man] [VP [VBD caught] [NP the butterfly] [PP [IN with] [NP a net]]]]
49
Phrase-structure grammars
Peter gave Mary a book. Mary gave Peter a book.
  • Constituent order (SVO, SOV)
  • imperative forms
  • sentences with auxiliary verbs
  • interrogative sentences
  • declarative sentences
  • start symbol and rewrite rules
  • context-free view of language

50
Sample phrase-structure grammar
S → NP VP
NP → AT NNS
NP → AT NN
NP → NP PP
VP → VP PP
VP → VBD
VP → VBD NP
PP → IN NP

AT → the
NNS → children
NNS → students
NNS → mountains
VBD → slept
VBD → ate
VBD → saw
IN → in
IN → of
NN → cake
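
As a rough illustration of what the rewrite rules license, the sketch below encodes the sample grammar as Python dictionaries and counts how many parse trees each sentence has. The names `RULES`, `LEXICON`, and `count_parses` are made up for this example, and the span-based counting is just one simple way to handle the left-recursive rules (NP → NP PP, VP → VP PP); it is not taken from the slides.

```python
from functools import lru_cache

# Phrase-structure rules of the sample grammar (right-hand sides as tuples).
RULES = {
    "S": [("NP", "VP")],
    "NP": [("AT", "NNS"), ("AT", "NN"), ("NP", "PP")],
    "VP": [("VP", "PP"), ("VBD",), ("VBD", "NP")],
    "PP": [("IN", "NP")],
}

# Lexical rules: part-of-speech tag -> words it can rewrite to.
LEXICON = {
    "AT": {"the"},
    "NNS": {"children", "students", "mountains"},
    "VBD": {"slept", "ate", "saw"},
    "IN": {"in", "of"},
    "NN": {"cake"},
}


def count_parses(sentence, start="S"):
    """Return the number of distinct parse trees for the sentence."""
    words = tuple(sentence.lower().split())

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        # Number of ways `symbol` can derive words[i:j].
        if symbol in LEXICON:
            return int(j == i + 1 and words[i] in LEXICON[symbol])
        total = 0
        for rhs in RULES.get(symbol, ()):
            if len(rhs) == 1:
                total += count(rhs[0], i, j)
            else:
                left, right = rhs
                # Both parts must be non-empty, so spans always shrink.
                for k in range(i + 1, j):
                    total += count(left, i, k) * count(right, k, j)
        return total

    return count(start, 0, len(words)) if words else 0


for s in ["the children slept",
          "the children saw the students in the mountains",
          "slept children the"]:
    print(s, "->", count_parses(s))
```

A count of 0 means the grammar rejects the string, and a count above 1 means the sentence is structurally ambiguous under this grammar, for example when a PP can attach either to the verb phrase or to the preceding noun phrase.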
51
Phrase structure grammars
  • Local dependencies
  • Non-local dependencies
  • Subject-verb agreement

The women who found the wallet were given a
reward.
  • wh-extraction

Should Peter buy a book? Which book should Peter
buy?
  • Empty nodes

52
Dependency: arguments and adjuncts
Sue watched the man at the next table.
  • Event + dependents (verb arguments are usually
    NPs)
  • agent, patient, instrument, goal - semantic roles
  • subject, direct object, indirect object
  • transitive, intransitive, and ditransitive verbs
  • active and passive voice

53
Subcategorization
  • Arguments: subject + complements
  • adjuncts vs. complements
  • adjuncts are optional and describe time, place,
    manner
  • subordinate clauses
  • subcategorization frames

54
Subcategorization
  • Subject: The children eat candy.
  • Object: The children eat candy.
  • Prepositional phrase: She put the book on the table.
  • Predicative adjective: We made the man angry.
  • Bare infinitive: She helped me walk.
  • To-infinitive: She likes to walk.
  • Participial phrase: She stopped singing that tune
    at the end.
  • That-clause: She thinks that it will rain tomorrow.
  • Question-form clauses: She asked me what book I
    was reading.

55
Subcategorization frames
  • Intransitive verbs: The woman walked
  • Transitive verbs: John loves Mary
  • Ditransitive verbs: Mary gave Peter flowers
  • Intransitive with PP: I rent in Paddington
  • Transitive with PP: She put the book on the table
  • Sentential complement: I know that she likes you
  • Transitive with sentential complement: She told
    me that Gary is coming on Tuesday
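
One way to picture subcategorization frames is as a small lexicon that maps each verb to the complement sequences it allows. The sketch below is only an illustration: the table name `SUBCAT_FRAMES`, the function `licenses`, and the category labels NP, PP, and S are assumptions made for this example, and each verb is given only the single frame shown in the list above (real verbs usually allow several).

```python
# Verb lemma -> list of allowed complement sequences (subject not included).
# Only the frames illustrated in the examples above are listed.
SUBCAT_FRAMES = {
    "walk": [()],            # intransitive: The woman walked
    "love": [("NP",)],       # transitive: John loves Mary
    "give": [("NP", "NP")],  # ditransitive: Mary gave Peter flowers
    "rent": [("PP",)],       # intransitive with PP: I rent in Paddington
    "put": [("NP", "PP")],   # transitive with PP: She put the book on the table
    "know": [("S",)],        # sentential complement: I know that she likes you
    "tell": [("NP", "S")],   # transitive with sentential complement
}


def licenses(verb, complements):
    """True if the verb's lexicon entry allows this complement sequence."""
    return tuple(complements) in SUBCAT_FRAMES.get(verb, [])


print(licenses("put", ["NP", "PP"]))  # She put the book on the table
print(licenses("put", ["NP"]))        # *She put the book
print(licenses("walk", []))           # The woman walked
```

Adjuncts (time, place, manner) are deliberately left out of the frames, since they are optional and can combine with almost any verb.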

56
Selectional restrictions and preferences
  • Subcategorization frames capture syntactic
    regularities about complements
  • Selectional restrictions and preferences capture
    semantic regularities: bark, eat

57
Phrase structure ambiguity
  • Grammars are used for generating and parsing
    sentences
  • Parses
  • Syntactic ambiguity
  • Attachment ambiguity: Our company is training
    workers.
  • The children ate the cake with a spoon.
  • High vs. low attachment
  • Garden path sentences: The horse raced past the
    barn fell. Is the book on the table red?

58
Ungrammaticality vs. semantic abnormality
Slept children the. Colorless green ideas
sleep furiously. The cat barked.
59
Semantics and pragmatics
  • Lexical semantics and compositional semantics
  • Hypernyms, hyponyms, antonyms, meronyms and
    holonyms (part-whole relationship, tire is a
    meronym of car), synonyms, homonyms
  • Senses of words, polysemous words
  • Homophony (bass).
  • Collocations: white hair, white wine
  • Idioms: to kick the bucket

60
Discourse analysis
  • Anaphoric relations

1. Mary helped Peter get out of the car. He
thanked her.
2. Mary helped the other passenger out of the car.
The man had asked her for help because of his
foot injury.
  • Information extraction problems (entity
    crossreferencing)

Hurricane Hugo destroyed 20,000 Florida homes. At
an estimated cost of one billion dollars, the
disaster has been the most costly in the state's
history.
61
Pragmatics
  • The study of how knowledge about the world and
    language conventions interact with literal
    meaning.
  • Speech acts
  • Research issues: resolution of anaphoric
    relations, modeling of speech acts in dialogues

62
Other areas of NLP
  • Linguistics is traditionally divided into
    phonetics, phonology, morphology, syntax,
    semantics, and pragmatics.
  • Sociolinguistics: interactions of social
    organization and language.
  • Historical linguistics: change over time.
  • Linguistic typology
  • Language acquisition
  • Psycholinguistics: real-time production and
    perception of language

63
Other sites
  • Johns Hopkins University (Jason Eisner):
    http://www.cs.jhu.edu/jason/465/
  • Cornell University (Lillian Lee):
    http://courses.cs.cornell.edu/cs674/2002SP/
  • Simon Fraser University (Anoop Sarkar):
    http://www.sfu.ca/anoop/courses/CMPT-825-Fall-2003/index.html
  • Stanford University (Chris Manning):
    http://www.stanford.edu/class/cs224n/
  • JHU Summer workshop:
    http://www.clsp.jhu.edu/ws2003/calendar/preliminary.shtml

64
Word classes and part-of-speech tagging
65
Part of speech tagging
  • Problems: transport, object, discount, address
  • More problems: content
  • French: est, président, fils
  • "Book that flight": what is the part of speech
    associated with "book"?
  • POS tagging: assigning parts of speech to words
    in a text.
  • Three main techniques: rule-based tagging,
    stochastic tagging, transformation-based tagging

66
Rule-based POS tagging
  • Use dictionary or FST to find all possible parts
    of speech
  • Use disambiguation rules (e.g., ART+V)
  • Typically hundreds of constraints can be designed
    manually

67
Example in French
Word      Tags                   Gloss
<S>                              beginning of sentence
La        rf b nms u             article
teneur    nfs nms                noun feminine singular
Moyenne   jfs nfs v1s v2s v3s    adjective feminine singular
en        p a b                  preposition
uranium   nms                    noun masculine singular
des       p r                    preposition
rivieres  nfp                    noun feminine plural
,         x                      punctuation
bien_que  cs                     subordinating conjunction
délicate  jfs                    adjective feminine singular
À         p                      preposition
calculer  v                      verb
68
Sample rules
  • BS3 BI1: A BS3 (3rd person subject personal
    pronoun) cannot be followed by a BI1 (1st person
    indirect personal pronoun). In the example "il
    nous faut" (we need), "il" has the tag BS3MS and
    "nous" has the tags BD1P BI1P BJ1P BR1P BS1P.
    The negative constraint "BS3 BI1" rules out
    "BI1P", and thus leaves only 4 alternatives for
    the word "nous".
  • N K: The tag N (noun) cannot be followed by a tag
    K (interrogative pronoun); an example in the test
    corpus would be "... fleuve qui ..."
    (...river, that...). Since "qui" can be tagged
    both as an "E" (relative pronoun) and a "K"
    (interrogative pronoun), the "E" will be chosen
    by the tagger since an interrogative pronoun
    cannot follow a noun ("N").
  • R V: A word tagged with R (article) cannot be
    followed by a word tagged with V (verb); for
    example "l' appelle" (calls him/her). The word
    "appelle" can only be a verb, but "l'" can be
    either an article or a personal pronoun. Thus,
    the rule will eliminate the article tag, giving
    preference to the pronoun.
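
The three negative constraints above can be sketched as a small pruning procedure: each word starts with all of its dictionary tags, and a tag is dropped when every tag of the neighbouring word would form a forbidden pair with it. This is a minimal sketch under assumptions, not the tagger described on the slide: matching tags by prefix, the helper names `violates` and `prune`, and the placeholder tag `PRON` for the pronoun reading of "l'" are all made up for illustration.

```python
# Forbidden tag bigrams, matched by prefix (e.g. BS3 covers BS3MS).
FORBIDDEN = [("BS3", "BI1"), ("N", "K"), ("R", "V")]


def violates(left_tag, right_tag):
    """True if left_tag followed by right_tag breaks a negative constraint."""
    return any(left_tag.startswith(a) and right_tag.startswith(b)
               for a, b in FORBIDDEN)


def prune(candidates):
    """Remove tags that cannot combine with any tag of an adjacent word.

    candidates: one set of possible tags per word, in sentence order.
    """
    tags = [set(c) for c in candidates]
    changed = True
    while changed:
        changed = False
        for i in range(len(tags) - 1):
            left = {l for l in tags[i]
                    if any(not violates(l, r) for r in tags[i + 1])}
            right = {r for r in tags[i + 1]
                     if any(not violates(l, r) for l in tags[i])}
            # Never prune a word down to zero readings.
            if left and left != tags[i]:
                tags[i], changed = left, True
            if right and right != tags[i + 1]:
                tags[i + 1], changed = right, True
    return tags


# "il nous": BI1P is ruled out, four readings of "nous" remain.
print(prune([{"BS3MS"}, {"BD1P", "BI1P", "BJ1P", "BR1P", "BS1P"}]))
# "fleuve qui": only the relative-pronoun tag E survives.
print(prune([{"N"}, {"E", "K"}]))
# "l' appelle": the article reading R is removed (PRON is a placeholder tag).
print(prune([{"R", "PRON"}, {"V"}]))
```

Constraints of this kind only remove readings; any ambiguity that survives them has to be resolved by further rules or by another method.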

69
Stochastic POS tagging
  • HMM tagger
  • Pick the most likely tag for this word
  • P(word | tag) * P(tag | previous n tags): find the
    tag sequence that maximizes this product
  • A bigram-based HMM tagger chooses the tag t_i for
    word w_i that is most probable given the previous
    tag t_{i-1} and the current word w_i
  • t_i = argmax_j P(t_j | t_{i-1}, w_i)
  • t_i = argmax_j P(t_j | t_{i-1}) P(w_i | t_j): HMM
    equation for a single tag

70
Example
  • Secretariat/NNP is/VBZ expected/VBN to/TO race/VB
    tomorrow/ADV
  • People/NNS continue/VBP to/TO inquire/VB the/DT
    reason/NN for/IN the/DT race/NN for/IN outer/JJ
    space/NN
  • P(VB | TO) P(race | VB)
  • P(NN | TO) P(race | NN)
  • TO: to+VB (to sleep), to+NN (to school)

71
Example (contd)
  • P(NN | TO) = .021
  • P(VB | TO) = .34
  • P(race | NN) = .00041
  • P(race | VB) = .00003
  • P(VB | TO) P(race | VB) = .00001
  • P(NN | TO) P(race | NN) = .000007
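
The comparison above can be reproduced in a few lines. This is a minimal sketch of the single-tag bigram HMM decision using only the four probabilities listed on the slide; the table and function names (`TRANSITION`, `EMISSION`, `best_tag`) are made up for the example. With the rounded inputs as printed, the NN product works out to about 8.6e-6 rather than the .000007 shown, presumably because the listed inputs were themselves rounded; the ranking of the two tags is the same either way.

```python
# P(tag | previous tag), keyed by (previous tag, tag).
TRANSITION = {("TO", "VB"): 0.34, ("TO", "NN"): 0.021}
# P(word | tag), keyed by (tag, word).
EMISSION = {("VB", "race"): 0.00003, ("NN", "race"): 0.00041}


def best_tag(prev_tag, word, candidates):
    """Pick the tag t maximizing P(t | prev_tag) * P(word | t)."""
    scores = {t: TRANSITION[(prev_tag, t)] * EMISSION[(t, word)]
              for t in candidates}
    return max(scores, key=scores.get), scores


tag, scores = best_tag("TO", "race", ["VB", "NN"])
print(tag)
for t, p in scores.items():
    print(t, f"{p:.3g}")
```

Even though "race" is far more often a noun, the strong preference for a verb after TO tips the product in favour of VB.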

72
HMM Tagging
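  • \hat{T} = argmax_T P(T | W), where T = t_1, t_2, ..., t_n
  • By Bayes' rule: P(T | W) = P(T) P(W | T) / P(W)
  • Thus we are attempting to choose the sequence of
    tags that maximizes the rhs of the equation
  • P(W) can be ignored (it is the same for every
    candidate tag sequence)
  • P(T) P(W | T) = \prod_{i=1}^{n}
    P(w_i | w_1 t_1 ... w_{i-1} t_{i-1} t_i)
    P(t_i | w_1 t_1 ... w_{i-1} t_{i-1})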
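
Searching for the best tag sequence is usually done with dynamic programming. The sketch below uses the Viterbi algorithm under the bigram simplification, where each word depends only on its own tag and each tag only on the previous tag; the slide itself does not name the algorithm, so treat this as one common choice rather than the method intended here. The start probabilities and P(to | TO) = 1.0 in the toy model are assumptions added so the example runs; the other numbers come from the race example.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence and its probability."""
    # best[i][t] = (probability of the best path ending in tag t at word i,
    #               previous tag on that path)
    best = [{t: (start_p.get(t, 0.0) * emit_p[t].get(words[0], 0.0), None)
             for t in tags}]
    for word in words[1:]:
        prev = best[-1]
        column = {}
        for t in tags:
            column[t] = max(
                ((prev[p][0] * trans_p[p].get(t, 0.0) * emit_p[t].get(word, 0.0), p)
                 for p in tags),
                key=lambda pair: pair[0],
            )
        best.append(column)
    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


# Toy model: only the transitions and emissions needed for "to race".
tags = ["TO", "VB", "NN"]
start_p = {"TO": 1.0}                      # assumption for the example
trans_p = {"TO": {"VB": 0.34, "NN": 0.021}, "VB": {}, "NN": {}}
emit_p = {"TO": {"to": 1.0},               # assumption for the example
          "VB": {"race": 0.00003},
          "NN": {"race": 0.00041}}

path, prob = viterbi(["to", "race"], tags, start_p, trans_p, emit_p)
print(path, prob)
```

For longer sentences a real tagger would work with log probabilities to avoid underflow and would smooth the probability tables; both are omitted here to keep the sketch short.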

73
Transformation-based learning
  • P(NN | race) = .98
  • P(VB | race) = .02
  • Change NN to VB when the previous tag is TO
  • Types of rules
  • The preceding (following) word is tagged z
  • The word two before (after) is tagged z
  • One of the two preceding (following) words is
    tagged z
  • One of the three preceding (following) words is
    tagged z
  • The preceding word is tagged z and the following
    word is tagged w
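
A minimal sketch of how such a transformation is applied: every word first receives its most likely tag, and then a rule of the first type listed above ("the preceding word is tagged z") rewrites tags in context. The small dictionary `MOST_LIKELY_TAG` and the function names are assumptions for illustration; learning which rules to apply, which is the core of transformation-based learning, is not shown.

```python
# Most likely tag per word; a toy dictionary assumed for this example.
MOST_LIKELY_TAG = {
    "is": "VBZ", "expected": "VBN", "to": "TO",
    "race": "NN", "the": "DT",
}


def initial_tags(words):
    """Baseline: label every word with its most likely tag."""
    return [MOST_LIKELY_TAG[w] for w in words]


def apply_rule(tags, from_tag, to_tag, prev_tag):
    """Change from_tag to to_tag when the previous tag is prev_tag."""
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


words = ["is", "expected", "to", "race"]
tags = initial_tags(words)                 # "race" starts out as NN
tags = apply_rule(tags, "NN", "VB", "TO")  # change NN to VB after TO
print(list(zip(words, tags)))
```

In "the race" the rule does not fire, because the preceding tag is DT rather than TO, so the baseline NN is kept.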

74
Confusion matrix
      IN    JJ    NN    NNP   RB    VBD   VBN
IN    -     .2                .7
JJ    .2    -     3.3   2.1   1.7   .2    2.7
NN          8.7   -                       .2
NNP   .2    3.3   4.1   -     .2
RB    2.2   2.0   .5          -
VBD         .3    .5                -     4.4
VBN         2.8                     2.6   -
Most confusing: NN vs. NNP vs. JJ, VBD vs. VBN
vs. JJ
75
Readings
  • JM Chapters 1, 2, 3, 8
  • What is Computational Linguistics by Hans
    Uszkoreit: http://www.coli.uni-sb.de/hansu/what_is_cl.html
  • Lecture notes 1