SIMS 290-2: Applied Natural Language Processing

Title: SIMS 290-2: Applied Natural Language Processing: Marti Hearst Last modified by: hearst Created Date: 7/19/2001 7:37:29 AM Document presentation format – PowerPoint PPT presentation

Slides: 33

1
SIMS 290-2 Applied Natural Language Processing
Marti Hearst Sept 22, 2004    
2
Today
  • Cascaded Chunking
  • Example of Using Chunking: Word Associations
  • Evaluating Chunking
  • Going to the next level: Parsing

3
Cascaded Chunking
  • Goal: create chunks that include other chunks
  • Examples
  • PP consists of preposition + NP
  • VP consists of verb followed by PPs or NPs
  • How to make it work in NLTK
  • The tutorial is a bit confusing; I attempt to
    clarify

4
Creating Cascaded Chunkers
  • Start with a sentence token
  • A list of words with parts of speech assigned
  • Create a fresh one or use one from a corpus

5
Creating Cascaded Chunkers
  • Create a set of chunk parsers
  • One for each chunk type
  • Each one takes as input some kind of list of
    tokens, and produces as output a NEW list of
    tokens
  • You can decide what this new list is called
  • Examples: NP-CHUNK, PP-CHUNK, VP-CHUNK
  • You can also decide what to name each occurrence
    of the chunk type, as it is assigned to a subset
    of tokens
  • Examples: NP, VP, PP
  • How to match higher-level tags?
  • It just seems to match their string description
  • So best be certain that their names do not
    overlap with POS tags
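
The cascade described above can be sketched in plain Python. This is not the NLTK API from the tutorial; it is a minimal illustration, under the assumption that each stage matches a regular expression against the string labels of its input items and returns a new list. The helper names (label, chunk) and the toy sentence are hypothetical.

```python
import re

def label(item):
    """Leaf items are (word, tag); chunk items are (chunk_name, [children])."""
    return item[0] if isinstance(item[1], list) else item[1]

def chunk(items, name, pattern):
    """One cascade stage: wrap each run of items whose labels match
    `pattern` in a new (name, [children]) item and return a NEW list."""
    # Encode the label sequence as "<DT><JJ><NN>..." so a regex can scan it.
    encoded = "".join("<%s>" % label(it) for it in items)
    # Map character offsets in the encoded string back to item indexes.
    offset_to_index, pos = {}, 0
    for i, it in enumerate(items):
        offset_to_index[pos] = i
        pos += len(label(it)) + 2
    offset_to_index[pos] = len(items)

    out, done = [], 0
    for m in re.finditer(pattern, encoded):
        if m.start() == m.end():
            continue  # ignore empty matches
        start, end = offset_to_index[m.start()], offset_to_index[m.end()]
        out.extend(items[done:start])
        out.append((name, items[start:end]))
        done = end
    out.extend(items[done:])
    return out

# Toy sentence token: a list of words with parts of speech assigned.
sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]

# Each stage consumes the previous stage's output.
stage1 = chunk(sentence, "NP", r"(<DT>)?(<JJ>)*(<NN>)+")
stage2 = chunk(stage1, "PP", r"<IN><NP>")          # preposition + NP
stage3 = chunk(stage2, "VP", r"<VBD>(<PP>|<NP>)+")  # verb followed by PPs or NPs
top_labels = [label(it) for it in stage3]
print(top_labels)
```

Because later stages match chunk names and POS tags through the same string labels, a chunk named NN would be indistinguishable from the noun tag in this sketch, which is the overlap the slide warns about.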

9
Let's do some text analysis
  • Let's try this on more complex sentences
  • First, read in part of a corpus
  • Then, count how often each word occurs with each
    POS
  • Determine some common verbs, choose one
  • Make a list of sentences containing that verb
  • Test out the chunker on them; examine further
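
The counting steps above might look roughly like this. The tagged sentences are made up for illustration and stand in for part of a real corpus; the only assumption about the tagset is that verb tags start with "VB", as in the Penn Treebank.

```python
from collections import Counter, defaultdict

# Made-up tagged sentences standing in for part of a real corpus.
tagged_sents = [
    [("the", "DT"), ("company", "NN"), ("said", "VBD"),
     ("profits", "NNS"), ("rose", "VBD")],
    [("analysts", "NNS"), ("said", "VBD"), ("the", "DT"),
     ("rise", "NN"), ("was", "VBD"), ("small", "JJ")],
    [("prices", "NNS"), ("rise", "VBP"), ("every", "DT"), ("year", "NN")],
]

# Count how often each word occurs with each POS tag.
word_tag_counts = defaultdict(Counter)
for sent in tagged_sents:
    for word, tag in sent:
        word_tag_counts[word.lower()][tag] += 1

# Total verb uses per word (assumes verb tags start with "VB").
verb_counts = Counter()
for word, tags in word_tag_counts.items():
    n = sum(count for tag, count in tags.items() if tag.startswith("VB"))
    if n:
        verb_counts[word] = n

# Choose the most common verb, then collect the sentences that use it as a verb.
verb = verb_counts.most_common(1)[0][0]
sents_with_verb = [
    sent for sent in tagged_sents
    if any(w.lower() == verb and t.startswith("VB") for w, t in sent)
]
print(verb, len(sents_with_verb))
```

The resulting sentence list is what would then be fed to the chunker for closer inspection.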

13
Why didn't this parse work?
14
Why didn't this parse work?
15
Why didn't this parse work?
16
Why didn't this parse work?
17
Corpus Analysis for Discovery of Word Associations
  • Classic paper by Church & Hanks showed how to use
    a corpus and a shallow parser to find interesting
    dependencies between words
  • Word Association Norms, Mutual Information, and
    Lexicography, Computational Linguistics, 16(1),
    1990
  • http://www.research.att.com/~kwc/publications.html
  • Some cognitive evidence
  • Word association norms: which word do people say
    most often after hearing another word
  • Given doctor: nurse, sick, health, medicine,
    hospital
  • People respond more quickly to a word if they've
    seen an associated word
  • E.g., if you show bread they're faster at
    recognizing butter than nurse (vs. a nonsense
    string)

18
Corpus Analysis for Discovery of Word Associations
  • Idea: use a corpus to estimate word associations
  • Association ratio: log ( P(x,y) / (P(x) P(y)) )
  • The probability of seeing x followed by y vs. the
    probability of seeing x anywhere times the
    probability of seeing y anywhere
  • P(x) is how often x appears in the corpus
  • P(x,y) is how often y follows x within w words
  • Interesting associations with doctor
  • X: honorary, Y: doctor
  • X: doctors, Y: dentists
  • X: doctors, Y: nurses
  • X: doctors, Y: treating
  • X: examined, Y: doctor
  • X: doctors, Y: treat
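
A minimal sketch of the association ratio as defined on this slide, assuming probabilities are estimated as raw counts divided by the corpus size and that "within w words" means the w tokens immediately after x. The function name, the base-2 logarithm, and the toy token list are illustrative choices, and the published method includes refinements not shown here.

```python
import math
from collections import Counter

def association_ratio(tokens, x, y, w=5):
    """log2( P(x,y) / (P(x) * P(y)) ), where P(x,y) counts how often
    y follows x within the next w tokens. Returns None if the pair
    is never observed (the log would be undefined)."""
    n = len(tokens)
    unigrams = Counter(tokens)
    pair_count = 0
    for i, tok in enumerate(tokens):
        if tok == x:
            pair_count += tokens[i + 1:i + 1 + w].count(y)
    if pair_count == 0:
        return None
    p_xy = pair_count / n
    p_x = unigrams[x] / n
    p_y = unigrams[y] / n
    return math.log2(p_xy / (p_x * p_y))

# Made-up token stream, far too small for meaningful estimates.
tokens = ("doctors and nurses treat patients . "
          "the doctors and the nurses work late . "
          "the patients wait .").split()
print(association_ratio(tokens, "doctors", "nurses", w=3))
print(association_ratio(tokens, "the", "wait", w=3))
```

A large positive value suggests x and y co-occur more often than chance would predict; on a corpus this small the numbers only show the mechanics.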

19
Corpus Analysis for Discovery of Word Associations
  • Now let's make use of syntactic information.
  • Look at which words and syntactic forms follow a
    given verb, to see what kinds of arguments it
    takes
  • Compute triples of subject-verb-object
  • Example: nouns that appear as the object of the
    verb usage of drink
  • martinis, cup_water, champagne, beverage,
    cup_coffee, cognac, beer, cup, coffee, toast,
    alcohol
  • What can we note about many of these words?
  • Example: verbs that have telephone in their
    object
  • sit_by, disconnect, answer, hang_up, tap,
    pick_up, return, be_by, spot, repeat, place,
    receive, install, be_on
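
Subject-verb-object triples can be approximated from chunker output. The sketch below assumes a hypothetical chunk format of (label, words) pairs with verbs already reduced to their base form, and it uses the last word of each NP as a crude stand-in for the head noun; a real shallow parser would need more care.

```python
from collections import Counter

# Hypothetical chunker output: each sentence is a list of (label, words) chunks.
chunked_sents = [
    [("NP", ["she"]), ("V", ["drink"]), ("NP", ["hot", "coffee"])],
    [("NP", ["the", "guests"]), ("V", ["drink"]), ("NP", ["champagne"])],
    [("NP", ["he"]), ("V", ["answer"]), ("NP", ["the", "telephone"])],
    [("NP", ["they"]), ("V", ["drink"]), ("NP", ["coffee"])],
]

def svo_triples(chunks):
    """Yield (subject, verb, object) for each NP V NP sequence,
    taking the last word of each NP as its approximate head."""
    for left, mid, right in zip(chunks, chunks[1:], chunks[2:]):
        if left[0] == "NP" and mid[0] == "V" and right[0] == "NP":
            yield left[1][-1], mid[1][-1], right[1][-1]

triples = [t for sent in chunked_sents for t in svo_triples(sent)]

# Nouns that appear as the object of "drink", and verbs that take "telephone".
objects_of_drink = Counter(o for s, v, o in triples if v == "drink")
verbs_with_telephone = Counter(v for s, v, o in triples if o == "telephone")
print(objects_of_drink.most_common())
print(verbs_with_telephone.most_common())
```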

20
Corpus Analysis for Discovery of Word Associations
  • The approach has become standard
  • Entire collections available
  • Dekang Lin's Dependency Database
  • Given a word, retrieve words that had a dependency
    relationship with the input word
  • Dependency-based Word Similarity
  • Given a word, retrieve the words that are most
    similar to it, based on dependencies
  • http://www.cs.ualberta.ca/~lindek/demos.htm

21
Example Dependency Database: sell
22
Example Dependency-based Similarity: sell
23
Homework Assignment
  • Choose a verb of interest
  • Analyze the context in which the verb appears
  • Can use any corpus you like
  • Can train a tagger and run it on some fresh text
  • Example What kinds of arguments does it take?
  • Improve on my chunking rules to get better
    characterizations

24
Evaluating the Chunker
  • Why not just use accuracy?
  • Accuracy = correct / total number
  • Definitions
  • Total: number of chunks in gold standard
  • Guessed: set of chunks that were labeled
  • Correct: of the guessed, which were correct
  • Missed: how many correct chunks not guessed?
  • Precision = correct / guessed
  • Recall = correct / total
  • F-measure = 2 * (Prec * Recall) / (Prec + Recall)

25
Example
  • Assume the following numbers
  • Total: 100
  • Guessed: 120
  • Correct: 80
  • Missed: 20
  • Precision = 80 / 120 = 0.67
  • Recall = 80 / 100 = 0.80
  • F-measure = 2 * (.67 * .80) / (.67 + .80) = 0.73
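
The three measures can be checked with a few lines of Python. The function name is an illustrative choice, and the zero guards are an added assumption about how to treat empty inputs.

```python
def precision_recall_f(correct, guessed, total):
    """Precision, recall and F-measure from chunk counts.
    correct: guessed chunks that are in the gold standard
    guessed: chunks the chunker labeled
    total:   chunks in the gold standard"""
    precision = correct / guessed if guessed else 0.0
    recall = correct / total if total else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# The numbers from the example: Total 100, Guessed 120, Correct 80.
p, r, f = precision_recall_f(correct=80, guessed=120, total=100)
print("P=%.2f R=%.2f F=%.2f" % (p, r, f))
```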

26
Evaluating in NLTK
  • We have some already chunked text from the
    Treebank
  • The code below uses the existing parse to compare
    against, and to generate Tokens of type word/tag
    to parse with our own chunker.
  • Have to add location information so the
    evaluation code can compare which words have been
    assigned which labels
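
The role of location information can be illustrated without NLTK. In this sketch, which is not the NLTK evaluation code, each chunk becomes a (start, end, label) span over word offsets, so a guessed chunk counts as correct only if it covers exactly the same words with the same label as a gold chunk. The input format and function name are hypothetical.

```python
def chunk_spans(items):
    """Convert a chunked sentence into a set of (start, end, label) spans.
    Items are either a bare word (outside any chunk) or (label, [words])."""
    spans, pos = set(), 0
    for item in items:
        if isinstance(item, tuple):
            label, words = item
            spans.add((pos, pos + len(words), label))
            pos += len(words)
        else:
            pos += 1
    return spans

# Gold-standard chunking vs. a guess that truncates the second NP.
gold = [("NP", ["the", "little", "cat"]), "sat", "on", ("NP", ["the", "mat"])]
guess = [("NP", ["the", "little", "cat"]), "sat", "on", "the", ("NP", ["mat"])]

gold_spans, guess_spans = chunk_spans(gold), chunk_spans(guess)
correct = len(gold_spans & guess_spans)   # exact span and label matches
precision = correct / len(guess_spans)
recall = correct / len(gold_spans)
print(sorted(gold_spans), sorted(guess_spans), precision, recall)
```

Without the offsets, the two NP chunks containing "mat" could not be told apart from a correct match on the word alone.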

27
How to get better accuracy?
  • Use a full syntactic parser
  • These days the probabilistic ones work
    surprisingly well
  • They are getting faster too.
  • Prof. Dan Klein's is very good and easy to run
  • http://nlp.stanford.edu/downloads/lex-parser.shtml

32
Next Week
  • Shallow Parsing Assignment
  • Due on Wed Sept 29
  • Next week
  • Read paper on end-of-sentence disambiguation
  • Presley and Barbara lecturing on categorization
  • We will read the categorization tutorial the
    following week