CSA2050: Natural Language Processing

Provided by: michael307

1
CSA2050 Natural Language Processing

Tagging 3 and Chunking
Transformation Based Tagging
Chunking

2
Tagging 3 and Chunking Lecture

Slides based on Mike Rosner and Marti Hearst
notes
Additions from NLTK tutorials

3
3 Approaches to Tagging

Rule-Based Tagger: ENGTWOL Tagger (Voutilainen 1995)
Stochastic Tagger: HMM-based Tagger
Transformation-Based Tagger: Brill Tagger (Brill 1995)

4
Transformation-Based Tagging

A combination of rule-based and stochastic
tagging methodologies:
like rule-based tagging, rules are used to
specify tags in a certain environment;
like stochastic tagging, machine learning is
used.
Transformation-Based Learning (TBL)

5
Transformation-Based Error-Driven Learning

[Diagram, after Brill (1996): unannotated text -> initial state -> annotated text; annotated text and TRUTH -> learner -> transformation rules]

6
TBL Requirements

Initial State Annotator
List of allowable transformations
Scoring function
Search strategy

7
Initial State Annotation

Input
Corpus
Dictionary
Frequency counts for each entry
Output
Corpus tagged with most frequent tags
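
The initial state annotation described above can be sketched in a few lines. This is a minimal illustration, not Brill's code: the function names, the toy training data and the default tag for unseen words are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def train_most_frequent_tag(tagged_corpus):
    """Map each word to the tag it carries most often in the training corpus."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1
    return {word: tag_counts.most_common(1)[0][0]
            for word, tag_counts in counts.items()}

def initial_state_annotate(words, lexicon, default_tag="NN"):
    """Tag each word with its most frequent tag.

    Words missing from the lexicon get default_tag (an assumption of this sketch).
    """
    return [(w, lexicon.get(w, default_tag)) for w in words]

# Toy training data (hypothetical): "race" is seen twice as NN and once as VB.
training = [("the", "DT"), ("race", "NN"), ("to", "TO"),
            ("race", "VB"), ("race", "NN"), ("the", "DT")]
lexicon = train_most_frequent_tag(training)
tagged = initial_state_annotate(["to", "race", "the", "race"], lexicon)
```

Note that "race" after "to" comes out as NN here. That is exactly the kind of error the learned transformations are meant to repair.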

8
TBL Requirements

Initial State Annotator
List of allowable transformations
Scoring function
Search strategy

9
Transformations

Each transformation comprises
A source tag
A target tag
A triggering environment
Example
Source tag: NN
Target tag: VB
Triggering environment: previous tag is TO
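
One possible way to represent such a transformation in code is shown below. The class name, its fields and the decision to check the environment against already-updated tags are assumptions of this sketch, and only the "previous tag" environment is covered.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transformation:
    source: str    # tag to change from
    target: str    # tag to change to
    prev_tag: str  # triggering environment: tag of the previous word

def apply_transformation(rule, tags):
    """Return a new tag list with the rule applied left to right.

    The environment is tested on the tags as updated so far in this pass.
    """
    out = list(tags)
    for i in range(1, len(out)):
        if out[i] == rule.source and out[i - 1] == rule.prev_tag:
            out[i] = rule.target
    return out

rule = Transformation(source="NN", target="VB", prev_tag="TO")
# Tags for a phrase like "to race tomorrow", where "race" was initially tagged NN.
retagged = apply_transformation(rule, ["TO", "NN", "NN"])
```

Only the NN directly after TO is changed; the second NN now follows a VB, so the rule does not fire there.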

10
More Examples

Source tag   Target tag   Triggering environment
NN           VB           previous tag is TO
VBP          VB           one of the three previous tags is MD
JJR          RBR          next tag is JJ
VBP          VB           one of the two previous words is n't

11
Allowable transforms based on fixed schemas

12
Set of Possible Transformations

The set of possible transformations is
enumerated by allowing
every possible tag or word
in every possible slot
in every possible schema
This set can get quite large
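
To get a feel for how quickly this set grows, the enumeration can be sketched as follows. The four-tag tagset and the two one-slot schemas are toy assumptions; a real tagset has dozens of tags, and schemas may also mention words.

```python
from itertools import product

TAGS = ["DT", "NN", "VB", "TO"]           # toy tagset (hypothetical)
SCHEMAS = ["prev_tag_is", "next_tag_is"]  # two illustrative one-slot schemas

def enumerate_transformations(tags, schemas):
    """Yield (source, target, schema, context_tag) for every combination
    in which the source and target tags differ."""
    for source, target, schema, context in product(tags, tags, schemas, tags):
        if source != target:
            yield (source, target, schema, context)

candidates = list(enumerate_transformations(TAGS, SCHEMAS))
# With T tags and S one-slot schemas there are T * (T - 1) * S * T candidates.
```

Even this toy setting produces 96 candidate rules; the count grows with the cube of the tagset size for one-slot schemas.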

13
TBL Requirements

Initial State Annotator
List of allowable transformations
Scoring function
Search strategy

14
Scoring Function

For a given tagging state of the corpus
For a given transformation
For every word position in the corpus
If the rule applies and yields a correct tag,
increment score by 1
If the rule applies and yields an incorrect tag,
decrement score by 1
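
The scoring function as stated on this slide can be written down directly. The sketch below assumes a rule is a (source, target, previous tag) triple and that the gold-standard tags are available as a parallel list; both are choices made for the example.

```python
def score_transformation(rule, current_tags, gold_tags):
    """Score a rule against the current tagging state.

    +1 for every position where the rule applies and yields the gold tag,
    -1 for every position where it applies and yields a wrong tag.
    """
    source, target, prev_tag = rule
    score = 0
    for i in range(1, len(current_tags)):
        applies = current_tags[i] == source and current_tags[i - 1] == prev_tag
        if applies:
            score += 1 if gold_tags[i] == target else -1
    return score

current = ["TO", "NN", "DT", "NN", "TO", "NN"]
gold    = ["TO", "VB", "DT", "NN", "TO", "VB"]
score_to = score_transformation(("NN", "VB", "TO"), current, gold)  # fires twice, both correct
score_dt = score_transformation(("NN", "VB", "DT"), current, gold)  # fires once, wrongly
```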

15
TBL Requirements

Initial State Annotator
List of allowable transformations
Scoring function
Search strategy

16
The Basic Algorithm

Label every word with its most likely tag
Repeat the following while improvement > threshold:
Examine every possible transformation, selecting
the one that results in the most improved tagging
Retag the data according to this rule
Append this rule to output list
Return output list of transformations
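
The loop above can be sketched as a small greedy learner. This is a simplified illustration under several assumptions: rules are (source, target, previous tag) triples, improvement is measured as the reduction in the number of wrong tags, and rule conditions are checked against the tags as they stood before the current pass.

```python
from itertools import product

def apply_rule(rule, tags):
    """Apply one rule, testing its environment on the tags before this pass."""
    source, target, prev_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == source and tags[i - 1] == prev_tag:
            out[i] = target
    return out

def errors(tags, gold):
    """Number of positions where the tagging disagrees with the gold standard."""
    return sum(t != g for t, g in zip(tags, gold))

def learn_tbl(initial_tags, gold_tags, tagset, threshold=0):
    """Greedily pick the best rule until no rule improves by more than threshold."""
    current = list(initial_tags)
    learned = []
    while True:
        best_rule, best_gain = None, threshold
        for source, target, prev_tag in product(tagset, repeat=3):
            if source == target:
                continue
            rule = (source, target, prev_tag)
            gain = errors(current, gold_tags) - errors(apply_rule(rule, current), gold_tags)
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None:
            break
        current = apply_rule(best_rule, current)  # retag the data
        learned.append(best_rule)                 # append rule to output list
    return learned, current

initial = ["TO", "NN", "DT", "NN", "TO", "NN"]
gold    = ["TO", "VB", "DT", "NN", "TO", "VB"]
rules, final = learn_tbl(initial, gold, tagset=["DT", "NN", "TO", "VB"])
```

On this toy input a single rule, NN to VB after TO, removes both errors, and the loop stops on the next pass because no remaining rule improves the tagging.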

17
TBL Remarks

Execution speed: the TBL tagger is slower than the HMM
approach.
Learning speed is slow: Brill's implementation took
over a day (600k tokens)
BUT
Learns a small number of simple, non-stochastic
rules
Can be made to work faster with Finite State
Transducers

18
Tagging Unknown Words

New words are added to (newspaper) language at a rate of 20 per
month
Plus many proper names
Increases error rates by 1-2%
Methods
Assume the unknowns are nouns.
Assume the unknowns have a probability
distribution similar to words occurring once in
the training set.
Use morphological information, e.g. words ending
with -ed tend to be tagged VBN.
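
Two of the methods above, the noun default and the suffix heuristic, combine naturally into a small guesser. The function name and the choice of NN as the default tag are assumptions of this sketch; a real tagger would use many more suffixes and learned statistics.

```python
def guess_unknown_tag(word, default_tag="NN"):
    """Guess a tag for an out-of-vocabulary word from its surface form."""
    if word.endswith("ed"):
        return "VBN"    # morphological cue from the slide: -ed suggests VBN
    return default_tag  # otherwise fall back to "assume the unknowns are nouns"

guesses = {w: guess_unknown_tag(w) for w in ["blogged", "blog"]}
```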

19
Evaluation

The result is compared with a manually coded
Gold Standard
Typically accuracy reaches 95-97%
This may be compared with the result for a
baseline tagger (one that uses no context).
Important: 100% accuracy is impossible even for
human annotators.

20
A word of caution

95% accuracy: every 20th token wrong
96% accuracy: every 25th token wrong
an improvement of 25% from 95% to 96%???
97% accuracy: every 33rd token wrong
98% accuracy: every 50th token wrong
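
The arithmetic behind these figures is worth spelling out, since the "25" on the slide depends on what is being compared. The helper name below is made up for the illustration.

```python
def tokens_per_error(accuracy_percent):
    """Average number of tokens per error at a given accuracy (in percent)."""
    return 100 / (100 - accuracy_percent)

intervals = {acc: tokens_per_error(acc) for acc in (95, 96, 97, 98)}

# Going from 95% to 96% accuracy:
# the distance between errors grows from 20 to 25 tokens, i.e. by 25%,
gap_increase = intervals[96] / intervals[95] - 1
# while the error rate itself falls from 5% to 4%, a relative reduction of about 20%.
error_reduction = 1 - (100 - 96) / (100 - 95)
```

So a one-point gain in accuracy can be reported as 1%, 20% or 25% depending on the baseline chosen, which is the caution the slide is making.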

21
How much training data is needed?

When working with the STTS (50 tags) we observed
a strong increase in accuracy when testing on
10000, 20000, ..., 50000 tokens,
a slight increase in accuracy when testing on up
to 100000 tokens,
hardly any increase thereafter.

22
Summary

Tagging decisions are conditioned on a wider
range of events than in the HMM models mentioned
earlier. For example, left and right context can
be used simultaneously.
Learning and tagging are simple, intuitive and
understandable.
Transformation-based learning has also been
applied to sentence parsing.

23
The Three Approaches Compared

Rule Based
Hand crafted rules
It takes too long to come up with good rules
Portability problems
Stochastic
Find sequence with highest probability (Viterbi)
Result of training not accessible to humans
Large storage needs for intermediate results
whilst training
Transformation
Rules are learned
Small number of rules
Rules can be inspected and modified by humans

24
Shallow/Chunk Parsing

Goal: divide a sentence into a sequence of
chunks.
Chunks are non-overlapping regions of a text:
I saw a tall man in the park.
Chunks are non-recursive:
A chunk cannot contain other chunks
Chunks are non-exhaustive:
Not all words are included in chunks

25
Chunk Parsing Examples

Noun-phrase chunking:
I saw a tall man in the park.
Verb-phrase chunking:
The man who was in the park saw me.
Prosodic chunking:
[I saw] [a tall man] [in the park].
Question answering:
What Spanish explorer discovered the
Mississippi River?

26
Motivation

Locating information
e.g., text retrieval
Index a document collection on its noun phrases
Ignoring information
Generalize in order to study higher-level
patterns
e.g. phrases involving "gave" in the Penn Treebank:
gave NP
gave up NP in NP
gave NP up
gave NP help
gave NP to NP
Sometimes a full parse has too much structure
Too nested
Chunks usually are not recursive

27
Representation

BIO (or IOB)
Trees
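
As an illustration of the BIO representation, the sketch below turns chunk spans into per-token tags: B marks the first token of a chunk, I a token inside a chunk, and O a token outside any chunk. The span format (start and end token index, end exclusive) and the "B-NP"/"I-NP" label spelling are assumptions of this example.

```python
def chunks_to_iob(sentence, chunks, label="NP"):
    """Convert chunk spans into BIO tags.

    sentence: list of tokens
    chunks:   list of (start, end) token spans, end exclusive
    """
    tags = ["O"] * len(sentence)
    for start, end in chunks:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return list(zip(sentence, tags))

tokens = ["I", "saw", "a", "tall", "man", "in", "the", "park"]
# Noun-phrase chunks assumed here: "I", "a tall man", "the park"
iob = chunks_to_iob(tokens, [(0, 1), (2, 5), (6, 8)])
```

Because chunks are non-recursive and non-overlapping, one tag per token is enough; a tree representation carries the same information as nested structure.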

28
Comparison with Full Parsing

Parsing is usually an intermediate stage
Builds structures that are used by later stages
of processing
Full parsing is a sufficient but not necessary
intermediate stage for many NLP tasks
Parsing often provides more information than we
need
Shallow parsing is an easier problem
Less word-order flexibility within chunks than
between chunks
More locality
Fewer long-range dependencies
Less context-dependence
Less ambiguity

29
Chunks and Constituency

Constituents: a tall man in the park.
Chunks: a tall man in the park.
A constituent is part of some higher unit in the
hierarchical syntactic parse
Chunks are not constituents
Constituents are recursive
But, chunks are typically subsequences of
constituents
Chunks do not cross major constituent boundaries

30
Chunk Parsing in NLTK

Chunk parsers usually ignore lexical content
Only need to look at part-of-speech tags
Possible steps in chunk parsing
Chunking, unchunking
Chinking
Merging, splitting
Evaluation
Compare to a Baseline
Evaluate in terms of
Precision, Recall, F-Measure
Missed (False Negative), Incorrect (False
Positive)
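
These evaluation measures can be computed by comparing the set of guessed chunk spans with the gold-standard spans. The sketch below assumes a chunk counts as correct only if both boundaries match exactly; the function name and span format are made up for the example.

```python
def chunk_scores(guessed, gold):
    """Precision, recall and F-measure over exact-match chunk spans."""
    guessed, gold = set(guessed), set(gold)
    correct = guessed & gold
    precision = len(correct) / len(guessed) if guessed else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "missed": sorted(gold - guessed),     # false negatives
        "incorrect": sorted(guessed - gold),  # false positives
    }

gold_chunks = [(0, 1), (2, 5), (6, 8)]
guessed_chunks = [(2, 5), (5, 8)]
scores = chunk_scores(guessed_chunks, gold_chunks)
```

Here one of two guessed chunks is right (precision 0.5) and one of three gold chunks is found (recall 1/3), giving an F-measure of 0.4.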

31
Chunk Parsing in NLTK

Define a regular expression that matches the
sequences of tags in a chunk
A simple noun phrase chunk regexp
(Note that <NN.*> matches any tag starting with
NN)
<DT>?<JJ>*<NN.?>
Chunk all matching subsequences
the/DT little/JJ cat/NN sat/VBD on/IN the/DT
mat/NN
[the/DT little/JJ cat/NN] sat/VBD on/IN [the/DT
mat/NN]
If matching subsequences overlap, the first one gets
priority
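
The idea of matching a regular expression over tag sequences can be tried out with the standard library alone. The sketch below is not NLTK's implementation: it assumes the tag pattern reads <DT>?<JJ>*<NN.?>, translates it into an ordinary regular expression over a string of bracketed tags, and maps the matches back to token positions. All function names are hypothetical.

```python
import re

def tag_pattern_to_regex(tag_pattern):
    """Translate a tag pattern such as '<DT>?<JJ>*<NN.?>' into a regex.

    Each <...> becomes a group so quantifiers apply to whole tags, and '.'
    is restricted so that it cannot run across tag boundaries.
    """
    regex = tag_pattern.replace("<", "(?:<(?:").replace(">", ")>)")
    return re.compile(regex.replace(".", "[^<>]"))

def regexp_chunk(tagged, tag_pattern):
    """Return (start, end) token spans, end exclusive, of non-overlapping chunks."""
    tag_string = "".join("<%s>" % tag for _, tag in tagged)
    # Character offset at which each token's tag starts in tag_string.
    offsets, pos = {}, 0
    for i, (_, tag) in enumerate(tagged):
        offsets[pos] = i
        pos += len(tag) + 2
    offsets[pos] = len(tagged)
    spans = []
    for m in tag_pattern_to_regex(tag_pattern).finditer(tag_string):
        if m.end() > m.start():  # ignore empty matches
            spans.append((offsets[m.start()], offsets[m.end()]))
    return spans

sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]
spans = regexp_chunk(sentence, "<DT>?<JJ>*<NN.?>")
```

Scanning left to right means that when two candidate matches overlap, the earlier one wins, which corresponds to the priority rule on the slide.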

32
Unchunking

Remove any chunk with a given pattern
e.g., UnChunkRule('<NN|DT>+', 'Unchunk NNDT')
Combine with Chunk Rule <NN|DT|JJ>+
Chunk all matching subsequences
Input
the/DT little/JJ cat/NN sat/VBD on/IN the/DT
mat/NN
Apply chunk rule
[the/DT little/JJ cat/NN] sat/VBD on/IN [the/DT
mat/NN]
Apply unchunk rule
[the/DT little/JJ cat/NN] sat/VBD on/IN the/DT
mat/NN
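
Unchunking can be sketched as a filter over existing chunk spans. The example assumes the unchunk pattern is meant as "one or more NN or DT tags" and writes it directly as a regular expression over bracketed tags; the function name and span format are made up for the illustration and this is not NLTK's own code.

```python
import re

def unchunk(tagged, spans, chunk_regex):
    """Drop every chunk whose whole tag sequence matches chunk_regex."""
    kept = []
    for start, end in spans:
        tag_string = "".join("<%s>" % tag for _, tag in tagged[start:end])
        if not re.fullmatch(chunk_regex, tag_string):
            kept.append((start, end))
    return kept

sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]
# Chunks assumed to come from a chunk rule over NN, DT and JJ tags:
# "the little cat" and "the mat".
chunks = [(0, 3), (5, 7)]
remaining = unchunk(sentence, chunks, r"(?:<(?:NN|DT)>)+")
```

The first chunk contains a JJ and so survives; the second consists only of DT and NN tags and is removed.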

33
Chinking

A chink is a subsequence of the text that is not
a chunk.
Define a regular expression that matches the
sequences of tags in a chink
A simple chink regexp for finding NP chunks
(<VB.?>|<IN>)+
First apply chunk rule to chunk everything
Input
the/DT little/JJ cat/NN sat/VBD on/IN the/DT
mat/NN
ChunkRule('<.*>+', 'Chunk everything')
[the/DT little/JJ cat/NN sat/VBD on/IN the/DT
mat/NN]
Apply Chink rule above
[the/DT little/JJ cat/NN] sat/VBD on/IN [the/DT
mat/NN]
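
Chinking can be sketched as cutting existing chunks wherever a chink tag occurs. The example assumes the chink pattern means "any run of verb (VB plus optional letter) or IN tags", so each token can be tested on its own; the function name and span format are hypothetical.

```python
import re

def chink(tagged, spans, chink_tag_regex):
    """Remove chink tokens from chunks, splitting chunks where necessary."""
    result = []
    for start, end in spans:
        run_start = None  # start of the current run of non-chink tokens
        for i in range(start, end):
            is_chink = re.fullmatch(chink_tag_regex, tagged[i][1]) is not None
            if is_chink:
                if run_start is not None:
                    result.append((run_start, i))
                    run_start = None
            elif run_start is None:
                run_start = i
        if run_start is not None:
            result.append((run_start, end))
    return result

sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]
everything = [(0, len(sentence))]  # first chunk everything
chunks = chink(sentence, everything, r"VB.?|IN")
```

Starting from one chunk that covers the whole sentence, removing "sat" and "on" leaves the two noun phrases on either side.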

34
Merging

Combine adjacent chunks into a single chunk
Define a regular expression that matches the
sequences of tags on both sides of the point to
be merged
Example
Merge a chunk ending in JJ with a chunk starting
with NN
MergeRule('<JJ>', '<NN>', 'Merge adjs and
nouns')
[the/DT little/JJ] [cat/NN] sat/VBD on/IN
the/DT mat/NN
[the/DT little/JJ cat/NN] sat/VBD on/IN the/DT
mat/NN
Splitting is the opposite of merging
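
Merging can be sketched as a single pass over the chunk spans. The example simplifies the two patterns to single tags (the last tag of the left chunk and the first tag of the right chunk); the function name, span format and starting chunks are assumptions made for the illustration.

```python
def merge_chunks(tagged, spans, left_tag, right_tag):
    """Merge adjacent chunks when the left one ends in left_tag
    and the right one starts with right_tag."""
    merged = []
    for start, end in spans:
        if merged:
            prev_start, prev_end = merged[-1]
            touching = prev_end == start
            if (touching and tagged[prev_end - 1][1] == left_tag
                    and tagged[start][1] == right_tag):
                merged[-1] = (prev_start, end)  # extend the previous chunk
                continue
        merged.append((start, end))
    return merged

sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]
# Two adjacent chunks assumed as input: "the little" and "cat".
chunks = [(0, 2), (2, 3)]
merged = merge_chunks(sentence, chunks, "JJ", "NN")
```

A splitting rule would do the reverse: look inside a chunk for a JJ followed by an NN and cut the span at that point.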

36
Next Sessions