Title: Learning Extraction Patterns for Subjective Expressions
1 Learning Extraction Patterns for Subjective Expressions
- Ellen Riloff, University of Utah
- Janyce Wiebe, University of Pittsburgh
2 Subjectivity
- Subjective language includes opinions, rants, allegations, accusations, suspicions, and speculation
- Distinguishing factual from subjective information could benefit many applications:
- information extraction
- question answering
- summarization
3 Goals
- Sentence-level subjectivity classification
- (Wiebe et al. 2001) found that 44% of sentences in news articles are subjective
- Learning subjectivity clues from unannotated text corpora
- Learning linguistically rich patterns
4 Previous Work: Subjectivity Analysis
- Document-level subjectivity classification (e.g., Turney 2002; Pang et al. 2002; Spertus 1997) and above (Tong 2001)
- Genre classification (e.g., Karlgren and Cutting 1994; Kessler et al. 1997; Wiebe et al. 2001)
- Supervised sentence-level classification (Wiebe et al. 1999)
- Learning adjectives, adjectival phrases, verbs, nouns, and N-grams (e.g., Turney 2002; Hatzivassiloglou and McKeown 1997; Wiebe et al. 2001)
5 Recent Related Work
- Yu and Hatzivassiloglou (EMNLP03): unsupervised sentence-level classification. Complementary approach and features.
- Dave et al. (WWW03): reviews classified as positive or negative
- Agrawal et al. (WWW03): newsgroup authors partitioned into camps based on quotation links
- Gordon et al. (ACL03): manually developed grammars for some types of subjective language
6 Extraction Patterns
- Extraction patterns are lexico-syntactic patterns to identify relevant information
- Typically they represent role relationships surrounding noun and verb phrases
- hijacking of <x>: hijacked vehicle
- <x> was hijacked: hijacked vehicle
- <x> hijacked: hijacker
7 Our Method
- Subjective expressions represented as extraction patterns
- get to know <dobj>
- <subj> appear to be
- <subj> was satisfied
- <subj> complained
- Supervised extraction pattern learning
- Training data generated automatically
- Entire process bootstrapped
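8 [Figure: bootstrapping process. An Unannotated Text Collection supplies unlabeled sentences. Known subjective vocabulary feeds a Subjective Classifier and an Objective Classifier, which output subjective sentences and objective sentences. These go to the Extraction Pattern Learner, which outputs subjective patterns. The patterns feed back to the Subjective Classifier and to a Pattern-Based Subjective Classifier, which labels further subjective sentences from the unlabeled sentences.]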
15 Unannotated Text Collection
- English-language versions of FBIS news articles from a variety of countries
- Size: 302,160 sentences
19 Known subjective vocabulary
- From previous work
- Manually identified (e.g., entries from Levin 1993)
- Automatically identified (e.g., nouns from Riloff et al. 2003)
- Strongly subjective: most instances are subjective
- Weakly subjective: objective instances are also common
- Any data used is separate from the data in this paper
20 Subjective Classifier
- Applies the known subjective vocabulary to unlabeled sentences from the unannotated text collection
- Subjective: >1 strongly subjective clue
- 91.3% precision, 31.9% recall
- Test set: 2197 sentences, 59% subjective
21 Objective Classifier
- Applies the known subjective vocabulary to unlabeled sentences from the unannotated text collection
- Subjective: >1 strongly subjective clue
- Objective: 0 strongly subjective clues, and 0 or 1 weakly subjective clue in the previous, current, and next sentence
- 82.6% precision, 16.4% recall
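The two high-precision rules on these slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: clues are treated as single lowercase tokens, the "0 strongly subjective clue" condition is read as applying to the current sentence, and the "0 or 1 weakly subjective clue" condition is read as a combined count over the previous, current, and next sentence. The function and variable names are hypothetical, not the authors' code.

```python
from typing import List, Optional, Set

Sentence = List[str]  # one sentence as a list of lowercase tokens (assumption)


def count_clues(sentence: Sentence, clues: Set[str]) -> int:
    """Count the tokens of a sentence that appear in the clue set."""
    return sum(1 for token in sentence if token in clues)


def classify(sentences: List[Sentence], i: int,
             strong: Set[str], weak: Set[str]) -> Optional[str]:
    """Label sentence i as 'subjective', 'objective', or None (left unlabeled)."""
    n_strong = count_clues(sentences[i], strong)
    # Subjective rule: more than one strongly subjective clue.
    if n_strong > 1:
        return "subjective"
    # Window of previous, current, and next sentence, clipped at the edges.
    window = sentences[max(0, i - 1): i + 2]
    n_weak = sum(count_clues(s, weak) for s in window)
    # Objective rule: no strong clue here, at most one weak clue nearby.
    if n_strong == 0 and n_weak <= 1:
        return "objective"
    # Everything else stays unlabeled, which is why recall is low.
    return None
```

Sentences that satisfy neither rule are left unlabeled, which matches the high-precision, low-recall figures reported for both classifiers.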
24 Extraction Pattern Learner: AutoSlog-TS (Riloff 1996)
- Relevant texts: 17,000 subjective sentences from the Subjective Classifier
- Irrelevant texts: 17,000 objective sentences from the Objective Classifier
- Output: subjective patterns
25 Step 1: Apply Syntactic Templates
- <subj> active-verb dobj: <subj> dealt blow
- <subj> verb infinitive: <subj> appear to be
- <subj> aux noun: <subj> has position
- Active-verb <dobj>: endorsed <dobj>
- Verb infinitive <dobj>: get to know <dobj>
- Noun prep <np>: opinion on <np>
- Infinitive prep <np>: to resort to <np>
27 Step 1: Apply Syntactic Templates
- <subj> active-verb dobj: <subj> dealt blow
- Matches any sentence with
- a verb phrase with head = dealt
- a direct object with head = blow
- Example: The experience certainly dealt a stiff blow to his pride.
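The matching condition for an instantiated pattern can be illustrated with a small sketch. It assumes an upstream parser has already reduced each clause to its subject, verb, and direct-object heads; the `Clause` and `SubjActiveVerbDobj` types are hypothetical names introduced here, not part of AutoSlog-TS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Clause:
    """Heads of one parsed clause (assumed output of a separate parser)."""
    subj: Optional[str]
    verb: str
    dobj: Optional[str]
    passive: bool = False


@dataclass(frozen=True)
class SubjActiveVerbDobj:
    """An instance of the '<subj> active-verb dobj' template."""
    verb_head: str
    dobj_head: str

    def matches(self, clause: Clause) -> bool:
        # The pattern fires on an active verb phrase with the right head
        # whose direct object has the right head.
        return (not clause.passive
                and clause.verb == self.verb_head
                and clause.dobj == self.dobj_head)


# "<subj> dealt blow"
dealt_blow = SubjActiveVerbDobj(verb_head="dealt", dobj_head="blow")

# "The experience certainly dealt a stiff blow to his pride."
example = Clause(subj="experience", verb="dealt", dobj="blow")
```

Because only heads are compared, modifiers such as "certainly" and "stiff" do not block the match.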
28 Step 2: Select Patterns
- Apply all learned patterns to training data
- Rank patterns
- Prec(pattern) = p(subjective | pattern) = (# in subjective sentences) / (total #)
- Choose patterns with
- Frequency > F
- Prec > P
- on the training data, for some F and P
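The ranking and selection step can be sketched as follows. The sketch assumes that frequency is counted over pattern occurrences and that both thresholds are strict, as written on the slide; the function name and the input format (one `(pattern, is_subjective)` pair per occurrence) are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def select_patterns(instances: Iterable[Tuple[str, bool]],
                    min_freq: int, min_prec: float) -> Dict[str, float]:
    """Return {pattern: precision} for patterns with freq > F and prec > P.

    Each instance is one pattern occurrence in the training data, paired
    with whether the containing sentence was labeled subjective.
    """
    total: Counter = Counter()
    in_subjective: Counter = Counter()
    for pattern, is_subjective in instances:
        total[pattern] += 1
        if is_subjective:
            in_subjective[pattern] += 1

    selected = {}
    for pattern, freq in total.items():
        # Prec(pattern) = # in subjective sentences / total #
        prec = in_subjective[pattern] / freq
        if freq > min_freq and prec > min_prec:
            selected[pattern] = prec
    return selected
```

Raising F and P trades recall for precision, which is the trade-off the evaluation slides explore.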
29 Examples from Training Data
Pattern                     %SUBJ
<subj> was asked            100
<subj> asked                63
<subj> is talk              100
talk of <np>                90
<subj> will talk            71
was expected from <np>      100
<subj> was expected         42
<subj> is fact              100
fact is <dobj>              100
34 Test Data
- Manual annotation to support a project investigating multiple perspective QA (ARDA AQUAINT NRRC)
- 0.77 average pairwise kappa
- 0.89 average pairwise kappa with borderline sentences removed (11% of the corpus)
- Wilson and Wiebe, SIGDIAL 2003, describes the annotation scheme and agreement study
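For readers unfamiliar with the agreement measure, the sketch below shows one common way to compute an average pairwise kappa. It assumes Cohen's kappa over sentence-level labels; the slides do not say which kappa variant was used, so this is an illustration rather than the study's procedure.

```python
from collections import Counter
from itertools import combinations
from typing import List, Sequence


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each annotator's label distribution.
    expected = sum(count_a[label] * count_b[label]
                   for label in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1.0 - expected)


def average_pairwise_kappa(annotations: List[Sequence[str]]) -> float:
    """Mean of Cohen's kappa over all pairs of annotators."""
    pairs = list(combinations(annotations, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)
```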
35 Example
The Foreign Ministry said Thursday that it was surprised, to put it mildly, by the U.S. State Department's criticism of Russia's human rights record and objected in particular to the odious section on Chechnya.
38 Evaluation of Learned Patterns
- Test data: 3947 sentences, 54% subjective
- Train: F > 9, P = 100%; Test: P = 85%, Recall = 41%
- Train: F > 1, P > 59%; Test: P = 71%, Recall = 92%
41 Pattern-Based Subjective Classifier
- Extraction Pattern Learner input: 17000 subjective sentences (Subjective Classifier) and 17000 objective sentences (Objective Classifier)
- Labels an unlabeled sentence as subjective if it contains > 0 instances of patterns with F > 4 and P = 1 on training data
- Result: 9500 new subjective sentences
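The pattern-based labeling rule can be sketched as below. The input formats are assumptions: `stats` maps each learned pattern to its (frequency, precision) on the training data, and each unlabeled sentence is represented by the set of patterns that fire in it. The default thresholds follow the slide (F > 4, P = 1).

```python
from typing import Dict, List, Set, Tuple


def trusted_patterns(stats: Dict[str, Tuple[int, float]],
                     min_freq: int = 4, min_prec: float = 1.0) -> Set[str]:
    """Patterns with frequency > min_freq and precision >= min_prec."""
    return {pattern for pattern, (freq, prec) in stats.items()
            if freq > min_freq and prec >= min_prec}


def label_new_subjective(sentence_patterns: List[Set[str]],
                         trusted: Set[str]) -> List[int]:
    """Indices of sentences containing at least one trusted pattern."""
    return [i for i, fired in enumerate(sentence_patterns)
            if fired & trusted]
```

Restricting the classifier to patterns with perfect training precision keeps the newly labeled sentences clean enough to be fed back as training data.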
43 [Figure: second pass of the bootstrapping diagram. Labels: Subjective Classifier, subjective sentences (17000, 7500); Pattern-Based Subjective Classifier, 9500 new subjective sentences; Objective Classifier, objective sentences (17000); Extraction Pattern Learner, new subjective patterns.]
- 4248 patterns with P > .59 on training data
- 308 patterns with P = 1.0 on training data
44 Evaluate new and old patterns on test set
- Recall 24 Prec -0.52
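47 [Figure: the learned subjective patterns and the known subjective vocabulary both feed the Subjective Classifier, which labels unlabeled sentences; the old and new subjective sentences go to the Extraction Pattern Learner.]
- Subjective patterns used: F > 9, P = 1.0 on training data
- New subjective sentences: 1 old clue + 1 new pattern, or >1 new pattern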
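One reading of the augmented rule on this slide is sketched below. It assumes that "old clue" means a strongly subjective clue from the known vocabulary, that "new" means an instance of a learned pattern, and that the original rule (more than one old clue) still applies; none of these details is spelled out on the slide, and the function name is hypothetical.

```python
def is_subjective_augmented(n_old_clues: int, n_new_patterns: int) -> bool:
    """Augmented subjective rule, under the assumptions stated above."""
    # Original rule: more than one strongly subjective clue.
    if n_old_clues > 1:
        return True
    # New rule 1: one old clue plus at least one learned pattern.
    if n_old_clues == 1 and n_new_patterns >= 1:
        return True
    # New rule 2: more than one learned pattern on its own.
    return n_new_patterns > 1
```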
49 Evaluation on Test Data
- Original subjective classifier: 32.9% recall, 91.3% precision
- Augmented subjective classifier: 40.1% recall, 90.2% precision
50 Future Work
52 Future Work
- Improve original high-precision classifier
- Identify new objective sentences during bootstrapping
[Figure: objective-side counterpart of the bootstrapping loop. Known subjective vocabulary feeds the Objective Classifier; its objective sentences go to the Extraction Pattern Learner; a Pattern-Based Objective Classifier labels further objective sentences from unlabeled sentences.]
54 [Figure: the Subjective Classifier applied to unlabeled sentences from the unannotated text collection, using the known subjective vocabulary and producing subjective sentences at Iteration 0 and Iteration 1.]
55 Future Work
- Build up the subjective lexicon (known subjective vocabulary) as the process is applied to new corpora
- Human review of high-precision patterns
- Tough act to follow: linguistic subjectivity
- Rush Limbaugh: opinionated source
- police: lightning rod topic
- Richer representation with deeper knowledge (theta roles, polarity, tone, ambiguity, ...)
56 Conclusions
- High-precision subjectivity classification can be used to generate large amounts of labeled training data
- Extraction pattern learning techniques can learn linguistically rich subjective patterns
- Bootstrapping process results in higher recall with little loss in precision
57 Annotation Scheme
- The annotation scheme was developed as part of a U.S. government-sponsored project (ARDA AQUAINT NRRC) to investigate multiple perspective question answering.
- Annotators labeled private state expressions.
- Each private state can have low, medium, or high strength.
- Our gold standard considers a sentence to be subjective if it contains at least one private state expression of medium or higher strength.
58 Two Ways of Expressing Private States
- Explicit mentions of private states and speech events
- The United States fears a spill-over from the anti-terrorist campaign
- Expressive subjective elements
- The part of the US human rights report about China is full of absurdities and fabrications.
59 Nested Sources
60 OnlyFactive
- OnlyFactive: yes
- "The US fears a spill-over," said Xirao-Nima, a professor of foreign affairs at the Central University for Nationalities.