1
Sentiment Classification using Word Sub-Sequences and Dependency Sub-Trees
Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), May 18th-20th, 2005
Shotaro Matsumoto, Hiroya Takamura and Manabu Okumura
Tokyo Institute of Technology
2
Table of Contents
  • 1. Motivation
  • 2. Our Approach
  • 3. Experiments
  • 4. Results and Discussion
  • 5. Conclusion and Future Work

3
Table of Contents
  • 1. Motivation
  • Background
  • Document Sentiment Classification
  • Early Studies
  • Issue
  • Objective
  • 2. Our Approach
  • 3. Experiments
  • 4. Results and Discussion
  • 5. Conclusion and Future Work

4
Background
  • Online grass-roots reviews are increasing rapidly
  • They contain useful reputation information
  • There are too many such documents for anyone to read them all
  • Mining reputation from these documents is therefore important

5
Document sentiment classification
  • The task of classifying a whole document according to the positive or negative polarity of its opinion (desirable or undesirable)

6
Two steps for the classification
  • 1. Feature extraction
  • Convert a document into a feature vector that preserves the features of the original document
  • 2. Binary classification
  • Classify the feature vector as having positive or negative sentiment polarity

7
Early Studies
  • [Pang 02]
  • Features: unigrams in the document
  • Classifiers: Naïve Bayes, maximum entropy model, Support Vector Machines (SVMs)
  • Showed that SVMs are superior to the others
  • [Pang 04]
  • Features: unigrams obtained from the summary
  • Classifier: SVMs
  • [Mullen 04]
  • Features: unigrams, unigrams of lemmatized words, prior knowledge from the Internet and a thesaurus
  • Classifier: SVMs
  • Obtained better results than [Pang 02]

8
Issue
  • Features in early studies
  • A document is represented as a bag of words, i.e., a text is regarded as a set of words
  • → Word order and syntactic relations between words in a sentence, which are intuitively important for the classification, are discarded

9
Objective
  • We propose a method for extracting word order and
    syntactic relations as features.
  • We use frequent sub-patterns in sentences as
    these features.

10
Table of Contents
  • 1. Motivation
  • 2. Our Approach
  • Overview
  • Word Sub-Sequence
  • Dependency Sub-Tree
  • Frequent Sub-pattern
  • 3. Experiments
  • 4. Results and Discussion
  • 5. Conclusion and Future Work

11
Overview
  • We use a word sequence and a dependency tree as
    structured representations of a sentence
  • We extract frequent sub-patterns from sentences
    as features for the classification

12
Word Sub-Sequence
  • A word sequence S
  • Simply the sequence of words that makes up a sentence
  • Preserves word order in the sentence
  • A word sub-sequence S' of a word sequence S
  • Obtained by removing zero or more words from the original sequence
  • Preserves the word order of the original sentence (see the sketch below)

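To make the sub-sequence relation concrete, here is a minimal sketch (ours, not from the original slides): a pattern is a word sub-sequence of a sentence exactly when its words can be matched left to right in the sentence.

```python
def is_subsequence(pattern, sentence):
    """True iff `pattern` is obtained from `sentence` by
    deleting zero or more words (word order preserved)."""
    remaining = iter(sentence)
    return all(word in remaining for word in pattern)

# "film good" is a sub-sequence of "this film is very good"
print(is_subsequence("film good".split(),
                     "this film is very good".split()))  # True
```
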
13
Dependency Sub-Tree
  • A dependency tree D
  • Expresses the dependencies between words in the sentence as child-parent relationships between nodes
  • Preserves syntactic relations between words in the sentence
  • A dependency sub-tree D' of a dependency tree D
  • Obtained by removing zero or more nodes from the original tree
  • Preserves syntactic relations between words in the original sentence (see the sketch below)

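As a hedged illustration of what a dependency tree looks like, the sketch below uses spaCy (our tooling choice, not the parser used in the original work) to print each word's head, which defines the child-parent edges:

```python
import spacy

# Assumes the small English model is installed:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("This film is very good.")
# Each token's head defines a child-parent edge of the tree.
for token in doc:
    print(f"{token.text:6} --{token.dep_}--> {token.head.text}")
```
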
14
Frequent Sub-Pattern
  • The number of all sub-patterns (sub-sequences or sub-trees) is too large → use only frequent sub-patterns
  • Definition
  • A sentence contains a pattern if and only if the pattern is a sub-sequence or a sub-tree of the sentence
  • The support of a pattern is the number of sentences in a dataset that contain the pattern
  • If the support of a pattern is at least a given support threshold, the pattern is frequent (in our experiments, the support threshold was fixed at 10; see the sketch below)
  • As implementations for mining frequent sub-patterns, we use Kudo's PrefixSpan and FREQT

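The support definition can be restated in a few lines of code. This is only an illustration of the definition, not the mining itself (which the slides attribute to PrefixSpan and FREQT); it reuses the `is_subsequence` helper sketched earlier:

```python
SUPPORT_THRESHOLD = 10  # the value fixed in these experiments

def support(pattern, sentences):
    """Number of sentences in the dataset containing `pattern`."""
    return sum(is_subsequence(pattern, s) for s in sentences)

def is_frequent(pattern, sentences, threshold=SUPPORT_THRESHOLD):
    return support(pattern, sentences) >= threshold
```
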
15
Table of Contents
  • 1. Motivation
  • 2. Our Approach
  • 3. Experiments
  • Movie review dataset
  • Features
  • Classifiers and Tests
  • 4. Results and Discussion
  • 5. Conclusion and Future Work

16
Movie review dataset
  • Dataset 1: used in [Pang 02] and [Mullen 04]
  • 690 positive reviews and 690 negative reviews
  • Written in English
  • 3-fold cross-validation
  • Dataset 2: used in [Pang 04]
  • 1000 positive reviews and 1000 negative reviews
  • Written in English
  • 10-fold cross-validation

17
Features
  • We employ the following features and their combinations for the classification (see the sketch after this list)
  • Bag-of-words features
  • Unigrams (e.g. good, film): uni
  • Unigram patterns that appear in at least 2 distinct sentences
  • Bigrams (e.g. very good, film is): bi
  • Bigram patterns that appear in at least 2 distinct sentences
  • Frequent sub-pattern features
  • Word sub-sequences: seq
  • Dependency sub-trees: dep
  • Features of lemmatized words
  • As in the extraction of the features uni, bi, seq, and dep, we also extract uni_l, bi_l, seq_l, and dep_l from lemmatized text

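A minimal sketch (the function name and representation are ours) of how pattern presence becomes a binary feature vector for one document:

```python
def vectorize(document, vocabulary):
    """Binary feature vector for one document.

    document   -- list of sentences, each a list of words
    vocabulary -- list of frequent patterns (each a word list)
    """
    return [int(any(is_subsequence(p, s) for s in document))
            for p in vocabulary]
```
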
18
Classifiers and Tests (1/2)
  • Classifier
  • Method: SVMs, a binary classifier based on supervised learning
  • Kernel function: linear kernel
  • Performance depends closely on the learning parameter C (called the soft-margin parameter) → we carry out three kinds of experiments

19
Classifiers and Tests (2/2)
  • Test 1: fix C = 1
  • The result is used for comparison with the early studies
  • Test 2: best accuracy with C ∈ {e^-2.0, e^-1.5, ..., e^2.0} (see the grid-search sketch below)
  • Observes the potential performance of the features
  • The result is used for finding the most effective combination with bag-of-words features
  • Test 3: predict a proper value of C from the training data
  • Observes the practical performance of the features

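A hedged sketch of the Test 2 grid search using scikit-learn (our tooling choice, not the SVM implementation used in the paper):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# The C grid from the slide: e^-2.0, e^-1.5, ..., e^2.0
C_grid = np.exp(np.arange(-2.0, 2.01, 0.5))

# X: binary pattern-presence vectors, y: +1/-1 polarity labels
search = GridSearchCV(LinearSVC(), {"C": C_grid}, cv=3,
                      scoring="accuracy")
# search.fit(X, y); the best C is in search.best_params_["C"]
```
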
20
Table of Contents
  • 1. Motivation
  • 2. Our Approach
  • 3. Experiments
  • 4. Results and Discussion
  • Results
  • Discussion
  • 5. Conclusion and Future Work

21
Results (1/2)
  • Results for dataset 1
  • vs. [Pang 02]: 82.9% → 87.3% (error reduction: 26%)
  • vs. [Mullen 04]: 84.6% → 87.3% (error reduction: 18%)

22
Results (2/2)
  • Results for dataset 2
  • vs. [Pang 04]: 87.1% → 92.9% (error reduction: 45%)

23
Discussion
  • From the results of Test 1, our method proved to be effective
  • Accuracy by features:
  • bow + dep ≈ bow + dep + seq (93%) >> bow + seq (89%) > bow (87%)
  • Lemmatized features are not always more effective than the original ones

24
Table of Contents
  • 1. Motivation
  • 2. Our Approach
  • 3. Experiments
  • 4. Results and Discussion
  • 5. Conclusion and Future Work
  • Conclusion
  • Future Work

25
Conclusion
  • We proposed a method for incorporating word order and syntactic relations between words in a sentence into document sentiment classification, using frequent word sub-sequences and dependency sub-trees as features.
  • Experimental results on movie review datasets show that our classifiers achieved the best results yet published on these datasets.

26
Future Work (1/2)
  • Negative/interrogative sentences
  • Affirmative sentence: "This film is good." (1)
  • Negative sentence: "This film is not good." (2)
  • Interrogative sentence: "Is this film good?" (3)
  • All sub-patterns in sentence (1) are also contained in sentence (2) (see the sketch below).
  • Similarly, there is a large overlap of patterns between (1) and (3).
  • Distinguishing these sentence types would solve this problem.

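The overlap is easy to see with the `is_subsequence` sketch from earlier: every sub-pattern of the affirmative sentence also matches the negated one, so the polarity distinction is lost.

```python
affirmative = "this film is good".split()
negative = "this film is not good".split()

pattern = ["film", "good"]
print(is_subsequence(pattern, affirmative))  # True
print(is_subsequence(pattern, negative))     # True -- polarity lost
```
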
27
Future Work (2/2)
  • Incorporating discourse structure in a document
  • Example (positive movie review):
  • "The scenario is simplistic. But I love this film."
  • From the word "but", we would know that "I love this film" is a more important sentence than "The scenario is simplistic" for sentiment classification.

28
Thank you.
29
Examples of Weighted Patterns
  • A positive (+) weight indicates positive sentiment polarity
  • A negative (-) weight indicates negative sentiment polarity
  • The absolute value of each weight indicates how large the feature's contribution is (see the sketch below)

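A minimal sketch (assuming the scikit-learn classifier above; the helper name is ours) of reading signed pattern weights off a trained linear SVM:

```python
def extreme_patterns(clf, vocabulary, k=5):
    """Patterns with the most negative / most positive weights.

    clf        -- a fitted LinearSVC; clf.coef_[0] holds one
                  signed weight per pattern feature
    vocabulary -- the pattern list used to build the vectors
    """
    ranked = sorted(zip(clf.coef_[0], vocabulary))
    return ranked[:k], ranked[-k:]
```
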
30
A Word Sequence = A Clause
  • Sentences are too long to be used for mining frequent sub-sequences
  • Instead of whole sentences, we used the clauses of sentences as word sequences
  • We split a sentence into a main clause and subordinate clauses using information from the parse tree
  • In addition, we removed stopwords (see the sketch below)
  • Conjunctions, prepositions, numbers, etc.

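A hedged sketch of the stopword filtering step. The part-of-speech categories come from the slide; mapping them onto spaCy's tag set is our assumption:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

# Stopword categories named on the slide:
# conjunctions, prepositions, numbers
STOP_POS = {"CCONJ", "SCONJ", "ADP", "NUM"}

def content_words(clause_text):
    """Drop stopword categories; keep remaining words in order."""
    return [t.text for t in nlp(clause_text) if t.pos_ not in STOP_POS]
```
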
31
References
  • [Pang 02] B. Pang, L. Lee and S. Vaithyanathan. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP 2002.
  • [Pang 04] B. Pang and L. Lee. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. ACL 2004.
  • [Mullen 04] T. Mullen and N. Collier. Sentiment Analysis using Support Vector Machines with Diverse Information Sources. EMNLP 2004.