Veeranna 'A 'Y - PowerPoint PPT Presentation


Provided by: Pych

1
Sentiment Analysis

2
Hindi SentiWordNet

Giving sentiment scores to every Hindi synset, just like English SentiWordNet.
Each synset gets three scores: positive, negative, and objective.
Claim: sentiment scores given to English word senses also apply to their Hindi counterparts.
Hindi-English synset linking is in progress; half of Hindi WordNet has been mapped to English WordNet.

3
Building Hindi SentiWordNet

Various mappings are required:
Hindi-English synset mapping (Hindi synsets are mapped to English WordNet ver-2.1)
ver-2.1 to ver-2.0 synset mapping (SentiWordNet is based on ver-2.0)
ver-2.0 to SentiWordNet sense number mapping
Sense numbers are not the same across versions of English WordNet, hence a mapping of synsets across WordNet versions is required.
Offset numbers in ver-2.0 and SentiWordNet do not match.
Currently 12408 senses have been linked and given senti-scores.
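The chain of mappings above can be sketched as a series of table lookups. This is a minimal illustration only: the function name, the identifier strings, and the scores are all made up for the example, and the real mapping tables are not shown in these slides.

```python
from typing import Dict, Optional, Tuple

Scores = Tuple[float, float, float]  # (positive, negative, objective)


def hindi_senti_scores(
    hindi_id: str,
    hindi_to_wn21: Dict[str, str],
    wn21_to_wn20: Dict[str, str],
    wn20_to_swn: Dict[str, str],
    swn_scores: Dict[str, Scores],
) -> Optional[Scores]:
    """Follow Hindi -> WordNet 2.1 -> WordNet 2.0 -> SentiWordNet.

    Returns None as soon as any link in the chain is missing, which
    models a Hindi synset that has not been linked yet.
    """
    key: Optional[str] = hindi_id
    for table in (hindi_to_wn21, wn21_to_wn20, wn20_to_swn):
        key = table.get(key)
        if key is None:
            return None
    return swn_scores.get(key)


# Hypothetical identifiers and scores, for illustration only.
hindi_to_wn21 = {"hin-001": "wn21-a-100", "hin-002": "wn21-n-200"}
wn21_to_wn20 = {"wn21-a-100": "wn20-a-090"}
wn20_to_swn = {"wn20-a-090": "swn-a-095"}
swn_scores = {"swn-a-095": (0.75, 0.0, 0.25)}

linked = hindi_senti_scores(
    "hin-001", hindi_to_wn21, wn21_to_wn20, wn20_to_swn, swn_scores
)
unlinked = hindi_senti_scores(
    "hin-002", hindi_to_wn21, wn21_to_wn20, wn20_to_swn, swn_scores
)
print(linked)    # (0.75, 0.0, 0.25)
print(unlinked)  # None
```

Keeping each mapping as its own table mirrors the slide: a break at any version boundary leaves the Hindi synset without senti-scores rather than assigning it a wrong one.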

4
Table 1: Differences in sense numbers across WordNet ver-2.1, ver-2.0, and SentiWordNet
Table 2: Examples from Hindi SentiWordNet

5
Connotation

Connotation is a subjective cultural and/or emotional coloration in addition to the explicit or denotative meaning of a specific word or phrase in a language, i.e. the emotional association with a word.
Connotations are used to express opinions.
E.g. "sea of people", "you are a dog", "stock market hit a decade low".

6
Feature Engineering

The connotation of a word can give a sentence a positive, negative, or objective interpretation.
E.g. "you are a dog" has a negative interpretation of a person.
"Every child needs a good home" has a positive interpretation of home.
How much a word can shift the interpretation of a sentence from neutral toward either positive or negative can be measured by the Interpretability Score (IS), Dan Tufis (FASSBL, 2008).

7
Feature Engineering

In the score, maxP(swk) and maxN(swk) are the highest positive and negative scores among the senses of the senti-word swk, and 0 ≤ IS ≤ 1.
Currently in SentiWordNet, the senti-words with the highest interpretability score (IS of 0.875) are pretty, immoral and gross.
Words with IS > n (n is a parameter) are chosen as feature words for the experiment.
Each document is represented as a vector of these features.
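A minimal sketch of the feature selection and document representation described above, assuming the IS scores have already been computed elsewhere. The function names are hypothetical, and apart from the 0.875 scores for "pretty" and "immoral" quoted on this slide, the scores below are invented for the example.

```python
from collections import Counter
from typing import Dict, List


def select_features(is_scores: Dict[str, float], n: float) -> List[str]:
    """Keep the senti-words whose interpretability score exceeds n."""
    return sorted(word for word, score in is_scores.items() if score > n)


def document_vector(
    tokens: List[str], features: List[str], binary: bool = False
) -> List[int]:
    """Represent a document over the chosen features.

    binary=False gives frequency counts; binary=True gives 0/1 presence.
    """
    counts = Counter(tokens)
    if binary:
        return [1 if counts[f] > 0 else 0 for f in features]
    return [counts[f] for f in features]


# "table" and "nice" carry made-up scores, for illustration only.
is_scores = {"pretty": 0.875, "immoral": 0.875, "table": 0.0, "nice": 0.4}
features = select_features(is_scores, n=0.5)

tokens = "a pretty house with a pretty garden".split()
freq_vec = document_vector(tokens, features)
bin_vec = document_vector(tokens, features, binary=True)

print(features)  # ['immoral', 'pretty']
print(freq_vec)  # [0, 2]
print(bin_vec)   # [0, 1]
```

Raising n shrinks the feature list, which is the knob the experiments on the later slides vary.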

8
Naïve Bayes

Find the probability of the class being c given that the document is d, P(c | d).
The document d is assigned to the class c* = argmax_c P(c | d).
Each document d is represented by the document vector d = (n1(d), n2(d), ..., nm(d)), where ni(d) is the number of times feature fi occurs in document d.
The document vector can be built in two ways: ni(d) can be either the count of the feature or a binary value (0 or 1) marking the absence or presence of the feature.
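The classification rule above can be sketched as a small multinomial Naive Bayes over the document vectors. This is an illustration, not the implementation used in the experiments: the add-one smoothing and the log-space scoring are assumptions, the function names are hypothetical, and the training data is a toy set.

```python
import math
from typing import Dict, List, Tuple


def train_nb(
    vectors: List[List[int]], labels: List[str]
) -> Tuple[Dict[str, float], Dict[str, List[float]]]:
    """Estimate log P(c) and log P(f_i | c) with add-one smoothing."""
    m = len(vectors[0])
    log_prior: Dict[str, float] = {}
    log_lik: Dict[str, List[float]] = {}
    for c in sorted(set(labels)):
        rows = [v for v, y in zip(vectors, labels) if y == c]
        log_prior[c] = math.log(len(rows) / len(vectors))
        totals = [sum(col) for col in zip(*rows)]  # feature totals in class c
        denom = sum(totals) + m                    # add-one smoothing
        log_lik[c] = [math.log((t + 1) / denom) for t in totals]
    return log_prior, log_lik


def classify_nb(
    vector: List[int],
    log_prior: Dict[str, float],
    log_lik: Dict[str, List[float]],
) -> str:
    """Return argmax_c [ log P(c) + sum_i n_i(d) * log P(f_i | c) ]."""
    def score(c: str) -> float:
        return log_prior[c] + sum(n * lp for n, lp in zip(vector, log_lik[c]))
    return max(log_prior, key=score)


# Toy data over two features, e.g. counts of ("good", "bad").
vectors = [[2, 0], [1, 0], [0, 2], [0, 1]]
labels = ["pos", "pos", "neg", "neg"]
log_prior, log_lik = train_nb(vectors, labels)

print(classify_nb([1, 0], log_prior, log_lik))  # pos
print(classify_nb([0, 3], log_prior, log_lik))  # neg
```

The same code handles both document representations: pass frequency counts or 0/1 presence values as the vectors.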

9
Results

The experiment used 1000 positive and 1000 negative documents.
Testing was done with unigrams only and with unigrams plus multi-grams.
It was observed that accuracy increased as the number of features decreased.
Naive Bayes gives higher accuracy when the document vector holds frequency counts than when it holds binary presence values.
The highest accuracy, 90.2%, was achieved with unigrams plus multi-grams as features and ni(d) set to frequency count, using an IS threshold of 0.5 for unigrams (500 unigrams) and 0.2 for multi-grams (1740 multi-grams).

10
SVM

The idea behind the training procedure is to find a hyperplane, represented by vector w, that not only separates the document vectors (dj) in one class (cj) from those in the other, but for which the separation, or margin, is as large as possible.
The solution can be written as w = Σj aj cj dj with aj ≥ 0, where the aj's are obtained by solving a dual optimization problem.
An SVM package was used for training and testing, with all parameters set to their default values.
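To make the role of the aj's concrete, the sketch below assembles w in the standard form w = Σj aj cj dj and classifies by the sign of w·d + b. It does not solve the dual problem: the multipliers are supplied by hand for a two-document toy set where they can be worked out on paper. The bias term b, the function names, and the data are assumptions made for the illustration.

```python
from typing import List


def weight_vector(
    alphas: List[float], classes: List[int], docs: List[List[float]]
) -> List[float]:
    """Compute w = sum_j a_j * c_j * d_j, with c_j in {+1, -1}."""
    w = [0.0] * len(docs[0])
    for a, c, d in zip(alphas, classes, docs):
        for i, value in enumerate(d):
            w[i] += a * c * value
    return w


def classify_svm(w: List[float], b: float, d: List[float]) -> int:
    """Return +1 or -1 depending on which side of the hyperplane d lies."""
    score = sum(wi * di for wi, di in zip(w, d)) + b
    return 1 if score >= 0 else -1


# Toy training set: one document per class.
docs = [[2.0, 0.0], [0.0, 2.0]]
classes = [1, -1]
# Hand-derived multipliers for this toy set (both documents are support
# vectors); a real run would obtain them from a dual solver.
alphas = [0.25, 0.25]
b = 0.0

w = weight_vector(alphas, classes, docs)
print(w)                                 # [0.5, -0.5]
print(classify_svm(w, b, [3.0, 1.0]))    # 1
print(classify_svm(w, b, [0.0, 1.0]))    # -1
```

With these values both training documents sit exactly on the margin (w·d + b equals +1 and -1), which is what the maximum-margin condition requires.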

11
Results

The experiment used 1000 positive and 1000 negative documents.
Testing was done with unigrams only and with unigrams plus multi-grams.
SVM gives better accuracy when the document vector holds binary presence values than when it holds frequency counts.
Accuracy of 73.35% was achieved with unigrams plus multi-grams as features and ni(d) set to frequency count, using an IS threshold of 0.5 for unigrams (500 unigrams) and 0.1 for multi-grams (7215 multi-grams).
Accuracy of 74.55% was achieved with unigrams plus multi-grams as features and ni(d) set to binary presence value, using an IS threshold of 0.5 for unigrams (500 unigrams) and 0.2 for multi-grams (1740 multi-grams).

12
Minimum Graph Cut

The xi's are sentences in a document.
Framework:
indj(xi): estimate that xi belongs to class Cj, based only on the features present in the sentence (individual score).
assoc(xi, xk): estimate that xi and xk belong to the same class (association score).
The problem of finding the minimum partition cost reduces to finding a minimum cut in a graph.
Minimum cut in a graph: find a cut through the edges of a graph such that the partitioned components are connected graphs and the sum of the weights of the edges lying on the cut is minimum.
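The reduction above can be sketched with a small max-flow routine, following the graph construction in Pang and Lee (2004): a source s linked to each sentence with weight ind1(xi), each sentence linked to a sink t with weight ind2(xi), and sentence pairs linked with weight assoc(xi, xk). The partition cost being minimized is the sum of ind2 over sentences put in C1, ind1 over sentences put in C2, and assoc over pairs split across the two classes. The Edmonds-Karp algorithm, the function name, and the toy scores are choices made for this illustration.

```python
from collections import deque
from typing import Dict, List, Tuple


def min_cut_partition(
    ind1: List[float], ind2: List[float], assoc: Dict[Tuple[int, int], float]
) -> Tuple[List[int], List[int], float]:
    """Split sentences 0..n-1 into (C1, C2) by a minimum s-t cut."""
    n = len(ind1)
    s, t = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i in range(n):
        cap[s][i] = ind1[i]   # cut when x_i ends up in C2
        cap[i][t] = ind2[i]   # cut when x_i ends up in C1
    for (i, k), a in assoc.items():
        cap[i][k] += a        # cut when x_i and x_k are separated
        cap[k][i] += a

    def bfs_parents() -> List[int]:
        """Breadth-first search over edges with residual capacity."""
        parent = [-1] * (n + 2)
        parent[s] = s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        return parent

    cut_value = 0.0
    while True:
        parent = bfs_parents()
        if parent[t] == -1:   # no augmenting path left: flow is maximal
            break
        bottleneck = float("inf")
        v = t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        cut_value += bottleneck

    # Sentences still reachable from s lie on the source side (C1).
    class1 = [i for i in range(n) if parent[i] != -1]
    class2 = [i for i in range(n) if parent[i] == -1]
    return class1, class2, cut_value


# Toy scores for three sentences, for illustration only.
ind1 = [0.8, 0.5, 0.1]
ind2 = [0.2, 0.5, 0.9]
assoc = {(0, 1): 1.0, (1, 2): 0.1, (0, 2): 0.2}

class1, class2, cut = min_cut_partition(ind1, ind2, assoc)
print(class1, class2, round(cut, 6))  # [0, 1] [2] 1.1
```

Sentence 1 is undecided on its own (0.5 versus 0.5), but its strong association with sentence 0 pulls it into the same class, which is the effect the association scores are meant to have.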

15
Results

The experiment used 1000 positive and 1000 negative documents, and 5000 subjective and 5000 objective sentences.
Either Naive Bayes or SVM, together with the graph cut method, can be used as a subjectivity detector in the initial phase to get a subjective extract of each document.
Similarly, both can be used as a polarity detector to classify the subjective extract obtained in the initial phase.
Accuracy of 92.4% was achieved with Naive Bayes plus graph cut as the subjectivity detector to get the subjective extract, and Naive Bayes again as the polarity detector to get the final classification.
Other parameters: unigrams plus multi-grams as features, ni(d) set to frequency count, and an IS threshold of 0.5 for unigrams (500 unigrams) and 0.2 for multi-grams (1740 multi-grams).

16
Comparison with existing algorithms without
SentiWordNet

Table 3: Comparison of accuracies of existing methods with and without SentiWordNet
The comparison covers only the best accuracies achieved in each case, and the parameters differ between the two cases.
Including SentiWordNet in existing algorithms for Sentiment Analysis increases accuracy.
The number of features has been reduced compared with previous methods.

17
Conclusion

Direct mapping of English sentiment scores to Hindi synsets is justified when the Hindi-English synset mapping and the mapping of synset numbers across WordNet versions are justified.
The main aim of SentiWordNet is to capture the intricacies of texts. Hence SentiWordNet gives sentiment scores to each synset rather than a hard binary classification of synsets. One of the intricacies of texts is connotation, which has been captured by giving each synset an Interpretability Score.
Sentiment Analysis based on Connotation Analysis shows good results. Connotations are good at capturing positive, negative, and objective interpretations of texts.
Including SentiWordNet and connotation in existing algorithms for Sentiment Analysis increases accuracy.

18
References

Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. Thumbs up? Sentiment Classification using Machine Learning Techniques. In EMNLP 2002.
Bo Pang and Lillian Lee. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. In Proceedings of ACL 2004.
Dan Tufis. Connotation Analysis. FASSBL, 2008.
J. Ramanand, Akshay Ukey, Brahm Kiran Singh, and Pushpak Bhattacharyya. Mapping and Structural Analysis of Multilingual WordNet. IEEE Data Engineering Bulletin, 30(1), March 2007.