Veeranna 'A 'Y - PowerPoint PPT Presentation

Transcript and Presenter's Notes


1
Sentiment Analysis
  • Veeranna A. Y.
  • 05005023

2
Hindi SentiWordNet
  • Giving sentiment scores to every Hindi synset,
    just as English SentiWordNet does for English.
  • Three scores (pos, neg, and obj) are assigned to
    every synset.
  • Claim: sentiment scores given to English word
    senses also apply to their Hindi counterparts.
  • Hindi-English synset linking work is in progress;
    half of Hindi WordNet has been mapped to English
    WordNet.

3
Building Hindi SentiWordNet
  • Various mappings are required:
  • Hindi-English synset mapping (the mapping to
    English WordNet ver-2.1 is used)
  • ver-2.1 to ver-2.0 synset mapping (SentiWordNet
    is based on ver-2.0)
  • ver-2.0 to SentiWordNet sense number mapping
  • Sense numbers are not the same across all versions
    of English WordNet, hence a mapping of synsets
    across WordNet versions is required.
  • ver-2.0 and SentiWordNet offset numbers do not
    match.
  • Currently 12,408 senses have been linked and given
    senti-scores.
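
The chain of mappings above can be sketched as a composition of lookup tables. This is a minimal illustration only: every identifier and score in the toy tables is hypothetical, and the function name `hindi_senti_scores` is an assumption, not part of any WordNet or SentiWordNet tool.

```python
from typing import Dict, Optional, Tuple

Scores = Tuple[float, float, float]  # (pos, neg, obj)


def hindi_senti_scores(
    hindi_id: str,
    hindi_to_wn21: Dict[str, str],
    wn21_to_wn20: Dict[str, str],
    wn20_to_swn: Dict[str, str],
    swn_scores: Dict[str, Scores],
) -> Optional[Scores]:
    """Follow Hindi -> WordNet 2.1 -> WordNet 2.0 -> SentiWordNet.

    Returns None as soon as any link in the chain is missing, which
    models synsets that have not been mapped yet.
    """
    key: Optional[str] = hindi_id
    for table in (hindi_to_wn21, wn21_to_wn20, wn20_to_swn):
        key = table.get(key)
        if key is None:
            return None
    return swn_scores.get(key)


# Toy tables; all identifiers and scores below are made up for illustration.
HINDI_TO_WN21 = {"hin-001": "wn21-a", "hin-002": "wn21-b"}
WN21_TO_WN20 = {"wn21-a": "wn20-x"}          # "wn21-b" is not mapped yet
WN20_TO_SWN = {"wn20-x": "swn-7"}
SWN_SCORES = {"swn-7": (0.75, 0.0, 0.25)}

linked = hindi_senti_scores("hin-001", HINDI_TO_WN21, WN21_TO_WN20,
                            WN20_TO_SWN, SWN_SCORES)
unlinked = hindi_senti_scores("hin-002", HINDI_TO_WN21, WN21_TO_WN20,
                              WN20_TO_SWN, SWN_SCORES)
```

A Hindi synset receives scores only when every link exists, which is why the count of linked senses is smaller than the size of Hindi WordNet.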

4
Table 1: Difference in sense numbers across WordNet
ver-2.1, ver-2.0, and SentiWordNet
Table 2: Examples from Hindi SentiWordNet

5
Connotation
  • Connotation is a subjective cultural and/or
    emotional coloration in addition to the explicit or
    denotative meaning of any specific word or phrase
    in a language, i.e. the emotional association with a
    word.
  • Connotations are used to express opinions.
  • E.g. "sea of people", "you are a dog", "stock
    market hit a decade low".

6
Feature Engineering
  • The connotation of a word can give a sentence a
    positive, negative, or objective interpretation.
  • E.g. "you are a dog" has a negative
    interpretation of a person.
  • "Every child needs a good home" has a positive
    interpretation of home.
  • How far a word can shift the interpretation of a
    sentence from neutral towards either positive or
    negative can be measured by the Interpretability
    Score (IS), Dan Tufis (FASSBL, 2008).

7
Feature Engineering
  • Here maxP(swk) and maxN(swk) represent the
    highest positive and negative scores among the
    senses of the senti-word swk, and 0 ≤ IS ≤ 1.
  • Currently in SentiWordNet, the senti-words with
    the highest interpretability score (IS = 0.875)
    are pretty, immoral and gross.
  • Words with IS > n (n is a parameter)
    are chosen as feature words for the experiment.
  • Each document is represented as a vector of these
    features.
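
The feature selection described above can be sketched as follows. The sketch assumes IS(swk) = 0.5 * |maxP(swk) + maxN(swk)|, a form that respects 0 ≤ IS ≤ 1; that formula is an assumption here, not taken from the slide. The lexicon entries and their scores are hypothetical placeholders, not real SentiWordNet values.

```python
from collections import Counter
from typing import Dict, List, Tuple

Sense = Tuple[float, float]  # (positive score, negative score) of one sense


def interpretability_score(senses: List[Sense]) -> float:
    """Assumed form: 0.5 * |maxP + maxN| over all senses of a senti-word."""
    max_p = max(p for p, _ in senses)
    max_n = max(n for _, n in senses)
    return 0.5 * abs(max_p + max_n)


def select_features(lexicon: Dict[str, List[Sense]], n: float) -> List[str]:
    """Keep the words whose IS is strictly greater than the threshold n."""
    return sorted(w for w, senses in lexicon.items()
                  if interpretability_score(senses) > n)


def document_vector(tokens: List[str], features: List[str],
                    binary: bool = False) -> List[int]:
    """Represent a document by feature counts, or by 0/1 presence."""
    counts = Counter(tokens)
    if binary:
        return [1 if counts[f] > 0 else 0 for f in features]
    return [counts[f] for f in features]


# Hypothetical lexicon: word -> list of (pos, neg) scores, one per sense.
LEXICON = {
    "alpha": [(0.875, 0.0), (0.0, 0.875)],  # IS = 0.875
    "beta": [(0.25, 0.0)],                  # IS = 0.125
    "gamma": [(0.5, 0.5)],                  # IS = 0.5
}

features = select_features(LEXICON, n=0.2)            # ['alpha', 'gamma']
tokens = "alpha gamma alpha delta".split()
count_vec = document_vector(tokens, features)          # [2, 1]
binary_vec = document_vector(tokens, features, True)   # [1, 1]
```

Raising n shrinks the feature set: with n = 0.5 only "alpha" survives, because the comparison is strict.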

8
Naïve Bayes
  • Find the probability P(c | d) of the class being c
    given that the document is d.
  • The document d is classified as class
    c* = argmax_c P(c | d).
  • Each document d is represented by the document
    vector d = (n1(d), n2(d), ..., nm(d)), where ni(d)
    is the number of times feature fi occurs in
    document d.
  • The document vector can be built in two ways:
    ni(d) takes either the count of the feature or a
    binary value (0 or 1) depending on the presence
    or absence of the feature.
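
A minimal sketch of a multinomial Naive Bayes classifier over such document vectors is given below. Add-one smoothing and the helper names (`train_nb`, `classify_nb`, `binarize`) are assumptions made for illustration; the training vectors are toy data, not the experiment's documents.

```python
import math
from typing import Dict, List, Tuple

Model = Tuple[Dict[str, float], Dict[str, List[float]]]


def train_nb(vectors: List[List[int]], labels: List[str]) -> Model:
    """Estimate log P(c) and log P(f_i | c) with add-one smoothing."""
    m = len(vectors[0])
    log_prior: Dict[str, float] = {}
    log_like: Dict[str, List[float]] = {}
    for c in sorted(set(labels)):
        rows = [v for v, y in zip(vectors, labels) if y == c]
        log_prior[c] = math.log(len(rows) / len(vectors))
        totals = [sum(r[i] for r in rows) for i in range(m)]
        denom = sum(totals) + m  # add-one smoothing (an assumption)
        log_like[c] = [math.log((t + 1) / denom) for t in totals]
    return log_prior, log_like


def classify_nb(model: Model, vector: List[int]) -> str:
    """Return argmax_c of log P(c) + sum_i n_i(d) * log P(f_i | c)."""
    log_prior, log_like = model

    def score(c: str) -> float:
        return log_prior[c] + sum(n * lp for n, lp in zip(vector, log_like[c]))

    return max(log_prior, key=score)


def binarize(vector: List[int]) -> List[int]:
    """Switch n_i(d) from frequency count to binary presence."""
    return [1 if n > 0 else 0 for n in vector]


# Toy training data over two features; purely illustrative.
TRAIN = [[3, 0], [2, 1], [0, 2], [1, 4]]
LABELS = ["pos", "pos", "neg", "neg"]

count_model = train_nb(TRAIN, LABELS)
binary_model = train_nb([binarize(v) for v in TRAIN], LABELS)
```

Training once on raw counts and once on binarized vectors gives the two variants of ni(d) that the slide compares.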

9
Results
  • The experiment used 1000 positive and 1000
    negative documents.
  • Testing was done with unigrams only and with
    unigrams + multi-grams.
  • It was observed that accuracy increased as the
    number of features decreased.
  • Naive Bayes gives higher accuracy when the
    document vector is taken as frequency count than
    when taken as binary presence value.
  • The highest accuracy of 90.2% was achieved with
    unigrams + multi-grams as features and ni(d) set
    to frequency count, using IS > 0.5 for unigrams
    (500 unigrams) and IS > 0.2 for multi-grams (1740
    multi-grams).

10
SVM
  • The idea behind the training procedure is to find
    a hyperplane, represented by vector w, that not
    only separates the document vectors (dj) in one
    class (cj) from those in the other, but for which
    the separation, or margin, is as large as
    possible.
  • The solution can be written as
    w = Σj αj cj dj, with αj ≥ 0 and cj ∈ {1, -1},
    where the αj's are obtained by solving a
    dual optimization problem.
  • An SVM package was used for training and
    testing, with all parameters set to their default
    values.
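
As a small illustration of how the dual coefficients relate to the hyperplane, the sketch below builds w as the weighted sum of training vectors, w = Σj αj cj dj with cj in {1, -1}, and classifies by the sign of w · d. The αj values here are supplied by hand, not obtained from an optimizer, and the function names and the optional bias `b` are assumptions for illustration.

```python
from typing import List


def hyperplane(alphas: List[float], labels: List[int],
               docs: List[List[float]]) -> List[float]:
    """Combine training vectors into w = sum_j alpha_j * c_j * d_j."""
    w = [0.0] * len(docs[0])
    for alpha, c, d in zip(alphas, labels, docs):
        for i, value in enumerate(d):
            w[i] += alpha * c * value
    return w


def classify_svm(w: List[float], d: List[float], b: float = 0.0) -> int:
    """Return 1 or -1 depending on the side of the hyperplane d falls on."""
    margin = sum(wi * di for wi, di in zip(w, d)) + b
    return 1 if margin >= 0 else -1


# Two toy documents over two features; the alphas are hand-picked.
DOCS = [[1.0, 0.0], [0.0, 1.0]]
CLASSES = [1, -1]
ALPHAS = [1.0, 1.0]

w = hyperplane(ALPHAS, CLASSES, DOCS)  # [1.0, -1.0]
```

Only documents with αj > 0 contribute to w; these are the support vectors.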

11
Results
  • The experiment used 1000 positive and 1000
    negative documents.
  • Testing was done with unigrams only and with
    unigrams + multi-grams.
  • SVM gives better accuracy when the document vector
    is taken as binary presence value than when taken
    as frequency count.
  • An accuracy of 73.35% was achieved with
    unigrams + multi-grams as features and ni(d) set
    to frequency count, using IS > 0.5 for unigrams
    (500 unigrams) and IS > 0.1 for multi-grams (7215
    multi-grams).
  • An accuracy of 74.55% was achieved with
    unigrams + multi-grams as features and ni(d) set
    to binary presence value, using IS > 0.5 for
    unigrams (500 unigrams) and IS > 0.2 for
    multi-grams (1740 multi-grams).

12
Minimum Graph Cut
  • The xi's are sentences in a document.
  • Framework:
  • indj(xi): estimate that xi belongs to class Cj,
    based on features present in the sentence only
    (individual score).
  • assoc(xi, xk): estimate that xi and xk belong to
    the same class (association score).
  • The problem of finding the minimum partition cost
    reduces to finding a minimum cut in a graph.
  • Minimum cut in a graph: find a cut through edges
    of the graph such that the partitioned components
    are connected graphs and the sum of the weights
    of the edges lying on the cut is minimum.
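
The reduction to a minimum cut can be sketched with a small max-flow routine (Edmonds-Karp). The graph construction assumed here follows the usual source/sink layout: a source s is linked to each sentence xi with weight ind1(xi), each xi is linked to a sink t with weight ind2(xi), and sentence pairs are linked with weight assoc(xi, xk). All scores in the example are made up, and `min_cut_partition` is a hypothetical name.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple


def min_cut_partition(ind1: List[float], ind2: List[float],
                      assoc: Dict[Tuple[int, int], float]
                      ) -> Tuple[Set[int], float]:
    """Return (indices of sentences assigned to C1, cost of the cut)."""
    s, t = "s", "t"
    cap = defaultdict(lambda: defaultdict(float))  # residual capacities
    for i in range(len(ind1)):
        cap[s][i] += ind1[i]   # this edge is cut if x_i ends up in C2
        cap[i][t] += ind2[i]   # this edge is cut if x_i ends up in C1
    for (i, k), weight in assoc.items():
        cap[i][k] += weight    # cut if x_i and x_k are separated
        cap[k][i] += weight

    def reachable() -> dict:
        """Breadth-first search from s over edges with residual capacity."""
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0.0
    parent = reachable()
    while t in parent:
        # Walk back from t to s to recover the augmenting path.
        path = []
        v = t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push
        flow += push
        parent = reachable()

    # Nodes still reachable from s form the source side of the minimum cut.
    return {v for v in parent if v != s}, flow


# Three sentences with made-up scores: x0 leans to C1, x2 leans to C2,
# x1 is undecided on its own but strongly associated with x0.
IND1 = [0.9, 0.5, 0.1]
IND2 = [0.1, 0.5, 0.9]
ASSOC = {(0, 1): 1.0, (1, 2): 0.1}

c1_sentences, cut_cost = min_cut_partition(IND1, IND2, ASSOC)
```

In this toy case the association score pulls the undecided sentence x1 into the same class as x0, which is the effect the association term is meant to have.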

13
  • Fig.: Graph-cut-based subjectivity extraction
  • Courtesy: Lillian Lee et al., ACL 2004.

14
  • Fig.: Polarity classification via subjectivity
    detection
  • Courtesy: Lillian Lee et al., ACL 2004.

15
Results
  • The experiment used 1000 positive and 1000
    negative documents, and 5000 subjective and 5000
    objective sentences.
  • Either Naive Bayes or SVM, along with the graph
    cut method, can be used as the subjectivity
    detector in the initial phase to get a subjective
    extract of the documents.
  • Similarly, either can be used as the polarity
    detector to classify the subjective extract
    obtained in the initial phase.
  • An accuracy of 92.4% was achieved with Naive Bayes
    + graph cut as the subjectivity detector, to get
    the subjective extract, and Naive Bayes again as
    the polarity detector, to get the final
    classification. Other parameters: unigrams +
    multi-grams as features, ni(d) set to frequency
    count, IS > 0.5 for unigrams (500 unigrams), and
    IS > 0.2 for multi-grams (1740 multi-grams).
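
The two-phase arrangement can be sketched as a simple composition: a subjectivity detector filters the sentences, and a polarity detector classifies what remains. The cue-word detectors below are toy stand-ins for the trained Naive Bayes, SVM, and graph-cut components, and every name and word list is hypothetical.

```python
from typing import Callable, List


def two_stage_classify(sentences: List[str],
                       is_subjective: Callable[[str], bool],
                       polarity_of: Callable[[List[str]], str]) -> str:
    """Phase 1: keep subjective sentences. Phase 2: classify the extract."""
    extract = [s for s in sentences if is_subjective(s)]
    if not extract:
        # Fall back to the full document (an assumption of this sketch).
        extract = sentences
    return polarity_of(extract)


# Toy stand-ins for the trained detectors; the cue lists are made up.
POSITIVE_CUES = {"good", "great"}
NEGATIVE_CUES = {"bad", "awful"}


def toy_is_subjective(sentence: str) -> bool:
    words = set(sentence.lower().split())
    return bool(words & (POSITIVE_CUES | NEGATIVE_CUES))


def toy_polarity(sentences: List[str]) -> str:
    words = [w for s in sentences for w in s.lower().split()]
    pos = sum(w in POSITIVE_CUES for w in words)
    neg = sum(w in NEGATIVE_CUES for w in words)
    return "positive" if pos >= neg else "negative"


DOCUMENT = [
    "The film runs two hours",
    "The acting is great",
    "The ending is awful",
    "A good score",
]

label = two_stage_classify(DOCUMENT, toy_is_subjective, toy_polarity)
```

Because the detectors are passed in as functions, any combination of subjectivity and polarity detector can be plugged into the same two phases.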

16
Comparison with existing algorithms without
SentiWordNet
  • Table 3: Comparison of accuracies of existing
    methods with and without using SentiWordNet
  • The comparison is done only for the best accuracies
    achieved in both cases, and the parameters vary
    between the two cases.
  • There is an increase in accuracy from including
    SentiWordNet in existing algorithms for
    Sentiment Analysis.
  • The number of features has been reduced compared
    with previous methods.

17
Conclusion
  • Direct mapping of English sentiment scores to
    Hindi synsets is justified when the Hindi-English
    synset mapping and the mapping of synset numbers
    across versions of WordNet are justified.
  • The main aim of SentiWordNet is to capture the
    intricacies of texts. Hence SentiWordNet gives
    sentiment scores to each synset rather than just
    a hard binary classification of synsets. One such
    intricacy of texts is connotation, which has been
    captured by giving each synset an
    Interpretability Score.
  • Sentiment Analysis based on Connotation Analysis
    shows good results. Connotations are good at
    capturing positive, negative, and objective
    interpretations of texts.
  • There is an increase in accuracy from including
    SentiWordNet and connotation in existing
    algorithms for Sentiment Analysis.

18
References
  • Bo Pang, Lillian Lee, and Shivakumar
    Vaithyanathan. Thumbs up? Sentiment
    Classification using Machine Learning Techniques.
    In EMNLP 2002.
  • Bo Pang and Lillian Lee. A Sentimental Education:
    Sentiment Analysis Using Subjectivity
    Summarization Based on Minimum Cuts. In
    Proceedings of ACL 2004.
  • Dan Tufis. Connotation Analysis. FASSBL, 2008.
  • J. Ramanand, Akshay Ukey, Brahm Kiran Singh, and
    Pushpak Bhattacharyya. Mapping and Structural
    Analysis of Multilingual WordNet. IEEE Data
    Engineering Bulletin, 30(1), March 2007.

19
  • Thank You
  • Questions?