NLTK Sentiment Analysis - PowerPoint PPT Presentation

About This Presentation
Title:

NLTK Sentiment Analysis

Description:

Learntek is a global online training provider for Big Data Analytics, Hadoop, Machine Learning, Deep Learning, IoT, AI, Cloud Technology, DevOps, Digital Marketing and other IT and management courses.

Slides: 30
Provided by: learntek12
Tags: nltk

Transcript and Presenter's Notes

Title: NLTK Sentiment Analysis


1
  • NLTK Sentiment Analysis

3
NLTK Sentiment Analysis: About NLTK
The Natural Language Toolkit, or more commonly NLTK, is a suite of
libraries and programs for symbolic and statistical natural language
processing (NLP) for English, written in the Python programming
language. It was developed by Steven Bird and Edward Loper in the
Department of Computer and Information Science at the University of
Pennsylvania.
Copyright © 2019 Learntek. All Rights Reserved.
4
Sentiment Analysis
Sentiment analysis is a branch of computer science that overlaps
heavily with machine learning and computational linguistics. It is a
common text classification task: a classifier analyses an incoming
message and tells whether the underlying sentiment is positive,
negative, or neutral. It is the process of computationally identifying
and categorizing opinions expressed in a piece of text, especially in
order to determine whether the writer's attitude towards a particular
topic, product, etc. is positive, negative, or neutral.
5
Sentiment analysis is a concept from natural language processing. It is
sometimes referred to as opinion mining, although the emphasis in that
case is on extraction.
6
  • Examples of questions that sentiment analysis can answer are as
    follows:
  • Is this product review positive or negative?
  • Does this customer email express satisfaction or dissatisfaction?
  • Based on a sample of tweets, how are people
    responding to this ad campaign/product
    release/news item?
  • How have bloggers' attitudes about the president
    changed since the election?
  • The purpose of this sentiment analysis is to
    automatically classify a tweet as positive or
    negative.

7
  • Given a movie review or a tweet, it can be
    automatically classified into categories. These
    categories can be user defined (positive,
    negative) or whichever classes you want.
  • Sentiment Analysis for Brand Monitoring
  • Sentiment Analysis for Customer Service
  • Sentiment Analysis for Market Research and
    Analysis

9
  • Sample Positive Tweets
  • I love this car
  • This view is amazing
  • I feel great this morning
  • I am so excited about the concert
  • He is my best friend
  • Sample Negative Tweets
  • I do not like this car
  • This view is horrible
  • I feel tired this morning
  • I am not looking forward to the concert
  • He is my enemy

10
  • Sentiment Analysis Process
  • The list of word features needs to be extracted
    from the tweets.
  • It is a list of every distinct word, ordered by
    frequency of appearance.
  • A feature extractor is then used to decide which
    features are relevant.
  • The one we are going to use returns a dictionary
    indicating which words are contained in the input
    passed.
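
The extraction steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the presentation's own code: the names `tweets`, `word_features`, and `extract_features` are hypothetical, `collections.Counter` stands in for a frequency distribution, and whitespace splitting stands in for a real tokenizer.

```python
from collections import Counter

# Sample tweets from the slides, paired with their labels.
tweets = [
    ("I love this car", "positive"),
    ("This view is amazing", "positive"),
    ("I feel great this morning", "positive"),
    ("I am so excited about the concert", "positive"),
    ("He is my best friend", "positive"),
    ("I do not like this car", "negative"),
    ("This view is horrible", "negative"),
    ("I feel tired this morning", "negative"),
    ("I am not looking forward to the concert", "negative"),
    ("He is my enemy", "negative"),
]

# Count every lowercased word across all tweets (naive whitespace split).
word_counts = Counter(
    word for text, _label in tweets for word in text.lower().split()
)

# Every distinct word, ordered by frequency of appearance.
word_features = [word for word, _count in word_counts.most_common()]


def extract_features(text):
    """Return a dict saying, for each known word, whether the text contains it."""
    words = set(text.lower().split())
    return {f"contains({word})": word in words for word in word_features}


features = extract_features("I love this view")
print(word_features[:5])
print(features["contains(love)"], features["contains(enemy)"])
```

Each tweet becomes a dictionary with one boolean entry per known word, which is the shape of input a feature-based classifier expects.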

12
  • Naive Bayes Classifier
  • It uses the prior probability of each label,
    which is the frequency of each label in the
    training set, and the contribution from each
    feature.
  • In our case, the frequency of each label is the
    same for positive and negative.
  • The word "amazing" appears in 1 of the 5 positive
    tweets and in none of the negative tweets.
  • This means that the likelihood of the positive
    label will be multiplied by 0.2 when this word is
    seen as part of the input.
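
The arithmetic behind these bullets can be checked with a few lines of Python. This is a simplified sketch using raw, unsmoothed counts; NLTK's own classifier applies smoothing, so its internal numbers will differ, and the helper name `fraction_containing` is hypothetical.

```python
positive_tweets = [
    "I love this car",
    "This view is amazing",
    "I feel great this morning",
    "I am so excited about the concert",
    "He is my best friend",
]
negative_tweets = [
    "I do not like this car",
    "This view is horrible",
    "I feel tired this morning",
    "I am not looking forward to the concert",
    "He is my enemy",
]


def fraction_containing(word, tweets):
    """Fraction of tweets that contain the word (naive whitespace split)."""
    hits = sum(1 for text in tweets if word in text.lower().split())
    return hits / len(tweets)


total = len(positive_tweets) + len(negative_tweets)

# Prior probability of each label: its frequency in the training set.
prior_pos = len(positive_tweets) / total
prior_neg = len(negative_tweets) / total

# Contribution of the feature "amazing" to each label.
likelihood_pos = fraction_containing("amazing", positive_tweets)
likelihood_neg = fraction_containing("amazing", negative_tweets)

# Unnormalized score per label: prior multiplied by the feature likelihood.
score_pos = prior_pos * likelihood_pos
score_neg = prior_neg * likelihood_neg

print(prior_pos, prior_neg)
print(likelihood_pos, likelihood_neg)
print(score_pos, score_neg)
```

With equal priors, the label whose tweets contain "amazing" more often ends up with the higher score, which is why the word pushes the classifier towards the positive label.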

13
  • Sentiment Analysis Example 1
  • Training Data
  • This is a good book! Positive
  • This is a awesome book! Positive
  • This is a bad book! Negative
  • This is a terrible book! Negative
  • Testing Data
  • This is a good article
  • This is a bad article

14
We will train the model on the training data using a Naive Bayes
classifier, and then test the model on the testing data.
15
>>> import nltk
>>> def form_sent(sent):
...     # Map every token in the sentence to True
...     return {word: True for word in nltk.word_tokenize(sent)}
...
>>> form_sent("This is a good book")
{'This': True, 'is': True, 'a': True, 'good': True, 'book': True}
>>> s1 = 'This is a good book'
>>> s2 = 'This is a awesome book'
>>> s3 = 'This is a bad book'
>>> s4 = 'This is a terrible book'
>>> training_data = [[form_sent(s1), 'pos'], [form_sent(s2), 'pos'],
...                  [form_sent(s3), 'neg'], [form_sent(s4), 'neg']]
>>> for t in training_data: print(t)
...
[{'This': True, 'is': True, 'a': True, 'good': True, 'book': True}, 'pos']
[{'This': True, 'is': True, 'a': True, 'awesome': True, 'book': True}, 'pos']
16
[{'This': True, 'is': True, 'a': True, 'bad': True, 'book': True}, 'neg']
[{'This': True, 'is': True, 'a': True, 'terrible': True, 'book': True}, 'neg']
>>> from nltk.classify import NaiveBayesClassifier
>>> model = NaiveBayesClassifier.train(training_data)
>>> model.classify(form_sent('This is a good article'))
'pos'
>>> model.classify(form_sent('This is a bad article'))
'neg'
18
Accuracy
NLTK has a built-in function that computes the accuracy rate of our
model:
>>> from nltk.classify.util import accuracy
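
As a rough sketch of how that function might be applied to Example 1: the labels attached to the two test sentences are an assumption (the slides list them unlabeled), and whitespace splitting replaces `nltk.word_tokenize` here so that no tokenizer data needs to be downloaded.

```python
from nltk.classify import NaiveBayesClassifier
from nltk.classify.util import accuracy


def form_sent(sent):
    # Assumption: plain whitespace splitting instead of nltk.word_tokenize.
    return {word: True for word in sent.split()}


# Training data from Example 1.
training_data = [
    (form_sent("This is a good book"), "pos"),
    (form_sent("This is a awesome book"), "pos"),
    (form_sent("This is a bad book"), "neg"),
    (form_sent("This is a terrible book"), "neg"),
]

# Testing data from Example 1, with assumed gold labels.
test_data = [
    (form_sent("This is a good article"), "pos"),
    (form_sent("This is a bad article"), "neg"),
]

model = NaiveBayesClassifier.train(training_data)

# accuracy() classifies each test feature set and returns the fraction
# of predictions that match the gold labels.
score = accuracy(model, test_data)
print(score)
```

A test set of two sentences is far too small to say anything about real-world performance; it only shows the call pattern.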
Sentiment Analysis Example 2: Gender Identification
We know that male and female names have some distinctive
characteristics. Generally, names ending in a, e and i are likely to be
female, while names ending in k, o, r, s and t are likely to be male.
We build a classifier to model these differences more precisely.
19
>>> def gender_features(word):
...     # The only feature is the final letter of the name
...     return {'last_letter': word[-1]}
...
>>> gender_features('Shrek')
{'last_letter': 'k'}
Now that we've defined a feature extractor, we need to prepare a list
of examples and corresponding class labels.
>>> from nltk.corpus import names
>>> labeled_names = ([(name, 'male') for name in names.words('male.txt')] +
...                  [(name, 'female') for name in names.words('female.txt')])
>>> import random
>>> random.shuffle(labeled_names)
20
Next, the feature extractor is used to process the names data, and the
resulting list of feature sets is divided into a training set and a
test set. The training set is used to train a new naive Bayes
classifier.
>>> featuresets = [(gender_features(n), gender) for (n, gender) in labeled_names]
>>> train_set, test_set = featuresets[500:], featuresets[:500]
>>> classifier = nltk.NaiveBayesClassifier.train(train_set)
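
The slicing in the split can be illustrated on a small scale. The sketch below uses a short hand-written name list as a hypothetical stand-in for the NLTK names corpus and holds out 3 items instead of 500; `test_size` and the seeded `random.Random` instance are assumptions added for reproducibility.

```python
import random


def gender_features(word):
    # Same single-feature extractor as in the slides.
    return {"last_letter": word[-1]}


# Hypothetical sample standing in for names.words('male.txt') and
# names.words('female.txt').
labeled_names = [
    ("Anna", "female"), ("Maria", "female"), ("Sophie", "female"),
    ("Emily", "female"), ("Naomi", "female"),
    ("Mark", "male"), ("Pedro", "male"), ("Peter", "male"),
    ("Thomas", "male"), ("Robert", "male"),
]

# Shuffle so that both sets contain a mix of labels.
rng = random.Random(0)
rng.shuffle(labeled_names)

featuresets = [(gender_features(n), gender) for (n, gender) in labeled_names]

# The first test_size items are held out for testing; the rest are
# used for training (the slides use 500 here).
test_size = 3
train_set, test_set = featuresets[test_size:], featuresets[:test_size]

print(len(train_set), len(test_set))
```

Because the two slices do not overlap, no name used for evaluation is seen during training.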
22
Let's just test it out on some names that did not appear in its
training data:
>>> classifier.classify(gender_features('Neo'))
'male'
>>> classifier.classify(gender_features('olvin'))
'male'
>>> classifier.classify(gender_features('ricky'))
'female'
>>> classifier.classify(gender_features('serena'))
'female'
23
>>> classifier.classify(gender_features('cyra'))
'female'
>>> classifier.classify(gender_features('leeta'))
'female'
>>> classifier.classify(gender_features('rock'))
'male'
>>> classifier.classify(gender_features('jack'))
'male'
24
We can systematically evaluate the classifier on a much larger quantity
of unseen data:
>>> print(nltk.classify.accuracy(classifier, test_set))
0.77
Finally, we can examine the classifier to determine which features it
found most effective for distinguishing the names' genders:
25
>>> classifier.show_most_informative_features(20)
Most Informative Features
    last_letter = 'a'    female : male   = 35.5 : 1.0
    last_letter = 'k'      male : female = 30.7 : 1.0
    last_letter = 'p'      male : female = 20.8 : 1.0
    last_letter = 'f'      male : female = 15.9 : 1.0
    last_letter = 'd'      male : female = 11.5 : 1.0
    last_letter = 'v'      male : female =  9.8 : 1.0
26
    last_letter = 'o'      male : female =  8.7 : 1.0
    last_letter = 'w'      male : female =  8.4 : 1.0
    last_letter = 'm'      male : female =  8.2 : 1.0
    last_letter = 'r'      male : female =  7.0 : 1.0
    last_letter = 'g'      male : female =  5.1 : 1.0
    last_letter = 'b'      male : female =  4.4 : 1.0
    last_letter = 's'      male : female =  4.3 : 1.0
27
    last_letter = 'z'      male : female =  3.9 : 1.0
    last_letter = 'j'      male : female =  3.9 : 1.0
    last_letter = 't'      male : female =  3.8 : 1.0
    last_letter = 'i'    female : male   =  3.8 : 1.0
    last_letter = 'u'      male : female =  3.0 : 1.0
    last_letter = 'n'      male : female =  2.1 : 1.0
    last_letter = 'e'    female : male   =  1.8 : 1.0
29
For more training information, contact us:
Email: info@learntek.org
USA: 1734 418 2465
INDIA: 40 4018 1306, 7799713624