NLTK Sentiment Analysis - PowerPoint PPT Presentation

About This Presentation

Title:

NLTK Sentiment Analysis

Description:

Learntek is a global online training provider for Big Data Analytics, Hadoop, Machine Learning, Deep Learning, IoT, AI, Cloud Technology, DevOps, Digital Marketing and other IT and management courses.

Number of Views:1050

Slides: 30

Provided by: learntek12

Category: How To, Education & Training

Tags: nltk


Transcript and Presenter's Notes


1

NLTK Sentiment Analysis


3
About NLTK
The Natural Language Toolkit, more commonly known as NLTK, is a suite
of libraries and programs for symbolic and statistical natural
language processing (NLP) of English, written in the Python
programming language. It was developed by Steven Bird and Edward Loper
in the Department of Computer and Information Science at the
University of Pennsylvania.
Copyright © 2019 Learntek. All Rights Reserved.
4
Sentiment Analysis
Sentiment analysis is a branch of computer science that overlaps
heavily with machine learning and computational linguistics. It is one
of the most common text classification tools: it analyses an incoming
message and tells whether the underlying sentiment is positive,
negative or neutral. It is the process of computationally identifying
and categorizing opinions expressed in a piece of text, especially in
order to determine whether the writer's attitude towards a particular
topic, product, etc. is positive, negative, or neutral.
5
Sentiment analysis is a concept from natural language processing and
is sometimes referred to as opinion mining, although the emphasis in
this case is on extraction.
6

Examples of sentiment analysis questions are as follows:
Is this product review positive or negative?
Is this customer email satisfied or dissatisfied?
Based on a sample of tweets, how are people responding to this ad campaign/product release/news item?
How have bloggers' attitudes about the president changed since the election?
The purpose of this sentiment analysis is to automatically classify a
tweet as positive or negative.

Given a movie review or a tweet, it can be automatically classified
into categories. These categories can be user defined (positive,
negative) or whichever classes you want.
Sentiment Analysis for Brand Monitoring
Sentiment Analysis for Customer Service
Sentiment Analysis for Market Research and Analysis

9

Sample Positive Tweets
I love this car
This view is amazing
I feel great this morning
I am so excited about the concert
He is my best friend
Sample Negative Tweets
I do not like this car
This view is horrible
I feel tired this morning
I am not looking forward to the concert
He is my enemy

Sentiment Analysis Process
The list of word features needs to be extracted from the tweets.
It is a list of every distinct word, ordered by frequency of appearance.
A feature extractor is then used to decide which features are relevant.
The one we are going to use returns a dictionary indicating which
words are contained in the input passed.
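
The two steps above can be sketched in plain Python. This is only a minimal illustration using the sample tweets from the slides: the helper names (tokenize, word_features, extract_features), the "contains(word)" key format and the lowercase whitespace tokenizer are our assumptions, not something the slides specify.

```python
from collections import Counter

# Sample tweets taken from the slides.
tweets = [
    ("I love this car", "positive"),
    ("This view is amazing", "positive"),
    ("I feel great this morning", "positive"),
    ("I am so excited about the concert", "positive"),
    ("He is my best friend", "positive"),
    ("I do not like this car", "negative"),
    ("This view is horrible", "negative"),
    ("I feel tired this morning", "negative"),
    ("I am not looking forward to the concert", "negative"),
    ("He is my enemy", "negative"),
]


def tokenize(text):
    # Assumption: lowercase whitespace splitting stands in for a real tokenizer.
    return text.lower().split()


# Every distinct word, ordered by frequency of appearance (most frequent first).
word_counts = Counter(word for text, _ in tweets for word in tokenize(text))
word_features = [word for word, _ in word_counts.most_common()]


def extract_features(text):
    # Dictionary indicating which of the known words are contained in the input.
    words = set(tokenize(text))
    return {f"contains({word})": (word in words) for word in word_features}


features = extract_features("I love this view")
print(word_features[:5])
print(features["contains(love)"], features["contains(car)"])
```

Each tweet is reduced to the same fixed set of boolean features, which is the shape of input a Naive Bayes classifier expects.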

12

Naive Bayes Classifier
It uses the prior probability of each label, which is the frequency of
each label in the training set, and the contribution from each feature.
In our case, the frequency of each label is the same for positive and
negative.
The word "amazing" appears in 1 of 5 of the positive tweets and in none
of the negative tweets.
This means that the likelihood of the positive label will be
multiplied by 0.2 when this word is seen as part of the input.
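
The arithmetic above can be checked with a few lines of plain Python. This is a sketch using unsmoothed relative frequencies over the sample tweets; the helper names (prior, likelihood) are ours. Note that NLTK's NaiveBayesClassifier smooths its estimates by default, so its internal numbers will differ somewhat from these raw fractions.

```python
# Sample tweets from the slides, grouped by label.
training = {
    "positive": [
        "I love this car",
        "This view is amazing",
        "I feel great this morning",
        "I am so excited about the concert",
        "He is my best friend",
    ],
    "negative": [
        "I do not like this car",
        "This view is horrible",
        "I feel tired this morning",
        "I am not looking forward to the concert",
        "He is my enemy",
    ],
}

total_tweets = sum(len(tweets) for tweets in training.values())


def prior(label):
    # Prior probability: frequency of the label in the training set.
    return len(training[label]) / total_tweets


def likelihood(word, label):
    # Unsmoothed fraction of tweets with this label that contain the word.
    tweets = training[label]
    hits = sum(1 for tweet in tweets if word in tweet.lower().split())
    return hits / len(tweets)


print(prior("positive"), prior("negative"))
print(likelihood("amazing", "positive"), likelihood("amazing", "negative"))
# Contribution of the word "amazing" to the (unnormalised) positive score.
print(prior("positive") * likelihood("amazing", "positive"))
```

The zero likelihood for the negative label shows why smoothing matters in practice: without it, a single unseen word would rule a label out completely.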

Sentiment Analysis Example 1
Training Data
This is a good book! Positive
This is a awesome book! Positive
This is a bad book! Negative
This is a terrible book! Negative

Testing Data
This is a good article
This is a bad article

14
We will train the model on the training data using a Naive Bayes
classifier, and then test the model on the testing data.
15
>>> import nltk
>>> def form_sent(sent):
...     return {word: True for word in nltk.word_tokenize(sent)}
...
>>> form_sent("This is a good book")
{'This': True, 'is': True, 'a': True, 'good': True, 'book': True}
>>> s1 = 'This is a good book'
>>> s2 = 'This is a awesome book'
>>> s3 = 'This is a bad book'
>>> s4 = 'This is a terrible book'
>>> training_data = [(form_sent(s1), 'pos'), (form_sent(s2), 'pos'),
...                  (form_sent(s3), 'neg'), (form_sent(s4), 'neg')]
>>> for t in training_data: print(t)
...
({'This': True, 'is': True, 'a': True, 'good': True, 'book': True}, 'pos')
({'This': True, 'is': True, 'a': True, 'awesome': True, 'book': True}, 'pos')
16
({'This': True, 'is': True, 'a': True, 'bad': True, 'book': True}, 'neg')
({'This': True, 'is': True, 'a': True, 'terrible': True, 'book': True}, 'neg')
>>> from nltk.classify import NaiveBayesClassifier
>>> model = NaiveBayesClassifier.train(training_data)
>>> model.classify(form_sent('This is a good article'))
'pos'
>>> model.classify(form_sent('This is a bad article'))
'neg'
18
Accuracy
NLTK has a built-in function that computes the accuracy of our model:
>>> from nltk.classify.util import accuracy
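
As a minimal sketch of how this function can be called, the book example from Example 1 can be scored as follows. Two things here are our assumptions rather than part of the slides: the gold labels attached to the two test sentences (the slides list them unlabeled), and the use of str.split() in place of nltk.word_tokenize so that no tokenizer data has to be downloaded.

```python
from nltk.classify import NaiveBayesClassifier
from nltk.classify.util import accuracy


def form_sent(sent):
    # Assumption: str.split() replaces nltk.word_tokenize for this sketch.
    return {word: True for word in sent.split()}


training_data = [
    (form_sent("This is a good book"), "pos"),
    (form_sent("This is a awesome book"), "pos"),
    (form_sent("This is a bad book"), "neg"),
    (form_sent("This is a terrible book"), "neg"),
]

# Assumption: gold labels for the test sentences are added by hand.
test_data = [
    (form_sent("This is a good article"), "pos"),
    (form_sent("This is a bad article"), "neg"),
]

model = NaiveBayesClassifier.train(training_data)

# accuracy() classifies every feature set in test_data and returns the
# fraction of predictions that match the gold label.
acc = accuracy(model, test_data)
print(acc)
```

With only two test sentences the number says very little; a realistic evaluation needs a much larger held-out set, as in the gender example later in the slides.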
Sentiment Analysis Example 2: Gender Identification
We know that male and female names have some distinctive
characteristics. Generally, names ending in a, e and i are likely to
be female, while names ending in k, o, r, s and t are likely to be
male. We build a classifier to model these differences more precisely.
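
For comparison, the rule of thumb above can be written down directly as a hand-coded baseline. This is only a sketch: the function name guess_gender_by_rule and the "unknown" fallback are our assumptions, and the point of the classifier is precisely to replace such fixed rules with weights learned from data.

```python
# Letter groups taken from the rule of thumb in the slide.
FEMALE_ENDINGS = set("aei")
MALE_ENDINGS = set("korst")


def guess_gender_by_rule(name):
    # Look only at the last letter, ignoring case.
    last_letter = name[-1].lower()
    if last_letter in FEMALE_ENDINGS:
        return "female"
    if last_letter in MALE_ENDINGS:
        return "male"
    # Assumption: names ending in any other letter are left undecided.
    return "unknown"


for name in ["Serena", "Jack", "Shrek", "John"]:
    print(name, guess_gender_by_rule(name))
```

A fixed rule like this cannot say how strongly each letter points one way or the other, and it has no answer for letters outside the two groups.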
19
>>> def gender_features(word):
...     return {'last_letter': word[-1]}
...
>>> gender_features('Shrek')
{'last_letter': 'k'}
Now that we've defined a feature extractor, we need to prepare a list
of examples and corresponding class labels.
>>> from nltk.corpus import names
>>> labeled_names = ([(name, 'male') for name in names.words('male.txt')] +
...                  [(name, 'female') for name in names.words('female.txt')])
>>> import random
>>> random.shuffle(labeled_names)
20
Next, the feature extractor is used to process the names data, and the
resulting list of feature sets is divided into a training set and a
test set. The training set is used to train a new Naive Bayes
classifier.
>>> import nltk
>>> featuresets = [(gender_features(n), gender) for (n, gender) in labeled_names]
>>> train_set, test_set = featuresets[500:], featuresets[:500]
>>> classifier = nltk.NaiveBayesClassifier.train(train_set)
22
Let's just test it out on some names that did not appear in its
training data:
>>> classifier.classify(gender_features('Neo'))
'male'
>>> classifier.classify(gender_features('olvin'))
'male'
>>> classifier.classify(gender_features('ricky'))
'female'
>>> classifier.classify(gender_features('serena'))
'female'
23
>>> classifier.classify(gender_features('cyra'))
'female'
>>> classifier.classify(gender_features('leeta'))
'female'
>>> classifier.classify(gender_features('rock'))
'male'
>>> classifier.classify(gender_features('jack'))
'male'
24
We can systematically evaluate the classifier on a much larger
quantity of unseen data:
>>> print(nltk.classify.accuracy(classifier, test_set))
0.77
Finally, we can examine the classifier to determine which features it
found most effective for distinguishing the names' genders:
25
>>> classifier.show_most_informative_features(20)
Most Informative Features
             last_letter = 'a'            female : male   =     35.5 : 1.0
             last_letter = 'k'              male : female =     30.7 : 1.0
             last_letter = 'p'              male : female =     20.8 : 1.0
             last_letter = 'f'              male : female =     15.9 : 1.0
             last_letter = 'd'              male : female =     11.5 : 1.0
             last_letter = 'v'              male : female =      9.8 : 1.0
26
             last_letter = 'o'              male : female =      8.7 : 1.0
             last_letter = 'w'              male : female =      8.4 : 1.0
             last_letter = 'm'              male : female =      8.2 : 1.0
             last_letter = 'r'              male : female =      7.0 : 1.0
             last_letter = 'g'              male : female =      5.1 : 1.0
             last_letter = 'b'              male : female =      4.4 : 1.0
             last_letter = 's'              male : female =      4.3 : 1.0
27
             last_letter = 'z'              male : female =      3.9 : 1.0
             last_letter = 'j'              male : female =      3.9 : 1.0
             last_letter = 't'              male : female =      3.8 : 1.0
             last_letter = 'i'            female : male   =      3.8 : 1.0
             last_letter = 'u'              male : female =      3.0 : 1.0
             last_letter = 'n'              male : female =      2.1 : 1.0
             last_letter = 'e'            female : male   =      1.8 : 1.0
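
Each row of this listing is a likelihood ratio: how many times more likely a given last letter is under one label than under the other. The sketch below shows the idea on a tiny hand-made list of names. Everything in it is an assumption made for illustration: the names are hypothetical toy data (not the NLTK names corpus), the helper names are ours, and the additive smoothing is similar in spirit to, but not identical with, what NLTK does internally.

```python
# Hypothetical toy data, not the NLTK names corpus.
labeled_names = [
    ("Anna", "female"), ("Maria", "female"), ("Sara", "female"),
    ("Emma", "female"), ("Ruth", "female"),
    ("Jack", "male"), ("Mark", "male"), ("Erik", "male"),
    ("Luca", "male"), ("Tom", "male"),
]

# All last letters that occur anywhere in the toy data.
letters = sorted({name[-1].lower() for name, _ in labeled_names})


def last_letter_prob(letter, label, gamma=0.5):
    # Smoothed estimate of P(last_letter = letter | label).
    names = [name for name, gender in labeled_names if gender == label]
    hits = sum(1 for name in names if name[-1].lower() == letter)
    return (hits + gamma) / (len(names) + gamma * len(letters))


def likelihood_ratio(letter):
    # Returns (more likely label, less likely label, ratio of the two).
    p_female = last_letter_prob(letter, "female")
    p_male = last_letter_prob(letter, "male")
    if p_female >= p_male:
        return ("female", "male", p_female / p_male)
    return ("male", "female", p_male / p_female)


for letter in ["a", "k"]:
    high, low, ratio = likelihood_ratio(letter)
    print(f"last_letter = {letter!r}  {high} : {low} = {ratio:.1f} : 1.0")
```

Read this way, the first row of the real listing says that names ending in "a" were 35.5 times more likely to be labeled female than male in the training set.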
29
For more training information, contact us:
Email: info@learntek.org
USA: 1734 418 2465
INDIA: 40 4018 1306, 7799713624
