- Create a new file and import the required packages:
import nltk.classify.util
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import movie_reviews
- Define a function to extract features:
def collect_features(word_list):
    # Mark every word as present; NLTK classifiers expect a feature dictionary
    return {word: True for word in word_list}
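To make the output of this function concrete, here is a small sketch on a hand-written word list (the sample words are made up for illustration and do not come from the corpus). Each distinct word becomes a key mapped to True, so repeated words collapse into one entry and word order is lost.

```python
def collect_features(word_list):
    # Map every distinct word to True (presence-only bag of words)
    return {word: True for word in word_list}

# Hypothetical sample input, not taken from the corpus
sample_words = ["a", "great", "great", "movie"]
features = collect_features(sample_words)
print(features)
```

Because only presence is recorded, a word that appears twice in a review carries the same weight as a word that appears once.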
- Use the movie reviews corpus in NLTK as training data:
if __name__ == '__main__':
    # File IDs of the positive and negative reviews
    plus_filenum = movie_reviews.fileids('pos')
    minus_filenum = movie_reviews.fileids('neg')
- Build labeled feature sets for the positive and negative reviews:
    feature_pluspts = [(collect_features(movie_reviews.words(fileids=[f])), 'Positive')
                       for f in plus_filenum]
    feature_minuspts = [(collect_features(movie_reviews.words(fileids=[f])), 'Negative')
                        for f in minus_filenum]
- Split the data into training and testing datasets (80% for training, 20% for testing):
    threshold_fact = 0.8
    threshold_pluspts = int(threshold_fact * len(feature_pluspts))
    threshold_minuspts = int(threshold_fact * len(feature_minuspts))
- Build the training and testing feature lists:
    feature_training = feature_pluspts[:threshold_pluspts] + feature_minuspts[:threshold_minuspts]
    feature_testing = feature_pluspts[threshold_pluspts:] + feature_minuspts[threshold_minuspts:]
    print("\nNumber of training datapoints: %d" % len(feature_training))
    print("Number of test datapoints: %d" % len(feature_testing))
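The split above is applied to each class separately, so the training and testing sets keep the same positive/negative ratio as the full data. A minimal sketch of that idea on toy data follows; `split_by_class` and the toy lists are hypothetical names introduced here for illustration only.

```python
def split_by_class(items, fraction):
    # Keep the first `fraction` of the items for training, the rest for testing
    cut = int(fraction * len(items))
    return items[:cut], items[cut:]

# Toy stand-ins for the labeled feature lists (hypothetical data)
toy_positive = [({"good": True}, "Positive")] * 10
toy_negative = [({"bad": True}, "Negative")] * 5

train_pos, test_pos = split_by_class(toy_positive, 0.8)
train_neg, test_neg = split_by_class(toy_negative, 0.8)

training = train_pos + train_neg
testing = test_pos + test_neg
print(len(training), len(testing))
```

Note that the slices are taken in file order without shuffling. That is what the recipe does as well; shuffling each class first would be a reasonable variation if the file order is not random.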
- Train a Naive Bayes classifier, measure its accuracy on the test data, and list the most informative words:
    # Train a Naive Bayes classifier
    classifiers = NaiveBayesClassifier.train(feature_training)
    print("\nAccuracy of the classifier: %s" % nltk.classify.util.accuracy(classifiers, feature_testing))
    print("\nTop 10 most informative words:")
    for item in classifiers.most_informative_features()[:10]:
        # Each item is a (feature name, feature value) pair; print the word
        print(item[0])
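To give a rough idea of what a Naive Bayes classifier computes, here is a simplified, self-contained sketch on toy data. It is not NLTK's implementation (which differs in details such as smoothing); `train_naive_bayes`, `classify`, and the toy reviews are hypothetical names and data chosen for illustration, and add-one smoothing is an assumption of this sketch.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_features):
    """Count labels and, per label, how many documents contain each word."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)  # label -> word -> document count
    vocabulary = set()
    for features, label in labeled_features:
        label_counts[label] += 1
        for word in features:
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary

def classify(model, features):
    """Return the label with the highest log-probability score."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, n_docs in label_counts.items():
        # Start from the log prior, log P(label)
        score = math.log(n_docs / total_docs)
        for word in features:
            if word not in vocabulary:
                continue  # skip words never seen during training
            # Add log P(word present | label), with add-one smoothing
            score += math.log((word_counts[label][word] + 1) / (n_docs + 2))
        scores[label] = score
    return max(scores, key=scores.get)

# Toy training data (hypothetical, for illustration only)
toy_training = [
    ({"great": True, "movie": True}, "Positive"),
    ({"amazing": True, "story": True}, "Positive"),
    ({"dull": True, "movie": True}, "Negative"),
    ({"horrible": True, "story": True}, "Negative"),
]
model = train_naive_bayes(toy_training)
prediction = classify(model, {"great": True, "story": True})
print(prediction)
```

The "naive" part is the assumption that words are independent given the label, which is why the per-word log-probabilities can simply be summed.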
- Run the classifier on a few sample reviews:
    # Sample input reviews
    in_reviews = [
        "The Movie was amazing",
        "the movie was dull. I would never recommend it to anyone.",
        "The cinematography is pretty great in the movie",
        "The direction was horrible and the story was all over the place"
    ]
    print("\nPredictions:")
    for review in in_reviews:
        print("\nReview: %s" % review)
        # Probability distribution over the labels for this review
        probdist = classifiers.prob_classify(collect_features(review.split()))
        predict_sentiment = probdist.max()
        print("Predicted sentiment: %s" % predict_sentiment)
        print("Probability: %s" % round(probdist.prob(predict_sentiment), 2))
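One caveat with `review.split()` is that it keeps capitalization and attached punctuation (for example `"Movie"` or `"dull."`), and such tokens may not match the words the classifier saw during training, since the corpus tokens appear to be lowercase with punctuation split off. A possible normalization step is sketched below; `tokenize_review` is a hypothetical helper, not part of NLTK or of the original recipe.

```python
import re

def tokenize_review(text):
    # Lowercase the text, then extract runs of letters, digits, and apostrophes
    return re.findall(r"[a-z0-9']+", text.lower())

tokens = tokenize_review("the movie was dull. I would never recommend it to anyone.")
print(tokens)
```

Under this assumption, the prediction call would become `classifiers.prob_classify(collect_features(tokenize_review(review)))`, which may change the predicted probabilities compared with the plain `split()` version.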
- The result obtained for sentiment analysis is shown as follows: