Conducting sentiment analysis using R

I’ve been asked on numerous occasions how to conduct sentiment analyses in R. There used to be a sentiment package, but it’s now only available via the CRAN archive.

An alternative approach (and one that works with languages other than R) is to use a sentiment analysis web service. There are numerous services to choose from.

What is sentiment analysis?

Sentiment analysis is also known as opinion mining. It takes a document (snippet of text, tweet, review, e-mail, etc.) and attempts to determine whether it is positive, negative or neutral in tone. More advanced algorithms attempt to assign an emotional state to the document (e.g. angry, depressed).
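To make the idea concrete, here is a deliberately naive lexicon-based classifier. It is only a sketch of the general concept and is not how the web service used below works. The word lists and the `classify_naive` function are hypothetical, and real services rely on much larger lexicons or trained models.

```r
# Tiny, hypothetical word lists, for illustration only
positive_words <- c("good", "great", "stylish", "fast")
negative_words <- c("bad", "terrible", "slow")

classify_naive <- function(text) {
  # Lower-case the text and split it into words on anything that is not a letter or apostrophe
  words <- strsplit(tolower(text), "[^a-z']+")[[1]]
  # Net score: count of positive words minus count of negative words
  score <- sum(words %in% positive_words) - sum(words %in% negative_words)
  if (score > 0) "positive" else if (score < 0) "negative" else "neutral"
}

sapply(c("This is a good pizza", "This is a terrible pizza", "This is a pizza"),
       classify_naive, USE.NAMES = FALSE)
# -> "positive" "negative" "neutral"
```

An approach this simple misses negation ("not good"), sarcasm and context, which is one reason to hand the job to a dedicated service.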

Sentiment analysis services

Mashape is a marketplace for APIs and is a great place to start if you are looking for a web service to perform a particular task. A search of the marketplace for “sentiment” currently reveals dozens of options. Some of the services are free while others require you to pay after you have analyzed a set number of documents (freemium).

One of the most popular free options is the eponymous “Sentiment” created by Vivek Narayanan. You can try out the service manually via the web interface. We’ll be making use of the API, which returns a sentiment (“positive”, “negative” or “neutral”) and a confidence level for each document submitted to it.

Interacting with the web service

The web service contains a “batch” feature that allows multiple documents to be analyzed in the same request. The service is called over HTTP, receiving and returning JSON.
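As a quick illustration of what a batch request body looks like, the snippet below serializes two documents with jsonlite. It only builds the JSON locally and does not contact the service.

```r
library(jsonlite)

# Two short documents to send in a single batch request
docs <- c("This is a good pizza", "This is a terrible pizza")

# toJSON turns a character vector into a JSON array of strings
body <- toJSON(docs)
body
# -> ["This is a good pizza","This is a terrible pizza"]
```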

To interact with the web service we’ll use the httr package. The JSON will be processed using the jsonlite package.

install.packages("stringi")
install.packages("httr")
install.packages("jsonlite")

library(httr)
library(jsonlite)

I found that I had to install the stringi package to get httr to load. This may be a temporary problem.

Let’s create a set of documents to analyze. Our documents will be phrases or sentences, similar in size to tweets.

documents <- c(
  "This is a good pizza", 
  "This is a great pizza",   
  "This is a terrible pizza", 
  paste(
    "Great design, Glances, stylish, battery life, reply to text messages, ", 
    "photo watch faces, Apple Pay"), 
  paste(
    "Apps are slow to load when not native, no GPS, no third-party watch faces, ", 
    "doesn't offer huge amounts over what is on the market"))

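# Keep the text as character strings rather than factors (the default before R 4.0)
documents.df <- data.frame(documents, stringsAsFactors = FALSE)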

[The paste operations are purely for formatting of the post.]

The first three documents are simple tests. The last two are the “pros” and “cons” summaries of an Apple Watch review on Pocket-lint.

To use the service, we POST the documents, as a JSON array, to http://sentiment.vivekn.com/api/batch/. It returns JSON containing the sentiment and confidence level for each document, in the same order as the submitted documents.
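If you have many more documents than this, you may prefer to send them in several smaller requests. The post does not say whether the service limits batch size, so the helper below (`split_into_batches`, a hypothetical name) and its default of 100 documents per batch are assumptions, not documented behavior of the service.

```r
# Split a character vector of documents into batches of at most `size` documents.
# The default size of 100 is an arbitrary, hypothetical choice.
split_into_batches <- function(docs, size = 100) {
  # Batch number for each document: 1, 1, ..., 1, 2, 2, ...
  batch_id <- ceiling(seq_along(docs) / size)
  # split() keeps the original document order within and across batches
  unname(split(docs, batch_id))
}

batches <- split_into_batches(letters[1:7], size = 3)
lengths(batches)
# -> 3 3 1
```

Each batch would then be POSTed separately, and the per-batch result data frames stacked with `do.call(rbind, ...)` in the same order, so that results still line up with the original documents.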

toJSON from the jsonlite package is used to prepare the request body. We add headers to tell the service that we are sending JSON and expect JSON in return.

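# toJSON without auto_unbox always yields a JSON array, even for one document
response <- POST(
  url = "http://sentiment.vivekn.com/api/batch/",
  body = toJSON(documents.df$documents),
  add_headers("Content-Type" = "application/json", "Accept" = "application/json"))
stop_for_status(response)  # fail loudly if the service returned an HTTP error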
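One subtlety of toJSON's `auto_unbox` argument matters for a batch endpoint that expects a list of documents. With `auto_unbox = TRUE`, a vector of length one is serialized as a bare string rather than a one-element array, so a batch containing a single document would no longer be sent as an array. The snippet below runs locally and shows the difference.

```r
library(jsonlite)

one_doc <- "This is a good pizza"

# auto_unbox = TRUE turns a length-1 vector into a bare JSON string
toJSON(one_doc, auto_unbox = TRUE)
# -> "This is a good pizza"

# The default (auto_unbox = FALSE) always produces a JSON array
toJSON(one_doc)
# -> ["This is a good pizza"]
```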

We then convert the response body to text and deserialize it into a data frame.

# Read the body as UTF-8 text, then parse the JSON array into a data frame
sentiments.df <- fromJSON(content(response, "text", encoding = "UTF-8"))
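To see why fromJSON hands back a data frame, here is the same call applied to a made-up response string. The field names (`sentiment`, `confidence`) and the values are hypothetical placeholders, not actual output from the service, so check the real response before relying on particular column names.

```r
library(jsonlite)

# A hypothetical response body for a two-document batch (placeholder values)
response_text <- '[
  {"sentiment": "positive", "confidence": 0.9},
  {"sentiment": "negative", "confidence": 0.8}
]'

# fromJSON simplifies a JSON array of objects into a data frame:
# one row per object, one column per field
sentiments <- fromJSON(response_text)
str(sentiments)
```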

Finally, we can join the original data frame to the results of the sentiment analysis.

labeled.documents.df <- cbind(documents.df, sentiments.df)
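Because cbind pairs rows purely by position, it relies on the service returning exactly one result per document, in order. A slightly more defensive version might check the counts first. The helper name `join_by_position` is hypothetical, and it can only catch a count mismatch, not results that arrive out of order.

```r
# Join documents to their results by position, failing loudly if the counts differ.
# Plain cbind may silently recycle rows when one count is a multiple of the other.
join_by_position <- function(docs_df, results_df) {
  if (nrow(docs_df) != nrow(results_df)) {
    stop(sprintf("Expected %d results but received %d",
                 nrow(docs_df), nrow(results_df)))
  }
  cbind(docs_df, results_df)
}
```

With the objects from this post, the call would be `labeled.documents.df <- join_by_position(documents.df, sentiments.df)`.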

Let’s look at the results.

View(labeled.documents.df)

[Figure: Result of sentiment analysis]

As you can see, our three trivial test documents have been classified appropriately, with high confidence. Notice how “great” is more confidently positive than “good”. More interestingly, the “pros” and “cons” summaries from the Apple Watch review were also classified appropriately (as positive and negative respectively), again with high confidence.

If you work with data, Learning Tree has a number of courses that may interest you.
