An alternative approach (and one that works with languages other than R) is to use a sentiment analysis web service. There are numerous services to choose from.
Sentiment analysis is also known as opinion mining. It takes a document (snippet of text, tweet, review, e-mail, etc.) and attempts to determine whether it is positive, negative or neutral in tone. More advanced algorithms attempt to assign an emotional state to the document (e.g. angry, depressed).
Mashape is a marketplace for APIs and is a great place to start if you are looking for a web service to perform a particular task. A search of the marketplace for "sentiment" currently reveals dozens of options. Some of the services are free while others require you to pay after you have analyzed a set number of documents (freemium).
One of the most popular free options is the eponymous "Sentiment" created by Vivek Narayanan. You can try out the service manually via the web interface. We’ll be making use of the API which returns a sentiment ("positive", "negative", "neutral") and a confidence level for documents submitted to it.
The web service contains a "batch" feature that allows multiple documents to be analyzed in the same request. The service is called over HTTP, receiving and returning JSON.
install.packages("stringi")
install.packages("httr")
install.packages("jsonlite")
library(httr)
library(jsonlite)
I found that I had to install the stringi package to get httr to load. This may be a temporary problem.
Let’s create a set of documents to analyze. Our documents will be phrases and sentences, similar in size to tweets.
documents <- c(
  "This is a good pizza",
  "This is a great pizza",
  "This is a terrible pizza",
  paste(
    "Great design, Glances, stylish, battery life, reply to text messages, ",
    "photo watch faces, Apple Pay"),
  paste(
    "Apps are slow to load when not native, no GPS, no third-party watch faces, ",
    "doesn't offer huge amounts over what is on the market"))
documents.df <- data.frame(documents)
[The paste operations are purely for formatting of the post.]
The first three documents are simple tests. The second two are the "pros" and "cons" summaries of an Apple Watch review on Pocket-lint.
To use the service we POST a list of documents, as a JSON array, to http://sentiment.vivekn.com/api/batch/. The service returns JSON containing the sentiment and confidence level for each document, ordered to correspond to the ordering of the submitted documents.
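To make the request and response shapes concrete, here is a sketch of the JSON exchanged with the batch endpoint. The request body is simply a JSON array of strings; the field names in the response shown below are illustrative, not a guarantee of the service's exact schema:

```json
[
  "This is a good pizza",
  "This is a terrible pizza"
]
```

with a response along the lines of (field names and values illustrative):

```json
[
  { "result": "positive", "confidence": 93.1 },
  { "result": "negative", "confidence": 95.6 }
]
```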
toJSON from the jsonlite package is used to prepare the request body. We add headers to tell the service that we are sending, and expect to receive, JSON data.
response <- POST(
  url = "http://sentiment.vivekn.com/api/batch/",
  body = toJSON(documents.df[, 1], auto_unbox = TRUE),
  add_headers("Content-Type" = "application/json",
              "Accept" = "application/json"))
We then deserialize the response to a data frame, after converting the returned content to text.
sentiments.df <- fromJSON(content(response, "text"))
Finally, we can join the original data frame to the results of the sentiment analysis.
labeled.documents.df <- cbind(documents.df, sentiments.df)
Let’s look at the results by printing labeled.documents.df.
As you can see, our three trivial test documents have been classified appropriately, with high confidence. Notice how "great" is more confidently positive than "good". More interestingly, the "pros" and "cons" summaries from the Apple Watch review have been classified appropriately, with high confidence.
If you work with data, Learning Tree has a number of courses that may interest you.