Sentiment AnalysisSentiment analysis (also known as opinion mining) refers to the use of natural language processing (NLP), text analysis and computational linguistics to identify and extract subjective information from the source materials. Generally speaking, sentiment analysis aims to determine the attitude of a writer or a speaker with respect to a specific topic or the overall contextual polarity of a document.

Globally, business enterprises can leverage opinion polarity and sentiment topic recognition to gain deeper understanding of the drivers and the overall scope. Subsequently, these insights can advance competitive intelligence and improve customer service, thereby creating a better brand image and providing a competitive edge. Let’s begin by first understanding the adoption of sentiment analysis across industries.

Sentiment Analysis Adoption

Here are some of the top industry verticals, where sentiment analysis is being widely utilized and yielding positive results.

E-Commerce

The e-commerce industry is benefitting greatly by utilizing sentiment analysis. Generally, on e-commerce portals, buyers often express their opinions in the form of comments (positive or negative) for the products they have purchased, making this a huge data trove for sentiment analysis. Correspondingly, analysis of such opinion-related data (comments) can provide deep-insights to the key stakeholders. A thorough sentiment analysis reveals deep-insights on the product, quality and performance. Additional insights that can be extracted using sentiment analysis include.

  • Insights on competitors
  • Feedback on newly launched products
  • Influencing factors affecting other customer decisions
  • Company news and trends
Financial Domain

Sentiment analysis is widely used across the financial domain for trading and investing. Notably, financial analysts and traders monitor/analyze social networks (i.e. utilizing StockTwits) to quickly identify the trending stocks and fluctuations in the stock markets, which enable them to react swiftly to any major changes in the stock market.

Aviation Sector

In case of the aviation sector, sentiment analysis can help aviation companies detect sentiment polarity and sentiment topics by making use of data (text) and examining the reputation of airlines by computing their Airline Quality Rating (AQR). In this blog, we have considered use case of leading US airliners (Delta, JetBlue and United Airlines) to demonstrate the fundamentals of sentiment analysis.

By and large, social media plays a significant role in sentiment analysis. Here is a comprehensive list of social media websites/platforms, which can be used for sentiment analysis to identify customer likes, dislikes, opinions, feedback, etc.

  • Twitter
  • Facebook
  • LinkedIn
  • YouTube
  • Google+
  • Pinterest
  • SlideShare
  • iTunes
  • Quora
  • Blogs

In this blog, we have considered the twitter social media platform to find out how tweets from the twitter feed can be utilized to perform sentiment analysis. As mentioned earlier, we performed sentiment analysis on three leading airlines and R programming language has been extensively used to perform this analysis.

Sentiment Analysis Approach

The approach followed here is to count the positive and negative words in each tweet and assign a sentiment score. This way, we can ascertain how positive or negative a tweet is. Nevertheless, there are multiple ways to calculate such scores; here is one formula to perform such calculations.

Score = Number of positive words - Number of negative words
If Score > 0, means that the tweet has 'positive sentiment'
If Score < 0, means that the tweet has 'negative sentiment'
If Score = 0, means that the tweet has 'neutral sentiment'

To find out the list of positive and negative words, an opinion lexicon (English language) can be utilized.

Extracting and Analyzing Tweets

TwitterR offers an easy way to extract tweets containing a given hashtag, word or term from a user’s account or public tweets. However, before loading twitterR library and using its functions, developers should create an app on dev.twitter.com and then run the following code, which is written in the R programming language.

Setting Authorization to Extract Tweets

Run the following code in the R Studio to set authorization to extract tweets.

reqURL <- "https://api.twitter.com/oauth/request_token"
accessURL <- "http://api.twitter.com/oauth/access_token"
authURL <- "http://api.twitter.com/oauth/authorize"
api_key <- "yourconsumerkey"
api_secret <- "yourconsumersecret"
access_token <- "consumeraccess token"
access_token_secret <- "consumer access secret token"
setup_twitter_oauth(api_key,api_secret,access_token,access_token_secret)
Required Libraries

Here is the code to load required libraries.

library(twitteR) ### for fetching the tweets
library(plyr) ## for breaking the data into manageable pieces
library(ROAuth) # for R authentication
library(stringr) # for string processing
library(ggplot2) # for plotting the results
Importing Files

Developers have to import files containing a dictionary of positive and negative words. Likewise, text files containing positive and negative sentiments can be imported using the below code. These files can be downloaded using the Google search engine.

posText <- read.delim("…/positive-words.txt", header=FALSE, stringsAsFactors=FALSE)
posText <- posText$V1
posText <- unlist(lapply(posText, function(x) { str_split(x, "\n") }))
negText <- read.delim("…/negative-words.txt", header=FALSE, stringsAsFactors=FALSE)
negText <- negText$V1
negText <- unlist(lapply(negText, function(x) { str_split(x, "\n") }))
pos.words = c(posText, 'upgrade')
neg.words = c(negText, 'wtf', 'wait', 'waiting','epicfail', 'mechanical')
Extracting Tweets with Hashtags

To demonstrate sentiment analysis, we analyzed tweets relating to Delta, JetBlue and United Airlines. In order to extract specific tweets relating to these airlines, developers should query twitter for tweets with the hashtag Delta, JetBlue and United.

delta_tweets = searchTwitter('@delta', n=5000)
jetblue_tweets = searchTwitter('@jetblue', n=5000)
united_tweets = searchTwitter('@united', n=5000)
Processing Tweets

Step 1 – Convert the tweets to a text format.

delta_txt = sapply(delta_tweets, function(t) t$getText() )
jetblue_txt = sapply(jetblue_tweets, function(t) t$getText() )
united_txt = sapply(united_tweets, function(t) t$getText() )

Step 2 – Calculate the number of tweets for each airline.

noof_tweets = c(length(delta_txt), length(jetblue_txt),length(united_txt))

Step 3 – Combine the text of all these airlines

airline<- c(delta_txt,jetblue_txt,united_txt)
Sentiment Analysis Application (Code)

The code below showcases how sentiment analysis is written and executed. However, before we proceed with sentiment analysis, a function needs to be defined that will calculate the sentiment score.

score.sentiment = function(sentences, pos.words, neg.words, .progress='none')
{
# Parameters
# sentences: vector of text to score
# pos.words: vector of words of positive sentiment
# neg.words: vector of words of negative sentiment
# .progress: passed to laply() to control of progress bar
# create a simple array of scores with laply
scores = laply(sentences,
function(sentence, pos.words, neg.words)
{
# remove punctuation
sentence = gsub("[[:punct:]]", "", sentence)
# remove control characters
sentence = gsub("[[:cntrl:]]", "", sentence)
# remove digits?
sentence = gsub('\\d+', '', sentence)
# define error handling function when trying tolower
tryTolower = function(x)
{
# create missing value
y = NA
# tryCatch error
try_error = tryCatch(tolower(x), error=function(e) e)
# if not an error
if (!inherits(try_error, "error"))
y = tolower(x)
# result
return(y)
}
# use tryTolower with sapply
sentence = sapply(sentence, tryTolower)
# split sentence into words with str_split (stringr package)
word.list = str_split(sentence, "\\s+")
words = unlist(word.list)
# compare words to the dictionaries of positive & negative terms
pos.matches = match(words, pos.words)
neg.matches = match(words, neg.words)
# get the position of the matched term or NA
# we just want a TRUE/FALSE
pos.matches = !is.na(pos.matches)
neg.matches = !is.na(neg.matches)
# final score
score = sum(pos.matches) - sum(neg.matches)
return(score)
}, pos.words, neg.words, .progress=.progress )
# data frame with scores for each sentence
scores.df = data.frame(text=sentences, score=scores)
return(scores.df)
}

Now, we can start processing the tweets to calculate the sentiment score.

scores = score.sentiment(airline, pos.words,neg.words , .progress='text')

Step 1 – Create a variable in the data frame.

scores$airline = factor(rep(c("Delta", "JetBlue","United"), noof_tweets))

Step 2 – Calculate positive, negative and neutral sentiments.

scores$positive <- as.numeric(scores$score >0)
scores$negative <- as.numeric(scores$score <0)
scores$neutral <- as.numeric(scores$score==0)

Step 3 – Split the data frame into individual datasets for each airline.

delta_airline <- subset(scores, scores$airline=="Delta")
jetblue_airline <- subset(scores,scores$airline=="JetBlue")
united_airline <- subset(scores,scores$airline=="United")

Step 4 – Create polarity variable for each data frame.

delta_airline$polarity <- ifelse(delta_airline$score >0,"positive",ifelse(delta_airline$score < 0,"negative",ifelse(delta_airline$score==0,"Neutral",0)))
jetblue_airline$polarity <- ifelse(jetblue_airline$score <0,"positive",ifelse(jetblue_airline$score < 0,"negative",ifelse(jetblue_airline$score==0,"Neutral",0)))
united_airline$polarity <- ifelse(united_airline$score >0,"positive",ifelse(united_airline$score < 0,"negative",ifelse(united_airline$score==0,"Neutral",0)))

Generating Graphs

After the above steps are executed, developers can go ahead and create insightful graphs. The steps below outline the process to create graphs.

Polarity Plot – Customer Sentiments (Delta Airlines)
qplot(factor(polarity), data=delta_airline, geom="bar", fill=factor(polarity))+xlab("Polarity Categories") + ylab("Frequency") + ggtitle("Customer Sentiments - Delta Airlines")

Customer Sentiments Delta Airlines

The bar graph above depicts polarity, if we closely analyze the graph; it reveals that out of 5,000 twitter users, 1,100 twitter users have commented in a negative way, 2,380 users are neutral. However, 1,520 users are pretty positive about the airline.

qplot(factor(score), data=delta_airline, geom="bar", fill=factor(score))+xlab("Sentiment Score") + ylab("Frequency") + ggtitle("Customer Sentiment Scores - Delta Airlines")
Customer Sentiment Scores (Delta Airlines)

Customer Sentiment Scores Delta Airlines

The bar graph above depicts twitter user’s sentiment score, negative score denoted by the (-) symbol, which indicates unhappiness of users with the airline, whereas the positive score denotes that users are happy with the airline. While, zero represents that twitter users are neutral.

Polarity Plot – Customer Sentiments (JetBlue Airlines)
qplot(factor(polarity), data=jetblue_airline, geom="bar", fill=factor(polarity))+xlab("Polarity Categories") + ylab("Frequency") + ggtitle(" Customer Sentiments - JetBlue Airlines ")

Customer Sentiments Jetblue Airlines

The bar graph above represents polarity. In this case, out of the 5,000 twitter users, 550 users have commented negatively, 2,700 users remain neutral, whereas 1,750 users are positive about the airline.

Customer Sentiment Scores (JetBlue Airlines)
qplot(factor(score), data=jetblue_airline, geom="bar", fill=factor(score))+xlab("Sentiment Score") + ylab("Frequency") + ggtitle("Customer Sentiment Scores - JetBlue Airlines")

Customer Sentiment Scores Jetblue Airlines

The bar graph above depicts twitter user’s sentiment score, negative score denoted by the (-) symbol, which indicates unhappiness with the airline, whereas the positive score denotes that users are quite happy. Whereas, zero here represents that users are neutral.

Polarity Plot – Customer Sentiments (United Airlines)
qplot(factor(polarity), data=united_airline, geom="bar", fill=factor(polarity))+xlab("Polarity Categories") + ylab("Frequency") + ggtitle("Customer Sentiments - United Airlines")

Customer Sentiments United Airlines

The bar graph above represents polarity. In this case, out of the 5,000 twitter users, 1,350 users have commented negatively, whereas 2,200 users are neutral and remaining 1,450 users remain positive about the airline.

Customer Sentiment Scores (United Airlines)
qplot(factor(score), data=united_airline, geom="bar", fill=factor(score))+xlab("Sentiment Score") + ylab("Frequency") + ggtitle("Customer Sentiment Scores - United Airlines ")

Customer Sentiment Scores United Airlines

The bar graph above depicts twitter user’s sentiment score, negative score denoted by the (-) symbol indicates unhappiness of users with the airline, whereas the positive score denotes that users are quite happy. While, zero represents that users are neutral about their opinion.

Summarizing Scores

  • The code below will help developers to summarize the overall positive, negative and neutral scores
    df = ddply(scores, c("airline"), summarise,
    pos_count=sum( positive ),
    neg_count=sum( negative ),
    neu_count=sum(neutral))
  • To put it in another way, developers can create total count by adding positive, negative and neutral sum.
    df$total_count = df$pos_count +df$neg_count + df$neu_count
  • Additionally, developers can calculate positive, negative and neutral percentages using the below code.
    df$pos_prcnt_score = round( 100 * df$pos_count / df$total_count )
    df$neg_prcnt_score = round( 100 * df$neg_count / df$total_count )
    df$neu_prcnt_score = round( 100 * df$neu_count / df$total_count )
    

Comparison Charts

Positive Comparative Analysis

Here is the code to create a positive comparison pie chart for these three airlines:

attach(df)
lbls <-paste(df$airline,df$pos_prcnt_score)
lbls <- paste(lbls,"%",sep="")
pie(pos_prcnt_score, labels = lbls, col = rainbow(length(lbls)), main = "Positive Comparative Analysis - Airlines")

The pie chart below represents positive percentage score of these airlines.

Positive Analysis

Negative Comparative Analysis

Here is the code to create a negative comparison pie chart for these three airlines:

lbls <-paste(df$airline,df$neg_prcnt_score)
lbls <- paste(lbls,"%",sep="")
pie(neg_prcnt_score, labels = lbls, col = rainbow(length(lbls)), main = " Negative Comparative Analysis - Airlines")

The pie chart below represents negative percentage score of these three airlines.

Negative Analysis

Neutral Comparative Analysis

Here is the code to create a neutral comparison pie chart:

lbls <-paste(df$airline,df$neu_prcnt_score)
lbls <- paste(lbls,"%",sep="")
pie(neu_prcnt_score, labels = lbls, col = rainbow(length(lbls)), main = "Neutral Comparative Analysis - Airlines")

The pie chart below represents neutral percentage score of these three airlines.

Neutral Analysis

Conclusion

As can be seen, sentiment analysis enables enterprises to understand consumer sentiments in relation to specific products/services. Moreover, these insights could be used to improve their products and services by gauging consumers’ comments and feedback using sentiment analysis. In the long run, sentiment analysis, if implemented the right way can aid business enterprises in improving the overall consumer experience, enhance brand image and propel business growth.

IT Consulting Services

At Evoke, we enable businesses augment data science to solve day-to-day business problems and make informed decisions by applying data science. With more than a decade’s experience in software development and deployment, we provide IT solutions that decrease your company’s expenditures, while increasing your bottom line. And our highly trained, dedicated teams of IT engineers remain available to support you 24/7.

Call Evoke today at +1 (937) 202-4161 (select Option 2 for Sales) or contact us online to learn more about how we can help your company take steps today toward greater digital maturity and profitability.