Getting started with Sentiment Analysis using Python, Pandas & TextBlob

Sentiment analysis can provide key insight into how your customers feel about your company, which is why it is becoming an increasingly important part of data analysis.

Building a machine learning model to identify positive and negative sentiments is pretty complex, but luckily for us, there is a Python library that can help us out. It’s called TextBlob.

In this post, we’ll look at how to use TextBlob with Python’s built-in csv module and also with Pandas dataframes. Before we can do anything, we need to get it installed on our system, which is normally done with pip (pip install textblob).

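To process large bodies of text, we’ll need one more thing: the corpora that TextBlob relies on, which can be downloaded by running python -m textblob.download_corpora.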

Now that everything is installed, you can open your Python terminal or IDE (whichever you prefer). The first script does the following:

  • Importing the required CSV and TextBlob packages
  • Setting the path of the source data
  • Opening the CSV in ‘read mode’ and iterating through the rows
  • With each iteration, we’re passing row[0] (the first column) into a TextBlob object called ‘blob’
  • We then read the built-in polarity and subjectivity scores for each statement and print the outcome in the terminal

Pretty simple, huh? You’re probably wondering what polarity and subjectivity are. Well, polarity is a measure of how positive or negative a statement is, ranging from -1 (very negative) to +1 (very positive). Subjectivity is how opinionated the comment is, ranging from 0 (very objective, fact-based) to 1 (very subjective, opinionated).

We can do this more elegantly, and give ourselves a nicer-looking output, using Pandas. Here we are:

  • Importing the Pandas and TextBlob libraries
  • Setting the path to the source data
  • Reading the data from the CSV into a Pandas DataFrame
  • Running a lambda function on the ‘Text’ column of the dataframe, which passes the column value into TextBlob & provides the subjectivity and polarity
  • Creating a status which gives us red or green, depending on positivity & negativity

We can take it a step further, by cleaning up the input data and creating columns that say ‘yes’ or ‘no’ for whether each statement is positive or negative. In my tests, I ran this across a 5,000 row dataset of Amazon reviews. It achieved 90% accuracy (based on manually checking 500 of those rows).