Sentiment analysis can provide key insight into how your customers feel about your company, and as a result it is becoming an increasingly important part of data analysis.
Building a machine learning model to identify positive and negative sentiments is pretty complex, but luckily for us, there is a Python library that can help us out. It’s called TextBlob.
In this post, we’ll look at how to use TextBlob with Python’s built-in CSV functionality, and also with Pandas dataframes. Before we can do anything, we need to get it installed on our system:
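The install step isn’t shown here; with pip it would look something like this:

```shell
# Install the TextBlob library from PyPI
pip install textblob
```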
To process large bodies of text, we’ll need one more thing:
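TextBlob relies on NLTK corpora for some of its processing; it ships a helper command to download them:

```shell
# Download the NLTK corpora that TextBlob depends on
python -m textblob.download_corpora
```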
Now that everything is installed, you can open your Python terminal or IDE (whichever you prefer). In the code below, we are doing the following:
- Importing the required CSV and TextBlob packages
- Setting the path of the source data
- Opening the CSV in ‘read mode’ and iterating through the rows
- With each iteration, passing the first column of the row into a TextBlob called ‘blob’
- Running each statement through the built-in polarity and subjectivity properties and printing the outcome to the terminal
Pretty simple, huh? I guess you’re probably wondering what polarity and subjectivity are? Well, polarity is a measure of how positive or negative a statement is, ranging from -1 (very negative) to +1 (very positive), and subjectivity is how opinionated the statement is, ranging from 0 (very objective, fact-based) to 1 (very subjective, opinionated).
We can do this more elegantly, and give ourselves a nicer-looking output, using Pandas. Here we are:
- Importing the Pandas and TextBlob libraries
- Setting the path to the source data
- Reading the data from the CSV into a Pandas Dataframe
- Running a lambda function on the ‘Text’ column of the dataframe, which passes each value into a TextBlob and returns the subjectivity and polarity
- Creating a status column that flags each row red or green, depending on whether the sentiment is negative or positive
We can take it a step further by cleaning up the input data and creating a column that flags whether each row is positive or negative. In my tests, I ran this across a 5,000-row dataset of Amazon reviews and it achieved roughly 90% accuracy (when manually checking 500 rows).