Learn everything from data wrangling and analysis to machine learning development
Python has become a leading technology in the data analysis space. It’s one of the most sought-after skills & something that can help you advance your career in big data. The following series of articles will cover the language and libraries in detail, enabling you to get hands-on with Python and tackle your own use cases.
Below is a quick overview of key Python functions. ‘….’ is used to show indentation.
Python is great for file analysis – we’ll be looking at more complex functions using the Pandas library in subsequent articles, but for now, let’s look at the out-of-the-box functionality that Python offers.
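As a minimal sketch of that out-of-the-box functionality, the snippet below creates a small text file (the filename and contents are made up for illustration) and then analyses it with nothing but built-in functions:

```python
# Create a hypothetical file to analyse (illustration only).
with open("sample.txt", "w") as f:
    f.write("first line of text\nsecond line\nthird line here\n")

# Read it back and compute simple statistics using only built-ins.
with open("sample.txt") as f:
    lines = f.readlines()

line_count = len(lines)
word_count = sum(len(line.split()) for line in lines)
print(line_count, word_count)  # 3 lines, 9 words
```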
NumPy stands for Numerical Python. It’s a library that contains a collection of routines for processing arrays.
An array is a list of values. A multi-dimensional array is essentially a grid or table (it’s an array that contains two or more arrays).
Pandas Series are one-dimensional labelled arrays (a list with an index).
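For example, a Series can be built from a plain Python list; the values and index labels below are made up for illustration:

```python
import pandas as pd

# A Pandas Series: a one-dimensional labelled array.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(s["b"])    # 20  (look up a value by its label)
print(s.mean())  # 20.0
```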
Sentiment analysis can provide key insight into how your customers feel about your company & hence is becoming an increasingly important part of data analysis. Click here to read more
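As a deliberately simple illustration of the idea, the sketch below scores text against hand-rolled word lists. The word lists and function are entirely made up; a real project would use a dedicated library such as NLTK’s VADER or TextBlob:

```python
# Toy sentiment scorer (illustration only; the word lists are made up).
POSITIVE = {"great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "terrible", "unhappy"}

def sentiment(text):
    words = text.lower().split()
    # Positive words add to the score, negative words subtract.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("great service love the product"))      # positive
print(sentiment("terrible delivery and poor support"))  # negative
```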
The linked script shows how we might handle RFM segmentation with Python. RFM stands for Recency, Frequency and Monetary. Click here to read more
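The core of an RFM calculation can be sketched with a Pandas group-by; the transaction data below is hypothetical:

```python
import pandas as pd

# Hypothetical transactions: customer, days since purchase, spend.
df = pd.DataFrame({
    "customer": ["A", "A", "B", "C", "C", "C"],
    "days_ago": [5, 40, 10, 3, 7, 60],
    "amount":   [100, 50, 200, 30, 45, 25],
})

# Recency = days since most recent purchase; Frequency = purchase count;
# Monetary = total spend.
rfm = df.groupby("customer").agg(
    recency=("days_ago", "min"),
    frequency=("days_ago", "count"),
    monetary=("amount", "sum"),
)
print(rfm)
```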
Machine learning uses statistical techniques to give computer systems the ability to ‘learn’ rather than being explicitly programmed. By learning from historical inputs, we’re able to achieve far greater accuracy in our predictions & constantly refine the model with new data. Click here to read more
Supervised learning is where we provide the model with the actual outputs from the data. This lets it build a picture of the data and form links between the historic parameters (or features) that have influenced the output. As a formula, supervised learning can be written as Y = f(X), where Y is the predicted output produced by the model and X is the input data. So, by executing a function against X, we can predict Y. Click here to read more
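A minimal sketch of this idea, using scikit-learn and made-up data where the true relationship is simply Y = 2X, might look like this:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Supervised learning: we supply both the inputs X and the known
# outputs Y, and the model learns the mapping f so that Y = f(X).
X = np.array([[1], [2], [3], [4]])  # input feature
Y = np.array([2, 4, 6, 8])          # known outputs (here, Y = 2 * X)

model = LinearRegression()
model.fit(X, Y)

# The fitted model can now predict Y for unseen X.
print(model.predict(np.array([[5]])))  # approximately 10
```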
A decision tree builds a model in the form of a tree structure – almost like a flow chart. In order to calculate the expected outcome, it uses decision points and, based on the results of those decisions, it buckets each input. In this article, we’ll talk about classification and regression decision trees, along with random forests. Click here to read more
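As a quick sketch of the classification case, the toy data below is made up and cleanly separable, so the tree only needs a single decision point (a threshold on the one feature) to bucket each input:

```python
from sklearn.tree import DecisionTreeClassifier

# Made-up, cleanly separable data: one feature, two classes.
X = [[1], [2], [3], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

tree = DecisionTreeClassifier()
tree.fit(X, y)

# New inputs are bucketed by the learned decision point.
print(tree.predict([[2], [11]]))  # [0 1]
```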
Regression aims to predict the numeric value of something, given a set of input parameters. For example, we could approximate the price of a car, given its mileage, age, brand, MOT status, etc. In this simple example, we’re going to predict the output value based on three randomly generated input variables. In a real-world example, the variables could be mileage, age and miles since last service. Click here to read more
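A sketch of that setup, with three randomly generated inputs and a target built from a known (noise-free) relationship so the fitted coefficients are easy to check:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)

# Three randomly generated input variables (stand-ins for mileage,
# age and miles since last service) and a target derived from them.
X = rng.random((100, 3))
y = 3 * X[:, 0] + 2 * X[:, 1] - X[:, 2]  # known relationship, no noise

model = LinearRegression().fit(X, y)
print(model.coef_)  # approximately [3, 2, -1]
```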
Below is a logistic regression model, which uses some dummy data to determine whether people are at risk of diabetes or not – of course, this model couldn’t actually determine whether or not someone has diabetes; it’s just a demonstration. Click here to read more
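A minimal sketch of such a model is below. The features, values and risk labels are entirely invented and have no medical validity:

```python
from sklearn.linear_model import LogisticRegression

# Made-up data: [glucose level, BMI] -> at risk (1) or not at risk (0).
# Illustration only; this has no medical validity.
X = [[85, 22], [90, 24], [100, 25], [150, 33], [160, 35], [170, 38]]
y = [0, 0, 0, 1, 1, 1]

clf = LogisticRegression()
clf.fit(X, y)

# Classify two new (made-up) people.
print(clf.predict([[95, 23], [165, 36]]))  # [0 1]
```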
KMeans clustering searches for clusters of data within a dataset; it’s an unsupervised learning model. If we look at plot 1 below, we can easily see the clusters of data – but we haven’t labelled the data (we haven’t told KMeans which cluster each datapoint belongs to). However, as you can see at the bottom of the page, the clusters have been correctly defined. Click here to read more
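The unsupervised aspect can be sketched with synthetic data: we generate two obvious groups of points, give KMeans no labels at all, and check that it still separates them:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Two obvious, unlabelled clusters of 2-D points.
cluster_a = rng.normal(loc=0.0, scale=0.3, size=(20, 2))
cluster_b = rng.normal(loc=5.0, scale=0.3, size=(20, 2))
X = np.vstack([cluster_a, cluster_b])

# KMeans is never told which cluster a point belongs to.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Each original group should receive a single, consistent label.
print(len(set(km.labels_[:20])), len(set(km.labels_[20:])))  # 1 1
```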
We discussed decision trees and random forests in quite a lot of detail here. This article will take you through a practical implementation where, based on historic data, we aim to predict future weather. The data for this model is continuous & hence requires a regression model rather than a discrete classification model. Click here to read more
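To illustrate the continuous (regression) case, the sketch below fits a random forest regressor to hypothetical weather-style data; the features, relationship and noise are all invented for the example:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Hypothetical weather data: [humidity, pressure] -> temperature.
X = rng.random((200, 2))
y = 30 * X[:, 0] - 10 * X[:, 1] + rng.normal(0, 0.5, 200)

# A continuous target calls for a regression forest,
# not a discrete classifier.
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

pred = forest.predict(X[:1])
print(pred.shape)  # one continuous prediction per input row
```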