Machine learning: A simple linear regression model in Python

Machine learning is described in more detail in a separate article. Today, I want to run through a simple machine learning model that uses linear regression.

What is regression?

Regression aims to predict the numeric value of something, given a set of input parameters. For example, we could approximate the price of a car given its mileage, age, brand, MOT status, etc. In this simple example, we’re going to predict the output value based on three randomly generated input variables. In a real-world example, those variables could be mileage, age and miles since last service.

To put it simply, linear regression is about finding the line of best fit through the points on a graph. For example, if that line passes through the point (3, 6), then when X is 3 we would expect Y to be 6.
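To make the idea concrete, here is a minimal sketch of how a line of best fit can be computed for a single input using ordinary least squares, in plain Python. The data points and the function name `least_squares_line` are hypothetical, chosen only for illustration.

```python
def least_squares_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (slope, intercept) of the ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style numerator and variance-style denominator (the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical, slightly noisy observations.
xs = [1.0, 2.0, 4.0, 5.0]
ys = [1.5, 4.5, 7.5, 10.5]

slope, intercept = least_squares_line(xs, ys)
predicted = slope * 3 + intercept  # prediction for X = 3
print(f"y = {slope:.2f}x + {intercept:.2f}; prediction at X = 3: {predicted:.2f}")
```

Because the least-squares line always passes through (mean of X, mean of Y), which for these points is (3, 6), the prediction for X = 3 comes out as 6, up to floating-point rounding.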

The process

The process that this model follows is:

  1. Create some training input and output data
  2. Feed that data into the model so that it can fit a line to it
  3. Test that the model works

Of course, in this example we’re creating the output data ourselves, so we know the exact relationship (i.e. output = a + b + (100*c)). Hence, we expect the fitted coefficients to be 1, 1 and 100. In the ‘real world’ we would not have such a direct relationship, so the output data would serve to train the model (rather than tell it what we already know).

The code
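A minimal sketch of the three steps in the process, assuming NumPy and using `np.linalg.lstsq` as the fitting routine (scikit-learn’s `LinearRegression` is a common alternative). The seed, the 100 rows and the 80/20 train/test split are arbitrary choices for illustration. Because the output is built exactly as a + b + (100*c) with no noise, the fitted coefficients should come out as 1, 1 and 100, with an intercept of roughly zero.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed so the run is repeatable

# 1. Create training input and output data: three random inputs a, b, c
#    and an output built from the known rule output = a + b + 100*c.
X = rng.random((100, 3))
y = X[:, 0] + X[:, 1] + 100 * X[:, 2]

# Hold back the last 20 rows so the model can be tested on unseen data.
X_train, X_test = X[:80], X[80:]
y_train, y_test = y[:80], y[80:]

# 2. Fit the model by ordinary least squares.
#    A column of ones is appended so the model can also learn an intercept.
A_train = np.column_stack([X_train, np.ones(len(X_train))])
params, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
coefficients, intercept = params[:3], params[3]
print("coefficients:", coefficients.round(6))
print("intercept:", round(float(intercept), 6))

# 3. Test that the model works on the held-back rows.
A_test = np.column_stack([X_test, np.ones(len(X_test))])
predictions = A_test @ params
print("max test error:", np.abs(predictions - y_test).max())
```

With real data the relationship would not be exact, so the coefficients would only approximate the underlying effect and the test error would not be close to zero.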

So, what about bringing my own data in (rather than randomly generating it)?

Below, I’ve done exactly that, using Pandas to read my CSV file and Matplotlib to plot the linear regression.
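As a rough sketch of that workflow (not the original code), the example below reads CSV data with Pandas, fits a straight line with `np.polyfit`, and draws the points plus the fitted line with Matplotlib. The column names `x` and `y`, the inline sample rows, the helper `fit_line` and the output filename `regression.png` are all hypothetical; with a real file you would pass its path to `pd.read_csv` instead of the `io.StringIO` buffer.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical CSV contents; in practice use pd.read_csv("my_data.csv").
CSV_TEXT = """x,y
1,2.1
2,3.9
3,6.2
4,8.1
5,9.8
"""


def fit_line(df: pd.DataFrame, x_col: str, y_col: str) -> tuple[float, float]:
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    # np.polyfit returns coefficients highest power first: [slope, intercept].
    slope, intercept = np.polyfit(df[x_col], df[y_col], deg=1)
    return float(slope), float(intercept)


df = pd.read_csv(io.StringIO(CSV_TEXT))
slope, intercept = fit_line(df, "x", "y")

# Plot the raw points and the fitted line over the range of the data.
fig, ax = plt.subplots()
ax.scatter(df["x"], df["y"], label="data")
xs = np.linspace(df["x"].min(), df["x"].max(), 100)
ax.plot(xs, slope * xs + intercept, color="red", label="line of best fit")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.savefig("regression.png")  # or plt.show() in an interactive session
```

For several input columns, as in the car example, the same DataFrame could feed a multi-variable fit like the NumPy least-squares approach; a single input is used here only because it gives a simple two-dimensional plot.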