# Machine Learning: A simple logistic regression model in Python

## What is logistic regression?

Logistic regression, also called logit regression, is the type of regression we use when the dependent variable (the thing we’re trying to predict) is dichotomous, in other words, binary. For example, a logistic regression model may estimate pass/fail, yes/no, or healthy/sick.
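To make the idea concrete, here is a minimal sketch of how a logistic regression model turns a weighted sum of inputs into a probability and then into a binary answer. The function names, the weights and the 0.5 threshold are assumptions chosen for illustration; a real model learns its weights from data.

```python
import math


def sigmoid(z):
    """Logistic function: squashes any real number into the range 0 to 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, this form avoids overflow in exp().
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(features, weights, bias):
    """Probability of the positive class (e.g. 'pass', 'yes', 'sick')."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict_label(features, weights, bias, threshold=0.5):
    """Turn the probability into a binary answer: 1 or 0."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Hypothetical weights for a model with a single input feature.
example_weights = [1.5]
example_bias = -1.0
label = predict_label([2.0], example_weights, example_bias)
```

The threshold is a design choice: 0.5 is a common default, but it can be moved when one kind of mistake is more costly than the other.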

Two widely adopted use cases of logistic regression are predicting whether an email is spam and determining whether a tumor is malignant.

There are multiple types of logistic regression:

1. Binary logistic regression: the model has only two possible outcomes, for example, happy or sad.
2. Multinomial logistic regression: the model has three or more possible outcomes that are not ordered. For example, does a user prefer pizza, pasta or curry?
3. Ordinal logistic regression: the model has three or more possible outcomes, and they are ordered, e.g. customer experience rated between 1 and 5, or likelihood to buy again: low, medium or high.
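For the multinomial case, a common formulation replaces the single probability with one probability per class, computed with the softmax function. The sketch below shows only that final step; the class scores are hypothetical numbers standing in for what a fitted model would produce, and the ordinal case is not covered here.

```python
import math


def softmax(scores):
    """Convert real-valued class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for one user; a fitted model would compute these.
classes = ["pizza", "pasta", "curry"]
scores = [2.0, 1.0, 0.1]

probabilities = softmax(scores)
# The predicted class is the one with the highest probability.
predicted = classes[probabilities.index(max(probabilities))]
```

With only two classes, softmax reduces to the same logistic function used in binary logistic regression.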

## An example:

The example below is a logistic regression model that uses some dummy data to determine whether people are at risk of diabetes. Of course, this model couldn’t actually determine whether or not someone has diabetes; it’s just a demonstration of a very simple logistic regression implementation.
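Before the library session, it may help to see what "fitting" a model involves. The following is a minimal sketch that trains a logistic regression on the same eight dummy rows using plain batch gradient descent. This is not the solver scikit-learn uses, and it applies no regularisation; the learning rate, step count, feature standardisation and helper names (`column_stats`, `fit`, `predict`) are all assumptions made for illustration.

```python
import math

# Dummy rows copied from the article's table: (Blood Sugar, BMI) and Diabetic.
FEATURES = [(120, 31), (122, 44), (130, 33), (50, 25),
            (40, 15), (40, 25), (103, 50), (101, 30)]
LABELS = [1, 1, 1, 0, 0, 0, 1, 1]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def column_stats(rows):
    """Mean and population standard deviation of each feature column."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n)
            for col, m in zip(columns, means)]
    return means, stds


def standardize(row, means, stds):
    """Rescale one row so each feature has mean 0 and unit spread."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def fit(rows, labels, learning_rate=0.5, steps=2000):
    """Batch gradient descent on the mean log-loss."""
    means, stds = column_stats(rows)
    scaled = [standardize(r, means, stds) for r in rows]
    weights, bias = [0.0] * len(means), 0.0
    n = len(rows)
    for _ in range(steps):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(scaled, labels):
            # error = predicted probability minus true label
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            grad_b += error / n
            for j, v in enumerate(x):
                grad_w[j] += error * v / n
        bias -= learning_rate * grad_b
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
    return weights, bias, means, stds


def predict(row, model):
    """Return 1 (at risk) or 0 for one (blood sugar, BMI) pair."""
    weights, bias, means, stds = model
    x = standardize(row, means, stds)
    p = sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
    return 1 if p >= 0.5 else 0


model = fit(FEATURES, LABELS)
training_predictions = [predict(row, model) for row in FEATURES]
```

On this data the sketch reproduces every training label, which says little about real predictive power: the two groups are cleanly separated by blood sugar alone.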

```
In [1]: import pandas as pd
#
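#bring in the total data set, including the truth
In [2]: path = 'Desktop/diabetes.csv'
#
#read the file into a DataFrame (assumed step: In [3] is not shown, but df has to come from the CSV)
In [3]: df = pd.read_csv(path)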
#
#display the contents of the ingested data
In [4]: df
Out[4]:
   Patient  Blood Sugar  BMI  Diabetic
0      111          120   31         1
1      112          122   44         1
2      113          130   33         1
3      114           50   25         0
4      115           40   15         0
5      116           40   25         0
6      117          103   50         1
7      118          101   30         1
#
#now, set out features to be Blood Sugar and BMI
In [5]: X = df[['Blood Sugar','BMI']]
#
#Let's look at the output
In [6]: X
Out[6]:
   Blood Sugar  BMI
0          120   31
1          122   44
2          130   33
3           50   25
4           40   15
5           40   25
6          103   50
7          101   30
#
#set the 'truth' to be the column that says 'yes' they're diabetic, or 'no' they're not
In [7]: y = df[['Diabetic']]
#
#Show the contents of the truth column
In [8]: y
Out[8]:
   Diabetic
0         1
1         1
2         1
3         0
4         0
5         0
6         1
7         1
#
#Import the learning library
In [9]: from sklearn.linear_model import LogisticRegression
#
#Set the model type to be logistic regression
In [10]: model = LogisticRegression()
#
#Convert y to a NumPy array (.values replaces the deprecated .as_matrix(), which newer pandas versions no longer provide)
In [11]: matrix = y.values
#
#then fit the model to the features X and the labels, flattened to a 1D array with ravel()
In [12]: model.fit(X,matrix.ravel())
Out[13]:
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
#
#Set the path to our testing data
In [14]: testdata = 'Desktop/testdata.csv'