# Machine Learning: A simple logistic regression model in Python

Below is a simple logistic regression model that uses some dummy data to predict whether people are at risk of diabetes. Of course, this model couldn’t actually determine whether or not someone has diabetes; it’s just a demonstration.
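
For readers new to the technique, here is a minimal sketch of what a logistic regression model computes once it has been fitted: a weighted sum of the features is passed through the sigmoid function to give a probability between 0 and 1. The weights, the intercept and the `risk_probability` helper are hypothetical values picked by hand for illustration; they are not the values the model in this article learns.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights and intercept, chosen by hand for illustration only.
# A fitted model learns its own values from the data.
W_BLOOD_SUGAR = 0.1
W_BMI = 0.0
INTERCEPT = -7.5


def risk_probability(blood_sugar, bmi):
    """Weighted sum of the features, squashed into a probability."""
    z = W_BLOOD_SUGAR * blood_sugar + W_BMI * bmi + INTERCEPT
    return sigmoid(z)


print(risk_probability(120, 31))  # above 0.5, so classed as at risk
print(risk_probability(40, 15))   # below 0.5, so classed as not at risk
```

Fitting the model means searching for the weights and intercept that make these probabilities agree with the known outcomes as closely as possible.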

As I expand this model to take on additional features and larger datasets, I expect its accuracy to improve. I will check the fit of the model (whether it’s under- or overfitted) and update this article with my findings.

• Is it under- or overfitted?
• Is there a bias?
• Does more data make it more accurate?
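
One common way to start answering the first question is to compare accuracy on the training rows with accuracy on rows the model has not seen. The sketch below is an assumption about how that check could look, not part of the session that follows: it re-enters the eight dummy rows by hand and uses scikit-learn's `LeaveOneOut` and `cross_val_score`, and the `max_iter=1000` setting is my own choice.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# The eight rows of dummy data used in this article: [Blood Sugar, BMI]
X = np.array([[120, 31], [122, 44], [130, 33], [50, 25],
              [40, 15], [40, 25], [103, 50], [101, 30]])
# The matching 'Diabetic' column
y = np.array([1, 1, 1, 0, 0, 0, 1, 1])

# max_iter is raised as a precaution because the features are not scaled
model = LogisticRegression(max_iter=1000)

# Accuracy on the same rows the model was trained on
train_accuracy = model.fit(X, y).score(X, y)

# Accuracy on unseen rows: hold out one row at a time, train on the rest
held_out_accuracy = cross_val_score(model, X, y, cv=LeaveOneOut()).mean()

print(f"training accuracy: {train_accuracy:.2f}")
print(f"held-out accuracy: {held_out_accuracy:.2f}")
```

Low accuracy on both suggests underfitting, while high training accuracy with much lower held-out accuracy suggests overfitting. With only eight rows a single misclassified row moves the score a long way, so treat the numbers as a rough signal.
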
```
In [1]: import pandas as pd
#
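#bring in the full data set, including the truth (the known outcome)
In [2]: path = 'Desktop/diabetes.csv'
#
#read the CSV into a DataFrame (this step is missing from the transcript; pd.read_csv is assumed)
In [3]: df = pd.read_csv(path)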
#
#display the contents of the ingested data
In [4]: df
Out[4]:
   Patient  Blood Sugar  BMI  Diabetic
0      111          120   31         1
1      112          122   44         1
2      113          130   33         1
3      114           50   25         0
4      115           40   15         0
5      116           40   25         0
6      117          103   50         1
7      118          101   30         1
#
#now, set our features to be Blood Sugar and BMI
In [5]: X = df[['Blood Sugar','BMI']]
#
#Let's look at the output
In [6]: X
Out[6]:
   Blood Sugar  BMI
0          120   31
1          122   44
2          130   33
3           50   25
4           40   15
5           40   25
6          103   50
7          101   30
#
#set the 'truth' to be the column that says 'yes' they're diabetic, or 'no' they're not
In [7]: y = df[['Diabetic']]
#
#Show the contents of the truth column
In [8]: y
Out[8]:
   Diabetic
0         1
1         1
2         1
3         0
4         0
5         0
6         1
7         1
#
#Import the learning library
In [9]: from sklearn.linear_model import LogisticRegression
#
#Set the model type to be logistic regression
In [10]: model = LogisticRegression()
#
#Convert y to a NumPy array so it can be flattened to 1D with ravel() in the next step
#(.as_matrix() is deprecated and was removed in pandas 1.0; .values is the replacement)
In [11]: matrix = y.values
#
#then create the model with the X and y parameters
In [12]: model.fit(X,matrix.ravel())
Out[13]:
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)
#
#Import our testing data
In [14]: testdata = 'Desktop/testdata.csv'