| Coef | Std err | t | P>[t] |
Constant | A | C | E | G |
X | B | D | F | H |
In the table above, A is the constant (the Y intercept), also known as B0. B is the coefficient of X (the X multiplier, or slope), also known as B1. So in our equation Y = B0 + B1(X), we can replace B0 and B1 with the values A and B from the coefficient column of the table.
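As a minimal sketch of that substitution, suppose the coefficient column showed A = 0.05 and B = 1.99. These numbers are made up for illustration and do not come from any real model; the function name `predict` is also just a placeholder.

```python
# Hypothetical values read from the "Coef" column (made up for illustration)
b0 = 0.05  # A: the constant (Y intercept)
b1 = 1.99  # B: the coefficient of X (slope)


def predict(x):
    """Apply Y = B0 + B1 * X using the coefficients above."""
    return b0 + b1 * x


# Predict Y for a new X value of 6
print(f"Predicted Y: {predict(6):.2f}")
```

Once the two coefficients are known, predicting Y for any X is just this one line of arithmetic.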
The standard error column (C & D) tells us how precisely each coefficient has been estimated. The lower the value relative to the coefficient, the more confident we can be in that estimate.
E & F are the t-statistics. Wikipedia defines this as: “In statistics, the t-statistic is the ratio of the departure of the estimated value of a parameter from its hypothesized value to its standard error. It is used in hypothesis testing via Student’s t-test. For example, it is used in estimating the population mean from a sampling distribution of sample means if the population standard deviation is unknown.” In a regression table like this one, the hypothesized value of each coefficient is zero, so the t-statistic is simply the coefficient divided by its standard error (E = A / C and F = B / D).
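To show where the standard error and t columns could come from, here is a rough sketch for a regression with a single X variable, using the textbook formulas and plain Python. The data is made up for illustration, and the variable names are my own rather than anything a particular library exposes.

```python
import math

# Made-up sample data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# Coefficients (the "Coef" column)
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

# Residual variance: n - 2 degrees of freedom, because two coefficients were estimated
ssr = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
residual_variance = ssr / (n - 2)

# Standard errors (the "Std err" column)
se_b0 = math.sqrt(residual_variance * (1 / n + mean_x ** 2 / sxx))
se_b1 = math.sqrt(residual_variance / sxx)

# t-statistics (the "t" column): coefficient divided by its standard error
t_b0 = b0 / se_b0
t_b1 = b1 / se_b1

print(f"Constant: coef={b0:.3f}  std err={se_b0:.3f}  t={t_b0:.2f}")
print(f"X:        coef={b1:.3f}  std err={se_b1:.3f}  t={t_b1:.2f}")
```

In this toy data the slope is estimated far more precisely than the constant, so its t-statistic is much larger.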
G & H show the P value for each term in the model. If the value is below 0.05, the term is conventionally treated as statistically significant. The P value of the constant (G) is not so important; we are interested in the relationship between the X variables and Y, so we want the P value for X (H) to be as close to 0.000 as possible.
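For the curious, a two-sided P value can be approximated from a t-statistic and its degrees of freedom (the number of observations minus the number of estimated coefficients). The sketch below integrates the Student's t density numerically with Simpson's rule, using only the standard library. The function names and the example t-statistics are my own illustrative choices; a statistics package would normally do this step for you.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat, df, steps=2000):
    """Approximate P(|T| > |t_stat|) with Simpson's rule (steps must be even)."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # Simpson weights: 4 on odd points, 2 on even
        total += weight * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # probability between -|t| and +|t|
    return max(0.0, 1.0 - central_area)


# Made-up t-statistics with 3 degrees of freedom, for illustration only
for label, t_stat in [("Constant", 0.25), ("X", 33.3)]:
    p = two_sided_p_value(t_stat, df=3)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{label}: t={t_stat}, p={p:.4f} ({verdict} at the 0.05 level)")
```

A large t-statistic (in either direction) gives a small P value, which is why the two columns tend to tell the same story.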
Dep Variable: | A | R-Squared | D |
Model | B | | |
Method | C | | |
A is the dependent variable. The dependent variable is the one we’re trying to predict. It’s dependent on the X values.
B (the model) may be OLS (ordinary least squares), which is the most common linear regression model. It finds the line that minimises the sum of the squared vertical distances (the residuals) between the data points and the line. Other models include:
- Generalised least squares
- Maximum likelihood estimation
- Bayesian regression
- Kernel regression
- Gaussian process regression
C is the method, which can also be least squares.
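To make the least-squares idea concrete, here is a small sketch for a single X variable. It fits the line with the closed-form formulas and then checks that nudging the intercept or the slope only makes the sum of squared residuals larger. The data and the helper names (`fit_line`, `sum_squared_residuals`) are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def sum_squared_residuals(xs, ys, b0, b1):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Made-up sample data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
best = sum_squared_residuals(xs, ys, b0, b1)
print(f"Fitted line: Y = {b0:.2f} + {b1:.2f} * X, sum of squares = {best:.4f}")

# Any other line does worse: try shifting the intercept and the slope slightly
for db0, db1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    other = sum_squared_residuals(xs, ys, b0 + db0, b1 + db1)
    print(f"Shift intercept by {db0:+.2f}, slope by {db1:+.2f}: sum of squares = {other:.4f}")
```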
R-Squared (D) measures the goodness of fit of your model: the proportion of the variability in the dependent variable that the model explains. Generally, the more variables we include, the higher R-Squared will be, as the model can account for more of the variability. For example, the amount you earn may be influenced by:
- Your education
- Your marital status
- Your age
- Your industry
- And much more…
The more of these we fit into the model, the more of the variability in your data the model will explain. R-Squared ranges from 0 (the model explains none of the variability) to 1 (it explains all of it).
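As a rough sketch of how R-Squared can be calculated for a single X variable: it is one minus the ratio of the unexplained variability (the squared residuals) to the total variability of Y around its mean. The data and the function name `r_squared` are made up for illustration.

```python
def r_squared(xs, ys):
    """R-Squared of the least-squares line fitted to xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x

    # Variability the line fails to explain
    ss_residual = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    # Total variability of Y around its mean
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total


# Made-up sample data, for illustration only
xs = [1, 2, 3, 4, 5]
close_to_line = [2.1, 3.9, 6.2, 7.8, 10.1]  # nearly a straight line
scattered = [5.0, 1.0, 6.0, 2.0, 7.0]       # much noisier

print(f"R-Squared (close to a line): {r_squared(xs, close_to_line):.4f}")
print(f"R-Squared (scattered):       {r_squared(xs, scattered):.4f}")
```

The nearly straight data gives a value close to 1, while the scattered data gives a much lower one.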