Parameter interpretations: intercept
The intercept $\beta_0$ represents the mean response $E(y_i)$ when $x_i = 0$.
In the SEDA example:
For districts with a log median income of 0, the mean achievement gap between boys and girls is estimated to be -1.356 standard deviations from the national average.
Not incorrect, but awkward:
log median income is not a natural quantity
the sign is confusing
Better:
For school districts with a median income of 1 dollar, the mean achievement gap is estimated to favor girls by 1.356 standard deviations from the national average.
Check your understanding:
why 1 dollar and not 0 dollars?
why not -1.356?
Not of particular interest here because no districts have a median income of 1 USD.
Parameter interpretations: slope
The slope $\beta_1$ represents the change in mean response $E(y_i)$ per unit change in $x_i$.
In the SEDA example:
Each increase of log median income by 1 is associated with an estimated increase in mean achievement gap of 0.121 standard deviations from the national average in favor of boys.
Not incorrect, but a bit awkward – how much is a change in log median income of 1 unit?
Better:
Every doubling of median income is associated with an estimated increase in the mean achievement gap of 0.084 standard deviations from the national average in favor of boys.
Why doubling?
Hint: $\hat{\beta}_1 \log(2x) = \hat{\beta}_1 \log x + \hat{\beta}_1 \log 2$
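Working the hint through with the estimated slope (using $\log 2 \approx 0.693$), the per-doubling effect is:
$$\hat{\beta}_1 \log 2 = 0.121 \times 0.693 \approx 0.084$$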
Parameter interpretations: error variance
The error variance $\sigma^2$ represents the variability in the response $y$ after accounting for the explanatory variable $x$.
In the SEDA example:
After adjusting for log median income, the gender achievement gap varies among districts by an estimated 0.11 standard deviations from the national average.
Note that $\hat{\sigma}$ is reported for interpretation on the original scale, rather than $\hat{\sigma}^2$.
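Here $\hat{\sigma}$ is just the square root of the error variance estimate; a quick check, assuming sigmasq_hat holds that estimate as in the code below:
Code
# sigma hat, reported on the original scale of the response
print('sigma hat: ', np.sqrt(sigmasq_hat))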
Compare the estimated ‘raw’ variance in the gender gap with the estimated residual variance after accounting for log median income:
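A minimal sketch of the computation, assuming y and sigmasq_hat as in the relative-reduction code below:
Code
print('raw variance: ', y.var(ddof=1))
print('estimated residual variance: ', sigmasq_hat)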
raw variance: 0.015355417268162816
estimated residual variance: 0.01317117061613733
The estimated variability in achievement gap diminishes a little after adjusting for log median income. The relative reduction is:
$$\frac{\hat{\sigma}^2_{\text{raw}} - \hat{\sigma}^2}{\hat{\sigma}^2_{\text{raw}}}$$
In the SEDA example, the reduction was about 14%:
Code
print('relative reduction in variance: ', (y.var(ddof=1) - sigmasq_hat) / y.var(ddof=1))
relative reduction in variance: 0.1422459978703541
Parameter interpretations: variance
A closely related quantity is the R-squared statistic, which simply adjusts the denominator of the error variance estimate:
$$\frac{\hat{\sigma}^2_{\text{raw}} - \frac{n-2}{n-1}\hat{\sigma}^2}{\hat{\sigma}^2_{\text{raw}}}$$
used as a measure of fit
interpreted as the proportion of variation in the response explained by the model
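As a quick check, the formula can be computed directly; a minimal sketch, assuming y and sigmasq_hat as before (statsmodels exposes the same quantity via the fitted model's .rsquared attribute):
Code
# R-squared from the adjusted variance-reduction formula
n = len(y)
rsq = (y.var(ddof=1) - (n - 2) / (n - 1) * sigmasq_hat) / y.var(ddof=1)
print('R-squared: ', rsq)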
General parameter interpretations
There is some general language for interpreting the parameter estimates:
(Intercept) When [$x_i = 0$], the mean [response variable] is estimated to be [$\hat{\beta}_0$ units].
(Slope) Every [one-unit increase in $x_i$] is associated with an estimated change in mean [response variable] of [$\hat{\beta}_1$ units].
(Error variance) After adjusting for [explanatory variable], the remaining variability in [response variable] is an estimated [$\hat{\sigma}$ units] about the mean.
You can use this standard language as a formulaic template for interpreting estimated parameters.
Centering the explanatory variable
If we want the intercept to be meaningful, we could center the explanatory variable and instead fit:
$$y_i = \beta_0 + \beta_1(x_i - \bar{x}) + \epsilon_i$$
Code
# center log median income
log_income_ctr = (regdata.log_income - regdata.log_income.mean()).values

# form x matrix
x_ctr = sm.tools.add_constant(log_income_ctr)

# refit model
slr_ctr = sm.OLS(endog=y, exog=x_ctr)
fit_ctr = slr_ctr.fit()
beta_hat_ctr = fit_ctr.params

# display parameter estimates
print('coefficient estimates: ', beta_hat_ctr)
print('error variance estimate: ', fit_ctr.scale)
We could seek to adjust the model so that the intercept is interpreted as the gap at the district with the smallest median income:
$$y_i = \beta_0 + \beta_1 \log(x_i - x_{(1)} + 1) + \epsilon_i, \qquad x_i: \text{median income for district } i$$
But this changes the meaning of the other model terms:
$\beta_1$ represents the change in mean gap associated with multiplicative changes in the amount by which a district’s median income exceeds that of the poorest district
$\sigma^2$ is the variability of the gap after adjusting for the log of the difference in median income from the median income of the poorest district
Other transformations
Unsurprisingly, estimates are not invariant under arbitrary transformations, so if the meanings of the other parameters change, then so do the estimates:
# undo the log, shift so the poorest district is at zero, and re-apply the log
income = np.exp(regdata.log_income)
income_shifted = income - income.min()
log_income_shifted = np.log(income_shifted + 1)

# form x matrix
x_shifted = sm.tools.add_constant(log_income_shifted)

# refit model
slr_shifted = sm.OLS(endog=y, exog=x_shifted)
fit_shifted = slr_shifted.fit()
beta_hat_shifted = fit_shifted.params

# display parameter estimates
print('coefficient estimates: ', beta_hat_shifted)
print('error variance estimate: ', fit_shifted.scale)
Note also it’s not possible to express the old parameters as functions of the new parameters; this is a fundamentally different model.
Uncertainty quantification
A great benefit of the simple linear regression model relative to a best-fit line is that the error variance estimate allows for uncertainty quantification.
That means one can describe precisely:
variation in the estimates (i.e., estimated model reliability);
variation in predictions made using the estimated model (i.e., predictive reliability).
Understanding variation in estimates
What would happen to the estimates if they were computed from a different sample?
We can explore this idea a little by calculating least squares estimates from several distinct subsamples of the dataset.
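A minimal sketch of this experiment, assuming regdata and y from earlier (the subsample size, replicate count, and seed are arbitrary choices); each subsample's coefficient estimates define a fitted line:
Code
# refit the model on several random subsamples of districts
rng = np.random.default_rng(seed=110)
for rep in range(5):
    idx = rng.choice(len(y), size=100, replace=False)
    x_sub = sm.tools.add_constant(regdata.log_income.values[idx])
    fit_sub = sm.OLS(endog=y[idx], exog=x_sub).fit()
    print('subsample', rep, 'estimates: ', fit_sub.params)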
The lines are pretty similar, but they change a bit from subsample to subsample.
Variance of least squares
How much should one expect the estimates to change depending on the data they are fit to?
It can be shown that the variances and covariance of the estimates are:
$$\begin{bmatrix} \operatorname{var}\hat{\beta}_0 & \operatorname{cov}(\hat{\beta}_0, \hat{\beta}_1) \\ \operatorname{cov}(\hat{\beta}_0, \hat{\beta}_1) & \operatorname{var}\hat{\beta}_1 \end{bmatrix} = \sigma^2 (X'X)^{-1}$$
Bear in mind that the randomness comes from the $\epsilon_i$ model term:
these quantify how much the estimates would vary across collections of $y_i$'s measured at exactly the same values of $x_i$
these are not variances of the parameters; $\beta_0$ and $\beta_1$ are constants, i.e., not random
they are also not variances of the particular computed estimates (e.g., 0.121 is yet another constant), but of the estimators as random quantities
Standard errors
So the variances can be estimated by plugging in $\hat{\sigma}^2$ for $\sigma^2$ in the variance-covariance matrix from the previous slide.
The estimated standard deviations are known as standard errors:
$$SE(\hat{\beta}_0) = \sqrt{\hat{\sigma}^2 \left[(X'X)^{-1}\right]_{11}} \qquad \text{and} \qquad SE(\hat{\beta}_1) = \sqrt{\hat{\sigma}^2 \left[(X'X)^{-1}\right]_{22}}$$
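These can be computed directly from the formula; a sketch, assuming x is the original explanatory matrix (with constant column) and sigmasq_hat the error variance estimate:
Code
# plug-in estimate of the variance-covariance matrix
xtx_inv = np.linalg.inv(x.T @ x)
vcov_hat = sigmasq_hat * xtx_inv

# standard errors are the square roots of the diagonal entries
print('standard errors: ', np.sqrt(np.diag(vcov_hat)))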
Computations and interpretations
The estimated variance-covariance of the least squares estimates is computed by the .cov_params() method:
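For example, assuming the original model object is named slr (by analogy with slr_ctr above):
Code
fit = slr.fit()
# estimated variance-covariance matrix of the least squares estimates
print(fit.cov_params())
# the .bse attribute gives the corresponding standard errors
print('standard errors: ', fit.bse)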