Linear Regression with Multiple Regressors

Author

Zahid Asghar

School of Economics, QAU, Islamabad

Outline

Omitted variable bias
Causality and regression analysis
Multiple regression and OLS
Measures of fit
Sampling distribution of the OLS estimator

Omitted Variable Bias

The error $u$ arises because of factors, or variables, that influence Y but are not included in the regression function. There are always omitted variables. Sometimes, the omission of those variables can lead to bias in the OLS estimator.

The bias in the OLS estimator that occurs as a result of an omitted factor, or variable, is called omitted variable bias. For omitted variable bias to occur, the omitted variable $Z$ must satisfy two conditions: The two conditions for omitted variable bias (1) $Z$ is a determinant of $Y$ (i.e. $Z$ is part of $u$); and

(2) $Z$ is correlated with the regressor $X$ i.e. $corr(Z,X) \neq 0$

Both conditions must hold for the omission of $Z$ to result in omitted variable bias.

In the test score example:
1. English language ability (whether the student has English as a second language) plausibly affects standardized test scores: $Z$ is a determinant of $Y$.
2. Immigrant communities tend to be less affluent and thus have smaller school budgets and higher STR: $Z$ is correlated with $X$. Accordingly, $\hat\beta_1$ is biased. What is the direction of this bias?
What does common sense suggest?
If common sense fails you, there is a formula…$\sum^n_{i = 1} (Y_i - b_0 - b_1 X_i)^2$.

\[\begin{align} {\hat\beta_1}-\beta_1 = \frac{1}{n} \frac{\sum^n_{i = 1} \left(X_i - \bar{X} \right) u_i } {\sum^n_{i = 1}(X_i - \bar{X})^2}. \tag{6.1} \end{align}\]

From $E(u_i\vert X_i) = 0$ implies $corr(X_i,u_i)=0$

\[\hat\beta_1 \xrightarrow[]{p} \beta_1 + \rho_{Xu} \frac{\sigma_u}{\sigma_X}. \tag{6.1}\]

If If an omitted variable $Z$ is both: (1) a determinant of $Y$ (that is, it is contained in $u$); and (2) correlated with $X$, then $\rho_{Xu} \neq0$, and the OLS estimator $\hat{\beta_1}$ is biased and is not consistent.
For example, districts with few ESL students (1) do better on standardized tests and (2) have smaller classes (bigger budgets), so ignoring the effect of having many ESL students factor would result in overstating the class size effect. Is this is actually going on in the CA data?

Table 6.1

Code

library(haven)
library(kableExtra)
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.7     ✔ dplyr   1.0.9
✔ tidyr   1.2.0     ✔ stringr 1.4.0
✔ readr   2.1.2     ✔ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter()     masks stats::filter()
✖ dplyr::group_rows() masks kableExtra::group_rows()
✖ dplyr::lag()        masks stats::lag()

Code

caschool <- read_dta("caschool.dta")
library(modelsummary)
library(rstatix)


Attaching package: 'rstatix'

The following object is masked from 'package:stats':

    filter

Code

caschool<-caschool %>% mutate(elq1=ifelse(el_pct<=1.9,1,0))
caschool<-caschool %>% mutate(elq2=ifelse(el_pct>=1.9 & el_pct<8.8,1,0))
caschool<-caschool %>% mutate(elq3=ifelse(el_pct>=8.8 & el_pct<23,1,0))
caschool<-caschool %>% mutate(elq4=ifelse(el_pct>=23,1,0))
caschool<-caschool %>%   mutate(str_20 = ifelse(str<20, 1,0))

Code

overall<-caschool %>% t_test(testscr~str_20,var.equal=T)
el_1.9<-caschool %>% filter(elq1==1) %>% t_test(testscr~str_20,var.equal=T)
el_8.8<-caschool %>% filter(elq2==1) %>% t_test(testscr~str_20,var.equal=T)
el_23<-caschool %>% filter(elq3==1) %>% t_test(testscr~str_20,var.equal=T)
el_gr23<-caschool %>% filter(elq4==1) %>% t_test(testscr~str_20,var.equal=T)
table_6.1<-bind_rows(overall,el_1.9,el_8.8,el_23,el_gr23)

df_tbl<- as_tibble(table_6.1)
df_tbl<-df_tbl %>% select(-".y.")
df_tbl$statistic<-round(df_tbl$statistic,2)
df_tbl$p<-round(df_tbl$p,3)
df_tbl

# A tibble: 5 × 7
  group1 group2    n1    n2 statistic    df     p
  <chr>  <chr>  <int> <int>     <dbl> <dbl> <dbl>
1 0      1        182   238     -4      418 0    
2 0      1         27    76      0.27   101 0.789
3 0      1         44    64     -1.05   106 0.296
4 0      1         50    54     -1.73   102 0.087
5 0      1         61    44     -0.7    103 0.483

Districts with fewer English Learners have higher test scores
Districts with lower percent EL (PctEL) have smaller classes
Among districts with comparable PctEL, the effect of class size

is small (recall overall “test score gap” = 7.4)

Causality and regression analysis

The test score/STR/fraction English Learners example shows that, if an omitted variable satisfies the two conditions for omitted variable bias, then the OLS estimator in the regression omitting that variable is biased and inconsistent. So, even if n is large, $\hat{\beta_1}$ will not be close to $\beta_1$ . This raises a deeper question: how do we define $\beta_1$ ? That is, what precisely do we want to estimate when we run a regression?

What precisely do we want to estimate when we run a regression? There are (at least) three possible answers to this question:

We want to estimate the slope of a line through a scatterplot as a simple summary of the data to which we attach no substantive meaning. This can be useful at times, but isn’t very interesting intellectually and isn’t what this course is about.

We want to make forecasts, or predictions, of the value of $Y$ for an entity not in the data set, for which we know the value of $X$.

Forecasting is an important job for economists, and excellent forecasts are possible using regression methods without needing to know causal effects. We will return to forecasting later in the course.

We want to estimate the causal effect on Y of a change in $X$.

This is why we are interested in the class size effect. Suppose the school board decided to cut class size by 2 students per class. What would be the effect on test scores? This is a causal question (what is the causal effect on test scores of STR?) so we need to estimate this causal effect. Except when we discuss forecasting, the aim of this course is the estimation of causal effects using regression methods.

What, precisely, is a causal effect?

“Causality” is a complex concept!
In this course, we take a practical approach to defining causality:
- A causal effect is defined to be the effect measure in an ideal randomized controlled experiment in an ideal randomized controlled experiment.

Ideal Randomized Controlled Experiment

Ideal: subjects all follow the treatment protocol – perfect compliance, no errors in reporting, etc.!
Randomized: subjects from the population of interest are randomly assigned to a treatment or control group (so there are no confounding factors)
Controlled: having a control group permits measuring the differential effect of the treatment
Experiment: the treatment is assigned as part of the experiment: the subjects have no choice, so there is no “reverse causality” in which subjects choose the treatment they think will work best.

Back to class size:

Imagine an ideal randomized controlled experiment for measuring the effect on Test Score of reducing STR…

In that experiment, students would be randomly assigned to classes, which would have different sizes.
Because they are randomly assigned, all student characteristics (and thus $u_i$) would be distributed independently of STRi.
Thus, $E(u_i|STR_i)=0$ – that is, LSA #1 holds in a randomized controlled experiment.

How does our observational data differ from this ideal?

The treatment is not randomly assigned
Consider PctEL – percent English learners – in the district.

It plausibly satisfies the two criteria for omitted variable bias: Z = PctEL is:

a determinant of Y; and
correlated with the regressor X.

Thus, the “control” and “treatment” groups differ in a systematic way, so $corr(STR,PctEL)\neq0$

Randomization + control group means that any differences between the treatment and control groups are random – not systematically related to the treatment

We can eliminate the difference in PctEL between the large (control) and small (treatment) groups by examining the effect of class size among districts with the same PctEL.
- If the only systematic difference between the large and small class size groups is in PctEL, then we are back to the randomized controlled experiment – within each PctEL group.
- This is one way to “control” for the effect of PctEL when estimating the effect of STR.

Return to omitted variable bias

Three ways to overcome omitted variable bias

Run a randomized controlled experiment in which treatment (STR) is randomly assigned: then PctEL is still a determinant of TestScore, but PctEL is uncorrelated with STR. (This solution to OV bias is rarely feasible.)
Adopt the “cross tabulation” approach, with finer gradations of STR and PctEL – within each group, all classes have the same PctEL, so we control for PctEL (But soon you will run out of data, and what about other determinants like family income and parental education?)
Use a regression in which the omitted variable (PctEL) is no longer omitted: include PctEL as an additional regressor in a multiple regression.

The Population Regression Function

Consider the two regressors:

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i, \ i=1,\dots,n.$

$Y_i$ is dependent variable
$X1$, $X2$ are the two independent variables (regressors)
($Y_i$, $X_{1i}$, $X_{2i}$) denote the ith observation on $Y$, $X1$, and $X2$.
$\beta_0$ = unknown population intercept
$\beta_1$ = effect on Y of a change in X1, holding X2 constant
$\beta_2$ = effect on Y of a change in X2, holding X1 constant
$u_i$ = the regression error (omitted factors)

Interpretation of coefficients in multiple regression

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i, \ i=1,\dots,n.$
Consider changing $X_1$ by $X_1$ while holding $X_2$ constant: Population regression line before the change: $Y_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
Population regression line, after the change: $Y_i+{\Delta Y} = \beta_0 + \beta_1 (X_1+{\Delta X_1}) + \beta_2 X_2$
Difference After minus Before ${\Delta Y} = \beta_1 {\Delta X_1}$

$\beta_1=\frac{\Delta Y}{\Delta X_1}$ holding $X_2$ constant

$\beta_2=\frac{\Delta Y}{\Delta X_2}$ holding $X_1$ constant

$\beta_0 = predicted value of Y when X1 = X2 = 0$. (Theoretically speaking)

The OLS Estimator in Multiple Regression

(SW Section 6.3) With two regressors, the OLS estimator solves: $min_{b_1,b_2}\sum^n_{i = 1} (Y_i - b_0 - b_1 X_{1i}-b_2 X_{2i})^2.$

The OLS estimator minimizes the average squared difference between the actual values of Yi and the prediction (predicted value) based on the estimated line.
This minimization problem is solved using calculus
This yields the OLS estimators of $\beta_0$ , $\beta_1$ and $\beta_2$

Example: the California test score data

Regression of TestScore against STR:

$\widehat{TestScore}= 698.9 – 2.28STR$

Now include percent English Learners in the district (PctEL):

Code

models<-list(mod1<-lm(testscr~str,data = caschool),
mod2<-lm(testscr~str+el_pct,data = caschool))
mod1<-lm(testscr~str,data = caschool)
mod2<-lm(testscr~str+el_pct,data = caschool)
modelsummary(
  models,
  fmt = 2,
  estimate  = c(
                "{estimate} ({std.error}){stars}"),
  statistic = NULL)

	Model 1	Model 2
(Intercept)	698.93 (9.47)***	686.03 (7.41)***
str	−2.28 (0.48)***	−1.10 (0.38)**
el_pct		−0.65 (0.04)***
Num.Obs.	420	420
R2	0.051	0.426
R2 Adj.	0.049	0.424
AIC	3650.5	3441.1
BIC	3662.6	3457.3
Log.Lik.	−1822.250	−1716.561
F	22.575	155.014
RMSE	18.54	14.41

$\widehat{TestScore} = 686.0 – 1.10STR – 0.65PctEL$

What happens to the coefficient on STR? - Why? (Note: $corr(STR, PctEL) = 0.19$)

Regression of TestScore against STR: TestScore

Regress $X_1$ on $X_2$ , $X_3$,…, $X_k$, and let $\tilde{X_1}$ denote the residuals from this regression;
Regress $Y$ on $X_2$ , $X_3$,…, $X_k$, and let $\tilde{Y}$ denote the residuals from this regression; and
Regress $\tilde{Y}$ on $\tilde{X_1}$

Code

y_x2<-lm(testscr~el_pct, data=caschool)
x1_x2<-lm(str~el_pct,data=caschool)
resid1<-y_x2$residuals
resid2<-x1_x2$residuals
resid_mod<-lm(resid1~resid2)
models<-list(mod1, mod2,resid_mod)
modelsummary(models)

	Model 1	Model 2	Model 3
(Intercept)	698.933	686.032	0.000
	(9.467)	(7.411)	(0.705)
str	−2.280	−1.101
	(0.480)	(0.380)
el_pct		−0.650
		(0.039)
resid2			−1.101
			(0.380)
Num.Obs.	420	420	420
R2	0.051	0.426	0.020
R2 Adj.	0.049	0.424	0.017
AIC	3650.5	3441.1	3439.1
BIC	3662.6	3457.3	3451.2
Log.Lik.	−1822.250	−1716.561	−1716.561
F	22.575	155.014	8.407
RMSE	18.54	14.41	14.41

Measures of Fit for Multiple Regression

(SW Section 6.4) $Actual = predicted + residual$ $Y_i = \hat{Y_i} + \hat{u_i}$

SER = std. deviation of $\hat{u_i}$ (with d.f. correction) RMSE = std. deviation of $\hat{u_i}$ (without d.f. correction) $R^2$ = fraction of variance of Y explained by X $\bar{R^2}$ = “adjusted $R^2$” = $R^2$ with a degrees-of-freedom correction that adjusts for estimation uncertainty; $\bar{R^2} <R^2$

SER and RMSE

As in regression with a single regressor, the SER and the RMSE are measures of the spread of the Ys around the regression line:

$SER = \sqrt {\frac{1}{n-k-1} \sum_{i = 1}^n \hat{u}^2_i}$

$RMSE = \sqrt {\frac{1}{n} \sum_{i = 1}^n \hat{u}^2_i}$

$R^2$ and $\bar{R^2}$ (adjusted R2)

The $R^2$ is the fraction of the variance explained – same definition as in regression with a single regressor: $R^2 = {ESS/TSS}=1-{SSR/TSS}$

$\bar{R}^2 = 1-\frac{n-1}{n-k-1} \, \frac{SSR}{TSS}.$

Measures of fit, ctd. Test score example: (1) $\widehat{TestScore}= 698.9 – 2.28STR$
(2) $\widehat{TestScore} = 686.0 – 1.10STR – 0.65PctEL,\\R^2=0.426, \bar{R^2}=0.424, SER=14.5$

What – precisely – does this tell you about the fit of regression (2) compared with regression (1)?
Why are the $R^2$ and $\bar{R^2}$ so close in (2)?

OLS Assumptions in the MLR Model

The multiple regression model is given by $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_1 X_{2i} + \dots + \beta_k X_{ki} + u_i \ , \ i=1,\dots,n.$

The OLS assumptions in the multiple regression model are an extension of the ones made for the simple regression model:

$u_i$ is an error term with conditional mean zero given the regressors, i.e., $E(u_i\vert X_{1i}, X_{2i}, \dots, X_{ki}) = 0.$

2.Regressors $(X_{1i}, X_{2i}, \dots, X_{ki}, Y_i) \ , \ i=1,\dots,n$ are drawn such that the i.i.d. assumption holds.

Large outliers are unlikely, formally $X_{1i},\dots,X_{ki}$ and $Y_i$ have finite fourth moments.
No perfect multicollinearity.

Assumption #1: the conditional mean of u given the

included Xs is zero. $E(u_i\vert X_{1i}, X_{2i}, \dots, X_{ki}) = 0.$ - This has the same interpretation as in regression with a single regressor. - Failure of this condition leads to omitted variable bias, specifically, if an omitted variable (1) belongs in the equation (so is in $u$) and (2) is correlated with an included $X$ then this condition fails and there is OV bias. - The best solution, if possible, is to include the omitted variable in the regression. - A second, related solution is to include a variable that controls for the omitted variable (discussed in Ch. 7)

Assumption #2

$(X_{1i}, X_{2i}, \dots, X_{ki}, Y_i) \ , \ i=1,\dots,n$ This is satisfied automatically if the data are collected by simple random sampling.

Assumption #3: large outliers are rare (finite fourth moments) This is the same assumption as we had before for a single regressor. As in the case of a single regressor, OLS can be sensitive to large outliers, so you need to check your data (scatterplots!) to make sure there are no crazy values (typos or coding errors).

Assumption #4 : There is no perfect multicollinearity

Perfect multicollinearity is when one of the regressors is an exact linear function of the other regressors. Example: Suppose you accidentally include STR twice

	Model 1
(Intercept)	698.933
	(9.467)
str	−2.280
	(0.480)
Num.Obs.	420
R2	0.051
R2 Adj.	0.049
AIC	3650.5
BIC	3662.6
Log.Lik.	−1822.250
F	22.575
RMSE	18.54

Perfect multicollinearity

It occurs when one of the regressors is an exact linear function of the other regressors. - In the previous regression, $\beta_1$ is the effect on TestScore of a unit change in STR, holding STR constant (???) - We will return to perfect (and imperfect) multicollinearity shortly, with more examples… With these least squares assumptions in hand, we now can derive the sampling distribution of $\hat{\beta_1}, \dots,\hat{\beta_k}$

The Sampling Distribution of the OLS Estimator

Under the four Least Squares Assumptions

- The sampling distribution of $\hat{\beta_1}$ has mean $\beta_1$ -$var(\hat{\beta_1}$ is inversely proportional to n.

- Other than its mean and variance, the exact (finite-n) distribution of $\hat{\beta_1}$ is very complicated; but for large n…

$\hat{\beta_1}$ is consistent $\hat\beta_1 \xrightarrow[]{p} \beta_1$ (law of large numbers) is approximately distributed N(0,1) (CLT) oThese statements hold for 1 $\frac{\hat{\beta_1} - \beta_1}{SE(\hat{\beta_1})}$ is approximately N(0,1) (CLT)
These statements hold for $\hat{\beta_1},\dots,\hat{\beta_k}$ Conceptually, there is nothing new here

Multicollinearity, Perfect and Imperfect

Perfect multicollinearity is when one of the regressors is an exact linear function of the other regressors.

Some more examples of perfect multicollinearity

The example from before: you include STR twice,
Regress TestScore on a constant, D, and B, where: $D_i = 1$ if $STR ≤ 20$, $= 0$ otherwise; $B_i = 1$ if $STR >20, = 0$ otherwise, so $B_i = 1 – D_i$ and there is perfect multicollinearity.
Would there be perfect multicollinearity if the intercept (constant) were excluded from this regression? This example is a special case of…

The dummy variable trap

Suppose you have a set of multiple binary (dummy) variables, which are mutually exclusive and exhaustive – that is, there are multiple categories and every observation falls in one and only one category (Freshmen, Sophomores, Juniors, Seniors, Other). If you include all these dummy variables and a constant, you will have perfect multicollinearity – this is sometimes called the dummy variable trap.
- Why is there perfect multicollinearity here? - Solutions to the dummy variable trap: 1. Omit one of the groups (e.g. Senior), or 2. Omit the intercept - What are the implications of (1) or (2) for the interpretation of the coefficients?

Perfect multicollinearity, ctd. Perfect multicollinearity usually reflects a mistake in the definitions of the regressors, or an oddity in the data
- If you have perfect multicollinearity, your statistical software will let you know – either by crashing or giving an error message or by “dropping” one of the variables arbitrarily - The solution to perfect multicollinearity is to modify your list of regressors so that you no longer have perfect multicollinearity.

Imperfect multicollinearity

Imperfect and perfect multicollinearity are quite different despite the similarity of the names. Imperfect multicollinearity occurs when two or more regressors are very highly correlated. - Why the term “multicollinearity”? If two regressors are very highly correlated, then their scatterplot will pretty much look like a straight line – they are “co-linear” – but unless the correlation is exactly $\pm1$, that collinearity is imperfect.

Imperfect multicollinearity, ctd.

Imperfect multicollinearity implies that one or more of the regression coefficients will be imprecisely estimated. - The idea: the coefficient on $X_1$ is the effect of $X_1$ holding $X_2$ constant; but if $X_1$ and $X_2$ are highly correlated, there is very little variation in $X_1$ once $X_2$ is held constant – so the data don’t contain much information about what happens when $X_1$ changes but $X_2$ doesn’t. If so, the variance of the OLS estimator of the coefficient on $X_1$ will be large. - Imperfect multicollinearity (correctly) results in large standard errors for one or more of the OLS coefficients. - The math? See SW, App. 6.2

Hypothesis Testing and Confidence Intervals in Multiple Regressions

Joint Hypothesis Testing Using the F-Statistic

$\widehat{TestScore} = \underset{(15.21)}{649.58} -\underset{(0.48)}{0.29} \times size - \underset{(0.04)}{0.66} \times english + \underset{(1.41)}{3.87} \times expenditure.$

$F-Statistics$ is given as below $F = \frac{(SSR_{\text{restricted}} - SSR_{\text{unrestricted}})/q}{SSR_{\text{unrestricted}} / (n-k-1)}$ with
S S R r e s t r i c t e d being the sum of squared residuals from the restricted regression, i.e., the regression where we impose the restriction. $SSR_{unrestricted}$ is the sum of squared residuals from the full model, $q$ is the number of restrictions under the null and $k$ is the number of regressors in the unrestricted regression. It is fairly easy to conduct $F-tests$ in $R$. We can use the function linearHypothesis()contained in the Hypothesis Testing and Confidence Intervals in Multiple Regressions.

Code

library(car)

Loading required package: carData


Attaching package: 'car'

The following object is masked from 'package:dplyr':

    recode

The following object is masked from 'package:purrr':

    some

Code

# estimate the multiple regression model
model <- lm(testscr ~ str + el_pct + meal_pct, data = caschool)

# execute the function on the model object and provide both linear restrictions 
# to be tested as strings
linearHypothesis(model, c("str=0", "el_pct=0"))

Linear hypothesis test

Hypothesis:
str = 0
el_pct = 0

Model 1: restricted model
Model 2: testscr ~ str + el_pct + meal_pct

  Res.Df   RSS Df Sum of Sq     F  Pr(>F)    
1    418 37303                               
2    416 34298  2    3004.3 18.22 2.6e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

To test $\beta_2+\beta_3=1$ we can proceed as follows:

Code

linearHypothesis(model,c('el_pct+meal_pct=1'))

Linear hypothesis test

Hypothesis:
el_pct  + meal_pct = 1

Model 1: restricted model
Model 2: testscr ~ str + el_pct + meal_pct

  Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
1    417 410435                                  
2    416  34298  1    376136 4562.1 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlations

Code

library(corrr)
caschool|> select(testscr, str, el_pct, meal_pct, calw_pct)|>
  correlate()|>
  autoplot()+geom_text(aes(label=round(r,digits = 2)),size=2)

Fig 7.2

Code

p1<- ggplot(caschool)+aes(x=el_pct,y=testscr)+geom_point()+
  labs(x="percent",y="Test Score")
p2<- ggplot(caschool)+aes(x=meal_pct,y=testscr)+geom_point()+
  labs(x="percent",y="Test Score")
p3<- ggplot(caschool)+aes(x=calw_pct,y=testscr)+geom_point()+
  labs(x="percent",y="Test Score")
library(gridExtra)


Attaching package: 'gridExtra'

The following object is masked from 'package:dplyr':

    combine

Code

#grid.arrange(p1,p2,p3, nrow=3)
p1

Code

p2

Code

p3

Table 7.1 Regressions of Test Scores on Student-Teacher Ratio

Code

#vtable(caschool)
m1<-lm(testscr~str,data = caschool)
m2<-lm(testscr~str+el_pct,data = caschool)
m3<-lm(testscr~str+el_pct+meal_pct,data = caschool)
m4<-lm(testscr~str+el_pct+calw_pct,data = caschool)
m5<-lm(testscr~str+el_pct+meal_pct+calw_pct,data = caschool)
library(fixest)
library(modelsummary)
models<-list(m1,m2,m3,m4,m5)
library(huxtable)
#etable(m1,m2,m3,m4)
modelsummary(models,estimate = "{estimate}{stars}", output="huxtable")

	Model 1	Model 2	Model 3	Model 4	Model 5
(Intercept)	698.933***	686.032***	700.150***	697.999***	700.392***
	(9.467)	(7.411)	(4.686)	(6.024)	(4.698)
str	-2.280***	-1.101**	-0.998***	-1.308***	-1.014***
	(0.480)	(0.380)	(0.239)	(0.307)	(0.240)
el_pct		-0.650***	-0.122***	-0.488***	-0.130***
		(0.039)	(0.032)	(0.033)	(0.034)
meal_pct			-0.547***		-0.529***
			(0.022)		(0.032)
calw_pct				-0.790***	-0.048
				(0.053)	(0.061)
Num.Obs.	420	420	420	420	420
R2	0.051	0.426	0.775	0.629	0.775
R2 Adj.	0.049	0.424	0.773	0.626	0.773
AIC	3650.5	3441.1	3051.0	3260.7	3052.4
BIC	3662.6	3457.3	3071.2	3280.9	3076.6
Log.Lik.	-1822.250	-1716.561	-1520.499	-1625.328	-1520.188
F	22.575	155.014	476.306	234.638	357.054
RMSE	18.54	14.41	9.04	11.60	9.03

Code

#modelsummary(models, fmt=4)
#modelsummary(models,
 #            statistic = "{std.error} ({p.value})")
#modelsummary(models,
 #            estimate = "{estimate}{stars}",
  #           gof_omit = ".*")

Outline

Omitted Variable Bias

Table 6.1

What, precisely, is a causal effect?

Ideal Randomized Controlled Experiment

Back to class size:

Return to omitted variable bias

The Population Regression Function

Interpretation of coefficients in multiple regression

The OLS Estimator in Multiple Regression

Example: the California test score data

Measures of Fit for Multiple Regression

SER and RMSE

\(R^2\) and \(\bar{R^2}\) (adjusted R2)

OLS Assumptions in the MLR Model

Assumption #1: the conditional mean of u given the

Assumption #2

Assumption #4 : There is no perfect multicollinearity

Perfect multicollinearity

The Sampling Distribution of the OLS Estimator

Multicollinearity, Perfect and Imperfect

The dummy variable trap

Imperfect multicollinearity

Imperfect multicollinearity, ctd.

Hypothesis Testing and Confidence Intervals in Multiple Regressions

Joint Hypothesis Testing Using the F-Statistic

Correlations

Fig 7.2

Table 7.1 Regressions of Test Scores on Student-Teacher Ratio