Associations between Variables
January 29, 2025
Simple example: does treatment have an impact on y?
Or, is there an association between these two variables?
We can calculate the means in each treatment group
The treatment effect is the difference between the mean of y where T=1 and the mean of y where T=0
Null hypothesis: there is no relationship between treatment and outcome; any observed difference is due to chance
Alternative hypothesis: there is a relationship; the observed difference is not due to chance alone
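Written as a formula, the difference in means looks like this. The symbols are assumed notation, not taken from the slides: a bar denotes a sample mean and the subscript names the treatment group.

```latex
\hat{\tau} = \bar{y}_{T=1} - \bar{y}_{T=0}
```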
Under the null hypothesis, treatment has NO impact on y
This means that if we were to change (reshuffle) the values of the treatment variable, the values of y would stay the same.
This means we can simulate the null distribution by repeatedly reshuffling the treatment variable and recalculating the difference in means each time.
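A minimal base R sketch of the reshuffling idea follows. The data frame `toy`, its columns `treat` and `y`, and the helper `diff_in_means` are all made up for illustration; none of them come from the lecture.

```r
# Toy data, invented purely for illustration (not from the lecture).
set.seed(1)
toy <- data.frame(
  treat = rep(c(0, 1), each = 50),
  y     = c(rnorm(50, mean = 0), rnorm(50, mean = 0.5))
)

# Difference in means between T = 1 and T = 0 (hypothetical helper).
diff_in_means <- function(y, treat) {
  mean(y[treat == 1]) - mean(y[treat == 0])
}

# The difference actually observed in the data.
observed <- diff_in_means(toy$y, toy$treat)

# Reshuffle the treatment labels many times while y stays fixed,
# recomputing the difference in means after each reshuffle.
null_dist <- replicate(
  2000,
  diff_in_means(toy$y, sample(toy$treat))
)

# Two-sided p-value: share of reshuffled differences at least as
# extreme as the observed one.
p_value <- mean(abs(null_dist) >= abs(observed))
p_value
```

Because the labels are reshuffled at random, the simulated differences center on zero, which is what the null hypothesis predicts.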
Let’s move on to a real and more interesting example
Bertrand and Mullainathan studied racial discrimination in responses to job applications in Chicago and Boston. They sent 4,870 resumes, randomly assigning names associated with different racial groups.
The data are in the openintro package as an object called resume
I will save it as myDat
Let’s save the mean callback rates for white and black applicants.
And calculate the treatment effect. The treatment effect is the difference in means.
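One way these steps might look in base R is sketched below. It assumes the `resume` data has a 0/1 column named `received_callback` and a `race` column with values "white" and "black"; check `names(myDat)` if your version of openintro differs. The object names `white_mean`, `black_mean` and `treatment_effect` are illustrative.

```r
library(openintro)  # provides the resume data

# Save the data under the name used in the lecture.
myDat <- resume

# Assumed columns: received_callback (0/1) and race ("white"/"black").
white_mean <- mean(myDat$received_callback[myDat$race == "white"])
black_mean <- mean(myDat$received_callback[myDat$race == "black"])

# Treatment effect: the difference in mean callback rates.
treatment_effect <- white_mean - black_mean
treatment_effect
```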
Before formal hypothesis tests, let’s look at the data: the estimates and the confidence intervals.
We can use tidymodels for this
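As a rough sketch, the estimates and approximate 95% confidence intervals could be tabulated with dplyr as follows. The normal approximation (mean plus or minus 1.96 standard errors) is an assumption made here; the lecture may have built its intervals differently. The column names `received_callback` and `race` are assumed, as before.

```r
library(openintro)
library(dplyr)

myDat <- resume

# Callback rate, standard error, and approximate 95% interval by race.
ci_by_race <- myDat %>%
  group_by(race) %>%
  summarize(
    callback_rate = mean(received_callback),
    se    = sd(received_callback) / sqrt(n()),
    lower = callback_rate - 1.96 * se,
    upper = callback_rate + 1.96 * se
  )
ci_by_race
```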
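A possible version of the permutation test, using the infer functions that tidymodels loads, is sketched below. The number of reshuffles (1000), the seed, and the object names `obs`, `null_dist` and `p_val` are arbitrary choices, not taken from the lecture. The column names are assumed as before.

```r
library(tidymodels)  # attaches infer and dplyr
library(openintro)

# Make race an explicit factor so the group order is unambiguous.
myDat <- resume %>%
  mutate(race = factor(race))

set.seed(2025)

# Observed difference in mean callback rates (white minus black).
obs <- myDat %>%
  specify(received_callback ~ race) %>%
  calculate(stat = "diff in means", order = c("white", "black"))

# Null distribution: reshuffle race 1000 times and recompute the difference.
null_dist <- myDat %>%
  specify(received_callback ~ race) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in means", order = c("white", "black"))

# Two-sided p-value: how often is a reshuffled gap at least as large?
p_val <- get_p_value(null_dist, obs_stat = obs, direction = "two-sided")
p_val
```

With only 1000 reshuffles the reported p-value may come out as exactly 0; infer warns about this, and it should be read as "smaller than about 1 in 1000" rather than as a true zero.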
The p-value is very small (below the .05 threshold)
Therefore, we reject the null hypothesis: the racial gap is extremely unlikely to have occurred due to chance alone
This is evidence of racial discrimination
Use the gender variable in the resume data to assess whether there is gender discrimination in callbacks
Challenge problem: Examine gender discrimination separately for those with and without a college degree (using the same process as above). [the variable is college_degree]