library(tidyverse)

# Outcomes for the 60 program participants: 15 unemployed, 45 employed
jobs_program <- tibble(
  outcome = c(rep("unemployed", 15), rep("employed", 45))
)

glimpse(jobs_program)
Rows: 60
Columns: 1
$ outcome <chr> "unemployed", "unemployed", "unemployed", "unemployed", "unemp…
A single proportion/mean
January 29, 2025
International development organizations are sometimes interested in providing training to people in order to help them find a job.
Imagine the unemployment rate in a low-income country is 30%.
One organization claimed that its jobs training program is a success because only 15 of the 60 people it trained did not have a job (a 25% unemployment rate).
What should we think about this claim? Is this a successful program?
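As a quick check on the arithmetic behind the claim, the 25% figure can be recomputed from the data. This is a minimal sketch that rebuilds the `jobs_program` tibble so it runs on its own; the names `obs_prop` and `prop_unemployed` are illustrative and not from the original slides.

```r
library(tidyverse)

# Recreate the data: 15 unemployed and 45 employed participants
jobs_program <- tibble(
  outcome = c(rep("unemployed", 15), rep("employed", 45))
)

# Observed share of participants who are unemployed (15 / 60)
obs_prop <- jobs_program |>
  summarize(prop_unemployed = mean(outcome == "unemployed")) |>
  pull(prop_unemployed)

obs_prop
```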
Is it possible to assess the organization’s claim using the data and information presented thus far?
“Our jobs program is a success because only 15 of the 60 people that we trained did not have a job. Thus our 25% unemployment rate beats the country’s unemployment rate of 30%.”
No.
We need to know more about how people were selected for the program in order to assess causality [e.g., were they randomly assigned]
We can still ask whether the unemployment rate of 25 percent could be due to random chance
We are going to assume “nothing is going on”
We are going to figure out the distribution of outcomes we might observe if nothing is going on
We will assess how likely we would be to observe our data if nothing is going on
Null hypothesis: the unemployment rate among those in the jobs program is no different than the country average of 30%.
Alternative hypothesis: the unemployment rate among those in the jobs program is lower than the country average of 30%.
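In symbols, the two hypotheses above can be written as follows. The symbol $p$ is an assumption introduced here for the true unemployment rate among program participants; it does not appear in the original slides.

```latex
H_0 : p = 0.30 \qquad \text{versus} \qquad H_A : p < 0.30
```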
Hypothesis test: If the null hypothesis were true, is the data we have in our sample likely to have been generated by chance (due to random variability)?
If yes, we do NOT reject the null hypothesis
If not very likely, we reject the null hypothesis
Conduct a hypothesis test under the assumption that the null hypothesis is true and calculate a p-value (the probability of the observed or a more extreme outcome, given that the null hypothesis is true)
When sampling from the null distribution, what is the expected proportion of success (unemployment)?
Will we get 0.30 in every draw of 60?
sim1
  employed unemployed
        42         18
[1] 0.3
sim2
  employed unemployed
        41         19
[1] 0.3166667
sim3
  employed unemployed
        38         22
[1] 0.3666667
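Draws like the three shown above could be produced along the following lines. This is a sketch in base R under the assumption that each draw samples 60 people independently from a population with 30% unemployment; the seed is arbitrary, so the counts will not match the output above.

```r
set.seed(1234)  # arbitrary seed, only so the draw is reproducible

# One draw of 60 people from a population with 30% unemployment
sim <- sample(
  c("unemployed", "employed"),
  size = 60,
  replace = TRUE,
  prob = c(0.30, 0.70)
)

table(sim)                  # counts of each outcome in this draw
mean(sim == "unemployed")   # simulated unemployment rate for this draw
```

Rerunning the last three statements without resetting the seed gives a different draw each time, which is why the simulated rate is not 0.30 in every sample of 60.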
Response: outcome (factor)
Null Hypothesis: point
# A tibble: 2,000 × 2
   replicate  stat
       <dbl> <dbl>
 1         1 0.367
 2         2 0.2
 3         3 0.283
 4         4 0.2
 5         5 0.4
 6         6 0.317
 7         7 0.3
 8         8 0.333
 9         9 0.283
10        10 0.25
# ℹ 1,990 more rows
# A tibble: 1 × 1
   mean
  <dbl>
1 0.301
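The output above looks like it comes from the infer package, so a null distribution of this shape could be generated roughly as follows. This is a sketch, not necessarily the code behind the slides: the seed and the name `null_dist` are assumptions, and `type = "draw"` requires infer 1.0.0 or later (older versions call it `"simulate"`).

```r
library(tidyverse)
library(infer)

# Recreate the data so the example runs on its own
jobs_program <- tibble(
  outcome = c(rep("unemployed", 15), rep("employed", 45))
)

set.seed(1234)  # arbitrary seed for reproducibility

# 2,000 simulated samples of 60 under the null hypothesis p = 0.30,
# each reduced to its proportion unemployed
null_dist <- jobs_program |>
  specify(response = outcome, success = "unemployed") |>
  hypothesize(null = "point", p = 0.30) |>
  generate(reps = 2000, type = "draw") |>
  calculate(stat = "prop")

# The average simulated proportion should sit close to 0.30
null_dist |>
  summarize(mean = mean(stat))
```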
p-value: in what % of the simulations was the simulated sample proportion at least as extreme as the observed sample proportion?
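One way to compute this share is with `get_p_value()` from the infer package. The sketch below assumes the infer workflow and regenerates a null distribution so that it runs on its own; the seed and object names are illustrative. Because the alternative hypothesis is that the rate is lower, "at least as extreme" means at or below the observed 0.25.

```r
library(tidyverse)
library(infer)

jobs_program <- tibble(
  outcome = c(rep("unemployed", 15), rep("employed", 45))
)

set.seed(1234)  # arbitrary seed for reproducibility

# Null distribution: 2,000 simulated sample proportions when p = 0.30
null_dist <- jobs_program |>
  specify(response = outcome, success = "unemployed") |>
  hypothesize(null = "point", p = 0.30) |>
  generate(reps = 2000, type = "draw") |>
  calculate(stat = "prop")

# Share of simulated proportions at or below the observed 0.25
p_val <- null_dist |>
  get_p_value(obs_stat = 0.25, direction = "less")

p_val
```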
Conventionally, people use a p-value of 0.05 as a cutoff (“significance level”) for determining “statistical significance”
Always remember that this is a convention
When people report “statistically significant” results, they mean that the p-value from their analysis is less than 0.05
Our finding: if the true unemployment rate were 30 percent and we drew samples of 60, about 23 percent of the time we would get an unemployment rate as low as or lower than the one among the participants in the program (simply due to random chance)
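As a cross-check on the simulated figure, the same probability can be computed exactly from the binomial distribution. This exact calculation is an addition here, not part of the original analysis; it assumes the 60 outcomes are independent, each with a 30% chance of unemployment.

```r
# Exact probability of 15 or fewer unemployed out of 60 when p = 0.30
p_exact <- pbinom(15, size = 60, prob = 0.30)
p_exact

# The same quantity, summing the probability of each count from 0 to 15
p_check <- sum(dbinom(0:15, size = 60, prob = 0.30))
p_check
```

The exact value should land close to the simulated share, with small differences due to simulation noise.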
What should we conclude?