Inference Review

May 19, 2025

Example Study


Fujiwara, Thomas, and Wantchekon, Leonard. “Can Informed Public Deliberation Overcome Clientelism? Experimental Evidence from Benin.”


SUMMARY: This paper studies the electoral effects of town hall meetings based on programmatic, nonclientelist platforms. The experiment involves the cooperation of leading candidates in a presidential election in Benin. A campaign strategy based solely on these meetings was assigned to randomly selected villages and compared to the standard strategy of clientelist rallies. We find that treatment reduces the prevalence of clientelism and does not affect turnout. Treatment also lowers the vote shares for the candidate with a political stronghold in the village and is more effective in garnering votes in regions where a candidate does not have a political stronghold.

Read in replication data

library(tidyverse)
library(estimatr)
library(haven)
library(knitr)
library(tidymodels)
repData <- read_dta("data/survey_data_AEJ.dta") %>% 
  mutate(treat = factor(treat)) %>% 
  select(depcom, treat, index) %>% arrange(depcom)
head(repData)
# A tibble: 6 × 3
  depcom treat  index
   <dbl> <fct>  <dbl>
1      1 0     -0.627
2      1 1     -0.513
3      2 0      0.137
4      2 1     -0.283
5     13 0     -0.170
6     13 1     -0.486

Examine Design


We have 24 villages, 12 in treatment and 12 in control.

repData %>% 
  count(treat)
# A tibble: 2 × 2
  treat     n
  <fct> <int>
1 0        12
2 1        12

Means in treatment and control

repData %>% 
  group_by(treat) %>% 
  summarize(group_means = round(mean(index), 3))  
# A tibble: 2 × 2
  treat group_means
  <fct>       <dbl>
1 0           0    
2 1          -0.227

Calculate 95% CIs

First, estimate each group's mean and its 95% CI

ests <- repData %>% 
  group_by(treat) %>% 
  summarize(
    mean = mean(index),
    sd = sd(index),
    # note: divides by n - 1; the usual standard error of a mean is sd / sqrt(n)
    se = sd / sqrt(n() - 1),
    conf.low = mean - 1.96 * se,
    conf.high = mean + 1.96 * se
  ) %>% 
  select(treat, mean, conf.low, conf.high)
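
The code above divides by sqrt(n() - 1) and uses the normal critical value 1.96. A more conventional small-sample interval divides by sqrt(n) and uses a t critical value. The sketch below shows that version on hypothetical values (the vector x is made up for illustration and is not the replication data), so its numbers will not match the table on the next slide.

```r
# Hypothetical outcome values for one group (NOT the replication data)
x <- c(-0.6, 0.1, -0.2, 0.4, 0.3, -0.1)
n <- length(x)

# Standard error of the mean: sample sd divided by sqrt(n)
se <- sd(x) / sqrt(n)

# t critical value with n - 1 degrees of freedom; larger than 1.96 in small samples
t_crit <- qt(0.975, df = n - 1)

ci <- c(conf.low  = mean(x) - t_crit * se,
        conf.high = mean(x) + t_crit * se)
ci
```

This is the same interval that t.test(x)$conf.int reports for a single sample.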

Interpretation

ests %>% kable(digits = 2)
treat    mean   conf.low   conf.high
0        0.00      -0.36        0.36
1       -0.23      -0.50        0.05

Plot CIs

library(lemon)
ests %>% 
  ggplot(., aes(y = mean, x = treat, ymin = conf.low, ymax = conf.high)) +
  geom_point(color = "steelblue4") +
  geom_errorbar(width = 0.05, color = "steelblue4") +
  theme_bw() +
  labs(x = "Treatment Group", y = "Mean Clientelism") +
  scale_y_symmetric(mid = 0)  +
  geom_hline(yintercept = 0, linetype =1, color = "grey")

Treatment Effect Estimate

ate_estimate <- repData %>%
  specify(response = index, explanatory = treat) %>%
  calculate(stat = "diff in means", 
            order = c("1", "0"))
ate_estimate
Response: index (numeric)
Explanatory: treat (factor)
# A tibble: 1 × 1
    stat
   <dbl>
1 -0.227

Hypotheses

  • What is the null hypothesis?

  • What is the alternative hypothesis?

What will we do to test?

  • Reshuffle the treatment variable (permutation)
  • Calculate treatment effect
  • Repeat MANY times
  • Generates distribution of estimated treatment effects we would observe if the null hypothesis is true
  • Will use this distribution to see how likely we would be to observe our treatment effect, if the null hypothesis is true
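
The steps above can be sketched by hand in base R. This is a minimal illustration on hypothetical data (the vectors y and treat are made up, not the replication data), and the helper diff_in_means() is a name introduced here for illustration. It shuffles labels across all units and ignores any blocking in the design.

```r
# Hypothetical outcomes and treatment labels for 8 units (NOT the replication data)
set.seed(123)
y     <- c(-0.6, -0.5, 0.1, -0.3, -0.2, -0.5, 0.4, 0.0)
treat <- c(0, 1, 0, 1, 0, 1, 0, 1)

# Illustrative helper: treated mean minus control mean
diff_in_means <- function(y, treat) {
  mean(y[treat == 1]) - mean(y[treat == 0])
}

# Estimate from the labels as actually assigned
observed <- diff_in_means(y, treat)

# Reshuffle the treatment labels, recompute the estimate, repeat many times
null_stats <- replicate(5000, diff_in_means(y, sample(treat)))

# Distribution of estimates we would see if treatment had no effect
summary(null_stats)
```

Because sample(treat) only reorders the existing labels, every reshuffle keeps the same number of treated and control units as the original assignment.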

Analysis

library(tidymodels)
null_dist <- repData %>%
  specify(response = index, explanatory = treat)%>%
  hypothesize(null = "independence") %>%
  generate(5000, type = "permute") %>%
  calculate(stat = "diff in means", 
            order = c("1", "0"))

Plot
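
A null distribution like this is usually shown as a histogram with the observed estimate marked on it. The sketch below uses infer's visualize() and shade_p_value() on hypothetical toy data (the tibble toy is made up so the block runs on its own); with the objects from these slides you would pass null_dist and ate_estimate instead.

```r
library(tidymodels)  # attaches infer, dplyr, tibble, and ggplot2

set.seed(123)

# Hypothetical toy data (NOT the replication data)
toy <- tibble(
  index = c(-0.6, -0.5, 0.1, -0.3, -0.2, -0.5, 0.4, 0.0),
  treat = factor(c(0, 1, 0, 1, 0, 1, 0, 1))
)

# Observed difference in means (treated minus control)
toy_estimate <- toy %>%
  specify(response = index, explanatory = treat) %>%
  calculate(stat = "diff in means", order = c("1", "0"))

# Permutation null distribution
toy_null <- toy %>%
  specify(response = index, explanatory = treat) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in means", order = c("1", "0"))

# Histogram of the null distribution, shading estimates at least as extreme
p <- visualize(toy_null) +
  shade_p_value(obs_stat = toy_estimate, direction = "two-sided")
p
```

Setting direction = "less" would shade only the lower tail, which corresponds to a one-tailed test.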

Calculate the p-value: one-tailed test

Probability of getting an estimate less than or equal to -0.2272066, if the null hypothesis is true

mean(null_dist$stat <= ate_estimate$stat)
[1] 0.1498

Calculate the p-value: two-tailed test

Probability of getting an ATE at least as large in absolute value as our actual estimate, if the null hypothesis is true

mean(abs(null_dist$stat) >= abs(ate_estimate$stat))
[1] 0.3076
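
To see how the two calculations differ, here is a small worked example on a hypothetical null distribution of eight values (made up for illustration, not drawn from the replication data), chosen so the shares can be counted by hand.

```r
# Hypothetical null distribution and observed estimate (illustrative values only)
null_stats <- c(-0.4, -0.3, -0.1, 0.0, 0.1, 0.2, 0.3, 0.5)
observed   <- -0.25

# One-tailed: share of null estimates at or below the observed one
# (-0.4 and -0.3 qualify, so 2 of 8)
p_one_tailed <- mean(null_stats <= observed)

# Two-tailed: share at least as far from zero in either direction
# (-0.4, -0.3, 0.3, and 0.5 qualify, so 4 of 8)
p_two_tailed <- mean(abs(null_stats) >= abs(observed))

c(one_tailed = p_one_tailed, two_tailed = p_two_tailed)
# one_tailed = 0.25, two_tailed = 0.50
```

The two-tailed p-value counts large positive estimates as well, so it is never smaller than the one-tailed value for the same data.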

What should we conclude?

Other approaches

Difference-in-means test

library(infer)
repData %>% 
  t_test(x = ., response = index, 
         explanatory = treat, 
         order = c("1", "0")) %>% 
  select(estimate, p_value, lower_ci, upper_ci)
# A tibble: 1 × 4
  estimate p_value lower_ci upper_ci
     <dbl>   <dbl>    <dbl>    <dbl>
1   -0.227   0.315   -0.687    0.232

Other approaches

Using regression (our next topic!)

repData %>% 
  lm_robust(formula = index ~ treat, data = .) %>% 
  tidy() %>% 
  select(term, estimate, p.value, conf.low, conf.high) %>% 
  kable(digits = 2)
term          estimate   p.value   conf.low   conf.high
(Intercept)       0.00      1.00      -0.37        0.37
treat1           -0.23      0.31      -0.68        0.23

Regression AND accounting for their research design


repData %>% 
  lm_robust(formula = index ~ treat, fixed_effects = depcom, data = .) %>% 
  tidy()  %>% 
  select(term, estimate, p.value, conf.low, conf.high) %>% kable(digits = 2)
term     estimate   p.value   conf.low   conf.high
treat1      -0.23      0.02       -0.4       -0.05
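
One way to see why the fixed effects matter: if each depcom contains exactly one treated and one control village (an assumption here, suggested by the first rows of the data), the fixed-effects estimate is the average of the within-pair differences. The sketch below checks this on hypothetical data (pair, treat, and y are made up, not the replication data) and uses base R's lm() rather than lm_robust(), so only the point estimate is being compared.

```r
# Hypothetical paired data: 4 pairs, one control (0) and one treated (1) unit each
pair  <- rep(1:4, each = 2)
treat <- rep(c(0, 1), times = 4)
y     <- c(0.5, 0.2, -0.1, -0.3, 0.8, 0.6, 0.0, -0.4)

# Within-pair differences, treated minus control
# (valid because rows are sorted by pair with control listed first)
diffs <- y[treat == 1] - y[treat == 0]
mean(diffs)

# Regression with one fixed effect per pair
fe_fit <- lm(y ~ treat + factor(pair))
coef(fe_fit)["treat"]

# One-sample t-test on the differences, i.e. a paired t-test
t.test(diffs)
```

Differencing within pairs removes variation that is shared by the two villages in a pair, which is why the interval can be much narrower than in the unadjusted comparison even though the point estimate is unchanged.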

Break

Next: Homework 2