Will focus on this later in class: getting data and using it in R
In a deeper sense
There is a process that generates the data we work with
When analyzing and drawing conclusions from data, we need to keep this process in mind
Key aspects of the process
Sampling: how were the units that we are examining selected?
Research Design: does our design allow us to draw causal conclusions?
Measurement: do the measures we are examining really capture the concepts, constructs, or outcomes that we think they do?
We will come back to these issues throughout the course, but they are important to mention at the outset and to always consider
We need to learn about the data we are using
We need to be critical about the data we are using (or that others are using)
How were units selected into our data?
Two studies examine a civic education program and use a survey to measure participants' satisfaction with the program and their other attitudes about it.
Study 1: Some participants volunteer to answer survey questions after the program is completed.
Study 2: Participants are randomly selected to answer survey questions after the program is completed.
Which do we prefer? Why?
How were units selected into our data?
Some violent events datasets rely on newspaper reports (and web scraping) to identify specific instances and locations of violence in specific countries.
What are the costs and benefits of this approach?
How were units selected into our data?
We usually have a population that we are interested in learning about
We need to think about whether the sample we have (the specific rows in our dataset) is useful for teaching us about that population
More on this in future classes!
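One way to explore the volunteer versus random-selection question from the civic education example is a small simulation. The sketch below is hypothetical: the 0-10 satisfaction scale, the number of participants, and the assumption that more satisfied participants are more likely to volunteer are all invented for illustration and do not come from any real study.

```r
# Hypothetical simulation: every number here is an assumption made for illustration.
set.seed(42)

n_participants <- 10000

# True satisfaction (0-10 scale) for every program participant.
satisfaction <- pmin(pmax(rnorm(n_participants, mean = 6, sd = 2), 0), 10)

# Study 1 (volunteers): ASSUME more satisfied participants are more likely to respond.
p_volunteer <- plogis(-3 + 0.4 * satisfaction)
volunteered <- rbinom(n_participants, size = 1, prob = p_volunteer) == 1

# Study 2 (random selection): every participant has the same chance of being surveyed.
randomly_selected <- sample(n_participants, size = 500)

population_mean <- mean(satisfaction)
volunteer_mean  <- mean(satisfaction[volunteered])
random_mean     <- mean(satisfaction[randomly_selected])

# Compare each survey's average to the true average among all participants.
print(round(c(population    = population_mean,
              volunteers    = volunteer_mean,
              random_sample = random_mean), 2))
```

Under these assumptions, the volunteer survey overstates average satisfaction, while the random sample should land close to the true average. If volunteering were unrelated to satisfaction, the two approaches would give similar answers, so the concern depends on how units select themselves into the data.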
Causal Conclusions
In a post-conflict reconciliation program, participants were surveyed about their attitudes toward out-group members right before the program. Six months later, they were surveyed again. Participants were more favorable toward out-group members in the second survey than in the first.
Is this evidence that the program caused an improvement in out-group attitudes? Why or why not?
Causal Conclusions
Researchers have long noticed that, on average, wealthier countries are more democratic than poorer countries.
Is this evidence that wealth causes democracy? Why or why not?
Causal Conclusions
Two studies want to know whether an education program improves employment prospects.
Study 1: Some participants are randomly assigned to the program while others are not (in the control group). The employment rates of participants and non-participants are compared at the end of the study to determine program impact.
Study 2: Participants apply to be part of the program. The employment rate of participants is compared to the employment rate of a set of randomly selected non-participants at the end of the study to determine program impact.
Which do we prefer? Why?
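The contrast between the two employment studies can also be explored with a simulation. This is a minimal sketch under invented assumptions: an unobserved trait (called `motivated` here) raises employment chances on its own, motivated people are more likely to apply, and the true program effect is set to 10 percentage points. None of these numbers come from a real evaluation.

```r
# Hypothetical simulation: all quantities are assumptions made for illustration.
set.seed(7)

n <- 20000
true_effect <- 0.10  # assumed: program raises employment probability by 10 points

# Unobserved trait that raises employment chances regardless of the program.
motivated <- rbinom(n, size = 1, prob = 0.5)

# Study 1: random assignment, unrelated to motivation.
treated_random  <- rbinom(n, size = 1, prob = 0.5)
employed_random <- rbinom(n, size = 1,
                          prob = 0.3 + 0.3 * motivated + true_effect * treated_random)
estimate_random <- mean(employed_random[treated_random == 1]) -
  mean(employed_random[treated_random == 0])

# Study 2: people apply, and ASSUME motivated people apply far more often.
treated_self  <- rbinom(n, size = 1, prob = ifelse(motivated == 1, 0.8, 0.2))
employed_self <- rbinom(n, size = 1,
                        prob = 0.3 + 0.3 * motivated + true_effect * treated_self)

# Comparison group: a random sample of non-participants.
comparison    <- sample(which(treated_self == 0), size = 2000)
estimate_self <- mean(employed_self[treated_self == 1]) -
  mean(employed_self[comparison])

print(round(c(true_effect       = true_effect,
              random_assignment = estimate_random,
              self_selection    = estimate_self), 3))
```

With these assumptions, the randomized comparison recovers something close to the true effect, while the self-selected comparison mixes the program effect with the pre-existing difference in motivation between applicants and non-applicants.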
Measurement
In a voter turnout study, participants are randomly assigned either to receive significant encouragement from a civic organization to turn out to vote or to be in the control group.
To measure program impact, those in the study are asked after the election whether they voted or not.
What do we think of this measurement strategy?
Measurement
A democracy organization wants to generate a measure of how democratic every country in the world is. To do so, they send survey questions to professors at universities in the United States. They use answers to the questions to generate their measures.
What do we think of this measurement strategy?
Big picture
Always investigate where your data come from
Ask questions about this and be critical when consuming data
Will come back to some of these themes in more detail
  country_name country_text_id year v2x_polyarchy
1       Mexico             MEX 2000         0.671
2     Suriname             SUR 2000         0.783
3       Sweden             SWE 2000         0.914
4  Switzerland             CHE 2000         0.888
5        Ghana             GHA 2000         0.667
6 South Africa             ZAF 2000         0.745

  country_name country_text_id year v2x_polyarchy
1       Mexico             MEX 2000         0.671
2       Mexico             MEX 2001         0.682
3     Suriname             SUR 2000         0.783
4     Suriname             SUR 2001         0.781
5       Sweden             SWE 2000         0.914
6       Sweden             SWE 2001         0.914