Experimental Design 1
Running Head: EXPERIMENTAL DESIGN
Experimental Design and Some Threats to
Experimental Validity: A Primer
Susan Skidmore
Texas A&M University
Paper presented at the annual meeting of the Southwest Educational
Research Association, New Orleans, Louisiana, February 6, 2008.
Abstract
Experimental designs are distinguished as the best method to respond to
questions involving causality. The purpose of the present paper is to explicate
the logic of experimental design and why it is so vital to questions that demand
causal conclusions. In addition, types of internal and external validity threats are
discussed. To emphasize the current interest in experimental designs, Evidence-
Based Practices (EBP) in medicine, psychology and education are highlighted.
Finally, cautionary statements regarding experimental designs are elucidated
with examples from the literature.
The No Child Left Behind Act (NCLB, 2001) demands “scientifically based
research” as the basis for awarding many grants in education.
Specifically, the 107th Congress (2001) delineated scientifically-based research
as that which “is evaluated using experimental or quasi-experimental designs.”
Recognizing the increased interest in and demand for scientifically-based research
in education policy and practice, the National Research Council released the
publication Scientific Research in Education (Shavelson & Towne, 2002) a year
after the implementation of NCLB. Almost $5 billion has been channeled to
programs that provide scientifically-based evidence of effective instruction, such
as the Reading First Program (U.S. Department of Education, 2007). With
multiple methods available to education researchers, why does the U.S.
government show partiality to one particular method? The purpose of the
present paper is to explicate the logic of experimental design and why it is so
vital to questions that demand causal conclusions. In addition, types of internal
and external validity threats are discussed. To emphasize the current interest in
experimental designs, Evidence-Based Practices (EBP) in medicine, psychology
and education are highlighted. Finally, cautionary statements regarding
experimental designs are elucidated with examples from the literature.
Experimental Design
An experiment is “that portion of research in which variables are
manipulated and their effects upon other variables observed” (Campbell &
Stanley, 1963, p. 171). Stated another way, experiments are concerned with
an independent variable (IV) that causes or predicts the outcome of the
dependent variable (DV). Ideally, all other variables are eliminated, controlled, or
distributed in such a way that the conclusion that the IV caused the change in
the DV is validly justified.
Figure 1. Diagram of an experiment. [Figure content: Experimental Group:
manipulation of IV (treatment or intervention). Control Group: no manipulation or
alternate manipulation of IV (treatment or intervention). Both groups: outcome
measured as DV.]
In Figure 1 above you can see that there are two groups. One group
receives some sort of manipulation that is thought (theoretically or from previous
research) to have an impact on the DV. This is known as the experimental group
because participants in this group receive some type of treatment that is
presumed to impact the DV. The other group, the control group, does not receive
the treatment or instead receives some type of alternative treatment; it provides
the result of what would have happened without experimental intervention
(manipulation of the IV).
So how do you determine whether participants will be in the control group
or the experimental group? The answer to this question is one of the
characteristics that underlie the strength of true experimental designs. True
experiments must have three essential characteristics: random assignment to
groups, an intervention given to at least one group and an alternate or no
intervention for at least one other group, and a comparison of group
performances on some post-intervention measurement (Gall, Gall, & Borg,
2005).
Participants in a true experimental design are randomly allocated to either
the control group or the experimental group. A caution is necessary here.
Random assignment is not equivalent to random sampling. Random sampling
determines who will be in the study, while random assignment determines in
which groups participants will be. Random assignment makes “samples
randomly similar to each other, whereas random sampling makes a sample
similar to a population” (Shadish, Cook, & Campbell, 2002, p. 248, emphasis in
original). Nonetheless, random assignment is extremely important. By randomly
assigning participants (or groups of participants) to either the experimental or
control group, each participant (or group of participants) is as likely to be
assigned to one group as to the other (Gall et al., 2005). In other words, by giving
each participant an equal probability of being a member of each group, random
assignment equates the groups, in expectation, on all other factors except for the
intervention that is being implemented, thereby ensuring that the experiment will
produce “unbiased estimates of the average treatment effect” (Rosenbaum, 1995, p. 37).
To be clear, the term “unbiased estimates” means that any difference between
the effect observed in the study and the “true” population effect is due to
chance (Shadish et al., 2002).
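The distinction between random sampling and random assignment can be made concrete with a minimal sketch. The population of 1,000 students, the sample size of 20, the equal group sizes, and the function names are all assumptions chosen for illustration; they do not come from any of the cited studies.

```python
import random

def random_sample(population, n, rng):
    """Random sampling: decides WHO will be in the study."""
    return rng.sample(population, n)

def random_assignment(participants, rng):
    """Random assignment: decides WHICH GROUP each participant joins."""
    shuffled = participants[:]          # copy so the sample itself is unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]   # (experimental, control)

rng = random.Random(2008)               # fixed seed so the sketch is reproducible
population = [f"student_{i}" for i in range(1000)]   # hypothetical population
sample = random_sample(population, 20, rng)
experimental, control = random_assignment(sample, rng)
print(len(experimental), len(control))
```

Note that the two steps are independent: a study could use random assignment on a convenience sample, or draw a random sample and then fail to assign randomly.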
This equality-of-groups assertion is based on the construction of an infinite
number of random assignments of participants (or groups of participants) to
treatment groups in the study, and not on the single random assignment in the
particular study (Shadish et al., 2002). Thankfully, researchers do not have to
conduct an infinite number of random assignments in an infinite number of
studies for this assumption to hold. The equality-of-groups assumption is
supported in studies with large sample sizes, but not in studies with very small
sample sizes. This follows from the law of large numbers. As Boger (2005)
explained, “If larger and larger samples are successively drawn from a population
and a running average calculated after each sample has been drawn, the
sequence of averages will converge to the mean, µ, of the population” (p. 175).
Readers interested in exploring this concept further are directed to
George Boger’s article, which details how to create a spreadsheet simulation of the
law of large numbers. In addition, a medical example is found in
Observational Studies (Rosenbaum, 1995, pp. 13-15).
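For readers who prefer code to a spreadsheet, the law of large numbers can be sketched in a few lines. This is not Boger’s simulation; it is a simplified illustration that assumes a hypothetical population (rolls of a fair six-sided die, with population mean 3.5) and updates the running average after every single draw rather than after each sample.

```python
import random

def running_averages(draws):
    """Return the running average after each successive draw."""
    averages, total = [], 0.0
    for count, value in enumerate(draws, start=1):
        total += value
        averages.append(total / count)
    return averages

rng = random.Random(42)                 # fixed seed for reproducibility
# Hypothetical population: a fair six-sided die, population mean = 3.5
draws = [rng.randint(1, 6) for _ in range(100_000)]
averages = running_averages(draws)

# Print the running average at a few checkpoints
for n in (10, 100, 1_000, 100_000):
    print(n, round(averages[n - 1], 3))
```

The early running averages may wander noticeably from 3.5, while the later ones should settle close to it.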
To consider the case of small sample size, let us suppose that I have a
sample of 10 graduate students that I am going to randomly assign to one of two
treatment groups. The experimental group will have regularly scheduled graduate
advisor meetings to monitor students’ educational progress. The control group
will not have regularly scheduled graduate advisor meetings. Just to see what
happens, I choose to do several iterations of this random assignment process. Of
course, I discover that the membership of the groups varies widely across
iterations.
Because most people are outliers on at least some variables
(Thompson, 2006), some observed differences may be due simply
to the varying characteristics of the members of the treatment groups. For
example, let’s say that six of the ten graduate students are chronic
procrastinators and might benefit greatly from regularly scheduled visits with a
graduate advisor, while four of the ten graduate students are intrinsically
motivated and tend to experience increased anxiety with frequent graduate
advisor inquiries. If the random assignment process distributes the six
procrastinators equally between the two groups, a bias due to
this characteristic will not evidence itself in the results. If instead, due to chance,
all four intrinsically motivated students end up in the experimental group, the
results of the study may differ from what they would have been had the groups
been more evenly balanced. Very small sample sizes would therefore result in more
pronounced differences between the groups that are not due to treatment effects,
but instead are due to the varying characteristics of the members in the groups.
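How likely is such a lopsided assignment? With only ten students, every possible assignment can be enumerated. The sketch below assumes two equal groups of five, which the example above does not specify, and uses hypothetical labels (P for procrastinator, M for intrinsically motivated).

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical labels: P = procrastinator, M = intrinsically motivated
students = ["P"] * 6 + ["M"] * 4

# Every way to choose 5 of the 10 students for the experimental group
# (assumes two equal groups of five; the remaining 5 form the control group).
splits = list(combinations(range(10), 5))

def count_motivated(group):
    """Number of intrinsically motivated students in a group of indices."""
    return sum(1 for i in group if students[i] == "M")

all_four = sum(1 for g in splits if count_motivated(g) == 4)
balanced = sum(1 for g in splits if count_motivated(g) == 2)

print(len(splits))                       # 252 possible assignments
print(Fraction(all_four, len(splits)))   # 1/42: all four motivated students treated
print(Fraction(balanced, len(splits)))   # 10/21: three procrastinators in each group
```

Under these assumptions, fewer than half of the possible assignments balance the procrastinators evenly, and roughly 1 in 42 places all four intrinsically motivated students in the experimental group.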
If instead I have a sample of 10,000 graduate students that I am going
to randomly assign to one of two treatment groups, the law of large numbers
works for me. As explained by Thompson et al. (2005), “The beauty of true
experiments is that the law of large numbers creates preintervention group
equivalencies on all variables, even variables that we do not realize are essential
to control” (p. 183). While there is still not identical membership across treatment
groups, and I still expect the observed differences between the control group
and the experimental group to be due both to any treatment effects
and to the error associated with the random assignment process, the expectation
of equality of groups is nevertheless reasonably approximated. In other words, I
expect the ratio of procrastinators to intrinsically motivated students to be
approximately the same across the two treatment groups. In fact, I expect
proportions of variables I am not even aware of to be the same, on average,
across treatment groups!
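The effect of sample size on group balance can be checked with a short simulation. The sketch assumes that 60% of students are procrastinators at both sample sizes, that the two groups are equal in size, and that 200 repeated random assignments are enough to show the pattern; the function name is hypothetical.

```python
import random
import statistics

def mean_imbalance(n, reps=200, seed=0):
    """Average absolute gap in the procrastinator share of two equal groups."""
    rng = random.Random(seed)
    n_procrastinators = n * 6 // 10          # assumed 60% procrastinators
    traits = [1] * n_procrastinators + [0] * (n - n_procrastinators)
    half = n // 2
    gaps = []
    for _ in range(reps):
        rng.shuffle(traits)                  # one random assignment
        share_a = statistics.fmean(traits[:half])
        share_b = statistics.fmean(traits[half:])
        gaps.append(abs(share_a - share_b))
    return statistics.fmean(gaps)

small = mean_imbalance(10)
large = mean_imbalance(10_000)
print(f"n=10:     {small:.3f}")
print(f"n=10,000: {large:.3f}")
```

The average gap between the groups should be far smaller with 10,000 students than with 10, even though no single assignment is guaranteed to be perfectly balanced.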
The larger sample size has greatly decreased the error due to chance
associated with the random assignment process. As you can see in Figure 2,
even if both of the sample studies produce identical treatment effects, the results
are not equally valid. The majority of the effect observed in the small sample
size study is actually due to error associated with the random assignment
process and not a result of the treatment. This effect due to error is greatly
reduced in the large sample size study.
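A rough simulation can show how the error due to random assignment shrinks with sample size. Everything here is assumed for illustration: outcomes are drawn from a standard normal distribution, the treatment adds a constant true effect of 1.0 to every treated participant, and the random assignment is repeated 100 times over the same participants.

```python
import random
import statistics

def estimate_spread(n, true_effect=1.0, reps=100, seed=1):
    """Mean and standard deviation of the estimated effect across
    repeated random assignments of the same n participants."""
    rng = random.Random(seed)
    # Hypothetical untreated outcomes, fixed across all assignments
    baseline = [rng.gauss(0.0, 1.0) for _ in range(n)]
    half = n // 2
    estimates = []
    for _ in range(reps):
        rng.shuffle(baseline)                # one random assignment
        treated = [y + true_effect for y in baseline[:half]]
        control = baseline[half:]
        estimates.append(statistics.fmean(treated) - statistics.fmean(control))
    return statistics.fmean(estimates), statistics.stdev(estimates)

mean_small, sd_small = estimate_spread(10)
mean_large, sd_large = estimate_spread(10_000)
print(f"n=10:     mean estimate {mean_small:.2f}, spread {sd_small:.2f}")
print(f"n=10,000: mean estimate {mean_large:.2f}, spread {sd_large:.2f}")
```

The spread of the estimates reflects only the random assignment process, because the true effect is held constant. It should be much wider in the small sample, which is why a single small-sample result can differ substantially from the true effect.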
Figure…
