In an experimental design, we distinguish between random and fixed factors. The “levels” of the random factors are quasi-random samples from a population of persons (subjects) or materials (items). To avoid confusion with levels of fixed factors, we refer to levels of random factors as instances. For fixed factors (usually experimental or quasi-experimental ones), we must specify whether they are between- or within-subjects and between- or within-items. For a given fixed factor, all four combinations are possible in principle. We also need to decide on a counterbalancing scheme; a common example is a Latin square applied to all or a subset of the factors.
In this vignette, we illustrate how to set up an experiment using subject (Subj) and item (Item) as random factors. In this hypothetical experiment, the words of a text are presented serially, one at a time, at the center of the screen at a slow, medium, or fast rate (i.e., fixed factor Speed with three levels). A second factor is cognitive load, varying whether or not subjects have to keep six digits in memory while reading (i.e., fixed factor Load with two levels, yes and no).
Typically, in such an experiment, (1) each subject reads different texts (items) in the 2 x 3 experimental conditions; (2) each subject reads the same number of texts in each condition; and (3) across subjects, each text (item) is presented equally often in each of the six experimental conditions. The final example in this vignette implements this design. However, for didactic reasons, we first show how within/between-subject and within/between-item features of the factors are specified without counterbalancing. Such designs are preferable if repeated exposure to the same stimuli across experimental conditions does not have confounding effects on the measure.
In the first version of the experimental design, each subject sees each text in each of 2 x 3 experimental conditions. Thus, the two fixed factors Speed and Load are both within-subject and within-item. A minimum of six subjects and six texts is required for a complete within-subject/within-item design. We summarize the design with the design formula:
Load(2) x Speed(3) x 6 Item x 6 Subj
This is a completely crossed design. Note the difference between the specification of levels for fixed factors and of instances for random factors. The product of the numbers in the formula gives the number of observations generated by the design; in this case, 2 x 3 x 6 x 6 = 216 observations.
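As a quick check of this arithmetic, the following base-R sketch multiplies the numbers in the design formula and builds the fully crossed design with expand.grid() as a stand-in for the generated codes. It does not call designr; it only illustrates why the row count equals the product.

```r
# Numbers taken from the design formula Load(2) x Speed(3) x 6 Item x 6 Subj
formula_counts <- c(Load = 2, Speed = 3, Item = 6, Subj = 6)
n_obs <- prod(formula_counts)   # 2 * 3 * 6 * 6

# Fully crossed stand-in: one row per combination of all four factors
crossed <- expand.grid(
  Speed = c("slow", "medium", "fast"),
  Load  = c("yes", "no"),
  Item  = paste0("Item", 1:6),
  Subj  = paste0("Subj", 1:6)
)

n_obs          # 216
nrow(crossed)  # 216, the same number
```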
library(designr)
library(dplyr)  # for arrange()
design1 <-
fixed.factor("Speed", levels=c("slow", "medium", "fast")) +
fixed.factor("Load", levels=c("yes", "no")) +
random.factor("Subj", instances=6) +
random.factor("Item", instances=6)
codes1 <- arrange(design.codes(design1), Subj, Item)[c(3, 4, 2, 1)]
codes1
## # A tibble: 216 x 4
## Speed Load Item Subj
## <fct> <fct> <fct> <fct>
## 1 slow yes Item1 Subj1
## 2 medium yes Item1 Subj1
## 3 fast yes Item1 Subj1
## 4 slow no Item1 Subj1
## 5 medium no Item1 Subj1
## 6 fast no Item1 Subj1
## 7 slow yes Item2 Subj1
## 8 medium yes Item2 Subj1
## 9 fast yes Item2 Subj1
## 10 slow no Item2 Subj1
## # … with 206 more rows
## # A tibble: 10 x 4
## Speed Load Item Subj
## <fct> <fct> <fct> <fct>
## 1 fast yes Item5 Subj6
## 2 slow no Item5 Subj6
## 3 medium no Item5 Subj6
## 4 fast no Item5 Subj6
## 5 slow yes Item6 Subj6
## 6 medium yes Item6 Subj6
## 7 fast yes Item6 Subj6
## 8 slow no Item6 Subj6
## 9 medium no Item6 Subj6
## 10 fast no Item6 Subj6
## Speed
## Load slow medium fast
## yes 36 36 36
## no 36 36 36
## , , Speed = slow
##
## Load
## Subj yes no
## Subj1 6 6
## Subj2 6 6
## Subj3 6 6
## Subj4 6 6
## Subj5 6 6
## Subj6 6 6
##
## , , Speed = medium
##
## Load
## Subj yes no
## Subj1 6 6
## Subj2 6 6
## Subj3 6 6
## Subj4 6 6
## Subj5 6 6
## Subj6 6 6
##
## , , Speed = fast
##
## Load
## Subj yes no
## Subj1 6 6
## Subj2 6 6
## Subj3 6 6
## Subj4 6 6
## Subj5 6 6
## Subj6 6 6
## , , Speed = slow
##
## Load
## Item yes no
## Item1 6 6
## Item2 6 6
## Item3 6 6
## Item4 6 6
## Item5 6 6
## Item6 6 6
##
## , , Speed = medium
##
## Load
## Item yes no
## Item1 6 6
## Item2 6 6
## Item3 6 6
## Item4 6 6
## Item5 6 6
## Item6 6 6
##
## , , Speed = fast
##
## Load
## Item yes no
## Item1 6 6
## Item2 6 6
## Item3 6 6
## Item4 6 6
## Item5 6 6
## Item6 6 6
The first command generates the design object design1. The function design.codes() extracts the generated variable coding as a data frame in tibble format, with one row per observation (i.e., long format, N = 216); arrange() sorts the rows by Subj and Item, and the index vector reorders the columns. Obviously, having subjects read each text six times may lead to practice effects that would need to be taken into account by counterbalancing the order in which texts are presented across subjects.
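The sorting step and the checks behind the tables above can be mimicked in base R. In this sketch, a data frame built with expand.grid() stands in for design.codes(design1), order() plays the role of dplyr's arrange(), and xtabs() produces the cross-tabulations. The object names codes1_sketch and subj_table are hypothetical.

```r
# Stand-in for design.codes(design1): all 216 combinations in long format.
# Character vectors become factors whose levels keep the order given here.
codes1_sketch <- expand.grid(
  Speed = c("slow", "medium", "fast"),
  Load  = c("yes", "no"),
  Item  = paste0("Item", 1:6),
  Subj  = paste0("Subj", 1:6)
)

# Base-R analogue of arrange(codes, Subj, Item): sort by Subj, then by Item
codes1_sketch <- codes1_sketch[order(codes1_sketch$Subj, codes1_sketch$Item), ]

# Every Load x Speed cell holds 36 observations (6 subjects x 6 items)
xtabs(~ Load + Speed, data = codes1_sketch)

# Every subject contributes 6 observations (one per item) to each condition
subj_table <- xtabs(~ Subj + Load + Speed, data = codes1_sketch)
all(subj_table == 6)  # TRUE
```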
In the second example, we replace the factor Load with a factor Type of text. We assume that three of the texts are simple and three are complex. Subjects read both simple and complex texts, so Type is a within-subject factor. Each text (item), however, is either simple or complex. Thus, Type is a between-item factor in this design.
Such a design is realized by specifying Type with the groups argument in the corresponding random.factor() command. We generate 3 items (instances) within each of the two levels of the factor Type, so that, as in the first example, there are again six different items.
Design formula:
Type(2) x Speed(3) x 3 Item[Type] x 6 Subj
We read the item-part of this formula: “3 Items nested under levels of Type.” The total number of different instances for the random factor Item is 3 items x 2 levels of Type, that is 6 items. The design generates 108 observations; it is no longer completely crossed.
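The nesting notation Item[Type] can be illustrated with a base-R sketch that does not use designr: each item is created inside exactly one level of Type, and the item table is then crossed with Speed and Subj. Which item numbers end up simple or complex is arbitrary here and need not match the designr output below; items_per_type and codes2_sketch are illustrative names.

```r
type_levels    <- c("simple", "complex")
items_per_type <- 3   # plays the role of instances=3 within each level of Type

# Item[Type]: every item belongs to exactly one level of Type
items <- data.frame(
  Type = rep(type_levels, each = items_per_type),
  Item = paste0("Item", seq_len(items_per_type * length(type_levels)))
)

# Speed and Subj are crossed with everything; merge() without common
# columns returns the Cartesian product of the two data frames
other <- expand.grid(Speed = c("slow", "medium", "fast"),
                     Subj  = paste0("Subj", 1:6))
codes2_sketch <- merge(items, other)

nrow(codes2_sketch)                  # 108 = 2 x 3 x 3 x 6
xtabs(~ Item + Type, codes2_sketch)  # one non-zero cell (18) per item
```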
design2 <- fixed.factor("Speed", levels=c("slow", "medium", "fast")) +
fixed.factor("Type", levels=c("simple", "complex")) +
random.factor("Subj", instances=6) +
random.factor("Item", groups="Type", instances=3)
codes2 <- arrange(design.codes(design2), Subj, Item)[c(3, 4, 1, 2)]
codes2
## # A tibble: 108 x 4
## Speed Type Subj Item
## <fct> <fct> <fct> <fct>
## 1 slow complex Subj1 Item1
## 2 medium complex Subj1 Item1
## 3 fast complex Subj1 Item1
## 4 slow complex Subj1 Item2
## 5 medium complex Subj1 Item2
## 6 fast complex Subj1 Item2
## 7 slow complex Subj1 Item3
## 8 medium complex Subj1 Item3
## 9 fast complex Subj1 Item3
## 10 slow simple Subj1 Item4
## # … with 98 more rows
## Type
## Item simple complex
## Item1 0 18
## Item2 0 18
## Item3 0 18
## Item4 18 0
## Item5 18 0
## Item6 18 0
## Type
## Subj simple complex
## Subj1 9 9
## Subj2 9 9
## Subj3 9 9
## Subj4 9 9
## Subj5 9 9
## Subj6 9 9
#xtabs( ~ Subj + Item + Type + Speed, codes2)
#xtabs(~ Type + Speed, codes2)
#xtabs(~ Subj + Type + Speed, codes2)
#xtabs(~ Item + Type + Speed, codes2)
The tables show that for Items 1 to 3 all available codes for the factor Type are complex and for Items 4 to 6 all codes are simple. Thus, Type is varied between items. Each item is read three times (three levels of Speed) by six subjects, yielding 18 codes in each of the 6 non-zero cells of the Item x Type table.
Conversely, codes for both simple and complex items are available for all six subjects. Thus, Type is varied within subjects. Each subject reads three texts of each type, each at the three speed rates. Therefore, there are 3 texts x 3 levels of Speed = 9 codes in each cell of the Subj x Type table.
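The reasoning used to read these tables can be wrapped in a small helper: a factor varies between the instances of a random factor if every instance shows only one of its levels, and within them if every instance shows all of its levels. between_or_within() is a hypothetical function written for this sketch, not part of designr; the stand-in data mimic codes2, with Items 1 to 3 complex and Items 4 to 6 simple.

```r
# Hypothetical helper: does `factor` vary between or within the
# instances of the random factor `unit`?
between_or_within <- function(codes, unit, factor) {
  tab <- table(codes[[unit]], codes[[factor]])
  # Number of distinct levels of `factor` observed for each instance
  levels_seen <- rowSums(tab > 0)
  if (all(levels_seen == 1)) {
    "between"
  } else if (all(levels_seen == ncol(tab))) {
    "within"
  } else {
    "mixed"
  }
}

# Stand-in for codes2: Type is tied to the item
codes2_sketch <- expand.grid(Speed = c("slow", "medium", "fast"),
                             Item  = paste0("Item", 1:6),
                             Subj  = paste0("Subj", 1:6))
codes2_sketch$Type <- ifelse(codes2_sketch$Item %in% paste0("Item", 1:3),
                             "complex", "simple")

between_or_within(codes2_sketch, "Item", "Type")  # "between"
between_or_within(codes2_sketch, "Subj", "Type")  # "within"
```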
The command to specify Speed as a between-item factor would be:
random.factor("Item", groups="Speed", instances=2)
We need 2 instances within each of the 3 levels of Speed to obtain 6 items in total.
Design formula:
Type(2) x Speed(3) x 2 Item[Speed] x 6 Subj
The total number of items is 2 x 3 = 6. This design generates 72 observations.
In this example, we replace the factor Load (or Type) with a between-subject factor Age, assuming that half the subjects are young and the other half old.
design3 <-
fixed.factor("Speed", levels=c("slow", "medium", "fast")) +
fixed.factor("Age", levels=c("young", "old")) +
random.factor("Item", instances=6) +
random.factor("Subj", groups="Age", instances=3)
codes3 <- arrange(design.codes(design3), Subj, Item)[c(4, 3, 2, 1)]
codes3
## # A tibble: 108 x 4
## Age Speed Subj Item
## <fct> <fct> <fct> <fct>
## 1 old slow Subj1 Item1
## 2 old medium Subj1 Item1
## 3 old fast Subj1 Item1
## 4 old slow Subj1 Item2
## 5 old medium Subj1 Item2
## 6 old fast Subj1 Item2
## 7 old slow Subj1 Item3
## 8 old medium Subj1 Item3
## 9 old fast Subj1 Item3
## 10 old fast Subj1 Item4
## # … with 98 more rows
## Age
## Subj young old
## Subj1 0 18
## Subj2 0 18
## Subj3 0 18
## Subj4 18 0
## Subj5 18 0
## Subj6 18 0
## Age
## Item young old
## Item1 9 9
## Item2 9 9
## Item3 9 9
## Item4 9 9
## Item5 9 9
## Item6 9 9
The tables show that subjects 1 to 3 are old and subjects 4 to 6 are young (i.e., Age is a between-subject factor) and that all items are read by young and old subjects (i.e., Age is a within-item factor). The formula for this design can be written as: Age(2) x Speed(3) x 6 Item x 3 Subj[Age], yielding 108 observations.
Note that instances specifies the number of instances within groups. To generate codes for 25 young and 25 old subjects (i.e., total N=50), we set instances=25.
Design formula:
Age(2) x Speed(3) x 6 Item x 25 Subj[Age]
The total number of subjects is 25 x 2 = 50. This design generates 900 observations.
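Because instances counts subjects within each group, the number of subjects is instances times the number of Age levels. A base-R sketch of this nesting follows; it is not designr code, and n_per_group and codes_sketch are names chosen for illustration.

```r
n_per_group <- 25   # corresponds to instances=25
age_levels  <- c("young", "old")

# Subj[Age]: each subject is created within one Age group
subjects <- data.frame(
  Age  = rep(age_levels, each = n_per_group),
  Subj = sprintf("Subj%02d", seq_len(n_per_group * length(age_levels)))
)

# Speed and Item are crossed with subjects (within-subject factors)
codes_sketch <- merge(subjects,
                      expand.grid(Speed = c("slow", "medium", "fast"),
                                  Item  = paste0("Item", 1:6)))

nrow(subjects)      # 50 = 25 x 2
nrow(codes_sketch)  # 900 = 2 x 3 x 6 x 25
```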
Continuing with the last example, it may also make sense to vary not only Age but also Speed between subjects. Thus, every subject is either old or young (i.e., a quasi-experimental factor) and is randomly assigned to one of the three Speed conditions (i.e., an experimental factor). For this specification, the two factors are included as a vector for the groups argument. For the minimal design, we need only 1 instance, because the 2 x 3 = 6 design cells already yield six subjects. This means we generate codes for 1 subject in each of the six design cells, and each subject reads every text in this condition (i.e., there are six measures for each subject). To get codes for 10 subjects in each of the 2 x 3 = 6 design cells (i.e., a total of 60 subjects), we set instances=10.
Design formula:
Age(2) x Speed(3) x 6 Item x 10 Subj[Age x Speed]
The total number of subjects is 10 x 2 x 3 = 60. This design generates 360 observations.
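With a vector passed to groups, subjects are nested within the cells of the Age x Speed table. The following base-R sketch (again not designr itself; n_per_cell and codes4_sketch are illustrative names) replicates each of the six cells ten times and then crosses the resulting subjects with the six items.

```r
n_per_cell <- 10   # corresponds to instances=10
cells <- expand.grid(Age   = c("young", "old"),
                     Speed = c("slow", "medium", "fast"))

# Subj[Age x Speed]: one row per subject, n_per_cell subjects in each cell
subjects <- cells[rep(seq_len(nrow(cells)), each = n_per_cell), ]
subjects$Subj <- sprintf("Subj%02d", seq_len(nrow(subjects)))
rownames(subjects) <- NULL

# Items are crossed with subjects, so Age and Speed vary within items
codes4_sketch <- merge(subjects, data.frame(Item = paste0("Item", 1:6)))

nrow(subjects)       # 60 = 10 x 2 x 3
nrow(codes4_sketch)  # 360
```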
design4 <-
fixed.factor("Speed", levels=c("slow", "medium", "fast")) +
fixed.factor("Age", levels=c("young", "old")) +
random.factor("Subj", groups=c("Age", "Speed"), instances=10) +
random.factor("Item", instances=6)
codes4 <- arrange(design.codes(design4), Subj, Item)[c(3, 4, 2, 1)]
codes4
## # A tibble: 360 x 4
## Speed Age Item Subj
## <fct> <fct> <fct> <fct>
## 1 fast old Item1 Subj01
## 2 fast old Item2 Subj01
## 3 fast old Item3 Subj01
## 4 fast old Item4 Subj01
## 5 fast old Item5 Subj01
## 6 fast old Item6 Subj01
## 7 fast old Item1 Subj02
## 8 fast old Item2 Subj02
## 9 fast old Item3 Subj02
## 10 fast old Item4 Subj02
## # … with 350 more rows
## Age
## Subj young old
## Subj01 0 6
## Subj02 0 6
## Subj03 0 6
## Subj04 0 6
## Subj05 0 6
## Subj06 0 6
## Subj07 0 6
## Subj08 0 6
## Subj09 0 6
## Subj10 0 6
## Subj11 0 6
## Subj12 0 6
## Subj13 0 6
## Subj14 0 6
## Subj15 0 6
## Subj16 0 6
## Subj17 0 6
## Subj18 0 6
## Subj19 0 6
## Subj20 0 6
## Subj21 0 6
## Subj22 0 6
## Subj23 0 6
## Subj24 0 6
## Subj25 0 6
## Subj26 0 6
## Subj27 0 6
## Subj28 0 6
## Subj29 0 6
## Subj30 0 6
## Subj31 6 0
## Subj32 6 0
## Subj33 6 0
## Subj34 6 0
## Subj35 6 0
## Subj36 6 0
## Subj37 6 0
## Subj38 6 0
## Subj39 6 0
## Subj40 6 0
## Subj41 6 0
## Subj42 6 0
## Subj43 6 0
## Subj44 6 0
## Subj45 6 0
## Subj46 6 0
## Subj47 6 0
## Subj48 6 0
## Subj49 6 0
## Subj50 6 0
## Subj51 6 0
## Subj52 6 0
## Subj53 6 0
## Subj54 6 0
## Subj55 6 0
## Subj56 6 0
## Subj57 6 0
## Subj58 6 0
## Subj59 6 0
## Subj60 6 0
## Speed
## Subj slow medium fast
## Subj01 0 0 6
## Subj02 0 0 6
## Subj03 0 0 6
## Subj04 0 0 6
## Subj05 0 0 6
## Subj06 0 0 6
## Subj07 0 0 6
## Subj08 0 6 0
## Subj09 0 6 0
## Subj10 0 6 0
## Subj11 0 6 0
## Subj12 0 6 0
## Subj13 0 6 0
## Subj14 0 0 6
## Subj15 0 0 6
## Subj16 0 0 6
## Subj17 0 6 0
## Subj18 0 6 0
## Subj19 0 6 0
## Subj20 0 6 0
## Subj21 6 0 0
## Subj22 6 0 0
## Subj23 6 0 0
## Subj24 6 0 0
## Subj25 6 0 0
## Subj26 6 0 0
## Subj27 6 0 0
## Subj28 6 0 0
## Subj29 6 0 0
## Subj30 6 0 0
## Subj31 0 0 6
## Subj32 0 0 6
## Subj33 0 0 6
## Subj34 0 0 6
## Subj35 0 0 6
## Subj36 0 0 6
## Subj37 0 0 6
## Subj38 0 0 6
## Subj39 0 0 6
## Subj40 0 0 6
## Subj41 0 6 0
## Subj42 0 6 0
## Subj43 0 6 0
## Subj44 0 6 0
## Subj45 6 0 0
## Subj46 6 0 0
## Subj47 6 0 0
## Subj48 0 6 0
## Subj49 0 6 0
## Subj50 0 6 0
## Subj51 0 6 0
## Subj52 0 6 0
## Subj53 0 6 0
## Subj54 6 0 0
## Subj55 6 0 0
## Subj56 6 0 0
## Subj57 6 0 0
## Subj58 6 0 0
## Subj59 6 0 0
## Subj60 6 0 0
## Age
## Item young old
## Item1 30 30
## Item2 30 30
## Item3 30 30
## Item4 30 30
## Item5 30 30
## Item6 30 30
## Speed
## Item slow medium fast
## Item1 20 20 20
## Item2 20 20 20
## Item3 20 20 20
## Item4 20 20 20
## Item5 20 20 20
## Item6 20 20 20
The tables show that Age and Speed indeed vary between subjects and within items.
In this final example, we modify the very first example such that each subject reads a different text in each of the six conditions, with the design cells counterbalanced (i.e., each text is read equally often in each condition, and each subject reads the same number of texts in each condition).
For this implementation we (1) add a third random factor defined as Subj-by-Item and (2) specify factors Speed and Load as varying between Subj-by-Item.
We start with the minimal design of 6 subjects reading 6 texts.
Design formula:
Speed(3) x Load(2) x 1 Item[Speed x Load] x 1 Subj[Speed x Load] x
(3 x 2) Item-by-Subj[Speed x Load x Item[Speed x Load] x Subj[Speed x Load]]
We have 1 item and 1 subject nested under each level of the Speed x Load design, that is, 6 items and 6 subjects in total. Crossing them yields the 6 x 6 = 36 instances of the random factor Item-by-Subj. The design generates 3 x 2 x 1 x 1 x (3 x 2) = 36 observations.
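Counterbalancing of this kind amounts to a Latin square: with six subjects, six items, and six conditions, every subject must meet every condition once, and every item must appear once in every condition. A minimal base-R sketch, assuming a simple cyclic construction (designr may build its codes differently):

```r
# The six Speed x Load conditions, with Load varying fastest
grid  <- expand.grid(Load = c("yes", "no"), Speed = c("slow", "medium", "fast"))
conds <- paste(grid$Speed, grid$Load)
n <- length(conds)   # 6

# Cyclic Latin square: rows are subjects, columns are items, entries are
# condition indices; 0-based subject s and item i get ((s + i) mod n) + 1
latin <- outer(0:(n - 1), 0:(n - 1), function(s, i) (s + i) %% n + 1)
dimnames(latin) <- list(paste0("Subj", 1:n), paste0("Item", 1:n))

# Each subject sees every condition once; each item appears once per condition
rows_ok <- all(apply(latin, 1, function(r) setequal(r, 1:n)))
cols_ok <- all(apply(latin, 2, function(r) setequal(r, 1:n)))

# Condition labels for the six items read by subjects 1 and 2
conds[latin["Subj1", ]]
conds[latin["Subj2", ]]
```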
design5 <-
fixed.factor("Speed", levels=c("slow", "medium", "fast")) +
fixed.factor("Load", levels=c("yes", "no")) +
random.factor("Subj", instances=1) +
random.factor("Item", instances=1) +
random.factor(c("Subj", "Item"), groups=c("Speed", "Load"))
codes5 <- arrange(design.codes(design5), Subj, Item)[c(3, 4, 1, 2)]
codes5
## # A tibble: 36 x 4
## Speed Load Subj Item
## <fct> <fct> <fct> <fct>
## 1 slow yes Subj1 Item1
## 2 slow no Subj1 Item2
## 3 medium yes Subj1 Item3
## 4 medium no Subj1 Item4
## 5 fast yes Subj1 Item5
## 6 fast no Subj1 Item6
## 7 slow no Subj2 Item1
## 8 medium yes Subj2 Item2
## 9 medium no Subj2 Item3
## 10 fast yes Subj2 Item4
## # … with 26 more rows
## , , Load = yes
##
## Speed
## Subj slow medium fast
## Subj1 1 1 1
## Subj2 1 1 1
## Subj3 1 1 1
## Subj4 1 1 1
## Subj5 1 1 1
## Subj6 1 1 1
##
## , , Load = no
##
## Speed
## Subj slow medium fast
## Subj1 1 1 1
## Subj2 1 1 1
## Subj3 1 1 1
## Subj4 1 1 1
## Subj5 1 1 1
## Subj6 1 1 1
## , , Load = yes
##
## Speed
## Item slow medium fast
## Item1 1 1 1
## Item2 1 1 1
## Item3 1 1 1
## Item4 1 1 1
## Item5 1 1 1
## Item6 1 1 1
##
## , , Load = no
##
## Speed
## Item slow medium fast
## Item1 1 1 1
## Item2 1 1 1
## Item3 1 1 1
## Item4 1 1 1
## Item5 1 1 1
## Item6 1 1 1
## , , Load = yes, Speed = slow
##
## Item
## Subj Item1 Item2 Item3 Item4 Item5 Item6
## Subj1 1 1 1 1 1 1
## Subj2 1 1 1 1 1 1
## Subj3 1 1 1 1 1 1
## Subj4 1 1 1 1 1 1
## Subj5 1 1 1 1 1 1
## Subj6 1 1 1 1 1 1
##
## , , Load = no, Speed = slow
##
## Item
## Subj Item1 Item2 Item3 Item4 Item5 Item6
## Subj1 1 1 1 1 1 1
## Subj2 1 1 1 1 1 1
## Subj3 1 1 1 1 1 1
## Subj4 1 1 1 1 1 1
## Subj5 1 1 1 1 1 1
## Subj6 1 1 1 1 1 1
##
## , , Load = yes, Speed = medium
##
## Item
## Subj Item1 Item2 Item3 Item4 Item5 Item6
## Subj1 1 1 1 1 1 1
## Subj2 1 1 1 1 1 1
## Subj3 1 1 1 1 1 1
## Subj4 1 1 1 1 1 1
## Subj5 1 1 1 1 1 1
## Subj6 1 1 1 1 1 1
##
## , , Load = no, Speed = medium
##
## Item
## Subj Item1 Item2 Item3 Item4 Item5 Item6
## Subj1 1 1 1 1 1 1
## Subj2 1 1 1 1 1 1
## Subj3 1 1 1 1 1 1
## Subj4 1 1 1 1 1 1
## Subj5 1 1 1 1 1 1
## Subj6 1 1 1 1 1 1
##
## , , Load = yes, Speed = fast
##
## Item
## Subj Item1 Item2 Item3 Item4 Item5 Item6
## Subj1 1 1 1 1 1 1
## Subj2 1 1 1 1 1 1
## Subj3 1 1 1 1 1 1
## Subj4 1 1 1 1 1 1
## Subj5 1 1 1 1 1 1
## Subj6 1 1 1 1 1 1
##
## , , Load = no, Speed = fast
##
## Item
## Subj Item1 Item2 Item3 Item4 Item5 Item6
## Subj1 1 1 1 1 1 1
## Subj2 1 1 1 1 1 1
## Subj3 1 1 1 1 1 1
## Subj4 1 1 1 1 1 1
## Subj5 1 1 1 1 1 1
## Subj6 1 1 1 1 1 1
The numbers of subjects and items increase by six with each increment of the value of the instances argument. For example,
...
random.factor("Subj", instances=10) +
random.factor("Item", instances=4) +
...
will generate codes for 60 subjects and 24 texts.
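The same cyclic idea scales to several subjects and items per condition. In this base-R sketch (an illustration assuming a cyclic assignment, not designr's algorithm), 60 subjects and 24 items are crossed and each Subj-by-Item pair is assigned one of the six conditions; the tables then check the three counterbalancing requirements listed at the beginning of the vignette. The names codes6_sketch, per_subj, and per_item are illustrative.

```r
n_cond <- 6        # Speed(3) x Load(2)
n_subj <- 6 * 10   # instances=10 for Subj
n_item <- 6 * 4    # instances=4 for Item

# Each subject reads every item once; the condition cycles with Subj + Item
codes6_sketch <- expand.grid(Item = seq_len(n_item), Subj = seq_len(n_subj))
codes6_sketch$Cond <- (codes6_sketch$Subj - 1 + codes6_sketch$Item - 1) %% n_cond + 1

nrow(codes6_sketch)   # 1440 = 60 x 24

# (1) every subject reads each text exactly once, so texts differ across conditions
pairs <- table(codes6_sketch$Subj, codes6_sketch$Item)
# (2) each subject reads 4 texts in every condition
per_subj <- table(codes6_sketch$Subj, codes6_sketch$Cond)
# (3) each text is read by 10 subjects in every condition
per_item <- table(codes6_sketch$Item, codes6_sketch$Cond)

all(pairs == 1)       # TRUE
all(per_subj == 4)    # TRUE
all(per_item == 10)   # TRUE
```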
design6 <-
fixed.factor("Speed", levels=c("slow", "medium", "fast")) +
fixed.factor("Load", levels=c("yes", "no")) +
random.factor("Subj", instances=10) +
random.factor("Item", instances=4) +
random.factor(c("Subj", "Item"), groups=c("Speed", "Load"))
codes6 <- arrange(design.codes(design6), Subj, Item)[c(3, 4, 1, 2)]
codes6
## # A tibble: 1,440 x 4
## Speed Load Subj Item
## <fct> <fct> <fct> <fct>
## 1 slow yes Subj01 Item01
## 2 slow no Subj01 Item02
## 3 medium yes Subj01 Item03
## 4 medium no Subj01 Item04
## 5 fast yes Subj01 Item05
## 6 fast no Subj01 Item06
## 7 slow yes Subj01 Item07
## 8 slow no Subj01 Item08
## 9 medium yes Subj01 Item09
## 10 medium no Subj01 Item10
## # … with 1,430 more rows
## [1] 60
## [1] 24
## [1] 1440
Design formula:
Speed(3) x Load(2) x 4 Item[Speed x Load] x 10 Subj[Speed x Load] x
(3 x 2) Item-by-Subj[Speed x Load x Item[Speed x Load] x Subj[Speed x Load]]
The total number of items is 4 x 3 x 2 = 24; the total number of subjects is 10 x 3 x 2 = 60. The total number of instances of Item-by-Subj is 3 x 2 x (3 x 2) x 4 x 10 = 1440. The design yields 3 x 2 x 4 x 10 x (3 x 2) = 1440 observations.
The examples illustrate some of the basic functionality. The generalization to larger numbers of fixed or random factors, and of the levels associated with them, should be straightforward.
The codes generated with the above specifications can be extended with different assignments of presentation order according to latin.square (default), random.order, or williams. These options will be described in the second vignette.
The package also allows the specification of fixed effects and of variance and correlation parameters to generate input suitable for linear (mixed) models and to determine statistical power via simulations from the model. The third vignette is a tutorial about these functionalities.
The development of this package was supported by the German Research Foundation (DFG)/SFB 1287 Limits of variability in language and the Center for Interdisciplinary Research, Bielefeld (ZiF)/Cooperation Group Statistical models for psychological and linguistic data.
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] designr_0.1.12 dplyr_1.0.4
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.6 nloptr_1.2.2.2 pillar_1.5.0 bslib_0.2.4
## [5] compiler_4.0.3 jquerylib_0.1.3 tools_4.0.3 boot_1.3-27
## [9] digest_0.6.27 lme4_1.1-26 statmod_1.4.35 nlme_3.1-152
## [13] jsonlite_1.7.2 evaluate_0.14 lifecycle_1.0.0 tibble_3.0.6
## [17] lattice_0.20-41 pkgconfig_2.0.3 rlang_0.4.10 Matrix_1.3-2
## [21] rstudioapi_0.13 cli_2.3.1 DBI_1.1.1 yaml_2.2.1
## [25] xfun_0.21 stringr_1.4.0 knitr_1.31 generics_0.1.0
## [29] vctrs_0.3.6 sass_0.3.1 grid_4.0.3 tidyselect_1.1.0
## [33] glue_1.4.2 R6_2.5.0 fansi_0.4.2 rmarkdown_2.7
## [37] minqa_1.2.4 purrr_0.3.4 magrittr_2.0.1 ellipsis_0.3.1
## [41] htmltools_0.5.1.1 MASS_7.3-53.1 splines_4.0.3 assertthat_0.2.1
## [45] utf8_1.1.4 stringi_1.5.3 crayon_1.4.1