Setup

knitr::opts_chunk$set(echo = TRUE)

library(dplyr)
library(designr)

# Set a "seed" for the random number generator
set.seed(12345)  

Experimental Design(s)

In an experimental design, we distinguish between random and fixed factors. The "levels" of the random factors are quasi-random samples from a population of persons (subjects) or material (items). To avoid confusion with levels of fixed factors, we will refer to levels of random factors as instances. For fixed factors, usually (quasi-)experimental ones, we must specify whether they are between- or within-subjects and between- or within-items. For a given fixed factor, all four combinations are possible in principle. We also need to decide on a counterbalancing scheme; a common example is a Latin square applied to all or a subset of the factors.

In this vignette, we illustrate how to set up an experiment using subject (Subj) and item (Item) as random factors. In this fictitious experiment, the words of a text are presented serially, one at a time, at a slow, medium, or fast rate (i.e., fixed factor Speed with three levels) at the center of the screen. A second factor is cognitive load, which varies whether or not subjects have to keep six digits in memory while reading (i.e., fixed factor Load with two levels, yes and no).

Typically, in such an experiment, (1) each subject reads different texts (Item) in the 2 x 3 experimental conditions; (2) each subject reads the same number of texts in each condition; (3) across subjects each text (item) is presented equally often in the six experimental conditions. The final example in this vignette implements this design. However, for didactic reasons, we first show how within/between-subject and within/between-item features of the factors are specified without counterbalancing. These designs are preferred if repeated exposure to the same stimuli in an experimental condition does not have any confounding effects on the measure.

Complete within-subject and within-item design; no counterbalancing

In the first version of the experimental design, each subject sees each text in each of 2 x 3 experimental conditions. Thus, the two fixed factors Speed and Load are both within-subject and within-item. A minimum of six subjects and six texts is required for a complete within-subject/within-item design. We summarize the design with the design formula:

Load(2) x Speed(3) x 6 Item x 6 Subj

This is a completely crossed design. Note the difference between specification of levels for fixed and instances for random factors. The product of numbers in the formula informs about the number of observations generated by the design. In this case: 216 observations.
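The product rule can be checked independently of designr. The following is a minimal base-R sketch, assuming only `expand.grid()`; the object names `grid1`, `n_obs1`, and `cells1` are illustrative and not part of the package.

```r
# Base-R sketch (not designr): cross all fixed-factor levels
# with all instances of the two random factors.
grid1 <- expand.grid(
  Load  = c("yes", "no"),
  Speed = c("slow", "medium", "fast"),
  Item  = 1:6,
  Subj  = 1:6
)

# Number of rows equals the product 2 x 3 x 6 x 6.
n_obs1 <- nrow(grid1)

# In a completely crossed design, every Subj-by-Item pair
# occurs once in each of the six Load-by-Speed conditions.
cells1 <- table(grid1$Subj, grid1$Item)
```

Here `n_obs1` reproduces the product of the numbers in the design formula, and every cell of `cells1` holds the number of experimental conditions.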

design1 <- 
  fixed.factor("Speed", levels=c("slow", "medium", "fast")) +
  fixed.factor("Load",  levels=c("yes", "no")) +
  random.factor("Subj", instances=6) +    
  random.factor("Item", instances=6)    


codes1 <- arrange(design.codes(design1), Subj, Item)[c(3, 4, 2, 1)]
codes1
tail(codes1, 10)

#xtabs( ~ Subj + Item + Load + Speed, codes1)
xtabs(~ Load + Speed, codes1)
xtabs(~ Subj + Load + Speed, codes1)
xtabs(~ Item + Load + Speed, codes1)

The first command generates the list design1. The function design.codes() extracts the generated variable coding as a dataframe in the tibble format. After re-sorting the rows and rearranging the variables, the codes are available in the long format (i.e., one row per observation, N=216). Obviously, having subjects read each text six times may lead to practice effects that would need to be taken into account by counterbalancing the order in which texts are presented across subjects.

Speed within-subject/within-item, Type within-subject/between-item; no counterbalancing

In the second example, we replace the factor Load with a factor Type of text. We assume that Items 1 to 3 are simple texts and Items 4 to 6 are complex texts. Subjects read both simple and complex texts; Type of text is a within-subject factor. Each text (item), however, is either simple or complex. Thus, Type is a between-item factor in this design.

Such a design is realized by specifying Type with the groups argument in the corresponding random.factor() command. We generate 3 items (instances) within each of the two levels of the factor Type, that is, as in the first example, we will have again six different items.

Design formula:

Type(2) x Speed(3) x 3 Item[Type] x 6 Subj

We read the item-part of this formula: "3 Items nested under levels of Type." The total number of different instances for the random factor Item is 3 items x 2 levels of Type, that is 6 items. The design generates 108 observations; it is no longer completely crossed.
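Nesting can be made concrete without designr. The sketch below, in base R only, attaches Type to each item first and then crosses the items with Speed and Subj; the names `items2`, `grid2`, `n_items2`, and `n_obs2` are illustrative assumptions.

```r
# Base-R sketch (not designr): items are nested under Type,
# so each item carries exactly one level of Type.
items2 <- data.frame(
  Item = 1:6,
  Type = rep(c("simple", "complex"), each = 3)  # 3 instances per level
)

# Cross the six items with Speed and Subj (by = NULL gives the
# Cartesian product); Type travels with the item.
grid2 <- merge(
  items2,
  expand.grid(Speed = c("slow", "medium", "fast"), Subj = 1:6),
  by = NULL
)

n_items2 <- length(unique(grid2$Item))
n_obs2   <- nrow(grid2)
```

Because Type is not crossed with Item, the number of rows is half of what the completely crossed design of the first example produces.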

design2 <- fixed.factor("Speed", levels=c("slow", "medium", "fast")) +
  fixed.factor("Type",  levels=c("simple", "complex")) +
  random.factor("Subj", instances=6) +   
  random.factor("Item", groups="Type", instances=3)

codes2 <- arrange(design.codes(design2), Subj, Item)[c(3, 4, 1, 2)]
codes2

xtabs(~ Item + Type, codes2)
xtabs(~ Subj + Type, codes2)

#xtabs( ~ Subj + Item + Type + Speed, codes2)
#xtabs(~ Type + Speed, codes2)
#xtabs(~ Subj + Type + Speed, codes2)
#xtabs(~ Item + Type + Speed, codes2)

The tables show that for Items 1 to 3 all available codes for the factor Type are complex and for Items 4 to 6 all codes are simple. Thus, Type is varied between items. Each item is read three times (three levels of Speed) by six subjects, yielding 18 codes in each of the 6 non-zero cells of the Item x Type table.

Conversely, for all six subjects codes are available for simple and complex items. Thus, Type is varied within subjects. Each text is read three times (i.e., the three speed rates). Therefore, there are 3 texts x 3 levels of speed = 9 codes in each cell of the Subj x Type table.

The command to specify Speed as a between-item factor would be:

random.factor("Item", groups="Speed", instances=2)

We need 2 instances within each of the 3 levels of Speed to obtain 6 items in total.

Design formula:

Type(2) x Speed(3) x 2 Item[Speed] x 6 Subj

The total number of items is 2 x 3 = 6. This design generates 72 observations.
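As a cross-check of this count, here is a short base-R sketch (not designr) in which each item is tied to one level of Speed while Type and Subj stay crossed with every item. The names `items_speed`, `grid_speed`, `n_obs_speed`, and `speeds_per_item` are illustrative.

```r
# Base-R sketch (not designr): each item belongs to exactly one Speed level.
items_speed <- data.frame(
  Item  = 1:6,
  Speed = rep(c("slow", "medium", "fast"), each = 2)  # 2 instances per level
)

# Type and Subj remain crossed with every item.
grid_speed <- merge(
  items_speed,
  expand.grid(Type = c("simple", "complex"), Subj = 1:6),
  by = NULL
)

n_obs_speed <- nrow(grid_speed)  # 6 items x 2 types x 6 subjects

# Number of distinct Speed levels under which each item appears.
speeds_per_item <- rowSums(table(grid_speed$Item, grid_speed$Speed) > 0)
```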

Age between-subject/within-item, Speed within-subject/within-item; no counterbalancing

In this example, we replace the factor Load (or Type) with a between-subject factor Age, assuming that half the subjects are young and the other half old.

design3 <- 
  fixed.factor("Speed", levels=c("slow", "medium", "fast")) +
  fixed.factor("Age",  levels=c("young", "old")) +
  random.factor("Item", instances=6) +
  random.factor("Subj", groups="Age", instances=3) 

codes3 <- arrange(design.codes(design3), Subj, Item)[c(4, 3, 2, 1)]
codes3

xtabs(~ Subj + Age, codes3)
xtabs(~ Item + Age, codes3)

#xtabs( ~ Subj + Item + Age + Speed, codes3)
#xtabs( ~ Subj + Age + Speed, codes3)

The tables show that subjects 1 to 3 are old and subjects 4 to 6 are young (i.e., Age is a between-subject factor) and that all items are read by young and old subjects (i.e., Age is a within-item factor). The formula for this design can be written as: Age(2) x Speed(3) x 6 Item x 3 Subj[Age], yielding 108 observations.

Note that instances specifies the number of instances within groups. To generate code for 25 young and 25 old subjects (i.e., total N=50), we set instances=25.

Design formula:

Age(2) x Speed(3) x 6 Item x 25 Subj[Age] 

The total number of subjects is 25 x 2 = 50. This design generates 900 observations.
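The way the counts scale with the instances argument can be written as plain arithmetic. The helper `between_subject_counts()` below is hypothetical (it is not a designr function) and assumes the design of this section: Age between subjects, Speed and Item within subjects.

```r
# Hypothetical helper (not part of designr): subject and observation
# counts when `instances` subjects are generated within each Age group.
between_subject_counts <- function(instances, n_age = 2, n_speed = 3, n_item = 6) {
  n_subj <- instances * n_age            # instances are counted per group
  c(subjects     = n_subj,
    observations = n_subj * n_speed * n_item)
}

between_subject_counts(3)   # minimal design of this section
between_subject_counts(25)  # 25 young and 25 old subjects
```

The first call corresponds to design3 as specified above, the second to the enlarged design with instances=25.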

Age between-subject/within-item, Speed between-subject/within-item; no counterbalancing

Continuing with the last example, it may also make sense to vary not only Age, but also Speed between subjects. Thus, every subject is either old or young (i.e., a quasi-experimental factor) and is randomly assigned to one of the three Speed conditions (i.e., an experimental factor). For this specification, the two factors are passed as a vector to the groups argument. For the minimal design we need only 1 instance per cell, which yields 2 x 3 = 6 subjects. This means we generate codes for 1 subject in each of the six design cells, and each subject reads each text in this condition (i.e., there are six measures for each subject). To get codes for 10 subjects in each of the 2 x 3 = 6 design cells (i.e., a total of 60 subjects), we set instances=10.

Design formula:

Age(2) x Speed(3) x 6 Item x 10 Subj[Age x Speed]

The total number of subjects is 10 x 2 x 3 = 60. This design generates 360 observations.
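The nesting of subjects within the Age x Speed cells can be sketched in base R, independently of designr. The object names (`cells4`, `subj4`, `grid4`, and the derived counts) are illustrative assumptions.

```r
# Base-R sketch (not designr): subjects nested within Age x Speed cells.
cells4 <- expand.grid(
  Age   = c("young", "old"),
  Speed = c("slow", "medium", "fast")
)

# Ten subjects per cell; each subject belongs to exactly one cell.
subj4 <- cells4[rep(seq_len(nrow(cells4)), each = 10), ]
subj4$Subj <- seq_len(nrow(subj4))

# Every subject reads all six items in his or her own condition.
grid4 <- merge(subj4, data.frame(Item = 1:6), by = NULL)

n_subj4   <- nrow(subj4)
n_obs4    <- nrow(grid4)
per_cell4 <- table(subj4$Age, subj4$Speed)  # subjects per design cell
per_subj4 <- table(grid4$Subj)              # measures per subject
```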

design4 <- 
  fixed.factor("Speed", levels=c("slow", "medium", "fast")) +
  fixed.factor("Age",  levels=c("young", "old")) +
  random.factor("Subj", groups=c("Age", "Speed"), instances=10) +
  random.factor("Item", instances=6)   

codes4 <- arrange(design.codes(design4), Subj, Item)[c(3, 4, 2, 1)]
codes4

xtabs( ~ Subj + Age, codes4)
xtabs( ~ Subj + Speed, codes4)
xtabs( ~ Item + Age, codes4)
xtabs( ~ Item + Speed, codes4)

#xtabs( ~ Subj + Item + Age + Speed, codes4)

The tables show that Age and Speed indeed vary between subjects and within items.

Counterbalancing Speed and Load

In this final example, we modify the very first example such that each subject reads a different text in each of the six conditions, respecting the constraint that design cells are counterbalanced (i.e., each text is read equally often in each condition, and each subject reads the same number of texts in each condition).
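The counterbalancing constraint itself can be illustrated with a hand-rolled cyclic Latin square in base R. This is only a sketch of the idea under the assumption of 6 subjects and 6 items; it is not necessarily the algorithm designr uses, and the names `conditions`, `pairs`, and `Cond` are illustrative.

```r
# Base-R sketch (not designr): assign each Subj-by-Item pair to one of
# the six Speed x Load conditions with a cyclic Latin square.
conditions <- expand.grid(
  Speed = c("slow", "medium", "fast"),
  Load  = c("yes", "no")
)
n_cond <- nrow(conditions)  # 6 design cells

pairs <- expand.grid(Subj = 1:n_cond, Item = 1:n_cond)

# Shift the condition index by one position for each successive subject.
pairs$Cond  <- (pairs$Subj + pairs$Item - 2) %% n_cond + 1
pairs$Speed <- conditions$Speed[pairs$Cond]
pairs$Load  <- conditions$Load[pairs$Cond]

# Counterbalancing checks: one text per condition for every subject,
# and one subject per condition for every text.
subj_by_cond <- table(pairs$Subj, pairs$Cond)
item_by_cond <- table(pairs$Item, pairs$Cond)
```

Each subject-item pair appears exactly once, so the 36 rows of `pairs` correspond to one observation per pair rather than six.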

For this implementation we (1) add a third random factor defined as Subj-by-Item and (2) specify factors Speed and Load as varying between Subj-by-Item.

We start with the minimal design of 6 subjects reading 6 texts.

Design formula:

Speed(3) x Load(2) x 1 Item[Speed x Load] x 1 Subj[Speed x Load] x 
(3 x 2) Item-by-Subj[Speed x Load x Item[Speed x Load] + Subj[Speed x Load]] 

We have 1 item and 1 subject nested under each of the six cells of the Speed x Load design, that is, 6 items and 6 subjects in total. Crossing them yields 6 x 6 = 36 instances of the Item-by-Subj random factor. The design generates 3 x 2 x 1 x 1 x (3 x 2) = 36 observations.

design5 <- 
  fixed.factor("Speed", levels=c("slow", "medium", "fast")) +
  fixed.factor("Load",  levels=c("yes", "no")) +
  random.factor("Subj", instances=1) + 
  random.factor("Item", instances=1) +
  random.factor(c("Subj", "Item"), groups=c("Speed", "Load"))


codes5 <- arrange(design.codes(design5), Subj, Item)[c(3, 4, 1, 2)]
codes5

xtabs(~ Subj + Speed + Load, codes5)
xtabs(~ Item + Speed + Load, codes5)

xtabs( ~ Subj + Item + Load + Speed, codes5)

The numbers of subjects and items each increase by six with every unit increment of the corresponding instances argument. For example,

  ...
  random.factor("Subj", instances=10) + 
  random.factor("Item", instances= 4) +
  ...

will generate codes for 60 subjects and 24 texts.

design6 <- 
  fixed.factor("Speed", levels=c("slow", "medium", "fast")) +
  fixed.factor("Load",  levels=c("yes", "no")) +
  random.factor("Subj", instances=10) + 
  random.factor("Item", instances=4) +
  random.factor(c("Subj", "Item"), groups=c("Speed", "Load"))


codes6 <- arrange(design.codes(design6), Subj, Item)[c(3, 4, 1, 2)]
codes6
length(unique(codes6$Subj))
length(unique(codes6$Item))
length(unique(paste(codes6$Subj, codes6$Item)))

Design formula:

Speed(3) x Load(2) x 4 Item[Speed x Load] x 10 Subj[Speed x Load] x 
(3 x 2) Item-by-Subj[Speed x Load x Item[Speed x Load]  x  Subj[Speed x Load]] 

The total number of items is 4 x 3 x 2 = 24; the total number of subjects is 10 x 3 x 2 = 60. The total number of instances of Item-by-Subj is 3 x 2 x (3 x 2) x 4 x 10 = 1440. The design yields 3 x 2 x 4 x 10 x (3 x 2) = 1440 observations.
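These counts, and the balance they imply, can be verified with a base-R sketch that assigns conditions by a cyclic Latin square over subject groups and item groups. This is an illustration under stated assumptions, not the package's internal procedure; `SubjGroup`, `ItemGroup`, `pairs6`, and the derived tables are hypothetical names.

```r
# Base-R sketch (not designr): 60 subjects and 24 items, each split
# into six groups that are rotated through the six conditions.
n_cond <- 6
subj <- data.frame(Subj = 1:60, SubjGroup = rep(1:n_cond, each = 10))
item <- data.frame(Item = 1:24, ItemGroup = rep(1:n_cond, each = 4))

# Every subject meets every item exactly once.
pairs6 <- merge(subj, item, by = NULL)

# Cyclic Latin square on the group indices.
pairs6$Cond <- (pairs6$SubjGroup + pairs6$ItemGroup - 2) %% n_cond + 1

texts_per_subj_cond <- table(pairs6$Subj, pairs6$Cond)  # texts per subject and condition
subj_per_item_cond  <- table(pairs6$Item, pairs6$Cond)  # subjects per item and condition
```

Under this assignment every subject reads four texts in each condition, and every text is read by ten subjects in each condition.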


# Outlook

The examples illustrate some of the basic functionalities. The generalization to a larger number of fixed or random factors and number of levels associated with them should be clear.

The codes generated with the above specifications can be extended with different assignment of presentation orders according to `latin.square` (default),  `random.order`, or `williams`.  These options will be described in the second vignette.  

The package also allows the specification of fixed effects, variance, and correlation parameters to generate input suitable for linear (mixed) models and the determination of statistical power via simulations from the model. The third vignette is a tutorial about these functionalities.

# Appendix

## Acknowledgement

The development of this package was supported by the German Research Foundation (DFG)/SFB 1287 _Limits of variability in language_ and the Center for Interdisciplinary Research, Bielefeld (ZiF)/Cooperation Group _Statistical models for psychological and linguistic data_.

## Packages

```r
sessionInfo()
```


mmrabe/designr documentation built on May 12, 2023, 9:37 p.m.