# Simulate Correlated Variables In faux: Simulation for Factorial Designs

```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  out.width = "100%"
)
ggplot2::theme_set(ggplot2::theme_bw())
set.seed(8675309)
```

```r
library(ggplot2)
library(dplyr)
library(tidyr)
library(faux)
```

The `rnorm_multi()` function generates multiple normally distributed vectors with specified means, SDs, and correlations.

## Quick example

For example, the following creates a sample that has 100 observations of 3 variables, drawn from a population where A has a mean of 0 and SD of 1, while B and C have means of 20 and SDs of 5. A correlates with B and C with r = 0.5, and B and C correlate with r = 0.25.

```r
dat <- rnorm_multi(n = 100,
                   mu = c(0, 20, 20),
                   sd = c(1, 5, 5),
                   r = c(0.5, 0.5, 0.25),
                   varnames = c("A", "B", "C"),
                   empirical = FALSE)
```

`r get_params(dat) %>% knitr::kable()`

Table: Sample stats

### Specify correlations {#spec_r}

You can specify the correlations in one of four ways:

- A single r for all pairs
- A vars by vars matrix
- A `vars*vars` length vector
- A `vars*(vars-1)/2` length vector
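
As a quick check that the four forms describe the same structure, the sketch below passes one correlation pattern (all pairs at r = 0.3, three variables) in each form and compares the resulting sample correlations. It relies on `empirical = TRUE` so that the sample correlations match the requested values; the object names (`r_mat`, `specs`, `cors`) are only illustrative.

```r
library(faux)

# One 3-variable correlation structure, written four ways
r_mat <- matrix(c( 1, .3, .3,
                  .3,  1, .3,
                  .3, .3,  1), nrow = 3)

specs <- list(
  single   = .3,                # one r for all pairs
  matrix   = r_mat,             # vars by vars matrix
  full     = as.vector(r_mat),  # vars*vars length vector
  triangle = c(.3, .3, .3)      # vars*(vars-1)/2 length vector
)

# Simulate once per specification and collect the sample correlations
cors <- lapply(specs, function(r) {
  cor(rnorm_multi(n = 100, vars = 3, mu = 0, sd = 1, r = r,
                  empirical = TRUE))
})

# TRUE for every form if each reproduces r_mat
sapply(cors, function(m) isTRUE(all.equal(unname(m), r_mat)))
```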

#### One Number

If you want all the pairs to have the same correlation, just specify a single number.

```r
bvn <- rnorm_multi(100, 5, 0, 1, .3, varnames = letters[1:5])
```

`r get_params(bvn) %>% knitr::kable()`

Table: Sample stats from a single rho

#### Matrix

If you already have a correlation matrix, such as the output of `cor()`, you can specify the simulated data with that.

```r
cmat <- cor(iris[, 1:4])
bvn <- rnorm_multi(100, 4, 0, 1, cmat,
                   varnames = colnames(cmat))
```

`r get_params(bvn) %>% knitr::kable()`

Table: Sample stats from a correlation matrix

#### Vector (vars*vars)

You can specify your correlation matrix by hand as a vars*vars length vector, which will include the correlations of 1 down the diagonal.

```r
cmat <- c( 1, .3, .5,
          .3,  1,  0,
          .5,  0,  1)
bvn <- rnorm_multi(100, 3, 0, 1, cmat,
                   varnames = c("first", "second", "third"))
```

`r get_params(bvn) %>% knitr::kable()`

Table: Sample stats from a vars*vars vector

#### Vector (vars*(vars-1)/2)

You can specify your correlation matrix by hand as a vars*(vars-1)/2 length vector, skipping the diagonal and lower left duplicate values.

```r
rho1_2 <- .3
rho1_3 <- .5
rho1_4 <- .5
rho2_3 <- .2
rho2_4 <- 0
rho3_4 <- -.3
cmat <- c(rho1_2, rho1_3, rho1_4, rho2_3, rho2_4, rho3_4)
bvn <- rnorm_multi(100, 4, 0, 1, cmat,
                   varnames = letters[1:4])
```

`r get_params(bvn) %>% knitr::kable()`

Table: Sample stats from a (vars*(vars-1)/2) vector
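
To make the ordering of the short vector concrete, the following base R sketch expands the six values above into the full 4 x 4 matrix. It only illustrates the row-by-row upper-triangle order implied by the `rho1_2` ... `rho3_4` names and is not faux's internal code; `tri_to_cormat()` is a hypothetical helper name.

```r
# Hypothetical helper: expand an upper-triangle vector (row by row,
# diagonal skipped) into a full symmetric correlation matrix
tri_to_cormat <- function(tri, vars) {
  stopifnot(length(tri) == vars * (vars - 1) / 2)
  m <- diag(vars)                        # 1s on the diagonal
  # lower.tri() is filled column by column, which is the same order as
  # reading the upper triangle row by row
  m[lower.tri(m)] <- tri
  m[upper.tri(m)] <- t(m)[upper.tri(m)]  # mirror into the upper triangle
  m
}

cmat_full <- tri_to_cormat(c(.3, .5, .5, .2, 0, -.3), vars = 4)
cmat_full
```

Reading across the first row of `cmat_full` gives 1, `rho1_2`, `rho1_3`, `rho1_4`, and `rho3_4` ends up in row 3, column 4.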

### empirical

If you want your samples to have the exact correlations, means, and SDs you entered, set `empirical` to TRUE.

```r
bvn <- rnorm_multi(100, 5, 0, 1, .3,
                   varnames = letters[1:5],
                   empirical = TRUE)
```

`r get_params(bvn) %>% knitr::kable()`

Table: Sample stats with empirical = TRUE
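
The difference is easiest to see side by side. This small sketch draws two samples with the same requested parameters, one with the default `empirical = FALSE` and one with `empirical = TRUE`; the object names `pop_draw` and `exact_draw` are illustrative.

```r
library(faux)

# Same requested parameters, with and without empirical = TRUE
pop_draw   <- rnorm_multi(n = 100, vars = 2, mu = 0, sd = 1, r = 0.3)
exact_draw <- rnorm_multi(n = 100, vars = 2, mu = 0, sd = 1, r = 0.3,
                          empirical = TRUE)

cor(pop_draw)[1, 2]    # near 0.3, but varies from sample to sample
cor(exact_draw)[1, 2]  # matches 0.3 up to floating-point error
sapply(exact_draw, sd) # each SD matches 1 up to floating-point error
```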

## Pre-existing variables

Use `rnorm_pre()` to create a vector with a specified correlation to one or more pre-existing variables. The following code creates a new column called `B` with a mean of 10, an SD of 2, and a correlation of r = 0.5 with the `A` column.

```r
dat <- rnorm_multi(varnames = "A") %>%
  mutate(B = rnorm_pre(A, mu = 10, sd = 2, r = 0.5))
```

```r
get_params(dat) %>% knitr::kable(digits = 3)
```

Set `empirical = TRUE` to return a vector with the exact specified parameters.

```r
dat$C <- rnorm_pre(dat$A, mu = 10, sd = 2, r = 0.5, empirical = TRUE)
```

```r
get_params(dat) %>% knitr::kable(digits = 3)
```
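
For intuition about how a new vector can hit an exact correlation with an existing one, here is a base R sketch of a standard construction: standardise the existing vector, make random noise orthogonal to it, and mix the two. This is a generic technique shown for illustration and is not claimed to be how `rnorm_pre()` is implemented; `make_correlated()` is a hypothetical name.

```r
# Hypothetical helper: build y with cor(x, y) equal to r, mean mu and SD sd
make_correlated <- function(x, mu = 0, sd = 1, r = 0) {
  z     <- (x - mean(x)) / stats::sd(x)     # standardised copy of x
  noise <- rnorm(length(x))
  e     <- unname(residuals(lm(noise ~ x))) # noise made orthogonal to x
  e     <- e / stats::sd(e)                 # mean 0, SD 1
  y     <- r * z + sqrt(1 - r^2) * e        # SD 1 and correlation r with x
  mu + sd * y                               # rescale to the requested mean/SD
}

set.seed(1)
a <- rnorm(100)
b <- make_correlated(a, mu = 10, sd = 2, r = 0.5)
c(mean = mean(b), sd = sd(b), r = cor(a, b))
```

Because the noise term is uncorrelated with `a` in the sample, the mean, SD, and correlation of `b` match the requested values up to floating-point error, which is the behaviour `empirical = TRUE` asks for.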

You can also specify correlations to more than one vector: pass a data frame containing only the continuous columns as the first argument, and set `r` to a vector giving the correlation with each column.

```r
dat$D <- rnorm_pre(dat, r = c(.1, .2, .3), empirical = TRUE)
```

```r
get_params(dat) %>% knitr::kable(digits = 3)
```

Not all correlation patterns are possible, so you'll get an error message if the correlations you ask for are impossible.

```r
dat$E <- rnorm_pre(dat, r = .9)
```
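
A set of correlations is only possible if the full correlation matrix has no negative eigenvalues (it must be positive semi-definite). The base R sketch below applies that general check to two small hand-written matrices; it is not necessarily the test faux runs internally, and `is_valid_cormat()` is a hypothetical helper name.

```r
# First two variables correlate -0.9 with each other, yet the third
# is asked to correlate 0.9 with both of them
impossible <- matrix(c( 1.0, -0.9, 0.9,
                       -0.9,  1.0, 0.9,
                        0.9,  0.9, 1.0), nrow = 3, byrow = TRUE)

# A milder pattern for comparison
possible <- matrix(c(1.0, 0.2, 0.5,
                     0.2, 1.0, 0.5,
                     0.5, 0.5, 1.0), nrow = 3, byrow = TRUE)

# Hypothetical helper: TRUE if no eigenvalue is negative
# (with a small tolerance for floating-point error)
is_valid_cormat <- function(m, tol = 1e-8) {
  ev <- eigen(m, symmetric = TRUE, only.values = TRUE)$values
  all(ev >= -tol)
}

is_valid_cormat(impossible)
is_valid_cormat(possible)
```

The first matrix has eigenvalues 1.9, 1.9, and -0.8, so no data set can have those correlations; the second has only positive eigenvalues.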

faux documentation built on April 20, 2023, 9:13 a.m.