This vignette demonstrates how to perform a power analysis using powerstan
. I compare the results obtained through simulation (powerstan
) to theoretical results calculated using the pwr
library. When power can be expressed as a closed-form expression then applying that formula is clearly the best way to calculate power. When there is no obvious formula for calculating power then simulation (powerstan
) is a useful tool. This vignette proceeds in three steps:
powerstan
,pwr
,To calculate power using powerstan
one must specify two things:
Imagine we are drawing outcomes from a normal distribution with known variance but unknown mean and we want to test whether the mean equals zero. Below I setup a Stan model for drawing observations from a normal distribution.
library(powerstan) model_code <- " parameters { real mu; real sigma; } model {} generated quantities { real y; y <- normal_rng(mu, sigma); } " model <- stan_model(model_code = model_code)
I create a function data_simulator
that creates a new data frame with n
observations assuming $y$ is normally distributed with mean d
and variance 1.
data_simulator <- function(d, n) { simulate_data( model = model, parameters = list(mu = d, sigma = 1), n = n ) }
Next I create a function called reject
that rejects the null hypothesis given a data frame data
and a significance level sig.level
if the mean outcome divided by its standard error falls outside the 95% confidence interval.
reject <- function(data, sig.level) { n <- nrow(data) zscore <- mean(data$y) * sqrt(n) low <- qnorm(sig.level / 2, mean = 0, sd = 1) high <- qnorm(1 - (sig.level / 2), mean = 0, sd = 1) return( (zscore < low) | (high < zscore) ) }
Finally, I create a function power1
to calculate the power of this test given the mean of the normal distribution d
under the alternative hypothesis, n
the number of observations to be collected, sig.level
the significance level of the test.
power1 <- function(d, n, sig.level) { calculate_power( function() data_simulator(d = d, n = n), function(data) reject(data = data, sig.level = sig.level) ) }
Using the dplyr
library I apply this function to twenty different sample sizes to calculate the power of the test using simulated data.
library(dplyr) grid <- expand.grid(d = 0.5, n = (1:20) * 5, sig.level = 0.05) grid <- grid %>% rowwise() %>% mutate(simulation = power1(d = d, n = n, sig.level = sig.level)) %>% ungroup()
Using the pwr
library is a lot easier. pwr
comes equipped with a pwr.norm.test
that can be applied out of the box.
library(pwr) power2 <- function(...) { x <- pwr.norm.test(...) return(x$power) } grid <- grid %>% rowwise() %>% mutate(theory = power2(d = d, n = n, sig.level = sig.level)) %>% ungroup()
It's great to see the simulated results are lining up closely with the theoretical results.
library(ggplot2) ggplot(grid) + geom_line(aes(x = n, y = theory, color = "Theoretical")) + geom_point(aes(x = n, y = simulation, color = "Simulated")) + xlab("Sample Size") + ylab("Power") + ggtitle("Comparing Simulated Power Estimates to Theory") + theme_bw() + labs(color = "")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.