R/get.min.size.R
In LiftTest: A Bootstrap Proportion Test for Brand Lift Testing

#' A Bootstrap Proportion Test for Brand Lift Testing (Liu et al., 2023)
#' @description This function computes the minimum total sample size required
#' to detect the difference between the success probabilities \code{p1} and
#' \code{p2} with a given power at significance level \code{alpha}.
#' For more details, please refer to Liu et al. (2023).
#' 
#' @import stats
#' @usage
#' get.min.size(p1, p2, p_treat, method='relative', power=0.8, alpha=0.05)
#' @param p1 success probability of the control group
#' @param p2 success probability of the treatment group
#' @param p_treat the proportion of the total sample assigned to the treatment
#' group (a number between 0 and 1)
#' @param method the lift on which the test is based: either \code{"relative"}
#' (the default) or \code{"absolute"}. \code{"relative"} returns the minimum
#' sample size based on the relative lift; \code{"absolute"} returns the
#' minimum sample size based on the absolute lift.
#' @param power the target power. The industry standard is power = 0.8,
#' which is also the default value.
#' @param alpha significance level of the two-sided test. Defaults to 0.05.
#' 
#' @return
#' The required minimum total sample size (control group + treatment group),
#' rounded up to the nearest integer.
#' @details 
#' 
#' The minimum required sample size is derived from an approximation of the
#' asymptotic power function.
#'  Let \eqn{n_1} and \eqn{n_2} be the sizes of the control and treatment
#'  groups, \eqn{N = n_1 + n_2} and \eqn{\kappa = n_1/N}. We define
#'  \deqn{
#'      \sigma_{a,n} = \sqrt{n_1^{-1}p_1(1-p_1) + n_2^{-1}p_2(1-p_2)},
#'  }
#'  \deqn{
#'      \bar\sigma_{a,n} = \sqrt{(n_1^{-1} +  n_2^{-1})\bar p(1-\bar p)}.
#'  }
#' where \eqn{\bar p = \kappa p_1 + (1-\kappa) p_2} is the pooled success
#' probability. \eqn{\sigma_{a,n}} is the standard deviation of the estimated
#' absolute lift, and \eqn{\bar\sigma_{a,n}} is the same quantity computed from
#' the pooled probability \eqn{\bar p}, i.e., under the null hypothesis
#' \eqn{p_1 = p_2}.
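#' 
#' As a numerical sketch (hypothetical values \eqn{p_1 = 0.1},
#' \eqn{p_2 = 0.2}; `sd_ratio` is an illustrative helper, not part of
#' LiftTest), both standard deviations can be evaluated directly. Their ratio
#' depends only on \eqn{\kappa}, \eqn{p_1} and \eqn{p_2}: scaling both group
#' sizes by the same factor leaves it unchanged.
#' 
#' ```r
#' # Ratio sigma_bar / sigma for control size n1 and treatment size n2
#' sd_ratio <- function(p1, p2, n1, n2) {
#'   kappa <- n1 / (n1 + n2)
#'   p_bar <- kappa * p1 + (1 - kappa) * p2
#'   # standard deviation of the estimated absolute lift
#'   sigma <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
#'   # same quantity computed from the pooled probability p_bar
#'   sigma_bar <- sqrt((1 / n1 + 1 / n2) * p_bar * (1 - p_bar))
#'   sigma_bar / sigma
#' }
#' sd_ratio(0.1, 0.2, 500, 500)    # about 1.01
#' sd_ratio(0.1, 0.2, 5000, 5000)  # same value
#' ```
#' 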
#' Let \eqn{\delta_a = p_2 - p_1} be the absolute lift.
#' The asymptotic power function based on the absolute lift is given by
#' \deqn{
#'     \beta_{Absolute}(\delta_a) \approx \Phi\left( -cz_{\alpha/2} + 
#'     \frac{\delta_a}{\sigma_{a,n}} \right) + \Phi\left( -cz_{\alpha/2} - 
#'     \frac{\delta_a}{\sigma_{a,n}} \right).
#' }
#' The asymptotic power function based on the relative lift is given by
#' \deqn{
#'     \beta_{Relative}(\delta_a) \approx \Phi 
#'     \left( -cz_{\alpha/2} \frac{p_1}{\bar p} + 
#'     \frac{\delta_a}{\sigma_{a,n}} \right) + 
#'     \Phi \left( -cz_{\alpha/2} \frac{p_1}{\bar p} - 
#'     \frac{\delta_a}{\sigma_{a,n}} \right),
#' }
#' 
#' where \eqn{\Phi(\cdot)} is the CDF of the standard normal distribution \eqn{N(0,1)}, 
#' \eqn{z_{\alpha/2}} is the \eqn{(1-\alpha/2)} quantile of \eqn{N(0,1)} (its upper \eqn{\alpha/2} critical value), 
#' and \eqn{c = {\bar\sigma_{a,n}}/\sigma_{a,n}}.
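#' 
#' Both approximations can be evaluated with a short sketch. The helper
#' `approx_power` below is hypothetical (not part of LiftTest); in the
#' relative case it uses the factor \eqn{p_1/\bar p}, the same `ratio` that
#' `get.min.size` applies.
#' 
#' ```r
#' # Approximate power of the two-sided level-alpha test for group sizes
#' # n1 (control) and n2 (treatment); rho = 1 gives the absolute-lift power
#' approx_power <- function(p1, p2, n1, n2, alpha = 0.05, method = "relative") {
#'   kappa <- n1 / (n1 + n2)
#'   p_bar <- kappa * p1 + (1 - kappa) * p2
#'   sigma <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
#'   sigma_bar <- sqrt((1 / n1 + 1 / n2) * p_bar * (1 - p_bar))
#'   c_ratio <- sigma_bar / sigma
#'   z <- qnorm(1 - alpha / 2)
#'   rho <- if (method == "relative") p1 / p_bar else 1
#'   delta <- p2 - p1
#'   pnorm(-c_ratio * z * rho + delta / sigma) +
#'     pnorm(-c_ratio * z * rho - delta / sigma)
#' }
#' approx_power(0.1, 0.2, 250, 250, method = "absolute")
#' approx_power(0.1, 0.2, 250, 250, method = "relative")
#' ```
#' 
#' For these inputs the relative-lift power exceeds the absolute-lift power,
#' because \eqn{p_1/\bar p < 1} lowers the effective critical value whenever
#' \eqn{p_2 > p_1}.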
#' 
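#' For a target power \eqn{\beta} (e.g., \eqn{\beta = 0.80}), setting the power
#' function equal to \eqn{\beta} does not give a closed-form solution for
#' \eqn{N}, because both terms depend on \eqn{N}. When \eqn{\delta_a > 0},
#' however, the first term of the power function dominates the second, so the
#' second term can be ignored; similarly, when \eqn{\delta_a < 0}, the second
#' term dominates and the first can be ignored. Since \eqn{c} depends only on
#' \eqn{\kappa}, \eqn{p_1} and \eqn{p_2}, while
#' \eqn{\sigma_{a,n}^2 = N^{-1}\left(p_1(1-p_1)/\kappa + p_2(1-p_2)/(1-\kappa)\right)},
#' the remaining term can be inverted for \eqn{N}. Both cases lead to the same
#' closed form for the minimum sample size, which depends on \eqn{\delta_a}
#' only through \eqn{\delta_a^2}: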
#' 
#' \deqn{
#'     N_{Relative} = \left( \frac{p_1(1-p_1)}{\kappa} + \frac{p_2(1-p_2)}{(1-\kappa)} \right) \left( \Phi^{-1}(\beta) + cz_{\alpha/2}p_1/\bar p \right)^2 / \delta_a^2,
#' }
#' \deqn{
#'     N_{Absolute} = \left( \frac{p_1(1-p_1)}{\kappa} + \frac{p_2(1-p_2)}{(1-\kappa)} \right) \left( \Phi^{-1}(\beta) + cz_{\alpha/2} \right)^2 / \delta_a^2.
#' }
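#' 
#' The closed form can be checked numerically. The sketch below uses the
#' inputs of the example section (\eqn{p_1 = 0.1}, \eqn{p_2 = 0.2}, equal
#' allocation) and confirms that plugging the unrounded \eqn{N} back into the
#' dominant term of the power function returns the target power. The variable
#' names are illustrative only.
#' 
#' ```r
#' p1 <- 0.1; p2 <- 0.2; kappa <- 0.5; power <- 0.8; alpha <- 0.05
#' p_bar <- kappa * p1 + (1 - kappa) * p2
#' # c = sigma_bar / sigma depends on kappa, p1 and p2 but not on N
#' c_ratio <- sqrt(p_bar * (1 - p_bar) /
#'                   (p1 * (1 - p1) * (1 - kappa) + p2 * (1 - p2) * kappa))
#' z <- qnorm(1 - alpha / 2)
#' rho <- p1 / p_bar  # relative lift; set rho <- 1 for the absolute lift
#' v <- p1 * (1 - p1) / kappa + p2 * (1 - p2) / (1 - kappa)  # N * sigma^2
#' N <- v * (qnorm(power) + c_ratio * z * rho)^2 / (p2 - p1)^2
#' # dominant term of the power function evaluated at N: recovers 0.8
#' pnorm(-c_ratio * z * rho + (p2 - p1) / sqrt(v / N))
#' ceiling(N)  # 234, the value get.min.size returns for these inputs
#' ```
#' 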
#' @md
#' 
#' @export
#' @references
#' Wanjun Liu, Xiufan Yu, Jialiang Mao, Xiaoxu Wu, and Justin Dyer. 2023.
#' Quantifying the Effectiveness of Advertising: A Bootstrap Proportion Test
#' for Brand Lift Testing. \emph{In Proceedings of the 32nd ACM International Conference 
#' on Information and Knowledge Management (CIKM ’23)}
#' 
#' @examples
#' p1 <- 0.1; p2 <- 0.2
#' get.min.size(p1, p2, p_treat=0.5, method='relative', power=0.8, alpha=0.05)


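get.min.size <- function(p1, p2, p_treat, method='relative', power=0.8, alpha=0.05){
  
  # kappa = n1 / N is the share of the control group
  kappa <- 1 - p_treat
  # two-sided critical value z_{alpha/2} and the quantile Phi^{-1}(power)
  z <- qnorm(1 - alpha/2)
  z_beta <- qnorm(power)
  # absolute lift and pooled success probability
  delta <- p2 - p1
  p_bar <- kappa*p1 + (1-kappa)*p2
  # c = sigma_bar / sigma, which does not depend on the total sample size
  num_c <- p_bar * (1 - p_bar)
  denom_c <- p1*(1-p1)*(1-kappa) + p2*(1-p2)*kappa
  c_ratio <- sqrt(num_c / denom_c)
  
  # factor multiplying c * z_{alpha/2} in the power function
  if (method == 'relative'){
    ratio <- p1 / p_bar
  } else if (method == 'absolute'){
    ratio <- 1
  } else {
    stop(paste0("No such method ", method, "!"))
  }
  # closed-form minimum total sample size N
  min_sample_size <- (p1*(1-p1)/kappa + p2*(1-p2)/(1-kappa)) * (c_ratio*z*ratio + z_beta)^2 / (delta^2)
  
  # round up to the next whole observation
  return(ceiling(min_sample_size))
}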