SAT.stage1.sampling: Pilot sampling by SGS
In xliu-stat/SAT: Surrogate Assisted Two-wave Case Boosting Sampling

Description Usage Arguments Details Value References Examples

View source: R/SAT.stage1.sampling.r

This function implements the stage 1 subsampling of SAT by SGS method.

1	SAT.stage1.sampling(r1, n, S, Rpar = 0.5)

`r1`	pilot subsample size.
`n`	total sample size.
`S`	a binary vector of length n. Surrogate observations for all samples.
`Rpar`	case proportion parameter. The recommended range is (0.3, 0.6), and default is 0.5.

The region of Rpar that corresponds to lower MSEs is (0.3, 0.5) for case prevalence (i.e., P(Y=1|X)) around 4%. To avoid failures in the estimation when the case prevalence is low and r1 is small, a slightly larger Rpar in (0.5, 0.6) can be used without compromising the performance of SAT. Using Rpar=0.5 is a safe choice for most situations.

The function returns a vector of patient index for whom the manual chart reviews are going to be collected.

Liu, X., Chubak, J., Hubbard, R. A. & Chen, Y. (2021). SAT: a Surrogate Assisted Two-wave case boosting sampling method, with application to EHR-based association studies.

library(SAT)
set.seed(0)
n <- 1e5
beta0  <- c(1/5, 0, 0, 1/2, rep(1/2, 4))
d <- length(beta0)

X <- rnorm(n*(d-1), -1.5, 1)
X <- matrix(X, nrow = n, ncol = d - 1)
X <- cbind(1, X)

P  <- 1 - 1 / (1 + exp(X %*% beta0))
Y  <- rbinom(n, 1, P)

a1 <- 0.85 # sensitivity
a2 <- 0.95 # specificity
pr_s <- vector(mode = "numeric", length = n)
pr_s <- a1*(Y==1) + (1-a2)*(Y==0)
S <- rbinom(n, 1, pr_s)

stage1.index <- SAT.stage1.sampling(r1 = 400, n = 1e5, S, Rpar = 0.5)
length(stage1.index)