SAT.stage1.sampling: Pilot sampling by SGS

Description

This function implements the stage 1 (pilot) subsampling of SAT using the SGS method.

Usage

SAT.stage1.sampling(r1, n, S, Rpar = 0.5)

Arguments

r1

pilot subsample size.

n

total sample size.

S

a binary vector of length n containing the surrogate observations for all samples.

Rpar

case proportion parameter. The recommended range is (0.3, 0.6), and the default is 0.5.

Details

When the case prevalence (i.e., P(Y=1|X)) is around 4%, values of Rpar in (0.3, 0.5) correspond to lower MSEs. To avoid estimation failures when the case prevalence is low and r1 is small, a slightly larger Rpar in (0.5, 0.6) can be used without compromising the performance of SAT. Using Rpar = 0.5 is a safe choice for most situations.
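
To make the role of a case proportion parameter concrete, the following is a minimal sketch and not the SAT package implementation. It assumes that Rpar is the target share of surrogate-positive records (S == 1) in the pilot subsample and that sampling is done without replacement; the page does not confirm either point. The helper name sketch_stage1_sampling and the simulated surrogate are hypothetical and exist only for illustration.

```r
# Hypothetical helper, NOT the SAT package code.
# Assumption: Rpar is the target share of surrogate-positive (S == 1)
# records in the pilot subsample, drawn without replacement.
sketch_stage1_sampling <- function(r1, n, S, Rpar = 0.5) {
  pos <- which(S == 1)  # indices of surrogate positives
  neg <- which(S == 0)  # indices of surrogate negatives
  stopifnot(length(S) == n, all(S %in% c(0, 1)),
            Rpar > 0, Rpar < 1,
            length(pos) > 0, length(neg) > 0)
  r_pos <- min(round(r1 * Rpar), length(pos))  # positives to draw
  r_neg <- min(r1 - r_pos, length(neg))        # remainder from negatives
  # Index into pos/neg via sample.int(), because sample(x, ...) treats
  # a length-one numeric x as 1:x.
  c(pos[sample.int(length(pos), r_pos)],
    neg[sample.int(length(neg), r_neg)])
}

set.seed(1)
n <- 10000
S <- rbinom(n, 1, 0.08)  # simulated surrogate, roughly 8% positive
idx <- sketch_stage1_sampling(r1 = 400, n = n, S = S, Rpar = 0.5)
length(idx)   # size of the pilot subsample
mean(S[idx])  # share of surrogate positives among the selected records
```

Under these assumptions, a larger Rpar places more surrogate-positive records in the pilot subsample, which is one way to read the advice to raise Rpar when cases are rare and r1 is small.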

Value

The function returns a vector of indices of the patients for whom manual chart reviews will be collected.

References

Liu, X., Chubak, J., Hubbard, R. A. & Chen, Y. (2021). SAT: a Surrogate Assisted Two-wave case boosting sampling method, with application to EHR-based association studies.

Examples

library(SAT)
set.seed(0)
n <- 1e5
beta0  <- c(1/5, 0, 0, 1/2, rep(1/2, 4))
d <- length(beta0)

X <- rnorm(n*(d-1), -1.5, 1)
X <- matrix(X, nrow = n, ncol = d - 1)
X <- cbind(1, X)

P  <- 1 - 1 / (1 + exp(X %*% beta0))
Y  <- rbinom(n, 1, P)

a1 <- 0.85 # sensitivity
a2 <- 0.95 # specificity
pr_s <- a1*(Y==1) + (1-a2)*(Y==0) # P(S = 1 | Y)
S <- rbinom(n, 1, pr_s)

stage1.index <- SAT.stage1.sampling(r1 = 400, n = n, S = S, Rpar = 0.5)
length(stage1.index)
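
The example simulates the surrogate S from the outcome Y with sensitivity 0.85 and specificity 0.95. As a rough, self-contained check of that simulation step, the sketch below recomputes the empirical sensitivity, specificity, and positive predictive value. It uses base R only and does not call the SAT package. The flat 4% outcome prevalence is an assumption made for illustration and replaces the logistic model used in the example.

```r
set.seed(0)
n  <- 1e5
Y  <- rbinom(n, 1, 0.04)  # assumed flat 4% prevalence, for illustration only
a1 <- 0.85                # sensitivity, as in the example
a2 <- 0.95                # specificity, as in the example
S  <- rbinom(n, 1, a1 * (Y == 1) + (1 - a2) * (Y == 0))

sens_hat <- mean(S[Y == 1] == 1)  # empirical P(S = 1 | Y = 1)
spec_hat <- mean(S[Y == 0] == 0)  # empirical P(S = 0 | Y = 0)
ppv_hat  <- mean(Y[S == 1] == 1)  # empirical P(Y = 1 | S = 1)
c(sensitivity = sens_hat, specificity = spec_hat, ppv = ppv_hat)
```

The positive predictive value shows how much more common true cases are among surrogate positives than in the full sample, which is the property that surrogate-based pilot sampling relies on.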

xliu-stat/SAT documentation built on Dec. 23, 2021, 7:10 p.m.