Description Usage Arguments Value References Examples
This function returns the sieve maximum likelihood estimators (SMLE) for the logistic regression model from Lotspeich et al. (2021).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
Y_unval |
Column name of the error-prone or unvalidated continuous outcome. Subjects with missing values of |
Y |
Column name that stores the validated value of |
X_unval |
Column name(s) with the unvalidated predictors. If |
X |
Column name(s) with the validated predictors. If |
Z |
(Optional) Column name(s) with additional error-free covariates. |
Bspline |
Vector of column names containing the B-spline basis functions. |
data |
A dataframe with one row per subject containing columns: |
theta_pred |
Vector of columns in |
gamma_pred |
Vector of columns in |
initial_lr_params |
Initial values for parametric model parameters. Choices include (1) |
hn_scale |
Size of the perturbation used in estimating the standard errors via profile likelihood. If none is supplied, default is |
noSE |
Indicator for whether standard errors are desired. Defaults to |
TOL |
Tolerance between iterations in the EM algorithm used to define convergence. |
MAX_ITER |
Maximum number of iterations in the EM algorithm. The default number is |
verbose |
If |
coeff |
dataframe with final coefficient and standard error estimates (where applicable) for the analysis model. |
outcome_err_coeff |
dataframe with final coefficient estimates for the outcome error model. |
Bspline_coeff |
dataframe with final B-spline coefficient estimates (where applicable). |
vcov |
variance-covarianced matrix for |
converged |
indicator of EM algorithm convergence for parameter estimates. |
se_converged |
indicator of standard error estimate convergence. |
converged_msg |
(where applicable) description of non-convergence. |
iterations |
number of iterations completed by EM algorithm to find parameter estimates. |
od_loglik_at_conv |
value of the observed-data log-likelihood at convergence. |
Lotspeich, S. C., Shepherd, B. E., Amorim, G. G. C., Shaw, P. A., & Tao, R. (2021). Efficient odds ratio estimation under two-phase sampling using error-prone data from a multi-national HIV research cohort. Biometrics, biom.13512. https://doi.org/10.1111/biom.13512
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 | set.seed(918)
# Set sample sizes ----------------------------------------
N <- 1000 # Phase-I = N
n <- 250 # Phase-II/audit size = n
# Generate true values Y, Xb, Xa --------------------------
Xa <- rbinom(n = N, size = 1, prob = 0.25)
Xb <- rbinom(n = N, size = 1, prob = 0.5)
Y <- rbinom(n = N, size = 1,prob = (1 + exp(-(- 0.65 - 0.2 * Xb - 0.1 * Xa))) ^ (- 1))
# Generate error-prone Xb* from error model P(Xb*|Xb,Xa) --
sensX <- specX <- 0.75
delta0 <- - log(specX / (1 - specX))
delta1 <- - delta0 - log((1 - sensX) / sensX)
Xbstar <- rbinom(n = N, size = 1,
prob = (1 + exp(- (delta0 + delta1 * Xb + 0.5 * Xa))) ^ (- 1))
# Generate error-prone Y* from error model P(Y*|Xb*,Y,Xb,Xa)
sensY <- 0.95
specY <- 0.90
theta0 <- - log(specY / (1 - specY))
theta1 <- - theta0 - log((1 - sensY) / sensY)
Ystar <- rbinom(n = N, size = 1,
prob = (1 + exp(- (theta0 - 0.2 * Xbstar + theta1 * Y - 0.2 * Xb - 0.1 * Xa))) ^ (- 1))
## V is a TRUE/FALSE vector where TRUE = validated --------
V <- seq(1, N) %in% sample(x = seq(1, N), size = n, replace = FALSE)
# Build dataset --------------------------------------------
sdat <- cbind(Y, Xb, Ystar, Xbstar, Xa)
# Make Phase-II variables Y, Xb NA for unaudited subjects ---
sdat[!V, c("Y", "Xb")] <- NA
# Fit models -----------------------------------------------
## Naive model -----------------------------------------
naive <- glm(Ystar ~ Xbstar + Xa, family = "binomial", data = data.frame(sdat))
## Generalized raking ----------------------------------
### Influence function for logistic regression
### Taken from: https://github.com/T0ngChen/multiwave/blob/master/sim.r
inf.fun <- function(fit) {
dm <- model.matrix(fit)
Ihat <- (t(dm) %*% (dm * fit$fitted.values * (1 - fit$fitted.values))) / nrow(dm)
## influence function
infl <- (dm * resid(fit, type = "response")) %*% solve(Ihat)
infl
}
naive_infl <- inf.fun(naive) # error-prone influence functions based on naive model
colnames(naive_infl) <- paste0("if", 1:3)
# Add naive influence functions to sdat -----------------------------------------------
sdat <- cbind(id = 1:N, sdat, naive_infl)
### Construct B-spline basis -------------------------------
### Since Xb* and Xa are both binary, reduces to indicators --
nsieve <- 4
B <- matrix(0, nrow = N, ncol = nsieve)
B[which(Xa == 0 & Xbstar == 0), 1] <- 1
B[which(Xa == 0 & Xbstar == 1), 2] <- 1
B[which(Xa == 1 & Xbstar == 0), 3] <- 1
B[which(Xa == 1 & Xbstar == 1), 4] <- 1
colnames(B) <- paste0("bs", seq(1, nsieve))
sdat <- cbind(sdat, B)
smle <- logistic2ph(Y_unval = "Ystar",
Y = "Y",
X_unval = "Xbstar",
X = "Xb",
Z = "Xa",
Bspline = colnames(B),
data = sdat,
noSE = FALSE,
MAX_ITER = 1000,
TOL = 1E-4)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.