logistic2ph: Sieve maximum likelihood estimator (SMLE) for two-phase...

Description Usage Arguments Value References Examples

View source: R/logistic2ph.R

Description

This function returns the sieve maximum likelihood estimators (SMLE) for the logistic regression model from Lotspeich et al. (2021).

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
logistic2ph(
  Y_unval = NULL,
  Y = NULL,
  X_unval = NULL,
  X = NULL,
  Z = NULL,
  Bspline = NULL,
  data = NULL,
  theta_pred = NULL,
  gamma_pred = NULL,
  initial_lr_params = "Zeros",
  hn_scale = 1,
  noSE = FALSE,
  TOL = 1e-04,
  MAX_ITER = 1000,
  verbose = FALSE
)

Arguments

Y_unval

Column name of the error-prone or unvalidated continuous outcome. Subjects with missing values of Y_unval are omitted from the analysis. If Y_unval is null, the outcome is assumed to be error-free.

Y

Column name that stores the validated value of Y_unval in the second phase. Subjects with missing values of Y are considered as those not selected in the second phase. This argument is required.

X_unval

Column name(s) with the unvalidated predictors. If X_unval and X are null, all predictors are assumed to be error-free.

X

Column name(s) with the validated predictors. If X_unval and X are NULL, all predictors are assumed to be error-free.

Z

(Optional) Column name(s) with additional error-free covariates.

Bspline

Vector of column names containing the B-spline basis functions.

data

A dataframe with one row per subject containing columns: Y_unval, Y, X_unval, X, Z, and Bspline.

theta_pred

Vector of columns in data that pertain to the predictors in the analysis model.

gamma_pred

Vector of columns in data that pertain to the predictors in the outcome error model.

initial_lr_params

Initial values for parametric model parameters. Choices include (1) "Zeros" (non-informative starting values) or (2) "Complete-data" (estimated based on validated subjects only)

hn_scale

Size of the perturbation used in estimating the standard errors via profile likelihood. If none is supplied, default is hn_scale = 1.

noSE

Indicator for whether standard errors are desired. Defaults to noSE = FALSE.

TOL

Tolerance between iterations in the EM algorithm used to define convergence.

MAX_ITER

Maximum number of iterations in the EM algorithm. The default number is 1000. This argument is optional.

verbose

If TRUE, then show details of the analysis. The default value is FALSE.

Value

coeff

dataframe with final coefficient and standard error estimates (where applicable) for the analysis model.

outcome_err_coeff

dataframe with final coefficient estimates for the outcome error model.

Bspline_coeff

dataframe with final B-spline coefficient estimates (where applicable).

vcov

variance-covarianced matrix for coeff (where applicable).

converged

indicator of EM algorithm convergence for parameter estimates.

se_converged

indicator of standard error estimate convergence.

converged_msg

(where applicable) description of non-convergence.

iterations

number of iterations completed by EM algorithm to find parameter estimates.

od_loglik_at_conv

value of the observed-data log-likelihood at convergence.

References

Lotspeich, S. C., Shepherd, B. E., Amorim, G. G. C., Shaw, P. A., & Tao, R. (2021). Efficient odds ratio estimation under two-phase sampling using error-prone data from a multi-national HIV research cohort. Biometrics, biom.13512. https://doi.org/10.1111/biom.13512

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
 set.seed(918)
 
 # Set sample sizes ----------------------------------------
 N <- 1000 # Phase-I = N
 n <- 250 # Phase-II/audit size = n
 
 # Generate true values Y, Xb, Xa --------------------------
 Xa <- rbinom(n = N, size = 1, prob = 0.25)
 Xb <- rbinom(n = N, size = 1, prob = 0.5)
 Y <- rbinom(n = N, size = 1,prob = (1 + exp(-(- 0.65 - 0.2 * Xb - 0.1 * Xa))) ^ (- 1))
 
 # Generate error-prone Xb* from error model P(Xb*|Xb,Xa) --
 sensX <- specX <- 0.75
 delta0 <- - log(specX / (1 - specX))
 delta1 <- - delta0 - log((1 - sensX) / sensX)
 Xbstar <- rbinom(n = N, size = 1,
                  prob = (1 + exp(- (delta0 + delta1 * Xb + 0.5 * Xa))) ^ (- 1))
 
 # Generate error-prone Y* from error model P(Y*|Xb*,Y,Xb,Xa)
 sensY <- 0.95
 specY <- 0.90
 theta0 <- - log(specY / (1 - specY))
 theta1 <- - theta0 - log((1 - sensY) / sensY)
 Ystar <- rbinom(n = N, size = 1,
   prob = (1 + exp(- (theta0 - 0.2 * Xbstar + theta1 * Y - 0.2 * Xb - 0.1 * Xa))) ^ (- 1))
 
 
 ## V is a TRUE/FALSE vector where TRUE = validated --------
 V <- seq(1, N) %in% sample(x = seq(1, N), size = n, replace = FALSE)
 
 
 # Build dataset --------------------------------------------
 sdat <- cbind(Y, Xb, Ystar, Xbstar, Xa)
 # Make Phase-II variables Y, Xb NA for unaudited subjects ---
 sdat[!V, c("Y", "Xb")] <- NA
 
 # Fit models -----------------------------------------------
 ## Naive model -----------------------------------------
 naive <- glm(Ystar ~ Xbstar + Xa, family = "binomial", data = data.frame(sdat))
 
 
 ## Generalized raking ----------------------------------
 ### Influence function for logistic regression
 ### Taken from: https://github.com/T0ngChen/multiwave/blob/master/sim.r
 inf.fun <- function(fit) {
   dm <- model.matrix(fit)
   Ihat <- (t(dm) %*% (dm * fit$fitted.values * (1 - fit$fitted.values))) / nrow(dm)
   ## influence function
   infl <- (dm * resid(fit, type = "response")) %*% solve(Ihat)
   infl
 }
 naive_infl <- inf.fun(naive) # error-prone influence functions based on naive model
 colnames(naive_infl) <- paste0("if", 1:3)
 
 # Add naive influence functions to sdat -----------------------------------------------
 sdat <- cbind(id = 1:N, sdat, naive_infl)
 
 ### Construct B-spline basis -------------------------------
 ### Since Xb* and Xa are both binary, reduces to indicators --
 nsieve <- 4
 B <- matrix(0, nrow = N, ncol = nsieve)
 B[which(Xa == 0 & Xbstar == 0), 1] <- 1
 B[which(Xa == 0 & Xbstar == 1), 2] <- 1
 B[which(Xa == 1 & Xbstar == 0), 3] <- 1
 B[which(Xa == 1 & Xbstar == 1), 4] <- 1
 colnames(B) <- paste0("bs", seq(1, nsieve))
 sdat <- cbind(sdat, B)
 smle <- logistic2ph(Y_unval = "Ystar",
   Y = "Y",
   X_unval = "Xbstar",
   X = "Xb",
   Z = "Xa",
   Bspline = colnames(B),
   data = sdat,
   noSE = FALSE,
   MAX_ITER = 1000,
   TOL = 1E-4)

Epic-Doughnut/Combined-Reg documentation built on Dec. 17, 2021, 6:32 p.m.