reinforced_VS: Reinforced risk prediction with budget constraint, variable...
In Yinghao-Pan/reinforcedPred: Reinforced Risk Prediction with Budget Constraint


reinforced_VS implements a cross-validation approach to find the optimal τ that minimizes the misclassification error under a given budget constraint. Use this function when the baseline covariates are high-dimensional.
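The idea of minimizing an error under a budget constraint can be illustrated with a toy selection step. This is only a sketch, not the package's internal procedure: the data frame `cv_summary`, its column names, and all numbers in it are made up for illustration.

```r
# Hypothetical candidate values of tau with made-up cross-validated summaries
# (illustrative numbers only, not output of reinforced_VS)
cv_summary <- data.frame(
  tau      = c(0.1, 0.2, 0.3, 0.4),
  cv_error = c(0.30, 0.24, 0.21, 0.19),  # misclassification error
  avg_cost = c(28, 35, 44, 52)           # average follow-up, in time grids
)
budget <- 45

# keep only the candidates whose average cost respects the budget
feasible <- cv_summary[cv_summary$avg_cost <= budget, ]

# among the feasible candidates, pick the one with the smallest error
best_tau <- feasible$tau[which.min(feasible$cv_error)]
best_tau
```

In this toy table the candidate with the lowest error overall (τ = 0.4) exceeds the budget, so the selection falls back to the best feasible candidate.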

reinforced_VS(Y, X, Z, budget, folds, startT, pve = 0.99, nbasis = 10, weight)

`Y`	The outcome variable, vector of length n, taking values in {1, 0, NA}, where 1 = disease, 0 = not, NA = missing.
`X`	Observed longitudinal biomarker, matrix of n by nTotal, where nTotal denotes the total number of time grids. Missing values are denoted by NA.
`Z`	Other baseline covariates.
`budget`	The budget constraint, expressed in time grids. For instance, if the time grids are {0,1/60,2/60,...,1}, then budget = 30 means that the average follow-up is no longer than 30 time grids. This is equivalent to requiring that, on average, a definite prediction is made before time t = 0.5.
`folds`	Folds in cross-validation, usually 5 or 10.
`startT`	Time of the first prediction, denoted by t_1 in the manuscript. For instance, if the time grids are {0,1/60,2/60,...,1}, then startT = 25 means that the first prediction is made at t = 24/60.
`pve`	Proportion of variance explained in FPCA, default value is 0.99.
`nbasis`	Number of B-spline basis functions used to estimate the mean function and to smooth the covariance. The default value is 10, as in the refund package; a smaller number is sometimes needed when there are only a few time grids.
`weight`	A user-supplied weight for each individual. If no weight is supplied, a weight is calculated by an inverse probability weighting method. See Section 3.4 of the manuscript for details.
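The grid arithmetic in the `budget` and `startT` descriptions can be checked with a few lines of R. This is a minimal sketch that assumes the example grid {0,1/60,2/60,...,1}; the variable names `time_grid`, `first_pred_time`, and `budget_time` are illustrative and not part of the package.

```r
# Assumed example grid: 61 equally spaced points on [0, 1]
time_grid <- seq(0, 1, length.out = 61)
nTotal <- length(time_grid)

startT <- 25
budget <- 30

# startT is a 1-based grid index, so the first prediction time is (startT - 1)/60
first_pred_time <- time_grid[startT]

# budget counts grid steps, so it corresponds to budget/60 on the time scale
budget_time <- budget / (nTotal - 1)

c(first_pred_time = first_pred_time, budget_time = budget_time)
```

Under these assumptions the first prediction is made at t = 24/60 = 0.4 and the budget corresponds to t = 0.5, matching the argument descriptions.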

`final.result`	The FPCA fit and the elastic net logistic regression fit at each time grid from startT to the end.
`final.tau`	The optimal τ that minimizes the misclassification error under the budget constraint.

library(reinforcedPred)
set.seed(1)

# take the example training data (high dimensional Z) from the reinforcedPred package
# see documentation for details about the data set train_data_mulZ
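# columns 2 to 62 hold the longitudinal biomarker (61 time grids);
# columns 63 onward hold the baseline covariates Z
Y <- as.numeric(train_data_mulZ$Y)
tildeX.missing <- as.matrix(train_data_mulZ[, 2:62])
Z <- as.matrix(train_data_mulZ[, 63:ncol(train_data_mulZ)])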

# analysis starts
budget <- 45
folds <- 5
startT <- 25

result <- reinforced_VS(Y, tildeX.missing, Z, budget, folds, startT, pve = 0.99, nbasis = 10)

# obtained elastic net logistic regression fit and FPCA decompositions
list_cvfit <- (result$final.result)$list_cvfit
list_fpcaFit <- (result$final.result)$list_fpcaFit

# optimal tau that minimizes the misclassification error under the budget constraint
final.tau <- result$final.tau
final.tau

# use the fitted model to predict the label Y for subjects in the test data
# see documentation for details about the data set test_data_mulZ
testY <- as.numeric(test_data_mulZ$testY)
test.tildeX.missing <- as.matrix(test_data_mulZ[,2:62])
test.Z <- as.matrix(test_data_mulZ[,63:dim(test_data_mulZ)[2]])

pred <- modelPredict_VS(list_fpcaFit, list_cvfit, test.tildeX.missing, test.Z, startT, final.tau)

# predicted outcome Y for each subject in the test data
predY.test <- pred$final.label
# misclassification error
mis.error <- sum(predY.test != testY, na.rm = TRUE) / sum(!is.na(testY))
mis.error

# the average cost of applying the prediction procedure to the test data
pred$avg.cost