reinforced_VS
implements a cross-validation approach to find the optimal τ that minimizes the
misclassification error under a given budget constraint. Use this function when the baseline
covariates are high-dimensional.
reinforced_VS(Y, X, Z, budget, folds, startT, pve = 0.99, nbasis = 10,
              weight)
Y | The outcome variable, a vector of length n taking values in {1, 0, NA}, where 1 = disease, 0 = no disease, NA = missing.
X | Observed longitudinal biomarker, an n by nTotal matrix, where nTotal denotes the total number of time grids. Missing values are denoted by NA.
Z | Other baseline covariates.
budget | The budget constraint. For instance, if the time grids are {0,1/60,2/60,...,1}, budget = 30 means that the average follow-up is no longer than 30 time grids. This is equivalent to saying that, on average, we want to make a definite prediction before time t = 0.5.
folds | Number of folds in cross-validation, usually 5 or 10.
startT | Time of the first prediction, denoted by t_1 in the manuscript. For instance, if the time grids are {0,1/60,2/60,...,1}, then startT = 25 means that the first prediction is made at t = 24/60.
pve | Proportion of variance explained in FPCA. The default value is 0.99.
nbasis | Number of B-spline basis functions used for estimating the mean function and smoothing the covariance. The default value is 10, as in the refund package; a smaller number is sometimes needed when there are few time grids.
weight | A user-supplied weight for each individual. If the user does not supply a weight, an inverse probability weighting method is used to calculate one. See Section 3.4 of the manuscript for details.
final.result | The FPCA fit and the elastic net logistic regression fit at each time grid from startT to the end.
final.tau | The optimal τ that minimizes the misclassification error under the budget constraint.
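The grid-index convention used by startT and budget can be checked with a few lines of base R. This is only an illustrative sketch: the variable names time_grid, first_pred_time and budget_time are made up for this example and are not part of the package, and the grid {0,1/60,2/60,...,1} is the one quoted in the argument descriptions.

```r
# Illustrative only: the time grid {0, 1/60, 2/60, ..., 1} from the argument descriptions
time_grid <- (0:60) / 60
nTotal <- length(time_grid)            # 61 grid points, indexed from 1 as usual in R

startT <- 25
budget <- 30

# startT is a grid index, so the first prediction time is the 25th grid point
first_pred_time <- time_grid[startT]   # 24/60

# a budget of 30 grid steps, expressed on the [0, 1] time scale
budget_time <- budget / (nTotal - 1)   # 30/60

first_pred_time
budget_time
```

Because R indexes from 1, index k corresponds to time (k - 1)/60 on this grid, which is why startT = 25 corresponds to t = 24/60.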
library(reinforcedPred)
set.seed(1)
# take the example training data (high dimensional Z) from the reinforcedPred package
# see documentation for details about the data set train_data_mulZ
Y <- as.numeric(train_data_mulZ$Y)
tildeX.missing <- as.matrix(train_data_mulZ[,2:62])
Z <- as.matrix(train_data_mulZ[,63:dim(train_data_mulZ)[2]])
# analysis starts
budget <- 45
folds <- 5
startT <- 25
result <- reinforced_VS(Y, tildeX.missing, Z, budget, folds, startT, pve = 0.99, nbasis = 10)
# obtained elastic net logistic regression fit and FPCA decompositions
list_cvfit <- (result$final.result)$list_cvfit
list_fpcaFit <- (result$final.result)$list_fpcaFit
# optimal tau that minimizes the misclassification error under the budget constraint
final.tau <- result$final.tau
final.tau
# use the fitted model to predict the label Y for subjects in the test data
# see documentation for details about the data set test_data_mulZ
testY <- as.numeric(test_data_mulZ$testY)
test.tildeX.missing <- as.matrix(test_data_mulZ[,2:62])
test.Z <- as.matrix(test_data_mulZ[,63:dim(test_data_mulZ)[2]])
pred <- modelPredict_VS(list_fpcaFit, list_cvfit, test.tildeX.missing, test.Z, startT, final.tau)
# predicted outcome Y for each subject in the test data
predY.test <- pred$final.label
# misclassification error
mis.error <- sum(predY.test != testY, na.rm = TRUE) / sum(!is.na(testY))
mis.error
# the average cost when we applied the prediction procedure to the test data
pred$avg.cost
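The misclassification error in the example above excludes subjects whose true label is missing. A small toy illustration of how that formula treats NA values may help; the vectors toy.testY and toy.predY below are made-up data, not output from the package.

```r
# Toy data (hypothetical): the third subject has a missing true label
toy.testY <- c(1, 0, NA, 1, 0)
toy.predY <- c(1, 1, 0, 0, 0)

# Comparisons involving NA yield NA and are dropped by na.rm = TRUE
n.wrong <- sum(toy.predY != toy.testY, na.rm = TRUE)

# Only subjects with an observed label enter the denominator
n.observed <- sum(!is.na(toy.testY))

toy.mis.error <- n.wrong / n.observed
toy.mis.error
```

Note that this formula assumes the predicted labels themselves contain no NA for subjects with an observed label; otherwise those subjects would be counted in the denominator but never as errors.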