reinforced_VS: Reinforced risk prediction with budget constraint, variable...


Description

reinforced_VS implements a cross-validation approach to find the optimal τ that minimizes the misclassification error under a given budget constraint. Use this function when the baseline covariates are high-dimensional.

Usage

reinforced_VS(Y, X, Z, budget, folds, startT, pve = 0.99, nbasis = 10,
  weight)

Arguments

Y

The outcome variable, vector of length n, taking values in {1, 0, NA}, where 1 = disease, 0 = not, NA = missing.

X

The observed longitudinal biomarker, an n by nTotal matrix, where nTotal denotes the total number of time grid points. Missing values are denoted by NA.

Z

Other baseline covariates.

budget

The budget constraint. For instance, if the time grid is {0, 1/60, 2/60, ..., 1}, then budget = 30 means that the average follow-up is no longer than 30 time grid points. This is equivalent to saying that, on average, we want to make a definite prediction before time t = 0.5.

folds

Folds in cross-validation, usually 5 or 10.

startT

Time of the first prediction, denoted by t_1 in the manuscript, given as a grid index. For instance, if the time grid is {0, 1/60, 2/60, ..., 1}, then startT = 25 means that the first prediction is made at t = 24/60.

pve

Proportion of variance explained in FPCA, default value is 0.99.

nbasis

Number of B-spline basis functions used to estimate the mean function and to smooth the covariance. The default value is 10, as in the refund package; a smaller number is sometimes needed when there are few time grid points.

weight

A user-supplied weight for each individual. If no weight is supplied, an inverse probability weighting method is used to calculate one. See Section 3.4 of the manuscript for details.
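
The grid-index convention used by budget and startT can be checked with a few lines of base R. This is a minimal sketch that assumes the 61-point grid {0, 1/60, ..., 1} from the argument descriptions; the names time_grid, budget_time, start_time and pred_index are illustrative and are not part of the reinforcedPred package.

```r
# Illustrative only: the grid {0, 1/60, ..., 1} from the argument descriptions
time_grid <- (0:60) / 60
nTotal <- length(time_grid)       # 61 grid points

budget <- 30
startT <- 25

# Grid indices are 1-based, so index k corresponds to time (k - 1) / 60
budget_time <- time_grid[budget]  # 29/60, just before t = 0.5
start_time  <- time_grid[startT]  # 24/60 = 0.4

# Grid indices from the first prediction through the last grid point
pred_index <- startT:nTotal
length(pred_index)                # 37
```

Because the indices are 1-based, a budget or startT of k refers to time (k - 1)/60 on this grid, not k/60.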

Value

final.result

The FPCA fit and the elastic net logistic regression fit at each time grid from startT to the end.

final.tau

The optimal τ that minimizes the misclassification error under the budget constraint.

Examples

library(reinforcedPred)
set.seed(1)

# take the example training data (high dimensional Z) from the reinforcedPred package
# see documentation for details about the data set train_data_mulZ
Y <- as.numeric(train_data_mulZ$Y)
tildeX.missing <- as.matrix(train_data_mulZ[,2:62])
Z <- as.matrix(train_data_mulZ[, 63:ncol(train_data_mulZ)])

# analysis starts
budget <- 45
folds <- 5
startT <- 25

result <- reinforced_VS(Y, tildeX.missing, Z, budget, folds, startT, pve = 0.99, nbasis = 10)

# obtained elastic net logistic regression fit and FPCA decompositions
list_cvfit <- (result$final.result)$list_cvfit
list_fpcaFit <- (result$final.result)$list_fpcaFit

# optimal tau that minimizes the misclassification error under the budget constraint
final.tau <- result$final.tau
final.tau

# use the fitted model to predict the label Y for subjects in the test data
# see documentation for details about the data set test_data_mulZ
testY <- as.numeric(test_data_mulZ$testY)
test.tildeX.missing <- as.matrix(test_data_mulZ[,2:62])
test.Z <- as.matrix(test_data_mulZ[, 63:ncol(test_data_mulZ)])

pred <- modelPredict_VS(list_fpcaFit, list_cvfit, test.tildeX.missing, test.Z, startT, final.tau)

# predicted outcome Y for each subject in the test data
predY.test <- pred$final.label
# misclassification error
mis.error <- sum(predY.test != testY, na.rm = TRUE) / sum(!is.na(testY))
mis.error

# the average cost when we applied the prediction procedure to the test data
pred$avg.cost
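
The misclassification error computed in the example counts only test subjects whose outcome is observed. The following toy sketch uses made-up vectors (toy_testY and toy_pred are hypothetical and not taken from the package data) to show how the NA handling works in that formula.

```r
# Made-up labels for four hypothetical test subjects; the third outcome is missing
toy_testY <- c(1, 0, NA, 1)
toy_pred  <- c(1, 1, 0, 1)

# Comparisons involving NA yield NA and are dropped by na.rm = TRUE
n_wrong    <- sum(toy_pred != toy_testY, na.rm = TRUE)  # 1 wrong prediction
n_observed <- sum(!is.na(toy_testY))                    # 3 observed outcomes
toy_error  <- n_wrong / n_observed                      # 1/3
toy_error
```

Dividing by the number of observed outcomes, rather than by the length of the vector, keeps subjects with a missing label from being counted as correct predictions.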

reinforcedPred documentation built on May 2, 2019, 4:17 a.m.