reinforced: Reinforced risk prediction with budget constraint

Description

reinforced uses cross-validation to find the optimal τ that minimizes the misclassification error under a budget constraint.
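
The selection rule described above can be illustrated with a toy sketch. It is not the package's implementation: the data frame cv_summary, its column names, and all numbers in it are hypothetical and invented for illustration. It only shows what "minimize the error subject to a budget" means once each candidate τ has a cross-validated error and an average cost.

```r
# Hypothetical cross-validated summaries for four candidate values of tau.
# All numbers are invented for illustration only.
cv_summary <- data.frame(
  tau      = c(0.1, 0.2, 0.3, 0.4),
  cv_error = c(0.30, 0.22, 0.18, 0.15),
  avg_cost = c(28, 35, 44, 52)
)
budget <- 45

# Keep only candidates whose average cost respects the budget,
# then pick the one with the smallest cross-validated error.
feasible <- cv_summary[cv_summary$avg_cost <= budget, ]
best_tau <- feasible$tau[which.min(feasible$cv_error)]
best_tau
```

In this toy table the candidate with the lowest error overall exceeds the budget, so it is excluded before the minimum is taken.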

Usage

reinforced(Y, X, Z, budget, folds, startT, link, pve = 0.99,
  nbasis = 10, weight)

Arguments

Y

The outcome variable, vector of length n, taking values in {1, 0, NA}, where 1 = disease, 0 = not, NA = missing.

X

Observed longitudinal biomarker, an n by nTotal matrix, where nTotal is the total number of time grids. Missing values are denoted by NA.

Z

Other baseline covariates.

budget

The budget constraint. For instance, if the time grids are {0, 1/60, 2/60, ..., 1}, then budget = 30 means that the average follow-up is no longer than 30 time grids. This is equivalent to requiring that, on average, a definite prediction is made before time t = 0.5.

folds

Folds in cross-validation, usually 5 or 10.

startT

Time of the first prediction, denoted by t_1 in the manuscript. For instance, if the time grids are {0,1/60,2/60,...,1}, then startT = 25 means that the first prediction is made at t = 24/60.

link

The link function used in functional generalized linear models, e.g. "logit", "probit".

pve

Proportion of variance explained in FPCA, default value is 0.99.

nbasis

Number of B-spline basis functions used to estimate the mean function and to smooth the covariance. The default value is 10, as in the refund package; a smaller number is sometimes needed when there are only a few time grids.

weight

A user-supplied weight for each individual. If no weight is supplied, an inverse probability weighting method is used to calculate the weights. See Section 3.4 of the manuscript for details.
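
The grid conventions and input shapes described in the arguments above can be checked with a short base R sketch. The simulated values are arbitrary, and the variable names time_grid, t_first and t_budget are hypothetical. Reading budget as a 1-based count of grid points, like startT, is an assumption; it is consistent with the statement that budget = 30 corresponds to a prediction before t = 0.5.

```r
# Time grid from the argument descriptions: {0, 1/60, 2/60, ..., 1}
time_grid <- (0:60) / 60
nTotal <- length(time_grid)   # 61 grid points

# startT is a 1-based grid index: startT = 25 corresponds to t = 24/60
startT <- 25
t_first <- time_grid[startT]

# Assumption: budget counts grid points in the same 1-based way, so
# budget = 30 corresponds to the 30th grid point, t = 29/60, just below 0.5
budget <- 30
t_budget <- time_grid[budget]

# Simulated inputs with the documented shapes (values are arbitrary)
set.seed(1)
n <- 8
Y <- c(1, 0, NA, 1, 0, 0, 1, NA)        # values in {1, 0, NA}
X <- matrix(rnorm(n * nTotal), nrow = n, ncol = nTotal)
X[sample(length(X), 20)] <- NA          # missing biomarker readings
Z <- rnorm(n)                           # one baseline covariate

# Basic checks on shapes and coding before calling reinforced()
stopifnot(length(Y) == nrow(X), ncol(X) == nTotal)
stopifnot(all(Y %in% c(0, 1, NA)))
stopifnot(startT >= 1, startT <= nTotal)
```

Checks of this kind are cheap and catch the most common mistakes, such as passing X with subjects in columns or coding the outcome with values other than 0, 1 and NA.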

Value

final.result

The FPCA fit and the parameter estimates at each time grid from startT to the end.

final.tau

The optimal τ that minimizes the misclassification error under the budget constraint.

Examples

library(reinforcedPred)
set.seed(1)

# take the example training data (univariate Z) from the reinforcedPred package
# see documentation for details about the data set train_data_uniZ
Y <- as.numeric(train_data_uniZ$Y)
tildeX.missing <- as.matrix(train_data_uniZ[,2:62])
Z <- as.numeric(train_data_uniZ$Z)

# analysis starts
budget <- 45
folds <- 5
startT <- 25
link <- "probit"

result <- reinforced(Y, tildeX.missing, Z, budget, folds, startT, link, pve = 0.99, nbasis = 10)

# obtained parameter estimates and FPCA decompositions
list_paraEst <- (result$final.result)$list_paraEst
list_fpcaFit <- (result$final.result)$list_fpcaFit

# optimal tau that minimizes the misclassification error under the budget constraint
final.tau <- result$final.tau
final.tau

# use the fitted model to predict the label Y for subjects in the test data
# see documentation for details about the data set test_data_uniZ
testY <- as.numeric(test_data_uniZ$testY)
test.tildeX.missing <- as.matrix(test_data_uniZ[,2:62])
test.Z <- as.numeric(test_data_uniZ$test.Z)

pred <- modelPredict(list_fpcaFit, list_paraEst, test.tildeX.missing, test.Z, startT, final.tau)

# predicted outcome Y for each subject in the test data
predY.test <- pred$final.label
# misclassification error
mis.error <- sum(predY.test != testY, na.rm = TRUE) / sum(!is.na(testY))
mis.error

# the average cost when we applied the prediction procedure to the test data
pred$avg.cost
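
The misclassification error formula used above can be traced on a small toy vector. The labels below are invented for illustration. Whether pred$final.label can contain NA is not stated in this documentation, so the case of a missing prediction is shown only as a hypothetical, and the complete-case variant is an alternative convention, not the package's.

```r
# Toy labels: NA marks a missing outcome or, hypothetically, a missing prediction
testY      <- c(1, 0, 1, NA, 0, 1)
predY.test <- c(1, 1, 1, 0, 0, NA)

# Same formula as in the example: mismatches over non-missing true labels.
# Comparisons involving NA are dropped from the numerator by na.rm = TRUE,
# but a subject with an observed testY still counts in the denominator.
mis.error <- sum(predY.test != testY, na.rm = TRUE) / sum(!is.na(testY))

# Alternative (an assumption, not the package's convention):
# restrict to subjects where both labels are observed
both_obs <- !is.na(testY) & !is.na(predY.test)
mis.error.complete <- mean(predY.test[both_obs] != testY[both_obs])

mis.error
mis.error.complete
```

The two versions agree whenever every subject with an observed outcome also has a predicted label.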

reinforcedPred documentation built on May 2, 2019, 4:17 a.m.