sbh: Cross-Validated Survival Bump Hunting


Description

Main end-user function for fitting a Survival Bump Hunting (SBH) model (or a Group Survival Bump Hunting (GSBH) model). It returns an object of class sbh, as generated by our Patient Recursive Survival Peeling (PRSP) algorithm (or Patient Recursive Group Survival Peeling (PRGSP) algorithm). The object contains cross-validated estimates of the target region of the input space, together with the endpoint statistics of interest.

Usage

  sbh(X, 
      y, 
      delta, 
      B = 30, 
      K = 5, 
      A = 1000, 
      vs = TRUE, 
      vstype = "ppl",
      vsarg = "alpha=1,
               nalpha=1,
               nlambda=100",
      vscons = 0.5,
      cv = TRUE, 
      cvtype = "combined", 
      cvarg = "alpha=0.01,
               beta=0.10,
               peelcriterion=\"lrt\",
               cvcriterion=\"cer\"",
      groups = NULL,
      pv = FALSE,
      decimals = 2, 
      onese = FALSE,
      probval = NULL, 
      timeval = NULL, 
      parallel.vs = FALSE,
      parallel.rep = FALSE,
      parallel.pv = FALSE,
      conf = NULL,
      verbose = TRUE, 
      seed = NULL)

Arguments

X

(n x p) data.frame or numeric matrix of n observations and p input covariates. If a data.frame is provided, it will be coerced to a numeric matrix. Discrete nominal covariates will be treated as ordinal variables. NA missing values are not allowed.

y

n-numeric vector of observed times to event. NA missing values are not allowed.

delta

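n-numeric vector of the observed status (censoring) indicator variable, coded 1 for an observed event and 0 for a censored observation. NA missing values are not allowed.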
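The input requirements for X, y and delta can be checked before calling sbh. The sketch below uses a hypothetical helper, prepare_inputs(), which is not part of PRIMsrc. It assumes that base R's data.matrix() is an acceptable stand-in for the package's own coercion: data.matrix() maps factor levels to integer codes, which is consistent with nominal covariates being treated as ordinal. It also assumes delta is coded 1 = event, 0 = censored.

```r
# Hypothetical helper (not part of PRIMsrc): validate and coerce sbh() inputs
prepare_inputs <- function(X, y, delta) {
  # data.frame -> numeric matrix; factor columns become their integer codes
  X <- data.matrix(X)
  if (anyNA(X) || anyNA(y) || anyNA(delta)) {
    stop("NA missing values are not allowed")
  }
  if (length(y) != nrow(X) || length(delta) != nrow(X)) {
    stop("y and delta must have one value per row of X")
  }
  if (!all(delta %in% c(0, 1))) {
    stop("delta must be coded 1 (event) or 0 (censored)")
  }
  list(X = X, y = as.numeric(y), delta = as.numeric(delta))
}

df <- data.frame(age = c(50, 61, 47),
                 stage = factor(c("I", "III", "II")))
inp <- prepare_inputs(df, y = c(12, 5, 30), delta = c(1, 1, 0))
print(inp$X)  # column 'stage' holds the level codes 1, 3, 2
```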

B

Positive integer of the number of replications of the cross-validation procedure. Defaults to 30.

K

Positive integer of the number of cross-validation folds (partitions) into which the total number of observations (n) should be randomly split. K must be greater than 2 for a regular K-fold cross-validation procedure to work, and should be greater than 3 for it to make sense. K in {5,...,10} is recommended. Defaults to K=5. Setting K also specifies the type of cross-validation to be done:

  • K = 1 carries out no cross-validation. This is also the value that is set when cv = FALSE (see below).

  • K in {2,...,n-1} carries out K-fold cross-validation.

  • K = n carries out leave-one-out cross-validation.

A

Positive integer of the number of permutations for the computation of log-rank permutation p-values. Defaults to 1000. Ignored if pv=FALSE or cv=FALSE.

vs

logical scalar. Flag for optional variable (covariate) screening (pre-selection). Defaults to TRUE.

vstype

character string in {"prsp", "pcqr", "ppl", "spca"} naming one of the four possible variable screening (pre-selection) procedures. See Details below. Defaults to "ppl". Ignored if vs is FALSE.

vsarg

Character string of parameters of the cross-validated variable screening (pre-selection) procedure. Defaults to the parameter values of the default variable screening (pre-selection) procedure "ppl": vsarg="alpha=1,nalpha=1,nlambda=100". Note that vsarg comes as a character string between double quotes, with comma-separated values, and without white spaces. All the following parameters are ignored if vs is FALSE.
PRSP:

  • alpha = numeric data quantile in (0,1) to peel off at each step of the peeling sequence of the PRSP algorithm. Suggests 0.01.

  • beta = numeric scalar of minimum box support at the end of the peeling sequence. Suggests 0.10.

  • msize = positive integer or NULL to control the model size, i.e. the number of screened variables used for fitting the Survival Bump Hunting model. Use a single non-NULL value as the maximum model size (cardinality of the subset of top-screened variables) within the allowable range [1,floor(p)]. Alternatively, use msize=NULL to let the optimal model size be determined by cross-validation. See Details below. Suggests NULL.

  • peelcriterion in {"lhr", "lrt", "chs", "grp"} stands for the peeling criterion Log-Hazard Ratio (LHR), Log-Rank Test (LRT), Cumulative Hazard Summary (CHS), or Group (GRP), respectively, that is used in the PRSP algorithm (LHR, LRT, CHS) or the PRGSP algorithm (GRP). Suggests "lrt".

  • cvcriterion in {"lhr", "lrt", "cer"} stands for the cross-validation criterion Log-Hazard Ratio (LHR), Log-Rank Test (LRT), or Concordance Error Rate (CER), respectively. It is used for optimizing both the model size (cardinality of the subset of top-screened variables) and the number of peeling steps (optimal peeling sequence length) in the PRSP variable screening procedure. Suggests "cer".

PCQR:

  • tau = numeric quantile in [0, 0.5] used in the censored quantile regression model. It is the tuning parameter of the censored quantile loss, and it represents the conditional quantile of the censored survival response to be estimated. tau = 0.5 corresponds to the absolute loss. Suggests 0.5.

  • alpha = numeric elastic-net mixing parameter in [0, 1] that controls the relative contribution of the lasso and ridge penalties. The penalty is defined as (1-alpha)/2||beta||_2^2 + alpha||beta||_1, so that alpha = 1 is the lasso penalty and alpha = 0 is the ridge penalty. If alpha is set to NULL, a vector of nalpha values is used. Otherwise, the given alpha value is used and nalpha is reset to 1. Suggests alpha=1 (lasso).

  • nalpha = positive integer of the number of alpha values to consider in the grid search. Suggests 1 (see above: lasso).

  • nlambda = positive integer of the number of elastic-net penalization lambda values to consider in the grid search. Suggests 100.

PPL:

  • alpha = numeric elastic-net mixing parameter in [0, 1] that controls the relative contribution of the lasso and ridge penalties (see R package glmnet). The penalty is defined as (1-alpha)/2||beta||_2^2 + alpha||beta||_1, so that alpha = 1 is the lasso penalty and alpha = 0 is the ridge penalty. If alpha is set to NULL, a vector of nalpha values is used. Otherwise, the given alpha value is used and nalpha is reset to 1. Suggests alpha=1 (lasso).

  • nalpha = positive integer of the number of alpha values to consider in the grid search. Suggests 1 (see above: lasso).

  • nlambda = positive integer of the number of elastic-net penalization lambda values to consider in the grid search. Suggests 100.

SPCA:

  • n.thres = number of thresholds to consider in the grid search. It cannot be greater than n (sample size). Suggests 20.

  • n.pcs = number of cross-validated principal components to use, in {1,2,3}. It cannot be greater than n (sample size) or p (dimensionality), and will be reset to n.pcs = p - 1 otherwise. Suggests 3.

  • n.var = minimum number of variables to include in determining the range for the threshold. It cannot be more than p (dimensionality), and will be reset to n.var = p - 1 otherwise. Suggests 5.
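The elastic-net penalty quoted for PCQR and PPL can be evaluated directly. The sketch below uses a hypothetical function, enet_penalty(), which computes the penalty exactly as written above. In the fitted models, this penalty is further scaled by a tuning parameter lambda, which is searched over a grid of nlambda values. That scaling is left out here.

```r
# Elastic-net penalty as written above:
#   (1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1
enet_penalty <- function(beta, alpha) {
  stopifnot(alpha >= 0, alpha <= 1)
  (1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta))
}

b <- c(1, -2)
enet_penalty(b, alpha = 1)    # lasso: |1| + |-2| = 3
enet_penalty(b, alpha = 0)    # ridge: (1 + 4) / 2 = 2.5
enet_penalty(b, alpha = 0.5)  # 0.25 * 5 + 0.5 * 3 = 2.75
```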

vscons

numeric scalar in [1/K, 1], specifying the conservativeness of the variable screening (pre-selection) procedure, where 1/K is the least conservative value and 1 the most conservative. Suggests 0.5.

cv

logical scalar. Flag for optional cross-validation (CV) of variable screening (pre-selection) parameters and Survival Bump Hunting fitting by PRSP algorithm. See below for details. Defaults to TRUE.

cvtype

character vector in {"combined", "averaged"} specifying the cross-validation technique. Defaults to "combined". Ignored if cv is FALSE.

cvarg

character string describing the parameters used in the PRSP algorithm for fitting the Survival Bump Hunting model. Defaults to:
cvarg="alpha=0.01,beta=0.10,peelcriterion=\"lrt\",cvcriterion=\"cer\"". Note that cvarg comes as a character string between double quotes, with comma-separated values, and without white spaces.

  • alpha = numeric data quantile in (0,1) to peel off at each step of the peeling sequence of the PRSP algorithm. Defaults to 0.01.

  • beta = numeric scalar of minimum box support at the end of the peeling sequence. Defaults to 0.10.

  • peelcriterion in {"lhr", "lrt", "chs", "grp"} stands for the peeling criterion Log-Hazard Ratio (LHR), Log-Rank Test (LRT), Cumulative Hazard Summary (CHS), or Group (GRP), respectively, that is used in the PRSP or PRGSP algorithm. Defaults to "lrt".

  • cvcriterion in {"lhr", "lrt", "cer"} stands for the cross-validation criterion Log-Hazard Ratio (LHR), Log-Rank Test (LRT), or Concordance Error Rate (CER), respectively, that is used for tuning/optimizing the optimal number of peeling steps (i.e. optimal peeling sequence length) in the PRSP algorithm. Defaults to "cer". Ignored if cv is FALSE.

If peelcriterion is in {"lhr", "lrt", "chs"}, groups is automatically set to NULL.
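Both vsarg and cvarg are single strings of comma-separated name=value pairs, and any string values inside them carry escaped double quotes. The sketch below shows one way to turn such a string into a named list. The helper parse_arg_string() is hypothetical, and sbh may parse its arguments differently internally. Because it evaluates the string as R code, it should only be used on trusted input.

```r
# Hypothetical helper: parse "name=value,name=value" into a named list
parse_arg_string <- function(s) {
  # Drop spaces and line breaks (e.g. from multi-line strings as in Usage)
  s <- gsub("[[:space:]]", "", s)
  eval(parse(text = paste0("list(", s, ")")))
}

cvarg <- "alpha=0.01,beta=0.10,peelcriterion=\"lrt\",cvcriterion=\"cer\""
args <- parse_arg_string(cvarg)
args$peelcriterion  # "lrt"
args$beta           # 0.1

# msize=NULL is kept as an explicit NULL entry of the list
str(parse_arg_string("alpha=0.01,beta=0.10,msize=NULL"))
```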

groups

character or numeric vector, or factor, of group membership indicators. Its length is n (the sample size), and it has as many distinct values or levels as there are groups in the data. To be specified if the Patient Recursive Group Survival Peeling (PRGSP) algorithm is to be used, i.e. with option peelcriterion = "grp". Only two groups are currently allowed. Defaults to NULL, i.e. when the regular PRSP algorithm is to be used.

pv

logical scalar. Flag for computation of log-rank p-values. Defaults to FALSE.

decimals

Positive integer of the user-specified number of decimals used when outputting results. Defaults to 2.

onese

logical scalar. Flag for using the 1-standard error rule instead of extremum value of the cross-validation criterion when tuning/optimizing model parameters. Defaults to FALSE.
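The difference between the extremum and the 1-standard error rule can be illustrated on a cross-validated profile. The sketch below uses a hypothetical helper, select_step(), and applies to a criterion that is minimized, such as CER. It assumes that the 1-SE rule picks the smallest index whose mean criterion lies within one standard error of the minimum. That tie-breaking convention is an assumption, not a statement of PRIMsrc's internals. For criteria that are maximized (LRT, LHR), the comparisons would be reversed.

```r
# Hypothetical helper: choose an index from a cross-validated profile of a
# criterion to be minimized (e.g. CER), with or without the 1-SE rule
select_step <- function(cv.mean, cv.se, onese = FALSE) {
  opt <- which.min(cv.mean)          # extremum of the mean profile
  if (!onese) return(opt)
  # smallest index whose mean lies within one SE of the minimum
  threshold <- cv.mean[opt] + cv.se[opt]
  min(which(cv.mean <= threshold))
}

cv.mean <- c(0.50, 0.40, 0.32, 0.30, 0.31)
cv.se   <- rep(0.03, 5)
select_step(cv.mean, cv.se)                # 4
select_step(cv.mean, cv.se, onese = TRUE)  # 3
```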

probval

numeric scalar of the survival probability at which we want to get the endpoint box survival time. Defaults to NULL (i.e. maximal survival probability value).

timeval

numeric scalar of the survival time at which we want to get the endpoint box survival probability. Defaults to NULL (i.e. maximal survival time value).
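The two endpoint quantities controlled by probval and timeval can be read off a Kaplan-Meier curve. The sketch below uses the survival package on a single toy sample. It assumes that "survival time at probability probval" means the first event time at which the estimated survival drops to probval or below. sbh computes such endpoints per box, and its exact definitions (e.g. any interpolation) may differ. A probval of 0.55 is used so that the result does not hinge on floating-point ties at exactly 0.5.

```r
library(survival)

# Toy data: 8 distinct event times, no censoring
time   <- 1:8
status <- rep(1, 8)
fit <- survfit(Surv(time, status) ~ 1)

# probval: first time at which the KM survival is at or below probval
# (min() returns Inf with a warning if the curve never gets that low)
probval <- 0.55
t_at_p <- min(fit$time[fit$surv <= probval])

# timeval: KM survival probability at a given time (within the follow-up)
timeval <- 6
p_at_t <- summary(fit, times = timeval)$surv

c(t_at_p = t_at_p, p_at_t = p_at_t)  # 4 and 0.25
```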

parallel.vs

logical. Is parallelization to be performed for variable screening? Defaults to FALSE, because it is not implemented yet.

parallel.rep

logical. Is parallelization to be performed for replications? Defaults to FALSE.

parallel.pv

logical. Is parallelization to be performed for computation of log-rank p-values? Defaults to FALSE.

conf

list of 5 fields containing the parameter values needed for creating the parallel backend (cluster configuration). See Details below for usage. Optional, defaults to NULL, but all fields are required if used:

  • type : character vector specifying the cluster type ("SOCKET", "MPI").

  • spec : A specification (character vector or integer scalar) appropriate to the type of cluster.

  • homogeneous : logical scalar to be set to FALSE for inhomogeneous clusters.

  • verbose : logical scalar to be set to FALSE for quiet mode.

  • outfile : character string naming an output log file to which the stdout and stderr connection output from the worker nodes is directed. "" indicates no redirection.

verbose

logical scalar. Is the output to be verbose? Optional, defaults to TRUE.

seed

Positive integer scalar of the user seed to reproduce all the results. Defaults to NULL.

Details

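The main function sbh relies on an optional variable screening (pre-selection) procedure. This procedure runs before the actual variable usage (selection), which is done when the Survival Bump Hunting (SBH) or Group Survival Bump Hunting (GSBH) model is fitted with our PRSP or PRGSP algorithm, respectively. At this point, the user can choose between four possible variable screening (pre-selection) procedures:

  • "prsp": Patient Recursive Survival Peeling (PRSP).

  • "pcqr": Penalized Censored Quantile Regression (PCQR).

  • "ppl": Penalized Partial Likelihood (PPL), based on R package glmnet.

  • "spca": Supervised Principal Component Analysis (SPCA).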

NA missing values are not allowed in PRIMsrc, because it depends on R package glmnet, which does not handle missing values. For high-dimensional data (p >> n), PPL or SPCA is recommended for computational efficiency. Variable screening (pre-selection) is done by computing occurrence frequencies of top-ranking variables over the cross-validation folds and replicates. The conservativeness of the procedure is controlled by the argument vscons.

Examples of vscons values for pre-selection are 1/K (least conservative), 0.5 (suggested value), and 1 (most conservative).

Although any value in the interval [1/K,1] is accepted, we recommend the interval [1/K, 1/2] to avoid excessive conservativeness. Final variable usage (selection) is done when the Survival Bump Hunting (SBH) model itself is fitted with our PRSP algorithm on the previously screened variables. At that stage, the variables that have the maximum occurrence frequency at each peeling step over the cross-validation folds and replicates are collected.
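The role of vscons as a frequency threshold can be sketched as follows. The exact ranking and thresholding rule used by PRIMsrc is not spelled out here. The sketch assumes that a covariate is retained when its occurrence frequency among the top-ranking variables, over all fold-by-replicate fits, is at least vscons. The helper screen_by_frequency() and the selection matrix are hypothetical.

```r
# Hypothetical illustration of frequency-based screening (not PRIMsrc code).
# sel: logical matrix, one row per fold x replicate fit, one column per
# covariate; TRUE when the covariate ranked among the top variables in that fit
screen_by_frequency <- function(sel, vscons) {
  freq <- colMeans(sel)   # occurrence frequency of each covariate
  which(freq >= vscons)   # keep covariates at or above the threshold
}

K <- 5
sel <- cbind(X1 = rep(TRUE, K),                        # frequency 1.0
             X2 = c(TRUE, TRUE, TRUE, FALSE, FALSE),   # frequency 0.6
             X3 = c(TRUE, FALSE, FALSE, FALSE, FALSE)) # frequency 0.2
screen_by_frequency(sel, vscons = 1/K)  # X1, X2, X3 (least conservative)
screen_by_frequency(sel, vscons = 0.5)  # X1, X2
screen_by_frequency(sel, vscons = 1)    # X1 (most conservative)
```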

If cross-validation is done (cv = TRUE), the optimal number of peeling steps (optimal peeling sequence length) and the optimal model size (cardinality of the subset of top-screened variables) are determined by cross-validation. If cv = FALSE, no cross-validation is performed at all, and the values of K and vscons are both reset to 1. In that case, traditional log-rank Mantel-Haenszel p-values are computed, using the Chi-Squared distribution with 1 df as the null distribution. These replace the log-rank permutation p-values, which use the permutation distribution as the null distribution.
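The two kinds of log-rank p-values can be contrasted on a fixed box. The sketch below uses survival::survdiff to get the log-rank chi-squared statistic between in-box and out-of-box observations. It then compares the chi-squared (1 df) p-value with a permutation p-value obtained by shuffling the box labels A times. The box membership and data are simulated and hypothetical. In sbh, permutation p-values are computed for the cross-validated estimates, which involves more than relabelling a single fixed box.

```r
library(survival)

# Log-rank chi-squared statistic comparing two groups (in-box vs out-of-box)
logrank_chisq <- function(time, status, group) {
  survdiff(Surv(time, status) ~ group)$chisq
}

set.seed(123)
n <- 80
inbox  <- rep(c(TRUE, FALSE), each = n / 2)      # hypothetical box membership
time   <- rexp(n, rate = ifelse(inbox, 0.5, 1))  # in-box subjects survive longer
status <- rbinom(n, size = 1, prob = 0.8)        # 1 = event, 0 = censored

obs <- logrank_chisq(time, status, inbox)

# cv = FALSE: Chi-Squared null distribution with 1 df
p.chisq <- pchisq(obs, df = 1, lower.tail = FALSE)

# Permutation null distribution with A permutations of the box labels
A <- 200
perm <- replicate(A, logrank_chisq(time, status, sample(inbox)))
p.perm <- mean(perm >= obs)

c(chisq = p.chisq, perm = p.perm)
```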

The argument groups is to be specified if the Patient Recursive Group Survival Peeling (PRGSP) algorithm is used. The PRGSP algorithm is derived from our original Patient Recursive Survival Peeling (PRSP) algorithm (Dazard et al. 2016). It searches for (or finds an extreme of) the outcome difference within existing (fixed) groups of observations. See Rao et al. (2018) for details and an application in Disparity Subtyping.

In the PRSP variable screening procedure (vstype = "prsp"), setting option msize to a single non-NULL value within the allowable range [1,floor(p)] overrides the cross-validation setting within the variable screening procedure. This can be recommended for high-dimensional data (p >> n) to reduce the computational burden. In this situation, we suggest an arbitrary value of msize within [1, floor(p/5)]. Conversely, setting msize=NULL forces cross-validation within the variable screening procedure. A vector of model sizes (cardinalities of subsets of top-screened variables) is then generated automatically within the restricted range [1, floor(p/5)] and used to determine the optimal model size.
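The restricted range of candidate model sizes used when msize=NULL can be written out directly. The helper msize_grid() is hypothetical. The sketch assumes every integer in [1, floor(p/5)] is a candidate, since this page does not say whether PRIMsrc uses the full sequence or a thinned subset of it.

```r
# Hypothetical helper: candidate model sizes 1, ..., floor(p/5) when msize = NULL
msize_grid <- function(p) seq_len(floor(p / 5))

range(msize_grid(1000))  # 1 200 (e.g. p = 1000 covariates)
msize_grid(12)           # 1 2
```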

When fitting the Survival Bump Hunting (SBH) model itself, note that the result contains an initial step #0, which corresponds to the entire set of (training) data. The number of peeling steps lies within the allowable range [1, ceiling(log(1/n) / log(1 - (1/n)))]. It is further restricted when either of the metaparameters alpha or beta takes a value other than the smallest possible fraction of the (training) data, i.e. 1/n^t, where n^t is the training sample size.
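A rough bound on the peeling sequence length follows from a simple assumption: each step peels off a fraction alpha of the remaining box support, and peeling stops once the support would fall below beta. Under that assumption, the support after k steps is about (1 - alpha)^k. Rounding up as in the range above gives at most ceiling(log(beta) / log(1 - alpha)) steps. With alpha = beta = 1/n, this reduces to the upper end of the allowable range quoted above. The helper max_peel_steps() is hypothetical.

```r
# Hypothetical helper: upper bound on the peeling sequence length, assuming the
# box support shrinks by a factor (1 - alpha) at each step down to beta
max_peel_steps <- function(alpha, beta) {
  stopifnot(alpha > 0, alpha < 1, beta > 0, beta <= 1)
  ceiling(log(beta) / log(1 - alpha))
}

nt <- 200                                      # training sample size
max_peel_steps(alpha = 1 / nt, beta = 1 / nt)  # smallest fractions: widest range
max_peel_steps(alpha = 0.01, beta = 0.10)      # 230 (default cvarg values)
max_peel_steps(alpha = 0.10, beta = 0.10)      # 22
```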

When cross-validation is requested (cv=TRUE), the function performs a supervised (stratified) random splitting of the observations, accounting for the classes/strata given by delta (censoring status). Stratification is used so that the data splitting balances the class distributions of the outcome within the cross-validation splits. For each screening method and for building the final Survival Bump Hunting (SBH) model, all model tuning parameters are estimated simultaneously by cross-validation. The function offers a number of options for the cross-validation: the number of replications B, the type of technique (cvtype), the peeling criterion, and the optimization criterion.
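Stratified splitting by censoring status can be sketched as follows. Within each stratum of delta, observations are shuffled and fold labels 1..K are dealt out in turn, so each fold receives nearly the same number of events and censored cases. The helper stratified_folds() is hypothetical and is not necessarily the splitting routine used inside sbh.

```r
# Hypothetical helper: assign observations to K folds, stratified by delta
stratified_folds <- function(delta, K) {
  folds <- integer(length(delta))
  for (s in unique(delta)) {
    idx <- which(delta == s)
    # shuffle this stratum, then deal fold labels 1..K in turn
    folds[idx[sample.int(length(idx))]] <- rep_len(seq_len(K), length(idx))
  }
  folds
}

set.seed(123)
delta <- rep(c(1, 0), times = c(70, 30))
folds <- stratified_folds(delta, K = 5)
table(fold = folds, delta = delta)  # 14 events and 6 censored cases per fold
```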

The returned S3-class sbh object contains cross-validated estimates of all the decision-rules of used (selected) covariates and all other statistical quantities of interest at each iteration of the peeling sequence (inner loop of the PRSP algorithm). This enables the graphical display of results of profiling curves for model tuning, peeling trajectories, covariate traces and survival distributions (see plotting functions for more details).

If replicated cross-validation is performed (B > 1), a "summary report" of the outputs is computed over the B replicates.

If the computation of log-rank p-values is desired, then running with the parallelization option is strongly advised. In case of large (p > n) or very large (p >> n) datasets, it is also highly recommended to use the parallelization option.

The function sbh relies on the R package parallel to create a parallel backend within an R session. This gives access to a cluster of compute cores and/or nodes on one or more local and/or remote machines, and lets execution scale with the number of available CPU cores. To run a procedure in parallel (with parallel RNG), the corresponding argument (parallel.rep and/or parallel.pv) is to be set to TRUE, and argument conf is to be specified (i.e. non-NULL). Argument conf uses the options described in function makeCluster of the R packages parallel and snow.

PRIMsrc supports two types of communication mechanisms between master and worker processes: 'Socket' or 'Message-Passing Interface' ('MPI'). In PRIMsrc, parallel 'Socket' clusters use socket communication only (no forking) and are therefore available on all platforms, including Windows. Parallel 'MPI' clusters use high-speed interconnect mechanisms in networks of computers (with distributed memory) and are therefore available only on such architectures. A parallel 'MPI' cluster also requires R package Rmpi to be installed. The value of type is used to set up a cluster of type 'Socket' ("SOCKET") or 'MPI' ("MPI"), respectively. Depending on this type, spec is to be specified as follows:

  • For a 'Socket' cluster (type = "SOCKET"), spec is a character vector of the host names on which to run the job, e.g. "localhost" for the local machine. Each host name is repeated as many times as the number of CPU cores to be used on that machine.

  • For an 'MPI' cluster (type = "MPI"), spec is an integer scalar giving the total number of processes to be spawned across the network of available nodes.

The actual creation of the cluster, its initialization, and closing are all done internally. For more details, see the reference manual of R package snow and examples below.

When random number generation is needed, separate streams of parallel RNG are created internally for each node by distributing the stream states to the nodes. For more details, see the vignette of R package parallel. Using a seed makes the results reproducible within the same type of session: the same seed reproduces the same results within a non-parallel session or within a parallel session. However, it will not necessarily give exactly the same results (up to sampling variability) between a non-parallelized and a parallelized session, because the seed is managed differently in the two cases (see parallel RNG and the value of the returned seed below).
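sbh creates, initializes and closes its cluster internally. The sketch below only illustrates, with base R's parallel package, the two mechanisms described above: a socket cluster, and one reproducible RNG stream per worker. It is not the code that sbh runs. Note that the parallel package names its socket cluster type "PSOCK", whereas sbh's conf$type uses "SOCKET".

```r
library(parallel)

# Two local worker processes communicating over sockets
cl <- makeCluster(2, type = "PSOCK")

# One L'Ecuyer-CMRG RNG stream per worker, seeded reproducibly
clusterSetRNGStream(cl, iseed = 123)
draws1 <- parLapply(cl, 1:2, function(i) runif(1))

# Re-seeding the same cluster reproduces the same draws
clusterSetRNGStream(cl, iseed = 123)
draws2 <- parLapply(cl, 1:2, function(i) runif(1))
stopCluster(cl)

identical(draws1, draws2)  # TRUE: same seed, same cluster layout
```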

Value

Object of class sbh (generated by the Patient Recursive Survival Peeling algorithm): a list containing the following 23 fields:

X

numeric matrix of original dataset.

y

numeric vector of observed failure / survival times.

delta

numeric vector of observed event indicator in {1,0}.

B

positive integer of the number of replications used in the cross-validation procedure.

K

positive integer of the number of folds used in the cross-validation procedure.

A

positive integer of the number of permutations used for the computation of log-rank p-values.

vs

logical scalar of returned flag of optional variable pre-selection.

vstype

character string of the optional variable pre-selection procedure used.

vsarg

list of parameters used in the pre-selection procedure.

vscons

numeric scalar of conservativeness of the variable screening (pre-selection) procedure.

cv

logical scalar of returned flag of optional cross-validation.

cvtype

character vector of the cross-validation technique used.

cvarg

list of parameters used in the Survival Bump Hunting procedure.

groups

vector of group membership if algorithm Patient Recursive Group Survival Peeling (PRGSP) is used.

pv

logical scalar of returned flag of optional computation of log-rank p-values.

onese

logical scalar of returned flag of 1-standard error rule.

decimals

integer of the number of user-specified significant decimals.

probval

Numeric scalar of survival probability used.

timeval

Numeric scalar of survival time used.

cvprofiles

list of 10 fields of cross-validated tuning profiles and estimates, each of length B (one for each replicate):

  • cv.varprofiles: numeric matrix of cross-validation criterion used for tuning/optimizing the variable screening size in the PRSP variable screening (pre-selection) procedure (NULL otherwise). Values are by columns (peeling steps) and replicates (rows).

  • cv.varprofiles.mean: numeric vector of means (across replicates) of the above cross-validation criterion by peeling steps.

  • cv.varprofiles.se: numeric vector of standard errors (across replicates) of the above cross-validation criterion by peeling steps.

  • cv.varset.opt: numeric scalar of optimal variable screening size according to the extremum.

  • cv.varset.1se: numeric scalar of optimal variable screening size according to 1SE rule.

  • cv.stepprofiles: numeric matrix of cross-validation criterion used for tuning/optimizing the peeling sequence length (i.e. number of peeling steps) in the PRSP algorithm. Values are by columns (peeling steps) and replicates (rows).

  • cv.stepprofiles.mean: numeric vector of means (across replicates) of the above cross-validation criterion by peeling steps.

  • cv.stepprofiles.se: numeric vector of standard errors (across replicates) of the above cross-validation criterion by peeling steps.

  • cv.nsteps.opt: numeric scalar of optimal number of peeling steps according to the extremum.

  • cv.nsteps.1se: numeric scalar of optimal number of peeling steps according to 1SE rule.
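The profile summaries above (mean, standard error, extremum) can be reproduced from a replicates-by-steps matrix. The sketch below uses hypothetical CER values. It assumes the standard error is the standard deviation across replicates divided by sqrt(B), since the exact definition is not stated on this page. Column indices serve as step labels here, and the mapping to sbh's step numbering, which counts step #0, is left aside.

```r
# Hypothetical cross-validated CER profiles: B = 3 replicates (rows),
# 4 peeling steps (columns)
profiles <- rbind(c(0.50, 0.36, 0.31, 0.33),
                  c(0.48, 0.38, 0.29, 0.35),
                  c(0.52, 0.34, 0.33, 0.31))
B <- nrow(profiles)

prof.mean <- colMeans(profiles)
# Assumed definition: standard error = sd across replicates / sqrt(B)
prof.se <- apply(profiles, 2, sd) / sqrt(B)

# CER is minimized, so the extremum is the minimum of the mean profile
nsteps.opt <- which.min(prof.mean)
nsteps.opt  # 3
```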

cvfit

list with 12 fields of cross-validated SBH output estimates, each of length B (one for each replicate):

  • cv.maxsteps: numeric scalar of maximal number of peeling steps (counting step #0 - see Details section).

  • cv.nsteps: numeric scalar of optimal number of peeling steps (counting step #0 - see Details section).

  • cv.boxind: logical matrix in {TRUE, FALSE} of individual observation box membership indicators (columns) for all peeling steps (rows).

  • cv.boxind.size: numeric vector of box sample size for all peeling steps.

  • cv.boxind.support: numeric vector of box support for all peeling steps.

  • cv.rules: data.frame of decision rules on the covariates (columns) for all peeling steps (rows).

  • cv.screened: numeric vector of screened (pre-selected) covariates, indexed in reference to original index.

  • cv.trace: numeric vector of the modal trace values of covariate usage for all peeling steps.

  • cv.sign: numeric vector in {-1,+1} of directions of peeling for all used (selected) covariates.

  • cv.used: numeric vector of covariates used (selected) for peeling, indexed in reference to original index.

  • cv.stats: numeric matrix of box endpoint quantities of interest (columns) for all peeling steps (rows).

  • cv.pval: list with 2 fields. The first, cvfit$pval, is a numeric vector of log-rank p-values of separation of survival distributions. The second, cvfit$seed, is an integer scalar if parallelization is used, or an integer vector of A values, one for each permutation, if parallelization is not used.
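The relation between a box membership matrix such as cv.boxind and the per-step box size and support can be sketched as below. The sketch assumes that box support is the proportion of observations inside the box (size divided by n). The membership matrix is hypothetical.

```r
# Hypothetical box membership: 3 peeling steps (rows) x 10 observations (columns)
boxind <- rbind(rep(TRUE, 10),                   # step #0: all observations
                c(rep(TRUE, 8), rep(FALSE, 2)),
                c(rep(TRUE, 5), rep(FALSE, 5)))

boxind.size    <- rowSums(boxind)   # 10 8 5
boxind.support <- rowMeans(boxind)  # 1.0 0.8 0.5 (assumed: size / n)
```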

success

logical scalar of the returned flag of success at fitting the SBH model.

seed

User seed. An integer scalar if parallelization is used, or an integer vector of B values, one for each replication, if parallelization is not used.

Acknowledgments

This work made use of the High Performance Computing Resource in the Core Facility for Advanced Research Computing at Case Western Reserve University. This project was partially funded by the National Institutes of Health NIH - National Cancer Institute (R01-CA160593) to J-E. Dazard and J.S. Rao.

Note

Unique end-user function for fitting the Survival Bump Hunting model.

Author(s)

Maintainer: "Jean-Eudes Dazard, Ph.D." jean-eudes.dazard@case.edu

References

See Also

Examples

#===================================================
# Loading the library and its dependencies
#===================================================
library("PRIMsrc")

## Not run: 
    #===================================================
    # PRIMsrc Package news
    #===================================================
    PRIMsrc.news()
    
    #===================================================
    # PRIMsrc Package citation
    #===================================================
    citation("PRIMsrc")
    
    #===================================================
    # Demo with a synthetic dataset
    # Use help for descriptions
    #===================================================
    data("Synthetic.1", package="PRIMsrc")
    ?Synthetic.1

## End(Not run)

#===================================================
# Simulated dataset #1 (n=250, p=3)
# Peeling criterion = LRT
# Cross-Validation criterion = LRT
# With Combined Cross-Validation (RCCV)
# Without Replications (B = 1)
# Without variable screening (pre-selection)
# Without computation of log-rank p-values
# Without parallelization
#===================================================
data("Synthetic.1", package="PRIMsrc")
synt1 <- sbh(X = Synthetic.1[ , -c(1,2), drop=FALSE],
             y = Synthetic.1[ ,1, drop=TRUE],
             delta = Synthetic.1[ ,2, drop=TRUE],
             B = 1,
             K = 3,
             vs = FALSE,
             cv = TRUE,
             cvtype = "combined",
             cvarg = "alpha=0.10,
                      beta=0.10,
                      peelcriterion=\"lrt\",
                      cvcriterion=\"lrt\"",
             groups = NULL,
             pv = FALSE,
             decimals = 2,
             onese = FALSE,
             probval = 0.5,
             timeval = NULL,
             parallel.vs = FALSE,
             parallel.rep = FALSE,
             parallel.pv = FALSE,
             conf = NULL,
             verbose = FALSE,
             seed = 123)

summary(object = synt1)
print(x = synt1)

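# New data for prediction: n = 100 random observations (uniform on [0,1])
# of the covariates used by the fitted model
set.seed(123)
n <- 100
p <- length(synt1$cvfit$cv.used)
x <- matrix(data = runif(n = n*p, min = 0, max = 1),
            nrow = n, ncol = p, byrow = FALSE,
            dimnames = list(1:n, paste("X", 1:p, sep="")))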
synt1.pred <- predict(object = synt1,
                      newdata = x,
                      steps = synt1$cvfit$cv.nsteps)

plot(x = synt1,
     main = paste("Scatter plot for model #1", sep=""),
     proj = c(1,2),
     steps = synt1$cvfit$cv.nsteps,
     pch = 16, cex = 0.5, col = c(1,2),
     boxes = TRUE,
     col.box = 2, lty.box = 2, lwd.box = 1,
     add.caption.box = TRUE, 
     text.caption.box = paste("Step: ", synt1$cvfit$cv.nsteps, sep=""),
     device = NULL)

plot_profile(object = synt1,
             main = "Cross-validated tuning profiles for model #1",
             pch = 20, col = 1, lty = 1, lwd = 0.5, cex = 0.5,
             add.sd = TRUE, 
             add.profiles = TRUE,
             add.caption = TRUE, 
             text.caption = c("Mean","Std. Error"),
             device = NULL)

plot_traj(object = synt1,
          main = paste("Cross-validated peeling trajectories for model #1", sep=""),
          col = 1, lty = 1, lwd = 0.5, cex = 0.5,
          toplot = synt1$cvfit$cv.used,
          device = NULL)

plot_trace(object = synt1,
           main = paste("Cross-validated trace plots for model #1", sep=""),
           xlab = "Box Mass", ylab = "Covariate Range (centered)",
           col = 1, lty = 1, lwd = 0.5, cex = 0.5,
           toplot = synt1$cvfit$cv.used,
           center = TRUE, scale = FALSE,
           device = NULL)

plot_km(object = synt1,
        main = paste("Cross-validated probability curves for model #1", sep=""),
        xlab = "Time", ylab = "Probability",
        ci = TRUE,
        steps = 1:synt1$cvfit$cv.nsteps,
        col = c(1,2), lty = 1, lwd = 0.5, cex = 0.5,
        add.caption = TRUE,
        text.caption = c("outbox","inbox"), 
        device = NULL)
                                
## Not run: 
    #===================================================
    # Examples of parallel backend parametrization 
    #===================================================
    if (require("parallel")) {
       cat("'parallel' is attached correctly \n")
    } else {
       stop("'parallel' must be attached first \n")
    }
    #===================================================
    # Ex. #1 - Multicore PC
    # Running WINDOWS
    # SOCKET communication cluster
    # Shared memory parallelization
    #===================================================
    cpus <- parallel::detectCores(logical = TRUE)
    conf <- list("spec" = rep("localhost", cpus),
                 "type" = "SOCKET",
                 "homo" = TRUE,
                 "verbose" = TRUE,
                 "outfile" = "")
    #===================================================
    # Ex. #2 - Master node + 3 Worker nodes cluster
    # All nodes equipped with identical setups of multicores 
    # (8 core CPUs per machine for a total of 32)
    # SOCKET communication cluster
    # Distributed memory parallelization
    #===================================================
    masterhost <- Sys.getenv("HOSTNAME")
    slavehosts <- c("compute-0-0", "compute-0-1", "compute-0-2")
    nodes <- length(slavehosts) + 1
    cpus <- 8
    conf <- list("spec" = c(rep(masterhost, cpus),
                            rep(slavehosts, cpus)),
                 "type" = "SOCKET",
                 "homo" = TRUE,
                 "verbose" = TRUE,
                 "outfile" = "")
    #===================================================
    # Ex. #3 - Enterprise Multinode Cluster w/ multicore/node  
    # Running LINUX with SLURM scheduler
    # MPI communication cluster
    # Distributed memory parallelization
    #==================================================
    if (require("Rmpi")) {
        cat("'Rmpi' is attached correctly \n")
    } else {
        stop("'Rmpi' must be attached first \n")
    }
    # Below, variable 'cpus' is the total number of requested 
    # tasks (threads/CPUs), which is specified from within a 
    # SLURM script.
    cpus <- as.numeric(Sys.getenv("SLURM_NTASKS"))
    conf <- list("spec" = cpus,
                 "type" = "MPI",
                 "homo" = TRUE,
                 "verbose" = TRUE,
                 "outfile" = "")
    #===================================================
    # Simulated dataset #1 (n=250, p=3)
    # Peeling criterion = LRT
    # Cross-Validation criterion = LRT
    # With Combined Cross-Validation (RCCV)
    # With Replications (B = 30)
    # With PPL variable screening (pre-selection)
    # With computation of log-rank p-values
    # With parallelization
    #===================================================                         
    synt1 <- sbh(X = Synthetic.1[ , -c(1,2), drop=FALSE],
                 y = Synthetic.1[ ,1, drop=TRUE],
                 delta = Synthetic.1[ ,2, drop=TRUE],
                 B = 30,
                 K = 5,
                 A = 1000,
                 vs = TRUE,
                 vstype = "ppl",
                 vsarg = "alpha=1,
                          nalpha=1,
                          nlambda=100",
                 vscons = 0.5,
                 cv = TRUE,
                 cvtype = "combined",
                 cvarg = "alpha=0.01,
                          beta=0.10,
                          peelcriterion=\"lrt\",
                          cvcriterion=\"lrt\"",
                 groups = NULL,
                 pv = TRUE,
                 decimals = 2,
                 onese = FALSE,
                 probval = 0.5,
                 timeval = NULL,
                 parallel.vs = FALSE,
                 parallel.rep = TRUE,
                 parallel.pv = TRUE,
                 conf = conf,
                 verbose = TRUE,
                 seed = 123)      
    #===================================================
    # Simulated dataset #4 (n=100, p=1000)
    # Peeling criterion = LRT
    # Cross-Validation criterion = CER
    # With Combined Cross-Validation (RCCV)
    # With Replications (B = 30)
    # With PRSP variable screening (pre-selection)
    # With computation of log-rank p-values
    # With parallelization
    #===================================================                         
    synt4 <- sbh(X = Synthetic.4[ , -c(1,2), drop=FALSE],
                 y = Synthetic.4[ ,1, drop=TRUE],
                 delta = Synthetic.4[ ,2, drop=TRUE],
                 B = 30,
                 K = 5,
                 A = 1000,
                 vs = TRUE,
                 vstype = "prsp",
                 vsarg = "alpha=0.01,
                          beta=0.10,
                          msize=NULL,
                          peelcriterion=\"lrt\",
                          cvcriterion=\"cer\"",
                 vscons = 0.5,
                 cv = TRUE,
                 cvtype = "combined",
                 cvarg = "alpha=0.01,
                          beta=0.10,
                          peelcriterion=\"lrt\",
                          cvcriterion=\"cer\"",
                 groups = NULL,
                 pv = TRUE,
                 decimals = 2,
                 onese = FALSE,
                 probval = 0.5,
                 timeval = NULL,
                 parallel.vs = FALSE,
                 parallel.rep = TRUE,
                 parallel.pv = TRUE,
                 conf = conf,
                 verbose = TRUE,
                 seed = 123)

## End(Not run)

PRIMsrc documentation built on May 2, 2019, 11:45 a.m.