# sbh: Cross-Validated Survival Bump Hunting In PRIMsrc: PRIM Survival Regression Classification

## Description

Main end-user function for fitting a cross-validated Survival Bump Hunting (SBH) model. Returns an object of class sbh, as generated by our Patient Recursive Survival Peeling (PRSP) algorithm, containing cross-validated estimates of the end-point statistics of interest.

## Usage

    sbh(X, y, delta,
        B = 30, K = 5, A = 1000,
        vs = TRUE, vstype = "ppl",
        vsarg = "alpha=1, nalpha=1, nlambda=100, vscons=0.5",
        cv = TRUE, cvtype = "combined",
        cvarg = "alpha=0.01, beta=0.05, minn=5, L=NULL, peelcriterion=\"lrt\", cvcriterion=\"cer\"",
        pv = FALSE, decimals = 2, onese = FALSE,
        probval = NULL, timeval = NULL,
        parallel.vs = FALSE, parallel.rep = FALSE, parallel.pv = FALSE,
        conf = NULL, verbose = TRUE, seed = NULL)

## Arguments

• X: (n x p) data.frame or numeric matrix of n observations and p input covariates. If a data.frame is provided, it will be coerced to a numeric matrix. Discrete nominal covariates will be treated as ordinal variables.

• y: n-numeric vector of observed times to event.

• delta: n-numeric vector of the observed status (censoring) indicator variable.

• B: Positive integer of the number of replications of the cross-validation procedure. Defaults to 30.

• K: Positive integer of the number of cross-validation folds (partitions) into which the observations (n) should be randomly split. K must be bigger than 2 for a regular K-fold cross-validation procedure to work and should be greater than 3 for a regular procedure to make sense; K in {5,...,10} is recommended; defaults to K=5. Setting K also specifies the type of cross-validation to be done:
  • K = 1 carries out no cross-validation; K is also set to this value when cv = FALSE (see below).
  • K in {2,...,n-1} carries out K-fold cross-validation.
  • K = n carries out leave-one-out cross-validation.

• A: Positive integer of the number of permutations for the computation of log-rank permutation p-values. Defaults to 1000. Ignored if pv=FALSE or cv=FALSE.

• vs: logical scalar. Flag for optional variable (covariate) screening (pre-selection). Defaults to TRUE.

• vstype: character vector in {"prsp", "pcqr", "ppl", "spca"} specifying the variable screening (pre-selection) procedure. Defaults to "ppl". Reset to NA if vs is FALSE.

• vsarg: character vector specifying the parameters of the cross-validated variable screening (pre-selection) procedure. Defaults to PPL with its suggested parameter values: vsarg="alpha=1,nalpha=1,nlambda=100,vscons=0.5". Note that vsarg comes as a character string between double quotes, with comma-separated values without white spaces. All the following parameters are ignored if vs is FALSE.

  PRSP:
  • alpha = fraction to peel off at each step. Suggests 0.01.
  • beta = minimum support size resulting from the peeling sequence. Suggests 0.05.
  • minn = minimum number of observations that we want to be able to detect in a box. Suggests 5.
  • L = maximum peeling length in [1,ceiling(log(1/n) / log(1 - (1/n)))]. See details below. Suggests ceiling(log(beta) / log(1 - alpha)/3).
  • S = maximum variables screening size in [1,p] (i.e. the maximum cardinal subset of top-screened variables), that is used for the cross-validation of the optimal number of screened variables. Setting S to NULL will create a grid of min(S, floor(100 * S/p)) values in [1,floor(p/10)] that will be used to determine the optimal number of screened variables. Setting S to a single upper bound value in [1,p] will fix the number of screened variables up to it. Suggests S=floor(p/10) to reduce computational complexity.
  • peelcriterion in {"lhr", "lrt", "chs"} standing for Log-Hazard Ratio (LHR), Log-Rank Test (LRT), and Cumulative Hazard Summary (CHS), respectively, specifying the peeling criterion used in the PRSP algorithm. Suggests "lrt".
  • cvcriterion in {"lhr", "lrt", "cer"} standing for Log-Hazard Ratio (LHR), Log-Rank Test (LRT), and Concordance Error Rate (CER), respectively, specifying the cross-validation criterion used for tuning/optimizing the maximum variables screening size in the PRSP variable screening procedure. Suggests "cer".
  • vscons = numeric scalar in [1/K, 1], specifying the conservativeness of the variable screening (pre-selection) procedure, where 1/K is the least conservative and 1 is the most. Suggests 0.5.

  PCQR:
  • tau = quantile in [0, 0.5] used in the censored quantile regression model. It is the tuning parameter of the censored quantile loss. It represents the conditional censored quantile of the survival response to be estimated. It includes the absolute loss when tau=0.5. Suggests 0.5.
  • alpha = elasticnet mixing parameter in [0, 1] that controls the relative contribution from the lasso and the ridge penalty. The penalty is defined as (1-alpha)/2||beta||_2^2+alpha||beta||_1. alpha = 1 is the lasso penalty, and alpha = 0 the ridge penalty. If alpha is set to NULL, a vector of values of length nalpha is used; otherwise the given alpha value is used and nalpha is set to 1. Suggests alpha=1 (lasso).
  • nalpha = number of elasticnet penalization alpha values to consider in the grid search. Suggests 1 (see above: lasso).
  • nlambda = number of elasticnet penalization lambda values to consider in the grid search. Suggests 100.
  • vscons = numeric scalar in [1/K, 1], specifying the conservativeness of the variable screening (pre-selection) procedure, where 1/K is the least conservative and 1 is the most. Suggests 0.5.

  PPL:
  • alpha = elasticnet mixing parameter in [0, 1] that controls the relative contribution from the lasso and the ridge penalty. See R package glmnet. The penalty is defined as (1-alpha)/2||beta||_2^2+alpha||beta||_1. alpha = 1 is the lasso penalty, and alpha = 0 the ridge penalty. If alpha is set to NULL, a vector of values of length nalpha is used; otherwise the given alpha value is used and nalpha is set to 1. Suggests alpha=1 (lasso).
  • nalpha = number of elasticnet penalization alpha values to consider in the grid search. Suggests 1 (see above: lasso).
  • nlambda = number of elasticnet penalization lambda values to consider in the grid search. Suggests 100.
  • vscons = numeric scalar in [1/K, 1], specifying the conservativeness of the variable screening (pre-selection) procedure, where 1/K is the least conservative and 1 is the most. Suggests 0.5.

  SPCA:
  • n.thres = number of thresholds to consider in the grid search. It cannot be less than n (sample size). Suggests 20.
  • n.pcs = number of cross-validation principal components to use in {1,2,3}. It cannot be less than n (sample size) and more than p (dimensionality), and will be reset to n.pcs = p - 1 otherwise. Suggests 3.
  • n.var = minimum number of variables to include in determining range for threshold. It cannot be more than p (dimensionality), and will be reset to n.var = p - 1 otherwise. Suggests 5.
  • vscons = numeric scalar in [1/K, 1], specifying the conservativeness of the variable screening (pre-selection) procedure, where 1/K is the least conservative and 1 is the most. Suggests 0.5.

• cv: logical scalar. Flag for optional cross-validation (CV) of parameters of variable screening (pre-selection) and variable usage (selection) by the PRSP algorithm. Defaults to TRUE. If FALSE, no cross-validation at all will be performed, the value of K will be overwritten to 1, and traditional log-rank Mantel-Haenszel p-values will be computed (using the Chi-Squared distribution with 1 df for the null distribution) instead of log-rank permutation p-values (using the permutation distribution for the null distribution).

• cvtype: character vector in {"combined", "averaged"} specifying the cross-validation technique. Defaults to "combined". Reset to NA if cv is FALSE.

• cvarg: character vector describing the parameters used in the PRSP algorithm of the Survival Bump Hunting function. Defaults to: cvarg="alpha=0.01,beta=0.05,minn=5,L=NULL,peelcriterion=\"lrt\",cvcriterion=\"cer\"". Note that cvarg comes as a character string between double quotes, with comma-separated values without white spaces.
  • alpha = fraction to peel off at each step. Defaults to 0.01.
  • beta = minimum support size resulting from the peeling sequence. Defaults to 0.05.
  • minn = minimum number of observations that we want to be able to detect in a box. Defaults to 5.
  • L = maximum peeling length in [1,ceiling(log(1/n) / log(1 - (1/n)))]. See details below. Defaults to NULL, that is, with automatic selection.
  • peelcriterion in {"lhr", "lrt", "chs"} standing for Log-Hazard Ratio (LHR), Log-Rank Test (LRT), and Cumulative Hazard Summary (CHS), respectively, specifying the peeling criterion used in the PRSP algorithm. Defaults to "lrt".
  • cvcriterion in {"lhr", "lrt", "cer"} standing for Log-Hazard Ratio (LHR), Log-Rank Test (LRT), and Concordance Error Rate (CER), respectively, specifying the cross-validation criterion used for tuning/optimizing the peeling sequence length (i.e. number of peeling steps) in the PRSP algorithm. Defaults to "cer". Ignored if cv is FALSE.

• pv: logical scalar. Flag for computation of log-rank p-values. Defaults to FALSE.

• decimals: Positive integer of the number of user-specified significant decimals to output results. Defaults to 2.

• onese: logical scalar. Flag for using the 1-standard error rule instead of the extremum value of the cross-validation criterion when tuning/optimizing model parameters. Defaults to FALSE.

• probval: numeric scalar of the survival probability at which we want to get the endpoint box survival time. Defaults to NULL (i.e. maximal survival probability value).

• timeval: numeric scalar of the survival time at which we want to get the endpoint box survival probability. Defaults to NULL (i.e. maximal survival time value).

• parallel.vs: logical. Is parallelization to be performed for variable screening? Defaults to FALSE, because it is not implemented yet.

• parallel.rep: logical. Is parallelization to be performed for replications? Defaults to FALSE.

• parallel.pv: logical. Is parallelization to be performed for computation of log-rank p-values? Defaults to FALSE.

• conf: list of 5 fields containing the parameter values needed for creating the parallel backend (cluster configuration). See details below for usage. Optional, defaults to NULL, but all fields are required if used:
  • type : character vector specifying the cluster type ("SOCKET", "MPI").
  • spec : a specification (character vector or integer scalar) appropriate to the type of cluster.
  • homogeneous : logical scalar to be set to FALSE for inhomogeneous clusters.
  • verbose : logical scalar to be set to FALSE for quiet mode.
  • outfile : character vector of an output log file name to direct the stdout and stderr connection output from the worker nodes. "" indicates no redirection.

• verbose: logical scalar. Is the output to be verbose? Optional, defaults to TRUE.

• seed: Positive integer scalar of the user seed to reproduce all the results. Defaults to NULL.
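Because vsarg and cvarg are each passed as one character string with embedded double quotes, it is easy to get the escaping wrong when typing them by hand. The following is a minimal sketch of one way to assemble such a string from named values. The helper make_arg_string is hypothetical and is not part of PRIMsrc; it only reproduces the "name=value" format described above.

```r
# Hypothetical helper (NOT part of PRIMsrc): builds a "name=value,..." string
# in the format described for vsarg and cvarg (no white spaces).
make_arg_string <- function(...) {
  args <- list(...)
  vals <- vapply(args, function(v) {
    if (is.null(v)) {
      "NULL"                          # e.g. L=NULL
    } else if (is.character(v)) {
      paste0("\"", v, "\"")           # keep inner double quotes, e.g. "lrt"
    } else {
      as.character(v)
    }
  }, character(1))
  paste(paste0(names(args), "=", vals), collapse = ",")
}

cvarg <- make_arg_string(alpha = 0.01, beta = 0.05, minn = 5, L = NULL,
                         peelcriterion = "lrt", cvcriterion = "cer")
cat(cvarg, "\n")
```

The resulting string can then be passed as the cvarg (or, with the appropriate names, vsarg) argument of sbh.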

## Details

At this point, the main function sbh relies on an optional variable screening (pre-selection) procedure that is run before the variable usage (selection) procedure is done by our PRSP algorithm. The user can choose between four possible procedures:

• Patient Recursive Survival Peeling (PRSP) (by univariate screening of our algorithm)

• Penalized Censored Quantile Regression (PCQR) (by Semismooth Newton Coordinate Descent fitting algorithm adapted from package hqreg)

• Penalized Partial Likelihood (PPL) (by Elasticnet Regularization adapted from package glmnet)

• Supervised Principal Component Analysis (SPCA) (by Supervised Principal Component adapted from package superpc)

The default is PPL (vstype = "ppl"); using PPL or SPCA is recommended for computational efficiency. Variable screening (pre-selection) is done by computing occurrence frequencies of top-ranking variables over the cross-validation folds and replicates. The conservativeness of the procedure is controlled by the argument vscons. Example values of vscons are as follows:

• '1.0' represents a presence in all the folds (unanimity vote)

• '0.5' represents a presence in at least half of the folds (majority vote)

• '1/K' represents a presence in at least one of the folds (minority vote)

Although any value in the interval [1/K,1] is accepted, we recommend using the interval [1/K,1/2] to avoid excessive conservativeness. Final variable usage (selection) is done after running our PRSP algorithm on previously screened variables by collecting those variables that have the maximum occurrence frequency in each peeling step over cross-validation folds and replicates.
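The voting rule behind vscons can be sketched on toy data. The indicator matrix below is made up for illustration and covers a single replicate with K = 5 folds; it is not how PRIMsrc stores its intermediate results, and the helper screen is hypothetical.

```r
# Illustration only: toy selection indicators, not PRIMsrc internals.
K <- 5
# Rows = folds, columns = candidate variables; TRUE means the variable
# was among the top-ranked ones in that fold.
selected <- matrix(c(TRUE, TRUE,  FALSE, FALSE,
                     TRUE, TRUE,  FALSE, FALSE,
                     TRUE, FALSE, TRUE,  FALSE,
                     TRUE, TRUE,  FALSE, FALSE,
                     TRUE, FALSE, FALSE, FALSE),
                   nrow = K, byrow = TRUE,
                   dimnames = list(NULL, paste0("X", 1:4)))

# Occurrence frequency of each variable over the folds
freq <- colMeans(selected)

# Keep variables whose frequency reaches vscons (small tolerance for
# floating-point comparison).
screen <- function(freq, vscons) {
  names(freq)[freq >= vscons - sqrt(.Machine$double.eps)]
}

screen(freq, 1.0)    # unanimity vote
screen(freq, 0.5)    # majority vote
screen(freq, 1 / K)  # minority vote
```

Lower values of vscons keep more variables, which is why 1/K is described as the least conservative setting.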

In the PRSP algorithm, the maximal number of peeling steps is determined either by the alpha and beta metaparameters or by the smallest possible fraction of the training data, i.e. 1/n:

• ceiling(log(beta) / log(1 - alpha)) : alpha and beta are fixed by user

• ceiling(log(1/n) / log(1 - alpha)) : alpha is fixed by user and beta is fixed by data

• ceiling(log(beta) / log(1 - (1/n))) : alpha is fixed by data and beta is fixed by user

• ceiling(log(1/n) / log(1 - (1/n))) : alpha and beta are fixed by data

If L is left as NULL (i.e. it is not used to specify a fixed number of peeling steps), then beta and minn are used in the stopping rule instead.
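The four bounds listed above differ only in whether alpha and beta are supplied by the user or replaced by 1/n. A small sketch that evaluates them for a given sample size follows; the function name max_peel_steps is hypothetical and not part of PRIMsrc.

```r
# Hypothetical helper (NOT part of PRIMsrc): maximal number of peeling steps.
# A metaparameter left as NULL is "fixed by data", i.e. replaced by 1/n.
max_peel_steps <- function(n, alpha = NULL, beta = NULL) {
  if (is.null(alpha)) alpha <- 1 / n
  if (is.null(beta))  beta  <- 1 / n
  ceiling(log(beta) / log(1 - alpha))
}

n <- 250
max_peel_steps(n, alpha = 0.01, beta = 0.05)  # alpha and beta fixed by user
max_peel_steps(n, alpha = 0.01)               # alpha by user, beta by data
max_peel_steps(n, beta = 0.05)                # alpha by data, beta by user
max_peel_steps(n)                             # alpha and beta fixed by data
```

With alpha = 0.01 and beta = 0.05 the first call gives 299 steps, which shows why small peeling fractions lead to long peeling sequences.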

If cross-validation is requested, the function performs a supervised (stratified) random splitting of the data based on the outcome, which is in that case the delta argument. This is because it is desirable that the data splitting balances the class distributions of the outcome (events) within the cross-validation splits. For each screening method and for building (by PRSP algorithm) the final Survival Bump Hunting (SBH) model, all model tuning parameters are simultaneously estimated by cross-validation. The function offers a number of options for the cross-validation to be performed: the number of replications B; the type of technique; the peeling criterion; and the optimization criterion.
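One common way to obtain a split that is stratified on delta is to assign fold labels separately within the events and within the censored observations. The sketch below illustrates that idea only; stratified_folds is a hypothetical helper, and the actual splitting code in PRIMsrc may differ.

```r
# Illustration only (hypothetical helper, NOT the PRIMsrc implementation):
# assign each observation to one of K folds, balancing delta across folds.
stratified_folds <- function(delta, K, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  folds <- integer(length(delta))
  for (status in unique(delta)) {
    idx <- which(delta == status)
    # Recycle labels 1..K over the stratum, then shuffle them
    folds[idx] <- sample(rep_len(seq_len(K), length(idx)))
  }
  folds
}

delta <- rep(c(1, 0), times = c(60, 40))  # 60 events, 40 censored
folds <- stratified_folds(delta, K = 5, seed = 123)
table(fold = folds, delta = delta)
```

In this toy case every fold receives the same number of events and of censored observations, so the event rate is the same in each fold as in the full data.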

The returned S3-class sbh object contains cross-validated estimates of all the decision-rules of used (selected) covariates and all other statistical quantities of interest at each iteration of the peeling sequence (inner loop of the PRSP algorithm). This enables the graphical display of results of profiling curves for model tuning, peeling trajectories, covariate traces and survival distributions (see plotting functions for more details).

In case replicated cross-validations are performed, a "summary report" of the outputs is done over the B replicates as follows:

• Even though the PRSP algorithm uses only one covariate at a time at each peeling step, the reported matrix of "Replicated CV" box decision rules may show more than one covariate being used in a given step, because these decision rules are averaged over the B replicates (see equation #21 in Dazard et al. 2016).

• However, the reported "Replicated CV" trace values are computed (at each peeling step) as a single modal trace value of covariate usage over the B replicates. This is also reflected in the reported "Replicated CV" importance and usage plots of covariate traces.

• The reported "Replicated CV" box membership indicators are computed (at each peeling step) as the point-wise majority vote over the B replicates (right-hand side of equation #22 in Dazard et al. 2016).

• The reported "Replicated CV" box support vector and corresponding box sample size are computed (at each peeling step) based on the above "Replicated CV" box membership indicators (i.e. not as equation #23 in Dazard et al. 2016).
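The majority vote and the modal trace described in the list above can be sketched for a single peeling step. The inputs are toy values, and the threshold of one half used for the vote is an assumption made for this illustration; see Dazard et al. 2016 for the exact definitions.

```r
# Toy inputs for ONE peeling step, for illustration only.
# boxind[b, i]: is observation i inside the box in replicate b?
boxind <- rbind(c(TRUE, TRUE,  FALSE, FALSE),
                c(TRUE, FALSE, FALSE, TRUE),
                c(TRUE, TRUE,  FALSE, FALSE))

# Point-wise majority vote over the B replicates
# (assumed rule: in the box in at least half of the replicates)
member <- colMeans(boxind) >= 0.5

# Box sample size and support derived from the voted membership
size    <- sum(member)
support <- size / ncol(boxind)

# Modal trace: the covariate index used most often at this step
trace_b <- c(2, 2, 3)  # covariate used in each replicate (toy values)
modal   <- as.numeric(names(which.max(table(trace_b))))
```

Here the first two observations are voted into the box, giving a box sample size of 2 and a support of 0.5, and covariate 2 is the modal trace value.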

If the computation of log-rank p-values is desired, then running with the parallelization option is strongly advised as it may take a while. In case of large (p > n) or very large (p >> n) datasets, it is also highly recommended to use the parallelization option.
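To see why permutation p-values are costly, note that the test statistic has to be recomputed once per permutation (A times in total). The sketch below implements a plain two-group log-rank statistic in base R and a permutation p-value on simulated data. It is an illustration under stated assumptions: the function logrank_stat, the toy data, and the p-value formula (1 + count) / (A + 1) are choices made here, not the PRIMsrc implementation.

```r
# Two-group log-rank chi-squared statistic in base R (illustration only).
logrank_stat <- function(time, status, inbox) {
  O1 <- 0; E1 <- 0; V <- 0
  for (tt in sort(unique(time[status == 1]))) {
    n  <- sum(time >= tt)                        # at risk overall
    n1 <- sum(time >= tt & inbox)                # at risk inside the box
    d  <- sum(time == tt & status == 1)          # events overall
    d1 <- sum(time == tt & status == 1 & inbox)  # events inside the box
    O1 <- O1 + d1
    E1 <- E1 + d * n1 / n
    if (n > 1) V <- V + d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
  }
  (O1 - E1)^2 / V
}

set.seed(123)
n      <- 60
inbox  <- rep(c(TRUE, FALSE), each = n / 2)      # toy box membership
time   <- rexp(n, rate = ifelse(inbox, 5, 0.5))  # in-box subjects fail sooner
status <- rbinom(n, size = 1, prob = 0.8)        # 1 = event, 0 = censored

A    <- 200                                      # number of permutations
obs  <- logrank_stat(time, status, inbox)
# Null distribution: recompute the statistic with shuffled box labels
perm <- replicate(A, logrank_stat(time, status, sample(inbox)))
pval <- (1 + sum(perm >= obs)) / (A + 1)
pval
```

Since each of the A evaluations is independent of the others, this loop is a natural candidate for parallel execution.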

The function sbh relies on the R package parallel to create a parallel backend within an R session. This gives access to a cluster of compute cores and/or nodes on local and/or remote machines, so that execution scales with the number of CPU cores available. To run a procedure in parallel (with parallel RNG), the corresponding argument (parallel.rep and/or parallel.pv) is to be set to TRUE and argument conf is to be specified (i.e. non NULL). Argument conf uses the options described in function makeCluster of the R packages parallel and snow. PRIMsrc supports two types of communication mechanisms between master and worker processes: 'Socket' or 'Message-Passing Interface' ('MPI'). In PRIMsrc, parallel 'Socket' clusters use socket communication mechanisms only (no forking) and are therefore available on all platforms, including Windows, while parallel 'MPI' clusters use high-speed interconnect mechanisms in networks of computers (with distributed memory) and are therefore available only on these architectures. A parallel 'MPI' cluster also requires R package Rmpi to be installed. The field type of conf is used to set up a cluster of type 'Socket' ("SOCKET") or 'MPI' ("MPI"), respectively. Depending on this type, values of spec are to be used alternatively:

• For 'Socket' clusters (conf$type="SOCKET"), spec should be a character vector naming the hosts on which to run the job; it can default to a unique local machine, in which case one may use the unique host name "localhost". Each host name can potentially be repeated up to the number of CPU cores available on the local machine. It can also be an integer scalar specifying the number of processes to spawn on the local machine; or a list of machine specifications if you have ssh installed (a character value named host specifying the name or address of the host to use).

• For 'MPI' clusters (conf$type="MPI"), spec should be an integer scalar specifying the total number of processes to be spawned across the network of available nodes, counting the worker nodes and the master node.

The actual creation of the cluster, its initialization, and closing are all done internally. For more details, see the reference manual of R package snow and examples below.

When random number generation is needed, the creation of separate streams of parallel RNG per node is done internally by distributing the stream states to the nodes. For more details, see the vignette of R package parallel. The use of a seed makes it possible to reproduce the results within the same type of session: the same seed will reproduce the same results within a non-parallel session or within a parallel session, but it will not necessarily give the exact same results (up to sampling variability) between a non-parallelized and a parallelized session, because the seed is managed differently in the two cases (see parallel RNG and value of returned seed below).
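The difference between sequential and parallel seeding can be illustrated without starting a cluster, using the L'Ecuyer-CMRG generator and nextRNGStream from the R package parallel. This sketch only mimics the idea of one stream per worker; how PRIMsrc distributes streams internally is not shown here, and the helper draw_from is hypothetical.

```r
old_kind <- RNGkind()[1]  # remember the current generator

# Sequential session: a single Mersenne-Twister stream
RNGkind("Mersenne-Twister")
set.seed(123)
seq_draws <- runif(3)

# Parallel-style seeding: L'Ecuyer-CMRG with one stream per worker
RNGkind("L'Ecuyer-CMRG")
set.seed(123)
stream1 <- .Random.seed                      # stream for worker 1
stream2 <- parallel::nextRNGStream(stream1)  # stream for worker 2

# Hypothetical helper: draw n uniforms starting from a given stream state
draw_from <- function(stream, n) {
  assign(".Random.seed", stream, envir = globalenv())
  runif(n)
}
w1  <- draw_from(stream1, 3)
w2  <- draw_from(stream2, 3)
w1b <- draw_from(stream1, 3)  # same stream state, hence the same draws

RNGkind(old_kind)  # restore the previous generator
```

The same seed reproduces w1 exactly (w1b), while w2 and seq_draws differ from w1, which is the behavior described above for parallel versus non-parallel sessions.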

## Value

Object of class sbh (Patient Recursive Survival Peeling): a list containing the following 21 fields:

• X: numeric matrix of original dataset.

• y: numeric vector of observed failure / survival times.

• delta: numeric vector of observed event indicator in {1,0}.

• B: positive integer of the number of replications used in the cross-validation procedure.

• K: positive integer of the number of folds used in the cross-validation procedure.

• A: positive integer of the number of permutations used for the computation of log-rank p-values.

• vs: logical scalar of returned flag of optional variable pre-selection.

• vstype: character vector of the optional variable pre-selection procedure used.

• vsarg: character vector of the parameters used in the pre-selection procedure.

• cv: logical scalar of returned flag of optional cross-validation.

• cvtype: character vector of the cross-validation technique used.

• cvarg: character vector of the parameters used in the Survival Bump Hunting procedure.

• pv: logical scalar of returned flag of optional computation of log-rank p-values.

• onese: logical scalar of returned flag of 1-standard error rule.

• decimals: integer of the number of user-specified significant decimals.

• probval: numeric scalar of survival probability used.

• timeval: numeric scalar of survival time used.

• cvprofiles: list of 10 fields of cross-validated tuning profiles and estimates, each of length B (one for each replicate):
  • cv.varprofiles: numeric matrix of cross-validation criterion used for tuning/optimizing the variable screening size in the PRSP variable screening (pre-selection) procedure (NULL otherwise). Values are by columns (peeling steps) and replicates (rows).
  • cv.varprofiles.mean: numeric vector of means (across replicates) of the above cross-validation criterion by peeling steps.
  • cv.varprofiles.se: numeric vector of standard errors (across replicates) of the above cross-validation criterion by peeling steps.
  • cv.varset.opt: numeric scalar of optimal variable screening size according to the extremum.
  • cv.varset.1se: numeric scalar of optimal variable screening size according to the 1SE rule.
  • cv.stepprofiles: numeric matrix of cross-validation criterion used for tuning/optimizing the peeling sequence length (i.e. number of peeling steps) in the PRSP algorithm. Values are by columns (peeling steps) and replicates (rows).
  • cv.stepprofiles.mean: numeric vector of means (across replicates) of the above cross-validation criterion by peeling steps.
  • cv.stepprofiles.se: numeric vector of standard errors (across replicates) of the above cross-validation criterion by peeling steps.
  • cv.nsteps.opt: numeric scalar of optimal number of peeling steps according to the extremum.
  • cv.nsteps.1se: numeric scalar of optimal number of peeling steps according to the 1SE rule.

• cvfit: list with 12 fields of cross-validated SBH output estimates, each of length B (one for each replicate):
  • cv.maxsteps: numeric scalar of maximal number of peeling steps over the replicates.
  • cv.nsteps: numeric scalar of optimal number of peeling steps according to the optimization criterion.
  • cv.boxind: logical matrix in {TRUE, FALSE} of individual observation box membership indicator (columns) for all peeling steps (rows).
  • cv.boxind.size: numeric vector of box sample size for all peeling steps.
  • cv.boxind.support: numeric vector of box support for all peeling steps.
  • cv.rules: data.frame of decision rules on the covariates (columns) for all peeling steps (rows).
  • cv.screened: numeric vector of screened (pre-selected) covariates, indexed in reference to original index.
  • cv.trace: numeric vector of the modal trace values of covariate usage for all peeling steps.
  • cv.sign: numeric vector in {-1,+1} of directions of peeling for all used (selected) covariates.
  • cv.used: numeric vector of covariates used (selected) for peeling, indexed in reference to original index.
  • cv.stats: numeric matrix of box endpoint quantities of interest (columns) for all peeling steps (rows).
  • cv.pval: list with 2 fields. The first, cvfit$pval, is a numeric vector of log-rank p-values of separation of survival distributions. The second, cvfit$seed, is an integer scalar if parallelization is used, or an integer vector of A values, one for each permutation, if parallelization is not used.

• success: logical scalar of the returned flag of success at fitting the SBH model.

• seed: User seed. An integer scalar if parallelization is used, or an integer vector of B values, one for each replication, if parallelization is not used.
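The relation between the extremum and the 1SE estimates (for instance cv.nsteps.opt versus cv.nsteps.1se) can be sketched on a toy profile matrix laid out like cv.stepprofiles, with replicates in rows and peeling steps in columns. The numbers are made up, the criterion is taken to be CER (so smaller is better), and the 1SE convention used here (the shortest peeling sequence within one standard error of the optimum) is an assumption for illustration, not a statement about the PRIMsrc source code.

```r
# Toy CER profiles: B = 3 replicates (rows) by 5 peeling steps (columns).
profiles <- rbind(c(0.40, 0.34, 0.31, 0.30, 0.33),
                  c(0.42, 0.35, 0.29, 0.31, 0.35),
                  c(0.41, 0.33, 0.31, 0.29, 0.34))

prof_mean <- colMeans(profiles)                               # mean per step
prof_se   <- apply(profiles, 2, sd) / sqrt(nrow(profiles))    # SE per step

# Extremum rule: CER is an error rate, so the optimum is the minimum.
# (For a criterion such as LRT or LHR, one would maximize instead.)
nsteps_opt <- which.min(prof_mean)

# 1SE rule (assumed convention): shortest peeling sequence whose mean
# criterion lies within one standard error of the optimum.
cutoff     <- prof_mean[nsteps_opt] + prof_se[nsteps_opt]
nsteps_1se <- min(which(prof_mean <= cutoff))

c(opt = nsteps_opt, onese = nsteps_1se)
```

In this toy profile the extremum is reached at step 4, while the 1SE rule settles on step 3, which is the kind of difference the onese flag controls.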

## Acknowledgments

This work made use of the High Performance Computing Resource in the Core Facility for Advanced Research Computing at Case Western Reserve University. This project was partially funded by the National Institutes of Health NIH - National Cancer Institute (R01-CA160593) to J-E. Dazard and J.S. Rao.

## Note

Unique end-user function for fitting the Survival Bump Hunting model.

## Author(s)

Maintainer: "Jean-Eudes Dazard, Ph.D." jean-eudes.dazard@case.edu

## References

• Dazard J-E. and Rao J.S. (2017). "Variable Selection Strategies for High-Dimensional Survival Bump Hunting using Recursive Peeling Methods." (in prep).

• Diaz-Pachon D.A., Dazard J-E. and Rao J.S. (2017). "Unsupervised Bump Hunting Using Principal Components." In: Ahmed SE, editor. Big and Complex Data Analysis: Methodologies and Applications. Contributions to Statistics, vol. Edited Refereed Volume. Springer International Publishing, Cham Switzerland, p. 325-345.

• Yi C. and Huang J. (2016). "Semismooth Newton Coordinate Descent Algorithm for Elastic-Net Penalized Huber Loss Regression and Quantile Regression." J. Comp Graph. Statistics, DOI: 10.1080/10618600.2016.1256816.

• Dazard J-E., Choe M., LeBlanc M. and Rao J.S. (2016). "Cross-validation and Peeling Strategies for Survival Bump Hunting using Recursive Peeling Methods." Statistical Analysis and Data Mining, 9(1):12-42.

• Dazard J-E., Choe M., LeBlanc M. and Rao J.S. (2015). "R package PRIMsrc: Bump Hunting by Patient Rule Induction Method for Survival, Regression and Classification." In JSM Proceedings, Statistical Programmers and Analysts Section. Seattle, WA, USA. American Statistical Association IMS - JSM, p. 650-664.

• Dazard J-E., Choe M., LeBlanc M. and Rao J.S. (2014). "Cross-Validation of Survival Bump Hunting by Recursive Peeling Methods." In JSM Proceedings, Survival Methods for Risk Estimation/Prediction Section. Boston, MA, USA. American Statistical Association IMS - JSM, p. 3366-3380.

• Dazard J-E. and J.S. Rao (2010). "Local Sparse Bump Hunting." J. Comp Graph. Statistics, 19(4):900-92.

• makeCluster (R package parallel)

• glmnet, cv.glmnet (R package glmnet)

• hqreg, cv.hqreg (R package hqreg)

• superpc.cv (R package superpc)

## Examples

    #===================================================
    # Loading the library and its dependencies
    #===================================================
    library("PRIMsrc")

    ## Not run:
    #===================================================
    # PRIMsrc Package news
    #===================================================
    PRIMsrc.news()

    #===================================================
    # PRIMsrc Package citation
    #===================================================
    citation("PRIMsrc")

    #===================================================
    # Demo with a synthetic dataset
    # Use help for descriptions
    #===================================================
    data("Synthetic.1", package="PRIMsrc")
    ?Synthetic.1

    ## End(Not run)

    #===================================================
    # Simulated dataset #1 (n=250, p=3)
    # Peeling criterion = LRT
    # Cross-Validation criterion = LRT
    # With Combined Cross-Validation (RCCV)
    # Without Replications (B = 1)
    # Without variable screening (pre-selection)
    # Without computation of log-rank p-values
    # Without parallelization
    #===================================================
    synt1 <- sbh(X = Synthetic.1[ , -c(1,2), drop=FALSE],
                 y = Synthetic.1[ ,1, drop=TRUE],
                 delta = Synthetic.1[ ,2, drop=TRUE],
                 B = 1,
                 K = 3,
                 vs = FALSE,
                 cv = TRUE,
                 cvtype = "combined",
                 cvarg = "alpha=0.10, beta=0.05, minn=5, L=NULL, peelcriterion=\"lrt\", cvcriterion=\"lrt\"",
                 pv = FALSE,
                 decimals = 2,
                 onese = FALSE,
                 probval = 0.5,
                 timeval = NULL,
                 parallel.vs = FALSE,
                 parallel.rep = FALSE,
                 parallel.pv = FALSE,
                 conf = NULL,
                 verbose = FALSE,
                 seed = 123)

    summary(object = synt1)
    print(x = synt1)

    n <- 100
    p <- length(synt1$cvfit$cv.used)
    x <- matrix(data = runif(n = n*p, min = 0, max = 1),
                nrow = n, ncol = p, byrow = FALSE,
                dimnames=list(1:n, paste("X", 1:p, sep="")))
    synt1.pred <- predict(object = synt1,
                          newdata = x,
                          steps = synt1$cvfit$cv.nsteps)

    plot(x = synt1,
         main = paste("Scatter plot for model #1", sep=""),
         proj = c(1,2),
         splom = TRUE,
         boxes = TRUE,
         steps = synt1$cvfit$cv.nsteps,
         pch = 16, cex = 0.5, col = 2,
         col.box = 2, lty.box = 2, lwd.box = 1,
         add.legend = TRUE,
         device = NULL)

    plot_profile(object = synt1,
                 main = "Cross-validated tuning profiles for model #1",
                 pch=20, col=1, lty=1, lwd=0.5, cex=0.5,
                 add.sd = TRUE,
                 add.legend = TRUE,
                 add.profiles = TRUE,
                 device = NULL, file = "Profile Plot", path=getwd(),
                 horizontal = FALSE, width = 8.5, height = 5.0)

    plot_boxtraj(object = synt1,
                 main = paste("Cross-validated peeling trajectories for model #1", sep=""),
                 col=1, lty=1, lwd=0.5, cex=0.5,
                 toplot = synt1$cvfit$cv.used,
                 device = NULL, file = "Trajectory Plots", path=getwd(),
                 horizontal = FALSE, width = 8.5, height = 8.5)

    plot_boxtrace(object = synt1,
                  main = paste("Cross-validated trace plots for model #1", sep=""),
                  xlab = "Box Mass", ylab = "Covariate Range (centered)",
                  col=1, lty=1, lwd=0.5, cex=0.5,
                  toplot = synt1$cvfit$cv.used,
                  center = TRUE, scale = FALSE,
                  device = NULL, file = "Covariate Trace Plots", path=getwd(),
                  horizontal = FALSE, width = 8.5, height = 8.5)

    plot_boxkm(object = synt1,
               main = paste("Cross-validated probability curves for model #1", sep=""),
               xlab = "Time", ylab = "Probability",
               col=2, lty=1, lwd=0.5, cex=0.5,
               device = NULL, file = "Survival Plots", path=getwd(),
               horizontal = TRUE, width = 11.5, height = 8.5)

    ## Not run:
    #===================================================
    # Examples of parallel backend parametrization
    #===================================================
    if (require("parallel")) {
        print("'parallel' is attached correctly \n")
    } else {
        stop("'parallel' must be attached first \n")
    }

    #===================================================
    # Example #1 - Quad core PC
    # Running WINDOWS with SOCKET communication
    #===================================================
    cpus <- parallel::detectCores(logical = TRUE)
    conf <- list("spec" = rep("localhost", cpus),
                 "type" = "SOCKET",
                 "homo" = TRUE,
                 "verbose" = TRUE,
                 "outfile" = "")

    #===================================================
    # Example #2 - Master node + 3 Worker nodes cluster
    # Running LINUX with SOCKET communication
    # All nodes equipped with identical setups of
    # multicores (8 core CPUs per machine for a total of 32)
    #===================================================
    masterhost <- Sys.getenv("HOSTNAME")
    slavehosts <- c("compute-0-0", "compute-0-1", "compute-0-2")
    nodes <- length(slavehosts) + 1
    cpus <- 8
    conf <- list("spec" = c(rep(masterhost, cpus),
                            rep(slavehosts, cpus)),
                 "type" = "SOCKET",
                 "homo" = TRUE,
                 "verbose" = TRUE,
                 "outfile" = "")

    #===================================================
    # Example #3 - Multinode of multicore per node cluster
    # Running LINUX with SLURM scheduler and MPI communication
    # Below, variable 'cpus' is the total number
    # of requested core CPUs, which is specified from
    # within a SLURM script.
    #===================================================
    if (require("Rmpi")) {
        print("'Rmpi' is attached correctly \n")
    } else {
        stop("'Rmpi' must be attached first \n")
    }
    cpus <- as.numeric(Sys.getenv("SLURM_NTASKS"))
    conf <- list("spec" = cpus,
                 "type" = "MPI",
                 "homo" = TRUE,
                 "verbose" = TRUE,
                 "outfile" = "")

    #===================================================
    # Simulated dataset #1 (n=250, p=3)
    # Peeling criterion = LRT
    # Cross-Validation criterion = LRT
    # With Combined Cross-Validation (RCCV)
    # With Replications (B = 30)
    # With variable screening (pre-selection) (PPL)
    # With computation of log-rank p-values
    # With parallelization
    #===================================================
    synt1 <- sbh(X = Synthetic.1[ , -c(1,2), drop=FALSE],
                 y = Synthetic.1[ ,1, drop=TRUE],
                 delta = Synthetic.1[ ,2, drop=TRUE],
                 B = 30,
                 K = 5,
                 A = 1000,
                 vs = TRUE,
                 vstype = "ppl",
                 vsarg = "alpha=1, nalpha=1, nlambda=100, vscons=0.5",
                 cv = TRUE,
                 cvtype = "combined",
                 cvarg = "alpha=0.01, beta=0.05, minn=5, L=NULL, peelcriterion=\"lrt\", cvcriterion=\"lrt\"",
                 pv = TRUE,
                 decimals = 2,
                 onese = FALSE,
                 probval = 0.5,
                 timeval = NULL,
                 parallel.vs = FALSE,
                 parallel.rep = TRUE,
                 parallel.pv = TRUE,
                 conf = conf,
                 verbose = TRUE,
                 seed = 123)

    ## End(Not run)

PRIMsrc documentation built on July 19, 2017, 1:02 a.m.
