Ranking of pairwise interactions between individual or noise variables by bivariate interaction Minimal Depth of a Maximal Subtree (IMDMS)
X
ntree
    Number of trees in the forest. Defaults to 1000.
method
    Method for ranking of interactions between pairs of individual and noise variables.
splitrule
    Splitting rule used to grow trees. For time-to-event analysis, use
importance
    Method for computing variable importance. Defaults to Character string
B
    Positive
ci
    Confidence Interval for inferences of individual and noise variables.
parallel
conf
verbose
seed
    Positive
The option importance allows several ways to calculate Variable Importance (VIMP). The default "permute" returns Breiman-Cutler permutation VIMP as described in Breiman (2001). For each tree, the prediction error on the out-of-bag (OOB) data is recorded. Then for a given variable x, OOB cases are randomly permuted in x and the prediction error is recorded. The VIMP for x is defined as the difference between the perturbed and unperturbed error rate, averaged over all trees. If "random" is used, then x is not permuted, but rather an OOB case is assigned a daughter node randomly whenever a split on x is encountered in the in-bag tree. If "anti" is used, then x is assigned to the opposite node whenever a split on x is encountered in the in-bag tree.
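The "permute" scheme can be illustrated outside of a forest. The following is a minimal sketch, not the code used by IRSF or randomForestSRC: it assumes a single linear model fitted on one bootstrap sample instead of an ensemble of trees, uses mean squared error as the prediction error, and the names dat, fit, mse and vimp.permute are made up for the illustration.

```r
# Simplified permutation importance: one model, one bootstrap sample.
set.seed(1)
n <- 200
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$y <- 3 * dat$x1 + rnorm(n, sd = 0.1)     # y depends on x1 only

in.bag <- sample(n, size = n, replace = TRUE) # bootstrap (in-bag) cases
oob <- setdiff(seq_len(n), in.bag)            # out-of-bag (OOB) cases

fit <- lm(y ~ x1 + x2, data = dat[in.bag, ])  # model grown on in-bag data

# Prediction error (mean squared error) of a model on new data
mse <- function(model, newdata) {
  mean((newdata$y - predict(model, newdata))^2)
}

# Perturbed minus unperturbed OOB error after permuting one variable
vimp.permute <- function(var) {
  perturbed <- dat[oob, ]
  perturbed[[var]] <- sample(perturbed[[var]])
  mse(fit, perturbed) - mse(fit, dat[oob, ])
}

vimp <- sapply(c("x1", "x2"), vimp.permute)
print(vimp)
```

In a forest, the same difference would be computed per tree on that tree's own OOB cases and then averaged over all trees. Here the informative variable x1 should receive a clearly larger value than the uninformative x2.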
The function rsf.int relies on the R package parallel to create a parallel backend within an R session. This gives access to a cluster of compute cores and/or nodes on local and/or remote machines, so that execution scales with the number of CPU cores available. To run a procedure in parallel (with parallel RNG), argument parallel must be set to TRUE and argument conf must be specified (i.e. non-NULL). Argument conf uses the options described in function makeCluster of the R packages parallel and snow. IRSF supports two types of communication mechanisms between master and worker processes: 'Socket' or 'Message-Passing Interface' ('MPI'). In IRSF, parallel 'Socket' clusters use socket communication only (no forking) and are therefore available on all platforms, including Windows, while parallel 'MPI' clusters use the high-speed interconnects of networks of computers (with distributed memory) and are therefore available only on such architectures. A parallel 'MPI' cluster also requires R package Rmpi to be installed. Value type is used to set up a cluster of type 'Socket' ("SOCKET") or 'MPI' ("MPI"), respectively. Depending on this type, values of spec are to be used as follows:
For 'Socket' clusters (conf$type="SOCKET"), spec should be a character vector naming the hosts on which to run the job; it can default to a single local machine, in which case one may use the host name "localhost". Each host name can be repeated up to the number of CPU cores available on the local machine. It can also be an integer scalar specifying the number of processes to spawn on the local machine, or a list of machine specifications if you have ssh installed (a character value named host specifying the name or address of the host to use).
For 'MPI' clusters (conf$type="MPI"), spec should be an integer scalar specifying the total number of processes to be spawned across the network of available nodes, counting the worker nodes and the master node.
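The alternative forms of spec described above can be written out as plain R lists. This is only a sketch of the data structures: the fields homo, verbose and outfile are copied from the Examples section of this page, the host names "node-a" and "node-b" are hypothetical, and nothing here starts a cluster or calls rsf.int.

```r
cores <- 2L

# 'Socket' cluster, spec as a character vector:
# one "localhost" entry per worker process on the local machine
conf.hosts <- list("spec" = rep("localhost", cores),
                   "type" = "SOCKET",
                   "homo" = TRUE,
                   "verbose" = TRUE,
                   "outfile" = "")

# 'Socket' cluster, spec as an integer scalar:
# number of processes to spawn on the local machine
conf.count <- conf.hosts
conf.count$spec <- cores

# 'Socket' cluster, spec as a list of machine specifications
# (requires ssh; host names are hypothetical)
conf.ssh <- conf.hosts
conf.ssh$spec <- list(list(host = "node-a"), list(host = "node-b"))

# 'MPI' cluster, spec as the total number of processes,
# counting worker nodes and master node
conf.mpi <- conf.hosts
conf.mpi$type <- "MPI"
conf.mpi$spec <- 4L

str(conf.count)
```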
The creation of the cluster, its initialization, and its closing are all done internally. For more details, see the reference manual of R package snow and the examples below.
When random number generation is needed, separate streams of parallel RNG are created internally, one per node, by distributing the stream states to the nodes. For more details, see the vignette of R package parallel. Using a seed makes the results reproducible within the same type of session: the same seed will reproduce the same results within a non-parallel session or within a parallel session, but it will not necessarily give exactly the same results (up to sampling variability) between a non-parallelized and a parallelized session, because the seed is managed differently in the two (see parallel RNG and value of returned seed below).
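The seed behaviour described above can be reproduced with the base package parallel alone. The sketch below is not IRSF's internal code: it assumes a local socket cluster of two workers, uses clusterSetRNGStream to distribute the per-worker RNG streams, and draw.once is a helper name introduced only for this illustration.

```r
library(parallel)

# Draw one uniform random number on each worker after seeding
# the parallel RNG streams from a single seed.
draw.once <- function(seed, workers = 2) {
  cl <- makeCluster(workers)             # socket cluster on the local machine
  on.exit(stopCluster(cl))               # always release the workers
  clusterSetRNGStream(cl, iseed = seed)  # one RNG stream per worker
  unlist(clusterEvalQ(cl, runif(1)))
}

run1 <- draw.once(seed = 1234567)
run2 <- draw.once(seed = 1234567)

# Serial session with the same seed, for comparison
set.seed(1234567)
serial <- runif(2)

print(identical(run1, run2))
print(rbind(parallel = run1, serial = serial))
```

The two parallel runs are expected to be identical, while the serial draws generally differ from them because the serial session uses a different generator and seeding scheme.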
A data.frame containing the following columns:
"obs.mean" observed mean of covariates pairwise interaction statistics
"obs.se" observed standard error of covariates pairwise interaction statistics
"obs.LBCI" observed Lower Bound Confidence Interval of covariates pairwise interaction statistics
"obs.UBCI" observed Upper Bound Confidence Interval of covariates pairwise interaction statistics
"noise.mean" observed mean of noise covariates statistics
"noise.se" observed standard error of noise covariates pairwise interaction statistics
"noise.LBCI" observed Lower Bound Confidence Interval of noise covariates pairwise interaction statistics
"noise.UBCI" observed Upper Bound Confidence Interval of noise covariates pairwise interaction statistics
"signif.1SE" calls of covariates pairwise interaction statistics significance using the 1SE rule
"signif.CI" calls of covariates pairwise interaction statistics significance using the CI rule at the ci% confidence level
This work made use of the High Performance Computing Resource in the Core Facility for Advanced Research Computing at Case Western Reserve University. We are thankful to Ms. Janet Schollenberger, Senior Project Coordinator, CAMACS, as well as Dr. Jeremy J. Martinson, Sudhir Penugonda, Shehnaz K. Hussain, Jay H. Bream, and Priya Duggal, for providing us the data related to the samples analyzed in the present study. Data in this manuscript were collected by the Multicenter AIDS Cohort Study (MACS) (https://www.statepi.jhsph.edu/macs/macs.html) with centers at Baltimore, Chicago, Los Angeles, Pittsburgh, and the Data Coordinating Center: The Johns Hopkins University Bloomberg School of Public Health. The MACS is funded primarily by the National Institute of Allergy and Infectious Diseases (NIAID), with additional co-funding from the National Cancer Institute (NCI), the National Heart, Lung, and Blood Institute (NHLBI), and the National Institute on Deafness and Other Communication Disorders (NIDCD). MACS data collection is also supported by Johns Hopkins University CTSA. This study was supported by two grants from the National Institutes of Health: NIDCR P01DE019759 (Aaron Weinberg, Peter Zimmerman, Richard J. Jurevic, Mark Chance) and NCI R01CA163739 (Hemant Ishwaran). The work was also partly supported by the National Science Foundation grant DMS 1148991 (Hemant Ishwaran) and the Center for AIDS Research grant P30AI036219 (Mark Chance).
Jean-Eudes Dazard <jean-eudes.dazard@case.edu>
Maintainer: Jean-Eudes Dazard <jean-eudes.dazard@case.edu>
Dazard J-E., Ishwaran H., Mehlotra R.K., Weinberg A. and Zimmerman P.A. (2018). "Ensemble Survival Tree Models to Reveal Pairwise Interactions of Variables with Time-to-Events Outcomes in Low-Dimensional Setting" Statistical Applications in Genetics and Molecular Biology, 17(1):20170038.
Ishwaran, H. and Kogalur, U.B. (2007). "Random Survival Forests for R". R News, 7(2):25-31.
Ishwaran, H. and Kogalur, U.B. (2013). "Contributed R Package randomForestSRC: Random Forests for Survival, Regression and Classification (RF-SRC)" CRAN.
R package randomForestSRC
#===================================================
# Loading the library and its dependencies
#===================================================
library("IRSF")
## Not run:
#===================================================
# IRSF package news
#===================================================
IRSF.news()
#================================================
# IRSF package citation
#================================================
citation("IRSF")
#===================================================
# Loading of the Synthetic and Real datasets
# Use help for descriptions
#===================================================
data("MACS", package="IRSF")
?MACS
head(MACS)
#===================================================
# Synthetic dataset
# Continuous case:
# All variables xj, j in {1,...,p}, are iid
# from a multivariate uniform distribution
# with parameters a=1, b=5, i.e. on [1, 5].
# rho = 0.50
# Regression model: X1 + X2 + X1X2
#===================================================
seed <- 1234567
set.seed(seed)
n <- 200
p <- 5
x <- matrix(data=runif(n=n*p, min=1, max=5),
            nrow=n, ncol=p, byrow=FALSE,
            dimnames=list(1:n, paste("X", 1:p, sep="")))
beta <- c(rep(1,2), rep(0,p-2), 1)
covar <- cbind(x, "X1X2"=x[,1]*x[,2])
eta <- covar %*% beta # regression function
seed <- 1234567
set.seed(seed)
lambda0 <- 1
lambda <- lambda0 * exp(eta - mean(eta)) # hazards function
tt <- rexp(n=n, rate=lambda) # true (uncensored) event times
tc <- runif(n=n, min=0, max=3.9) # censoring times
stime <- pmin(tt, tc) # observed event times
status <- 1 * (tt <= tc) # observed event indicator
X <- data.frame(stime, status, x)
#===================================================
# Synthetic dataset
# Ranking of pairwise interactions between individual
# or noise variables by bivariate
# Interaction Minimal Depth of a Maximal Subtree (IMDMS)
# Serial mode
#===================================================
X.int.mdms <- rsf.int(X=X,
                      ntree=1000,
                      method="imdms",
                      splitrule="logrank",
                      importance="random",
                      B=1000,
                      ci=90,
                      parallel=FALSE,
                      conf=NULL,
                      verbose=FALSE,
                      seed=seed)
#===================================================
# Examples of parallel backend parametrization
#===================================================
if (require("parallel")) {
  cat("'parallel' is attached correctly \n")
} else {
  stop("'parallel' must be attached first \n")
}
#===================================================
# Ex. #1 - Multicore PC
# Running WINDOWS
# SOCKET communication cluster
# Shared memory parallelization
#===================================================
cpus <- parallel::detectCores(logical = TRUE)
conf <- list("spec" = rep("localhost", cpus),
             "type" = "SOCKET",
             "homo" = TRUE,
             "verbose" = TRUE,
             "outfile" = "")
#===================================================
# Ex. #2 - Master node + 3 Worker nodes cluster
# All nodes equipped with identical setups of multicores
# (8 core CPUs per machine for a total of 32)
# SOCKET communication cluster
# Distributed memory parallelization
#===================================================
masterhost <- Sys.getenv("HOSTNAME")
slavehosts <- c("compute-0-0", "compute-0-1", "compute-0-2")
nodes <- length(slavehosts) + 1
cpus <- 8
conf <- list("spec" = c(rep(masterhost, cpus),
                        rep(slavehosts, cpus)),
             "type" = "SOCKET",
             "homo" = TRUE,
             "verbose" = TRUE,
             "outfile" = "")
#===================================================
# Ex. #3 - Enterprise Multinode Cluster w/ multicore/node
# Running LINUX with SLURM scheduler
# MPI communication cluster
# Distributed memory parallelization
# Below, variable 'cpus' is the total number of requested
# tasks (threads/CPUs), which is specified from within a
# SLURM script.
#==================================================
if (require("Rmpi")) {
  cat("Rmpi is loaded correctly \n")
} else {
  stop("Rmpi must be installed first to use MPI\n")
}
cpus <- as.numeric(Sys.getenv("SLURM_NTASKS"))
conf <- list("spec" = cpus,
             "type" = "MPI",
             "homo" = TRUE,
             "verbose" = TRUE,
             "outfile" = "")
#===================================================
# Real dataset
#===================================================
seed <- 1234567
data("MACS", package="IRSF")
X <- MACS[,c("TTX","EventX","Race","Group3",
             "DEFB.CNV3","CCR2.SNP","CCR5.SNP2",
             "CCR5.ORF","CXCL12.SNP2")]
#===================================================
# Real dataset
# Ranking of pairwise interactions between individual
# or noise variables by bivariate
# Interaction Minimal Depth of a Maximal Subtree (IMDMS)
# Entries [i][j] indicate the normalized minimal depth
# of a variable [j] w.r.t. the maximal subtree for variable [i]
# (normalized w.r.t. the size of [i]'s maximal subtree).
#===================================================
MACS.int.mdms <- rsf.int(X=X,
                         ntree=1000,
                         method="imdms",
                         splitrule="logrank",
                         importance="random",
                         B=1000,
                         ci=80,
                         parallel=TRUE,
                         conf=conf,
                         verbose=TRUE,
                         seed=seed)
## End(Not run)