Description
This function runs multiPIM once on the actual data, then samples with replacement from the rows of the data and runs multiPIM again (with the same options) the desired number of times.
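The resampling scheme described above can be sketched in base R as follows. The function boot_estimates() and the toy difference-in-means estimator are hypothetical stand-ins for multiPIM. Resampling whole clusters defined by id, rather than individual rows, is an assumption suggested by the id argument. This is not the multiPIMboot source.

```r
# Hypothetical sketch of the resampling loop described above; NOT the
# multiPIMboot source. `estimator` stands in for a multiPIM run and must map a
# data frame to a numeric vector of fixed length.
boot_estimates <- function(data, estimator, times = 5000,
                           id = seq_len(nrow(data))) {
  stopifnot(times >= 2, length(id) == nrow(data))
  original <- estimator(data)                  # one run on the actual data
  k <- length(original)
  # row indices belonging to each cluster
  rows_by_cluster <- split(seq_len(nrow(data)), id, drop = TRUE)
  reps <- vapply(seq_len(times), function(b) {
    # assumption: draw whole clusters with replacement; with the default id
    # (one row per cluster) this is ordinary row resampling
    drawn <- sample(names(rows_by_cluster), length(rows_by_cluster),
                    replace = TRUE)
    rows <- unlist(rows_by_cluster[drawn], use.names = FALSE)
    estimator(data[rows, , drop = FALSE])
  }, numeric(k))
  # one row per bootstrap replicate, one column per estimate
  list(original = original,
       replicates = matrix(reps, nrow = times, ncol = k, byrow = TRUE))
}

# Usage with toy data (not multiPIM):
set.seed(1)
toy <- data.frame(A = rep(0:1, each = 50), Y = rnorm(100))
diff_means <- function(d) mean(d$Y[d$A == 1]) - mean(d$Y[d$A == 0])
res <- boot_estimates(toy, diff_means, times = 200)
sd(res$replicates[, 1])   # bootstrap standard error of the toy estimate
```

With the default id, every row is its own cluster. This matches the usage default id = 1:nrow(Y).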
Usage
multiPIMboot(Y, A, W = NULL,
times = 5000,
id = 1:nrow(Y),
multicore = FALSE,
mc.num.jobs,
mc.seed = 123,
estimator = c("TMLE", "DR-IPCW", "IPCW", "G-COMP"),
g.method = "main.terms.logistic", g.sl.cands = NULL,
g.num.folds = NULL, g.num.splits = NULL,
Q.method = "sl", Q.sl.cands = "default",
Q.num.folds = 5, Q.num.splits = 1,
Q.type = NULL,
adjust.for.other.As = TRUE,
truncate = 0.05,
return.final.models = TRUE,
na.action,
verbose = FALSE,
extra.cands = NULL,
standardize = TRUE,
...)
Arguments
Y |
a data frame of outcomes containing only numeric (integer or double) values. See details section of |
A |
a data frame containing binary exposure variables. Binary means that all values must be either 0 (indicating unexposed, or part of target group) or 1 (indicating exposed or not part of target group). Must have unique names. |
W |
an optional data frame containing possible confounders of the effects of the variables in |
times |
single integer greater than or equal to 2. The number of bootstrap replicates of |
id |
a vector that identifies clusters. If observations i and j are in the same cluster, then |
multicore |
logical value indicating whether bootstrapping should be done using multiple simultaneous jobs (as of multiPIM version 1.3-1, this requires the parallel package, which is distributed with R version 2.14.0 or later; earlier versions of multiPIM relied on the CRAN packages multicore and rlecuyer for this feature). |
mc.num.jobs |
number of simultaneous multicore jobs, e.g. if you want to use a quad core processor with hyperthreading, use |
mc.seed |
integer value with which to seed the RNG when using parallel processing (internally, |
estimator |
the estimator to be used. The default is |
g.method |
a length one character vector indicating the regression method to use in modelling g. The default value, |
g.sl.cands |
character vector of length >= 2 indicating the candidate algorithms that the super learner fits for g should use. The possible values may be taken from the vector |
g.num.folds |
the number of folds to use in cross-validating the super learner fit for g (i.e. the v for v-fold cross-validation). Ignored if |
g.num.splits |
the number of times to randomly split the data into |
Q.method |
character vector of length 1. The regression method to use in modelling Q. See details to find out which values are allowed. The default value, |
Q.sl.cands |
either of the length 1 character values |
Q.num.folds |
the number of folds to use in cross-validating the super learner fit for Q (i.e. the v for v-fold cross-validation). Ignored if |
Q.num.splits |
the number of times to randomly split the data into |
Q.type |
either |
adjust.for.other.As |
a single logical value indicating whether the other columns of |
truncate |
either |
return.final.models |
single logical value indicating whether final g and Q models should be returned by the function (in the slots |
na.action |
currently ignored. If any of |
verbose |
single logical value. Should messages about the progress of the evaluation be printed? Some of the candidate algorithms may print messages even when |
extra.cands |
a named list of functions. This argument provides a way for the user to specify his or her own functions to use either as stand-alone regression methods, or as candidates for a super learner. See details section of |
standardize |
should all predictor variables be standardized before certain regression methods are run? Passed to all candidates, but used only by some (at this point, lars, penalized.bin and penalized.cont). |
... |
currently ignored. |
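The requirements on Y and A above can be checked before starting a long bootstrap run. The helper check_inputs() below is a hypothetical sketch of such checks: numeric columns in Y, only 0/1 values and unique names in A, and matching row counts. multiPIMboot does its own validation, which may differ in detail.

```r
# Hypothetical pre-flight checks mirroring the documented requirements on Y and
# A; not part of multiPIM. Rows of Y and A are assumed to correspond.
check_inputs <- function(Y, A) {
  stopifnot(is.data.frame(Y), is.data.frame(A), nrow(Y) == nrow(A))
  # Y: only numeric (integer or double) columns
  if (!all(vapply(Y, is.numeric, logical(1))))
    stop("all columns of Y must be numeric (integer or double)")
  # A: every value must be 0 or 1 (NA is rejected as well)
  if (!all(vapply(A, function(a) all(a %in% c(0, 1)), logical(1))))
    stop("all values of A must be 0 or 1")
  # A: column names must be unique
  if (anyDuplicated(names(A)) > 0)
    stop("the columns of A must have unique names")
  invisible(TRUE)
}

ok <- check_inputs(data.frame(y = c(1.5, 2)), data.frame(a1 = c(0L, 1L)))
```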
Details
Bootstrap standard errors can be calculated by running the summary function on the multiPIMboot result (see summary.multiPIM).
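A common definition of the bootstrap standard error is the sample standard deviation of the estimates across bootstrap replicates. The sketch below computes that from a three-dimensional array such as boot.param.array. The layout of that slot is not given here, so the replicate dimension is passed through the hypothetical argument rep.dim. Whether summary.multiPIM uses exactly this formula is also an assumption.

```r
# Hypothetical helper: standard deviation over the replicate dimension of a
# bootstrap array, returned as an array over the remaining dimensions.
boot_se <- function(boot.array, rep.dim = 1) {
  keep <- setdiff(seq_along(dim(boot.array)), rep.dim)
  apply(boot.array, keep, sd)
}

# Toy array: 1000 replicates x 2 exposures x 3 outcomes (layout assumed).
set.seed(42)
toy.boot <- array(rnorm(1000 * 2 * 3, sd = 0.1), dim = c(1000, 2, 3))
se <- boot_se(toy.boot, rep.dim = 1)   # 2 x 3 matrix, entries near 0.1
```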
As of multiPIM version 1.3-1, support for multicore processing is through R's parallel package (distributed with R as of version 2.14.0).
For more details on how to use the arguments, see the details section for multiPIM.
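When multicore = TRUE, the replicates are spread over several jobs with the parallel package and seeded from mc.seed. One standard way to make such parallel resampling reproducible is to combine the "L'Ecuyer-CMRG" generator with parallel::mclapply(), which gives each forked job its own random-number stream. The helper parallel_resample() below is a hypothetical sketch of that idea, not multiPIMboot's internal code. Because mclapply() forks, only one core can be used on Windows.

```r
library(parallel)

# Hypothetical: draw `times` bootstrap index vectors for n rows across
# `num.jobs` jobs, reproducibly for a fixed seed and number of jobs.
parallel_resample <- function(n, times, num.jobs, seed = 123) {
  # note: this switches the session's RNG kind to L'Ecuyer-CMRG
  set.seed(seed, kind = "L'Ecuyer-CMRG")
  # contiguous chunks of replicate numbers, one chunk per job
  chunks <- split(seq_len(times), ceiling(seq_len(times) * num.jobs / times))
  res <- mclapply(chunks, function(reps) {
    lapply(reps, function(r) sample.int(n, n, replace = TRUE))
  }, mc.cores = num.jobs, mc.set.seed = TRUE)
  # flatten to a list of `times` index vectors
  unlist(res, recursive = FALSE, use.names = FALSE)
}

jobs <- if (.Platform$OS.type == "windows") 1L else 2L
idx <- parallel_resample(50, 10, jobs, seed = 1)
```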
Value
Returns an object of class "multiPIM" which is identical to the object resulting from running the multiPIM function on the original data, except for two slots which are slightly different: the call slot contains a copy of the original call to multiPIMboot, and the boot.param.array slot now contains the bootstrap distribution of the parameter estimates obtained by running multiPIM on the bootstrap replicates of the original data. Thus the object returned has the following slots:
param.estimates |
a matrix of dimensions |
plug.in.stand.errs |
a matrix with the same dimensions as |
call |
a copy of the call to |
num.exposures |
this will be set to |
num.outcomes |
this will be set to |
W.names |
the names attribute of the |
estimator |
the estimator used. |
g.method |
the method used for modelling g. |
g.sl.cands |
in case super learning was used for g, the candidates used in the super learner. Will be |
g.winning.cands |
if super learning was used for g, this will be a named character vector with |
g.cv.risk.array |
array with dim attribute |
g.final.models |
a list of length |
g.num.folds |
the number of folds used for cross validation in the super learner for g. Will be |
g.num.splits |
the number of splits used for cross validation in the super learner for g. Will be |
Q.method |
the method used for modeling Q. Will be |
Q.sl.cands |
in case super learning was used for Q, the candidates used in the super learner. Will be |
Q.winning.cands |
if super learning was used for Q, this will be a named character vector with |
Q.cv.risk.array |
array with dim attribute |
Q.final.models |
a list of length |
Q.num.folds |
the number of folds used for cross validation in the super learner for Q. Will be |
Q.num.splits |
the number of splits used for cross validation in the super learner for Q. Will be |
Q.type |
either |
adjust.for.other.As |
logical value indicating whether the other columns of |
truncate |
the value of the |
truncation.occured |
logical value indicating whether it was necessary to truncate. |
standardize |
the value of the |
boot.param.array |
a three dimensional array with |
main.time |
time (in seconds) taken for the main run of multiPIM on the original data. |
g.time |
time in seconds taken for running g models. |
Q.time |
time in seconds taken for running Q models. |
g.sl.time |
if g.method is "sl", time in seconds taken for running cross-validation of g models. |
Q.sl.time |
if Q.method is "sl", time in seconds taken for running cross-validation of Q models. |
g.sl.cand.times |
if g.method is "sl", named vector containing time taken, with each element corresponding to a super learner candidate for g. |
Q.sl.cand.times |
if Q.method is "sl", named vector containing time taken, with each element corresponding to a super learner candidate for Q. |
Note that all timing results apply only to the first run of multiPIM on the original data, not to the subsequent bootstrap runs.
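The truncate argument and the truncation.occured slot can be illustrated with a small helper that bounds estimated probabilities away from zero and reports whether any value was changed. Which probabilities multiPIM truncates, and whether it bounds them from one or both sides, is not stated here, so the lower-bound-only rule in the hypothetical truncate_probs() is an assumption.

```r
# Hypothetical helper: raise estimated probabilities below `level` to `level`
# (lower bound only, by assumption) and report whether any value was changed.
truncate_probs <- function(p, level = 0.05) {
  stopifnot(is.numeric(p), level > 0, level < 1)
  list(values = pmax(p, level), occurred = any(p < level))
}

out <- truncate_probs(c(0.01, 0.2, 0.9))   # values 0.05, 0.2, 0.9; occurred TRUE
```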
Author(s)
Stephan Ritter, with design contributions from Alan Hubbard and Nicholas Jewell.
References
Ritter, Stephan J., Jewell, Nicholas P. and Hubbard, Alan E. (2014) “R Package multiPIM: A Causal Inference Approach to Variable Importance Analysis.” Journal of Statistical Software 57, 8: 1–29. http://www.jstatsoft.org/v57/i08/.
Hubbard, Alan E. and van der Laan, Mark J. (2008) “Population Intervention Models in Causal Inference.” Biometrika 95, 1: 35–47.
Young, Jessica G., Hubbard, Alan E., Eskenazi, Brenda, and Jewell, Nicholas P. (2009) “A Machine-Learning Algorithm for Estimating and Ranking the Impact of Environmental Risk Factors in Exploratory Epidemiological Studies.” U.C. Berkeley Division of Biostatistics Working Paper Series, Working Paper 250. http://www.bepress.com/ucbbiostat/paper250
van der Laan, Mark J. and Rose, Sherri (2011) Targeted Learning, Springer, New York. ISBN: 978-1441997814
Sinisi, Sandra E., Polley, Eric C., Petersen, Maya L., Rhee, Soo-Yon and van der Laan, Mark J. (2007) “Super Learning: An Application to the Prediction of HIV-1 Drug Resistance.” Statistical Applications in Genetics and Molecular Biology 6, 1: article 7. http://www.bepress.com/sagmb/vol6/iss1/art7
van der Laan, Mark J., Polley, Eric C. and Hubbard, Alan E. (2007) “Super Learner.” Statistical Applications in Genetics and Molecular Biology 6, 1: article 25. http://www.bepress.com/sagmb/vol6/iss1/art25
See Also
multiPIM for the main function, which is called by multiPIMboot.
summary.multiPIM for printing summaries of the results.
Candidates to see which candidates are currently available, and for information on writing user-defined super learner candidates and regression methods.
Examples
## Warning: This would take a very long time to run!
## Not run:
## load example from multiPIM help file
example(multiPIM)
## this would run 5000 bootstrap replicates:
boot.result <- multiPIMboot(Y, A)
summary(boot.result)
## End(Not run)