Description

The parameter of interest is a type of causal attributable risk. One effect measure (and a corresponding plug-in standard error) will be calculated for each exposure-outcome pair. The default is to use a Targeted Maximum Likelihood Estimator (TMLE). The other available estimators are Inverse Probability of Censoring Weighted (IPCW), Double-Robust IPCW (DR-IPCW), and Graphical Computation (G-COMP) estimators. PIM stands for Population Intervention Model.
Usage

multiPIM(Y, A, W = NULL,
         estimator = c("TMLE", "DR-IPCW", "IPCW", "G-COMP"),
         g.method = "main.terms.logistic", g.sl.cands = NULL,
         g.num.folds = NULL, g.num.splits = NULL,
         Q.method = "sl", Q.sl.cands = "default",
         Q.num.folds = 5, Q.num.splits = 1,
         Q.type = NULL,
         adjust.for.other.As = TRUE,
         truncate = 0.05,
         return.final.models = TRUE,
         na.action,
         check.input = TRUE,
         verbose = FALSE,
         extra.cands = NULL,
         standardize = TRUE,
         ...)
Arguments

Y: a data frame of outcomes containing only numeric (integer or double) values. See details for the default method of determining, based on the values in Y, which type of regression will be used for modelling Q. Must have unique names.

A: a data frame containing binary exposure variables. Binary means that all values must be either 0 (indicating unexposed, or part of the target group) or 1 (indicating exposed, or not part of the target group). Must have unique names.

W: an optional data frame containing possible confounders of the effects of the variables in A on the variables in Y. Must have unique names.

estimator: the estimator to be used. The default is "TMLE".

g.method: a length one character vector indicating the regression method to use in modelling g. The default value, "main.terms.logistic", specifies main terms logistic regression; specify "sl" to use super learning for g.

g.sl.cands: character vector of length >= 2 indicating the candidate algorithms that the super learner fits for g should use. The possible values may be taken from the vector all.bin.cands (or from the names of the extra.cands list, if supplied). Ignored unless g.method is "sl".

g.num.folds: the number of folds to use in cross-validating the super learner fit for g (i.e. the v for v-fold cross-validation). Ignored unless g.method is "sl".

g.num.splits: the number of times to randomly split the data into g.num.folds folds. Ignored unless g.method is "sl".

Q.method: character vector of length 1. The regression method to use in modelling Q. See details to find out which values are allowed. The default value, "sl", specifies super learning.

Q.sl.cands: either of the length 1 character values "default" or "all", or a character vector of length >= 2 naming the candidates the super learner for Q should use. See details. Ignored unless Q.method is "sl".

Q.num.folds: the number of folds to use in cross-validating the super learner fit for Q (i.e. the v for v-fold cross-validation). Ignored unless Q.method is "sl".

Q.num.splits: the number of times to randomly split the data into Q.num.folds folds. Ignored unless Q.method is "sl".

Q.type: either NULL (the default), "binary.outcome" or "continuous.outcome". If non-NULL, overrides the default mechanism for determining, from the values in Y, which type of regression to use for Q. See details.

adjust.for.other.As: a single logical value indicating whether the other columns of A should be included as possible confounders in the g and Q models used in calculating the effect of each column of A. See details. Relevant only when A has more than one column.

truncate: either FALSE (for no truncation), or a single value specifying the level at which predicted g values should be truncated (bounded away from 0), to protect against instability caused by very small denominators in the weights. The default is 0.05.

return.final.models: single logical value indicating whether final g and Q models should be returned by the function (in the slots g.final.models and Q.final.models of the returned object). The default is TRUE.

na.action: currently ignored. multiPIM will throw an error if any of Y, A or W contains missing values.

check.input: a single logical value indicating whether all of the input to the function should be subjected to strict error checking.

verbose: a single logical value indicating whether messages about the progress of the evaluation should be printed out. Some of the candidate algorithms may print messages even when verbose is FALSE.

extra.cands: a named list of functions. This argument provides a way for the user to specify his or her own functions to use either as stand-alone regression methods, or as candidates for a super learner. See details.

standardize: a single logical value indicating whether predictor variables should be standardized before certain regression methods are run. Passed to all candidates, but currently used only by some (lars, penalized.bin and penalized.cont).

...: currently ignored.
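To illustrate the role of the truncate argument: predicted g values (exposure probabilities) that are very close to 0 make inverse weights blow up. A minimal base-R sketch of this kind of bounding (truncate_g is a hypothetical helper, not the package's code, and the package's exact truncation rule may differ):

```r
## Bound predicted probabilities away from 0, as truncation of g
## values does (hypothetical helper; multiPIM does this internally).
truncate_g <- function(g.hat, truncate = 0.05) {
  if (identical(truncate, FALSE)) return(g.hat)  # no truncation requested
  pmax(g.hat, truncate)                          # enforce the lower bound
}

g.hat <- c(0.001, 0.04, 0.2, 0.9)
truncate_g(g.hat)                    # 0.05 0.05 0.20 0.90
truncate_g(g.hat, truncate = FALSE)  # unchanged
```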
Details

The parameter of interest is a type of attributable risk. This means that it is a measure (adjusted for known confounders) of the difference between the mean value of Y for the units in the target (or unexposed) group and the overall mean value of Y. Units which are in the target (or unexposed) group with respect to one of the variables in A are characterized as such by having the value 0 in the respective column of A. Members of the non-target (or exposed) group should have a 1 in that column of A. Assuming all causal assumptions hold (see the paper), each parameter estimate can be thought of as estimating the hypothetical effect on the respective outcome of totally eliminating the respective exposure from the population (i.e. setting everyone to 0 for that exposure). For example, in the case of a binary outcome, a parameter estimate for exposure x and outcome y of -0.03 could be interpreted as follows: the effect of an intervention in which the entire population was set to exposure x = 0 would be to reduce the level of outcome y by 3 percentage points.
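The unadjusted (no-confounder) analogue of this parameter is simply the mean outcome among the unexposed minus the overall mean outcome, E[Y | A = 0] - E[Y]. A small base-R illustration on simulated data (this does not use the package's confounder-adjusted estimators):

```r
## Unadjusted analogue of the attributable-risk parameter.
set.seed(1)
n <- 1000
a <- rbinom(n, 1, 0.5)    # exposure: 0 = target (unexposed) group
y <- rnorm(n) + 0.5 * a   # exposure raises the mean outcome by 0.5

psi.naive <- mean(y[a == 0]) - mean(y)
psi.naive  # negative: eliminating the exposure would lower the mean of Y
```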
If check.input is TRUE (which is the default and is highly recommended), all of the arguments will be checked to make sure they have permissible values. Many of the arguments, especially those for which a single logical value (TRUE or FALSE) or a single character value (such as, for example, "all") is expected, are checked using the identical function, which means that if any of these arguments has any extraneous attributes (such as names), this may cause multiPIM to throw an error.

On the other hand, the arguments Y and A (and W, if it is non-null) must have valid names attributes. multiPIM will throw an error if there is any overlap between the names of the columns of these data frames, or if any of the names cannot be used in a formula (for example, because it begins with a number and not a letter).
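A sketch of this kind of name validation in base R (not the package's actual code): names must not overlap across the three data frames, and formula-safety can be approximated by requiring that make.names() leaves each name unchanged.

```r
## Check column names before calling multiPIM (illustrative sketch).
Y <- data.frame(y1 = c(0, 1))
A <- data.frame(a1 = c(1, 0))
W <- data.frame("2badname" = c(0.3, -1.2), check.names = FALSE)

all.names <- c(names(Y), names(A), names(W))

anyDuplicated(all.names) == 0                  # TRUE: no overlap
## names that would not survive in a formula:
all.names[all.names != make.names(all.names)]  # "2badname"
```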
By default, the regression methods which will be allowed for fitting models for Q will be determined from the contents of Y as follows: if all values in Y are either 0 or 1 (i.e. all outcomes are binary), then "logistic"-type regression methods will be used (and only these methods will be allowed in the arguments Q.method and Q.sl.cands); however, if there are any values in Y which are not equal to 0 or 1, then it will be assumed that all outcomes are continuous, "linear"-type regression will be used, and the values allowed for Q.method and Q.sl.cands will change accordingly. This behavior can be overridden by specifying Q.type as either "binary.outcome" (for logistic-type regression) or "continuous.outcome" (for linear-type regression). If Q.type is specified, Y will not be checked to see whether all outcomes are binary.
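The default determination described above amounts to a check like the following (infer_Q_type is a hypothetical helper written for illustration; multiPIM performs this check internally):

```r
## Infer the regression type for Q from the outcomes:
## binary if and only if every value of Y is 0 or 1.
infer_Q_type <- function(Y) {
  if (all(unlist(Y) %in% c(0, 1))) "binary.outcome" else "continuous.outcome"
}

infer_Q_type(data.frame(y1 = c(0, 1, 1, 0)))   # "binary.outcome"
infer_Q_type(data.frame(y1 = c(0.2, 1.5, 3)))  # "continuous.outcome"
```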
The values allowed for Q.method (which should have length 1) are: either "sl", if one would like to use super learning, or one of the elements of the vector all.bin.cands (for the binary outcome case) or of all.cont.cands (for the continuous outcome case), if one would like to use only a particular regression method for all modelling of Q. If Q.method is given as "sl", then the candidates used by the super learner will be determined from the value of Q.sl.cands. If the value of Q.sl.cands is "default", then the candidates listed in either default.bin.cands or default.cont.cands will be used. If the value of Q.sl.cands is "all", then the candidates listed in either all.bin.cands or all.cont.cands will be used. The function will automatically choose the candidates which correspond to the correct outcome type (binary or continuous). Alternatively, one may specify Q.sl.cands explicitly as a vector of names of the candidates to be used.
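The candidate-selection step of a super learner can be sketched as v-fold cross-validation that picks the candidate with the smallest cross-validated risk. A simplified, "discrete" version in base R (the package's actual super learner is more involved; the candidate names here are made up for the example):

```r
## Simplified discrete super learner: choose between two candidate
## regressions for a continuous outcome by 5-fold cross-validated MSE.
set.seed(42)
n <- 200
w <- rnorm(n)
dat <- data.frame(w = w, y = 1 + 2 * w + rnorm(n))

cands <- list(
  intercept.only = function(train, test) rep(mean(train$y), nrow(test)),
  main.terms     = function(train, test)
    predict(lm(y ~ w, data = train), newdata = test)
)

v <- 5
fold <- sample(rep(1:v, length.out = n))  # random fold assignment
cv.risk <- sapply(cands, function(fit)
  mean(sapply(1:v, function(k) {
    pred <- fit(dat[fold != k, ], dat[fold == k, ])
    mean((dat$y[fold == k] - pred)^2)     # held-out MSE for fold k
  })))

names(which.min(cv.risk))  # the "winning" candidate
```

Since the data are truly linear in w, the main-terms candidate attains the lower cross-validated risk and is selected.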
If A has more than one column, the adjust.for.other.As argument can be used to specify whether the other columns of A should possibly be included in the g and Q models which will be used in calculating the effect of a certain column of A on each column of Y.
With the argument extra.cands, one may supply alternative R functions to be used as stand-alone regression methods, or as super learner candidates, within the multiPIM function. extra.cands should be given as a named list of functions. See Candidates for the form (e.g. arguments) that the functions in this list should have. In order to supply your own stand-alone regression method for g or Q, simply specify g.method or Q.method as the name of the function you want to use (i.e. the corresponding element of the names attribute of extra.cands). To add candidates to a super learner, simply use the corresponding names of your functions (from the names attribute of extra.cands) when you supply the g.sl.cands or Q.sl.cands arguments. Note that you may mix and match between your own extra candidates and the built-in candidates given in the all.bin.cands and all.cont.cands vectors. Note also that extra candidates must be explicitly specified as g.method, Q.method, or as elements of g.sl.cands or Q.sl.cands; specifying Q.sl.cands as "all" will not cause any extra candidates to be used.
Value

Returns an object of class "multiPIM" with the following elements:

param.estimates: a matrix of parameter estimates, with one row for each column of Y and one column for each column of A, named accordingly.

plug.in.stand.errs: a matrix with the same dimensions (and dimnames) as param.estimates, containing the plug-in standard errors of the parameter estimates.

call: a copy of the call to multiPIM which generated this object.

num.exposures: this will be set to ncol(A).

num.outcomes: this will be set to ncol(Y).

W.names: the names attribute of the W data frame, or NULL if W was not supplied.

estimator: the estimator used.

g.method: the method used for modelling g.

g.sl.cands: in case super learning was used for g, the candidates used in the super learner. Will be NULL if super learning was not used for g.

g.winning.cands: if super learning was used for g, this will be a named character vector identifying, for each g model, the candidate which "won" the cross-validation. Will be NULL otherwise.

g.cv.risk.array: an array containing the cross-validated risk estimates for the super learner candidates for g. Will be NULL if super learning was not used for g.

g.final.models: a list containing the final fitted g models. Will be NULL if return.final.models is FALSE.

g.num.folds: the number of folds used for cross-validation in the super learner for g. Will be NULL if super learning was not used for g.

g.num.splits: the number of splits used for cross-validation in the super learner for g. Will be NULL if super learning was not used for g.

Q.method: the method used for modelling Q. Will be NULL if no Q modelling was required (as for the IPCW estimator).

Q.sl.cands: in case super learning was used for Q, the candidates used in the super learner. Will be NULL if super learning was not used for Q.

Q.winning.cands: if super learning was used for Q, this will be a named character vector identifying, for each Q model, the candidate which "won" the cross-validation. Will be NULL otherwise.

Q.cv.risk.array: an array containing the cross-validated risk estimates for the super learner candidates for Q. Will be NULL if super learning was not used for Q.

Q.final.models: a list containing the final fitted Q models. Will be NULL if return.final.models is FALSE.

Q.num.folds: the number of folds used for cross-validation in the super learner for Q. Will be NULL if super learning was not used for Q.

Q.num.splits: the number of splits used for cross-validation in the super learner for Q. Will be NULL if super learning was not used for Q.

Q.type: either "binary.outcome" or "continuous.outcome", indicating which type of regression was used for modelling Q.

adjust.for.other.As: logical value indicating whether the other columns of A were included as possible confounders in the g and Q models.

truncate: the value of the truncate argument.

truncation.occured: logical value indicating whether it was necessary to truncate.

standardize: the value of the standardize argument.

boot.param.array: this slot will be NULL for objects returned by multiPIM; it is filled in by multiPIMboot with the bootstrap parameter estimates.

main.time: total time (in seconds) taken to generate this multiPIM result.

g.time: time in seconds taken for running g models.

Q.time: time in seconds taken for running Q models.

g.sl.time: if g.method is "sl", time in seconds taken for running cross-validation of g models.

Q.sl.time: if Q.method is "sl", time in seconds taken for running cross-validation of Q models.

g.sl.cand.times: if g.method is "sl", named vector containing the time taken by each super learner candidate for g.

Q.sl.cand.times: if Q.method is "sl", named vector containing the time taken by each super learner candidate for Q.
Author(s)

Stephan Ritter, with design contributions from Alan Hubbard and Nicholas Jewell.
References

Ritter, Stephan J., Jewell, Nicholas P. and Hubbard, Alan E. (2014) “R Package multiPIM: A Causal Inference Approach to Variable Importance Analysis.” Journal of Statistical Software 57, 8: 1–29. http://www.jstatsoft.org/v57/i08/.

Hubbard, Alan E. and van der Laan, Mark J. (2008) “Population Intervention Models in Causal Inference.” Biometrika 95, 1: 35–47.

Young, Jessica G., Hubbard, Alan E., Eskenazi, Brenda, and Jewell, Nicholas P. (2009) “A Machine-Learning Algorithm for Estimating and Ranking the Impact of Environmental Risk Factors in Exploratory Epidemiological Studies.” U.C. Berkeley Division of Biostatistics Working Paper Series, Working Paper 250. http://www.bepress.com/ucbbiostat/paper250

van der Laan, Mark J. and Rose, Sherri (2011) Targeted Learning. Springer, New York. ISBN: 978-1441997814.

Sinisi, Sandra E., Polley, Eric C., Petersen, Maya L., Rhee, Soo-Yon and van der Laan, Mark J. (2007) “Super Learning: An Application to the Prediction of HIV-1 Drug Resistance.” Statistical Applications in Genetics and Molecular Biology 6, 1: article 7. http://www.bepress.com/sagmb/vol6/iss1/art7

van der Laan, Mark J., Polley, Eric C. and Hubbard, Alan E. (2007) “Super Learner.” Statistical Applications in Genetics and Molecular Biology 6, 1: article 25. http://www.bepress.com/sagmb/vol6/iss1/art25
See Also

multiPIMboot for running multiPIM with automatic bootstrapping to get standard errors.

summary.multiPIM for printing summaries of the results.

Candidates to see which candidates are currently available, and for information on writing user-defined super learner candidates and regression methods.
Examples

num.columns <- 3
num.obs <- 250
set.seed(23)

## use rbinom with size = 1 to make a data frame of binary data
A <- as.data.frame(matrix(rbinom(num.columns*num.obs, 1, .5),
                          nrow = num.obs, ncol = num.columns))

## let Y[,i] depend only on A[,i] plus some noise
## (start with the noise then add a multiple of A[,i] to Y[,i])
Y <- as.data.frame(matrix(rnorm(num.columns*num.obs),
                          nrow = num.obs, ncol = num.columns))
for(i in 1:num.columns)
  Y[,i] <- Y[,i] + i * A[,i]

## make sure the names are unique
names(A) <- paste("A", 1:num.columns, sep = "")
names(Y) <- paste("Y", 1:num.columns, sep = "")

result <- multiPIM(Y, A)
summary(result)