Function to do relative importance calculations based on multiply imputed datasets
Description
The function mianalyze.relimp takes a list of imputed data frames (or matrices), calculates relative importance metrics for each of these and combines these metrics into overall estimates with estimated variances according to the method by Rubin (1987). The output object can be summarized, printed and plotted.
Usage
Arguments
implist 
list of data frames or matrices containing multiply-imputed datasets,
or an object of class imputationList. If no formula is given, the first column of each data frame/matrix is assumed to be the response variable, and the other columns are regressors. If a list of designs is also given, the 
level 
is a single confidence level (between 0.5 and 1) 
sort 
is a logical requesting output sorted by size of relative contribution
( 
... 
Further arguments, currently none available 
b 
is the number of bootstrap runs requested on boot.relimp (default: 
type 
cf. 
diff 
is a logical requesting bootstrapping of pairwise differences in relative importance ( 
no.CI 
if set to TRUE, suppresses calculation of confidence intervals and only averages estimated metrics
from all imputed data sets in implist. Currently, 
rela 
cf. 
always 
cf. 
groups 
cf. 
groupnames 
cf. 
deslist 
is a list of design object of class If deslist is not given, the function creates a list of designs using 
bootlist.out 
If TRUE, the individual bootstrap results for each multiply imputed data set are stored in the bootlist slot of the output object (may be storage-intensive). 
formula 
cf. 
weights 
is a vector of case weights for the observations in the data frame (or matrix).
You can EITHER specify 
strata 
is a strata request that will be handed to function 
ids 
is an id request that will be handed to function 
Details
Multiple imputation is a contemporary method for handling data with a substantial missing value problem. It produces a number of completed data sets (e.g. 10), the inference from which is subsequently combined. The most frequently used way of combination is the one by Rubin: estimates from the different completed data sets are averaged, and the variance is estimated by combining the average over the estimated variances (within-imputation variance) with an appropriately scaled variance between estimates. Confidence intervals are obtained by using a t-distribution with appropriately chosen degrees of freedom.
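The combination rule described above can be sketched for a single scalar estimate in base R. This is only an illustration of Rubin's rules under stated assumptions, not the internal code of mianalyze.relimp: the function name pool_rubin and the input numbers are hypothetical, and the degrees of freedom follow the classical formula from Rubin (1987).

```r
# Hypothetical inputs: one estimate and one estimated variance per completed
# data set (m = 5 imputations); the numbers are made up for illustration.
est    <- c(0.31, 0.28, 0.35, 0.30, 0.33)
within <- c(0.0040, 0.0036, 0.0045, 0.0041, 0.0038)

# pool_rubin is a hypothetical helper, not part of relaimpo
pool_rubin <- function(est, within, level = 0.95) {
  m     <- length(est)
  qbar  <- mean(est)                # pooled estimate: average over imputations
  w     <- mean(within)             # within-imputation variance
  b     <- var(est)                 # between-imputation variance
  total <- w + (1 + 1 / m) * b      # total variance
  r     <- (1 + 1 / m) * b / w      # relative increase in variance
  df    <- (m - 1) * (1 + 1 / r)^2  # degrees of freedom of the t-distribution
  half  <- qt(1 - (1 - level) / 2, df) * sqrt(total)
  c(estimate = qbar, variance = total, df = df,
    lower = qbar - half, upper = qbar + half)
}

pooled <- pool_rubin(est, within)
pooled
```

The resulting interval is symmetric around the averaged estimate by construction, which is the reason for the caution about symmetry expressed elsewhere on this page.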
The variance-covariance matrix of the vector of estimates for each individual completed data set is obtained from function withReplicates in package survey, based on survey's bootstrap replication weights. On request (bootlist.out=TRUE), the underlying bootstrap resamples are also stored in the bootlist slot of the output object. In this case, the list elements of the bootlist slot are objects of class relimplmboot and can be processed by function booteval.relimp. This can help in getting an impression of whether the overall aggregated confidence intervals are heavily distorted towards symmetry. If such sanity checking is intended, the default value for b should be substantially increased.
Function mianalyze.relimp needs a list of multiply-imputed data sets or an object of class imputationList for input.
Multiply imputed data sets can, within R, be obtained from various packages. Hints for creating lists of the form needed for mianalyze.relimp are given below for users of functions aregImpute, mice, and amelia. Users of packages norm, cat, mix, or pan (who have managed to operate these extremely uncomfortable packages) can of course also produce lists of imputed data sets (only less comfortably).
For an object imp of class mids obtained from function mice in package mice, the code
lapply(as.list(1:imp$m),function(obj) complete(imp,action=obj))
produces a list of multiply-imputed data sets as needed for function mianalyze.relimp.
For an object f of class aregImpute produced by function aregImpute in package Hmisc,
lapply(as.list(1:f$m),function(obj) impute(f,imputation=obj))
produces the required list of multiply-imputed data sets.
For an object output produced by function amelia in package Amelia, the code
output[1:output$amelia.args$m]
produces the list of multiply-imputed data sets as needed for function mianalyze.relimp.
For multiple imputation, practice is in many cases ahead of theory; this is no different with function mianalyze.relimp. Users should note that the validity of confidence intervals has only been proven for likelihood-based analyses. Since the metrics calculated in relaimpo are not strictly likelihood-based, the confidence intervals from function mianalyze.relimp must be considered approximate and experimental.
Value
The value returned by function mianalyze.relimp is an object of class relimplmbootMI (if no.CI = FALSE, default) or an object of class relimplm (if no.CI = TRUE). It can be printed, plotted and summarized using special methods. For extracting its items, the @ or $ extractors can be used.
In addition to the items described for function calc.relimp, which are also available here, the following items from class relimplmbootMI may be of interest for further calculations:
metric.lower 
matrix of lower confidence bounds for “metric”: one row for each confidence level,
one column for each element of “metric”. “metric” can be any of 
metric.upper 
matrix of upper confidence bounds for “metric”: one row for each confidence level, one column for each element of “metric” 
nboot 
number of bootstrap runs underlying the evaluations 
level 
confidence level 
MIresult 
object of class 
bootlist 
only available if bootlist.out=TRUE has been chosen;
list of objects of class relimplmboot; each list element can be input to function booteval.relimp
Warning
The confidence intervals produced here should be used for exploratory purposes only. They can be somewhat liberal and are likely to be too symmetric, particularly for small data sets. The confidence intervals produced by function mianalyze.relimp need further research into their behaviour and are currently considered experimental.
Be aware that the methods themselves (lmg and even more pmvd) need some computing time in case of many regressors. Hence, bootstrapping of multiple data sets should be used with awareness of computing time issues.
Note
There are two versions of this package. The version on CRAN is globally licensed under GPL version 2 (or later). There is an extended version with the interesting additional metric pmvd that is licensed according to GPL version 2 under the geographical restriction "outside of the US" because of potential issues with US patent 6,640,204. This version can be obtained from Ulrike Groemping's website (cf. references section). Whenever you load the package, a display tells you which version you are loading.
Author(s)
Ulrike Groemping, BHT Berlin
References
Chevan, A. and Sutherland, M. (1991) Hierarchical Partitioning. The American Statistician 45, 90–96.
Darlington, R.B. (1968) Multiple regression in psychological research and practice. Psychological Bulletin 69, 161–182.
Feldman, B. (2005) Relative Importance and Value. Manuscript (Version 1.1, March 19 2005), downloadable at http://www.prismanalytics.com/docs/RelativeImportance050319.pdf
Genizi, A. (1993) Decomposition of R2 in multiple regression with correlated regressors. Statistica Sinica 3, 407–420. Downloadable at http://www3.stat.sinica.edu.tw/statistica/password.asp?vol=3&num=2&art=10
Groemping, U. (2006) Relative Importance for Linear Regression in R: The Package relaimpo. Journal of Statistical Software 17, Issue 1. Downloadable at http://www.jstatsoft.org/v17/i01
Lindeman, R.H., Merenda, P.F. and Gold, R.Z. (1980) Introduction to Bivariate and Multivariate Analysis, Glenview IL: Scott, Foresman.
Little, R.J.A. and Rubin, D.B. (2002) Statistical Analysis with Missing Data, Wiley, New York.
Zuber, V. and Strimmer, K. (2010) Variable importance and model selection by decorrelation. Preprint, downloadable at http://www.uni-leipzig.de/strimmer/lab/publications/preprints/carscore2010.pdf
Go to http://prof.beuth-hochschule.de/groemping/ for further information and references.
See Also
relaimpo, calc.relimp, booteval.relimp, classesmethods.relaimpo
Examples
## load relaimpo together with mitools (data set smi) and survey (svydesign)
library(relaimpo)
library(mitools)
library(survey)
## smi contains a list of 5 imputed datasets (class imputationList) from package mitools
## (first element of smi is list of data frames)
## it is not a well-suited example for relative importance but easily available for demonstrating
## multiple imputation-related functionality
data(smi)
## obtain averaged estimates only, without confidence intervals
## works with factors and interactions
mianalyze.relimp(smi[[1]], formula = cistot ~ drkfre+sex+wave, no.CI = TRUE)
## for obtaining all individual estimates, use lapply:
smi.cr.list <- lapply(smi[[1]], function(obj) calc.relimp(cistot ~ drkfre+sex+wave, data=obj))
## display result for first individual imputed data set
smi.cr.list[[1]]
## obtain confidence intervals,
## currently only usable for models without calculated variables, factors, groups, interactions
## call without using weights, strata, clusters or a design list
mianalyze.relimp(smi[[1]], formula = cistot ~ mdrkfre+sex+wave)
## call using the id column (identical in all smi data sets) for cluster structure
ident <- smi[[1]][[1]]$id
mitest <- mianalyze.relimp(smi[[1]], formula = cistot ~ mdrkfre+sex+wave, ids=ident)
mitest
## post-process: look at intervals with different confidence level
summary(mitest,lev=0.8)
## call with design list
deslist <- lapply(smi[[1]], function(obj) svydesign(~id,strata=~sex,weights=~cistot,data=obj))
mitest <- mianalyze.relimp(smi[[1]], formula = cistot ~ mdrkfre+sex+wave, deslist=deslist,
    level=c(0.8))
mitest
