# mianalyze.relimp: Function to do relative importance calculations based on... In relaimpo: Relative importance of regressors in linear models

## Description

The function mianalyze.relimp takes a list of imputed data frames (or matrices), calculates relative importance metrics for each of these and combines these metrics into overall estimates with estimated variances according to the method by Rubin (1987). The output object can be summarized, printed and plotted.

## Usage

```r
mianalyze.relimp(implist, level = 0.95, sort = FALSE, ..., b = 50, type = "lmg",
    diff = TRUE, no.CI = FALSE, rela = FALSE, always = NULL, groups = NULL,
    groupnames = NULL, deslist = NULL, bootlist.out = FALSE, formula = NULL,
    weights = NULL, strata = NULL, ids = NULL)
```

## Arguments

| Argument | Description |
| --- | --- |
| `implist` | List of data frames or matrices containing multiply-imputed datasets, or an object of class `imputationList`. If no formula is given, the first column of each data frame/matrix is assumed to be the response variable, and the other columns are regressors. If a list of designs is also given, the `variables` component of each design must consist of the necessary columns from the respective entry in `implist`; if no formula is given, the `variables` component of each design must coincide (except for the order of columns) with the respective entry in `implist`. |
| `level` | A single confidence level (between 0.5 and 1). |
| `sort` | A logical requesting output sorted by size of relative contribution (`sort=TRUE`) or by variable position in list (`sort=FALSE`, default). |
| `...` | Further arguments; currently none available. |
| `b` | The number of bootstrap runs requested on `boot.relimp` (default: `b=50`). Set this to a higher number if you want to subsequently use the `bootlist` slot for calculating further confidence intervals with function `booteval.relimp`. |
| `type` | cf. `calc.relimp`. |
| `diff` | A logical requesting bootstrapping of pairwise differences in relative importance (`diff=TRUE`, default) for each metric in `type`. |
| `no.CI` | If set to `TRUE`, suppresses calculation of confidence intervals and only averages estimated metrics from all imputed data sets in `implist`. Currently, `no.CI = TRUE` is the only setting for which `mianalyze.relimp` works when using models with factors, groups or interactions. |
| `rela` | cf. `calc.relimp`. |
| `always` | cf. `calc.relimp`. |
| `groups` | cf. `calc.relimp`. |
| `groupnames` | cf. `calc.relimp`. |
| `deslist` | A list of design objects of class `survey.design` (cf. package `survey`). You can EITHER specify a `deslist` OR `weights` and/or `strata` and/or `ids`. Note that the design list must contain the same data objects (in the “variables” element) that are listed in `implist`, so that a lot of storage space is needed in case of large datasets. If `deslist` is not given, the function creates a list of designs using `weights`, `strata`, and `ids` information. Whenever the required designs are simple enough to be covered by specifying `weights`, `strata`, and `ids`, this is by far preferable in terms of storage. |
| `bootlist.out` | If `TRUE`, the individual bootstrap results for each multiply imputed data set are stored in the `bootlist` slot of the output object (may be storage-intensive). |
| `formula` | cf. `boot.relimp`. NOTE: If `no.CI = FALSE`, i.e. confidence intervals are not suppressed, `formula` has to follow the same restrictions as mentioned under item `design` for `boot.relimp` (no calculated variables, no interaction terms, no factors), since confidence interval calculations in `mianalyze.relimp` are design-based, even if no `deslist` option is given. |
| `weights` | A vector of case weights for the observations in the data frame (or matrix). You can EITHER specify `weights` OR a `deslist`. If `weights` is `NULL`, equal weights are assumed, unless otherwise specified in `deslist`. For the different types of weights and their appropriate treatment for obtaining confidence intervals, cf. the “Details” section of `boot.relimp`. |
| `strata` | A strata request that will be handed to function `svydesign` for defining the strata in a survey design (to be given to `mianalyze.relimp` without the `~`). You can EITHER specify `strata` OR a `deslist`. If `strata` is `NULL`, one stratum is assumed, unless otherwise specified in `deslist`. |
| `ids` | An id request that will be handed to function `svydesign` for defining the clusters in a survey design (to be given to `mianalyze.relimp` without the `~`). You can EITHER specify `ids` OR a `deslist`. If `ids` is `NULL`, it is assumed that there are no clusters, unless otherwise specified in `deslist`. |

## Details

Multiple imputation is a contemporary method for handling data with a substantial missing value problem. It produces a number of completed data sets (e.g. 10), and the inferences from these are subsequently combined. The most frequently used combination method is the one by Rubin: estimates from the different completed data sets are averaged; the variance is estimated by combining the average of the estimated variances (within-imputation variance) with an appropriately scaled variance between estimates (between-imputation variance); and confidence intervals are obtained from a t-distribution with appropriately chosen degrees of freedom.
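For reference, the combination rule for a single scalar quantity can be written out as follows. The notation is an assumption, since this page defines no symbols: $\hat{Q}_i$ is the estimate from the $i$-th of $m$ completed data sets, and $U_i$ is its estimated variance. The degrees of freedom shown are the classical large-sample form; whether `mianalyze.relimp` applies any refinement is not stated here.

$$
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i, \qquad
\bar{U} = \frac{1}{m}\sum_{i=1}^{m}U_i, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m}\left(\hat{Q}_i-\bar{Q}\right)^2
$$

$$
T = \bar{U} + \left(1+\frac{1}{m}\right)B, \qquad
\nu = (m-1)\left(1+\frac{\bar{U}}{\left(1+\frac{1}{m}\right)B}\right)^{2}, \qquad
\bar{Q} \pm t_{\nu,\,1-\alpha/2}\,\sqrt{T}
$$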

The variance-covariance matrix of the vector of estimates for each individual completed data set is obtained from function withReplicates in package survey based on survey's bootstrap replication weights. On request (`bootlist.out=TRUE`), the underlying bootstrap resamples are also stored in the `bootlist`-slot of the output object. In this case, list elements of the `bootlist`-slot are objects of class `relimplmboot` and can be processed by function `booteval.relimp`. This can help in getting an impression whether the overall aggregated confidence intervals are heavily distorted towards symmetry. If such sanity-checking is intended, the default value for `b` should be substantially increased.
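The combination step described at the start of this section can be sketched in base R for one scalar quantity. This is a minimal illustration, not the code used inside `mianalyze.relimp`, which works on the whole vector of metrics and its variance-covariance matrix. The function name `pool_rubin` and the input numbers are hypothetical.

```r
# Pool one scalar estimate across m completed data sets (Rubin's rules).
# est:        numeric vector of m point estimates
# within_var: numeric vector of their m estimated variances
#             (e.g. bootstrap variances from each completed data set)
pool_rubin <- function(est, within_var, level = 0.95) {
  m <- length(est)
  stopifnot(m >= 2, length(within_var) == m)
  q_bar <- mean(est)                 # pooled point estimate
  u_bar <- mean(within_var)          # within-imputation variance
  b     <- var(est)                  # between-imputation variance
  total <- u_bar + (1 + 1 / m) * b   # total variance
  # classical degrees of freedom for the t reference distribution
  df    <- (m - 1) * (1 + u_bar / ((1 + 1 / m) * b))^2
  half  <- qt(1 - (1 - level) / 2, df) * sqrt(total)
  c(estimate = q_bar, se = sqrt(total), df = df,
    lower = q_bar - half, upper = q_bar + half)
}

# Hypothetical lmg shares of one regressor from 5 completed data sets
est        <- c(0.21, 0.24, 0.19, 0.23, 0.22)
within_var <- c(0.0009, 0.0011, 0.0010, 0.0008, 0.0012)
pool_rubin(est, within_var)
```

The resulting interval is symmetric around the pooled estimate by construction, which is why comparing it with intervals from `booteval.relimp` on the stored bootstrap resamples can be informative.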

Function `mianalyze.relimp` needs a list of multiply-imputed data sets or an object of class `imputationList` for input. Multiply imputed data sets can - within R - be obtained from various packages. Hints for creating lists of the form needed for `mianalyze.relimp` are given below for users of functions `aregImpute`, `mice`, and `amelia`. Users of packages norm, cat, mix, or pan (who have managed to operate these extremely uncomfortable packages) can of course also produce lists of imputed data sets (only less comfortably).

For an object `imp` of class `mids` obtained from function `mice` in package mice, the code

`lapply(as.list(1:imp$m),function(obj) complete(imp,action=obj))`

produces a list of multiply-imputed data sets as needed for function `mianalyze.relimp`. For an object `f` of class `aregImpute` produced by function `aregImpute` in package Hmisc,

`lapply(as.list(1:f$m),function(obj) impute(imp,imputation=obj))`

produces the required list of multiply-imputed data sets. For an object `output` produced by function `amelia` in package Amelia, the code

`output[1:output$amelia.args$m]`

produces the list of multiply-imputed data sets as needed for function `mianalyze.relimp`.
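Whatever package produces the imputations, the result should be a plain list with one completed data frame (or matrix) per imputation. The base R sketch below builds such a list and checks its shape. Both `naive_complete` and `check_implist` are hypothetical helpers, and the mean-plus-noise fill-in is only a stand-in that yields objects of the right form; it is not a valid imputation model. Requiring identical column names across list entries is an assumption made for this check.

```r
set.seed(1)
toy <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20))
toy$x1[c(3, 7)] <- NA   # introduce some missing values

# Stand-in for a real imputation engine (NOT a proper imputation model):
# replaces each NA by the column mean plus random noise
naive_complete <- function(dat) {
  for (v in names(dat)) {
    miss <- is.na(dat[[v]])
    if (any(miss)) {
      dat[[v]][miss] <- mean(dat[[v]], na.rm = TRUE) +
        rnorm(sum(miss), sd = sd(dat[[v]], na.rm = TRUE))
    }
  }
  dat
}

# List of 5 completed data sets; the response y is the first column,
# as assumed when no formula is given
implist <- lapply(1:5, function(i) naive_complete(toy))

# Shape checks: a list of data frames/matrices with identical columns
# and no remaining missing values
check_implist <- function(implist) {
  stopifnot(is.list(implist), length(implist) >= 2)
  first_names <- colnames(implist[[1]])
  for (d in implist) {
    stopifnot(is.data.frame(d) || is.matrix(d))
    stopifnot(identical(colnames(d), first_names))
    stopifnot(!any(is.na(d)))
  }
  invisible(TRUE)
}
check_implist(implist)
```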

For multiple imputation, practice is in many cases ahead of theory; this is no different with function `mianalyze.relimp`. Users should note that the validity of confidence intervals has only been proven for likelihood-based analyses. Since the metrics calculated in relaimpo are not strictly likelihood-based, the confidence intervals from function `mianalyze.relimp` must be considered approximate and experimental.

## Value

The value returned by function `mianalyze.relimp` is an object of class `relimplmbootMI` (if `no.CI = FALSE`, default) or an object of class `relimplm` (if `no.CI = TRUE`). It can be printed, plotted and summarized using special methods. For extracting its items, the `@` or `$` extractors can be used.

In addition to the items described for function `calc.relimp`, which are also available here, the following items from class `relimplmbootMI` may be of interest for further calculations:

| Item | Description |
| --- | --- |
| `metric.lower` | Matrix of lower confidence bounds for “metric”: one row for each confidence level, one column for each element of “metric”. “metric” can be any of `lmg`, `lmg.rank`, `lmg.diff`, ... (replace `lmg` with other available relative importance metrics, cf. `calc.relimp`). |
| `metric.upper` | Matrix of upper confidence bounds for “metric”: one row for each confidence level, one column for each element of “metric”. |
| `nboot` | Number of bootstrap runs underlying the evaluations. |
| `level` | Confidence level. |
| `MIresult` | Object of class `MIresult` that can be processed by the function `summary.MIresult` from package survey. |
| `bootlist` | Only available if `bootlist.out=TRUE` has been chosen; list of objects as produced by `boot.relimp` (class `relimplmboot`); each list element can be input to function `booteval.relimp`. |

## Warning

The confidence intervals produced here should be used for exploratory purposes only. They can be somewhat liberal and are likely to be too symmetric particularly for small data sets. The confidence intervals produced by function `mianalyze.relimp` need further research into their behaviour and are currently considered experimental.

Be aware that the methods themselves (`lmg` and even more `pmvd`) need some computing time in case of many regressors. Hence, bootstrapping of multiple data sets should be used with awareness of computing time issues.
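To get a feeling for how quickly the effort grows, note that `lmg` averages over orderings of the regressors and therefore depends on the R^2 of every subset of the p regressors, i.e. on 2^p subsets. The sketch below multiplies this by the number of bootstrap runs and imputed data sets. It is a rough order-of-magnitude count under the assumption that every bootstrap replicate on every imputed data set recomputes all subsets; the function names and the default of 5 imputations are hypothetical, and actual run times depend on the implementation.

```r
# Number of regressor subsets whose R^2 enters the lmg average
n_subsets <- function(p) 2^p

# Rough count of subset evaluations (assumption: all subsets are
# recomputed for each of b bootstrap runs on each of m imputed data sets)
rough_workload <- function(p, b = 50, m = 5) n_subsets(p) * b * m

# Growth with the number of regressors at the default b = 50
sapply(c(5, 10, 15), rough_workload)
```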

## Note

There are two versions of this package. The version on CRAN is globally licensed under GPL version 2 (or later). There is an extended version with the interesting additional metric `pmvd` that is licensed according to GPL version 2 under the geographical restriction "outside of the US" because of potential issues with US patent 6,640,204. This version can be obtained from Ulrike Groemping's website (cf. references section). Whenever you load the package, a display tells you which version you are loading.

## Author(s)

Ulrike Groemping, BHT Berlin

## References

Chevan, A. and Sutherland, M. (1991) Hierarchical Partitioning. The American Statistician 45, 90–96.

Darlington, R.B. (1968) Multiple regression in psychological research and practice. Psychological Bulletin 69, 161–182.

Feldman, B. (2005) Relative Importance and Value. Manuscript (Version 1.1, March 19 2005), downloadable at http://www.prismanalytics.com/docs/RelativeImportance050319.pdf

Genizi, A. (1993) Decomposition of R2 in multiple regression with correlated regressors. Statistica Sinica 3, 407–420. Downloadable at http://www3.stat.sinica.edu.tw/statistica/password.asp?vol=3&num=2&art=10

Groemping, U. (2006) Relative Importance for Linear Regression in R: The Package relaimpo. Journal of Statistical Software 17, Issue 1. Downloadable at http://www.jstatsoft.org/v17/i01

Lindeman, R.H., Merenda, P.F. and Gold, R.Z. (1980) Introduction to Bivariate and Multivariate Analysis, Glenview IL: Scott, Foresman.

Little, R.J.A. and Rubin, D.B. (2002) Statistical Analysis with Missing Data, Wiley, New York.

Zuber, V. and Strimmer, K. (2010) Variable importance and model selection by decorrelation. Preprint, downloadable at http://www.uni-leipzig.de/strimmer/lab/publications/preprints/carscore2010.pdf

Go to http://prof.beuth-hochschule.de/groemping/ for further information and references.

## See Also

relaimpo, `calc.relimp`, `booteval.relimp`, `classesmethods.relaimpo`

## Examples

```r
library(relaimpo)
library(mitools)  # provides the smi data
library(survey)   # provides svydesign

## smi contains a list of 5 imputed datasets (class imputationList) from package mitools
## (first element of smi is list of data frames)
## it is not a well-suited example for relative importance but easily available for demonstrating
## multiple imputation-related functionality
data(smi)

## obtain averaged estimates only, without confidence intervals
## works with factors and interactions
mianalyze.relimp(smi[[1]], formula = cistot ~ drkfre+sex+wave, no.CI = TRUE)

## for obtaining all individual estimates, use lapply:
smi.cr.list <- lapply(smi[[1]], function(obj) calc.relimp(cistot ~ drkfre+sex+wave, data=obj))
## display result for first individual imputed data set
smi.cr.list[[1]]

## obtain confidence intervals,
## currently only usable for models without calculated variables, factors, groups, interactions
## call without using weights, strata, clusters or a design list
mianalyze.relimp(smi[[1]], formula = cistot ~ mdrkfre+sex+wave)

## call using the id column (identical in all smi data sets) for cluster structure
ident <- smi[[1]][[1]]$id
mitest <- mianalyze.relimp(smi[[1]], formula = cistot ~ mdrkfre+sex+wave, ids=ident)
mitest

## postprocess: look at intervals with different confidence level
summary(mitest,lev=0.8)

## call with design list
deslist = lapply(smi[[1]], function(obj) svydesign(~id,strata=~sex,weights=~cistot,data=obj))
mitest <- mianalyze.relimp(smi[[1]], formula = cistot ~ mdrkfre+sex+wave,
    deslist=deslist, level=c(0.8))
mitest
```