
`calc.relimp` calculates several relative importance metrics for the linear model. The recommended metrics are `lmg` (*R^2* partitioned by averaging over orders, as in Lindeman, Merenda and Gold (1980, p.119ff)) and `pmvd` (a newly proposed metric by Feldman (2005) that is provided in the non-US version of the package only). For completeness and comparison purposes, several other metrics are also on offer (cf. e.g. Darlington (1968)).

```
## generic function
calc.relimp(object, ...)

## Default S3 method:
calc.relimp(object, x = NULL, ...,
    type = "lmg", diff = FALSE, rank = TRUE, rela = FALSE, always = NULL,
    groups = NULL, groupnames = NULL, weights = NULL, design = NULL)

## S3 method for class 'formula'
calc.relimp(formula, data, weights, na.action, ..., subset = NULL)

## S3 method for class 'lm'
calc.relimp(object, type = "lmg", groups = NULL, groupnames = NULL, always = NULL, ...)
```

`object ` |
The class of this object determines which of the methods is used:
There are special methods for output objects from function Thus, object can be

- a formula (e.g. `y ~ x1 + x2 + x3 + x2:x3`) (cf. below for details)
- OR the output of a linear model call (inheriting from class The restrictions on usage of interactions listed under item formula below also apply to linear model objects.
- OR the covariance matrix of a response y and regressors x (e.g. obtained by `cov(cbind(y, x))`, if y is a column vector of response values and x a corresponding matrix of regressors)
- OR a (raw) data matrix or data frame with the response variable in the first column
- OR a response vector or one-column matrix,
if |

`formula ` |
The first object, if a formula is to be given; one response only. Interaction terms are currently limited to second-order. Note: If several interaction terms are given, calculations may be very resource intensive, if these are all connected (e.g. with A:B, B:C, C:D, all A,B,C,D are connected, while with A:B, C:D, D:E there are separate groups A,B and C,D,E). Interaction terms occurring in always do not increase resource usage (but are only permitted if the respective main effects also occur in always). Interactions and groups currently cannot be used simultaneously. |

`x ` |
a (raw) data matrix or data frame containing the regressors,
if OR NULL, if |

`type ` |
can be a character string, character vector or list of character strings.
It is the collection of metrics that are to be calculated.
Available metrics: |

`diff ` |
logical; if TRUE, pairwise differences between the relative contributions are calculated; default FALSE |

`rank ` |
logical; if TRUE, ranks of regressors in terms of relative contributions are calculated; default TRUE |

`rela ` |
is a logical requesting relative importances summing to 100% ( |

`always ` |
is a vector of column numbers or names of variables to be always in the
model (adjusted for). Valid numbers are 2 to (number of regressors + 1) (1 is reserved for the response),
valid character strings are all column names of Relative importance is only assessed for the variables not selected in This option currently does not work for metrics |

`groups ` |
is a list of vectors of column numbers or names of variables to be combined into groups.
If only one group is needed, a vector can be given. The numbers and character strings needed are of the same form
as for Relative importance is only allocated between groups of regressors, no subdivision within groups is calculated.
Regressors that do not occur in any group are included as singletons.
A regressor must not occur in |

`groupnames ` |
is a vector of names for the variable groups to be used for annotation of output. |

`weights ` |
is a vector of case weights for the observations in the data frame (or matrix).
You can EITHER specify |

`design ` |
is a design object of class Also note that care is needed when using |

`data` |
if first object is of class formula: an optional matrix or data frame that the variables in formula and subset come from; if it is omitted, all names must be meaningful in the environment from which calc.relimp is called |

`subset` |
if first object is of class formula:
an optional expression indicating the subset of the observations of |

`na.action` |
if first object is of class formula:
an optional function that indicates what should happen when the data contain 'NA's.
The default is first, any na.action attribute of data, second the setting given in the call to calc.relimp,
third the na.action setting of options. Possible choices are "na.fail",
(print an error message and terminate if there are any incomplete observations),
"na.omit" or "na.exclude" (equivalent for package |

`...` |
usable for further arguments, particularly most arguments of default method can be given to all other methods (exception: weights and design cannot be given to lm-method) |

- lmg
is the *R^2* contribution averaged over orderings among regressors, cf. e.g. Lindeman, Merenda and Gold (1980, p.119ff) or Chevan and Sutherland (1991).

- pmvd
is the proportional marginal variance decomposition as proposed by Feldman (2005) (non-US version only). It can be interpreted as a weighted average over orderings among regressors, with data-dependent weights.

- last
is each variable's contribution when included last, also sometimes called usefulness.

- first
is each variable's contribution when included first, which is just the squared correlation between y and the variable.

- betasq
is the squared standardized coefficient.

- pratt
is the product of the standardized coefficient and the correlation.

- genizi
is the *R^2* decomposition according to Genizi (1993).

- car
is the *R^2* decomposition according to Zuber and Strimmer (2010), also available from package care (squares of scores produced by function `carscore`).

Each metric is calculated using the internal function “metric”`calc`, e.g. `lmgcalc`.
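The four simplest definitions in the list can be reproduced in a few lines of base R. The following is only a sketch of the definitions for the `swiss` data on the `rela = FALSE` scale, not the package's internal code; the object names (`first`, `last`, `beta_std`, ...) are chosen here for illustration.

```r
# Base-R sketch of the metrics first, last, betasq and pratt (rela = FALSE scale).
data(swiss)
y <- swiss$Fertility
X <- swiss[, -1]                       # Fertility is the first column

full <- lm(Fertility ~ ., data = swiss)
r2_full <- summary(full)$r.squared

# first: R^2 of each one-regressor model, i.e. the squared correlation with y
first <- sapply(X, function(x) cor(y, x)^2)

# last: drop in R^2 when the regressor is removed from the full model
last <- sapply(names(X), function(v) {
  reduced <- lm(reformulate(setdiff(names(X), v), response = "Fertility"),
                data = swiss)
  r2_full - summary(reduced)$r.squared
})

# betasq: squared standardized coefficient
beta_std <- coef(full)[-1] * sapply(X, sd) / sd(y)
betasq <- beta_std^2

# pratt: standardized coefficient times marginal correlation
pratt <- beta_std * sapply(X, function(x) cor(y, x))

round(cbind(first, last, betasq, pratt), 4)
```

Of these four, only `pratt` adds up to the model *R^2*; individual `pratt` values can be negative.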

Five of the metrics in `calc.relimp` (`lmg`, `pmvd`, `pratt`, `genizi` and `car`) decompose the model *R^2*: they sum to the *R^2* that is to be decomposed if `rela = FALSE`, and to 100% if `rela = TRUE`.

The other metrics also (artificially) sum to 100% if `rela = TRUE`. If `rela = FALSE`, they are given relative to var(y) (or the conditional variance of y after adjusting out the variables requested in `always`) but do not sum to *R^2*.
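To make the summation property concrete, `lmg` can be computed by brute force: average each regressor's *R^2* gain over all orderings. This sketch refits a regression for every step and is therefore far slower than a covariance-based computation; the helpers `r2()` and `perms()` are illustrative names, not functions of the package.

```r
# Brute-force sketch of lmg for the swiss data: average sequential R^2 gains
# over all 5! = 120 orderings of the regressors.
data(swiss)
regs <- setdiff(names(swiss), "Fertility")

# R^2 of the model containing the regressors in `vars` (0 for the empty model)
r2 <- function(vars) {
  if (length(vars) == 0) return(0)
  summary(lm(reformulate(vars, response = "Fertility"), data = swiss))$r.squared
}

# all orderings of a vector (simple recursion; fine for a handful of regressors)
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

lmg <- setNames(numeric(length(regs)), regs)
orders <- perms(regs)
for (ord in orders) {
  for (k in seq_along(ord)) {
    gain <- r2(ord[seq_len(k)]) - r2(ord[seq_len(k - 1)])
    lmg[ord[k]] <- lmg[ord[k]] + gain
  }
}
lmg <- lmg / length(orders)

sum(lmg)          # equals the model R^2 (rela = FALSE scale)
lmg / sum(lmg)    # shares summing to 1, the analogue of rela = TRUE
```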

If `always` requests some variables to be always in the model, these are conditioned upon (i.e. included into the model first). Only the remaining *R^2* that is not explained by these variables is decomposed among the other regressors. This currently does not work for metrics `genizi` and `car`.
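As a sketch of what conditioning means, the following base-R code splits the `swiss` model *R^2* into the part explained by `Education` and the remainder, and computes a conditional version of `first` for the other regressors. The names `r2_always`, `r2_decomp` and `first_cond` are illustrative only.

```r
# Sketch: conditioning on Education (the analogue of always = "Education").
data(swiss)
r2 <- function(vars) {
  summary(lm(reformulate(vars, response = "Fertility"), data = swiss))$r.squared
}

always <- "Education"
others <- setdiff(names(swiss), c("Fertility", always))

r2_always <- r2(always)                          # explained by Education alone
r2_decomp <- r2(c(always, others)) - r2_always   # left to be decomposed

# conditional "first": gain in R^2 when a regressor enters right after Education
first_cond <- sapply(others, function(v) r2(c(always, v)) - r2_always)

round(c(always = r2_always, decomposed = r2_decomp), 4)
round(first_cond, 4)
```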

Four of the metrics, `lmg`, `pmvd`, `first` and `last`, are related to the order in which the variables are included into the model. For these it is possible to consider the variables in groups that are always entered into the model together.
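A sketch of the grouping idea for `first` and `last`, in base R and without weights: a group enters or leaves the model as one unit, and ungrouped regressors are treated as singletons. The group name `EduExam` and the helper names are assumptions made for this illustration.

```r
# Sketch: grouped versions of first and last for the swiss data (no weights).
data(swiss)
r2 <- function(vars) {
  if (length(vars) == 0) return(0)
  summary(lm(reformulate(vars, response = "Fertility"), data = swiss))$r.squared
}

regs <- setdiff(names(swiss), "Fertility")
groups <- list(EduExam = c("Education", "Examination"))

# regressors not in any group become singleton units
singles <- setdiff(regs, unlist(groups))
units <- c(groups, setNames(as.list(singles), singles))

# first: R^2 of the model containing only this unit
first_g <- sapply(units, r2)

# last: drop in R^2 when the whole unit is removed from the full model
last_g <- sapply(units, function(u) r2(regs) - r2(setdiff(regs, u)))

round(cbind(first = first_g, last = last_g), 4)
```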

Note that relaimpo can only provide metric `lmg` for models with interactions (2-way interactions only). It averages only over those orders for which the interactions enter the model after both their main effects.
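For a model with two factors and their interaction, only two orders are admissible under this rule: (poison, treat, poison:treat) and (treat, poison, poison:treat). The sketch below averages over exactly these two orders for the `poisons` data from package boot; it assumes boot is installed, and the helper `r2()` is an illustrative name.

```r
# Sketch: lmg with an interaction, averaging only over admissible orders.
data(poisons, package = "boot")

r2 <- function(rhs) {
  summary(lm(as.formula(paste("1/time ~", rhs)), data = poisons))$r.squared
}

r2_p    <- r2("poison")
r2_t    <- r2("treat")
r2_pt   <- r2("poison + treat")
r2_full <- r2("poison * treat")

lmg <- c(
  poison         = mean(c(r2_p, r2_pt - r2_t)),   # enters first or second
  treat          = mean(c(r2_t, r2_pt - r2_p)),
  "poison:treat" = r2_full - r2_pt                # always enters last
)
round(lmg, 4)
```

Because the experiment is balanced, each main effect contributes the same amount in both admissible orders.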

Note that there are different types of weights: weights indicating the variability of the response (observations with a more variable response receive a lower weight than those with a less variable response, like in the Aitken estimator), frequency weights indicating the number of observations with exactly the observed data pattern of the current observation, or weights indicating the number of population units represented by the current observation (inverse sampling probability, weights typically used in survey situations). All three types of weight alike can be handed to function `calc.relimp` using the `weights=` option. Note, however, that they have to be treated differently for bootstrapping (cf. `boot.relimp`).
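One way to see how case weights enter a covariance-based calculation is to compute *R^2* from a weighted covariance matrix and compare it with weighted least squares. This is a sketch using `cov.wt` from package stats with purely artificial demo weights; the package's own weighted computation may differ in detail.

```r
# Sketch: R^2 from a weighted covariance matrix versus weighted least squares.
data(swiss)
w <- abs(-23:23)                   # artificial demo weights, one per row (47 rows)

cw <- cov.wt(swiss, wt = w)        # cov.wt rescales wt internally to sum to 1
S <- cw$cov                        # Fertility is row/column 1

# R^2 = S_yx S_xx^{-1} S_xy / S_yy
r2_cov <- drop(S[1, -1] %*% solve(S[-1, -1], S[-1, 1])) / S[1, 1]

r2_wls <- summary(lm(Fertility ~ ., data = swiss, weights = w))$r.squared

all.equal(r2_cov, r2_wls)
```

Since *R^2* is a ratio, the scaling of the weighted covariance matrix cancels, so only the relative sizes of the weights matter here.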

Data from complex surveys can be treated by providing a survey design with the `design=` option. For `calc.relimp`, it is also sufficient to provide the weights derived from the design using the `weights=` option.

`calc.relimp` cannot handle data with missing values directly. It applies complete-case analysis, i.e. drops all units with any missing values by default. While this can be appropriate if there are only few missing values, data with more severe missingness issues need special treatment. Package relaimpo offers the function `mianalyze.relimp` that handles multiply-imputed datasets (that can be created by several other **R**-packages). Currently, possibilities in this function are limited due to the fact that it uses complex survey designs and bootstrapping which do not (yet) go together well with factors, interactions and calculated quantities in formulae.

`var.y ` |
the variance of the response |

`R2 ` |
the coefficient of determination, |

`R2.decomp ` |
the part of the coefficient of determination that is decomposed among the variables under investigation |

`lmg ` |
vector of relative contributions obtained from the |

`lmg.diff ` |
vector of pairwise differences between relative contributions obtained from the |

`lmg.rank ` |
rank of the regressors' relative contributions obtained from the |

`metric, metric.diff, metric.rank ` |
analogous to |

`ave.coeffs` |
average coefficients for variables not requested by always only for models of different sizes; note that coefficients refer to modeling residuals after adjusting out variables listed in always (both from response and other explanatory variables) |

`namen` |
names of variables, starting with response |

`type` |
character vector of metrics available |

`rela` |
Have metrics been normalized to sum to 100%? |

`always` |
column numbers of variables always in the model;
in case of factors, the column numbers given here are not identical to those in
the call to |

`alwaysnam` |
names of variables always in the model |

`call` |
contains the call that generated the object |

`lmg` and `pmvd` are computer-intensive. Although they are calculated based on the covariance matrix, which saves substantial computing time in comparison to carrying out actual regressions, these methods still take quite long for problems with many regressors.
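The saving from working on the covariance matrix can be sketched as follows: once `cov(swiss)` is available, the *R^2* of any sub-model follows from a small linear solve, with no further pass over the data. The function `r2_from_cov()` is a hypothetical helper written for this illustration, not part of the package.

```r
# Sketch: R^2 of every sub-model from the covariance matrix alone.
data(swiss)
S <- cov(swiss)

r2_from_cov <- function(S, vars, response = "Fertility") {
  if (length(vars) == 0) return(0)
  s_xy <- S[vars, response, drop = FALSE]
  drop(t(s_xy) %*% solve(S[vars, vars, drop = FALSE], s_xy)) / S[response, response]
}

regs <- setdiff(colnames(S), "Fertility")

# all 2^5 - 1 = 31 non-empty subsets of the five regressors
subsets <- unlist(
  lapply(seq_along(regs), function(k) combn(regs, k, simplify = FALSE)),
  recursive = FALSE
)
r2_all <- sapply(subsets, r2_from_cov, S = S)

# agrees with an actual regression for the full model
all.equal(r2_from_cov(S, regs),
          summary(lm(Fertility ~ ., data = swiss))$r.squared)
```

The number of sub-models doubles with every additional regressor, which is why order-based metrics remain expensive for many regressors even with this shortcut.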

`relaimpo` is a package for univariate linear models. Using `relaimpo` on objects that inherit from class `lm` but are not univariate linear model objects may produce nonsensical results without warning. Objects of class `mlm` or `glm` with link functions other than identity or family other than gaussian lead to an error message.

There are two versions of this package. The version on CRAN is globally licensed under GPL version 2 (or later). There is an extended version with the interesting additional metric `pmvd` that is licensed according to GPL version 2 under the geographical restriction "outside of the US" because of potential issues with US patent 6,640,204. This version can be obtained from Ulrike Groemping's website (cf. references section). Whenever you load the package, a display tells you which version you are loading.

Ulrike Groemping, BHT Berlin

Chevan, A. and Sutherland, M. (1991) Hierarchical Partitioning. *The American Statistician* **45**, 90–96.

Darlington, R.B. (1968) Multiple regression in psychological research and practice. *Psychological Bulletin* **69**, 161–182.

Feldman, B. (2005) Relative Importance and Value. Manuscript (Version 1.1, March 19 2005), downloadable at http://www.prismanalytics.com/docs/RelativeImportance050319.pdf

Genizi, A. (1993) Decomposition of R2 in multiple regression with correlated regressors. *Statistica Sinica* **3**, 407–420.
Downloadable at http://www3.stat.sinica.edu.tw/statistica/password.asp?vol=3&num=2&art=10

Groemping, U. (2006) Relative Importance for Linear Regression in R: The Package relaimpo. *Journal of Statistical Software* **17**, Issue 1. Downloadable at http://www.jstatsoft.org/v17/i01

Lindeman, R.H., Merenda, P.F. and Gold, R.Z. (1980) *Introduction to Bivariate and Multivariate Analysis*, Glenview IL: Scott, Foresman.

Zuber, V. and Strimmer, K. (2010) *Variable importance and model selection by decorrelation*. Preprint, downloadable at http://www.uni-leipzig.de/strimmer/lab/publications/preprints/carscore2010.pdf

Go to http://prof.beuth-hochschule.de/groemping/relaimpo/ for further information and references.

relaimpo, `booteval.relimp`, `mianalyze.relimp`, `classesmethods.relaimpo`

```
#####################################################################
### Example: relative importance of various socioeconomic indicators
###          for Fertility in Switzerland
### Fertility is first column of data set swiss
#####################################################################
library(relaimpo)   # also loads boot (data set poisons) and survey (svydesign)
data(swiss)

# calculation of all available relative importance metrics
calc.relimp(swiss,
    type = c("lmg", "last", "first", "betasq", "pratt", "genizi", "car"))

# non-US version offers the additional metric "pmvd",
# i.e. call would be
# calc.relimp(cov(swiss),
#     type = c("lmg", "pmvd", "last", "first", "betasq", "pratt"),
#     rela = TRUE)

## same analysis with formula or lm method and a few modified options
crf <- calc.relimp(
    Fertility ~ Agriculture + Examination + Education + Catholic + Infant.Mortality,
    swiss,
    subset = Catholic > 40,
    type = c("lmg", "last", "first", "betasq", "pratt"), rela = TRUE)
crf

linmod <- lm(
    Fertility ~ Agriculture + Examination + Education + Catholic + Infant.Mortality,
    swiss)
crlm <- calc.relimp(linmod,
    type = c("lmg", "last", "first", "betasq", "pratt", "genizi", "car"),
    rela = TRUE)

# bar plot of the relative importance metrics
plot(crlm)

# of statistical interest in this context: correlation matrix
cor(swiss)

# demonstration of conditioning on one regressor using always
calc.relimp(swiss,
    type = c("lmg", "last", "first", "betasq", "pratt"), rela = FALSE,
    always = "Education")

# using calc.relimp with grouping of two regressors
# and weights (not reasonable here, purely for demo purposes)
calc.relimp(swiss,
    type = c("lmg", "last", "first"), rela = FALSE,
    groups = c("Education", "Examination"), weights = abs(-23:23))

# using calc.relimp with grouping of two regressors
# and a design object (not reasonable here, purely for demo purposes)
des <- svydesign(~1, data = swiss, weights = ~abs(-23:23))
calc.relimp(swiss,
    type = c("lmg", "last", "first"), rela = FALSE,
    groups = c("Education", "Examination"), groupnames = "EduExam", design = des)

# calc.relimp with factors (betasq and pratt not possible)
# (calc.relimp would not be necessary here,
# because the experiment is balanced)
calc.relimp(1/time ~ poison + treat, data = poisons, rela = FALSE,
    type = c("lmg", "last", "first"))

# including also the interaction (lmg possible only)
calc.relimp(1/time ~ poison * treat, data = poisons, rela = FALSE)
```

```
Loading required package: MASS
Loading required package: boot
Loading required package: survey
Loading required package: grid
Loading required package: Matrix
Loading required package: survival
Attaching package: 'survival'
The following object is masked from 'package:boot':
aml
Attaching package: 'survey'
The following object is masked from 'package:graphics':
dotchart
Loading required package: mitools
This is the global version of package relaimpo.
If you are a non-US user, a version with the interesting additional metric pmvd is available
from Ulrike Groempings web site at prof.beuth-hochschule.de/groemping.
Response variable: Fertility
Total response variance: 156.0425
Analysis based on 47 observations
5 Regressors:
Agriculture Examination Education Catholic Infant.Mortality
Proportion of variance explained by model: 70.67%
Metrics are not normalized (rela=FALSE).
Relative importance metrics:
lmg last first betasq pratt
Agriculture 0.05709122 0.042869607 0.1246649 0.09791973 -0.1104860
Examination 0.17117303 0.007387419 0.4171645 0.02715186 0.1064274
Education 0.26013468 0.161962693 0.4406156 0.44943721 0.4450046
Catholic 0.10557015 0.062372626 0.2150035 0.12082578 0.1611768
Infant.Mortality 0.11276592 0.056945259 0.1735189 0.06306928 0.1046122
genizi car
Agriculture 0.0484169 0.0005674742
Examination 0.1547082 0.1511339908
Education 0.2719641 0.3232680710
Catholic 0.1146800 0.1150731842
Infant.Mortality 0.1169657 0.1166922814
Average coefficients for different model sizes:
1X 2Xs 3Xs 4Xs 5Xs
Agriculture 0.1942017 0.03949369 -0.06794018 -0.1380370 -0.1721140
Examination -1.0113173 -0.89693064 -0.72467898 -0.5056072 -0.2580082
Education -0.8623503 -0.77680153 -0.77395379 -0.8164207 -0.8709401
Catholic 0.1388857 0.09709583 0.08377687 0.0903765 0.1041153
Infant.Mortality 1.7864860 1.59438387 1.46135692 1.2717960 1.0770481
Warning message:
In rev(variances[[p]]) - variances[[p + 1]] :
Recycling array of length 1 in vector-array arithmetic is deprecated.
Use c() or as.vector() instead.
Warning message:
In rev(variances[[p]]) - variances[[p + 1]] :
Recycling array of length 1 in vector-array arithmetic is deprecated.
Use c() or as.vector() instead.
Response variable: Fertility
Total response variance: 280.7706
Analysis based on 19 observations
5 Regressors:
Agriculture Examination Education Catholic Infant.Mortality
Proportion of variance explained by model: 80.93%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:
lmg last first betasq pratt
Agriculture 0.11486583 0.353055869 0.11306600 0.136299551 -0.19101959
Examination 0.17190060 0.001476325 0.20205545 0.001777293 0.02915946
Education 0.30508034 0.074753723 0.31131441 0.301904369 0.47173598
Catholic 0.31494322 0.475491780 0.31290701 0.533989734 0.62898283
Infant.Mortality 0.09321001 0.095222303 0.06065714 0.026029053 0.06114132
Average coefficients for different model sizes:
1X 2Xs 3Xs 4Xs 5Xs
Agriculture 0.3642223 -0.006783255 -0.16636929 -0.229599451 -0.21989767
Examination -1.4207166 -0.540061683 -0.05947462 0.008361967 -0.07326973
Education -1.1118076 -1.171641212 -1.14221086 -0.901857999 -0.60205695
Catholic 0.7923300 0.745030377 0.62073008 0.590074848 0.56916445
Infant.Mortality 1.9642435 1.940427202 1.37998276 0.891605693 0.70754883
Warning message:
In rev(variances[[p]]) - variances[[p + 1]] :
Recycling array of length 1 in vector-array arithmetic is deprecated.
Use c() or as.vector() instead.
Fertility Agriculture Examination Education Catholic
Fertility 1.0000000 0.35307918 -0.6458827 -0.66378886 0.4636847
Agriculture 0.3530792 1.00000000 -0.6865422 -0.63952252 0.4010951
Examination -0.6458827 -0.68654221 1.0000000 0.69841530 -0.5727418
Education -0.6637889 -0.63952252 0.6984153 1.00000000 -0.1538589
Catholic 0.4636847 0.40109505 -0.5727418 -0.15385892 1.0000000
Infant.Mortality 0.4165560 -0.06085861 -0.1140216 -0.09932185 0.1754959
Infant.Mortality
Fertility 0.41655603
Agriculture -0.06085861
Examination -0.11402160
Education -0.09932185
Catholic 0.17549591
Infant.Mortality 1.00000000
Response variable: Fertility
Total response variance: 156.0425
Analysis based on 47 observations
5 Regressors:
Proportion of variance explained: 70.67%
One Regressor always included in model:
Education
44.06 % of variance explained by this regressor
Relative importance of 4 regressors assessed:
Agriculture Examination Catholic Infant.Mortality
26.61 % of variance decomposed among these
Metrics are not normalized (rela=FALSE).
Relative importance metrics:
lmg last first betasq pratt
Agriculture 0.03432758 0.042869607 0.008632775 0.05787163 0.02235157
Examination 0.03884696 0.007387419 0.064868872 0.01390762 0.03003617
Catholic 0.10165854 0.062372626 0.133891476 0.11796552 0.12567648
Infant.Mortality 0.09128627 0.056945259 0.124164362 0.06244711 0.08805513
Warning message:
In rev(variances[[p]]) - variances[[p + 1]] :
Recycling array of length 1 in vector-array arithmetic is deprecated.
Use c() or as.vector() instead.
Response variable: Fertility
Total response variance: 215.6661
Analysis based on 47 observations
5 Regressors:
Some regressors combined in groups:
Group G1 : Examination Education
Relative importance of 4 (groups of) regressors assessed:
G1 Agriculture Catholic Infant.Mortality
Proportion of variance explained by model: 74.01%
Metrics are not normalized (rela=FALSE).
Relative importance metrics:
lmg last first
G1 0.49953751 0.39224182 0.6307919
Agriculture 0.07636859 0.02153528 0.1497720
Catholic 0.05614286 0.01776434 0.1363082
Infant.Mortality 0.10801371 0.03827817 0.1690776
Average coefficients for different model sizes:
1group 2groups 3groups 4groups
Agriculture 0.2426390 0.09581090 -0.02102877 -0.13783809
Examination -0.3580310 -0.32182157 -0.28867678 -0.25727647
Education -0.8201332 -0.85553230 -0.89525538 -0.93781515
Catholic 0.1333066 0.08962258 0.06341285 0.07193345
Infant.Mortality 2.2787886 1.98814590 1.73745564 1.19503671
Warning message:
In rev(variances[[p]]) - variances[[p + 1]] :
Recycling array of length 1 in vector-array arithmetic is deprecated.
Use c() or as.vector() instead.
Response variable: Fertility
Total response variance: 215.6661
Analysis based on 47 observations
5 Regressors:
Some regressors combined in groups:
Group EduExam : Examination Education
Relative importance of 4 (groups of) regressors assessed:
EduExam Agriculture Catholic Infant.Mortality
Proportion of variance explained by model: 74.01%
Metrics are not normalized (rela=FALSE).
Relative importance metrics:
lmg last first
EduExam 0.49953751 0.39224182 0.6307919
Agriculture 0.07636859 0.02153528 0.1497720
Catholic 0.05614286 0.01776434 0.1363082
Infant.Mortality 0.10801371 0.03827817 0.1690776
Average coefficients for different model sizes:
1group 2groups 3groups 4groups
Agriculture 0.2426390 0.09581090 -0.02102877 -0.13783809
Examination -0.3580310 -0.32182157 -0.28867678 -0.25727647
Education -0.8201332 -0.85553230 -0.89525538 -0.93781515
Catholic 0.1333066 0.08962258 0.06341285 0.07193345
Infant.Mortality 2.2787886 1.98814590 1.73745564 1.19503671
Warning message:
In rev(variances[[p]]) - variances[[p + 1]] :
Recycling array of length 1 in vector-array arithmetic is deprecated.
Use c() or as.vector() instead.
Response variable: 1/time
Total response variance: 1.393729
Analysis based on 48 observations
5 Regressors:
Some regressors combined in groups:
Group poison : poison2 poison3
Group treat : treatB treatC treatD
Relative importance of 2 (groups of) regressors assessed:
poison treat
Proportion of variance explained by model: 84.41%
Metrics are not normalized (rela=FALSE).
Relative importance metrics:
lmg last first
poison 0.5324323 0.5324323 0.5324323
treat 0.3116435 0.3116435 0.3116435
Average coefficients for different model sizes:
1group 2groups
poison2 0.4686413 0.4686413
poison3 1.9964249 1.9964249
treatB -1.6574024 -1.6574024
treatC -0.5721354 -0.5721354
treatD -1.3583383 -1.3583383
Warning message:
In rev(variances[[p]]) - variances[[p + 1]] :
Recycling array of length 1 in vector-array arithmetic is deprecated.
Use c() or as.vector() instead.
Response variable: 1/time
Total response variance: 1.393729
Analysis based on 48 observations
11 Regressors:
Some regressors combined in groups:
Group poison : poison2 poison3
Group treat : treatB treatC treatD
Group poison:treat : poison2:treatB poison3:treatB poison2:treatC poison3:treatC poison2:treatD poison3:treatD
Relative importance of 3 (groups of) regressors assessed:
poison treat poison:treat
Proportion of variance explained by model: 86.81%
Metrics are not normalized (rela=FALSE).
Relative importance metrics:
lmg
poison 0.53243232
treat 0.31164349
poison:treat 0.02397933
Average coefficients for different model sizes:
1group 2groups 3groups
poison2 0.4686413 0.4686413 0.78158915
poison3 1.9964249 1.9964249 2.31580446
treatB -1.6574024 -1.6574024 -1.32341687
treatC -0.5721354 -0.5721354 -0.62415711
treatD -1.3583383 -1.3583383 -0.79719886
poison2:treatB NaN NaN -0.55166088
poison3:treatB NaN NaN -0.45029566
poison2:treatC NaN NaN 0.06960632
poison3:treatC NaN NaN 0.08645870
poison2:treatD NaN NaN -0.76973705
poison3:treatD NaN NaN -0.91368123
```

relaimpo documentation built on May 29, 2017, 9:26 p.m.
