In clbustos/dominanceAnalysis: Dominance Analysis

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

dominanceanalysis

Dominance Analysis (Azen and Budescu, 2003, 2006; Azen and Traxel, 2009; Budescu, 1993; Luo and Azen, 2013), for multiple regression models: Ordinary Least Squares, Generalized Linear Models, Dynamic Linear Models and Hierarchical Linear Models.

Features:

Provides complete, conditional and general dominance analysis for lm (univariate and multivariate), clm, dynlm, lmer, betareg and glm (family=binomial) models.
Covariance / correlation matrixes could be used as input for OLS dominance analysis, using lmWithCov() and mlmWithCov() methods, respectively.
Multiple criteria can be used as fit indices, which is useful especially for HLM.

Examples

Linear regression

We could apply dominance analysis directly on the data, using lm (see Azen and Budescu, 2003).

The attitude data is composed of six predictors of the overall rating of 35 clerical employees of a large financial organization: complaints, privileges, learning, raises, critical and advancement. The method dominanceAnalysis() can retrieve all necessary information directly from a lm model.

  library(dominanceanalysis)
  lm.attitude<-lm(rating~.,attitude)
  da.attitude<-dominanceAnalysis(lm.attitude)

Using print() method on the dominanceAnalysis object, we can see that complaints completely dominates all other predictors, followed by learning (lrnn). The remaining 4 variables (prvl,rass,crtc,advn) don't show a consistent pattern for complete and conditional dominance. The average contribution of each predictor is also presented, that defines defines general dominance.

The print() method uses abbreviate, to allow complex models to be visualized at a glance.

  print(da.attitude)

The dominance brief and average contribution of each predictor could be retrieved separately using dominanceBriefing() and averageContribution() methods, respectively.

  dominanceBriefing(da.attitude, abbrev = TRUE)$r2
  averageContribution(da.attitude)

The summary() method shows the complete dominance analysis matrix, that presents all fit differences between levels. Also, provides the average contribution of each variable.

  summary(da.attitude)

To evaluate the robustness of our results, we can use bootstrap analysis (Azen and Budescu, 2006).

We applied a bootstrap analysis using bootDominanceAnalysis() method with $R^2$ as a fit index and 100 permutations. For precise results, you need to run at least 1000 replications.

  set.seed(1234)
  bda.attitude<-bootDominanceAnalysis(lm.attitude, R=100)

The summary() method presents the results for the bootstrap analysis. Dij shows the original result, and mDij, the mean for Dij on bootstrap samples and SE.Dij its standard error. Pij is the proportion of bootstrap samples where i dominates j, Pji is the proportion of bootstrap samples where j dominates i and Pnoij is the proportion of samples where no dominance can be asserted. Rep is the proportion of samples where original dominance is replicated.

We can see that the value of complete dominance for complaints is fairly robust over all variables (Dij almost equal to mDij, and small SE), contrarily to learning (Dij differs from mDij, and bigger SE).

  summary(bda.attitude)

Another way to perform the dominance analysis is by using a correlation or covariance matrix. As an example, we use the ability.cov matrix which is composed of five specific skills that might explain general intelligence (general). The biggest average contribution is for predictor reading (0.152). Nevertheless, in the output of summary() method on level 1, we can see that picture (0.125) dominates over reading (0.077) on vocab submodel.

lmwithcov<-lmWithCov( f = general~picture+blocks+maze+reading+vocab,
                      x = cov2cor(ability.cov$cov))
da.cov<-dominanceAnalysis(lmwithcov)
print(da.cov)
summary(da.cov)

Hierarchical Linear Models

For Hierarchical Linear Models using lme4, you should provide a null model (see Luo and Azen, 2013).

As an example, we use npk dataset, which contains information about a classical N, P, K (nitrogen, phosphate, potassium) factorial experiment on the growth of peas conducted on 6 blocks.

library(lme4)
lmer.npk.1<-lmer(yield~N+P+K+(1|block),npk)
lmer.npk.0<-lmer(yield~1+(1|block),npk)
da.lmer<-dominanceAnalysis(lmer.npk.1,null.model=lmer.npk.0)

Using print() method, we can see that random effects are modeled as a constant (1 | block).

print(da.lmer)

The fit indices used in the analysis were n.marg (Nakagawa's marginal R²), n.cond (Nakagawa's conditional R²), rb.r2.1 (R&B $R^2_1$: Level-1 variance component explained by predictors), rb.r2.2 (R&B $R^2_2$: Level-2 variance component explained by predictors), sb.r2.1 (S&B $R^2_1$: Level-1 proportional reduction in error predicting scores at Level-1), and sb.r2.2 (S&B $R^2_2$: Level-2 proportional reduction in error predicting scores at Level-1). We can see that using rb.r2.1 and sb.r2.1 index, that shows influence of predictors on Level-1 variance, clearly nitrogen dominates over potassium and phosphate, and potassium dominates over phosphate.

s.da.lmer=summary(da.lmer)
s.da.lmer
sm.rb.r2.1=s.da.lmer$rb.r2.1$summary.matrix
# Nitrogen completely dominates  potassium
as.logical(na.omit(sm.rb.r2.1$N > sm.rb.r2.1$K))
# Nitrogen completely dominates  phosphate
as.logical(na.omit(sm.rb.r2.1$N > sm.rb.r2.1$P))
# Potassium completely dominates phosphate
as.logical(na.omit(sm.rb.r2.1$K > sm.rb.r2.1$P))

Logistic regression

Dominance analysis can be used in logistic regression (see Azen and Traxel, 2009).

As an example, we used the esoph dataset, that contains information about a case-control study of (o)esophageal cancer in Ille-et-Vilaine, France.

Looking at the report for standard glm summary method, we can see that the linear effect of each variable was significant (p < 0.05 for agegp.L, alcgp.L and tobgp.L), such as the quadratic effect of predictor age (p < 0.05 for agegp.Q). Even so,it is hard to identify which variable is more important to predict esophageal cancer.

glm.esoph<-glm(cbind(ncases,ncontrols)~agegp+alcgp+tobgp, esoph,family="binomial")
summary(glm.esoph)

We performed dominance analysis on this dataset and the results are shown below. The fit indices were r2.m ($R^2_M$: McFadden's measure), r2.cs ($R^2_{CS}$: Cox and Snell's measure), r2.n ($R^2_N$: Nagelkerke's measure) and r2.e ($R^2_E$: Estrella's measure). For all fit indices, we can conclude that age and alcohol completely dominate tobacco, while age shows general dominance over both alcohol and tobacco.

da.esoph<-dominanceAnalysis(glm.esoph)
print(da.esoph)
summary(da.esoph)

Then, we performed a bootstrap analysis. Using McFadden's measure (r2.m), we can see that bootstrap dominance of age over tobacco, and of alcohol over tobacco have standard errors (SE.Dij) near 0 and reproducibility (Rep) close to 1, so are fairly robust on all levels.Dominance values of age over alcohol are not easily reproducible and require more research

set.seed(1234)
da.b.esoph<-bootDominanceAnalysis(glm.esoph,R = 200)
print(format(summary(da.b.esoph)$r2.m,digits=3),row.names=F)

Set of predictors

Budescu (1993) shows that dominance analysis can be applied to groups or set of inseparable predictors. The Longley's economic regression data is know for have a highly collinear set on Employed variable. We can see that GNP.deflator, GNP, Population and Year are highly correlated.

data(longley)
round(cor(longley),2)

We can group GNP and employment related variables, to determine the importance of both groups of variables. The GNP related variables dominates completely population, and we can see that all predictors dominates generally over employment.

terms.r<-c(GNP.rel="GNP.deflator+GNP", 
           employment="Unemployed+Armed.Forces", 
           "Population", 
           "Year")
da.longley<-dominanceAnalysis(lm(Employed~.,longley),terms = terms.r)
print(da.longley)

Installation

You can install the stable version from CRAN

install.packages('dominanceanalysis')

Also, you can install the latest version from github with:

library(devtools)
install_github("clbustos/dominanceanalysis")

Authors

Claudio Bustos Navarrete: Creator and maintainer
Filipa Coutinho Soares: Documentation and testing

Acknowledgments

Daniel Schlaepfer: Reported an error in the logistic regression code.
Xiong Zhang: Contributed to the incorporation of dynamic linear models.
Maartje Hidalgo: Contributed to the incorporation of beta regression.

References

Budescu, D. V. (1993). Dominance analysis: A new approach to the problem of relative importance of predictors in multiple regression. Psychological Bulletin, 114(3), 542-551. https://doi.org/10.1037/0033-2909.114.3.542
Azen, R., & Budescu, D. V. (2003). The dominance analysis approach for comparing predictors in multiple regression. Psychological Methods, 8(2), 129-148. https://doi.org/10.1037/1082-989X.8.2.129
Azen, R., & Budescu, D. V. (2006). Comparing Predictors in Multivariate Regression Models: An Extension of Dominance Analysis. Journal of Educational and Behavioral Statistics, 31(2), 157-180. https://doi.org/10.3102/10769986031002157
Azen, R., & Traxel, N. (2009). Using Dominance Analysis to Determine Predictor Importance in Logistic Regression. Journal of Educational and Behavioral Statistics, 34(3), 319-347. https://doi.org/10.3102/1076998609332754
Luo, W., & Azen, R. (2013). Determining Predictor Importance in Hierarchical Linear Models Using Dominance Analysis. Journal of Educational and Behavioral Statistics, 38(1), 3-31. https://doi.org/10.3102/1076998612458319
Shou, Y., & Smithson, M. (2015). Evaluating Predictors of Dispersion: A Comparison of Dominance Analysis and Bayesian Model Averaging. Psychometrika, 80(1), 236-256. https://doi.org/10.1007/s11336-013-9375-8