knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%", error = FALSE, warning = FALSE, message = FALSE )
library(rwa) library(dplyr) library(ggplot2)
Relative Weights Analysis (RWA) is a powerful method for determining the relative importance of predictor variables in multiple regression models, particularly when dealing with multicollinearity. This vignette provides a comprehensive guide to using the rwa package, explaining the methodology, interpreting results, exploring advanced features, and evaluating the theoretical foundations and validity of the method.
Relative Weights Analysis addresses a fundamental problem in multiple regression: when predictor variables are correlated with each other (multicollinearity), it becomes difficult to determine the true contribution of each variable to the outcome. Traditional regression coefficients can be misleading, unstable, or difficult to interpret in the presence of multicollinearity.
RWA decomposes the total variance predicted in a regression model (R²) into weights that accurately reflect the proportional contribution of the various predictor variables. The method implemented in this package is based on Tonidandel and LeBreton (2015), with its original roots in Johnson (2000).
The core insight of RWA is to create a set of orthogonal (uncorrelated) variables that are maximally related to the original predictors, then use these in a regression model. The process involves:
This approach allows us to determine each variable's unique contribution to the explained variance, even when predictors are highly correlated.
RWA is particularly useful when:
RWA addresses multicollinearity by attributing shared (collinear) variance between predictors to those predictors in proportion to their structure with the outcome. Unlike traditional regression coefficients that reflect only unique contributions, RWA considers both direct and indirect (shared) effects.
The Johnson (2000) algorithm performs an eigenvalue decomposition of the predictor correlation matrix to create uncorrelated principal components, then regresses the outcome on these components. This procedure yields weights virtually identical to what would be obtained by averaging a predictor's incremental R² contribution over all possible subsets of predictors (the logic used in dominance analysis), but without brute-force search over subsets, making it computationally efficient.
Let's demonstrate this with a controlled example where we know the true relationships:
# Create controlled scenario to demonstrate RWA's theoretical properties set.seed(123) n <- 200 # Generate predictors with known correlation structure x1 <- rnorm(n) x2 <- 0.7 * x1 + 0.3 * rnorm(n) # r ≈ 0.7 with x1 x3 <- 0.5 * x1 + 0.8 * rnorm(n) # r ≈ 0.5 with x1 x4 <- rnorm(n) # Independent # True population model with known coefficients y <- 0.6 * x1 + 0.4 * x2 + 0.3 * x3 + 0.2 * x4 + rnorm(n, sd = 0.5) theory_data <- data.frame(y = y, x1 = x1, x2 = x2, x3 = x3, x4 = x4) # Compare traditional regression vs RWA lm_theory <- lm(y ~ x1 + x2 + x3 + x4, data = theory_data) rwa_theory <- rwa(theory_data, "y", c("x1", "x2", "x3", "x4")) # Show how multicollinearity affects traditional coefficients cat("True population contributions (designed into simulation):\n") true_contributions <- c(0.6, 0.4, 0.3, 0.2) names(true_contributions) <- c("x1", "x2", "x3", "x4") print(true_contributions) cat("\nStandardized regression coefficients (distorted by multicollinearity):\n") std_betas <- summary(lm_theory)$coefficients[2:5, "Estimate"] names(std_betas) <- c("x1", "x2", "x3", "x4") print(round(std_betas, 3)) cat("\nRWA weights (better reflect true importance despite correlations):\n") rwa_weights_theory <- rwa_theory$result$Raw.RelWeight names(rwa_weights_theory) <- rwa_theory$result$Predictors print(round(rwa_weights_theory, 3)) # Calculate correlation between methods and true values cor_with_true <- data.frame( Method = c("Std_Betas", "RWA_Weights"), Correlation_with_True = c( cor(abs(std_betas), true_contributions), cor(rwa_weights_theory[names(true_contributions)], true_contributions) ) ) print("Correlation with true population values:") print(cor_with_true)
RWA remains a widely accepted and useful method for evaluating predictor importance under multicollinearity. Since its introduction, numerous studies have validated that RWA provides a fair assessment of each variable's importance even in high-correlation settings.
Evidence from research practice:
- Organizational Psychology: Determining which employee factors predict performance when intercorrelated
- Marketing Research: Key driver analysis where product attributes tend to move together
- Educational Research: Assessing cognitive abilities that naturally correlate
- Healthcare Analytics: Understanding risk factors that often co-occur
Methodological validation: 1. RWA reveals important predictors that regression beta weights obscure due to multicollinearity 2. Rankings from RWA correlate > 0.99 with dominance analysis in most scenarios 3. Bootstrap confidence intervals provide robust significance testing 4. Computational efficiency allows analysis of large predictor sets
Let's start with a basic example using the built-in mtcars dataset to predict fuel efficiency (mpg):
# Basic RWA result_basic <- mtcars %>% rwa(outcome = "mpg", predictors = c("cyl", "disp", "hp", "gear")) # View the results result_basic$result
The basic output includes several key components:
# Predictor variables used result_basic$predictors # Model R-squared result_basic$rsquare # Number of complete observations result_basic$n # Correlation matrices (for advanced users) str(result_basic$RXX) # Predictor correlation matrix str(result_basic$RXY) # Predictor-outcome correlations
The main results table contains:
By default, results are automatically sorted by rescaled relative weights in descending order, making it easy to identify the most important predictors:
# Results are sorted by default (most important first) result_basic$result # Raw weights sum to R-squared sum(result_basic$result$Raw.RelWeight) result_basic$rsquare # Rescaled weights sum to 100% sum(result_basic$result$Rescaled.RelWeight)
You can control whether results are sorted using the sort parameter:
# Default behavior: sorted by importance (descending) result_sorted <- mtcars %>% rwa(outcome = "mpg", predictors = c("cyl", "disp", "hp", "gear")) result_sorted$result # Preserve original predictor order result_unsorted <- mtcars %>% rwa(outcome = "mpg", predictors = c("cyl", "disp", "hp", "gear"), sort = FALSE) result_unsorted$result
The applysigns parameter adds directional information to help interpret whether variables positively or negatively influence the outcome:
result_signs <- mtcars %>% rwa(outcome = "mpg", predictors = c("cyl", "disp", "hp", "gear"), applysigns = TRUE) result_signs$result
The package allows you to visualize the relative importance by piping the results to plot_rwa():
# Generate RWA results rwa_result <- mtcars %>% rwa(outcome = "mpg", predictors = c("cyl", "disp", "hp", "gear", "wt")) # Create plot rwa_result %>% plot_rwa() # The rescaled relative weights rwa_result$result
The rwa package includes powerful bootstrap functionality for determining the statistical significance of relative weights. For detailed coverage of bootstrap methods, see the dedicated vignette:
vignette("bootstrap-confidence-intervals", package = "rwa")
# Basic bootstrap analysis bootstrap_result <- mtcars %>% rwa(outcome = "mpg", predictors = c("cyl", "disp", "hp"), bootstrap = TRUE, n_bootstrap = 500) # Reduced for speed # View significant predictors bootstrap_result$result %>% filter(Raw.Significant == TRUE) %>% select(Variables, Rescaled.RelWeight, Raw.RelWeight.CI.Lower, Raw.RelWeight.CI.Upper)
For comprehensive coverage of bootstrap methods, advanced features, and best practices, consult the bootstrap vignette.
Let's explore a more complex example using the diamonds dataset:
# Analyze diamond price drivers diamonds_subset <- diamonds %>% select(price, carat, depth, table, x, y, z) %>% sample_n(1000) # Sample for faster computation diamond_rwa <- diamonds_subset %>% rwa(outcome = "price", predictors = c("carat", "depth", "table", "x", "y", "z"), applysigns = TRUE) diamond_rwa$result
For bootstrap analysis of this example with confidence intervals, see:
vignette("bootstrap-confidence-intervals", package = "rwa")
Let's compare RWA results with traditional multiple regression to highlight the differences:
# Traditional regression lm_model <- lm(mpg ~ cyl + disp + hp + gear, data = mtcars) lm_summary <- summary(lm_model) # Display regression summary print(lm_summary) # RWA results rwa_model <- mtcars %>% rwa(outcome = "mpg", predictors = c("cyl", "disp", "hp", "gear")) # Compare importance rankings comparison <- data.frame( Variable = rwa_model$predictors, RWA_Rescaled = rwa_model$result$Rescaled.RelWeight, RWA_Rank = rank(-rwa_model$result$Rescaled.RelWeight) ) print(comparison)
Traditional standardized beta coefficients reflect unique contribution holding others constant, which severely underestimates variables that share variance with others. RWA gives credit to predictors for variance they share with the outcome both individually and jointly with other variables.
Stepwise methods arbitrarily pick one variable from a collinear set and drop others, potentially leading to incorrect conclusions about importance. Unlike stepwise selection, RWA does not throw away information – it shows how much each predictor contributes given the complete set.
While PCA addresses multicollinearity, it buries the identity of individual predictors in composite factors. RWA uses orthogonalization behind the scenes but returns results in terms of original predictors, maintaining interpretability.
RWA's greatest achievement is producing almost identical results to dominance analysis but far more efficiently. Research shows:
# Demonstrate computational considerations predictors <- c("cyl", "disp", "hp", "gear") n_predictors <- length(predictors) cat("Number of predictors:", n_predictors, "\n") cat("Dominance analysis would require", 2^n_predictors, "subset models\n") cat("RWA solves this in a single matrix operation\n") # Show RWA speed (for demonstration) start_time <- Sys.time() rwa_speed_test <- mtcars %>% rwa(outcome = "mpg", predictors = predictors) end_time <- Sys.time() cat("RWA computation time:", round(as.numeric(end_time - start_time, units = "secs"), 4), "seconds\n")
RWA is fundamentally a descriptive, variance-partitioning tool. It tells us "how much prediction was attributable to X" but does not imply causation. As Tonidandel & LeBreton emphasize: relative weights "are not causal indicators and thus do not necessarily dictate a course of action."
RWA should enhance interpretation of regression models but decisions must still be guided by substantive theory.
RWA is not a magic bullet for extreme multicollinearity stemming from redundant predictors. When predictors measure essentially the same construct, RWA will split the contribution between them, potentially making each appear less important individually.
# Demonstrate the redundancy limitation set.seed(456) x1_orig <- rnorm(100) x1_dup <- x1_orig + rnorm(100, sd = 0.05) # Nearly identical (r ≈ 0.99) y_simple <- 0.8 * x1_orig + rnorm(100, sd = 0.5) redundant_data <- data.frame(y = y_simple, x1_original = x1_orig, x1_duplicate = x1_dup) cat("Correlation between 'different' predictors:", cor(x1_orig, x1_dup), "\n") # RWA correctly splits redundant variance redundant_rwa <- rwa(redundant_data, "y", c("x1_original", "x1_duplicate")) print("RWA with redundant predictors:") print(redundant_rwa$result) cat("\nEach variable appears less important individually,") cat("\nbut together they account for most variance.\n") cat("Combined contribution:", sum(redundant_rwa$result$Raw.RelWeight), "\n")
This is statistically appropriate – it reflects that one variable alone could do the job of both. Users must recognize that the pair as a whole may be very important even if each alone has a modest weight.
RWA shares multiple regression's sensitivity to sample size and predictor-to-observation ratios. Bootstrap confidence intervals help quantify uncertainty, but "bootstrapping will not overcome the inherent limitations associated with a small sample size."
# Check your sample size n_obs <- mtcars %>% select(mpg, cyl, disp, hp, gear) %>% na.omit() %>% nrow() cat("Sample size:", n_obs) cat("\nRule of thumb: At least 5-10 observations per predictor")
RWA works best with:
Use RWA when:
Consider alternatives when:
When reporting RWA results, include:
For bootstrap confidence intervals and advanced statistical considerations, see:
vignette("bootstrap-confidence-intervals", package = "rwa")
If you encounter extreme multicollinearity:
# Check correlation matrix cor_matrix <- mtcars %>% select(cyl, disp, hp, gear) %>% cor() # Look for high correlations (>0.9) high_cor <- which(abs(cor_matrix) > 0.9 & cor_matrix != 1, arr.ind = TRUE) if(nrow(high_cor) > 0) { cat("High correlations detected between variables") }
RWA handles missing data through listwise deletion:
# Check for missing data patterns missing_summary <- mtcars %>% select(mpg, cyl, disp, hp, gear) %>% summarise_all(~sum(is.na(.))) print(missing_summary)
This package and methodology are based on the following key references:
Primary Sources:
Johnson, J. W. (2000). A heuristic method for estimating the relative weight of predictor variables in multiple regression. Multivariate Behavioral Research, 35(1), 1-19. DOI: 10.1207/S15327906MBR3501_1
Tonidandel, S., & LeBreton, J. M. (2015). RWA Web: A free, comprehensive, web-based, and user-friendly tool for relative weight analyses. Journal of Business and Psychology, 30(2), 207-216. DOI: 10.1007/s10869-014-9351-z
Additional Reading:
Azen, R., & Budescu, D. V. (2003). The dominance analysis approach for comparing predictors in multiple regression. Psychological Methods, 8(2), 129-148.
Grömping, U. (2006). Relative importance for linear regression in R: The package relaimpo. Journal of Statistical Software, 17(1), 1-27.
Johnson, J. W., & LeBreton, J. M. (2004). History and use of relative importance indices in organizational research. Organizational Research Methods, 7(3), 238-257.
For bootstrap methods and statistical significance testing, see the dedicated bootstrap vignette and its references.
For the latest methodological developments: https://www.scotttonidandel.com/rwa-web/
The rwa package provides a user-friendly implementation of Relative Weights Analysis in R with seamless tidyverse integration. This vignette covered both practical implementation and theoretical foundations.
Methodological Strengths:
Important Limitations:
RWA is particularly valuable for:
RWA provides clear, quantifiable answers to questions about predictor importance in a multicollinear world – answers that are often obscured when using standard regression approaches. With proper understanding of its assumptions and limitations, RWA remains an essential tool for researchers and practitioners dealing with correlated predictors.
Whether you're conducting key drivers analysis in market research, exploring variable importance in predictive modeling, or addressing multicollinearity in academic research, RWA provides valuable insights that complement traditional regression approaches.
For advanced bootstrap methods and statistical significance testing, see:
vignette("bootstrap-confidence-intervals", package = "rwa")
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.