# varimp: Variable Importance in npreg: Nonparametric Regression via Smoothing Splines

## Description

Computes variable importance indices for the terms of a smooth model (`sm`) or generalized smooth model (`gsm`).

## Usage

```r
varimp(object, combine = TRUE)
```

## Arguments

- `object`: an object of class `"sm"` output by the `sm` function, or an object of class `"gsm"` output by the `gsm` function.
- `combine`: a logical switch indicating whether the parametric and smooth components of the importance should be combined (the default) or returned separately.

## Details

Suppose that the function η estimated by the model can be written as

$$\eta = \eta_0 + \eta_1 + \eta_2 + \cdots + \eta_p$$

where η_0 is a constant (intercept) term and η_j denotes the j-th effect function, which is assumed to have mean zero. Each η_j (j = 1, ..., p) may be either a main effect or an interaction effect function.
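To make the mean-zero convention concrete, the base R sketch below evaluates two made-up effects at simulated data points, centers them, and folds the removed means into the constant η_0. The names `raw`, `eta`, and `eta0` are illustrative only and are not part of npreg; the point is simply that centering is a reparameterization that leaves the overall function unchanged.

```r
# Hypothetical effects (not npreg output) evaluated at n simulated data points
set.seed(1)
n <- 50
x1 <- runif(n)
x2 <- runif(n)
raw <- cbind(eta1 = sin(2 * pi * x1), eta2 = 3 * x2)  # uncentered effects

# Center each effect so it has mean zero; the removed means move into eta_0
eta  <- scale(raw, center = TRUE, scale = FALSE)
eta0 <- sum(colMeans(raw))

# The full function eta_0 + eta_1 + eta_2 is unchanged by the centering
stopifnot(isTRUE(all.equal(eta0 + rowSums(eta), rowSums(raw))))
colMeans(eta)  # both effects now have (numerically) zero mean
```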

The variable importance index for the j-th effect term is defined as

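$$\pi_j = \frac{\eta_j^\top \eta_*}{\eta_*^\top \eta_*}$$

where η_* = η_1 + η_2 + ... + η_p, and each effect is treated as the vector of its values at the n observed data points, so that the numerator is the inner product of η_j and η_*. Because the η_j sum to η_*, the numerators sum to the denominator, so π_1 + π_2 + ... + π_p = 1. However, there is no guarantee that π_j > 0.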
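The index can be computed directly from a matrix whose columns hold the centered effects evaluated at the data points. The helper `importance_index()` below is a hypothetical sketch written for illustration, not npreg's internal code; it performs the same calculation as the `colSums(eff * fstar) / sum(fstar^2)` lines used for the true indices in the examples, and it confirms numerically that the indices sum to 1.

```r
# Hypothetical helper (not part of npreg): columns of `eta` are the mean-zero
# effects eta_1, ..., eta_p evaluated at the n data points
importance_index <- function(eta) {
  eta_star <- rowSums(eta)                            # eta_* = eta_1 + ... + eta_p
  drop(crossprod(eta, eta_star)) / sum(eta_star^2)    # eta_j' eta_* / eta_*' eta_*
}

# Usage with simulated (made-up) effects
set.seed(1)
n <- 100
x <- runif(n)
z <- runif(n)
eta <- scale(cbind(x = 3 * x, z = sin(2 * pi * z)), scale = FALSE)
pi_hat <- importance_index(eta)
pi_hat
sum(pi_hat)  # equals 1 up to rounding error
```

Using `crossprod()` computes all p inner products in one matrix operation and keeps the column names, so the result is a named vector like the one `varimp` returns when `combine = TRUE`.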

If all π_j are non-negative, then π_j gives the proportion of the model's R-squared that can be accounted for by the j-th effect term. Thus, values of π_j closer to 1 indicate that η_j is more important, whereas values of π_j closer to 0 (including negative values) indicate that η_j is less important.
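Negative indices arise when one effect partially cancels another. In the hypothetical base R sketch below (simulated effects, not output from `varimp`), `eta2` is mostly a negative multiple of `eta1`, so its index comes out negative and the index of `eta1` exceeds 1 while the indices still sum to 1. In such cases π_j can no longer be read as a share of R-squared.

```r
# Made-up effects on a grid: eta2 partially cancels eta1
u <- seq(0, 1, length.out = 200)
eta1 <- sin(2 * pi * u)
eta2 <- -0.3 * eta1 + 0.1 * cos(2 * pi * u)

# Center the effects and compute pi_j = eta_j' eta_* / eta_*' eta_*
eta <- scale(cbind(eta1, eta2), scale = FALSE)
eta_star <- rowSums(eta)
pi_hat <- colSums(eta * eta_star) / sum(eta_star^2)
pi_hat       # eta1 above 1, eta2 below 0
sum(pi_hat)  # still 1
```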

## Value

If `combine = TRUE`, returns a named vector containing the importance index for each effect function (in `object$terms`).

If `combine = FALSE`, returns a data frame with two columns: `p` gives the importance indices for the parametric components and `s` gives the importance indices for the smooth (nonparametric) components of each effect.

## Note

When `combine = FALSE`, importance indices will be equal to zero for non-existent components of a model term. For example, a `nominal` effect does not have a parametric component, so the `$p` component of the importance index for a nominal effect will be zero.
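Because the inner product is linear, splitting an effect into parametric and smooth parts splits its index in the same way, so one would expect each combined index to equal the sum of its `p` and `s` entries. This has not been checked against npreg's source, so treat it as an assumption. The base R sketch below uses made-up effects (all names are hypothetical) and includes a nominal term whose parametric entry is set to zero, mirroring the situation described in this note.

```r
# Hypothetical decomposition (not npreg output), evaluated at n data points
set.seed(2)
n <- 100
x <- runif(n)
z <- factor(sample(letters[1:3], size = n, replace = TRUE))

eta_xp <- 3 * (x - mean(x))              # parametric (linear) part of the x effect
eta_xs <- sin(2 * pi * x)
eta_xs <- eta_xs - mean(eta_xs)          # smooth part of the x effect
eta_z  <- c(-2, 0, 2)[as.integer(z)]
eta_z  <- eta_z - mean(eta_z)            # nominal z effect: no parametric part

eta_star <- eta_xp + eta_xs + eta_z
index <- function(eta_j) sum(eta_j * eta_star) / sum(eta_star^2)

# Separate indices: the nominal term gets 0 in the parametric column
separate <- data.frame(p = c(index(eta_xp), 0),
                       s = c(index(eta_xs), index(eta_z)),
                       row.names = c("x", "z"))

# Combined indices: by linearity, these equal p + s for each term
combined <- c(x = index(eta_xp + eta_xs), z = index(eta_z))
separate
combined
```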

## Author(s)

Nathaniel E. Helwig <helwig@umn.edu>

## References

Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer. doi: 10.1007/978-1-4614-5369-7

Helwig, N. E. (2020). Multiple and Generalized Nonparametric Regression. In P. Atkinson, S. Delamont, A. Cernat, J. W. Sakshaug, & R. A. Williams (Eds.), SAGE Research Methods Foundations. doi: 10.4135/9781526421036885885

## See Also

See `summary.sm` for more thorough summaries of smooth models.
See `summary.gsm` for more thorough summaries of generalized smooth models.
## Examples

```r
library(npreg)

########## EXAMPLE 1 ##########

### 1 continuous and 1 nominal predictor

# generate data
set.seed(1)
n <- 100
x <- seq(0, 1, length.out = n)
z <- factor(sample(letters[1:3], size = n, replace = TRUE))
fun <- function(x, z){
  mu <- c(-2, 0, 2)
  zi <- as.integer(z)
  fx <- mu[zi] + 3 * x + sin(2 * pi * x)
}
fx <- fun(x, z)
y <- fx + rnorm(n, sd = 0.5)

# define marginal knots
probs <- seq(0, 0.9, by = 0.1)
knots <- list(x = quantile(x, probs = probs),
              z = letters[1:3])

# fit correct (additive) model
sm.add <- sm(y ~ x + z, knots = knots)

# fit incorrect (interaction) model
sm.int <- sm(y ~ x * z, knots = knots)

# true importance indices
eff <- data.frame(x = 3 * x + sin(2 * pi * x),
                  z = c(-2, 0, 2)[as.integer(z)])
eff <- scale(eff, scale = FALSE)
fstar <- rowSums(eff)
colSums(eff * fstar) / sum(fstar^2)

# estimated importance indices
varimp(sm.add)
varimp(sm.int)


########## EXAMPLE 2 ##########

### 4 continuous predictors
### additive model

# generate data
set.seed(1)
n <- 100
fun <- function(x){
  sin(pi*x[,1]) + sin(2*pi*x[,2]) + sin(3*pi*x[,3]) + sin(4*pi*x[,4])
}
data <- as.data.frame(replicate(4, runif(n)))
colnames(data) <- c("x1v", "x2v", "x3v", "x4v")
fx <- fun(data)
y <- fx + rnorm(n)

# define ssa (smoothing spline ANOVA) knot indices
knots.indx <- c(bin.sample(data$x1v, nbin = 10, index.return = TRUE)$ix,
                bin.sample(data$x2v, nbin = 10, index.return = TRUE)$ix,
                bin.sample(data$x3v, nbin = 10, index.return = TRUE)$ix,
                bin.sample(data$x4v, nbin = 10, index.return = TRUE)$ix)

# fit correct (additive) model
sm.add <- sm(y ~ x1v + x2v + x3v + x4v, data = data, knots = knots.indx)

# fit incorrect (interaction) model
sm.int <- sm(y ~ x1v * x2v + x3v + x4v, data = data, knots = knots.indx)

# true importance indices
eff <- data.frame(x1v = sin(pi*data[,1]), x2v = sin(2*pi*data[,2]),
                  x3v = sin(3*pi*data[,3]), x4v = sin(4*pi*data[,4]))
eff <- scale(eff, scale = FALSE)
fstar <- rowSums(eff)
colSums(eff * fstar) / sum(fstar^2)

# estimated importance indices
varimp(sm.add)
varimp(sm.int)
```