knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "figures/" ) library(ghp)

GHP stands for General Hierarchical Partitioning. `ghp`

is an implementation of the technique of hierarchical partitioning first mentioned by Chevan and Sutherland (1991). This method fits all possible models for a set of covariates and then extracts a goodness of fit (e.g. $R^2$ for linear models) to obtain independent and joint contributions of the independent variables on the selected figure.

This package is an extension of the `hier.part`

R package, developed by C. Walsh and R. Mac Nally in 2003. While `hier.part`

is fast and simple at what it does, it is limited in the range of possible models as well as goodness of fit figures. Specifically, the motivation of this package is the ability to do deviance partitioning.

You can install ghp from github with:

# install.packages("devtools") devtools::install_github("Stan125/ghp")

Just call the ghp function with the name of the dependent variable (arg: `dep`

) and a data.frame with all relevant variables to obtain the independent and joint effects of the explanatory covariates.

india <- ghp::india results_lm <- ghp(depname = "stunting", india, method = "lm", gof = "r.squared") results_lm

The first dataframe captures the actual mean influence of the variable on the goodness-of-fit. Also, joint effects are calculated. The second dataframe shows the percentage influence. We can see that `cage`

has the highest influence with (~43%).

It is now possible to do deviance partitiong of gamlss models. Gamlss models can model multiple parameters of a distribution. `ghp`

can handle up to two modeled parameters, so you can find out what influence covariates have on the second modeled parameter (e.g. the variance).

results_gamlss <- ghp("stunting", india, method = "gamlss", gof = "deviance", npar = 2) results_gamlss

Since 0.3.0 you can specify variable groups. The partitioning is now not happening with specific variables, but by testing all group combinations. In the given `ghp::india`

dataset, which captures the nutrition of children in india we can now divide all covariates into to groups: those that give information about the child, and those that give information about the mother. Let's try that out:

# Specifying the groups should happen in a data.frame groupings <- data.frame(varnames = colnames(india), groups = c("0", "child", "child", "mother", "child", "mother", "mother", "mother")) results_groups <- ghp(depname = "stunting", india, method = "lm", gof = "r.squared", group_df = groupings) results_groups

We can now see that both groups have almost the same amount of influence on the $R^2$.

To get a bar plot of the percentage independent effects, use `plot_ghp()`

:

plot_ghp(results_lm) plot_ghp(results_gamlss) + ggplot2::scale_fill_grey()

Since `0.4.0`

, `ghp`

is almost as fast as its counterpart `hier.part`

, because the core partitioning was written with C++. A quick comparison:

hp <- function() hier.part::hier.part(india$stunting, dplyr::select(india,-stunting), gof = "Rsqu", barplot = FALSE) ghp <- function() ghp::ghp("stunting", india, method = "lm", gof = "r.squared") microbenchmark::microbenchmark(hp, ghp)

This README.Rmd was run on:

```
date()
```

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.