mvtboost example 2: Well-being

This document describes the analysis of psychological well-being by multivariate tree boosting reported in Miller, Lubke, McArtor, and Bergeman (2015).

Introduction

In our exploratory analysis, we investigated the factors that influence six aspects of well-being: autonomy, environmental mastery, personal growth, positive relationships with others, purpose in life, and self-acceptance (Ryff & Keyes, 1995).

Predictors

Gender, age, income, and education were included as demographic predictors. The primary predictors of interest were chronic, somatic, and self-reported health, depression (positive and negative affect), perceived social control, control of internal states, sub-scales of dispositional resilience (commitment, control, and challenge), ego resilience, social support (friends and family), self-reported stress (problems, emotions), and loneliness. In total, 20 predictors were included in the analysis.

Fitting the model

# install.packages("mvtboost")
library(mvtboost)
data(wellbeing)

# columns 21-26 hold the six well-being outcomes; columns 1-20 hold the predictors
Y <- wellbeing[,21:26]
X <- wellbeing[,1:20]
Ys <- scale(Y)
ynames <- c("Autonomy","Environmental Mastery","Personal Growth","Positive Relationships","Purpose in Life","Self Acceptance")
xnames <- c("Gender","Age","Income","Education","Chronic Health","Somatic Health","Self Report Health","Positive Affect","Negative Affect","Perceived Social Control","Control Internal States","Commitment","Control","Challenge","Ego Resilience","Social Support - Friends","Social Support - Family","Stress-Problems","Stress-Emotions","Loneliness")

# standardize the continuous predictors only
cont.id <- unlist(lapply(X,is.numeric))
Xs <- X
Xs[,cont.id] <- scale(X[,cont.id])
colnames(Xs) <- xnames
colnames(Ys) <- ynames

# fit the model with default settings
res <- mvtb(Y=Ys,X=Xs)

Tuning the model by 5-Fold CV

In gradient boosting, the number of trees, the shrinkage, and the tree depth are meta-parameters that are important to tune to improve the fit of the model. Typically, the shrinkage is fixed to a small value, and the optimal number of trees is chosen by cross-validation. This is illustrated below:

# tuning the model

res5 <- mvtb(Y=Ys,X=Xs,n.trees=1000,shrinkage=.05,cv.folds=5,compress=FALSE)

A set of observations can be explicitly specified as the training set by passing a vector of observation indices (here, trainset) to the argument s. Cross-validation is then carried out only within the training set.

set.seed(104)
trainset <- sample(1:nrow(Ys),size = 784,replace=FALSE)
res5train <- mvtb(Y=Ys,X=Xs,n.trees=1000,shrinkage=.05,cv.folds=5,compress=FALSE,s=trainset)

Computing the best number of trees

As with univariate gradient boosted trees, the number of trees can be chosen to minimize a test or cross-validation estimate of the prediction error. mvtb uses the multivariate MSE in a test set or in the kth cross-validation fold as the estimate of multivariate prediction error.

res5$best.trees
summary(res5)
load("wb_cv5.Rdata")
load("wb_cv5test.Rdata")
res5$best.trees

By default, most procedures in the mvtboost package automatically select the smallest of the best-number-of-trees estimates from the training, CV, or test error, which corresponds to a minimally complex model.
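
If a specific number of trees is wanted downstream, it can be extracted from best.trees and passed on explicitly. A minimal sketch, assuming predict.mvtb accepts an n.trees argument as in gbm:

# take the smallest of the available estimates (training, CV, or test error)
ntrees <- min(unlist(res5$best.trees))

# pass it explicitly when predicting
yhat.ntrees <- predict(res5,newdata=Xs,n.trees=ntrees)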

Showing training and CV error

Below we illustrate the importance of choosing the model complexity that minimizes the CV error rather than the training error.

plot(x=1:1000,y=res5$trainerr,type="l",ylab="Error",xlab="Number of trees")
abline(v=res5$best.trees$best.cv)
lines(x=1:1000,y=res5$cv.err,type="l",col="red")
legend("topright",legend=c("Training Error","Cross-Validation Error"),lty=c(1,1),col=c("black","red"),bty="n")

Increasing the depth of trees to 5 or 10 may also improve performance.
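
As a sketch of how this might be checked (assuming mvtb passes the tree depth through gbm's interaction.depth argument; the settings below are illustrative, not tuned values):

# refit with deeper trees and compare the CV error
res5.deep <- mvtb(Y=Ys,X=Xs,n.trees=1000,shrinkage=.05,cv.folds=5,compress=FALSE,
                  interaction.depth=5)
min(res5.deep$cv.err)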

Interpreting the model

One of the challenges of using multivariate decision tree ensembles is that the model is more difficult to interpret than a single tree. While tree boosting can be used to build a very accurate predictive model, it is potentially more important for researchers to be able to interpret the effects of the predictors. Below, we describe several approaches that have been developed to aid interpretation: relative influence, model fit, covariance explained, partial dependence plots, and departures from additivity.

1. Relative influence

The influence (or variable importance) of each predictor can be used to identify 'important' predictors. It is defined as the reduction in sums of squared error due to any split on that predictor, summed over all trees in the model (Friedman, 2001). Usually the score is relative, expressed as a percent of the total sums of squared error reductions from all predictors.

Below, we compute the relative influences for the well-being data, and plot the influences using a heat map. Using the mvtb.ri function, the influence scores can sum to 100 for each outcome (relative='col') or across outcomes (relative='tot'). By default, importances are relative to the column.

summary(res5)
round(mvtb.ri(res5,relative = "tot"),2) 
numformat <- function(val){sub("^(-?)0.", "\\1.", sprintf("%.1f", val))}

par(mar=c(8,10,1,1))
mvtb.heat(t(mvtb.ri(res5)),clust.method = NULL,cexRow=1,cexCol=1,numformat=numformat)
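
As a quick sanity check of the two normalizations described above (a sketch using only the calls already shown; it assumes, as stated, that columns of mvtb.ri correspond to outcomes):

# with relative='col', each outcome (column) sums to 100
colSums(mvtb.ri(res5,relative="col"))

# with relative='tot', the whole table sums to 100
sum(mvtb.ri(res5,relative="tot"))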

For example, control of internal states affects all aspects of psychological well-being except positive relationships with others. Similarly, perceived stress-problems affects three aspects of well-being: self-acceptance, purpose in life, and environmental mastery. Other patterns in the influences can be interpreted similarly.

2. Fit

As a check of the overall fit of the model, the $R^2$ in the test set can be computed for each dependent variable.

testset <- (1:nrow(Ys))[!(1:nrow(Ys) %in% trainset)]  # observations not in the training set
yhat <- predict(res5train,newdata=Xs[testset,])
diag(var(yhat)/var(Ys[testset,]))                     # variance of predictions / variance of outcomes
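
The ratio above compares the variance of the predictions to the variance of the observed outcomes. An alternative sketch (not part of the package output) computes $R^2$ as one minus the residual variance over the outcome variance, assuming yhat is an $n \times Q$ matrix of fitted values as used above:

# 1 - residual variance / outcome variance, per outcome
1 - diag(var(Ys[testset,]-yhat))/diag(var(Ys[testset,]))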

3. Covariance explained

It may also be informative to select a set of outcome variables that are associated with groups of predictors. One criterion is to choose the outcomes whose covariance can be explained by a function of a common set of predictors. The covariance explained in each pair of outcomes by each predictor is estimated directly in mvtb.

The covariance-explained matrix can be organized as a $Q(Q+1)/2 \times p$ table, where $Q$ is the number of outcomes and $p$ is the number of predictors. Each element is the covariance explained in a pair of outcomes by a predictor; for the well-being data, $Q = 6$ and $p = 20$, so the table has 21 rows and 20 columns. When the outcomes are standardized to unit variance, each element can be interpreted as the correlation explained in a pair of outcomes by a predictor.

This decomposition is similar to decomposing $R^2$ in multiple regression. When the trees of the ensemble are limited to a single split and the predictors are independent, this decomposition is exact, otherwise it is approximate.

For the well-being data, the covariance explained matrix is obtained directly from the fitted model: mvtb.covex(res5, Y=Ys, X=Xs). It can also be plotted as a heatmap, which we illustrate below:

par(mar=c(8,15,1,1),mfrow=c(1,1))
numformat <- function(val){sub("^(-?)0.", "\\1.", sprintf("%.2f", val))}
covex <- mvtb.covex(res5, Y=Ys, X=Xs)
mvtb.heat(covex[,-c(1:7)],cexRow=.9,numformat=numformat,clust.method = NULL)

Negative affect and stress problems have widespread effects on well-being. Control of internal states explains correlations across all dimensions, and is the primary explanatory predictor for autonomy.
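
As a rough check of the approximate decomposition (a sketch; it assumes the $Q(Q+1)/2 \times p$ orientation described above, and because Ys is standardized the entries are on a correlation scale), the explained covariance can be summed across predictors for each pair of outcomes and compared with the observed correlations:

# total correlation explained per pair of outcomes, summed across predictors
round(rowSums(covex),2)

# observed correlations among the outcomes, for comparison
round(cor(Ys),2)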

Clustering covariance explained

If the number of predictors or outcomes is large, interpreting the covariance explained matrix is challenging. The matrix can be clustered by grouping the predictors that explain covariance in similar pairs of outcomes. This is done by hierarchical clustering of the distances between the columns (predictors) and between the rows (pairs of outcomes).

Clustering the matrix can be done with mvtb.cluster. Below we plot the solution as a heatmap with mvtb.heat.

mvtb.cluster(covex)
par(mar=c(8,12,1,1),mfrow=c(1,1))
mvtb.heat(covex[,-c(1:7)],cexRow=.9)

4. Partial Dependence Plots

Partial dependence plots complement interpretations of relative influence by showing the direction and functional form of the effect of the predictor. A partial dependence plot shows the effect of the predictor averaging over (or integrating out) the effects of other predictors.

Here we show a univariate partial dependence plot and a multivariate perspective plot. The first shows that above-average control of internal states is associated with greater personal growth. The second shows the non-additive effect of control of internal states and perceived stress-problems on self-acceptance.

par(mfcol=c(1,2),mar=c(5,5,4,1))
plot(res5,predictor.no=11,response.no=3,ylim=c(-1,1.5)) # personal growth as a function of control of internal states
text(-4,1.825,labels="A",xpd=TRUE)
mvtb.perspec(res5,predictor.no=c(11,18),response.no=6)  # self-acceptance as a joint function of control of internal states and stress-problems
text(-.5,.5,labels="B",xpd=TRUE)

5. Departures from additivity

Though decision trees are models of interactions, it is difficult to detect and interpret interaction effects from a decision tree ensemble. To address this issue, we can analyze the fitted values of the model. Following Elith et al. (2008), possible 2-way interactions can be detected by checking whether the fitted values of the model, as a function of any pair of predictors, deviate from a linear combination of the two predictors. Such departures indicate that the joint effect of the predictors is not additive, suggesting a non-linear effect or a possible interaction.

In greater detail: the fitted values of the model are computed for any pair of predictors, over a grid of all possible levels for the two variables. For continuous predictors, 100 sample values are taken. The fitted values are then regressed onto the grid. Large residuals from this model indicate the fitted values are not a linear combination of the predictors, demonstrating non-linearity or a possible interaction. For computational simplicity with many predictors, this might be done only for pairs of important variables.
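
A hand-rolled version of this check for a single pair of predictors and one outcome is sketched below. The grid size, the choice of predictors (control of internal states and stress-problems), the outcome (self-acceptance), and holding the remaining predictors at a single observation's values are simplifying assumptions for illustration; they are not the exact procedure used by mvtb.nonlin.

# grid over two predictors (columns 11 and 18 of Xs)
cis.seq <- seq(min(Xs[,11]),max(Xs[,11]),length.out=20)
str.seq <- seq(min(Xs[,18]),max(Xs[,18]),length.out=20)
grid <- expand.grid(cis=cis.seq,stress=str.seq)

# hold the other predictors fixed at the first observation's values
newX <- Xs[rep(1,nrow(grid)),]
newX[,11] <- grid$cis
newX[,18] <- grid$stress

# fitted self-acceptance (outcome 6) over the grid, regressed on the two grid variables
grid$fit <- predict(res5,newdata=newX)[,6]
summary(lm(fit~cis+stress,data=grid))$sigma  # large residual SD suggests non-additivity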

We note that this approach is primarily a heuristic for interpreting the model. A variable with a non-additive effect (e.g. a non-linear effect like control of internal states) can produce bivariate departures from additivity which are not necessarily interactions.

There are several other methods for detecting such departures, including 'grid', 'influence', and 'lm', each following a different suggestion. All of these are heuristics and can give somewhat different results.

res.nl <- mvtb.nonlin(res5,Y=Ys,X=Xs)

Below, we show the largest departures for each DV.

lapply(res.nl,function(r){head(r[[1]])})

References

Elith, J., Leathwick, J. R., & Hastie, T. (2008). A working guide to boosted regression trees. Journal of Animal Ecology, 77(4), 802-813.

Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189-1232.

Miller, P. J., Lubke, G. H., McArtor, D. B., & Bergeman, C. S. (submitted). Finding structure in data with multivariate tree boosting.

Ridgeway, G., & Southworth, M. H. (2013). gbm: Generalized Boosted Regression Models. R package.

Wallace, K. A., Bergeman, C. S., & Maxwell, S. E. (2002). Predicting well-being outcomes in later life: An application of classification and regression tree (CART) analysis.


