SEM Trees with score-based tests


In this example, we will explore how score-based SEM trees can be used. Score-based tests for variable and split-point selection are preferable because they are fast to compute, perform unbiased variable selection, and have better statistical power than previously proposed selection algorithms.
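The intuition behind score-based tests can be sketched in a few lines of base R. This is a toy illustration, not the semtree implementation: under the null hypothesis of no parameter change, the cumulative sum of the model scores (here simply the mean-centered observations, i.e., the scores of a normal mean parameter) fluctuates around zero and returns to zero, whereas a shift in the parameter makes the cumulative score process peak near the change point.

```r
x <- c(1, 2, 3, 10, 11, 12)  # the mean shifts after the third observation
scores <- x - mean(x)        # scores of the mean parameter under the null
cusum <- cumsum(scores)      # cumulative score process

which.max(abs(cusum))        # peaks at 3, the true change point
tail(cusum, 1)               # ends at exactly 0 by construction
```

Score-based tests evaluate a statistic of this process (e.g., its maximum) against its known null distribution, which avoids refitting the model for every candidate split point.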

For this illustration, we will use the affect dataset and a simple (non-latent) SEM, which has just one observed variable. Using such a simple model is similar to using a conventional decision tree or random forest; however, the SEM tree version not only finds differences in the mean prediction across leaves but also in the variances (i.e., individual differences) of the criterion variable.
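To see what the two parameters of such a model capture, here is a base-R sketch (an illustration with made-up numbers, not package code) of the maximum-likelihood estimates a saturated one-variable normal model produces: the sample mean and the biased (divide-by-n) variance.

```r
x <- c(-2, 0, 1, 5)                 # toy data

mu_hat <- mean(x)                   # ML estimate of the mean: 1
sigma2_hat <- mean((x - mu_hat)^2)  # ML variance (divides by n, not n - 1): 6.5
```

A SEM tree therefore splits the sample whenever either of these quantities differs between groups defined by a predictor.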

Load data

Load the affect dataset from the psychTools package. These are data from two studies conducted in the Personality, Motivation and Cognition Laboratory at Northwestern University to study affect dimensionality and its relationship to various personality dimensions. In the following code, we replace the numeric film labels with the movie titles. We also select a subset of predictors, including the film, personality dimensions, trait anxiety, and affect before having watched the movie, and compute the change in positive affect (DeltaPA).


library(psychTools)
data(affect)

affect$Film <- factor(affect$Film, ordered = FALSE,
                      labels = c("Frontline", "Halloween", "Nat. Geographic", "Parenthood"))

affect$DeltaPA <- affect$PA2 - affect$PA1
affect <- affect[, c("Film", "neur", "ext", "soc", "traitanx", "NA1", "PA1", "DeltaPA")]


Create simple model of state anxiety

Here, we create a simple SEM with a single observed variable. It has no latent variables and only two parameters: the mean of DeltaPA and the variance of DeltaPA.

library(OpenMx)

manifests <- c("DeltaPA")

model <- mxModel("Simple Model",
                 type = "RAM",
                 manifestVars = manifests,
                 latentVars = c(),
                 mxPath(from = "one", to = manifests, free = TRUE,
                        values = 1.0, arrows = 1, labels = "mu"),
                 mxPath(from = manifests, to = manifests, free = TRUE,
                        values = 1.0, arrows = 2, labels = "sigma2"),
                 mxData(affect, type = "raw"))

result <- mxRun(model)

Score-based Tests

We use score-based tests to create the tree and a Bonferroni correction to adjust for the multiple testing of predictors.

library(semtree)

ctrl <- semtree.control(method = "score",
                        bonferroni = TRUE)

tree <- semtree(model = result,
                data = affect,
                control = ctrl)
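The Bonferroni correction itself is simple; here is a base-R sketch with made-up p-values (illustrative, not taken from this analysis): each predictor's p-value is multiplied by the number of candidate predictors and capped at 1, which is exactly what p.adjust(..., method = "bonferroni") does.

```r
p_values <- c(0.01, 0.03, 0.20)  # hypothetical per-predictor p-values

# multiply by the number of tests, cap at 1
p_adjusted <- pmin(p_values * length(p_values), 1)

# agrees with the built-in implementation
all.equal(p_adjusted, p.adjust(p_values, method = "bonferroni"))  # TRUE
```

This keeps the family-wise error rate across predictors at the nominal level, at the cost of some power when many predictors are tested.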

Now let us plot the tree.

plot(tree)

The tree implies a mixture of normal distributions. Let us plot the model-implied density of each terminal node:

tndata <- semtree::getTerminalNodes(tree)

cols <- viridis::plasma(nrow(tndata))

pl <- ggplot2::ggplot(data = data.frame(x = c(-20, 20)), ggplot2::aes(x))+
  ggplot2::xlab("Change in Positive Affect")

for (i in 1:nrow(tndata)) {
  pl <- pl + ggplot2::stat_function(fun = dnorm, n = 101, col = cols[i],
                                    args = list(mean = tndata[i, 2],
                                                sd = sqrt(tndata[i, 1])))
}

plot(pl)


Let us inspect the group with the largest negative change. This is the group of participants who started out with high positive affect and then had to watch a war movie.

i <- which.min(tndata$mu)

pl <- pl + ggplot2::geom_area(stat = "function",
                              fun = function(x) {
                                dnorm(x, mean = tndata[i, 2], sd = sqrt(tndata[i, 1]))
                              },
                              fill = cols[i]) +
  ggplot2::geom_label(x = -10, y = 0.12,
                      label = "People with very high positive\naffect, who then watched\na war film")

plot(pl)



semtree documentation built on Nov. 26, 2023, 5:07 p.m.