knitr::opts_chunk$set( collapse = TRUE, comment = "#>", dpi=300, out.width="50%" ) library(ggplot2)

We first generate a mixture of bivariate normal distributions. The distributions differ only by their x- and y-displacement, that is, by their mean values. There are two predictors `grp1`

and `grp2`

which predict the differences in means. `grp1`

predicts differences in the first dimension and `grp2`

predicts differences in the second dimension. Without focus parameter, both predictors are needed to distinguish all four groups. If one of the two means is chosen as a focus parameter, only one of the two predictors is important.

library(semtree) set.seed(123) N <- 1000 grp1 <- factor(sample(x = c(0,1), size=N, replace=TRUE)) grp2 <- factor(sample(x = c(0,1), size=N, replace=TRUE)) noise <- factor(sample(x = c(0,1),size=N, replace=TRUE)) Sigma <- matrix(byrow=TRUE, nrow=2,c(2,0.2, 0.2,1)) obs <- MASS::mvrnorm(N,mu=c(0,0), Sigma=Sigma) obs[,1] <- obs[,1] + ifelse(grp1==1,3,0) obs[,2] <- obs[,2] + ifelse(grp2==1,3,0) df.biv <- data.frame(obs, grp1, grp2, noise) names(df.biv)[1:2] <- paste0("x",1:2) manifests<-c("x1","x2")

The following code specifies a bivariate Gaussian model with five parameters:

model.biv <- mxModel("Bivariate_Model", type="RAM", manifestVars = manifests, latentVars = c(), mxPath(from="x1",to=c("x1","x2"), free=c(TRUE,TRUE), value=c(1.0,.2) , arrows=2, label=c("VAR_x1","COV_x1_x2") ), mxPath(from="x2",to=c("x2"), free=c(TRUE), value=c(1.0) , arrows=2, label=c("VAR_x2") ), mxPath(from="one",to=c("x1","x2"), label=c("mu1","mu2"), free=TRUE, value=0, arrows=1), mxData(df.biv, type = "raw") ); result <- mxRun(model.biv) summary(result)

This is how the data look in a 2D space:

df.biv.pred <- data.frame(df.biv, leaf=factor(as.numeric(df.biv$grp2)*2+as.numeric(df.biv$grp1))) ggplot(data = df.biv.pred, aes(x=x1, y=x2, group=leaf))+ geom_density_2d(aes(colour=leaf))+ viridis::scale_color_viridis(discrete=TRUE)+ theme_classic()

Now, we choose the mean of the second dimension `mu2`

as focus parameter. We expect that only predictor `grp2`

. This is what we see in a single tree.

fp <- "mu2" # predicted by grp2 #fp <- "mu1" # predicted by grp1 tree.biv <- semtree(model.biv, data=df.biv, constraints = list(focus.parameters=fp))

```
plot(tree.biv)
```

Now, we are repeating the same analysis in a forest.

forest <- semforest(model.biv, data=df.biv, constraints = list(focus.parameters=fp), control=semforest.control(num.trees=10, control=semtree.control(method="score",alpha=1)))

By default, we see that individual trees are fully grown (without a p-value threshold). The first split is according to `grp2`

because it best explains the group differences. Subsequent splits are according to `grp1`

even though the chi2 values are close to zero. They only appear because there is no p-value-based stopping criterion.

plot(forest$forest[[1]])

Now, let us investigate the permutation-based variable importance:

vim <- varimp(forest, method="permutationFocus") plot(vim, main="Variable Importance")

**Any scripts or data that you put into this service are public.**

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.