In SWotherspoon/bmixlm: Binary Mixtures of Linear Models

knitr::opts_chunk$set(echo=TRUE,fig.width=6,fig.height=5)

Model

The bmixlm model assumes that there are two linear models [ \begin{align} Y_{1} &\sim \operatorname{N}(X_{1} \beta_{1}, \sigma_{1}^{2})\ Y_{2} &\sim \operatorname{N}(X_{2} \beta_{2}, \sigma_{2}^{2}) \end{align} ] and a set of latent binary indicators governed by a probit model [ \begin{align} b &\sim \operatorname{Bin}(1,p)\ \Phi^{-1}(p) &= X_{p} \beta_{p} \end{align} ] where the design matrices (X_{1}), (X_{2}) and (X_{p}) are generated from a common set of covariates (x_{1}, x_{2},\ldots,x_{n}), and the observed response (y) is [ y_{i} \begin{cases} Y_{1i} & b_{i} = 0\ Y_{2i} & b_{i} = 1 \end{cases} ]

Simulation

We demonstrate bmixlm with a simulated data set.

## Generate 7 uniformly distributed covariates
set.seed(31)
d <- as.data.frame(matrix(runif(1000*7),1000,7))
colnames(d) <- letters[seq_along(d)]

Generate the response for the first component from covariates a, b and c, and the response for the second from c, d and e.

## Generate responses from the two models
sigma <- c(0.4,0.6)
beta1 <- c(runif(1,-0.4,0.4),rnorm(3))
y1 <- model.matrix(~a+b+c,d)%*%beta1+rnorm(nrow(d),0,sigma[1])
beta2 <- c(runif(1,-0.4,0.4),rnorm(3))
y2 <- model.matrix(~c+d+e,d)%*%beta2+rnorm(nrow(d),0,sigma[2])

The observed response is a mixture of the two responses, with the probability of component membership determined by a probit model dependent upon covariates f and g

## Draw the observed response from one model or the other
betap <- c(0,1,-1)
p <- pnorm(model.matrix(~f+g,d)%*%betap)
b <- rbinom(nrow(d),1,p)
d$y <- ifelse(b==0,y1,y2)

In this simulated example the two components of the mixture are not particularly distinct

## Show the two components and the mixture
library(ggplot2)
ggplot(data.frame(comp=factor(c(b+1,rep("mixture",nrow(d)))),
                  y=c(d$y,d$y)),
       aes(x=y,group=comp,colour=comp))+
  geom_density()

Fitting

Fit the model that correctly assumes that the response in the first component is dependent upon covariates a, b and c, the response in the second is dependent upon covariates c, d and e and the probability of the observation is drawn from one component or the other is dependent upon covariates f and g. The model is fit by Gibbs sampling and we generate an initial sample of 100 draws

## Draw a burnin sample
library(bmixlm)
fit <- bmixlm(y~a+b+c,y~c+d+e,~f+g,data=d,nsamp=100)

and examine traceplots to determine if the burn-in is sufficient

## Traceplots
plot(fit,which="comp1")
plot(fit,which="comp2")
plot(fit,which="probit")
plot(fit,which="error")

Update the fit to draw a larger sample. Unless otherwise specified, sampling starts from the last sample drawn in the previous fit

## Draw a larger sample
fit <- update(fit,nsamp=2000)
summary(fit)

The summary is consistent with the true parameter values

beta1
beta2
betap
sigma

Again examine traceplots

## Traceplots
plot(fit,which="comp1")
plot(fit,which="comp2")
plot(fit,which="probit")
plot(fit,which="error")

The predictAll function calculates a full range of fitted values

## Fitted values, residuals, and probability of component membership
d.pr <- predictAll(fit)
head(d.pr)

This yields

y the observed response from the prediction data
y1 the prediction for a response from the first component
y2 the prediction for a response from the second component
r1 the residual if the response was from the first component
r2 the residual if the response was from the second component
p the probability the response was drawn from the second component conditional on only the covariates
q the probability the response was drawn from the second component conditional on the covariates and the response itself.

Plot residuals versus fitted values for both components, colour coded by the probability of component membership

## Residuals vs fitted values
d.rf <- data.frame(comp=rep(1:2,each=nrow(d.pr)),
                   fitted=c(d.pr$y1,d.pr$y2),
                   residual=c(d.pr$r1,d.pr$r2),
                   prob=c(1-d.pr$q,d.pr$q))
library(ggplot2)
ggplot(d.rf[order(d.rf$prob),],
       aes(x=fitted,y=residual,colour=prob)) +
  geom_point(size=1)+
  facet_wrap(~comp,ncol=1)

Show the estimated decomposition into components

## Decompose response into components
cl <- classify(fit)
library(ggplot2)
ggplot(data.frame(comp=factor(c(ifelse(cl$q < 0.5,"1","2"),
                                rep("mixture",nrow(d)))),
                  y=rep(d$y,2)),
       aes(x=y,group=comp,colour=comp))+
  geom_density()

The simulate function simulates new observations from the posterior predictive distribution

## Simulate from posterior predictive distribution
ys <- simulate(fit,nsim=500)

The fit of the model can be assessed by fitting the original observations against the 95% credible interval generated from the posterior predictive distribution, coloured by the certainty of component membership

## Show observations against simulations
cl <- classify(fit)
d.pr <- as.data.frame(t(apply(ys,1,quantile,prob=c(0.025,0.5,0.975))))
d.pr <-cbind(d.pr,y=d$y,x=order(order(d.pr$`50%`)),cert=pmax(cl$q,1-cl$q))
library(ggplot2)
ggplot(d.pr,aes(x=x,y=y,ymin=`2.5%`,ymax=`97.5%`,colour=cert))+
  geom_ribbon(col="grey80",fill="grey80")+
  geom_point(size=1)

Component Relabelling

It is possible to fit a simple two component mixture model by specifying no covariates for any components of the model.

## Fit a simple two component mixture
fit0 <- bmixlm(y~1,y~1,~1,data=d,nsamp=100)
fit0 <- update(fit0,nsamp=2000)
summary(fit0)

Models in which the linear models for the two components depend on identical sets of covariates must be interpreted with care. When there is nothing to distinguish the models for the two components, the two components become interchangeable and the model is subject to the relabelling problem common to MCMC treatments of simple mixture models.

WAIC

Models can be compared based on WAIC. If an important covariate is removed from the model, typically the WAIC increases substantially.

fit1 <- update(fit,formula1=y~a+b,nsamp=100)
fit1 <- update(fit1,nsamp=2000)
summary(fit1)

But when an additional covariate is included in the model the change in WAIC is more subtle

fit2 <- update(fit,formula1=y~a+b+c+d,nsamp=100)
fit2 <- update(fit2,nsamp=2000)
summary(fit2)

The summary table for the original fit suggests c may not be required in the model for the second component. However if this term is deleted the WAIC increases, suggsting this term should be retained

fit3 <- update(fit,formula2=y~d+e,nsamp=100)
fit3 <- update(fit3,nsamp=2000)
summary(fit3)

Model Building

One approach to model building is to fit almost full models for each component, ensuring that the two component models differ so that the components are differentiated

## Draw a burnin sample
fit0 <- bmixlm(y~a+b+c+e+f+g,y~b+c+d+e+f+g,~a+b+c+d+e+f+g,data=d,nsamp=100)
fit0 <- update(fit0,nsamp=2000)
summary(fit0)

and then form a baseline model by deleting terms for which there is not strong evidence

fit0 <- update(fit0,formula1=y~a+b+c,formula2=y~d+e,formulap=~f+g)
summary(fit0)

Alternately, given a reasonable initial model, the data can be classified into the two components and classical techniques used to suggest terms for the model components

q <- classify(fit0)$q
summary(step(lm(qnorm(q)~a+b+c+d+e+f+g,data=d),trace=FALSE))
summary(step(lm(y~a+b+c+d+e+f+g,data=d[q<0.4,]),trace=FALSE))
summary(step(lm(y~a+b+c+d+e+f+g,data=d[q>0.6,]),trace=FALSE))