```{r, include=FALSE}
knitr::opts_chunk$set(echo=TRUE, fig.width=6, fig.height=5)
```
The `bmixlm` model assumes that there are two linear models
$$
\begin{aligned}
Y_{1} &\sim \operatorname{N}(X_{1} \beta_{1}, \sigma_{1}^{2})\\
Y_{2} &\sim \operatorname{N}(X_{2} \beta_{2}, \sigma_{2}^{2})
\end{aligned}
$$
and a set of latent binary indicators governed by a probit model
$$
\begin{aligned}
b &\sim \operatorname{Bin}(1, p)\\
\Phi^{-1}(p) &= X_{p} \beta_{p}
\end{aligned}
$$
where the design matrices $X_{1}$, $X_{2}$ and $X_{p}$ are generated from a common set of covariates $x_{1}, x_{2}, \ldots, x_{n}$, and the observed response $y$ is
$$
y_{i} =
\begin{cases}
Y_{1i} & b_{i} = 0\\
Y_{2i} & b_{i} = 1
\end{cases}
$$
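Before turning to the full simulation below, the generative process for a single observation can be sketched directly from these equations. The parameter values here are purely illustrative, not those used later in the vignette:

```r
## One observation from the two-component mixture defined above
## (illustrative parameter values only)
set.seed(7)
x <- c(1, 0.5, 0.2)                    # covariate vector, with intercept
beta1 <- c(0, 1, -1)                   # coefficients, first component
beta2 <- c(1, -0.5, 0.5)               # coefficients, second component
betap <- c(0, 1, -1)                   # probit coefficients
sigma <- c(0.4, 0.6)                   # component error standard deviations
p <- pnorm(sum(x * betap))             # probit model: Phi^{-1}(p) = x'betap
b <- rbinom(1, 1, p)                   # latent component indicator
y <- if (b == 0) rnorm(1, sum(x * beta1), sigma[1]) else
                 rnorm(1, sum(x * beta2), sigma[2])
```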
We demonstrate `bmixlm` with a simulated data set.
```{r}
## Generate 7 uniformly distributed covariates
set.seed(31)
d <- as.data.frame(matrix(runif(1000*7), 1000, 7))
colnames(d) <- letters[seq_along(d)]
```
Generate the response for the first component from covariates a, b and c, and the response for the second from c, d and e.
```{r}
## Generate responses from the two models
sigma <- c(0.4, 0.6)
beta1 <- c(runif(1, -0.4, 0.4), rnorm(3))
y1 <- model.matrix(~a+b+c, d) %*% beta1 + rnorm(nrow(d), 0, sigma[1])
beta2 <- c(runif(1, -0.4, 0.4), rnorm(3))
y2 <- model.matrix(~c+d+e, d) %*% beta2 + rnorm(nrow(d), 0, sigma[2])
```
The observed response is a mixture of the two responses, with the probability of component membership determined by a probit model dependent upon covariates f and g.
```{r}
## Draw the observed response from one model or the other
betap <- c(0, 1, -1)
p <- pnorm(model.matrix(~f+g, d) %*% betap)
b <- rbinom(nrow(d), 1, p)
d$y <- ifelse(b==0, y1, y2)
```
In this simulated example the two components of the mixture are not particularly distinct.
```{r}
## Show the two components and the mixture
library(ggplot2)
ggplot(data.frame(comp=factor(c(b+1, rep("mixture", nrow(d)))),
                  y=c(d$y, d$y)),
       aes(x=y, group=comp, colour=comp)) +
  geom_density()
```
Fit the model that correctly assumes that the response in the first component is dependent upon covariates a, b and c, the response in the second is dependent upon covariates c, d and e, and the probability that an observation is drawn from one component or the other depends upon covariates f and g. The model is fitted by Gibbs sampling, and we generate an initial sample of 100 draws.
```{r}
## Draw a burnin sample
library(bmixlm)
fit <- bmixlm(y~a+b+c, y~c+d+e, ~f+g, data=d, nsamp=100)
```
and examine traceplots to determine if the burn-in is sufficient
```{r}
## Traceplots
plot(fit, which="comp1")
plot(fit, which="comp2")
plot(fit, which="probit")
plot(fit, which="error")
```
Update the fit to draw a larger sample. Unless otherwise specified, sampling starts from the last sample drawn in the previous fit
```{r}
## Draw a larger sample
fit <- update(fit, nsamp=2000)
summary(fit)
```
The summary is consistent with the true parameter values.
```{r}
beta1
beta2
betap
sigma
```
Again examine traceplots
```{r}
## Traceplots
plot(fit, which="comp1")
plot(fit, which="comp2")
plot(fit, which="probit")
plot(fit, which="error")
```
The `predictAll` function calculates a full range of fitted values.
```{r}
## Fitted values, residuals, and probability of component membership
d.pr <- predictAll(fit)
head(d.pr)
```
This yields

- `y`: the observed response from the prediction data
- `y1`: the prediction for a response from the first component
- `y2`: the prediction for a response from the second component
- `r1`: the residual if the response was from the first component
- `r2`: the residual if the response was from the second component
- `p`: the probability the response was drawn from the second component, conditional on only the covariates
- `q`: the probability the response was drawn from the second component, conditional on the covariates and the response itself.

Plot residuals versus fitted values for both components, colour coded by the probability of component membership.
```{r}
## Residuals vs fitted values
d.rf <- data.frame(comp=rep(1:2, each=nrow(d.pr)),
                   fitted=c(d.pr$y1, d.pr$y2),
                   residual=c(d.pr$r1, d.pr$r2),
                   prob=c(1-d.pr$q, d.pr$q))
library(ggplot2)
ggplot(d.rf[order(d.rf$prob),],
       aes(x=fitted, y=residual, colour=prob)) +
  geom_point(size=1) +
  facet_wrap(~comp, ncol=1)
```
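The distinction between `p` and `q` is just Bayes' rule: `q` updates `p` with the likelihood of the observed response under each component's normal density. A minimal sketch with illustrative values (not fitted model output):

```r
## Updating p to q for one observation via Bayes' rule
## (y, p, mu1, mu2 and sigma are illustrative stand-ins, not model output)
y <- 1.0                               # observed response
p <- 0.3                               # P(component 2) from covariates alone
mu1 <- 0.0; mu2 <- 1.5                 # component means at these covariates
sigma <- c(0.4, 0.6)                   # component error standard deviations
q <- p * dnorm(y, mu2, sigma[2]) /
  ((1 - p) * dnorm(y, mu1, sigma[1]) + p * dnorm(y, mu2, sigma[2]))
q                                      # the response favours component 2, so q > p
```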
Show the estimated decomposition into components
```{r}
## Decompose response into components
cl <- classify(fit)
library(ggplot2)
ggplot(data.frame(comp=factor(c(ifelse(cl$q < 0.5, "1", "2"),
                                rep("mixture", nrow(d)))),
                  y=rep(d$y, 2)),
       aes(x=y, group=comp, colour=comp)) +
  geom_density()
```
The `simulate` function simulates new observations from the posterior predictive distribution.
```{r}
## Simulate from posterior predictive distribution
ys <- simulate(fit, nsim=500)
```
The fit of the model can be assessed by plotting the original observations against the 95% credible interval generated from the posterior predictive distribution, coloured by the certainty of component membership.
```{r}
## Show observations against simulations
cl <- classify(fit)
d.pr <- as.data.frame(t(apply(ys, 1, quantile, prob=c(0.025, 0.5, 0.975))))
d.pr <- cbind(d.pr, y=d$y, x=order(order(d.pr$`50%`)), cert=pmax(cl$q, 1-cl$q))
library(ggplot2)
ggplot(d.pr, aes(x=x, y=y, ymin=`2.5%`, ymax=`97.5%`, colour=cert)) +
  geom_ribbon(col="grey80", fill="grey80") +
  geom_point(size=1)
```
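The same comparison can also be made numerically as an empirical coverage check. The sketch below is self-contained, using stand-in draws in place of the matrix returned by `simulate`:

```r
## Coverage of the 95% posterior predictive interval (stand-in draws)
set.seed(1)
y  <- rnorm(200)                           # "observed" responses
ys <- matrix(rnorm(200 * 500), 200, 500)   # one row of draws per observation
lo <- apply(ys, 1, quantile, prob = 0.025)
hi <- apply(ys, 1, quantile, prob = 0.975)
cover <- mean(y >= lo & y <= hi)           # should be close to 0.95
```

For a well-calibrated model about 95% of the observations should fall inside their intervals.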
It is possible to fit a simple two component mixture model by specifying no covariates for any component of the model.
```{r}
## Fit a simple two component mixture
fit0 <- bmixlm(y~1, y~1, ~1, data=d, nsamp=100)
fit0 <- update(fit0, nsamp=2000)
summary(fit0)
```
Models in which the linear models for the two components depend on identical sets of covariates must be interpreted with care. When there is nothing to distinguish the models for the two components, the two components become interchangeable and the model is subject to the relabelling problem common to MCMC treatments of simple mixture models.
Models can be compared based on WAIC. If an important covariate is removed from the model, typically the WAIC increases substantially.
```{r}
fit1 <- update(fit, formula1=y~a+b, nsamp=100)
fit1 <- update(fit1, nsamp=2000)
summary(fit1)
```
But when an additional covariate is included in the model, the change in WAIC is more subtle.
```{r}
fit2 <- update(fit, formula1=y~a+b+c+d, nsamp=100)
fit2 <- update(fit2, nsamp=2000)
summary(fit2)
```
The summary table for the original fit suggests c may not be required in the model for the second component. However, if this term is deleted the WAIC increases, suggesting this term should be retained.
```{r}
fit3 <- update(fit, formula2=y~d+e, nsamp=100)
fit3 <- update(fit3, nsamp=2000)
summary(fit3)
```
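The WAIC values reported here follow the standard construction from pointwise log-likelihoods of posterior draws. The sketch below uses an illustrative log-likelihood matrix and is not necessarily bmixlm's internal implementation:

```r
## WAIC from pointwise log-likelihoods (rows = observations, cols = draws)
## (illustrative log-likelihood values only)
set.seed(2)
loglik <- matrix(rnorm(50 * 400, mean = -1, sd = 0.1), 50, 400)
lppd   <- sum(log(rowMeans(exp(loglik))))   # log pointwise predictive density
p_waic <- sum(apply(loglik, 1, var))        # effective number of parameters
waic   <- -2 * (lppd - p_waic)              # lower WAIC indicates better fit
```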
One approach to model building is to fit almost full models for each component, ensuring that the two component models differ so that the components are differentiated
```{r}
## Draw a burnin sample
fit0 <- bmixlm(y~a+b+c+e+f+g, y~b+c+d+e+f+g, ~a+b+c+d+e+f+g, data=d, nsamp=100)
fit0 <- update(fit0, nsamp=2000)
summary(fit0)
```
and then form a baseline model by deleting terms for which there is not strong evidence
```{r}
fit0 <- update(fit0, formula1=y~a+b+c, formula2=y~d+e, formulap=~f+g)
summary(fit0)
```
Alternatively, given a reasonable initial model, the data can be classified into the two components and classical techniques used to suggest terms for the model components.
```{r}
q <- classify(fit0)$q
summary(step(lm(qnorm(q)~a+b+c+d+e+f+g, data=d), trace=FALSE))
summary(step(lm(y~a+b+c+d+e+f+g, data=d[q<0.4,]), trace=FALSE))
summary(step(lm(y~a+b+c+d+e+f+g, data=d[q>0.6,]), trace=FALSE))
```