combayes implements Bayesian inference for COM-Poisson regression models using exact samplers. It also provides functions for sampling exactly from the COM-poisson distribution (using rejection sampling) and for evaluating exact bounds for the normalisation constant of the probability mass function of the COM-Poisson distribution. More information behind the techniques used can be found in the papers:
Both papers focus on the Bayesian implementation of the COM-Poisson regression model. The latter paper takes advantage of the exchange algorithm, an MCMC method applicable to situations where the sampling model (likelihood) can only be computed up to a normalisation constant. The algorithm requires to draw from the sampling model, which in the case of the COM-Poisson distribution can be done efficiently using rejection sampling.
If you want to get a really short intro to the COM-Poisson distribution enter the URL of the README.html document at http://htmlpreview.github.io/.
Are you still confused about the COM-Poisson distribution and the exchange algorithm? Have a look at these slides.
library(devtools) install_github("cchanialidis/combayes")
All you need to do now, I hope, is
library(combayes)
# Set random seed for reproducibility set.seed(84) # Sample size n <- 200 # Sampling from an underdispersed COM-Poisson distribution comp_under <- rcmpois(mu=10,nu=2,n=n) # Sampling from a COM-Poisson distribution where nu=1 (i.e. Poisson distribution) comp_equi <- rcmpois(mu=10,nu=1,n=n) # Sampling from an overdispersed COM-Poisson distribution comp_over <- rcmpois(mu=10,nu=0.5,n=n) # Save samples in a data frame distributions <- data.frame(comp_under,comp_equi,comp_over)
apply(distributions,2,mean)# Similar means (close to the value of mu) apply(distributions,2,var)# Different variances (close to the value of mu/nu)
logzcmpois(mu=10,nu=2) logzcmpois(mu=10,nu=1)# Any ideas on what we expect the answer to be for nu=1? logzcmpois(mu=10,nu=0.5)
# Compare densities of COM-Poisson distribution with different nu x <- 0:25 matplot(x, cbind(dcmpois(x, mu=10, nu=0.5), dcmpois(x, mu=10, nu=1), dcmpois(x, mu=10, nu=2)), type="o", col=2:4, pch=16, ylab="Probability mass function") legend("topright", col=2:4, lty=1:3, c(expression(nu*"="*0.5), expression(nu*"="*1), expression(nu*"="*2)))
We illustrate the method and the benefits of using a Bayesian COM-Poisson regression model, through two real-world data sets with different levels of dispersion. If one wants to use the alternative technique proposed in the earlier paper they have to specify that the argument algorithm
in the cmpoisreg
is equal to "bounds"
(in its default version is equal to "exchange"
). Bear in mind that the MCMC algorithm will be significantly slower in that case.
# Load data from library Rchoice library(Rchoice) data(Articles) # Focusing only on the students with at least one publication phdpublish <- subset(Articles, art>0) phdpublish <- transform(phdpublish, art=art-1) # Standardise all non-binary covariates phdpublish <- cbind(phdpublish[,c(1,2,3)],scale(phdpublish[,-c(1,2,3)],center=TRUE,scale=TRUE)) result <- cmpoisreg(y=phdpublish$art, X=phdpublish[,2:6], num_samples=1e4, burnin=1e3,prior_var_beta=diag(6),prior_var_delta=diag(6)) colMeans(result$posterior_beta) colMeans(result$posterior_delta) mcmc_beta <- mcmc(result$posterior_beta) mcmc_delta <- mcmc(result$posterior_delta) colnames(mcmc_beta) <- c("intercept","female","married","kids","phd","mentor") colnames(mcmc_delta) <- colnames(mcmc_beta) # Plot traceplots of regression coefficients plot(mcmc_beta) plot(mcmc_delta) # Plot caterplots of regression coefficients caterplot(mcmc_beta,style="plain",bty="n",collapse=FALSE) abline(v=0,lty=2) title("Regression coefficients for"~ mu) caterplot(mcmc_delta,style="plain",bty="n",collapse=FALSE) abline(v=0,lty=2) title("Regression coefficients for"~ nu)
A talk by Galit Shmueli on the COM-Poisson distribution can be found here.
If you prefer reading papers instead, here are some starting points:
I have to add a lot of info on this subsection but until then; here is a really informative paper that outlines the theory and the implementation of regression models for count data in R.
Finally, Joseph M. Hilbe released the COUNT package which contains lots of data sets where the response variable is discrete.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.