demography: Gaussian mixed linear regression for modelling population...

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/demography.R

Description

The demography() function estimates the parameters of a Gaussian linear mixed model in a hierarchical Bayesian framework. To estimate the posterior distribution of the parameters, block sampling is used by applying the algorithm 2 of Chib and Carlin (1999). The user supplies data and priors and a sample from the posterior distribution is returned as an MCMC object, which can be subsequently analyzed with functions provided in the coda package.

Usage

1
2
3
demography(fixed, random, group, data, burnin=1000, mcmc=10000,
thin=10, verbose=1, seed=NA, beta.start=NA, sigma2.start=NA,
Vb.start=NA, mubeta=0, Vbeta=1.0E6, r, R, nu=0.001, delta=0.001)

Arguments

fixed

A two-sided linear formula of the form 'y~x1+...+xp' describing the fixed-effects part of the model, with the response on the left of a '~' operator and the p fixed terms, separated by '+' operators, on the right.

random

A one-sided formula of the form '~x1+...+xq' specifying the model for the random effects, with the q random terms, separated by '+' operators. If random=NULL, a fixed effect model is fitted.

group

String indicating the name of the grouping variable in data, defining the hierarchical structure of the model.

data

A data frame containing the variables in the model.

burnin

The number of burnin iterations for the sampler.

mcmc

The number of Gibbs iterations for the sampler. Total number of Gibbs iterations is equal to burnin+mcmc. burnin+mcmc must be divisible by 10 and superior or equal to 100 so that the progress bar can be displayed.

thin

The thinning interval used in the simulation. The number of Gibbs iterations must be divisible by this value.

seed

The seed for the random number generator. If NA, the Mersenne Twister generator is used with default seed 12345; if an integer is passed it is used to seed the Mersenne twister.

verbose

A switch (0,1) which determines whether or not the progress of the sampler is printed to the screen. Default is 1: a progress bar is printed, indicating the step (in %) reached by the Gibbs sampler.

beta.start

The starting values for the beta vector. This can either be a scalar or a p-length vector. The default value of NA will use the OLS beta estimate of the corresponding Gaussian linear regression without random effects. If this is a scalar, that value will serve as the starting value for all of the betas.

sigma2.start

Scalar for the starting value of the residual error variance. The default value of NA will use the OLS estimates of the corresponding Gaussian linear regression without random effects.

Vb.start

The starting value for variance matrix of the random effects. This must be a square q-dimension matrix. Default value of NA uses an identity matrix.

mubeta

The prior mean of beta. This can either be a scalar or a p-length vector. If this takes a scalar value, then that value will serve as the prior mean for all of the betas. The default value of 0 will use a vector of zeros for an uninformative prior.

Vbeta

The prior variance of beta. This can either be a scalar or a square p-dimension matrix. If this takes a scalar value, then that value times an identity matrix serves as the prior variance of beta. Default value of 1.0E6 will use a diagonal matrix with very large variance for an uninformative flat prior.

r

The shape parameter for the Inverse-Wishart prior on variance matrix for the random effects. r must be superior or equal to q. Set r=q for an uninformative prior.

R

The scale matrix for the Inverse-Wishart prior on variance matrix for the random effects. This must be a square q-dimension matrix. Use plausible variance regarding random effects for the diagonal of R.

nu

The shape parameter for the Inverse-Gamma prior on the residual error variance. Default value is nu=delta=0.001 for uninformative prior.

delta

The rate (1/scale) parameter for the Inverse-Gamma prior on the residual error variance. Default value is nu=delta=0.001 for uninformative prior.

Details

The demography() function samples from the posterior distribution using the blocked Gibbs sampler of Chib and Carlin (1999), Algorithm 2. The simulation is done in compiled C++ code to maximize efficiency. Please consult the coda documentation for a comprehensive list of functions that can be used to analyze the posterior sample.

The model takes the following form:

y_i = X_i * beta + W_i * b_i + epsilon_i

Where each group i have k_i observations.

Where the random effects:

b_i ~ N_q(0,Vb)

And the errors:

epsilon_i ~ N(0, sigma^2 I_{k_i})

We assume standard, conjugate priors:

beta ~ N_p(mubeta,Vbeta)

And:

sigma^2 ~ IGamma(nu, 1/delta)

And:

Vb ~ IWishart(r, rR)

See Chib and Carlin (1999) for more details.

NOTE: We do not provide default parameters for the priors on the variance matrix for the random effects. When fitting one of these models, it is of utmost importance to choose a prior that reflects your prior beliefs about the random effects. Using the dwish and rwish functions might be useful in choosing these values.

Value

mcmc

An MCMC object that contains the posterior sample. This object can be summarized by functions provided by the coda package.

deviance

The posterior mean of the deviance D, with D=-2log(prod_i P(y_i|beta,b_i,sigma^2)), is also provided.

Y.pred

Predictive posterior mean for each observation.

Author(s)

Ghislain Vieilledent <ghislain.vieilledent@cirad.fr>

References

Siddhartha Chib and Bradley P. Carlin. 1999. “On MCMC Sampling in Hierarchical Longitudinal Models.” Statistics and Computing. 9: 17-26.

Daniel Pemstein, Kevin M. Quinn, and Andrew D. Martin. 2007. Scythe Statistical Library 1.0. http://scythe.wustl.edu.

Andrew D. Martin and Kyle L. Saunders. 2002. “Bayesian Inference for Political Science Panel Data.” Paper presented at the 2002 Annual Meeting of the American Political Science Association.

Martyn Plummer, Nicky Best, Kate Cowles, and Karen Vines. 2002. Output Analysis and Diagnostics for MCMC (CODA). http://www-fis.iarc.fr/coda/.

Ghislain Vieilledent, Clovis Grinand and Romuald Vaudry. 2013. Forecasting deforestation and carbon emissions in tropical developing countries facing demographic expansion: a case study in Madagascar. Ecology and Evolution. DOI: 10.1002/ece3.550

See Also

plot.mcmc, summary.mcmc

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
## Not run: 
#========================================
# Hierarchical Gaussian Linear Regression
#========================================

library(phcfM)

#== Generating data

# Constants
nobs <- 1000
ntown <- 20
town <- c(1:ntown,sample(c(1:ntown),(nobs-ntown),replace=TRUE))

# Covariates
X1 <- runif(n=nobs,min=0,max=10)
X2 <- runif(n=nobs,min=0,max=10)
X <- cbind(rep(1,nobs),X1,X2)
W <- X

# Target parameters
# beta
beta.target <- matrix(c(0.1,0.3,0.2),ncol=1)
# Vb
Vb.target <- c(0.5,0.2,0.1)
# b
b.target <- cbind(rnorm(ntown,mean=0,sd=sqrt(Vb.target[1])),
                  rnorm(ntown,mean=0,sd=sqrt(Vb.target[2])),
                  rnorm(ntown,mean=0,sd=sqrt(Vb.target[3])))
# sigma2
sigma2.target <- 0.02

# Response
Y <- vector()
for (n in 1:nobs) {
  Y[n] <- rnorm(n=1,
                mean=X[n,]%*%beta.target+W[n,]%*%b.target[town[n],],
                sd=sqrt(sigma2.target))
}

# Data-set
Data <- as.data.frame(cbind(Y,X1,X2,town))
plot(Data$X1,Data$Y)

#== Call to demography
model <- demography(fixed=Y~X1+X2, random=~X1+X2, group="town",
              data=Data, burnin=1000, mcmc=1000, thin=1,verbose=1,
              seed=NA, beta.start=0, sigma2.start=1,
              Vb.start=1, mubeta=0, Vbeta=1.0E6,
              r=3, R=diag(c(1,0.1,0.1)), nu=0.001, delta=0.001)

#== MCMC analysis

# Graphics
pdf("Posteriors-demography.pdf")
plot(model$mcmc)
dev.off()

# Summary
summary(model$mcmc)

# Predictive posterior mean for each observation
model$Y.pred

# Predicted-Observed
plot(Data$Y,model$Y.pred)
abline(a=0,b=1)


## End(Not run)

phcfM documentation built on May 30, 2017, 5:21 a.m.