deforestation: Binary logistic regression with variable time-intervals...

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/deforestation.R

Description

The deforestation() function estimates the parameters of a Binary logistic regression model with variable time-intervals between observations in a hierarchical Bayesian framework. To estimate the posterior distribution of the parameters, an adaptive Metropolis algorithm is used. The user supplies data and priors and a sample from the posterior distribution is returned as an MCMC object, which can be subsequently analyzed with functions provided in the coda package.

Usage

1
2
3
deforestation(formula, interval=1, data, burnin=1000, mcmc=1000,
   thin=1, verbose=1, seed=NA, tune=1, beta.start=NA, mubeta=0,
   Vbeta=1.0E6)

Arguments

formula

A two-sided linear formula of the form 'y~x1+...+xp' describing the model, with the response on the left of a '~' operator and the p fixed terms, separated by '+' operators, on the right. Response variable y must be 0 or 1 (Bernoulli trials).

interval

A numeric scalar or a vector of length equal to the number of observations. interval specifies the time interval between land cover observations. Default to 1.

data

A data frame containing the variables of the model.

burnin

The number of burnin iterations for the sampler.

mcmc

The number of Gibbs iterations for the sampler. Total number of Gibbs iterations is equal to burnin+mcmc. burnin+mcmc must be divisible by 10 and superior or equal to 100 so that the progress bar can be displayed.

thin

The thinning interval used in the simulation. The number of Gibbs iterations must be divisible by this value.

verbose

A switch (0,1) which determines whether or not the progress of the sampler is printed to the screen. Default is 1: a progress bar is printed, indicating the step (in %) reached by the Gibbs sampler.

seed

The seed for the random number generator. If NA, the Mersenne twister generator is used with default seed 12345; if an integer is passed it is used to seed the Mersenne twister.

tune

Metropolis tuning parameter. Must be a positive scalar. The tuning parameter is updated during the burning period to approach an acceptance rate of 0.44.

beta.start

The starting values for the beta vector. This can either be a scalar or a p-length vector. The default value of NA will use the OLS beta estimate of the corresponding Binary logistic regression. If this is a scalar, that value will serve as the starting value for all of the betas.

mubeta

The prior mean of beta. This can either be a scalar or a p-length vector. If this takes a scalar value, then that value will serve as the prior mean for all of the betas. The default value of 0 will use a vector of zeros for an uninformative prior.

Vbeta

The prior variance of beta. This can either be a scalar or a square p-dimension matrix. If this takes a scalar value, then that value times an identity matrix serves as the prior variance of beta. Default value of 1.0E6 will use a diagonal matrix with very large variance for an uninformative flat prior.

Details

The deforestation() function estimates the parameters of a logistic regression model with variable time-intervals between observations. The estimation is done in a hierarchical Bayesian framework using an adaptive Metropolis algorithm. Function is developped in C++ code using the Scythe statistical library (Pemstein et al. 2007) to maximize efficiency.

The model takes the following form:

y_i ~ Bernoulli(theta'_i)

With theta_i' = 1-(1-theta_i)^I_i, where I_i stands for the time-interval for observation i. Thus, theta_i is a probability by unit of time (an annual rate for example if the unit of the time-interval is in year).

Using the logit link function denoted phi, we set:

phi(theta_i) = X_i * beta

By default, we assume a multivariate Normal prior on beta:

beta ~ N(mu_beta,V_beta)

For the Metropolis algorithm, the proposal distribution is centered at the current value of beta and has variance-covariance V = T (V_beta^(-1) + C^(-1))^(-1) T, where T is the diagonal positive definite matrix formed from the tune, V_beta is the prior variance, and C is the large sample variance-covariance matrix of the MLEs without considering the variation of the time-interval between observations. This last calculation is done via an initial call to glm.

Value

mcmc

An MCMC object that contains the posterior sample. This object can be summarized by functions provided by the coda package.

deviance

The posterior mean of the deviance D, with D=-2log(prod_i P(y_i|theta_i')).

tune

The optimized value for the tuning parameter. This value can be used for potential future runs.

Author(s)

Ghislain Vieilledent <ghislain.vieilledent@cirad.fr>

References

Daniel Pemstein, Kevin M. Quinn, and Andrew D. Martin. 2007. Scythe Statistical Library 1.0. http://scythe.wustl.edu.

Martyn Plummer, Nicky Best, Kate Cowles, and Karen Vines. 2002. Output Analysis and Diagnostics for MCMC (CODA). http://www-fis.iarc.fr/coda/.

Ghislain Vieilledent, Clovis Grinand and Romuald Vaudry. 2013. Forecasting deforestation and carbon emissions in tropical developing countries facing demographic expansion: a case study in Madagascar. Ecology and Evolution. DOI: 10.1002/ece3.550

See Also

plot.mcmc, summary.mcmc

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
## Not run: 
#=====================================================================
# Logistic regression with variable time-interval between observations 
#=====================================================================

#= Library
library(phcfM)

#== Generating data

# Random seed
set.seed(1234)

# Constants
nobs <- 3000

# Covariates
X1 <- runif(n=nobs,min=-10,max=10)
X2 <- runif(n=nobs,min=-10,max=10)
X <- cbind(rep(1,nobs),X1,X2)
I <- runif(n=nobs,min=1,max=5) # Time-interval

# Target beta parameters
beta.target <- matrix(c(0.3,0.2,0.1),ncol=1)

# Response
theta <- vector()
theta_prim <- vector()
Y <- vector()
for (n in 1:nobs) {
  theta[n] <- inv.logit(X[n,]%*%beta.target)
  theta_prim[n] <- 1-(1-theta[n])^I[n]
  Y[n] <- rbinom(n=1,size=1,prob=theta_prim[n])
}

# Data-set
Data <- as.data.frame(cbind(Y,I,theta_prim,theta,X1,X2))
plot(Data$X1,Data$theta)
plot(Data$X2,Data$theta)

#== Call to deforestation()
model <- deforestation(formula=Y~X1+X2, interval=Data$I, data=Data, burnin=1000, mcmc=1000,
                       thin=1, verbose=1, seed=NA, tune=1, beta.start=NA, mubeta=0,
                       Vbeta=1.0E6)

#== MCMC analysis

# Graphics
plot(model$mcmc)

# Parameter estimates
str(model)
summary(model$mcmc)
model$deviance
model$tune

## We obtain good parameter estimates 
##
##                    Mean       SD  Naive SE Time-series SE
## beta.(Intercept) 0.3362 0.054534 0.0017245       0.006589
## beta.X1          0.2027 0.009345 0.0002955       0.001154
## beta.X2          0.1037 0.007269 0.0002299       0.000838

#== GLM resolution if time-interval is not taken into account

model.glm <- glm(Y~X1+X2,data=Data,family="binomial")
summary(model.glm)

## In this case, the parameter estimates are biased
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) 2.000761   0.072069   27.76   <2e-16 ***
## X1          0.247514   0.012060   20.52   <2e-16 ***
## X2          0.123137   0.009803   12.56   <2e-16 ***

## End(Not run)

phcfM documentation built on May 30, 2017, 5:21 a.m.