gmrfdpgrow: Bayesian instrinsic Gaussian Markov Random Field model for...

Description Usage Arguments Value Note Author(s) References See Also Examples

View source: R/gmrfdpgrow.R

Description

Estimates a collection of time-indexed functions under intrinsic Gaussian Markov random field prior formulations where a Dirichlet process mixture allows sub-groupings of the functions to share the same iGMRF precision parameter. The iGMRF formulation supports any number of additive precision terms, expressing either or both of multiple trend and seasonality.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
gmrfdpgrow(
  y,
  ksi,
  E,
  ipr,
  q_order,
  q_type,
  q_shape,
  q_rate,
  tau_shape,
  tau_rate,
  dp_shape,
  dp_rate,
  M_init,
  w_star,
  n.iter,
  n.burn,
  n.thin,
  nu,
  Rep,
  progress,
  jitter,
  kappa_fast,
  stable_launch
)

Arguments

y

A multivariate continuous response, specified as an N x T matrix, where N denotes the number of functions and T, the number of time points per function. Intermittent missing-at-random values are allowed and will be estimated from the posterior predictive distribution. Missing cells should be denoted with NA.

ksi

An N x P matrix of N observations of P predictors to be used in prior probability of co-clustering of set of N, T x 1 observations. Defaults to ksi = NULL such that predictors are not used to a priori determine co-clustering probabilities.

E

A multivariate offset variable, specified as an N x T matrix, in the case that y is of type count data. The offset will be used to model the y as under a poisson lognormal where y ~ Pois(E*exp(Psi)). Defaults to NULL, in which case the response type is assumed continuous.

ipr

An optional input vector of inclusion probabilities for each observation unit in the case the observed data were acquired through an informative sampling design, so that unbiased inference about the population requires adjustments to the observed sample Defaults to ipr = rep(1,nrow(y)) indicating an iid sample.

q_order

An integer vector of length K to select the order for each iGMRF precision term. e.g. If the first term is a RW2 and there is a second is a 3-month seasonality term, where the time points are indexed by month, then q_order = c(2,3). Defaults to q_order = 2

q_type

A vector of length K, the number of iGMRF precision terms, with each entry indicating whether the associated term is a trend ("tr") or seasonality ("sn") term. So all entries must be one of c("tr","sn"). Defaults to q_type = "tr".

q_shape

The value (in (0,infty)) for the shape hyperparameter for the Gamma base distribution for the iGMRF scale parameters, kappa_star(k,m), where k denotes the term and m, the cluster. Defaults to q_shape = 0.3.

q_rate

The rate parameter of the Gamma base distribution on kappa_star. Defaults to q_rate = 0.0005.

tau_shape

The value (in (0,infty)) for the shape hyperparameter for the Gamma prior on the error precision parameter. Defaults to tau_shape = 1.0.

tau_rate

The rate parameter of the Gamma prior distribution on tau_e. Defaults to tau_rate = 1.

dp_shape

The shape parameter for the Gamma prior on the DP concentration parameter, conc. Defaults to dp_shape = 1.

dp_rate

The rate parameter for the Gamma prior on the DP concentration parameter, conc. Defaults to dp_rate = 1.

M_init

Starting number of clusters of nrow(y) units to initialize sampler. Defaults to M_init = nrow(y).

w_star

Integer value denoting the number of cluster locations to sample ahead of observations in the auxiliary Gibbs sampler used to sample the number of clusters and associated cluster assignments. A higher value reduces sampling auto-correlation, but increases computational burden. Defaults to w_star = 2. The auxiliary Gibbs sampler is only used in the case that !is.null(ksi), for predictor- assisted clustering; otherwise, the full conditionals for cluster assigment are conjugate.

n.iter

Total number of MCMC iterations.

n.burn

Number of MCMC iterations to discard. gmrfdpgrow will return (n.iter - n.burn) posterior samples.

n.thin

Gap between successive sampling iterations to save.

nu

The degree of freedom parameter for the Huang and Wand prior on precision R x R matrix locations, Lambda_star, in the case that N x R predictors, ksi, are entered to instantiate a predictor-dependent prior for co-clustering. Default value is 4.

Rep

The number of times to draw samples of the N x T log-mean parameters, Psi, for each MCMC iteration under a Poisson-lognormal model when the response type for y is count (not continuous), which is indicated by inputing an offset matrix, E. Default value is 1.

progress

A boolean value denoting whether to display a progress bar during model execution. Defaults to progress = true.

jitter

A scalar double indicating amount of jitter to subract from the posterior rate and shape hyperparameters of tau_e to stabilize computation. Defaults to jitter = 0.0.

kappa_fast

Boolean for whether to generate rate hyperparameter from full conditionals versus joint Gaussian (on random effects, bb, given kappa. The former is faster, but numerically less stable. Defaults to kappa_fast = FALSE.

stable_launch

A boolean indicator on whether to generate initial values for N x T log-mean, Psi, when y are count data from a Gaussian prior distribution, which can induce numerical launch instabilities, or whether to initialize as y/E (where missing values in y are imputed). Defaults to stable_launch = TRUE.

Value

S3 gmrfdpgrow object, for which many methods are available to return and view results. Generic functions applied to an object, res of class gmrfdpgrow, includes:

plot(res)

returns results plots, including fit functions versus data and allocation of fitted functions into clusters

samples(res)

contains (n.iter - n.burn) posterior sampling iterations for every model parameter

resid(res)

contains the model residuals.

Note

The intended focus for this package are data composed of observed noisy functions (each of length T) for a set of experimental units where the functions may express dependence among the experimental units

Author(s)

Terrance Savitsky tds151@gmail.com Daniell toth danielltoth@yahoo.com

References

T. D. Savitsky and D. Toth (2014) Bayesian Non-parametric Models for Collections of Time- indexed Functions. submitted to: JRSS Series A (Statistics in Society).

T. D. Savitsky (2014) Bayesian Non-parametric Functional Mixture Estimation for Time-indexed data. submitted to: Annals of Applied Statistics.

T. D. Savitsky (2014) Bayesian Non-Parametric Mixture Estimation for Time-Indexed Functional Data for R. Submitted to: Journal of Statistical Software.

See Also

gmrfdpgrow

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
{
library(growfunctions)

## load the monthly employment count data for a collection of 
## U.S. states from the Current 
## Population Survey (cps)
data(cps)
## subselect the columns of N x T, y, associated 
## with the years 2008 - 2013
## to examine the state level employment levels 
## during the "great recession"
y_short   <- cps$y[,(cps$yr_label %in% c(2008:2013))]

## Run the DP mixture of iGMRF's to estimate posterior 
## distributions for model parameters
## Under default RW2(kappa) = order 2 trend 
## precision term
## Run for 1500 iterations, with half as burn-in for a
## more useful (converged) result.
res_gmrf            <- gmrfdpgrow(y = y_short, 
                                     n.iter = 40, 
                                     n.burn = 20, 
                                     n.thin = 1) 
                                     
## 2 plots of estimated functions: 1. faceted by cluster and fit;
## 2.  data for experimental units.
## for a group of randomly-selected functions
fit_plots_gmrf      <- cluster_plot( object = res_gmrf, 
                                     units_name = "state", 
                                     units_label = cps$st, 
                                     single_unit = FALSE, 
                                     credible = TRUE )    
}

growfunctions documentation built on July 17, 2021, 1:08 a.m.