Bayesian nonparametric dependent Gaussian process model for time-indexed functional data
Description
Estimates a collection of time-indexed functions with Gaussian process (GP) formulations, where a Dirichlet process (DP) mixture allows subgroups of the functions to share the same GP covariance parameters. The GP formulation supports any number of additive GP covariance terms, which may express multiple trend components, multiple seasonality components, or both.
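As a minimal sketch of what "additive GP covariance terms" means, the base R code below builds a rational quadratic (trend) covariance matrix and a periodic (seasonality) covariance matrix over a set of time points and sums them. The helper names `rq_cov` and `periodic_cov`, the parameter values, and the textbook kernel parameterizations are assumptions made for illustration; they are not taken from the package and may differ from what gpdpgrow uses internally.

```r
# Hypothetical helpers for illustration only; not part of growfunctions.

# Rational quadratic kernel: sigma2 * (1 + d^2 / (2 * alpha * ell^2))^(-alpha)
rq_cov <- function(t, sigma2 = 1, ell = 6, alpha = 2) {
  d <- outer(t, t, "-")                       # pairwise time differences
  sigma2 * (1 + d^2 / (2 * alpha * ell^2))^(-alpha)
}

# Periodic kernel: sigma2 * exp(-2 * sin(pi * d / period)^2 / ell^2)
periodic_cov <- function(t, sigma2 = 1, ell = 1, period = 12) {
  d <- outer(t, t, "-")
  sigma2 * exp(-2 * sin(pi * d / period)^2 / ell^2)
}

time_points <- 1:36                           # e.g., 36 monthly observations
K_trend  <- rq_cov(time_points)               # smooth trend term
K_season <- periodic_cov(time_points)         # 12-month seasonal term
K_total  <- K_trend + K_season                # additive covariance: trend + seasonality
```

Because a sum of valid covariance matrices is itself a valid covariance matrix, any number of such terms can be added in the same way.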
Usage
Arguments
y 
A multivariate continuous response, specified as an N x T matrix, where 
ipr 
An optional input vector of inclusion probabilities for each observation unit in the case
the observed data were acquired through an informative sampling design, so that unbiased
inference about the population requires adjustments to the observed sample. Defaults to

time_points 
Inputs a vector of common time points at which the collections of functions were
observed (with the possibility of intermittent missingness). The length of 
gp_cov 
A vector of length 
sn_order 
A vector of length 
jitter 
A scalar numerical value added to the diagonal elements of the T x T GP covariance
matrix to stabilize computation. Defaults to 
gp_shape 
The shape parameter of the Gamma base distribution for the DP prior on
the P x N matrix of GP covariance parameters (where P
denotes the number of parameters for each of the N experimental units).
Defaults to 
gp_rate 
The rate parameter of the Gamma base distribution on GP covariance parameters.
Defaults to 
noise_shape 
The shape parameter of the Gamma base distribution on 
noise_rate 
The rate parameter of the Gamma base distribution on 
dp_shape 
The shape parameter for the Gamma prior on the DP concentration parameter,

dp_rate 
The rate parameter for the Gamma prior on the DP concentration parameter,

M_init 
Starting number of clusters of 
lower 
The lower end of the range to be used in conditionally sampling the GP covariance
parameters ( 
upper 
The upper end of the range to be used in conditionally sampling the GP covariance
parameters ( 
sub_size 
Integer vector whose length, 
w_star 
Integer value denoting the number of cluster locations to sample ahead of
observations in the auxiliary Gibbs sampler used to sample the number of clusters
and associated cluster assignments. A higher value reduces sampling autocorrelation, but
increases computational burden. Defaults to 
w 
Numeric value denoting the step width used to construct the interval from
which to draw a sample for each GP covariance parameter in the slice sampler. This
value is adaptively updated in the sampler tuning stage for each parameter to be equal
to the difference between the 0.95 and 0.05 sample quantiles for each of 5 block updates.
Defaults to 
n.iter 
Total number of MCMC iterations. 
n.burn 
Number of MCMC iterations to discard.

n.thin 
Gap between successive sampling iterations to save. 
n.tune 
Number of iterations (before the ergodic chain is instantiated) to adapt 
progress 
A boolean value denoting whether to display a progress bar during model execution.
Defaults to 
b_move 
A boolean value denoting whether to sample the GP function, 
cluster 
A boolean value denoting whether to employ a DP mixture model over the set of GP functions or
to use a GP model with no clustering of the covariance function parameters.
Defaults to 
s 
An N x 1 integer vector that inputs a fixed clustering, rather than sampling it.
Defaults to 
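The role of the jitter argument described above can be sketched in base R. The example builds a smooth covariance matrix that is numerically close to singular, adds a small constant to its diagonal, and compares condition numbers. The squared exponential kernel, the length-scale, and the jitter value `1e-6` are assumptions chosen for this sketch; they are not the package defaults.

```r
# Illustrative only: a squared exponential covariance with a long
# length-scale is numerically close to singular.
t_pts <- 1:36
d <- outer(t_pts, t_pts, "-")
K <- exp(-d^2 / (2 * 30^2))

# Hypothetical jitter value for this sketch (not the package default).
jitter <- 1e-6
K_jit <- K + diag(jitter, length(t_pts))   # add jitter to the T x T diagonal

# The condition number drops sharply once jitter is added.
cond_before <- kappa(K, exact = TRUE)
cond_after  <- kappa(K_jit, exact = TRUE)

# The Cholesky factor of the jittered matrix is well defined.
L <- chol(K_jit)                           # upper triangular, t(L) %*% L == K_jit
```

A larger jitter gives more numerical stability at the cost of a slightly perturbed covariance, so it is usually kept small relative to the diagonal of the covariance matrix.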
Value
An S3 gpdpgrow object, for which many methods are available to return and view results. Generic functions applied
to an object, res, of class gpdpgrow, include:
samples(res) 
contains ( 
resid(res) 
contains the model residuals. 
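How many posterior draws are retained depends on n.iter, n.burn, and n.thin. The base R sketch below counts the saved iterations under one common convention (discard the first n.burn iterations, then keep every n.thin-th one). This convention is an assumption for illustration; the package's exact bookkeeping may differ.

```r
# Assumed convention (not confirmed by the package documentation):
# drop the first n.burn iterations, then keep every n.thin-th iteration.
n.iter <- 500
n.burn <- 250
n.thin <- 5

kept_iterations <- seq(from = n.burn + 1, to = n.iter, by = n.thin)
n_kept <- length(kept_iterations)   # number of retained draws under this convention
```

Under this convention, increasing n.thin reduces storage and autocorrelation among the saved draws, while increasing n.burn discards more of the early, pre-convergence chain.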
Note
This package is intended for data composed of observed noisy functions (each of
length T) for a set of experimental units, where the functions may express dependence
among the experimental units.
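One way to picture how a DP mixture groups experimental units is to draw cluster labels from its prior, the Chinese restaurant process. The base R sketch below is a generic illustration of that prior only; the function name `crp_draw` and the concentration value are hypothetical, and this is not the auxiliary Gibbs sampler that the package uses to update cluster assignments.

```r
# Hypothetical helper for illustration; not part of growfunctions.
# Draws cluster labels for N units from a Chinese restaurant process
# with concentration parameter conc (requires N >= 2).
crp_draw <- function(N, conc) {
  s <- integer(N)
  s[1] <- 1L                                  # first unit starts cluster 1
  for (i in 2:N) {
    counts <- tabulate(s[1:(i - 1)])          # current cluster sizes
    # join an existing cluster in proportion to its size,
    # or open a new cluster in proportion to conc
    probs <- c(counts, conc) / (i - 1 + conc)
    s[i] <- sample.int(length(probs), 1, prob = probs)
  }
  s
}

set.seed(1)
s <- crp_draw(N = 51, conc = 1)               # e.g., 51 units
M <- length(unique(s))                        # number of occupied clusters
```

Units that share a label would share the same GP covariance parameters; a larger concentration value tends to produce more clusters.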
Author(s)
Terrance Savitsky tds151@gmail.com Daniell Toth danielltoth@yahoo.com
References
T. D. Savitsky and D. Toth (2014) Bayesian Nonparametric Models for Collections of Time-indexed Functions. Submitted to: JRSS Series A (Statistics in Society).
T. D. Savitsky (2014) Bayesian Nonparametric Functional Mixture Estimation for Time-indexed data. Submitted to: Annals of Applied Statistics.
T. D. Savitsky (2014) Bayesian Non-Parametric Mixture Estimation for Time-Indexed Functional Data for R. Submitted to: Journal of Statistical Software.
See Also
gmrfdpgrow
Examples
{
library(growfunctions)
## load the monthly employment count data for a collection of
## U.S. states from the Current
## Population Survey (cps)
data(cps)
## subselect the columns of the N x T matrix, y, associated with
## the years 2011 - 2013
## to examine the state level employment
## levels during the "great recession"
y_short <- cps$y[, (cps$yr_label %in% c(2011:2013))]
## uses default setting of a single "rational quadratic" covariance.
## The short run below is for illustration only;
## run for 500 iterations, with half discarded as burn-in, to
## obtain a more useful result.
res_gp <- gpdpgrow(y = y_short,
                   n.iter = 4,
                   n.burn = 1,
                   n.thin = 1,
                   n.tune = 0)
## Two plots of estimated functions,
## 1. faceted by cluster
## 2. fitted functions vs noisy observations
## first plot will plot estimated denoised function,
## bb_i, for a single (randomly-selected) "state"
fit_plots_gp <- cluster_plot(object = res_gp,
                             units_name = "state",
                             units_label = cps$st,
                             single_unit = TRUE,
                             credible = TRUE)
## second plot will randomly select 6 states
## and plot their estimated denoised functions, bb_i,
## with setting "single_unit = FALSE".
## (Option "num_plot" may be set to plot
## any integer number of
## randomly-selected units.)
fit_plots_gp <- cluster_plot(object = res_gp,
                             units_name = "state",
                             units_label = cps$st,
                             single_unit = FALSE,
                             credible = TRUE)
}
