NHMM: Bayesian Non-homogeneous Markov Model (NHMM)

Description Usage Arguments Value Examples

View source: R/NHMM.R

Description

NHMM calculates an NHMM for multiple sequences of data. The sequences can actually be short sets of equal length sequences (subseq). The traditional input variables (X) influence the non-homogenous transition probabiities of the model. An additional set of input variables (W) can be included to influence the mixture proportions of the emission distributions. The NHMM follows the general weather state formulation of Hughes and Guttorp but in a Bayesian fashion. All parameters are sampled via Gibbs steps (latent variables such that no tuning is needed.) The W variable coefficients are sampled through an ordered Mulinomial probit (Albert and Chib). The X variable coefficients are sampled through an unordered Multinomial logit model Polya-Gamma formulation (Polson, Scott, Windle). The hidden states are sampled through a blocked Gibbs sampler.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
NHMM(
  y,
  subseq = NULL,
  X = NULL,
  betapriorm = NULL,
  betapriorp = NULL,
  K = 2,
  iters = 1000,
  burnin = 200,
  emdist = "normal",
  nmix = 1,
  delta = FALSE,
  W = NULL,
  psipriorm = NULL,
  psipriorp = NULL,
  priors = NULL,
  outdir = NULL,
  ymiss = FALSE,
  yrep = 0,
  ypred = 0,
  Xp = NULL,
  Wp = NULL,
  yhold = NULL
)

Arguments

y

T by J matrix of data (J=1 is sufficient) - missing data is denoted with NA

subseq

[optional] if y is actually a set of subsequences then give the length of those sequences (122 for JJAS) (365 is not it!). Default is subseq=T.

X

B by T matrix for the transition input data (B different inputs) Missing values are not allowed. If there are no Xs then use HMM function.

betapriorm

[optional] default=NULL which is reference prior. Or a [K+B by K matrix] for the mean of the Normal prior for the beta coefficients.

betapriorp

[optional] default=NULL which is reference prior. Or a [K+B by K+B by K array] for the precision prior(1/sig^2) of the Normal prior for the beta coefficients.

K

number of states (default=2)

iters

number of iterations to keep after burn in (default=1000)

burnin

the number of burn in (default=200)

emdist

emission distribution: "normal", "poisson", "gamma" actual choices are Normal, Poisson, Gamma, Exponential, or finite mixtures of mixtures or zero inflated version of any of these

nmix

[optional] number of mixture components for emdist, default is one (do not include delta)

delta

[TRUE/FALSE] TRUE-if we are using a zero inflated distribution (adds a delta function at zero as the first mixture component)

W

[optional] is an A by T by J array of emission input data (A different inputs), missing values are not allowed, do not include an intercept term here The mixture components of emission depend on W.

psipriorm

[optional] default=NULL which is reference prior. Or a [K+A by J matrix] for the mean of the Normal prior for the beta coefficients.

psipriorp

[optional] default=NULL which is reference prior. Or a [K+A by J matrix] for the precision prior(1/sig^2) of the Normal prior for the beta coefficients.

priors

priors for emission components, each state can have a different prior dimension 5 by nmix by K by J (some of the 5 dimensions are filled with zeros for some distributions) Normal(mu,sig2) or reference prior: mu~Normal(a.mu, 1/b.mu^2), sig2~IG(a.sig,b.sig) -the first 4 rows are used [a.mu, b.mu, a.sig, b.sig] but the 5th row is not used but must be present. -reference prior: priors=NULL

Poisson(lambda): lambda~Gamma(a.lam,b.lam) -the first 2 rows are used [a.lam, b.lam] but the 3rd-5th rows are not used but must be present.

Gamma(alpha,beta): alpha~Gamma(a.alpha,b.alpha) beta~Gamma(a.beta,b.beta) -if the second row is supplied as NA then the first row supplies fixed first parameters for the Gamma this is the suggested parametrization of the Gamma. Estimating the first parameter of the Gamma distribution switches to an MCMC that is not as stable. -to create an Exponential distribution the first row is set to 1 and the second is set to NA then the third and fourth rows contain the (a.lam, b.lam) priors for the parameter of the Exponential if prior=NULL then these are set to ones. setting these parameters to zeros is not a good idea. -a non-informative prior may not work, use a matrix of ones for a lowly informative prior instead of zeros. -the 5th row is for the tuning parameter for the MCMC (try 0.5) if a two parameter Gamma is desired. A 2 parameter Gamma needs an informative prior and results may be more unstable. -default prior=NULL is the Exponential distribution with lowly informative prior of ones

outdir

[optional] can output each set of parameters to output files in a directory use this with larger dimension data sets or with large number of iterations the output will be written line by line and not overburden the memory limit outdir needs to end with a slash or double slash depending on OS.

ymiss

[optional-TRUE/FALSE] if outdir is specified then draws for any missing data points will be saved to ymiss-J*.txt. There will be one data point for each sequence of length Tmiss (the amount missing from that sequence). Each row of the output file will be an iteration of the algorithm after burnin is removed.

yrep

[optional] number (ie. 100,200,or 500) of output replicate data sets to print to outdir. The replicates will be the same dimension as y and start after burn in period. default is zero. These replicate data sets are generated from the same input variables. - must be shorter than iters

ypred

number of predictive sets (ie. 0,100, 200, 500) - must be shorter than iters. Predicted chains: will produce ypred sets of predictives [pT by J] to print to outdir. yrep uses the same inputs to make replications, this uses new inputs to make predictions over a different time span (pT) Also outputs a set of predictive z values or length (pT by ypred)

Xp

predictive set of Xs of length pT [B by pT] -missing values is not allowed -ensure that the Xp inputs are in the same order as X

Wp

predictive set of Ws of length pT [A by pT by J] -missing valus are not allowed -ensure that the Wp inputs are in the same order as W

yhold

[optioinal] a sequence of y observed values [pT by J], held out data that is of length pT that is used to compute the predictive log score (PLS) which is a metric like BIC (ie. hold out last 10 missing data values are filled in with mean PLS value. PLS is the average PLS across sequences.

Value

my.nhmm object

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
## Gamma or Exponential
### if "priors" is not specified, this is an Exponetial distribution
data(NHMMdata)
attach(NHMMdata)

## Set to iters=40 for example only this should be in the thousands
my.nhmm=NHMM(y=ygamma[1:200,1:3],  X=matrix(tX[,1:200],1,200),
    K=3, iters=40, burnin=2, emdist="gamma", nmix=3, delta=TRUE)
          
OBIC(my.nhmm)
Oz(my.nhmm)  #compare with the truth: tz1
OQQ(my.nhmm) #transition probabilities 
## Not run: 
bb=OXcoef(my.nhmm)
pp=OWcoef(my.nhmm,FALSE)
tt=Oemparams(my.nhmm,FALSE)
 
 ## Normal - X is not used to create this data, so it should not be significant
 my.nhmm2=NHMM(y=ynormal, subseq=1000, X=tX, K=3, iters=100, 
           burnin=10, emdist="normal", nmix=2, delta=FALSE)
 OBIC(my.nhmm2)
 
 ## Poisson
 my.nhmm3=NHMM(y=ypoisson, X=tX, K=3, iters=100, burnin=10,
              emdist="poisson", nmix=2, delta=FALSE)
 OBIC(my.nhmm3)
 
 ## Predictive estimation - make 15 predictive data sets (new X) and 20 replicate data sets (same X)
 #filelocation="C:\Users\iamrandom\Desktop\here\"
 #my.nhmm4=NHMM(y=ygamma,  X=tX, K=3, iters=100, burnin=10, 
 #              emdist="gamma", nmix=3, delta=TRUE, 
 #              outdir=filelocation, yrep=20, Xp=Xp1, ypred=15)
 #OBIC(my.nhmm4)  #needed larger burnin
 #tt=Oemparams(my.nhmm4,TRUE,filelocation)
 
 ## Exponential with W variable
 #filelocation="C:\Users\iamrandom\Desktop\here\"
 #my.nhmm5=NHMM(y=ygamma,  X=tX, W=tW1, K=3, iters=50, burnin=10,
 #               emdist="gamma", nmix=3, delta=TRUE, 
 #              outdir=filelocation, yrep=20, Xp=Xp1, Wp=Wp1,ypred=35)
 #OBIC(my.nhmm5)
 #pp=OWcoef(my.nhmm5,filelocation)
 
 ## Gamma with fixed first variables nmix=2
 nmix=2; K=3; J=dim(ygamma)[2]
 prior1=array(1,dim=c(5,nmix,K,J));  prior1[1,1,,]=1;  
                prior1[1,2,,]=2;  prior1[2,,,]=NA
 my.nhmm6=NHMM(y=ygamma,  X=tX, priors=prior1, K=3, iters=100, 
                burnin=10, emdist="gamma", nmix=2, delta=TRUE)
 
 
 ## One dimensional y vector case (J=1)
 #my.nhmm=NHMM(y=matrix(ygamma[1:200,1],200,1),  X=matrix(tX[,1:200],1,200),
 #K=3, iters=40, burnin=2, emdist="gamma", nmix=3, delta=TRUE)
 
 ### Compare my.nhmm6 (K=3) and my.nhmm7 (K=1) using both BIC 
 ###     and PLS (yhold is the last 10% of the data)
 #ygamma2=ygamma
 #ygamma2[1600,10]=NA  #add some missingness
 #ygamma2[1840,10]=NA  #add some missingness to yhold
 #filelocation="C:\Users\iamrandom\Desktop\here\"
 #my.nhmm7=NHMM(y=ygamma2[1:1800,],  X=matrix(tX[,1:1800],1,1800), 
 #              W=array(tW[,1:1800,],dim=c(1,1800,15)), 
 #              K=3, iters=50, burnin=10, emdist="gamma", nmix=3, 
 #               delta=TRUE, outdir=filelocation, ymiss=TRUE, yrep=10, 
 #              Xp=matrix(tX[,1801:2000],1,200), Wp=array(tW[,1801:2000,],dim=c(1,200,15)), 
 #              ypred=10, yhold=ygamma2[1801:2000,])
 #OBIC(my.nhmm7)
 
 #compare K=1 and K=3
 
## End(Not run)

NHMM documentation built on July 1, 2020, 7:28 p.m.

Related to NHMM in NHMM...