est_mix: Estimate a mixture model parameters using MCMC

Description Usage Arguments Details Value Author(s) References See Also Examples

Description

These functions fit a Poisson point process with a mixture intensity, in a Bayesian framework, using the Data Augmentation MCMC (DAMCMC) method for a fixed number of components or the Birth-Death MCMC (BDMCMC) method for a random, finite number of components. Estimation of the parameters of the models is accomplished via calculation of the posterior means, based on posterior realizations obtained by these computational methods.

For DAMCMC examples see

http://faculty.missouri.edu/~micheasa/sppmix/sppmix_all_examples.html#est_mix_damcmc

For BDMCMC examples see

http://faculty.missouri.edu/~micheasa/sppmix/sppmix_all_examples.html#est_mix_bdmcmc

Usage

1
2
3
4
5
est_mix_damcmc(pp, m, truncate = FALSE, L = 10000, hyper_da = c(3, 1, 1),
  useKmeans = FALSE)

est_mix_bdmcmc(pp, m, truncate = FALSE, lambda1 = 1, lambda2 = 10,
  hyper = c(m/2, 1/36, 3, 2, 1, 1), L = 30000)

Arguments

pp

Point pattern object of class ppp.

m

Either the number of components to fit in DAMCMC or the maximum number of components requested for a BDMCMC fit.

truncate

Logical variable indicating whether or not we normalize the densities of the mixture components to have all their mass within the window defined in the point pattern pp.

L

Number of iterations for the DAMCMC (default is 10000) or BDMCMC (default is 30000). If the value passed is less than the default, then it is set to the default value.

hyper_da

Hyperparameters for DAMCMC, default is (3, 1, 1).

useKmeans

Logical variable. If TRUE use a kmeans clustering method to obtain the starting values for the component means, otherwise, randomly sample a point from the pattern and use it as a component mean.

lambda1

Parameter for the truncated Poisson prior; lambda1 = 1 by default.

lambda2

Birth rate; lambda2 = 10 by default.

hyper

Hyperparameters for the hierarchical prior. See 'Details' for more information.

Details

The DAMCMC follows the sampling scheme of Diebolt and Robert (1994).

The BDMCMC uses the sampling scheme of Stephens (2000). The definitions of the hyperparameters can be found in equations (21)-(24). The number of components entertained is from 1 to m, and the moves for the BDMCMC chain for the number of components, are either one up (add a component) or one down (remove a component). Make sure you plot the BDMCMC fit to obtain additional information on the fit.

Value

An object of type damcmc_res, containing MCMC realizations.

allgens_List

A list of size L containing posterior realizations, with elements that are lists of size m, with each element being the posterior realization of the parameters of a component, i.e., p, mu and sigma.

genps

An Lxm matrix containing the L posterior realizations of the m component probabilities of the normal mixture.

genmus

An mx2xL array containing the L posterior realizations of m component mean vectors of the normal mixture.

gensigmas

An Lxm list, with elements that are 2x2 matrices, the posterior realizations of the covariance matrices of the components of the normal mixture.

genzs

An Lxn matrix containing the membership indicators of the n points of the point pattern pp, over all iterations of the MCMCM.

genlambdas

An Lx1 vector containing the posterior realizations of the of the lambda parameter (the average number of points over the window).

genlambdas

An Lx1 vector containing the posterior realizations of the of the lambda parameter (the average number of points over the window).

ApproxCompMass

An Lxm matrix containing the approximate mass of the m mixture components for each iteration.

data

The original point pattern.

L

Same as input.

m

Same as input.

Additional return values from the BDMCMC fit (this is a bdmcmc_res object).

numcomp

An Lx1 vector containing the number of components the BDMCMC chooses to fit at each iteration.

maxnumcomp

Same as input m.

Badgen

An Lx1 vector with 0-1 values, where 1 indicates that the realization should be dropped, or 0 if the realization should be kept. The BDMCMC can produce "bad" births and degenerate realizations (zero component mass) that end up in the chain for a specific number of components. Although these realizations do not affect averages of surfaces (no label switching problem and the corresponding component probability is zero), they have a detrimental effect when working for mixture deconvolution within the chain of a specific number of components, i.e., computing the posterior averages of the mixture parameters corresponding to the mixture with MAP number of components. If this parameter is 1 for some realizations, then dropping these degenerate realizations allows us to use the label switching algorithms efficiently and achieve mixture deconvolution. Obviously, the number of posterior realizations can drop significantly in number when we first apply burnin and then drop the bad realizations, so it is good practice to run the BDMCMC for at least 20000 iterations.

Author(s)

Jiaxun Chen, Sakis Micheas, Yuchen Wang

References

Diebolt, J., and Robert, C. P. (1994). Estimation of Finite Mixture Distributions through Bayesian Sampling. Journal of the Royal Statistical Society B, 56, 2, 363-375.

Stephens, M. (2000). Bayesian analysis of mixture models with an unknown number of components-an alternative to reversible jump methods. The Annals of Statistics, 28, 1, 40-74.

See Also

PlotUSAStates, plotmix_2d, GetPMEst, plot_chains, check_labels, GetBDTable, GetBDCompfit, plot.intensity_surface, plot.bdmcmc_res, to_int_surf, owin, plot_CompDist, drop_realization, plot_chains, plot_ind, FixLS_da, rnormmix

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
fit <- est_mix_damcmc(spatstat::redwood, m = 3)
fit
plot(fit)
#We work with the California Earthquake data. We fit an IPPP with intensity surface modeled
#by a mixture with 5 normal components.
CAfit=est_mix_damcmc(CAQuakes2014.RichterOver3.0, m=5, L = 20000)
#Now retrieve the surface of Maximum a Posteriori (MAP) estimates of the mixture parameter.
#Note that the resulting surface is not affected by label switching.
MAPsurf=GetMAPEst(CAfit)
#Plot the states and the earthquake locations along with the fitted MAP IPPP intensity
#surface.
ret=PlotUSAStates(states=c('California','Nevada','Arizona'), showcentroids=FALSE,
 shownames=TRUE,main= "Earthquakes in CA, 2014",pp=CAQuakes2014.RichterOver3.0, surf=MAPsurf,
 boundarycolor="white",namescolor="white")
plot(CAfit)
#check labels
check_labels(CAfit)
# Fix label switching, start with approx=TRUE
post_fixed = FixLS_da(CAfit, plot_result = TRUE)
plot_chains(post_fixed)
plot_chains(post_fixed,separate = FALSE)
#this one works generally better but it is slow for large m
post_fixed = FixLS_da(CAfit,approx=FALSE, plot_result = TRUE)
plot_chains(post_fixed)
plot_chains(post_fixed,separate = FALSE)


fitBD <- est_mix_bdmcmc(spatstat::redwood, m = 5)
fitBD
plotsBDredwood=plot(fitBD)
#Earthquakes example
CAfitBD=est_mix_bdmcmc(pp = CAQuakes2014.RichterOver3.0, m = 5)
BDtab=GetBDTable(CAfitBD)#retrieve frequency table and MAP estimate for number of components
BDtab
MAPm=BDtab$MAPcomp
plotsCAfitBD=plot(CAfitBD)
#get the surface of posterior means with MAP components and plot it
plotmix_2d(GetPMEst(CAfitBD,MAPm),CAQuakes2014.RichterOver3.0)
#retrieve all BDMCMC realizations corresponding to a mixture with MAP components
BDfitMAPcomp=GetBDCompfit(CAfitBD,MAPm)
BDfitMAPcomp
plot(BDfitMAPcomp$BDsurf,main=paste("Mixture intensity surface with",MAPm, "components"))
#Example of Dropping bad realizations and working with the MAP surface
open_new_plot=FALSE
truncate=FALSE
truemix4=rnormmix(m = 4, sig0 = .1, df = 5,xlim= c(-2,2), ylim = c(-2,2))
plot(truemix4,xlim= c(-2,2), ylim = c(-2,2),whichplots=0, open_new_window=
 open_new_plot)+add_title("True mixture of normals density")
trueintsurfmix4=to_int_surf(truemix4,lambda = 150,win =spatstat::owin( c(-2,2),c(-2,2)))
#not truncating so let us use a larger window
bigwin=spatstat::owin(c(-4,4),c(-4,4))
ppmix4 <- rsppmix(intsurf = trueintsurfmix4,truncate = truncate,win=bigwin)# draw points
print(plotmix_2d(trueintsurfmix4,ppmix4, open_new_window=open_new_plot,
 win=spatstat::owin(c(-4,4),c(-4,4)))+add_title(
 "True Poisson intensity surface along with the point pattern, W=[-4,4]x[-4,4]",
 lambda =trueintsurfmix4$lambda,m=trueintsurfmix4$m,n=ppmix4$n))
BDMCMCfit=est_mix_bdmcmc(pp = ppmix4, m = 5,L=30000,truncate = truncate)
#check the original distribution of the number of components
plot_CompDist(BDMCMCfit, open_new_window=open_new_plot)
#get the realizations corresponding to the MAP number of components
BDtab=GetBDTable(BDMCMCfit,FALSE)#retrieve frequency table and MAP estimate for the
#number of components
MAPm=BDtab$MAPcomp
BDMCMCfitMAPcomp=GetBDCompfit(BDMCMCfit,MAPm)
BDMCMCfitMAPcompgens=BDMCMCfitMAPcomp$BDgens
#look at the range of the means with the degenerate realizations included
print(range(BDMCMCfitMAPcompgens$genmus[,1,]))
print(range(BDMCMCfitMAPcompgens$genmus[,2,]))
#use the original output of the BDMCMC and apply 10% burnin (default)
BDMCMCfit=drop_realization(BDMCMCfit)
#now we drop the bad realizations
BDMCMCfit=drop_realization(BDMCMCfit, (BDMCMCfit$Badgen==1))
#we see how many realizations are left
plot_CompDist(BDMCMCfit, open_new_window=open_new_plot)
#get the realizations for the MAP only
BDMCMCfitMAPcomp=GetBDCompfit(BDMCMCfit,MAPm)
BDMCMCfitMAPcompgens=BDMCMCfitMAPcomp$BDgens
#check again the range of values for the x-y coords of the component means; they
#should be within the window
print(range(BDMCMCfitMAPcompgens$genmus[,1,]))
print(range(BDMCMCfitMAPcompgens$genmus[,2,]))
#check the MAP surface
plotmix_2d(BDMCMCfitMAPcomp$BDsurf,ppmix4, open_new_window=open_new_plot,
 win=bigwin) +add_title("MAP Poisson intensity surface along with the point pattern",
 lambda =BDMCMCfitMAPcomp$BDsurf$lambda, m=BDMCMCfitMAPcomp$BDsurf$m, n=ppmix4$n,
 L=BDMCMCfitMAPcomp$BDgens$L)
plot_chains(BDMCMCfitMAPcompgens, open_new_window=open_new_plot, separate = FALSE)
#burnin has been applied, set to zero and check for label switching
labelswitch=check_labels(BDMCMCfitMAPcompgens,burnin=0)
#use the identifiability constraint approach first
post_fixedBDMCMCfitIC = FixLS_da(BDMCMCfitMAPcompgens,burnin=0)
plot_chains(post_fixedBDMCMCfitIC, open_new_window=open_new_plot, separate = FALSE)
print(plot_ind(post_fixedBDMCMCfitIC, burnin=0, open_new_window=
 open_new_plot)+add_title("Posterior means of the membership indicators (IC permuted labels)",
 m = post_fixedBDMCMCfitIC$m, n = post_fixedBDMCMCfitIC$data$n))
permSurfaceofPostMeansIC=GetPMEst(post_fixedBDMCMCfitIC, burnin=0)
print(plotmix_2d(permSurfaceofPostMeansIC,ppmix4, open_new_window=open_new_plot,
 win=bigwin)+add_title("Poisson surface of posterior means (IC)",
 lambda=permSurfaceofPostMeansIC$lambda, m=permSurfaceofPostMeansIC$m, n=ppmix4$n,
 L=post_fixedBDMCMCfitIC$L))
#use the decision theoretic approach via SEL to find the best permutation; this one should
#work much better
post_fixedBDMCMCfitSEL = FixLS_da(BDMCMCfitMAPcompgens,approx=FALSE, burnin=0)
plot_chains(post_fixedBDMCMCfitSEL, open_new_window=open_new_plot, separate = FALSE)
print(plot_ind(post_fixedBDMCMCfitSEL, burnin=0, open_new_window=open_new_plot)+add_title(
 "Posterior means of the membership indicators (best permutation)",
 m=post_fixedBDMCMCfitSEL$m,n = post_fixedBDMCMCfitSEL$data$n))
permSurfaceofPostMeansSEL=GetPMEst(post_fixedBDMCMCfitSEL, burnin=0)
print(plotmix_2d(permSurfaceofPostMeansSEL,ppmix4, open_new_window=open_new_plot,
 win=bigwin)+ add_title("Poisson surface of posterior means (best permutation)",
 lambda=permSurfaceofPostMeansSEL$lambda, m=permSurfaceofPostMeansSEL$m,
 n=ppmix4$n,L=post_fixedBDMCMCfitSEL$L))

sppmix documentation built on Jan. 13, 2021, 10:04 p.m.