PTdensity: Nonparametric Bayesian density estimation using Mixtures of...
In DPpackage: Bayesian Nonparametric Modeling in R

This function generates a posterior density sample for a Mixture of Polya trees model.

PTdensity(y,ngrid=1000,grid=NULL,prior,mcmc,state,status,
          data=sys.frame(sys.parent()),na.action=na.fail)

`y`	a vector or matrix giving the data from which the density estimate is to be computed.
`ngrid`	number of grid points where the density estimate is evaluated. This is only used if the dimension of `y` is less than or equal to 2. The default value is 1000.
`grid`	matrix of dimension ngrid*nvar of grid points where the density estimate is evaluated. This is only used if the dimension of `y` is less than or equal to 2. The default value is NULL, in which case the grid is chosen according to the range of the data.
`prior`	a list giving the prior information. The list includes the following parameters: `a0` and `b0` giving the hyperparameters of the gamma prior distribution for the precision parameter of the Polya tree prior; `alpha` giving the value of the precision parameter (it must be specified if `a0` is missing, see details below); optionally, `M` giving the finite level to be considered (if `M` is specified, a partially specified mixture of Polya trees model is fitted); `nu0` and `tinv`, or `tau1` and `tau2`, giving the hyperparameters of the inverted Wishart or inverted gamma prior distribution for the centering covariance matrix or variance, respectively; `sigma` giving the value of the standard deviation (univariate case) or covariance matrix (multivariate case) of the centering distribution (if `sigma` is missing and `nu0` and `tinv`, or `tau1` and `tau2`, are also missing, Jeffreys' prior is used for the centering (co)variance matrix); `m0` and `S0` giving the hyperparameters of the normal prior distribution for the mean of the normal baseline distribution; and `mu` giving the value of the mean of the centering distribution (if `mu` is missing and `m0` and `S0` are also missing, Jeffreys' prior is used for `mu`).
`mcmc`	a list giving the MCMC parameters. The list must include the following integers: `nburn` giving the number of burn-in scans, `nskip` giving the thinning interval, `nsave` giving the total number of scans to be saved, and `ndisplay` giving the number of saved scans to be displayed on screen (the function reports on the screen each time `ndisplay` iterations have been carried out). It may also include `tune1`, `tune2`, and `tune3`, giving the positive Metropolis tuning parameters for the baseline mean, variance, and precision parameter, respectively (the default value is 1.1).
`state`	a list giving the current value of the parameters. This list is used if the current analysis is the continuation of a previous analysis.
`status`	a logical variable indicating whether this run is new (`TRUE`) or the continuation of a previous analysis (`FALSE`). In the latter case the current value of the parameters must be specified in the object `state`.
`data`	data frame.
`na.action`	a function that indicates what should happen when the data contain `NA`s. The default action (`na.fail`) causes `PTdensity` to print an error message and terminate if there are any incomplete observations.
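The `mcmc` list can be assembled and sanity-checked in base R before calling `PTdensity`. The sketch below uses two hypothetical helpers that are not part of DPpackage: `complete_mcmc` checks the required integers and fills in the documented default of 1.1 for any missing tuning parameter, and `total_scans` assumes the common convention that `nskip` scans are discarded between consecutive saved scans after burn-in, so the sampler runs `nburn + nsave * (nskip + 1)` scans. That convention is an assumption; the text above only calls `nskip` the thinning interval.

```r
# Hypothetical helper (not part of DPpackage): check the required MCMC
# settings and fill in the documented default (1.1) for missing tuning values
complete_mcmc <- function(mcmc) {
  required <- c("nburn", "nskip", "nsave", "ndisplay")
  missing <- setdiff(required, names(mcmc))
  if (length(missing) > 0) {
    stop("missing MCMC settings: ", paste(missing, collapse = ", "))
  }
  for (tn in c("tune1", "tune2", "tune3")) {
    if (is.null(mcmc[[tn]])) mcmc[[tn]] <- 1.1
  }
  mcmc
}

# Hypothetical helper: total number of scans, ASSUMING that nskip scans are
# discarded between consecutive saved scans after the burn-in period
total_scans <- function(mcmc) {
  mcmc$nburn + mcmc$nsave * (mcmc$nskip + 1)
}

mcmc <- complete_mcmc(list(nburn = 2000, nsave = 5000, nskip = 49, ndisplay = 500))
total_scans(mcmc)   # 2000 + 5000 * 50 = 252000 under the assumed convention
```

Computing the scan count up front gives a rough idea of run time before a long fit is started.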

This generic function fits a Mixture of Polya Trees prior for density estimation (see, e.g., Lavine, 1992 and 1994; Hanson, 2006). In the univariate case, the model is given by:

$$Y_1,\ldots,Y_n \mid G \sim G$$

$$G \mid \alpha,\mu,\sigma^2 \sim PT(\Pi^{\mu,\sigma^2},\mathcal{A})$$

where the PT is centered around a $N(\mu,\sigma^2)$ distribution by taking the sets at each level $m$ of the partition $\Pi^{\mu,\sigma^2}$ to have as endpoints the $k/2^m$ quantiles, $k=0,\ldots,2^m$, of the $N(\mu,\sigma^2)$ distribution. The family $\mathcal{A}=\{\alpha_e : e \in E^{*}\}$, where $E^{*}=\bigcup_{m=0}^{M} E^m$ and $E^m$ is the $m$-fold product of $E=\{0,1\}$, is specified as $\alpha_{e_1 \cdots e_m}=\alpha m^2$.
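To make the construction concrete, the following base-R sketch computes the level-$m$ partition endpoints as the $k/2^m$ quantiles of $N(\mu,\sigma^2)$, draws the probabilities of the $2^M$ level-$M$ sets from a finite Polya tree with $\alpha_{e_1 \cdots e_m}=\alpha m^2$ (each split probability is Beta$(\alpha m^2,\alpha m^2)$, the standard Polya tree construction), and evaluates the resulting density. The function names are hypothetical; this illustrates the prior only and is not the sampler used inside `PTdensity`.

```r
# Level-m partition endpoints: the k/2^m quantiles of N(mu, sigma^2), k = 0, ..., 2^m
pt_partition <- function(m, mu = 0, sigma = 1) {
  qnorm((0:2^m) / 2^m, mean = mu, sd = sigma)
}

# Draw the probabilities of the 2^M level-M sets of a finite Polya tree with
# alpha_{e1...em} = alpha * m^2, so every split at level m is Beta(alpha m^2, alpha m^2)
draw_pt_probs <- function(M, alpha) {
  probs <- 1
  for (m in seq_len(M)) {
    a <- alpha * m^2
    split <- rbeta(length(probs), a, a)   # probability of the left child
    # interleave children so that the sets stay ordered from left to right
    probs <- as.vector(rbind(probs * split, probs * (1 - split)))
  }
  probs
}

# Density of the random distribution: within each level-M set it is the
# N(mu, sigma^2) density rescaled so that set k has probability probs[k]
pt_density <- function(y, probs, mu = 0, sigma = 1) {
  M <- round(log2(length(probs)))
  k <- pmin(floor(2^M * pnorm(y, mu, sigma)) + 1, 2^M)
  2^M * probs[k] * dnorm(y, mu, sigma)
}

set.seed(42)
p <- draw_pt_probs(M = 6, alpha = 1)
pt_partition(2, mu = 21, sigma = 20)   # 5 endpoints, from -Inf to Inf
pt_density(c(10, 21, 30), p, mu = 21, sigma = 20)
```

Because each level-$M$ set has baseline probability $2^{-M}$, setting every entry of `probs` to $2^{-M}$ recovers the normal centering density exactly. Larger values of $\alpha$ concentrate the splits near 1/2, keeping draws closer to the centering distribution, and the factor $m^2$ strengthens this at deeper levels.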

Analogous to the univariate model, in the multivariate case the PT prior is characterized by partitions of $\mathbb{R}^d$ and a collection of conditional probabilities that link sets in adjacent tree levels, i.e., they link each parent set in a given level to its $2^d$ offspring sets in the subsequent level. The multivariate model is given by:

$$Y_1,\ldots,Y_n \mid G \sim G$$

$$G \mid \alpha,\mu,\Sigma \sim PT(\Pi^{\mu,\Sigma},\mathcal{A})$$

where the PT is centered around a $N_d(\mu,\Sigma)$ distribution. In this case, the class of partitions that we consider starts with base sets that are Cartesian products of intervals obtained as quantiles from the standard normal distribution. A multivariate location-scale transformation, $Y=\mu+\Sigma^{1/2} z$, is applied to each base set, yielding the final sets.
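The base sets and the location-scale map can be sketched in base R as follows. The helper names are hypothetical, and the choice of $\Sigma^{1/2}$ is an assumption: the text does not say which square root is used, so this sketch takes the symmetric one from the eigendecomposition (a Cholesky factor would be another valid choice). At level $m$ each coordinate is cut at the $k/2^m$ standard normal quantiles, which gives $(2^m)^d$ cells, consistent with every parent having $2^d$ offspring.

```r
# Symmetric square root of a covariance matrix (one possible choice of Sigma^{1/2})
sqrtm_sym <- function(S) {
  e <- eigen(S, symmetric = TRUE)
  e$vectors %*% diag(sqrt(e$values), nrow = length(e$values)) %*% t(e$vectors)
}

# Level-m base sets in d dimensions: every coordinate is cut at the k/2^m
# quantiles of N(0, 1), so there are (2^m)^d Cartesian-product cells
base_cells <- function(m, d) {
  breaks <- qnorm((0:2^m) / 2^m)
  index <- as.vector(as.matrix(expand.grid(rep(list(seq_len(2^m)), d))))
  list(lower = matrix(breaks[index], ncol = d),       # one row per cell
       upper = matrix(breaks[index + 1], ncol = d))
}

# Map a point z of a base set to the data scale: y = mu + Sigma^{1/2} z
to_data_scale <- function(z, mu, Sigma) {
  as.vector(mu + sqrtm_sym(Sigma) %*% z)
}

Sigma <- matrix(c(4, 1, 1, 2), 2, 2)
cells <- base_cells(m = 2, d = 2)
nrow(cells$lower)                               # 16 cells at level 2 in 2 dimensions
to_data_scale(c(0.5, -0.5), mu = c(1, 2), Sigma = Sigma)
```

With a non-diagonal $\Sigma$, a rectangular base set maps to a parallelogram, so the sketch maps points (for example cell vertices) rather than the interval endpoints themselves.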

A Jeffreys prior can be specified for the centering parameters,

$$p(\mu,\sigma^2) \propto 1/\sigma^2$$

and

$$p(\mu,\Sigma) \propto |\Sigma|^{-(d+1)/2}$$

in the univariate and multivariate cases, respectively. Alternatively, the centering parameters can be fixed to user-specified values or proper priors can be assigned. In the univariate case, the following proper priors can be assigned:

$$\mu \mid m_0, S_0 \sim N(m_0,S_0)$$

$$\sigma^{-2} \mid \tau_1, \tau_2 \sim \mathrm{Gamma}(\tau_1/2,\tau_2/2)$$
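A quick way to see what these hyperparameters imply is to simulate from the priors. This sketch assumes that `S0` is a variance and that Gamma(a, b) has rate b (mean a/b); neither convention is stated explicitly above. The values of `m0` and `S0` come from the univariate example below, while `tau1` and `tau2` are hypothetical.

```r
set.seed(1)
n <- 1e5
m0 <- 21; S0 <- 100          # from the univariate example below
tau1 <- 2; tau2 <- 2         # hypothetical hyperparameters

# mu | m0, S0 ~ N(m0, S0), with S0 taken as a variance (assumption)
mu_draws <- rnorm(n, mean = m0, sd = sqrt(S0))

# sigma^-2 | tau1, tau2 ~ Gamma(tau1/2, tau2/2), rate parametrization assumed
prec_draws <- rgamma(n, shape = tau1 / 2, rate = tau2 / 2)
sigma_draws <- 1 / sqrt(prec_draws)   # implied prior draws of sigma

# Prior quantiles of the centering standard deviation
quantile(sigma_draws, c(0.05, 0.5, 0.95))
```

Comparing these quantiles with the scale of the data is a simple check that the prior on the centering standard deviation is not unintentionally tight or diffuse.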

In the multivariate case, the following proper priors can be assigned:

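$$\mu \mid m_0, S_0 \sim N(m_0,S_0)$$

$$\Sigma \mid \nu_0, T \sim IW(\nu_0,T)$$

Note that the inverted-Wishart prior is parametrized such that $E(\Sigma)=T^{-1}/(\nu_0-d-1)$, where $T^{-1}$ is the matrix supplied as `tinv` and $d$ is the dimension of the data.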
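The stated expectation can be checked by simulation with `stats::rWishart`. This sketch assumes that $IW(\nu_0,T)$ means $\Sigma^{-1} \sim \mathrm{Wishart}(\nu_0,T)$ with $E(\Sigma^{-1})=\nu_0 T$, which is the reading under which $E(\Sigma)=T^{-1}/(\nu_0-d-1)$. The values of `nu0` and `T_mat` are hypothetical, and $\nu_0$ is taken well above $d+1$ so that the Monte Carlo average is stable.

```r
set.seed(2)
d <- 2
nu0 <- 10                    # hypothetical degrees of freedom
T_mat <- diag(2, d)          # hypothetical T
n <- 20000

# Sigma^{-1} ~ Wishart(nu0, T) (assumed parametrization); invert each draw
W <- rWishart(n, df = nu0, Sigma = T_mat)
Sigma_draws <- apply(W, 3, solve)            # column i holds vec(Sigma_i)
Sigma_mean <- matrix(rowMeans(Sigma_draws), d, d)

target <- solve(T_mat) / (nu0 - d - 1)       # T^{-1} / (nu0 - d - 1)
round(Sigma_mean - target, 3)
```

Under this reading, a prior guess $\Sigma_0$ for the centering covariance corresponds to $T^{-1}=(\nu_0-d-1)\Sigma_0$.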

To complete the model specification, independent hyperpriors are assumed,

$$\alpha \mid a_0, b_0 \sim \mathrm{Gamma}(a_0,b_0)$$

The precision parameter, $\alpha$, of the PT prior can be considered random, with a $\mathrm{Gamma}(a_0,b_0)$ distribution, or fixed at a particular value. To fix $\alpha$ at a particular value, set `a0` to NULL in the prior specification and supply the value through `alpha`.
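The rule above can be expressed as a small check on the `prior` list. Both helpers are hypothetical (not part of DPpackage): `alpha_spec` reports whether $\alpha$ is random or fixed and stops if `a0` is NULL without an `alpha` value, and `alpha_prior_moments` gives the prior mean and standard deviation of $\alpha$, assuming that $b_0$ is a rate.

```r
# Hypothetical helper: report whether alpha is random or fixed, enforcing the
# rule that alpha must be given when a0 is NULL
alpha_spec <- function(prior) {
  if (is.null(prior$a0)) {
    if (is.null(prior$alpha)) stop("'alpha' must be given when 'a0' is NULL")
    "fixed"
  } else {
    "random"
  }
}

# Prior mean and standard deviation of alpha ~ Gamma(a0, b0), rate b0 assumed
alpha_prior_moments <- function(a0, b0) {
  c(mean = a0 / b0, sd = sqrt(a0) / b0)
}

alpha_spec(list(a0 = 1, b0 = 0.01))      # "random"
alpha_spec(list(a0 = NULL, alpha = 1))   # "fixed"
alpha_prior_moments(5, 1)                # mean 5, sd about 2.24
```

Under the rate assumption, the two examples below encode quite different beliefs: `a0=1, b0=0.01` gives a prior mean of 100 for $\alpha$, while `a0=5, b0=1` gives a prior mean of 5.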

In the computational implementation of the model, Metropolis-Hastings steps are used to sample the posterior distribution of the baseline and precision parameters.
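As an illustration of a Metropolis-Hastings step controlled by a tuning parameter, here is a generic random-walk update on $\log\alpha$. The target is only the $\mathrm{Gamma}(a_0,b_0)$ prior (rate parametrization assumed), not the actual full conditional used by `PTdensity`, and `tune` acts as the proposal standard deviation. Whether `tune3` enters the package's sampler exactly this way is an assumption.

```r
set.seed(3)
a0 <- 5; b0 <- 1           # Gamma(a0, b0) prior from the bivariate example
tune <- 1                  # proposal standard deviation on the log scale

# Log density of theta = log(alpha) when alpha ~ Gamma(a0, rate = b0);
# the "+ theta" term is the Jacobian of alpha = exp(theta)
log_target <- function(theta) {
  dgamma(exp(theta), shape = a0, rate = b0, log = TRUE) + theta
}

n_iter <- 20000
theta <- log(a0 / b0)
draws <- numeric(n_iter)
accepted <- 0
for (i in seq_len(n_iter)) {
  prop <- theta + tune * rnorm(1)      # symmetric random-walk proposal
  if (log(runif(1)) < log_target(prop) - log_target(theta)) {
    theta <- prop
    accepted <- accepted + 1
  }
  draws[i] <- exp(theta)
}
accepted / n_iter                      # acceptance rate
mean(draws)                            # should be close to a0 / b0
```

Raising `tune` proposes bolder moves and lowers the acceptance rate; lowering it raises acceptance but makes the chain move slowly, which is the usual trade-off when adjusting Metropolis tuning parameters.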

An object of class PTdensity representing the Polya tree model fit. Generic functions such as print, plot, and summary have methods to show the results of the fit. The results include mu, sigma or Sigma in the univariate or multivariate case, respectively, and the precision parameter alpha.

The list state in the output object contains the current value of the parameters necessary to restart the analysis. If you want to specify different starting values to run multiple chains, set status=TRUE and create the list state based on these starting values. In this case, the list state must include the following objects:

`mu`	giving the value of the baseline mean.
`sigma`	giving the baseline standard deviation or the baseline covariance matrix in the univariate or multivariate case, respectively.
`alpha`	giving the value of the precision parameter.
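When running several chains, each `state` list needs `mu`, `sigma`, and `alpha` in the shapes described above. The sketch below uses a hypothetical helper, `make_pt_state`, that derives starting values from data summaries and can inflate the spread for overdispersed starts; R's built-in `faithful` data are used only for illustration. The commented-out call shows where such a list would be passed.

```r
# Hypothetical helper (not part of DPpackage): build a state list with the
# elements mu, sigma, and alpha, using data summaries as starting values
make_pt_state <- function(y, alpha = 1, inflate = 1) {
  if (is.matrix(y) || is.data.frame(y)) {
    y <- as.matrix(y)
    # multivariate case: sigma is a covariance matrix
    list(mu = colMeans(y), sigma = inflate^2 * cov(y), alpha = alpha)
  } else {
    # univariate case: sigma is a standard deviation
    list(mu = mean(y), sigma = inflate * sd(y), alpha = alpha)
  }
}

# Two starting points for separate chains, the second deliberately overdispersed
state1 <- make_pt_state(datasets::faithful$eruptions)
state2 <- make_pt_state(datasets::faithful$eruptions, alpha = 10, inflate = 3)
# fit2 <- PTdensity(y = faithful$eruptions, prior = prior, mcmc = mcmc,
#                   state = state2, status = TRUE)
```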

Alejandro Jara <atjara@uc.cl>

Tim Hanson <hansont@stat.sc.edu>

Hanson, T. (2006) Inference for Mixtures of Finite Polya Trees. Journal of the American Statistical Association, 101: 1548-1565.

Lavine, M. (1992) Some aspects of Polya tree distributions for statistical modelling. The Annals of Statistics, 20: 1222-1235.

Lavine, M. (1994) More aspects of Polya tree distributions for statistical modelling. The Annals of Statistics, 22: 1161-1176.

DPdensity, BDPdensity

## Not run: 
    ####################################
    # Univariate example
    ####################################

    # Data
      data(galaxy)
      galaxy<-data.frame(galaxy,speeds=galaxy$speed/1000) 
      attach(galaxy)

    # Initial state
      state <- NULL

    # MCMC parameters
      nburn <- 2000
      nsave <- 5000
      nskip <- 49
      ndisplay <- 500
      mcmc <- list(nburn=nburn,nsave=nsave,nskip=nskip,ndisplay=ndisplay,
                   tune1=0.03,tune2=0.25,tune3=1.8)

    # Prior information
      prior<-list(a0=1,b0=0.01,M=6,m0=21,S0=100,sigma=20)

    # Fitting the model

      fit1 <- PTdensity(y=speeds,ngrid=1000,prior=prior,mcmc=mcmc,
                        state=state,status=TRUE)

    # Posterior means
      fit1

    # Plot the estimated density
      plot(fit1,ask=FALSE)
      points(speeds,rep(0,length(speeds)))

    # Plot the parameters
    # (to see the plots gradually set ask=TRUE)
      plot(fit1,ask=FALSE,output="param")

    # Extracting the density estimate
      cbind(fit1$x1,fit1$dens)


    ####################################
    # Bivariate example
    ####################################

    # Data
      data(airquality)
      attach(airquality)

      ozone <- Ozone^(1/3)
      radiation <- Solar.R

    # Prior information
      prior <- list(a0=5,b0=1,M=4,
                    m0=c(0,0),S0=diag(10000,2),
                    nu0=4,tinv=diag(1,2))

    # Initial state
      state <- NULL

    # MCMC parameters
      nburn <- 2000
      nsave <- 5000
      nskip <- 49
      ndisplay <- 500
      mcmc <- list(nburn=nburn,nsave=nsave,nskip=nskip,ndisplay=ndisplay,
                   tune1=0.8,tune2=1.0,tune3=1)

    # Fitting the model
      fit1 <- PTdensity(y=cbind(radiation,ozone),prior=prior,mcmc=mcmc,
                        state=state,status=TRUE,na.action=na.omit)

      fit1

    # Plot the estimated density
      plot(fit1)

    # Extracting the density estimate
      x1 <- fit1$x1
      x2 <- fit1$x2
      z <- fit1$dens
      par(mfrow=c(1,1))
      contour(x1,x2,z)
      points(fit1$y)  


## End(Not run)