This function finds the most probable cluster partition in a binary product partition model (PPM). The Dirichlet process mixture of binary models is the default PPM.
profBinary(formula, data, clust, param, method="agglomerative",
           maxiter=1000, crit=1e-6, verbose=FALSE, sampler=FALSE)
formula | a one-sided formula specifying a set of binary response variables.
data | a dataframe where
clust | optional vector of factors (or coercible to factors) indicating initial clustering among observations.
param | optional list containing any of the named elements
method | character string indicating the optimization method to be used. Meaningful values for this string are
maxiter | integer value specifying the maximum number of iterations for the optimization algorithm.
crit | numeric scalar constituting a stopping criterion for the
verbose | logical value indicating whether the routine should be verbose in printing.
sampler | for the "gibbs" method, return the last sampled value instead of the MAP estimate
This function fits a Dirichlet process mixture of binary models (DPMBM) using the profile method. This method groups binary observation vectors (rows of y) into clusters. The cluster partition is estimated by maximizing the marginal posterior distribution over all possible cluster partitions. Each cluster has an associated binary model. The binary model assigns Bernoulli probabilities independently to each binary-valued outcome, corresponding to the columns of y. The prior parameters a0 and b0 assign a beta prior distribution to each outcome probability. Conditional on the estimated cluster partition, each outcome probability is beta distributed a posteriori. The function profBinary returns the associated posterior parameters of the beta distribution for each cluster and outcome probability.
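The conjugate beta update described above can be sketched in base R for a single cluster. This is an illustration only: beta_posterior is a hypothetical helper, not part of profdpm, and the prior values a0 = 1 and b0 = 1 are assumed for the example. It uses the standard beta-Bernoulli result that the posterior parameters are the prior parameters plus the counts of ones and zeros in each column.

```r
# Sketch of the beta-Bernoulli conjugate update for ONE cluster.
# beta_posterior is a hypothetical helper (not a profdpm function);
# a0 and b0 are the beta prior parameters, with assumed values of 1.
beta_posterior <- function(y, a0 = 1, b0 = 1) {
  y <- as.matrix(y)
  n <- nrow(y)                # number of observations in the cluster
  successes <- colSums(y)     # count of ones per outcome (column)
  list(a = a0 + successes,        # posterior first shape parameter
       b = b0 + n - successes)    # posterior second shape parameter
}

# three observations of two binary outcomes, all in the same cluster
y_cluster <- matrix(c(1, 0,
                      1, 1,
                      0, 0), ncol = 2, byrow = TRUE)
post <- beta_posterior(y_cluster)
post$a  # 3 2
post$b  # 2 3
```

Applying the same update to the rows of each estimated cluster would give one pair of vectors per cluster, which is the shape of the a and b components described in the return value.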
Missing observations (NA) are removed automatically and a warning is issued. The return value contains the reduced observation matrix.
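The effect of this row removal can be approximated in base R to check in advance how many observations would survive. The helper drop_missing_rows below is hypothetical and only mimics the behavior described here; it is not the package's internal code, and the warning text is invented for the sketch.

```r
# Sketch: drop every row that contains at least one NA and warn about it.
# drop_missing_rows is a hypothetical helper, not part of profdpm.
drop_missing_rows <- function(y) {
  y <- as.matrix(y)
  keep <- complete.cases(y)   # TRUE for rows without any NA
  if (!all(keep)) {
    warning(sum(!keep), " row(s) with missing values removed")
  }
  y[keep, , drop = FALSE]     # keep matrix shape even if one row remains
}

y_raw <- matrix(c(1, 0, NA,
                  0, 1, 1,
                  1, 1, 0), ncol = 3, byrow = TRUE)
y_reduced <- suppressWarnings(drop_missing_rows(y_raw))
nrow(y_reduced)  # 2
```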
An instance of the class profBinary containing the following objects
y | the numeric matrix of observations, where rows with missing observations (
param | the list of prior parameters
clust | a numeric vector of integers indicating cluster membership for each non-missing observation
a | a list of numeric vectors containing the posterior vector a for each cluster
b | a list of numeric vectors containing the posterior vector b for each cluster
logp | the unnormalized log value of the marginal posterior mass function for the cluster partition evaluated at
model | a model frame, resulting from a call to
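One way to summarize the a and b components is the posterior mean of each outcome probability, which for a Beta(a, b) distribution is a / (a + b). The sketch below uses made-up lists as stand-ins for the a and b components of a fitted object; the numbers are illustrative, not output from profBinary.

```r
# Hypothetical stand-ins for the a and b components of a fitted object:
# one numeric vector per cluster, one element per binary outcome.
a <- list(c(9, 5, 2), c(2, 5, 9))
b <- list(c(2, 5, 9), c(9, 5, 2))

# Posterior mean of a Beta(a, b) distribution is a / (a + b).
# mapply returns outcomes in rows, so transpose to get clusters in rows.
post_mean <- t(mapply(function(ai, bi) ai / (ai + bi), a, b))
dim(post_mean)  # 2 3
```

Each row of post_mean then holds the estimated outcome probabilities for one cluster, which makes clusters easy to compare side by side.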
Matt Shotwell
Matthew S. Shotwell (2013). profdpm: An R Package for MAP Estimation in a Class of Conjugate Product Partition Models. Journal of Statistical Software, 53(8), 1-18. URL http://www.jstatsoft.org/v53/i08/.
Ward, J. H. (1963) Hierarchical Grouping to Optimize an Objective Function. Journal of the American Statistical Association 58:236-244
MacEachern, S. N. (1994) Estimating Normal Means with Conjugate Style Dirichlet Process Prior. Communications in Statistics B 23:727-741
library(profdpm)
set.seed(42)
# simulate two clusters of multivariate binary data
p <- seq(0.9,0.1,length.out=3)
y1 <- matrix(rbinom(333, 1, p), 111, 3, TRUE)
y2 <- matrix(rbinom(333, 1, rev(p)), 111, 3, TRUE)
dat <- as.data.frame(rbind(y1, y2))
# fit the PPM
fitb <- profBinary(~0+., data=dat)
# plot the data ordered by cluster
image(t(as.matrix(fitb$model)[order(fitb$clust),]),
      xaxt="n", yaxt="n", col=0:1)
axis(3, labels=paste("V", 1:3, sep=""), at=0:2/2)
# plot the data ordered and colored by cluster
image(t(as.matrix(fitb$model) * fitb$clust)[, order(fitb$clust)],
      xaxt="n", yaxt="n", col=0:length(unique(fitb$clust)))
axis(3, labels=paste("V", 1:3, sep=""), at=0:2/2)