Description Usage Arguments Details Value Author(s) References See Also Examples
This function finds the most probable cluster partition in a linear product partition model (PPM). The Dirichlet process mixture of linear models is the default PPM.
1 2 | profLinear(formula, data, group, clust, param, method="agglomerative",
maxiter=1000, crit=1e-6, verbose=FALSE)
|
formula |
a formula. |
data |
a dataframe where |
group |
optional vector of factors (or coercible to factors) indicating grouping among observations. Observations that are grouped will be clustered together. This is useful if several values form a single longitudinal observation. |
clust |
optional vector of factors (or coercible to factors) indicating initial clustering among observations. Grouped observations (see |
param |
optional list containing the any of the named elements |
method |
character string indicating the optimization method to be used. Meaningful values for this string are
|
maxiter |
integer value specifying the maximum number of iterations for the optimization algorithm. |
crit |
numeric scalar constituting a stopping criterion for the |
verbose |
logical value indicating whether the routine should be verbose in printing. |
This function fits a Dirichlet process mixture of linear models (DPMLM) using the profile method. This method will group the observations into clusters. The clusters are determined by maximizing the marginal posterior distribution over the space of possible clusters. Each cluster has an associated linear model. Notationally, the linear model for cluster k has the form
y = g'x + e,
where y and x are the observation vector and covariate matrix for a particular cluster, e has a multivariate normal distribution with mean zero and precision matrix tI , and g is the vector of linear coefficients. In the DPMLM, conditional on the clustering, g and t have a joint normal-gamma posterior distribution of the form
p(m, t| y, x) = N( g | m, tS ) G( t | a, b ),
where N() is the multivariate normal density function with mean vector m and precision matrix tS and G() is the gamma density function with shape and scale parameters a and b . In addition to the cluster indicators, the posterior quantities S , m , a , and b are provided for each cluster in the return value.
Missing observations (NA
) are removed automatically and a warning is issued.
An instance of the class profLinear
containing the following objects
y |
the numeric outcome vector, where missing observations ( |
x |
the numeric design matrix, where missing covariates ( |
group |
the grouping vector, where missing group values ( |
param |
the list of prior parameters |
clust |
a numeric vector of integers indicating cluster membership for each non-missing observation |
a |
a numeric vector containing the posterior parameter a for each cluster |
b |
a numeric vector containing the posterior parameter b for each cluster |
m |
a list of numeric vectors containing the posterior vector m for each cluster |
s |
a list of numeric matrices containing the posterior matrix S for each cluster |
logp |
the unnormalized log value of the marginal posterior mass function for the cluster partition evaluated at |
model |
a model frame, resulting from a call to |
Matt Shotwell
Ward, J. H. (1963) Heirarchical Grouping to Optimize an Objective Function. Journal of the American Statistical Association 58:236-244 MacEachern, S. N. (1994) Estimating Normal Means with Conjugate Style Dirichlet Process Prior. Communications in Statistics B 23:727-741
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 | library(profdpm)
set.seed(42)
# set up some data
# linear model 0
x0 <- rnorm(50, 0, 3)
y0 <- x0 + rnorm(50, 0, 1)
# linear model 1
x1 <- rnorm(50, 0, 3)
y1 <- 10 - x1 + rnorm(50, 0, 1)
# add a column of ones to the covariate matrix (intercept)
dat <- data.frame(x=c(x0, x1), y=c(y0,y1))
# indicate grouping within each linear model
grp <- c(rep(seq(0,4),10), rep(seq(5,9),10))
# fit the DPMLM
fit <- profLinear(y ~ x, data=dat, group=grp)
# plot the resulting fit(s)
plot(dat$x, dat$y, xlab='x', ylab='y')
for(i in 1:length(fit$m)) {
abline( a=fit$m[[i]][1], b=fit$m[[i]][2] )
}
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.