disclapmix | R Documentation
disclapmix performs inference in a mixture of discrete Laplace
distributions using the EM algorithm. After the EM algorithm has converged,
the centers are moved if doing so increases the marginal likelihood, and the
EM algorithm is then run again. This is repeated until no center is moved.
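To make the model concrete, here is a minimal base R sketch of the marginal log likelihood that the center-moving step compares. It is an illustration under stated assumptions, not the package's internals: each cluster j is assumed to have a central haplotype y[j, ] and per-locus discrete Laplace parameters p[j, ], with loci independent within a cluster. The helpers ddisclap_sketch and mixture_loglik_sketch are hypothetical names. The discrete Laplace probability mass function used is $P(D = d) = \frac{1 - p}{1 + p} p^{|d|}$ for $0 < p < 1$.

```r
# Hypothetical helper: discrete Laplace pmf, P(D = d) = (1 - p) / (1 + p) * p^|d|
ddisclap_sketch <- function(d, p) {
  (1 - p) / (1 + p) * p^abs(d)
}

# Hypothetical helper: marginal log likelihood of the haplotypes in x
# (one row per observation) under a mixture with centers y (clusters x loci),
# discrete Laplace parameters p (clusters x loci) and prior probabilities tau.
# Assumes loci are independent within a cluster.
mixture_loglik_sketch <- function(x, y, p, tau) {
  n <- nrow(x)
  loci <- ncol(x)
  clusters <- nrow(y)
  # dens[i, j]: probability of observation i under cluster j
  dens <- matrix(0, nrow = n, ncol = clusters)
  for (j in seq_len(clusters)) {
    # Per-locus distance from each observation to the center of cluster j
    d <- sweep(x, 2, y[j, ])
    # Repeat cluster j's parameters on every row so they line up with d
    pj <- matrix(p[j, ], nrow = n, ncol = loci, byrow = TRUE)
    dens[, j] <- apply(ddisclap_sketch(d, pj), 1, prod)
  }
  # Mix over clusters, then sum the log probabilities over observations
  sum(log(dens %*% tau))
}
```

With one locus and one cluster, mixture_loglik_sketch(matrix(c(0, 1), ncol = 1), y = matrix(0), p = matrix(0.5), tau = 1) equals log(1/3) + log(1/6), since the pmf with p = 0.5 gives 1/3 at distance 0 and 1/6 at distance 1.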
disclapmix( x, clusters, init_y = NULL, iterations = 100L, eps = 0.001, verbose = 0L, glm_method = "internal_coef", glm_control_maxit = 50L, glm_control_eps = 1e-06, init_y_method = "pam", init_v = NULL, ret_x = FALSE, ... )
x |
Dataset: a matrix with one observation (haplotype) per row. |
clusters |
The number of clusters/components to fit the model for. |
init_y |
Initial central haplotypes. If NULL, these are estimated using the
method given by init_y_method. |
iterations |
Maximum number of iterations in the EM algorithm. |
eps |
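Convergence stop criterion for the EM algorithm. It is compared to
$\lvert \max(v_{\text{new}} - v_{\text{old}}) \rvert / \max(v_{\text{old}})$, where v is the matrix of each
observation's probability of belonging to each cluster. |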
verbose |
Verbosity level from 0 to 2 (inclusive): 0 for silent, 2 for extra verbose. |
glm_method |
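One of 'internal_coef' (the default), 'internal_dev' or 'glm.fit'; see the glm_method details below. |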
glm_control_maxit |
Integer giving the maximal number of IWLS iterations. |
glm_control_eps |
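Positive convergence tolerance epsilon; the
iterations converge when $\lvert \mathit{dev} - \mathit{dev}_{\text{old}} \rvert / (\lvert \mathit{dev} \rvert + 0.1) < \epsilon$. |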
init_y_method |
Which cluster method to use for finding the initial central
haplotypes, y: 'pam' (the default) or 'clara'. |
init_v |
Matrix with 'nrow(x)' rows and 'clusters' columns specifying the initial posterior probabilities used to start the EM algorithm. If none is specified, 'matrix(1/clusters, nrow = nrow(x), ncol = clusters)' is used. |
ret_x |
Whether to return the data 'x' in the fitted object. |
... |
Used to detect obsolete usage, i.e. calls that pass parameters which
are no longer supported. |
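As an illustration of init_v and eps, the sketch below builds the default starting matrix exactly as described for init_v and evaluates the gain described under eps between two successive posterior matrices. It assumes the EM loop stops once this gain falls below eps; the helper v_gain and the numbers in v_new are hypothetical, and the formula is taken literally as written under eps.

```r
# Illustrative sizes (not package defaults)
n <- 5
clusters <- 2

# Default init_v: every observation equally likely to be in each cluster
init_v <- matrix(1 / clusters, nrow = n, ncol = clusters)

# Hypothetical helper: the gain described under eps, taken literally
v_gain <- function(v_new, v_old) {
  abs(max(v_new - v_old)) / max(v_old)
}

# Made-up posterior probabilities after one EM step (rows sum to 1)
v_new <- cbind(c(0.6, 0.55, 0.5, 0.45, 0.4),
               c(0.4, 0.45, 0.5, 0.55, 0.6))

gain <- v_gain(v_new, init_v)  # about 0.1 / 0.5 = 0.2
converged <- gain < 0.001      # assumed comparison against eps = 0.001
```

Here the gain is far above the default eps, so under this assumption the EM iterations would continue.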
glm_method: internal_coef is the fastest, as it uses the relative changes
in the coefficients as a stopping criterion and therefore does not need to
compute the deviance until the very end. In normal situations this method
can be used without problems. internal_dev is a reasonably fast method that
uses the deviance as a stopping criterion (like glm.fit). glm.fit uses the
traditional glm.fit IWLS implementation and is slow compared to the other
two methods.
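The deviance-based stopping rule mentioned for internal_dev and glm.fit can be sketched as follows. The formula is the one stats::glm.fit applies with the settings from stats::glm.control; that glm_control_eps and glm_control_maxit play the roles of glm.control's epsilon and maxit is an assumption based on their names and descriptions, and deviance_converged is a hypothetical helper. The coefficient-based rule of internal_coef is not shown, because its exact form is not documented here.

```r
# Control settings mirroring the disclapmix defaults glm_control_eps = 1e-06
# and glm_control_maxit = 50L (assumed to act as epsilon and maxit)
ctrl <- stats::glm.control(epsilon = 1e-06, maxit = 50L)

# Hypothetical helper: deviance-based convergence test, as in stats::glm.fit
# converged when |dev - dev_old| / (|dev| + 0.1) < epsilon
deviance_converged <- function(dev, dev_old, epsilon) {
  abs(dev - dev_old) / (abs(dev) + 0.1) < epsilon
}

deviance_converged(100.00005, 100, ctrl$epsilon)  # TRUE: tiny relative change
deviance_converged(100.001, 100, ctrl$epsilon)    # FALSE: not converged yet
```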
init_y_method: For init_y_method = 'clara', the sampling parameters are
samples = 100 and sampsize = min(ceiling(nrow(x)/2), 100 + 2*clusters),
and the random number generator in R is used.
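The clara settings above can be written out in base R as a small sketch. The sampsize formula and samples = 100 come from this page; mapping "the random number generator in R is used" to clara's rngR = TRUE is an assumption, other clara arguments (such as metric) are left at their defaults, which may differ from what disclapmix uses internally, and both helpers are hypothetical.

```r
# Hypothetical helper: sampsize used for init_y_method = 'clara'
clara_sampsize <- function(n, clusters) {
  min(ceiling(n / 2), 100 + 2 * clusters)
}

# Hypothetical helper: initial centers via cluster::clara with the documented
# samples/sampsize settings; rngR = TRUE makes clara use R's random number
# generator. Remaining clara arguments are left at their defaults (assumption).
init_centers_clara <- function(x, clusters) {
  fit <- cluster::clara(x, k = clusters, samples = 100,
                        sampsize = clara_sampsize(nrow(x), clusters),
                        rngR = TRUE)
  fit$medoids
}

clara_sampsize(750, 2)  # 104: the 100 + 2*clusters cap applies
clara_sampsize(150, 2)  # 75: half of the data is the smaller value
```

For the 750-row combined database in the examples with clusters = 2L, the cap of 104 rows per sample would apply.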
A disclapmixfit object with the following components:
- The supplied GLM method.
- The supplied initial central haplotypes, init_y.
- The supplied method for choosing initial central haplotypes (only used if init_y is NULL).
- Whether the estimation converged or not.
- The dataset used to fit the model if 'ret_x' is 'TRUE', else 'NULL'.
- The central haplotypes, y.
- The prior probabilities of belonging to each cluster, tau.
- The matrix v of each observation's probability of belonging to each cluster. The rows are in the same order as the observations in x used to generate this fit.
- A matrix with the estimated discrete Laplace parameters.
- The coefficients from the last GLM fit (used to calculate disclap_parameters).
- The number of observations.
- The number of parameters in the model.
- The total number of iterations performed (including moving centers and re-estimating using the EM algorithm).
- The full log likelihood of the final model.
- The marginal log likelihood of the final model.
- The BIC based on the full log likelihood of the final model.
- The BIC based on the marginal log likelihood of the final model.
- The gain $\lvert \max(v_{\text{new}} - v_{\text{old}}) \rvert / \max(v_{\text{old}})$ at each iteration, where v is the matrix of posterior probabilities (vic_matrix) described above.
- The prior probabilities of belonging to each cluster during the iterations.
- The full log likelihood of the models during the iterations (only calculated when verbose = 2L).
- The marginal log likelihood of the models during the iterations (only calculated when verbose = 2L).
- The BIC based on the full log likelihood of the models during the iterations (only calculated when verbose = 2L).
- The BIC based on the marginal log likelihood of the models during the iterations (only calculated when verbose = 2L).
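The posterior matrix and log likelihoods listed above are typically post-processed along the lines of the base R sketch below. It assumes the conventional definition BIC = -2 logL + k log(n) (the package may use a different sign or scaling), the usual EM update of the prior probabilities tau as the column means of v, and hard assignment of each observation to its most probable cluster. bic_sketch, tau_from_v and assign_clusters are hypothetical helpers, not package functions.

```r
# Hypothetical helper: conventional BIC (assumed; lower is better)
bic_sketch <- function(logL, n_params, n_obs) {
  -2 * logL + n_params * log(n_obs)
}

# Hypothetical helper: usual EM update of tau from the posterior matrix v
tau_from_v <- function(v) colMeans(v)

# Hypothetical helper: most probable cluster for each observation (row of v)
assign_clusters <- function(v) max.col(v, ties.method = "first")

# Made-up posterior matrix: rows are observations, columns are clusters
v <- matrix(c(0.9, 0.2, 0.1, 0.8), nrow = 2)

tau_from_v(v)       # 0.55 0.45
assign_clusters(v)  # 1 2
bic_sketch(logL = -100, n_params = 5, n_obs = 100)
```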
See also: disclapmix-package, disclapmix, disclapmixfit, predict.disclapmixfit,
print.disclapmixfit, summary.disclapmixfit, simulate.disclapmixfit,
clusterdist, clusterprob, glm.fit, disclap, pam, clara.
```r
library(disclapmix)
set.seed(1)  # for reproducibility

# Generate sample database
db <- matrix(disclap::rdisclap(1000, 0.3), nrow = 250, ncol = 4)

# Add location parameters
db <- sapply(1:ncol(db), function(i) as.integer(db[, i] + 13 + i))
head(db)

fit1 <- disclapmix(db, clusters = 1L, verbose = 1L, glm_method = "glm.fit")
fit1$disclap_parameters
fit1$y

fit1b <- disclapmix(db, clusters = 1L, verbose = 1L, glm_method = "internal_coef")
fit1b$disclap_parameters
fit1b$y

# Compare the estimates from the two GLM methods
max(abs(fit1$disclap_parameters - fit1b$disclap_parameters))

# Generate another type of database
db2 <- matrix(disclap::rdisclap(2000, 0.1), nrow = 500, ncol = 4)
db2 <- sapply(1:ncol(db2), function(i) as.integer(db2[, i] + 14 + i))

fit2 <- disclapmix(rbind(db, db2), clusters = 2L, verbose = 1L)
fit2$disclap_parameters
fit2$y
fit2$tau
```