fmglm | R Documentation
A finite mixture of generalized linear models (GLMs)
Returns an R6 object of class fmglm.
fmglm class
The fmglm class is designed to fit finite mixtures of generalized linear models, including multinomial regression models.
It contains three major methods:
$new() creates an fmglm object from the given data and formula.
$fit() fits the fmglm model. The fitting algorithm and the starting method can also be specified in this method. It returns a result list containing the coefficients, the log-likelihood value, and information criteria such as AIC and BIC.
$summarize() generates a table of the results, including the standard errors and p-values.
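The finite mixture model (FMM) with K components can be written in the following form:
f(y \mid x, \phi) = \sum_{k=1}^{K} \pi_k f_k(y \mid x, \theta_k)
where \pi_k is the mixing proportion of component k, f_k is its density with parameters \theta_k, and \phi collects all \pi_k and \theta_k.
Based on this equation, the log-likelihood function is:
\log L = \sum_{i=1}^{N} \log f(y_i \mid x_i, \phi) = \sum_{i=1}^{N} \log \left\{ \sum_{k=1}^{K} \pi_k f_k(y_i \mid x_i, \theta_k) \right\}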
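As an illustration of this log-likelihood, the following base-R sketch evaluates it for a mixture of Gaussian linear regressions. It is not fmglm's internal code: the function name mixture_loglik and its arguments (design matrix X, coefficient matrix beta, standard deviations sigma, mixing proportions prop) are hypothetical. The inner sum over components is computed with the log-sum-exp trick so that very small component densities do not underflow.

```r
# Hypothetical helper (not part of fmglm): observed-data log-likelihood
#   sum_i log( sum_k prop_k * f_k(y_i | x_i, theta_k) )
# for K Gaussian linear-regression components.
#   y: length-N response; X: N x p design matrix
#   beta: p x K coefficients; sigma, prop: length-K vectors
mixture_loglik <- function(y, X, beta, sigma, prop) {
  K <- length(prop)
  # log_terms[i, k] = log(prop_k) + log f_k(y_i | x_i, theta_k)
  log_terms <- matrix(NA_real_, nrow = length(y), ncol = K)
  for (k in seq_len(K)) {
    mu_k <- drop(X %*% beta[, k])
    log_terms[, k] <- log(prop[k]) + dnorm(y, mean = mu_k, sd = sigma[k], log = TRUE)
  }
  # log-sum-exp over components, row by row
  m <- apply(log_terms, 1, max)
  sum(m + log(rowSums(exp(log_terms - m))))
}

set.seed(1)
X <- cbind(1, rnorm(10))
y <- rnorm(10)
beta1 <- matrix(c(0, 1), ncol = 1)
ll_one <- mixture_loglik(y, X, beta1, sigma = 1, prop = 1)
# Two identical components with equal weights give the same likelihood
ll_two <- mixture_loglik(y, X, cbind(beta1, beta1), sigma = c(1, 1), prop = c(0.5, 0.5))
```

With a single component the mixture reduces to an ordinary Gaussian regression likelihood, which is a convenient sanity check.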
In the M-step of the EM algorithm, the parameters of each component can be estimated separately by maximizing a weighted log-likelihood, with the posterior probabilities as weights:
\max_{\theta_k} \sum_{i=1}^{N} \hat{p}_{ik} \log f_k(y_i \mid x_i, \theta_k)
where \hat{p}_{ik} is the posterior probability, estimated in the E-step, that sample i belongs to component k. This approach is also used in other packages, such as flexmix.
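The E-step posterior \hat{p}_{ik} is proportional to \pi_k f_k(y_i \mid x_i, \theta_k), normalized over the K components. The base-R sketch below computes it on the log scale for numerical stability; the function name e_step and its inputs (an N x K matrix of component log-densities and the mixing proportions) are hypothetical and not part of fmglm.

```r
# Hypothetical helper (not part of fmglm): E-step posterior probabilities
#   p_ik = prop_k f_k(y_i | x_i) / sum_j prop_j f_j(y_i | x_i)
# log_dens: N x K matrix of log f_k(y_i | x_i, theta_k); prop: length-K weights
e_step <- function(log_dens, prop) {
  log_num <- sweep(log_dens, 2, log(prop), "+")  # add log(prop_k) to column k
  log_num <- log_num - apply(log_num, 1, max)    # subtract row maxima
  num <- exp(log_num)
  num / rowSums(num)                             # normalize each row to sum 1
}

# Two Gaussian components centered at -2 and 2, evaluated at y = -2, 0, 2
y <- c(-2, 0, 2)
log_dens <- cbind(dnorm(y, mean = -2, log = TRUE), dnorm(y, mean = 2, log = TRUE))
post <- e_step(log_dens, prop = c(0.5, 0.5))
```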
Similar to flexmix, we use glm.fit() and lm.wfit() to fit this weighted log-likelihood component by component.
To use this approach, set optim_method to either "glm" or "lm" and use_llc to FALSE.
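A sketch of this weighted, component-by-component M-step in base R. The function m_step and its arguments are hypothetical, not fmglm's code; it only shows how lm.wfit() (Gaussian components) and glm.fit() (here, Poisson components) accept the posterior probabilities as prior weights. The mixing proportions are updated as the column means of the posterior matrix.

```r
# Hypothetical helper (not fmglm's code): one M-step that fits every
# component separately, using column k of the posterior matrix as weights.
#   y: response; X: N x p design matrix; post: N x K posterior matrix
m_step <- function(y, X, post, family = c("gaussian", "poisson")) {
  family <- match.arg(family)
  K <- ncol(post)
  beta <- matrix(NA_real_, nrow = ncol(X), ncol = K)
  sigma <- rep(NA_real_, K)   # only filled in for Gaussian components
  for (k in seq_len(K)) {
    w <- post[, k]
    if (family == "gaussian") {
      fit <- lm.wfit(X, y, w)                              # weighted least squares
      beta[, k] <- fit$coefficients
      sigma[k] <- sqrt(sum(w * fit$residuals^2) / sum(w))  # weighted ML sd
    } else {
      fit <- glm.fit(X, y, weights = w, family = poisson()) # weighted IRLS
      beta[, k] <- fit$coefficients
    }
  }
  list(beta = beta, sigma = sigma, prop = colMeans(post))
}

set.seed(2)
x <- rnorm(60)
X <- cbind(1, x)
y <- ifelse(seq_len(60) <= 30, 1 + 2 * x, -1 - x) + rnorm(60, sd = 0.3)
# 0/1 posteriors: samples 1-30 in component 1, samples 31-60 in component 2
post <- cbind(rep(c(1, 0), each = 30), rep(c(0, 1), each = 30))
est <- m_step(y, X, post)
```

With 0/1 weights each component's fit coincides with an ordinary regression on its own samples; with fractional posteriors every sample contributes to every component in proportion to its weight.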
Alternatively, an FMM can be viewed as a model with incomplete data, in which the variable that determines each individual's class is missing. The imputed indicator z_{ik} \in \{0, 1\} records whether sample i belongs to component k. Therefore, the complete-data log-likelihood is:
\log L_c = \sum_{k=1}^{K} \sum_{i=1}^{N} z_{ik} \left\{ \log \pi_k + \log f_k(y_i \mid x_i, \theta_k) \right\}
In this package, we use optim() with the BFGS algorithm to maximize the complete-data log-likelihood.
This is the default method for fitting an FMM. Because this method involves a mixture log-likelihood function, we use Rcpp to speed up the computation.
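The following sketch shows the general idea of maximizing the complete-data log-likelihood with optim() and BFGS for Gaussian regression components, with the posterior probabilities substituted for the unobserved z_{ik}. It is written in plain R rather than Rcpp, and the function names, the parameter packing, and the log-scale parameterization of the standard deviations are assumptions made for illustration, not fmglm's implementation.

```r
# Hypothetical sketch (plain R, not fmglm's Rcpp code): negative expected
# complete-data log-likelihood for K Gaussian regression components, with the
# posterior matrix `post` (N x K) standing in for the unobserved z_ik.
# par = c(beta_1, ..., beta_K, log(sigma_1), ..., log(sigma_K)); sigma is on
# the log scale so BFGS can search an unconstrained space.
neg_complete_loglik <- function(par, y, X, post) {
  p <- ncol(X)
  K <- ncol(post)
  beta <- matrix(par[seq_len(p * K)], nrow = p, ncol = K)
  sigma <- exp(par[p * K + seq_len(K)])
  prop <- colMeans(post)   # closed-form update of the mixing proportions
  total <- 0
  for (k in seq_len(K)) {
    mu_k <- drop(X %*% beta[, k])
    total <- total + sum(post[, k] *
      (log(prop[k]) + dnorm(y, mean = mu_k, sd = sigma[k], log = TRUE)))
  }
  -total
}

m_step_bfgs <- function(y, X, post, start) {
  fit <- optim(start, neg_complete_loglik, y = y, X = X, post = post,
               method = "BFGS")
  p <- ncol(X)
  K <- ncol(post)
  list(beta = matrix(fit$par[seq_len(p * K)], nrow = p, ncol = K),
       sigma = exp(fit$par[p * K + seq_len(K)]),
       convergence = fit$convergence)
}

# Sanity check with a single component: BFGS should recover the lm() fit
set.seed(3)
x <- rnorm(100)
X <- cbind(1, x)
y <- 1 + 2 * x + rnorm(100, sd = 0.5)
est <- m_step_bfgs(y, X, post = matrix(1, nrow = 100, ncol = 1), start = c(0, 0, 0))
```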
In addition to the standard EM algorithm, $fit() can also use two extensions of the EM algorithm.
The Classification EM (cem) assigns each sample to the component with the largest posterior probability.
The Stochastic EM (sem) randomly assigns each sample to a component by drawing from its posterior probabilities.
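A minimal sketch of the two assignment rules, assuming the posterior probabilities are held in an N x K matrix: CEM takes the component with the largest posterior in each row, and SEM draws one component per row from that row's posterior. In both cases the resulting 0/1 matrix plays the role of z_{ik} in the M-step. The function assign_components is hypothetical and not part of fmglm.

```r
# Hypothetical helper (not part of fmglm): convert an N x K posterior matrix
# into 0/1 indicators z_ik for the classification (cem) or stochastic (sem) EM.
assign_components <- function(post, algo = c("cem", "sem")) {
  algo <- match.arg(algo)
  if (algo == "cem") {
    # component with the largest posterior probability in each row
    cls <- max.col(post, ties.method = "first")
  } else {
    # one draw per row from that row's posterior probabilities
    cls <- apply(post, 1, function(p) sample.int(length(p), 1, prob = p))
  }
  z <- matrix(0, nrow = nrow(post), ncol = ncol(post))
  z[cbind(seq_len(nrow(post)), cls)] <- 1
  z
}

post <- rbind(c(0.9, 0.1), c(0.2, 0.8), c(1, 0))
z_cem <- assign_components(post, "cem")
set.seed(6)
z_sem <- assign_components(post, "sem")
```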
Using fmglm to Fit Finite Mixtures of Generalized Linear Models
The main use of fmglm is to fit finite mixtures of generalized linear models,
such as linear regressions with Gaussian errors or Poisson regressions.
To fit a mixture of linear regressions with Gaussian errors, run code similar to the following:
model1 <- fmglm$new(formula1, data, family = "gaussian", latent = 2)
result <- model1$fit()
output <- model1$summarize()
Thanks to R6 method chaining, the code can also be written in one line:
result <- fmglm$new(formula1, data, family = "gaussian", latent = 2)$fit()$summarize()
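The snippets above assume that formula1 and data already exist. The sketch below simulates a data set from two latent linear regressions so that both are defined; the variable names and simulation settings are illustrative, and it assumes fmglm is made available by library(fmmr6). The fmglm call is skipped when that package is not installed.

```r
set.seed(42)
n <- 400
# Latent class membership: about 40% in class 1, 60% in class 2
cls <- sample(1:2, n, replace = TRUE, prob = c(0.4, 0.6))
x <- rnorm(n)
# Each latent class has its own intercept and slope
y <- ifelse(cls == 1, 1 + 2 * x, -2 - x) + rnorm(n, sd = 0.5)
data <- data.frame(y = y, x = x, cls = cls)
formula1 <- y ~ x

if (requireNamespace("fmmr6", quietly = TRUE)) {
  library(fmmr6)
  model1 <- fmglm$new(formula1, data, family = "gaussian", latent = 2)
  result <- model1$fit()
  output <- model1$summarize()
}
```

Because the true class labels are kept in data$cls, the fitted classes can later be compared with them.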
Using fmglm to Fit Finite Mixtures of Multinomial Regression Models
fmglm can fit finite mixtures of multinomial regression models as well.
In general, the code is similar to that for fitting mixtures of generalized linear models:
model_mn <- fmglm$new(formula, data, family = "multinom", latent = 2, method = "em")
The difference is that the dependent variable should be prepared as a factor.
In addition, the parameter mn_base is available in the constructor to identify the base group of the dependent variable. The default value is 1.
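The base group matters because a multinomial logit is only identified once one category's coefficients are fixed at zero; the remaining coefficients are then log-odds relative to that category. The sketch below prepares a factor response and computes category probabilities with the base group's linear predictor set to 0. It assumes, without confirmation from this page, that mn_base = 1 refers to the first level of the factor; multinom_probs is a hypothetical helper.

```r
# Prepare the dependent variable as a factor; the level order fixes which
# category is first (assumed to be the base group when mn_base = 1).
travel <- factor(c("car", "bus", "train", "car", "bus"),
                 levels = c("car", "bus", "train"))
# stats::relevel() moves another category to the first position if needed
travel_bus_first <- relevel(travel, ref = "bus")

# Hypothetical helper: multinomial-logit probabilities for J categories.
#   X: N x p design matrix; B: p x (J - 1) coefficients of the non-base groups
multinom_probs <- function(X, B) {
  eta <- cbind(0, X %*% B)          # the base group's linear predictor is 0
  eta <- eta - apply(eta, 1, max)   # stabilize exp() row by row
  exp(eta) / rowSums(exp(eta))
}

X <- cbind(1, c(0, 1))
B <- cbind(c(0, log(2)), c(0, 0))   # columns: bus vs car, train vs car
P <- multinom_probs(X, B)
```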
Super class: fmmr6::fmmr6 -> fmglm
data_model (DataModel())
The DataModel object that stores the data used in fmmr6.
family (character(1)|character())
The distribution family, which can be either a string like "gaussian" or a vector like c("gaussian", "gaussian").
latent (integer(1))
The number of latent classes.
method (character(1))
The estimation method used to fit the fmglm.
start (matrix())
The starting values for the fmglm.
constraint (matrix())
The constraint matrix.
concomitant (formula(1))
The formula for the concomitant model.
The default value is NULL.
optim_method (character(1))
The optimization method used to fit the model.
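A concomitant model lets the mixing proportions depend on covariates. One common form, used for example by flexmix, is a multinomial logit, \pi_{ik} = \exp(w_i^\top \gamma_k) / \sum_j \exp(w_i^\top \gamma_j) with \gamma_1 = 0. The sketch below assumes that form (this page does not state fmglm's exact parameterization) and shows how the row-specific priors built from a concomitant formula replace the constant \pi_k in the E-step. Both helper names are hypothetical.

```r
# Hypothetical helper: observation-specific mixing proportions from a
# concomitant formula such as ~ z1 + z2, assuming a multinomial logit with
# the first latent class as reference (gamma_1 = 0).
#   gamma: q x (K - 1) coefficients for latent classes 2..K
concomitant_priors <- function(concomitant, data, gamma) {
  W <- model.matrix(concomitant, data)   # N x q concomitant design
  eta <- cbind(0, W %*% gamma)
  eta <- eta - apply(eta, 1, max)
  exp(eta) / rowSums(exp(eta))
}

# E-step with row-specific priors: p_ik proportional to pi_ik * f_k(y_i | x_i)
#   log_dens: N x K matrix of component log-densities
e_step_concomitant <- function(log_dens, priors) {
  log_num <- log(priors) + log_dens
  log_num <- log_num - apply(log_num, 1, max)
  num <- exp(log_num)
  num / rowSums(num)
}

dat <- data.frame(z1 = c(-1, 0, 1))
priors <- concomitant_priors(~ z1, dat, gamma = matrix(c(0, 2), ncol = 1))
# Equal component densities: the posteriors simply equal the priors
post <- e_step_concomitant(matrix(-1, nrow = 3, ncol = 2), priors)
```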
new()
Create a new instance of this R6 class.
fmglm$new(formula, data, data_str = "default", data_var = NULL, family = "gaussian", latent = 2, method = "em", start = NULL, optim_method = "base", concomitant = NULL, use_llc = TRUE, mn_base = 1, constraint = matrix(1))
formula (formula(1))
The formula of the model to fit in the fmglm.
data (data.frame())
The data used in the fmglm.
family (character(1)|character())
The distribution family, which can be either a string like "gaussian" or a vector like c("gaussian", "gaussian").
latent (integer(1))
The number of latent classes.
method (character(1))
The estimation method used to fit the fmglm.
start (matrix())
The starting values for the fmglm.
optim_method (character(1))
The optimization method used to fit the model.
The default is "base".
concomitant (formula(1))
The formula for the concomitant model, e.g. ~ z1 + z2 + z3.
use_llc (logical(1))
Whether to use the complete-data log-likelihood (TRUE) or the observed-data log-likelihood (FALSE).
The default is TRUE.
mn_base (integer(1))
Determines which column (category) of the multinomial dependent variable is used as the base group. The default is 1.
constraint (matrix())
The constraint matrix.
Returns an R6 object of class fmglm.
fit()
Fit the fmglm model.
fmglm$fit(algo = "em", max_iter = 500, start = "random", rep = 1, verbose = F)
algo (character(1))
The algorithm used to fit the fmglm model.
The default is "em", the standard EM algorithm.
One can choose from c("em", "cem", "sem"): "cem" is the classification EM algorithm and "sem" is the stochastic EM algorithm.
max_iter (integer(1))
The maximum number of iterations of the E-step/M-step loop.
The default is 500.
start (character(1))
The starting method of the EM algorithm, either "kmeans" or "random".
"kmeans" uses the k-means method to put samples into latent classes; "random" randomly assigns samples to latent classes.
The default method is "random", as in the usage line above.
rep (integer(1))
The number of times the EM algorithm is run.
This parameter is designed to avoid local maxima: in each rep, the EM algorithm generates a new start.
It is only useful when start is "random".
After all reps, the algorithm picks the rep with the maximum log-likelihood.
The default value is 1.
verbose (logical(1))
Whether to print the log-likelihood at every step until convergence.
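The two starting methods and the rep option can be sketched as follows. The helpers initial_classes and best_of_reps are hypothetical, and clustering on cbind(y, X) for the "kmeans" start is an assumption about which variables fmglm uses; stats::kmeans() itself is standard R. The rep logic simply runs a fitter several times and keeps the run with the largest log-likelihood.

```r
# Hypothetical helpers (not fmglm's code) for the two starting methods
# and for keeping the best of several runs.
initial_classes <- function(y, X, latent, start = c("random", "kmeans")) {
  start <- match.arg(start)
  if (start == "kmeans") {
    # cluster samples on the response and covariates (an assumption about
    # which variables are used)
    kmeans(cbind(y, X), centers = latent)$cluster
  } else {
    # uniform random class labels; a class may occasionally be empty
    sample.int(latent, length(y), replace = TRUE)
  }
}

# rep: call a fitting function several times and keep the run with the
# largest log-likelihood. fit_once() is assumed to return a list with $loglik.
best_of_reps <- function(fit_once, rep) {
  fits <- lapply(seq_len(rep), function(r) fit_once())
  logliks <- vapply(fits, function(f) f$loglik, numeric(1))
  fits[[which.max(logliks)]]
}

set.seed(5)
y <- c(rnorm(20, mean = -10), rnorm(20, mean = 10))
X <- matrix(1, nrow = 40, ncol = 1)
cls_kmeans <- initial_classes(y, X, latent = 2, start = "kmeans")
cls_random <- initial_classes(y, X, latent = 2, start = "random")

# Toy check of best_of_reps with a stand-in fitter returning fixed values
toy_logliks <- c(-5, -1, -3)
counter <- 0
toy_fit <- function() {
  counter <<- counter + 1
  list(loglik = toy_logliks[counter])
}
best <- best_of_reps(toy_fit, rep = 3)
```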
summarize()
Generate a summary of the results.
fmglm$summarize(digits = 3)
digits (integer(1))
The number of digits shown in the output.
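summarize() reports standard errors and p-values, but this page does not say how they are computed. One common approach, sketched below under that caveat, inverts the Hessian of the negative log-likelihood at the optimum (optim(..., hessian = TRUE)) and forms Wald z-tests. The helper wald_table is hypothetical, and its digits argument only mirrors the one in summarize(). A Poisson regression is used so the result can be checked against glm().

```r
# Hypothetical helper: Wald table from estimates and the Hessian of the
# negative log-likelihood evaluated at the estimates.
wald_table <- function(est, hessian, digits = 3) {
  se <- sqrt(diag(solve(hessian)))   # inverse observed information
  z <- est / se
  p <- 2 * pnorm(-abs(z))            # two-sided p-values
  round(cbind(estimate = est, std.error = se, z = z, p.value = p), digits)
}

# Example: a Poisson regression fitted by optim(), so it can be checked with glm()
set.seed(4)
x <- rnorm(200)
y <- rpois(200, exp(0.5 + 0.3 * x))
X <- cbind(1, x)
negll <- function(b) -sum(dpois(y, exp(drop(X %*% b)), log = TRUE))
opt <- optim(c(0, 0), negll, method = "BFGS", hessian = TRUE)
tab <- wald_table(opt$par, opt$hessian)
```

For a Poisson model with the canonical log link the observed and expected information coincide at the MLE, so these standard errors should closely match those reported by summary(glm(y ~ x, family = poisson)).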
clone()
The objects of this class are cloneable with this method.
fmglm$clone(deep = FALSE)
deep
Whether to make a deep clone.
Dongjie Wu