fmglm: Finite Mixture of Generalized Linear Models (fmglm) Object

fmglm    R Documentation

Finite Mixture of Generalized Linear Models (fmglm) Object

Description

A finite mixture of generalized linear models (GLMs).

Value

Returns an R6 object of class fmglm.

Introduction of fmglm class

The fmglm class is designed to fit finite mixtures of generalized linear models, including multinomial regression models. It contains three major methods:

  • $new() creates an fmglm object given data and a formula.

  • $fit() fits the fmglm model. The fitting algorithm and the starting method can also be specified in this method. In addition, this method returns a result list containing the coefficients, the log-likelihood value, and information criteria such as AIC and BIC.

  • $summarize() generates a table of the results, including standard errors and p-values.

The Normal Likelihood Function Vs. The Complete-Data Likelihood Function

The Finite Mixture Model (FMM) with K components can be written in the following form:

f(y|x, φ) = ∑_{k=1}^K{π_k f_k(y|x, θ_k)}

Based on the above equation, the log-likelihood function is:

\log(L) = ∑_{i=1}^N \log{f(y_i|x_i, φ)} = ∑_{i=1}^N \log {∑_{k=1}^K{π_k f_k(y_i|x_i, θ_k)}}
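The log-likelihood above can be evaluated directly in base R. The following is a minimal sketch for Gaussian components only; mixture_loglik() and its arguments are hypothetical names introduced for illustration and are not part of fmglm.

```r
# Observed-data log-likelihood of a K-component Gaussian regression mixture.
# mixture_loglik() is a hypothetical helper, not a function exported by fmglm.
mixture_loglik <- function(y, X, pi_k, beta, sigma) {
  # beta: p x K matrix of coefficients; sigma: length-K vector of residual SDs
  K <- length(pi_k)
  # N x K matrix with entries pi_k * f_k(y_i | x_i, theta_k)
  dens <- vapply(seq_len(K), function(k) {
    pi_k[k] * dnorm(y, mean = drop(X %*% beta[, k]), sd = sigma[k])
  }, numeric(length(y)))
  # Sum over components inside the log, then sum over observations
  sum(log(rowSums(dens)))
}

# Simulated two-component data (illustrative values only)
set.seed(1)
n <- 200
x <- rnorm(n)
X <- cbind(1, x)
z <- rbinom(n, 1, 0.5)
y <- ifelse(z == 1, 1 + 2 * x, -1 - 2 * x) + rnorm(n, sd = 0.5)

beta <- cbind(c(1, 2), c(-1, -2))
ll <- mixture_loglik(y, X, pi_k = c(0.5, 0.5), beta = beta, sigma = c(0.5, 0.5))
```

Because the sum over components sits inside the logarithm, this function does not separate by component, which is what motivates the two approaches described next.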

The log-likelihood function can be maximized separately for each component in the M-step, given the posterior probabilities as weights:

\max_{θ_k} ∑_{i=1}^N \hat{p}_{ik} \log f_k(y_i|x_i, θ_k)

where \hat{p}_{ik} is the posterior probability estimated in the E-step. This approach is used in other packages, such as flexmix.

Similar to flexmix, we use glm.fit() and lm.wfit() to maximize the log-likelihood function component by component. To use this approach, set optim_method to either glm or lm and use_llc to FALSE.
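As an illustration of the component-wise approach, one EM iteration for Gaussian components can be sketched with stats::lm.wfit(). This is a simplified sketch under stated assumptions, not the package's internal code: em_step() and the simulated data are hypothetical, and the component variances are updated with a weighted maximum-likelihood estimate.

```r
# One EM iteration for a K-component Gaussian regression mixture.
# em_step() is a hypothetical helper, not a function exported by fmglm.
em_step <- function(y, X, pi_k, beta, sigma) {
  K <- length(pi_k)
  # E-step: N x K matrix of pi_k * f_k(y_i | x_i, theta_k), normalized by row
  dens <- vapply(seq_len(K), function(k) {
    pi_k[k] * dnorm(y, mean = drop(X %*% beta[, k]), sd = sigma[k])
  }, numeric(length(y)))
  post <- dens / rowSums(dens)
  # M-step: weighted least squares per component, posteriors as weights
  for (k in seq_len(K)) {
    fit <- lm.wfit(X, y, w = post[, k])
    beta[, k] <- fit$coefficients
    res <- y - drop(X %*% beta[, k])
    sigma[k] <- sqrt(sum(post[, k] * res^2) / sum(post[, k]))
  }
  # loglik is the log-likelihood at the parameters passed in
  list(pi_k = colMeans(post), beta = beta, sigma = sigma,
       post = post, loglik = sum(log(rowSums(dens))))
}

# Simulated data (illustrative values only)
set.seed(42)
n <- 300
x <- rnorm(n)
X <- cbind(intercept = 1, x = x)
cls <- sample(1:2, n, replace = TRUE)
y <- ifelse(cls == 1, 2 + 3 * x, -2 - 3 * x) + rnorm(n, sd = 0.5)

state <- list(pi_k = c(0.5, 0.5), beta = cbind(c(1, 1), c(-1, -1)),
              sigma = c(1, 1))
logliks <- numeric(20)
for (it in seq_along(logliks)) {
  state <- em_step(y, X, state$pi_k, state$beta, state$sigma)
  logliks[it] <- state$loglik
}
```

The recorded log-likelihood values should not decrease from one iteration to the next, which is a convenient sanity check for any EM implementation.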

Alternatively, the FMM can be viewed as a model with incomplete data, where the variable that determines each individual's class is missing. The imputed indicator z_{ik} equals 1 if sample i belongs to component k and 0 otherwise. Therefore, the complete-data log-likelihood is:

\log L_c = ∑_{k=1}^K ∑_{i=1}^N z_{ik} \{ \log{π_k} + \log{f_k(y_i|x_i, θ_k)} \}

In this package, we use optim() with the BFGS algorithm to maximize the complete-data log-likelihood. This is the default method for fitting the FMM. As this method involves a mixed log-likelihood function, we use Rcpp to speed up the computation.
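The idea of maximizing the complete-data log-likelihood with optim() can be sketched in plain R. The sketch below makes several assumptions that are not taken from the package: the components are Gaussian, the standard deviations are optimized on the log scale, the mixing proportions are fixed at the column means of z, and z is set to the true simulated classes for simplicity. neg_llc() is a hypothetical name, and no Rcpp is used here.

```r
# Simulated data (illustrative values only)
set.seed(7)
n <- 300
x <- rnorm(n)
X <- cbind(1, x)
cls <- sample(1:2, n, replace = TRUE)
y <- ifelse(cls == 1, 2 + 3 * x, -2 - 3 * x) + rnorm(n, sd = 0.5)

# z: N x K indicator matrix (in EM this would hold the E-step posteriors)
z <- cbind(as.numeric(cls == 1), as.numeric(cls == 2))
pi_k <- colMeans(z)

# Negative complete-data log-likelihood (optim() minimizes by default).
# neg_llc() is a hypothetical helper, not part of fmglm.
neg_llc <- function(par, y, X, z, pi_k) {
  K <- ncol(z)
  p <- ncol(X)
  beta <- matrix(par[seq_len(p * K)], nrow = p)
  sigma <- exp(par[p * K + seq_len(K)])  # log scale keeps sigma positive
  llc <- 0
  for (k in seq_len(K)) {
    logf <- dnorm(y, mean = drop(X %*% beta[, k]), sd = sigma[k], log = TRUE)
    llc <- llc + sum(z[, k] * (log(pi_k[k]) + logf))
  }
  -llc
}

# Start from zero coefficients and the overall SD of y
start <- c(rep(0, 4), rep(log(sd(y)), 2))
fit <- optim(par = start, fn = neg_llc, y = y, X = X, z = z, pi_k = pi_k,
             method = "BFGS", control = list(maxit = 500))
beta_hat <- matrix(fit$par[1:4], nrow = 2)
```

With 0/1 indicators the objective separates by component, so beta_hat should be close to the per-class least-squares coefficients.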

Alternative Algorithms to the EM Algorithm

In addition to the normal EM algorithm, $fit() supports two extensions of the EM algorithm, selected through the algo argument.

  • The Classification EM (cem) assigns each sample to a component based on the maximum value of its posterior probabilities.

  • The Stochastic EM (sem) randomly assigns each sample to a component based on its posterior probabilities.
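The difference between the two assignment rules can be shown on a small posterior matrix. This is a minimal base R sketch; the matrix values and the helper to_indicator() are made up for illustration and are not part of fmglm.

```r
# post: N x K matrix of posterior probabilities (each row sums to 1)
post <- matrix(c(0.9, 0.1,
                 0.4, 0.6,
                 0.5, 0.5), ncol = 2, byrow = TRUE)

# CEM: assign each sample to the component with the largest posterior
cem_class <- max.col(post, ties.method = "first")

# SEM: draw one component per sample, using its posteriors as probabilities
set.seed(123)
sem_class <- apply(post, 1, function(p) {
  sample.int(length(p), size = 1, prob = p)
})

# Convert class labels to a 0/1 indicator matrix.
# to_indicator() is a hypothetical helper.
to_indicator <- function(cls, K) diag(K)[cls, , drop = FALSE]
z_cem <- to_indicator(cem_class, ncol(post))
z_sem <- to_indicator(sem_class, ncol(post))
```

In both variants the resulting indicator matrix replaces the posterior probabilities as weights in the subsequent M-step.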

Use fmglm to Fit Finite Mixture of Generalized Linear Models

The main use of fmglm is to fit finite mixtures of generalized linear models, such as linear regressions with a Gaussian distribution or regressions with a Poisson distribution. To fit a linear regression with a Gaussian distribution, run code similar to the following:

model1 <- fmglm$new(formula1, data, family="gaussian", latent=2)
result <- model1$fit()
output <- model1$summarize()

Thanks to the method chaining supported by R6, the code can be written in one line as follows:

result <- fmglm$new(formula1, data, family="gaussian", latent=2)$fit()$summarize()

Use fmglm to Fit Finite Mixture of Multinomial Regression Models

fmglm can fit finite mixtures of multinomial regression models as well. In general, the code is similar to that for fitting a mixture of generalized linear models.

model_mn <- fmglm$new(formula, data, family="multinom",
                      latent=2, method="em")

The difference is that one should prepare the dependent variable as a factor variable. In addition, the parameter mn_base is available in the constructor for identifying the base group of the dependent variable. The default value is 1.

Super class

fmmr6::fmmr6 -> fmglm

Public fields

data_model

(DataModel())
The DataModel object that stores the data used in fmmr6.

family

(character(1)|character())
The distribution family which can be either a string like "gaussian" or a vector like c("gaussian", "gaussian").

latent

(integer(1))
The number of latent classes.

method

(character(1))
The estimation method to fit the fmglm.

start

(matrix())
The starting values for the fmglm.

constraint

(matrix())
The constraint matrix.

concomitant

(formula(1))
The formula to model the concomitant model. The default value is NULL.

optim_method

(character(1))
The optimization method to use to fit the model.

Methods

Method new()

Create a new instance of this R6 class (see R6::R6Class).

Usage
fmglm$new(
  formula,
  data,
  data_str = "default",
  data_var = NULL,
  family = "gaussian",
  latent = 2,
  method = "em",
  start = NULL,
  optim_method = "base",
  concomitant = NULL,
  use_llc = TRUE,
  mn_base = 1,
  constraint = matrix(1)
)
Arguments
formula

(formula(1))
The formula/expression of the model to fit in the fmglm.

data

(data.frame())
The Data used in the fmglm.

family

(character(1)|character())
The distribution family which can be either a string like "gaussian" or a vector like c("gaussian", "gaussian").

latent

(integer(1))
The number of latent classes.

method

(character(1))
The estimation method to fit the fmglm.

start

(matrix())
The starting values for the fmglm.

optim_method

(character(1))
The optimization method to use to fit the model. The default is base.

concomitant

(formula(1))
The formula for the concomitant model. E.g. ~ z1 + z2 + z3.

use_llc

(logical(1))
Whether to use the complete-data log-likelihood (TRUE) or the normal log-likelihood (FALSE). The default is TRUE.

mn_base

(integer(1))
Determines which column of the multinomial variable is used as the base group. The default is 1.

constraint

(matrix())
The constraint matrix.

Returns

Returns an R6 object of class fmglm.


Method fit()

Fit the fmglm model.

Usage
fmglm$fit(algo = "em", max_iter = 500, start = "random", rep = 1, verbose = F)
Arguments
algo

(character(1))
The algorithm used in fitting the fmglm model. The default algorithm is em standing for the normal EM algorithm. One can choose from c("em", "cem", "sem"). cem is the classification EM algorithm. sem is the stochastic EM algorithm.

max_iter

(integer(1))
Specify the maximum number of iterations for the E-step-M-step loop. The default number is 500.

start

(character(1))
Specify the starting method of the EM algorithm, either kmeans or random. kmeans uses the k-means method to assign samples to latent classes. random randomly assigns samples to latent classes. The default method is random.

rep

(integer(1))
Specify the number of times the EM algorithm is run. This parameter is designed to reduce the risk of stopping at a local maximum. In each rep, the EM algorithm generates a new start, so it is only useful when start is random. After all reps, the algorithm picks the rep with the maximum log-likelihood. The default value is 1.

verbose

(logical(1))
Whether to print the log-likelihood at each step as the algorithm converges. The default is FALSE.
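The interaction between start = "random" and rep can be illustrated with a small stand-alone sketch: run EM from several random starts and keep the run with the largest log-likelihood. The code below uses a univariate Gaussian mixture for brevity; em_once() is a hypothetical helper and does not reproduce the package's internals.

```r
# EM for a K-component univariate Gaussian mixture from a random start.
# em_once() is a hypothetical helper, not a function exported by fmglm.
em_once <- function(y, K = 2, max_iter = 200) {
  n <- length(y)
  # Random start: assign each sample to a latent class at random
  post <- diag(K)[sample.int(K, n, replace = TRUE), , drop = FALSE]
  loglik <- -Inf
  for (iter in seq_len(max_iter)) {
    # M-step: weighted proportions, means, and standard deviations
    pi_k <- colMeans(post)
    mu <- colSums(post * y) / colSums(post)
    sigma <- sqrt(colSums(post * outer(y, mu, "-")^2) / colSums(post))
    # E-step: posterior probabilities and current log-likelihood
    dens <- vapply(seq_len(K), function(k) {
      pi_k[k] * dnorm(y, mean = mu[k], sd = sigma[k])
    }, numeric(n))
    new_loglik <- sum(log(rowSums(dens)))
    post <- dens / rowSums(dens)
    if (abs(new_loglik - loglik) < 1e-8) break
    loglik <- new_loglik
  }
  list(loglik = new_loglik, mu = mu, sigma = sigma, pi_k = pi_k)
}

# Simulated data (illustrative values only)
set.seed(2022)
y <- c(rnorm(100, mean = -2), rnorm(100, mean = 3))

# Five reps, each from its own random start; keep the best one
fits <- lapply(seq_len(5), function(r) em_once(y))
logliks <- vapply(fits, function(f) f$loglik, numeric(1))
best <- fits[[which.max(logliks)]]
```

With a deterministic start every rep would follow the same path, which is why repeated runs only help when the starting assignment is random.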


Method summarize()

Generate a summary for the result.

Usage
fmglm$summarize(digits = 3)
Arguments
digits

(integer(1))
The number of digits shown in the output. The default is 3.


Method clone()

The objects of this class are cloneable with this method.

Usage
fmglm$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

Author(s)

Dongjie Wu


wudongjie/fmmr6 documentation built on June 24, 2022, 2:48 p.m.