gllim: EM Algorithm for Gaussian Locally Linear Mapping

View source: R/gllim.R


EM Algorithm for Gaussian Locally Linear Mapping

Description

EM Algorithm for Gaussian Locally Linear Mapping

Usage

gllim(tapp,yapp,in_K,in_r=NULL,maxiter=100,Lw=0,cstr=NULL,verb=0,in_theta=NULL,...)

Arguments

tapp

An Lt x N matrix of training responses with variables in rows and subjects in columns

yapp

A D x N matrix of training covariates with variables in rows and subjects in columns

in_K

Initial number of components

in_r

Initial assignments (default NULL)

maxiter

Maximum number of iterations (default 100). The algorithm stops when the number of iterations exceeds maxiter or when the difference in log-likelihood between two successive iterations is smaller than a threshold fixed to 0.001*(max(LL)-min(LL)), where LL is the vector of log-likelihoods at the successive iterations.

Lw

Number of hidden (latent) variables (default 0); see L_w in the Details section

cstr

Constraints on the error covariance matrices \Sigma_k. Must be a list; for example, cstr=list(Sigma="i") constrains the \Sigma_k to be diagonal and isotropic, which is the default. See the Details section below for the other available options.

verb

Verbosity: print out the progression of the algorithm. If verb=0 (the default), nothing is printed; if verb=1, the progression is printed out.

in_theta

The EM algorithm can be initialized either with initial assignments (in_r) or with initial parameter values (in_theta, default NULL). In the latter case, in_theta must have the same structure as the theta output of this function.

...

Other arguments to be passed, for internal use only

Details

The GLLiM model implemented in this function addresses the following non-linear mapping problem:

E(Y | X=x) = g(x),

where Y is an L-vector of multivariate responses and X is a large D-vector of covariate profiles such that D \gg L. The methods implemented in this package aim at estimating the non-linear regression function g.

First, the methods of this package are based on an inverse regression strategy. The inverse conditional relation p(X | Y) is specified in such a way that the forward relation of interest, p(Y | X), can be deduced in closed form. Under some hypotheses on the covariance structures, the large number D of covariates is handled by this inverse regression trick, which acts as a dimension reduction technique. The number of parameters to estimate is therefore drastically reduced.

Second, the non-linear regression function g is approximated by a piecewise affine function. To this end, a hidden discrete variable Z is introduced, dividing the space into K regions such that an affine model holds between the responses Y and the covariates X in each region k:

X = \sum_{k=1}^K I_{Z=k} (A_k Y + b_k + E_k)

where A_k is a D \times L matrix of coefficients for regression k, b_k is a D-vector of intercepts and E_k is a Gaussian noise with covariance matrix \Sigma_k.

GLLiM is defined as the following hierarchical Gaussian mixture model for the inverse conditional density (X | Y):

p(X | Y=y,Z=k;\theta) = N(X; A_k y + b_k, \Sigma_k)

p(Y | Z=k; \theta) = N(Y; c_k,\Gamma_k)

p(Z=k)=\pi_k

where \theta is the set of parameters \theta=(\pi_k,c_k,\Gamma_k,A_k,b_k,\Sigma_k)_{k=1}^K. The forward conditional density of interest p(Y | X) is deduced from these equations and is also a Gaussian mixture of regressions model.
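
More precisely, following [1], the parameters of the forward density (denoted with a star) are obtained in closed form from the inverse ones:

p(Y=y | X=x; \theta) = \sum_{k=1}^K \frac{\pi_k N(x; c^*_k, \Gamma^*_k)}{\sum_{j=1}^K \pi_j N(x; c^*_j, \Gamma^*_j)} N(y; A^*_k x + b^*_k, \Sigma^*_k)

where

c^*_k = A_k c_k + b_k

\Gamma^*_k = \Sigma_k + A_k \Gamma_k A_k^T

\Sigma^*_k = (\Gamma_k^{-1} + A_k^T \Sigma_k^{-1} A_k)^{-1}

A^*_k = \Sigma^*_k A_k^T \Sigma_k^{-1}

b^*_k = \Sigma^*_k (\Gamma_k^{-1} c_k - A_k^T \Sigma_k^{-1} b_k)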

gllim allows the addition of L_w latent variables, in order to account for correlation among covariates or when the responses are assumed to be only partially observed. Adding latent factors is known to improve prediction accuracy, provided L_w is not too large with regard to the number of covariates. When latent factors are added, the dimension of the response is L=L_t+L_w; otherwise L=L_t.

For GLLiM, the number of parameters to estimate is:

(K-1) + K(DL + D + L_t + nbpar_{\Sigma} + nbpar_{\Gamma})

where L=L_w+L_t and nbpar_{\Sigma} (resp. nbpar_{\Gamma}) is the number of parameters in each of the large (resp. small) covariance matrix \Sigma_k (resp. \Gamma_k). For example,

  • if the constraint on \Sigma is cstr$Sigma="i" (the default in the gllim function), then nbpar_{\Sigma}=1,

  • if the constraint on \Sigma is cstr$Sigma="d", then nbpar_{\Sigma}=D,

  • if the constraint on \Sigma is cstr$Sigma="", then nbpar_{\Sigma}=D(D+1)/2,

  • if the constraint on \Sigma is cstr$Sigma="*", then nbpar_{\Sigma}=D(D+1)/(2K).

The rule to compute the number of parameters of \Gamma is the same as for \Sigma, replacing D by L_t. Currently the \Gamma_k matrices are not constrained and nbpar_{\Gamma}=L_t(L_t+1)/2, because for identifiability reasons the L_w part is set to the identity matrix.
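
As an illustration, this parameter count can be reproduced with a small helper function (a sketch for illustration only; nb_param_gllim is not part of the package):

nb_param_gllim <- function(K, D, Lt, Lw = 0, Sigma = "i") {
  L <- Lt + Lw
  ## free parameters in each Sigma_k, depending on the constraint
  if (Sigma == "i") {
    nbpar_Sigma <- 1                      # isotropic
  } else if (Sigma == "d") {
    nbpar_Sigma <- D                      # diagonal
  } else if (Sigma == "") {
    nbpar_Sigma <- D * (D + 1) / 2        # unconstrained full matrix
  } else {
    nbpar_Sigma <- D * (D + 1) / (2 * K)  # "*": full matrix shared across components
  }
  nbpar_Gamma <- Lt * (Lt + 1) / 2        # unconstrained Lt x Lt block of Gamma_k
  (K - 1) + K * (D * L + D + Lt + nbpar_Sigma + nbpar_Gamma)
}
nb_param_gllim(K = 5, D = 50, Lt = 2)     # e.g. the setting of the Examples below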

The user must choose the number of mixture components K and, if needed, the number of latent factors L_w. For small datasets (fewer than 100 observations), it is suggested to select both (K,L_w) by minimizing the BIC criterion. For larger datasets, in order to save computational time, it is suggested to set K to an arbitrary value, large enough to catch non-linear relations between responses and covariates and small enough to keep several observations (at least 10) in each cluster, and to select L_w by BIC. Indeed, for large datasets, the number of clusters has little impact on the results as long as it is sufficiently large.
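
For instance, a BIC-based selection of K can be sketched as follows from the LLf and nbpar values returned by gllim (Kgrid is a hypothetical grid of candidate values; tapp and yapp are the training data as in the Usage section):

Kgrid <- 2:10
bic <- sapply(Kgrid, function(k) {
  mod <- gllim(tapp, yapp, in_K = k)
  -2 * mod$LLf + log(ncol(yapp)) * mod$nbpar  # BIC = -2*logLik + log(N)*nbpar
})
K_best <- Kgrid[which.min(bic)]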

Value

Returns a list with the following elements:

LLf

Final log-likelihood

LL

Log-likelihood value at each iteration of the EM algorithm

pi

A vector of length K of mixture weights, i.e., the prior probabilities of each component

c

An (L x K) matrix of means of responses (Y) where L=Lt+Lw

Gamma

An (L x L x K) array of K matrices of covariances of responses (Y) where L=Lt+Lw

A

A (D x L x K) array of the K linear transformation matrices A_k, where L=Lt+Lw

b

A (D x K) matrix whose columns are the affine transformation vectors b_k

Sigma

A (D x D x K) array of the K covariance matrices \Sigma_k of the covariates (X)

r

An (N x K) matrix of posterior probabilities

nbpar

The number of parameters estimated in the model

Author(s)

Emeline Perthame (emeline.perthame@inria.fr), Florence Forbes (florence.forbes@inria.fr), Antoine Deleforge (antoine.deleforge@inria.fr)

References

[1] A. Deleforge, F. Forbes, and R. Horaud. High-dimensional regression with Gaussian mixtures and partially-latent response variables. Statistics and Computing, 25(5):893–911, 2015.

[2] E. Perthame, F. Forbes, and A. Deleforge. Inverse regression approach to robust nonlinear high-to-low dimensional mapping. Journal of Multivariate Analysis, 163(C):1–14, 2018. https://doi.org/10.1016/j.jmva.2017.09.009

Converted to R from the Matlab code of the GLLiM toolbox available on: https://team.inria.fr/perception/gllim_toolbox/

See Also

xLLiM-package, emgm, gllim_inverse_map, sllim

Examples

data(data.xllim)

## Setting 5 components in the model
K = 5

## the model can be initialized by running an EM algorithm for Gaussian Mixtures (EMGM)
r = emgm(data.xllim, init=K)
## and then the gllim model is estimated
responses = data.xllim[1:2,] # 2 responses in rows and 100 observations in columns
covariates = data.xllim[3:52,] # 50 covariates in rows and 100 observations in columns
mod = gllim(responses,covariates,in_K=K,in_r=r)

## if initialization is not specified, the model is automatically initialized by EMGM
## mod = gllim(responses,covariates,in_K=K)

## Adding 1 latent factor 
## mod = gllim(responses,covariates,in_K=K,in_r=r,Lw=1)

## Some constraints on the covariance structure of X can be added
## mod = gllim(responses,covariates,in_K=K,in_r=r,cstr=list(Sigma="i")) 
# Isotropic covariances
# (same variance among covariates but different in each component)

## mod = gllim(responses,covariates,in_K=K,in_r=r,cstr=list(Sigma="d")) 
# Heteroskedastic covariances
# (variances are different among covariates and in each component)

## mod = gllim(responses,covariates,in_K=K,in_r=r,cstr=list(Sigma="")) 
# Unconstrained full matrix

## mod = gllim(responses,covariates,in_K=K,in_r=r,cstr=list(Sigma="*")) 
# Full matrix but equal between components
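
## The fitted model can then be used to predict responses for new covariate
## profiles with gllim_inverse_map (see its help page), e.g.
## pred = gllim_inverse_map(covariates, mod)$x_exp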
