sgdgmf.init: Initialize the parameters of a generalized matrix...

sgdgmf.init    R Documentation

Initialize the parameters of a generalized matrix factorization model

Description

Provides four methods for setting the initial parameter values of a generalized matrix factorization (GMF) model, identified by a glm family and a linear predictor of the form g(\mu) = \eta = X B^\top + A Z^\top + U V^\top, with bijective link function g(\cdot). See sgdgmf.fit for more details on the model specification.
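As a rough illustration of how the three terms of the linear predictor combine, the following base-R sketch builds \eta from random matrices and maps it to the mean scale. The sizes, the Poisson family, and the orientation of Z (stored here as m \times q so that A Z^\top is conformable) are assumptions of the sketch, not package defaults.

```r
set.seed(1)
n <- 6; m <- 4; p <- 2; q <- 1; d <- 2   # illustrative sizes (assumed)

X <- matrix(rnorm(n * p), n, p)   # row-specific covariates (n x p)
Z <- matrix(1, m, q)              # column-specific covariates, m x q in this sketch
B <- matrix(rnorm(m * p), m, p)   # column-specific coefficients (m x p)
A <- matrix(rnorm(n * q), n, q)   # row-specific coefficients (n x q)
U <- matrix(rnorm(n * d), n, d)   # latent factors (n x d)
V <- matrix(rnorm(m * d), m, d)   # loadings (m x d)

# Linear predictor: every term is an n x m matrix
eta <- X %*% t(B) + A %*% t(Z) + U %*% t(V)

# Mean on the response scale through the inverse link (log link here)
mu <- poisson()$linkinv(eta)
dim(eta)
```

Every term has the same n \times m shape as Y, which is why the dimensions of B, A, U and V are tied to those of X, Z and ncomp.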

Usage

sgdgmf.init(
  Y,
  X = NULL,
  Z = NULL,
  ncomp = 2,
  family = gaussian(),
  weights = NULL,
  offset = NULL,
  method = c("ols", "glm", "random", "values"),
  type = c("deviance", "pearson", "working", "link"),
  niter = 0,
  values = list(),
  verbose = FALSE,
  parallel = FALSE,
  nthreads = 1,
  savedata = TRUE
)

sgdgmf.init.ols(
  Y,
  X = NULL,
  Z = NULL,
  ncomp = 2,
  family = gaussian(),
  weights = NULL,
  offset = NULL,
  type = c("deviance", "pearson", "working", "link"),
  verbose = FALSE
)

sgdgmf.init.glm(
  Y,
  X = NULL,
  Z = NULL,
  ncomp = 2,
  family = gaussian(),
  weights = NULL,
  offset = NULL,
  type = c("deviance", "pearson", "working", "link"),
  verbose = FALSE,
  parallel = FALSE,
  nthreads = 1
)

sgdgmf.init.random(
  Y,
  X = NULL,
  Z = NULL,
  ncomp = 2,
  family = gaussian(),
  weights = NULL,
  offset = NULL,
  sigma = 1
)

sgdgmf.init.custom(
  Y,
  X = NULL,
  Z = NULL,
  ncomp = 2,
  family = gaussian(),
  values = list(),
  verbose = FALSE
)

Arguments

Y

matrix of responses (n \times m)

X

matrix of row-specific fixed effects (n \times p)

Z

matrix of column-specific fixed effects (m \times q)

ncomp

rank of the latent matrix factorization

family

a model family, as in the glm interface

weights

matrix of constant weights (n \times m)

offset

matrix of constant offset (n \times m)

method

initialization method to be used for the initial fit, one of "ols", "glm", "random" or "values"

type

type of residuals to be used for initializing U via incomplete SVD

niter

number of iterations to refine the initial estimate (only if method="ols" or "svd")

values

a list of custom initial values for B, A, U and V

verbose

if TRUE, prints the status of the initialization process

parallel

if TRUE, allows for parallel computing using the foreach package (only if method="glm")

nthreads

number of cores to be used in parallel (only if parallel=TRUE and method="glm")

savedata

if TRUE, stores a copy of the input data

Details

If method = "ols", the initialization is performed by fitting a sequence of linear regressions followed by an SVD of the residuals. To account for the non-Gaussian distribution of the data, regression and decomposition are applied to the transformed response matrix Y_h = (g \circ h)(Y), where h(\cdot) is a function that prevents Y_h from taking infinite values. For instance, in the Binomial case h(y) = (1 - 2\epsilon) y + \epsilon, which maps y \in [0, 1] into [\epsilon, 1 - \epsilon], while in the Poisson case h(y) = y + \epsilon, where \epsilon is a small positive constant, typically 0.1 or 0.01.
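The idea can be sketched in base R for the Poisson case. This is a simplified illustration, not the package implementation: it assumes an intercept-only X, ignores Z, weights and offset, and takes the residuals on the link scale.

```r
set.seed(42)
n <- 50; m <- 8; d <- 2; eps <- 0.1          # illustrative sizes and epsilon (assumed)
Y <- matrix(rpois(n * m, lambda = 3), n, m)  # toy count data
X <- matrix(1, n, 1)                         # intercept-only row covariates (assumed)

# Transformed response (g o h)(Y), with g = log and h(y) = y + eps
Yh <- log(Y + eps)

# Least squares for all m columns at once: Yh[, j] regressed on X
B <- t(qr.coef(qr(X), Yh))                   # m x p coefficient matrix
R <- Yh - X %*% t(B)                         # residual matrix (n x m)

# Rank-d truncated SVD of the residuals
s <- svd(R, nu = d, nv = d)
U <- s$u %*% diag(s$d[1:d], d, d)            # n x d factors, scaled by singular values
V <- s$v                                     # m x d loadings

eta0 <- X %*% t(B) + U %*% t(V)              # initial linear predictor
```

Adding \epsilon before taking the logarithm keeps zero counts finite, which is the role of h(\cdot) described above.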

If method = "glm", the initialization is performed by fitting a sequence of generalized linear models followed by an SVD of the residuals. In particular, to set \beta_j, we fit independent GLMs of the form y_j \sim X \beta_j. Similarly, to set \alpha_i, we fit the model y_i \sim Z \alpha_i + o_i, with offset o_i = B x_i. Then, we obtain U via SVD of the residuals. Finally, we obtain V via independent GLM fits under the model y_j \sim U v_j + o_j, with offset o_j = X \beta_j + A z_j.
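A minimal base-R sketch of the column-wise fits, using stats::glm.fit on simulated Poisson data. It skips the \alpha_i step (no Z), uses Pearson residuals for the SVD, and all sizes and true coefficients are made up for illustration; the package may organise these steps differently.

```r
set.seed(7)
n <- 60; m <- 5; d <- 2
X <- cbind(1, rnorm(n))                                     # n x 2 design: intercept + one covariate
B_true <- cbind(rep(1, m), seq(-0.5, 0.5, length.out = m))  # m x 2 coefficients (assumed)
Y <- matrix(rpois(n * m, exp(X %*% t(B_true))), n, m)

fam <- poisson()

# Step 1: one independent GLM per column, y_j ~ X beta_j
B <- t(sapply(seq_len(m), function(j) {
  glm.fit(x = X, y = Y[, j], family = fam)$coefficients
}))
eta <- X %*% t(B)                                           # fixed-effect part, used as offset below

# Step 2: U from an SVD of the (Pearson) residuals
mu <- fam$linkinv(eta)
R  <- (Y - mu) / sqrt(fam$variance(mu))
U  <- svd(R, nu = d, nv = 0)$u                              # n x d

# Step 3: one GLM per column for the loadings, y_j ~ U v_j + o_j
V <- t(sapply(seq_len(m), function(j) {
  glm.fit(x = U, y = Y[, j], offset = eta[, j],
          family = fam, intercept = FALSE)$coefficients
}))
```

Because the m column-wise fits do not depend on each other, they are natural candidates for the parallel execution controlled by parallel and nthreads.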

Under both method = "ols" and method = "glm", the argument type selects the type of residuals used for the SVD.
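For reference, the deviance, Pearson and working residuals can be computed from any glm family object as sketched below, following the standard GLM definitions on a small simulated Poisson regression. The package-specific "link" type is not reproduced here, since its exact definition is not given on this page.

```r
set.seed(3)
x <- rnorm(30)
y <- rpois(30, exp(0.5 + 0.3 * x))
fit <- glm(y ~ x, family = poisson())

fam <- fit$family
mu  <- fitted(fit)                 # fitted means
wt  <- rep(1, length(y))           # unit prior weights

# Signed square root of the unit deviances
res_deviance <- sign(y - mu) * sqrt(fam$dev.resids(y, mu, wt))
# Raw residuals scaled by the standard deviation implied by the variance function
res_pearson  <- (y - mu) / sqrt(fam$variance(mu))
# Raw residuals mapped to the linear-predictor scale via d mu / d eta
res_working  <- (y - mu) / fam$mu.eta(fam$linkfun(mu))
```

The three vectors agree with residuals(fit, type = ...) for the corresponding types, so the same formulas can be applied entry-wise to a residual matrix.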

If method = "random", the initialization is performed using independent Gaussian random values for all the parameters in the model.
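A random start of this kind might look as follows in base R. The sketch assumes that sigma acts as the standard deviation of the Gaussian draws and that X and Z are intercept-only (p = q = 1); neither assumption is confirmed by this page.

```r
set.seed(123)
n <- 10; m <- 4; p <- 1; q <- 1; d <- 3
sigma <- 1   # assumed to be the standard deviation of the draws

# Small helper (not part of sgdGMF): matrix of independent N(0, sd^2) values
rnorm_matrix <- function(nr, nc, sd) matrix(rnorm(nr * nc, sd = sd), nr, nc)

init <- list(
  B = rnorm_matrix(m, p, sigma),   # column-specific coefficients
  A = rnorm_matrix(n, q, sigma),   # row-specific coefficients
  U = rnorm_matrix(n, d, sigma),   # latent factors
  V = rnorm_matrix(m, d, sigma)    # loadings
)
sapply(init, dim)
```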

If method = "values", the initialization is performed using user-specified values provided as an input, which must have compatible dimensions.
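The following sketch shows one way to assemble such a list and check its dimensions before use. The element names B, A, U and V follow the description of the values argument; the helper check_init_dims is hypothetical (not part of sgdGMF), and p = q = 1 assumes intercept-only X and Z.

```r
set.seed(10)
n <- 100; m <- 20; p <- 1; q <- 1; d <- 3

values <- list(
  B = matrix(0, m, p),                          # m x p
  A = matrix(0, n, q),                          # n x q
  U = matrix(rnorm(n * d, sd = 0.1), n, d),     # n x d
  V = matrix(rnorm(m * d, sd = 0.1), m, d)      # m x d
)

# Hypothetical helper: stop with a message if any matrix has the wrong shape
check_init_dims <- function(values, n, m, p, q, d) {
  expected <- list(B = c(m, p), A = c(n, q), U = c(n, d), V = c(m, d))
  ok <- vapply(names(expected), function(k) {
    is.matrix(values[[k]]) && all(dim(values[[k]]) == expected[[k]])
  }, logical(1))
  if (!all(ok)) {
    stop("incompatible dimensions for: ", paste(names(ok)[!ok], collapse = ", "))
  }
  invisible(TRUE)
}

check_init_dims(values, n, m, p, q, d)

# The list would then be passed along the lines of:
# init <- sgdgmf.init(Y, ncomp = d, family = poisson(), method = "values", values = values)
```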

Value

An initgmf object, namely a list, containing the initial estimates of the GMF parameters. In particular, the returned object collects the following information:

  • Y: response matrix (only if savedata=TRUE)

  • X: row-specific covariate matrix (only if savedata=TRUE)

  • Z: column-specific covariate matrix (only if savedata=TRUE)

  • B: the estimated column-specific coefficient matrix

  • A: the estimated row-specific coefficient matrix

  • U: the estimated factor matrix

  • V: the estimated loading matrix

  • phi: the estimated dispersion parameter

  • method: the selected estimation method

  • family: the model family

  • ncomp: rank of the latent matrix factorization

  • type: type of residuals used for the initialization of U

  • verbose: if TRUE, print the status of the initialization process

  • parallel: if TRUE, allows for parallel computing

  • nthreads: number of cores to be used in parallel

  • savedata: if TRUE, stores a copy of the input data

Examples

library(sgdGMF)

# Set the data dimensions
n = 100; m = 20; d = 5

# Generate data using Poisson, Binomial and Gamma models
data_pois = sim.gmf.data(n = n, m = m, ncomp = d, family = poisson())
data_bin = sim.gmf.data(n = n, m = m, ncomp = d, family = binomial())
data_gam = sim.gmf.data(n = n, m = m, ncomp = d, family = Gamma(link = "log"), dispersion = 0.25)

# Initialize the GMF parameters assuming 3 latent factors
init_pois = sgdgmf.init(data_pois$Y, ncomp = 3, family = poisson(), method = "ols")
init_bin = sgdgmf.init(data_bin$Y, ncomp = 3, family = binomial(), method = "ols")
init_gam = sgdgmf.init(data_gam$Y, ncomp = 3, family = Gamma(link = "log"), method = "ols")

# Get the fitted values in the link and response scales
mu_hat_pois = fitted(init_pois, type = "response")
mu_hat_bin = fitted(init_bin, type = "response")
mu_hat_gam = fitted(init_gam, type = "response")

# Compare the results
oldpar = par(no.readonly = TRUE)
par(mfrow = c(3,3), mar = c(1,1,3,1))
image(data_pois$Y, axes = FALSE, main = expression(Y[Pois]))
image(data_pois$mu, axes = FALSE, main = expression(mu[Pois]))
image(mu_hat_pois, axes = FALSE, main = expression(hat(mu)[Pois]))
image(data_bin$Y, axes = FALSE, main = expression(Y[Bin]))
image(data_bin$mu, axes = FALSE, main = expression(mu[Bin]))
image(mu_hat_bin, axes = FALSE, main = expression(hat(mu)[Bin]))
image(data_gam$Y, axes = FALSE, main = expression(Y[Gam]))
image(data_gam$mu, axes = FALSE, main = expression(mu[Gam]))
image(mu_hat_gam, axes = FALSE, main = expression(hat(mu)[Gam]))
par(oldpar)


sgdGMF documentation built on April 3, 2025, 7:37 p.m.