R/fit.R

#' @title Factorize a matrix of non-Gaussian observations using GMF
#'
#' @description
#' Fit a generalized matrix factorization (GMF) model for non-Gaussian data using
#' either deterministic or stochastic optimization methods.
#' It is an alternative to PCA when the observed data are binary, counts, or positive
#' scores or, more generally, when the conditional distribution of the observations
#' can be appropriately described using a dispersion exponential family
#' or a quasi-likelihood model.
#' Some examples are the Gaussian, Gamma, Binomial and Poisson probability laws.
#'
#' The dependence among the observations and the variables in the sample can be
#' taken into account through appropriate row- and column-specific regression effects.
#' The residual variability is then modeled through a low-rank matrix factorization.
#'
#' For the estimation, the package implements two deterministic optimization methods
#' (AIRWLS and Newton) and an adaptive stochastic gradient descent (SGD) algorithm
#' with coordinate-wise, block-wise and random block-wise sub-sampling strategies.
#'
#' @param Y matrix of responses (\eqn{n \times m})
#' @param X matrix of row fixed effects (\eqn{n \times p})
#' @param Z matrix of column fixed effects (\eqn{q \times m})
#' @param family a \code{glm} family (see \code{\link{family}} for more details)
#' @param ncomp rank of the latent matrix factorization (default 2)
#' @param weights an optional matrix of weights (\eqn{n \times m})
#' @param offset an optional matrix of offset values (\eqn{n \times m}) specifying a known component to be included in the linear predictor
#' @param method estimation method to minimize the negative penalized log-likelihood
#' @param sampling sub-sampling strategy to use if \code{method = "sgd"}
#' @param penalty list of penalty parameters (see \code{\link{set.penalty}} for more details)
#' @param control.init list of control parameters for the initialization (see \code{\link{set.control.init}} for more details)
#' @param control.alg list of control parameters for the optimization (see \code{\link{set.control.alg}} for more details)
#'
#' @return
#' An \code{sgdgmf} object, namely a list, containing the estimated parameters of the GMF model.
#' In particular, the returned object collects the following information:
#' \itemize{
#'   \item \code{method}: the selected estimation method
#'   \item \code{family}: the model family
#'   \item \code{ncomp}: rank of the latent matrix factorization
#'   \item \code{npar}: number of unknown parameters to be estimated
#'   \item \code{control.init}: list of control parameters used for the initialization
#'   \item \code{control.alg}: list of control parameters used for the optimization
#'   \item \code{control.cv}: list of control parameters used for the cross-validation
#'   \item \code{Y}: response matrix
#'   \item \code{X}: row-specific covariate matrix
#'   \item \code{Z}: column-specific covariate matrix
#'   \item \code{B}: the estimated column-specific coefficient matrix
#'   \item \code{A}: the estimated row-specific coefficient matrix
#'   \item \code{U}: the estimated factor matrix
#'   \item \code{V}: the estimated loading matrix
#'   \item \code{weights}: weighting matrix
#'   \item \code{offset}: offset matrix
#'   \item \code{eta}: the estimated linear predictor
#'   \item \code{mu}: the estimated mean matrix
#'   \item \code{var}: the estimated variance matrix
#'   \item \code{phi}: the estimated dispersion parameter
#'   \item \code{penalty}: the penalty value at the end of the optimization
#'   \item \code{deviance}: the deviance value at the end of the optimization
#'   \item \code{objective}: the penalized objective function at the end of the optimization
#'   \item \code{aic}: Akaike information criterion
#'   \item \code{bic}: Bayesian information criterion
#'   \item \code{names}: list of row and column names for all the output matrices
#'   \item \code{exe.time}: the total execution time in seconds
#'   \item \code{trace}: a trace matrix recording the optimization history
#'   \item \code{summary.cv}: data frame summarizing the cross-validation results (empty if no cross-validation was performed)
#' }
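#'
#' A minimal sketch of how the main components of a fitted model, say \code{gmf},
#' can be accessed:
#' \preformatted{
#'   gmf$U    # estimated latent factors
#'   gmf$V    # estimated loadings
#'   gmf$mu   # estimated mean matrix
#'   gmf$bic  # Bayesian information criterion
#' }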
#'
#' @details
#' \strong{Model specification}
#'
#' The model we consider is defined as follows.
#' Let \eqn{Y = (y_{ij})} be a matrix of observed data of dimension \eqn{n \times m}.
#' We assume for the \eqn{(i,j)}th observation in the matrix a dispersion exponential family law
#' \eqn{(y_{ij} \mid \theta_{ij}) \sim EF(\theta_{ij}, \phi)}, where \eqn{\theta_{ij}} is the
#' natural parameter and \eqn{\phi} is the dispersion parameter.
#' Recall that the conditional probability density function of \eqn{y_{ij}} is given by
#' \deqn{f (y_{ij}; \psi) = \exp \big[ w_{ij} \{y_{ij} \theta_{ij} - b(\theta_{ij})\} / \phi - c(y_{ij}, \phi / w_{ij}) \big],}
#' where \eqn{\psi} is the vector of unknown parameters to be estimated,
#' \eqn{b(\cdot)} is a convex twice differentiable log-partition function,
#' and \eqn{c(\cdot,\cdot)} is the cumulant function of the family.
#'
#' The conditional mean of \eqn{y_{ij}}, say \eqn{\mu_{ij}}, is then modeled as
#' \deqn{g(\mu_{ij}) = \eta_{ij} = x_i^\top \beta_j + \alpha_i^\top z_j + u_i^\top v_j,}
#' where \eqn{g(\cdot)} is a bijective twice differentiable link function, \eqn{\eta_{ij}} is
#' a linear predictor, \eqn{x_i \in \mathbb{R}^p} and \eqn{z_j \in \mathbb{R}^q} are
#' observed covariate vectors, \eqn{\beta_j \in \mathbb{R}^p} and \eqn{\alpha_i \in \mathbb{R}^q}
#' are unknown regression parameters and, finally, \eqn{u_i \in \mathbb{R}^d} and
#' \eqn{v_j \in \mathbb{R}^d} are latent vectors explaining the residual variability
#' not captured by the regression effects.
#' Equivalently, in matrix form, we have
#' \eqn{g(\mu) = \eta = X B^\top + A Z^\top + U V^\top.}
#'
#' The natural parameter \eqn{\theta_{ij}} is linked to the conditional mean of \eqn{y_{ij}}
#' through the equation \eqn{E(y_{ij}) = \mu_{ij} = b'(\theta_{ij})}.
#' Similarly, the variance of \eqn{y_{ij}} is given by
#' \eqn{\text{Var}(y_{ij}) = (\phi / w_{ij}) \,\nu(\mu_{ij}) = (\phi / w_{ij}) \,b''(\theta_{ij})},
#' where \eqn{\nu(\cdot)} is the so-called variance function of the family.
#' Finally, we denote by \eqn{D_\phi(y,\mu)} the deviance function of the family, which
#' is defined as \eqn{D_\phi(y,\mu) = - 2 \log\{ f(y, \psi) / f_0 (y) \}},
#' where \eqn{f_0(y)} is the likelihood of the saturated model.
#'
#' The estimation of the model parameters is performed by minimizing the penalized deviance function
#' \deqn{\displaystyle \ell_\lambda (\psi; y) = \sum_{i = 1}^{n} \sum_{j = 1}^{m} D_\phi(y_{ij}, \mu_{ij}) + \frac{\lambda_{\scriptscriptstyle U}}{2} \| U \|_F^2 + \frac{\lambda_{\scriptscriptstyle V}}{2} \| V \|_F^2,}
#' where \eqn{\lambda_{\scriptscriptstyle U} > 0} and \eqn{\lambda_{\scriptscriptstyle V} > 0} are regularization parameters and \eqn{\|\cdot\|_F} is the Frobenius norm.
#' Additional \eqn{\ell_2} penalization terms can be introduced to regularize \eqn{B} and \eqn{A}.
#' Quasi-likelihood models can be considered as well, where \eqn{D_\phi(y, \mu)} is substituted by
#' \eqn{Q_\phi(y, \mu) = - \log (\phi/w) - (w / \phi) \int_y^\mu \{(y - t) / \nu(t) \} \,dt,}
#' under an appropriate specification of mean, variance and link functions.
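#'
#' As an illustration, a minimal sketch of the penalized objective for a fitted model,
#' say \code{fit}, with hypothetical penalty parameters \code{lambdaU} and \code{lambdaV}:
#' \preformatted{
#'   # Family deviance plus the ridge penalties on U and V
#'   dev = sum(fit$family$dev.resids(fit$Y, fit$mu, fit$weights), na.rm = TRUE)
#'   pen = 0.5 * lambdaU * sum(fit$U^2) + 0.5 * lambdaV * sum(fit$V^2)
#'   obj = dev + pen
#' }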
#'
#' \strong{Identifiability constraints}
#'
#' The GMF model is not identifiable, being invariant with respect to rotation, scaling
#' and sign-flip transformations of \eqn{U} and \eqn{V}. To enforce the uniqueness of the
#' solution, we impose the following identifiability constraints:
#' \itemize{
#'   \item \eqn{\text{Cov}(U) = U^\top (I_n - 1_n 1_n^\top / n) U / n = I_d},
#'   \item \eqn{V} is lower triangular, with positive diagonal entries,
#' }
#' where \eqn{I_n} and \eqn{1_n} are, respectively, the \eqn{n}-dimensional identity
#' matrix and the \eqn{n}-dimensional vector of ones.
#'
#' Alternative identifiability constraints on \eqn{U} and \eqn{V} can be easily obtained
#' by post-processing. For instance, a PCA-like solution, where \eqn{U^\top U} is diagonal
#' and \eqn{V^\top V = I_d}, can be obtained by applying the truncated SVD decomposition
#' \eqn{U V^\top = \tilde{U} \tilde{D} \tilde{V}^\top}, and setting
#' \eqn{U = \tilde{U} \tilde{D}} and \eqn{V = \tilde{V}}.
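#'
#' A minimal sketch of this post-processing, assuming a fitted model \code{fit}
#' with rank \code{d}:
#' \preformatted{
#'   # Rotate to a PCA-like solution: U'U diagonal, V'V = identity
#'   s = svd(tcrossprod(fit$U, fit$V), nu = d, nv = d)
#'   U.pca = sweep(s$u, 2, s$d[1:d], "*")  # left singular vectors times singular values
#'   V.pca = s$v                           # right singular vectors
#' }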
#'
#' \strong{Estimation algorithms}
#'
#' To obtain the penalized maximum likelihood estimate, the package implements
#' the following algorithms (illustrative calls are sketched after this list):
#' \itemize{
#'   \item AIRWLS: alternated iterative re-weighted least squares (\code{method="airwls"});
#'   \item Newton: quasi-Newton algorithm with diagonal Hessian (\code{method="newton"});
#'   \item C-SGD: adaptive stochastic gradient descent with coordinate-wise sub-sampling (\code{method="sgd", sampling="coord"});
#'   \item B-SGD: adaptive stochastic gradient descent with block-wise sub-sampling (\code{method="sgd", sampling="block"});
#'   \item RB-SGD: as B-SGD, but with an alternative rule for randomly scanning the minibatch blocks (\code{method="sgd", sampling="rnd-block"}).
#' }
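#'
#' Illustrative calls selecting each algorithm (assuming a response matrix \code{Y}):
#' \preformatted{
#'   fit = sgdgmf.fit(Y, ncomp = 2, family = poisson(), method = "airwls")
#'   fit = sgdgmf.fit(Y, ncomp = 2, family = poisson(), method = "newton")
#'   fit = sgdgmf.fit(Y, ncomp = 2, family = poisson(), method = "sgd", sampling = "coord")
#'   fit = sgdgmf.fit(Y, ncomp = 2, family = poisson(), method = "sgd", sampling = "block")
#'   fit = sgdgmf.fit(Y, ncomp = 2, family = poisson(), method = "sgd", sampling = "rnd-block")
#' }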
#'
#' \strong{Likelihood families}
#'
#' Currently, all standard \code{glm} families are supported, including \code{neg.bin}
#' and \code{negative.binomial} families from the \code{MASS} package.
#' In such a case, the deviance function we consider takes the form
#' \eqn{D_\phi(y, \mu) = 2 w \big[ y \log(y / \mu) - (y + \phi) \log\{ (y + \phi) / (\mu + \phi) \} \big]}.
#' This corresponds to a Negative Binomial model with variance function \eqn{\nu(\mu) = \mu + \mu^2 / \phi}.
#' Then, for \eqn{\phi \rightarrow \infty}, the Negative Binomial likelihood converges
#' to a Poisson likelihood, which has a linear variance function, \eqn{\nu(\mu) = \mu}.
#' Notice that the over-dispersion parameter, here denoted by \eqn{\phi},
#' is referred to as \eqn{\theta} in the \code{MASS} package.
#' If the Negative Binomial family is selected, a global over-dispersion parameter
#' \eqn{\phi} is estimated from the data using the method of moments.
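#'
#' A minimal sketch of a Negative Binomial fit (the initial \code{theta} only serves
#' as a starting value, since \eqn{\phi} is re-estimated from the data):
#' \preformatted{
#'   fit = sgdgmf.fit(Y, ncomp = 2, family = MASS::negative.binomial(theta = 1))
#'   fit$phi  # estimated over-dispersion parameter
#' }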
#'
#' \strong{Parallelization}
#'
#' Parallel execution is implemented in \code{C++} using \code{OpenMP}. When installing
#' and compiling the \code{sgdGMF} package, the compiler checks whether \code{OpenMP}
#' is installed on the system. If it is not, the package is compiled excluding all
#' the \code{OpenMP} functionalities and no parallel execution is allowed at the
#' \code{C++} level.
#'
#' Notice that \code{OpenMP} is not compatible with \code{R} parallel computing packages,
#' such as \code{parallel} and \code{foreach}. Therefore, when \code{parallel=TRUE},
#' it is not possible to run the \code{sgdgmf.fit} function within \code{R}-level
#' parallel functions, e.g., a \code{foreach} loop.
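#'
#' For instance, parallel execution can be requested through the algorithm controls;
#' a minimal sketch, assuming the \code{parallel} and \code{nthreads} entries accepted
#' by \code{\link{set.control.alg}}:
#' \preformatted{
#'   fit = sgdgmf.fit(Y, ncomp = 2, family = poisson(), method = "airwls",
#'                    control.alg = list(parallel = TRUE, nthreads = 2))
#' }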
#'
#' @references
#' Kidzinski, L., Hui, F.K.C., Warton, D.I. and Hastie, T.J. (2022).
#' \emph{Generalized Matrix Factorization: efficient algorithms for fitting generalized linear latent variable models to large data arrays.}
#' Journal of Machine Learning Research, 23: 1-29.
#'
#' Wang, L. and Carvalho, L. (2023).
#' \emph{Deviance matrix factorization.}
#' Electronic Journal of Statistics, 17(2): 3762-3810.
#'
#' Castiglione, C., Segers, A., Clement, L. and Risso, D. (2024).
#' \emph{Stochastic gradient descent estimation of generalized matrix factorization models with application to single-cell RNA sequencing data.}
#' arXiv preprint: arXiv:2412.20509.
#'
#' @seealso
#' \code{\link{set.control.init}}, \code{\link{set.control.alg}},
#' \code{\link{sgdgmf.init}}, \code{\link{sgdgmf.rank}},
#' \code{\link{refit.sgdgmf}}, \code{\link{coef.sgdgmf}}, \code{\link{resid.sgdgmf}},
#' \code{\link{fitted.sgdgmf}}, \code{\link{predict.sgdgmf}}, \code{\link{plot.sgdgmf}},
#' \code{\link{screeplot.sgdgmf}}, \code{\link{biplot.sgdgmf}}, \code{\link{image.sgdgmf}}
#'
#' @examples
#' # Load the sgdGMF package
#' library(sgdGMF)
#'
#' # Set the data dimensions
#' n = 100; m = 20; d = 5
#'
#' # Generate data using Poisson, Binomial and Gamma models
#' data = sim.gmf.data(n = n, m = m, ncomp = d, family = poisson())
#'
#' # Estimate the GMF parameters assuming 3 latent factors
#' gmf = sgdgmf.fit(data$Y, ncomp = 3, family = poisson(), method = "airwls")
#'
#' # Get the fitted values in the response scale
#' mu_hat = fitted(gmf, type = "response")
#'
#' # Compare the results
#' oldpar = par(no.readonly = TRUE)
#' par(mfrow = c(1,3), mar = c(1,1,3,1))
#' image(data$Y, axes = FALSE, main = expression(Y))
#' image(data$mu, axes = FALSE, main = expression(mu))
#' image(mu_hat, axes = FALSE, main = expression(hat(mu)))
#' par(oldpar)
#'
#' @export sgdgmf.fit
sgdgmf.fit = function (
    Y,
    X = NULL,
    Z = NULL,
    family = gaussian(),
    ncomp = 2,
    weights = NULL,
    offset = NULL,
    method = c("airwls", "newton", "sgd"),
    sampling = c("block", "coord", "rnd-block"),
    penalty = list(),
    control.init = list(),
    control.alg = list()
) {

  # Set the names of the model matrices
  names = list()
  names$Y = list(rows = rownames(Y), cols = colnames(Y))
  names$X = list(rows = rownames(Y), cols = colnames(X))
  names$Z = list(rows = colnames(Y), cols = colnames(Z))
  names$A = list(rows = rownames(Y), cols = colnames(Z))
  names$B = list(rows = colnames(Y), cols = colnames(X))
  names$U = list(rows = rownames(Y), cols = paste0("PC", 1:ncomp))
  names$V = list(rows = colnames(Y), cols = paste0("PC", 1:ncomp))

  # Check and set the model matrices
  Y = set.mat.Y(Y)
  X = set.mat.X(X, nrow(Y), ncol(Y))
  Z = set.mat.Z(Z, nrow(Y), ncol(Y))

  # Check and set weights and offset
  weights = set.mat.weights(weights, nrow(Y), ncol(Y))
  offset = set.mat.offset(offset, nrow(Y), ncol(Y))

  # Set the model dimensions
  n = nrow(Y)
  m = ncol(Y)
  p = ncol(X)
  q = ncol(Z)

  # Check the model family
  family = set.family(family)
  familyname = family$family
  linkname = family$link
  varfname = family$varfun

  # Check the optimization method
  method = match.arg(method)
  sampling = match.arg(sampling)

  # Check the penalty terms
  lambda = do.call("set.penalty", penalty)

  # Check the control parameters for the initialization
  control.init = do.call("set.control.init", control.init)

  # Check the control parameters for the optimization
  control.alg = set.control.alg(method, sampling, control.alg)

  # Initialize the parameters
  time.init = proc.time()
  init = sgdgmf.init(
    Y = Y, X = X, Z = Z, ncomp = ncomp,
    family = family, weights = weights, offset = offset,
    method = control.init$method, type = control.init$type,
    niter = control.init$niter, values = control.init$values,
    verbose = control.init$verbose, parallel = control.init$parallel,
    nthreads = control.init$threads, savedata = FALSE)
  time.init = as.numeric(proc.time() - time.init)[3]

  # Select the correct estimation method
  time.optim = proc.time()

  alg = control.alg
  if (method == "airwls") {
    # AIRWLS algorithm
    fit = cpp.fit.airwls(
      Y = Y, X = X, B = init$B, A = init$A, Z = Z,
      U = init$U, V = init$V, O = offset, W = weights,
      familyname = familyname, linkname = linkname, varfname = varfname,
      ncomp = ncomp, lambda = lambda, maxiter = alg$maxiter, nsteps = alg$nstep,
      stepsize = alg$stepsize, eps = alg$eps, nafill = alg$nafill, tol = alg$tol,
      damping = alg$damping, verbose = alg$verbose, frequency = alg$frequency,
      parallel = alg$parallel, nthreads = alg$nthreads
    )
  }
  if (method == "newton") {
    # Quasi-Newton algorithm
    fit = cpp.fit.newton(
      Y = Y, X = X, B = init$B, A = init$A, Z = Z,
      U = init$U, V = init$V, O = offset, W = weights,
      familyname = familyname, linkname = linkname, varfname = varfname,
      ncomp = ncomp, lambda = lambda, maxiter = alg$maxiter,
      stepsize = alg$stepsize, eps = alg$eps, nafill = alg$nafill,
      tol = alg$tol, damping = alg$damping, verbose = alg$verbose,
      frequency = alg$frequency, parallel = alg$parallel, nthreads = alg$nthreads
    )
  }
  if (method == "sgd" & sampling == "block") {
    # Block-wise adaptive SGD algorithm
    fit = cpp.fit.block.sgd(
      Y = Y, X = X, B = init$B, A = init$A, Z = Z,
      U = init$U, V = init$V, O = offset, W = weights,
      familyname = familyname, linkname = linkname, varfname = varfname,
      ncomp = ncomp, lambda = lambda, maxiter = alg$maxiter, eps = alg$eps,
      nafill = alg$nafill, tol = alg$tol, size1 = alg$size[1], size2 = alg$size[2],
      burn = alg$burn, rate0 = alg$rate0, decay = alg$decay, damping = alg$damping,
      rate1 = alg$rate1, rate2 = alg$rate2, parallel = FALSE, nthreads = 1,
      verbose = alg$verbose, frequency = alg$frequency, progress = alg$progress
    )
  }
  if (method == "sgd" & sampling == "coord") {
    # Coordinate-wise adaptive SGD algorithm
    fit = cpp.fit.coord.sgd(
      Y = Y, X = X, B = init$B, A = init$A, Z = Z,
      U = init$U, V = init$V, O = offset, W = weights,
      familyname = familyname, linkname = linkname, varfname = varfname,
      ncomp = ncomp, lambda = lambda, maxiter = alg$maxiter, eps = alg$eps,
      nafill = alg$nafill, tol = alg$tol, size1 = alg$size[1], size2 = alg$size[2],
      burn = alg$burn, rate0 = alg$rate0, decay = alg$decay, damping = alg$damping,
      rate1 = alg$rate1, rate2 = alg$rate2, parallel = FALSE, nthreads = 1,
      verbose = alg$verbose, frequency = alg$frequency, progress = alg$progress
    )
  }
  if (method == "sgd" & sampling == "rnd-block") {
    # Block-wise adaptive SGD algorithm
    fit = cpp.fit.random.block.sgd(
      Y = Y, X = X, B = init$B, A = init$A, Z = Z,
      U = init$U, V = init$V, O = offset, W = weights,
      familyname = familyname, linkname = linkname, varfname = varfname,
      ncomp = ncomp, lambda = lambda, maxiter = alg$maxiter, eps = alg$eps,
      nafill = alg$nafill, tol = alg$tol, size1 = alg$size[1], size2 = alg$size[2],
      burn = alg$burn, rate0 = alg$rate0, decay = alg$decay, damping = alg$damping,
      rate1 = alg$rate1, rate2 = alg$rate2, parallel = FALSE, nthreads = 1,
      verbose = alg$verbose, frequency = alg$frequency, progress = alg$progress
    )
  }

  time.optim = as.numeric(proc.time() - time.optim)[3]
  time.tot = time.init + time.optim
  exe.time = c(init = time.init, optim = time.optim, tot = time.tot)

  # Model dimensions
  df = p * m + q * n + (n + m) * ncomp
  nm = n * m - sum(is.na(Y))

  # Set the indices of A, B, U and V
  idxA = seq(from = p+1, to = p+q)
  idxB = seq(from = 1, to = p)
  idxU = seq(from = p+q+1, to = p+q+ncomp)
  idxV = seq(from = p+q+1, to = p+q+ncomp)

  # Set the optimization history
  colnames(fit$trace) = c("iter", "dev", "pen", "pdev", "change", "time")

  # Set the overdispersion parameter of a Negative Binomial family
  if (family$family == "negbinom" | substring(family$family, 1, 17) == "Negative Binomial") {
    family = MASS::negative.binomial(theta = fit$phi)
    family = set.family(family)
    family$theta = fit$phi
  }

  # Output list
  out = list()
  out$method = method
  out$sampling = sampling
  out$family = family
  out$ncomp = ncomp
  out$npar = m*p + n*q + (n+m)*ncomp
  out$control.init = control.init
  out$control.alg = control.alg
  out$control.cv = list()
  out$Y = Y
  out$X = X
  out$Z = Z
  out$A = fit$U[, idxA, drop = FALSE]
  out$B = fit$V[, idxB, drop = FALSE]
  out$U = fit$U[, idxU, drop = FALSE]
  out$V = fit$V[, idxV, drop = FALSE]
  out$weights = weights
  out$offset = offset
  out$eta = fit$eta
  out$mu = fit$mu
  out$var = fit$var
  out$phi = fit$phi
  out$penalty = fit$penalty
  out$deviance = fit$deviance
  out$objective = fit$objective
  out$aic = fit$deviance + 2 * df
  out$bic = fit$deviance + df * log(nm)
  out$exe.time = exe.time
  out$trace = as.data.frame(fit$trace)
  out$summary.cv = data.frame()
  out$names = names

  # Normalize the latent factors
  if (alg$normalize) {
    uv = normalize.uv(out$U, out$V, method = "qr")
    out$U = uv$U
    out$V = uv$V
  }

  # Set the S3 class of the output model
  class(out) = "sgdgmf"

  # Return the estimated model
  return (out)
}
