sgdgmf.cv: Model selection via cross-validation for generalized matrix...
In sgdGMF: Estimation of Generalized Matrix Factorization Models via Stochastic Gradient Descent

sgdgmf.cv

R Documentation

Model selection via cross-validation for generalized matrix factorization models

Description

K-fold cross-validation for generalized matrix factorization (GMF) models.

Usage

sgdgmf.cv(
  Y,
  X = NULL,
  Z = NULL,
  family = gaussian(),
  ncomps = seq(from = 1, to = 10, by = 1),
  weights = NULL,
  offset = NULL,
  method = c("airwls", "newton", "sgd"),
  sampling = c("block", "coord", "rnd-block"),
  penalty = list(),
  control.init = list(),
  control.alg = list(),
  control.cv = list()
)

Arguments

`Y`	matrix of responses (`n \times m`)
`X`	matrix of row fixed effects (`n \times p`)
`Z`	matrix of column fixed effects (`q \times m`)
`family`	a `glm` family (see `family` for more details)
`ncomps`	ranks of the latent matrix factorization used in cross-validation (default 1 to 10)
`weights`	an optional matrix of weights (`n \times m`)
`offset`	an optional matrix of offset values (`n \times m`) specifying a known component to be included in the linear predictor
`method`	estimation method to minimize the negative penalized log-likelihood
`sampling`	sub-sampling strategy to use if `method = "sgd"`
`penalty`	list of penalty parameters (see `set.penalty` for more details)
`control.init`	list of control parameters for the initialization (see `set.control.init` for more details)
`control.alg`	list of control parameters for the optimization (see `set.control.alg` for more details)
`control.cv`	list of control parameters for the cross-validation (see `set.control.cv` for more details)

Details

Cross-validation selects the latent dimension by minimizing the estimated out-of-sample error, which can be measured in terms of averaged deviance, AIC or BIC calculated on fold-specific test sets. Within each fold, the test set is a fixed proportion of the entries of the response matrix, which are held out from the estimation process. To this end, the test set entries are replaced by NA values when training the model. The predicted (i.e. imputed) values at those entries are then used to compute the fold-specific out-of-sample error.
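The hold-out scheme described above can be sketched in base R. This is only an illustration under stated assumptions, not the internal code of sgdGMF: the fold assignment, the helper `pois_dev` and the rank-one stand-in predictor are all hypothetical, and a real analysis would replace the stand-in with a GMF fit on the masked matrix.

```r
# Base-R sketch of entry-wise hold-out cross-validation (illustrative only)
set.seed(1)
n <- 30; m <- 10; nfolds <- 5

# Simulated Poisson responses with a rank-one structure on the log scale
u <- rnorm(n, sd = 0.5); v <- rnorm(m, sd = 0.5)
Y <- matrix(rpois(n * m, lambda = exp(1 + outer(u, v))), n, m)

# Assign every matrix entry to exactly one fold, at random
fold <- matrix(sample(rep_len(seq_len(nfolds), n * m)), n, m)

# Poisson deviance, with the convention y * log(y / mu) = 0 when y = 0
pois_dev <- function(y, mu) {
  2 * sum(ifelse(y > 0, y * log(y / mu), 0) - (y - mu))
}

cv_dev <- numeric(nfolds)
for (k in seq_len(nfolds)) {
  test <- fold == k
  Ytrain <- Y
  Ytrain[test] <- NA  # hide the test entries from the training step

  # Stand-in predictor (assumption): rank-one fit from the observed entries,
  # mu_ij = rowmean_i * colmean_j / grandmean; a GMF fit would go here
  mu_hat <- outer(rowMeans(Ytrain, na.rm = TRUE),
                  colMeans(Ytrain, na.rm = TRUE)) / mean(Ytrain, na.rm = TRUE)
  mu_hat <- pmax(mu_hat, 1e-8)  # keep the predicted means strictly positive

  # Averaged deviance on the held-out entries of this fold
  cv_dev[k] <- pois_dev(Y[test], mu_hat[test]) / sum(test)
}

cv_dev        # fold-specific out-of-sample errors
mean(cv_dev)  # overall cross-validation estimate
```

Masking individual entries, rather than whole rows or columns, keeps every row and column partially observed in each training set, so predictions remain available for all held-out cells.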

Value

If `refit = FALSE` (see `set.control.cv`), the function returns a list containing `control.init`, `control.alg`, `control.cv` and `summary.cv`. The latter is a matrix collecting the cross-validation results for each combination of fold and latent dimension.

If `refit = TRUE` (see `set.control.cv`), the function returns an object of class `sgdgmf`, obtained by refitting the model on the whole data matrix using the latent dimension selected via cross-validation. The returned object also contains the `summary.cv` information along with the other standard output of the `sgdgmf.fit` function.
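As a rough illustration of how a fold-by-dimension table of this kind can be summarized, the sketch below averages an error measure over folds and picks the latent dimension with the smallest average. The table `cv` is synthetic, and its column names (`fold`, `ncomp`, `dev`) are assumptions made for the example; they are not taken from the actual layout of `summary.cv`, which should be checked on a real fitted object.

```r
# Synthetic stand-in for a cross-validation summary: one row per
# (fold, latent dimension) pair. Column names are hypothetical.
set.seed(2)
cv <- expand.grid(fold = 1:5, ncomp = 1:4)
cv$dev <- (cv$ncomp - 3)^2 + runif(nrow(cv), min = 0, max = 0.1)

# Average the out-of-sample error over folds for each latent dimension
avg <- aggregate(dev ~ ncomp, data = cv, FUN = mean)

# Latent dimension with the smallest fold-averaged error
best <- avg$ncomp[which.min(avg$dev)]
best
```

The same aggregation applies to any of the error measures mentioned in Details (deviance, AIC or BIC), assuming the selection rule is the minimizer of the fold-averaged criterion.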

Examples

# Load the sgdGMF package
library(sgdGMF)

# Set the data dimensions
n = 100; m = 20; d = 5

# Generate data using Poisson, Binomial and Gamma models
data_pois = sim.gmf.data(n = n, m = m, ncomp = d, family = poisson())
data_bin = sim.gmf.data(n = n, m = m, ncomp = d, family = binomial())
data_gam = sim.gmf.data(n = n, m = m, ncomp = d, family = Gamma(link = "log"), dispersion = 0.25)

# Set RUN = TRUE to run the example; it may take some time. To speed up
# the computation, CV can be run in parallel by passing
# control.cv = list(parallel = TRUE, nthreads = <number_of_workers>)
# as an argument of sgdgmf.cv()
RUN = FALSE
if (RUN) {
  # Cross-validate the GMF models: ranks 1 to 10 for the Poisson data,
  # a single candidate rank (3) for the Binomial and Gamma data
  gmf_pois = sgdgmf.cv(data_pois$Y, ncomps = 1:10, family = poisson())
  gmf_bin = sgdgmf.cv(data_bin$Y, ncomps = 3, family = binomial())
  gmf_gam = sgdgmf.cv(data_gam$Y, ncomps = 3, family = Gamma(link = "log"))

  # Get the fitted values on the response scale
  mu_hat_pois = fitted(gmf_pois, type = "response")
  mu_hat_bin = fitted(gmf_bin, type = "response")
  mu_hat_gam = fitted(gmf_gam, type = "response")

  # Compare the results
  oldpar = par(no.readonly = TRUE)
  par(mfrow = c(3,3), mar = c(1,1,3,1))
  image(data_pois$Y, axes = FALSE, main = expression(Y[Pois]))
  image(data_pois$mu, axes = FALSE, main = expression(mu[Pois]))
  image(mu_hat_pois, axes = FALSE, main = expression(hat(mu)[Pois]))
  image(data_bin$Y, axes = FALSE, main = expression(Y[Bin]))
  image(data_bin$mu, axes = FALSE, main = expression(mu[Bin]))
  image(mu_hat_bin, axes = FALSE, main = expression(hat(mu)[Bin]))
  image(data_gam$Y, axes = FALSE, main = expression(Y[Gam]))
  image(data_gam$mu, axes = FALSE, main = expression(mu[Gam]))
  image(mu_hat_gam, axes = FALSE, main = expression(hat(mu)[Gam]))
  par(oldpar)
}

sgdGMF documentation built on June 8, 2025, 12:05 p.m.
