sgdgmf.cv | R Documentation
K-fold cross-validation for generalized matrix factorization (GMF) models.
sgdgmf.cv(
Y,
X = NULL,
Z = NULL,
family = gaussian(),
ncomps = seq(from = 1, to = 10, by = 1),
weights = NULL,
offset = NULL,
method = c("airwls", "newton", "sgd"),
sampling = c("block", "coord", "rnd-block"),
penalty = list(),
control.init = list(),
control.alg = list(),
control.cv = list()
)
Y |
matrix of responses ( |
X |
matrix of row fixed effects ( |
Z |
matrix of column fixed effects ( |
family |
a |
ncomps |
ranks of the latent matrix factorization used in cross-validation (default 1 to 10) |
weights |
an optional matrix of weights ( |
offset |
an optional matrix of offset values ( |
method |
estimation method to minimize the negative penalized log-likelihood |
sampling |
sub-sampling strategy to use if |
penalty |
list of penalty parameters (see |
control.init |
list of control parameters for the initialization (see |
control.alg |
list of control parameters for the optimization (see |
control.cv |
list of control parameters for the cross-validation (see |
Cross-validation is performed by minimizing the estimated out-of-sample error, which
can be measured in terms of averaged deviance, AIC or BIC calculated on fold-specific
test sets. Within each fold, the test set is defined as a fixed proportion of entries
in the response matrix which are held out from the estimation process.
To this end, the test set entries are replaced by NA values when training the
model. The predicted (i.e. imputed) values are then used to compute the fold-specific
out-of-sample error.
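The hold-out scheme described above can be sketched in base R. This is only an illustration, not the package's internal code: the number of folds, the held-out proportion, and the column-mean "model" standing in for a GMF fit are all assumptions made for the example.

```r
# Illustrative sketch of entry-wise hold-out (assumed settings, base R only)
set.seed(1)
n <- 20; m <- 5
Y <- matrix(rpois(n * m, lambda = 3), n, m)

nfolds <- 5     # assumed number of folds
prop   <- 0.3   # assumed proportion of held-out entries per fold

# One logical mask per fold: TRUE marks a test-set entry
masks <- lapply(seq_len(nfolds), function(k) {
  mask <- matrix(FALSE, n, m)
  mask[sample(n * m, size = floor(prop * n * m))] <- TRUE
  mask
})

# Hide the test entries of the first fold with NA before "training"
Y_train <- Y
Y_train[masks[[1]]] <- NA

# Stand-in predictor (NOT a GMF model): column means of the observed entries
mu_hat <- matrix(colMeans(Y_train, na.rm = TRUE), n, m, byrow = TRUE)

# Average Poisson deviance on the held-out entries only
y_test  <- Y[masks[[1]]]
mu_test <- mu_hat[masks[[1]]]
dev <- mean(poisson()$dev.resids(y_test, mu_test, wt = rep(1, length(y_test))))
```

Repeating the last three steps over all masks would give one out-of-sample error per fold.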
If refit = FALSE (see set.control.cv), the function returns a list containing
control.init, control.alg, control.cv and summary.cv. The latter is a matrix
collecting the cross-validation results for each combination of fold and latent
dimension.
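As a rough illustration of how such a fold-by-dimension table can be reduced to a single selected rank, the sketch below averages the error over folds and picks the minimizer. The table here is made up, and its column names (ncomp, fold, dev) are hypothetical; the actual layout of summary.cv may differ.

```r
# Hypothetical CV table: one row per (latent dimension, fold) pair
cv <- data.frame(
  ncomp = rep(1:3, each = 2),
  fold  = rep(1:2, times = 3),
  dev   = c(2.0, 2.2, 1.4, 1.6, 1.5, 1.9)  # made-up out-of-sample deviances
)

# Average the error over folds for each candidate dimension
avg <- aggregate(dev ~ ncomp, data = cv, FUN = mean)

# Select the dimension with the smallest average error
best <- avg$ncomp[which.min(avg$dev)]
```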
If refit = TRUE (see set.control.cv), the function returns an object of class sgdgmf,
obtained by refitting the model on the whole data matrix using the latent dimension
selected via cross-validation. The returned object also contains the summary.cv
information along with the other standard output of the sgdgmf.fit function.
# Load the sgdGMF package
library(sgdGMF)
# Set the data dimensions
n = 100; m = 20; d = 5
# Generate data using Poisson, Binomial and Gamma models
data_pois = sim.gmf.data(n = n, m = m, ncomp = d, family = poisson())
data_bin = sim.gmf.data(n = n, m = m, ncomp = d, family = binomial())
data_gam = sim.gmf.data(n = n, m = m, ncomp = d, family = Gamma(link = "log"), dispersion = 0.25)
# Set RUN = TRUE to run the example; it may take some time. To speed up
# the computation, CV can be run in parallel by passing
# control.cv = list(parallel = TRUE, nthreads = <number_of_workers>)
# as an argument of sgdgmf.cv()
RUN = FALSE
if (RUN) {
  # Select the latent dimension by cross-validation: ranks 1 to 10 for the
  # Poisson data, a single candidate rank (3) for the Binomial and Gamma data
  gmf_pois = sgdgmf.cv(data_pois$Y, ncomps = 1:10, family = poisson())
  gmf_bin = sgdgmf.cv(data_bin$Y, ncomps = 3, family = binomial())
  gmf_gam = sgdgmf.cv(data_gam$Y, ncomps = 3, family = Gamma(link = "log"))
  # Get the fitted values in the response scale
  mu_hat_pois = fitted(gmf_pois, type = "response")
  mu_hat_bin = fitted(gmf_bin, type = "response")
  mu_hat_gam = fitted(gmf_gam, type = "response")
  # Compare the results on a 3 x 3 grid (one row per model)
  oldpar = par(no.readonly = TRUE)
  par(mfrow = c(3,3), mar = c(1,1,3,1))
  image(data_pois$Y, axes = FALSE, main = expression(Y[Pois]))
  image(data_pois$mu, axes = FALSE, main = expression(mu[Pois]))
  image(mu_hat_pois, axes = FALSE, main = expression(hat(mu)[Pois]))
  image(data_bin$Y, axes = FALSE, main = expression(Y[Bin]))
  image(data_bin$mu, axes = FALSE, main = expression(mu[Bin]))
  image(mu_hat_bin, axes = FALSE, main = expression(hat(mu)[Bin]))
  image(data_gam$Y, axes = FALSE, main = expression(Y[Gam]))
  image(data_gam$mu, axes = FALSE, main = expression(mu[Gam]))
  image(mu_hat_gam, axes = FALSE, main = expression(hat(mu)[Gam]))
  par(oldpar)
}
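The two return modes (refit = FALSE and refit = TRUE) and the parallel option mentioned in the comments could be exercised along the following lines. This is a hedged sketch: it uses only the control.cv entries named on this page (refit, parallel, nthreads), assumes the package is installed, and RUN_SKETCH is a hypothetical guard introduced here so the block does nothing unless switched on.

```r
# Control lists built from the options named on this page
ctr_tab <- list(refit = FALSE)                                # CV table only
ctr_par <- list(refit = TRUE, parallel = TRUE, nthreads = 2)  # refit, 2 workers

RUN_SKETCH <- FALSE  # hypothetical guard; set to TRUE to run (may be slow)
if (RUN_SKETCH && requireNamespace("sgdGMF", quietly = TRUE)) {
  library(sgdGMF)
  dat <- sim.gmf.data(n = 100, m = 20, ncomp = 5, family = poisson())

  # refit = FALSE: a list whose summary.cv element holds the CV results
  cv_tab <- sgdgmf.cv(dat$Y, ncomps = 1:5, family = poisson(),
                      control.cv = ctr_tab)
  str(cv_tab$summary.cv)

  # refit = TRUE: an object of class sgdgmf refitted at the selected rank
  fit <- sgdgmf.cv(dat$Y, ncomps = 1:5, family = poisson(),
                   control.cv = ctr_par)
  print(class(fit))
}
```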