sgdgmf.init | R Documentation |
Provide four initialization methods to set the initial values of a generalized matrix factorization (GMF) model identified by a glm family and a linear predictor of the form g(\mu) = \eta = X B^\top + A Z^\top + U V^\top, with bijective link function g(\cdot).
See sgdgmf.fit for more details on the model specification.
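As a minimal sketch of the linear predictor above, the following base R code builds \eta from randomly generated matrices and maps it to the mean scale with a Poisson log link. The dimensions (n, m, p, q, d) and the random inputs are assumptions chosen for illustration. The shapes are those implied by the conformability of the formula and are not taken from the package.

```r
set.seed(1)
n = 6; m = 4   # rows and columns of the response matrix
p = 2; q = 1   # numbers of row- and column-specific covariates (illustrative)
d = 2          # rank of the latent factorization (ncomp)

X = matrix(rnorm(n * p), n, p)  # row-specific covariates
Z = matrix(rnorm(m * q), m, q)  # column-specific covariates
B = matrix(rnorm(m * p), m, p)  # coefficients of X, one row per column of Y
A = matrix(rnorm(n * q), n, q)  # coefficients of Z, one row per row of Y
U = matrix(rnorm(n * d), n, d)  # latent factors
V = matrix(rnorm(m * d), m, d)  # loadings

# Linear predictor: eta = X B' + A Z' + U V', an n x m matrix
eta = tcrossprod(X, B) + tcrossprod(A, Z) + tcrossprod(U, V)

# Invert the link to obtain the mean matrix
fam = poisson()
mu = fam$linkinv(eta)
```

Each of the three terms is an n x m matrix, so \eta has the same shape as the response matrix Y.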
sgdgmf.init(
Y,
X = NULL,
Z = NULL,
ncomp = 2,
family = gaussian(),
weights = NULL,
offset = NULL,
method = c("ols", "glm", "random", "values"),
type = c("deviance", "pearson", "working", "link"),
niter = 0,
values = list(),
verbose = FALSE,
parallel = FALSE,
nthreads = 1,
savedata = TRUE
)
sgdgmf.init.ols(
Y,
X = NULL,
Z = NULL,
ncomp = 2,
family = gaussian(),
weights = NULL,
offset = NULL,
type = c("deviance", "pearson", "working", "link"),
verbose = FALSE
)
sgdgmf.init.glm(
Y,
X = NULL,
Z = NULL,
ncomp = 2,
family = gaussian(),
weights = NULL,
offset = NULL,
type = c("deviance", "pearson", "working", "link"),
verbose = FALSE,
parallel = FALSE,
nthreads = 1
)
sgdgmf.init.random(
Y,
X = NULL,
Z = NULL,
ncomp = 2,
family = gaussian(),
weights = NULL,
offset = NULL,
sigma = 1
)
sgdgmf.init.custom(
Y,
X = NULL,
Z = NULL,
ncomp = 2,
family = gaussian(),
values = list(),
verbose = FALSE
)
Y |
matrix of responses ( |
X |
matrix of row-specific fixed effects ( |
Z |
matrix of column-specific fixed effects ( |
ncomp |
rank of the latent matrix factorization |
family |
a model family, as in the |
weights |
matrix of constant weights ( |
offset |
matrix of constant offset ( |
method |
optimization method to be used for the initial fit |
type |
type of residuals to be used for initializing |
niter |
number of iterations to refine the initial estimate (only if |
values |
a list of custom initial values for |
verbose |
if |
parallel |
if |
nthreads |
number of cores to be used in parallel (only if |
savedata |
if |
If method = "ols", the initialization is performed by fitting a sequence of linear regressions followed by an SVD of the residuals.
To account for the non-Gaussian distribution of the data, regression and decomposition are applied to the transformed response matrix Y_h = (g \circ h)(Y), where h(\cdot) is a function that prevents Y_h from taking infinite values.
For instance, in the Binomial case h(y) = 2 (1-\epsilon) y + \epsilon, while in the Poisson case h(y) = y + \epsilon, where \epsilon is a small positive constant, typically 0.1 or 0.01.
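The OLS strategy can be sketched in base R for the Poisson case. This is an illustration under stated assumptions, not the package's implementation: the data are simulated, only row-specific covariates X are used, the residuals are taken directly on the transformed scale, and the singular values are absorbed into U, which is one of several possible conventions.

```r
set.seed(1)
n = 50; m = 8; d = 2
Y = matrix(rpois(n * m, lambda = 3), n, m)  # simulated counts (illustrative)
X = cbind(1, rnorm(n))                      # intercept plus one covariate

# Transformed response Y_h = g(h(Y)) with h(y) = y + eps and g = log
eps = 0.1
fam = poisson()
Yh = fam$linkfun(Y + eps)                   # finite even where Y = 0

# Column-wise least squares of Y_h on X; B has one row per column of Y
B = t(qr.coef(qr(X), Yh))
R = Yh - tcrossprod(X, B)                   # residuals on the transformed scale

# Rank-d truncated SVD of the residual matrix
s = svd(R, nu = d, nv = d)
U = s$u %*% diag(s$d[1:d], d, d)            # factors, scaled by the singular values
V = s$v                                     # loadings

# Initial linear predictor
eta = tcrossprod(X, B) + tcrossprod(U, V)
```

Without the shift eps, log(0) would be -Inf for every zero count and the regressions could not be computed.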
If method = "glm", the initialization is performed by fitting a sequence of generalized linear models followed by an SVD of the residuals.
In particular, to set \beta_j, we use independent GLM fits with y_j \sim X \beta_j.
Similarly, to set \alpha_i, we fit the model y_i \sim Z \alpha_i + o_i, with offset o_i = B x_i.
Then, we obtain U via SVD on the residuals. Finally, we obtain V via independent GLM fits under the model y_j \sim U v_j + o_j, with offset o_j = X \beta_j + A z_j.
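The GLM strategy can be sketched with stats::glm.fit on simulated Poisson data. This is a simplified illustration, not the package's code: there are no column-specific covariates Z, so the \alpha_i step is skipped and the offset reduces to X \beta_j, and deviance residuals are assumed for the SVD step.

```r
set.seed(1)
n = 60; m = 5; d = 2
X = cbind(1, rnorm(n))                      # intercept plus one covariate
Y = matrix(rpois(n * m, lambda = exp(0.5 + 0.3 * X[, 2])), n, m)
fam = poisson()

# Step 1: beta_j from independent GLM fits y_j ~ X beta_j
B = t(sapply(seq_len(m), function(j) {
  glm.fit(X, Y[, j], family = fam)$coefficients
}))
eta0 = tcrossprod(X, B)

# Step 2: U from a rank-d SVD of the deviance residuals
mu0 = fam$linkinv(eta0)
R = sign(Y - mu0) * sqrt(pmax(fam$dev.resids(Y, mu0, 1), 0))
U = svd(R, nu = d, nv = d)$u

# Step 3: v_j from independent GLM fits y_j ~ U v_j with offset o_j = X beta_j
V = t(sapply(seq_len(m), function(j) {
  glm.fit(U, Y[, j], offset = eta0[, j], family = fam)$coefficients
}))

# Initial linear predictor
eta = eta0 + tcrossprod(U, V)
```

Each of the m column-wise fits is independent of the others, which is what makes this step a natural candidate for the parallel and nthreads arguments.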
Under both method = "ols" and method = "glm", the parameter type selects the type of residuals used for the SVD.
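To illustrate how the residual types differ, the sketch below computes deviance, Pearson and working residuals from a base R family object, using the standard GLM definitions (the same ones used by stats::residuals.glm). Whether the package computes them in exactly this way is an assumption, and the "link" type is left out because its definition is not given here. The values of y and eta are made up for the example.

```r
fam = poisson()
y = c(0, 1, 4, 2, 7)                  # illustrative responses
eta = c(-0.5, 0.2, 1.1, 0.9, 1.6)     # illustrative linear predictor
mu = fam$linkinv(eta)
wt = rep(1, length(y))                # unit weights

# Deviance residuals: signed square root of the unit deviances
res_deviance = sign(y - mu) * sqrt(pmax(fam$dev.resids(y, mu, wt), 0))

# Pearson residuals: raw residuals scaled by the standard deviation
res_pearson = (y - mu) / sqrt(fam$variance(mu))

# Working residuals: raw residuals mapped to the link scale
res_working = (y - mu) / fam$mu.eta(eta)
```

All three share the sign of y - mu but weight large and small means differently, so the leading singular vectors of the residual matrix can change with the chosen type.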
If method = "random", the initialization is performed using independent Gaussian random values for all the parameters in the model.
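A random initialization of this kind might look as follows in base R. The dimensions, the helper rand_mat and the use of sigma as a common standard deviation for every parameter block are assumptions made for illustration; the package may scale the draws differently.

```r
set.seed(1)
n = 100; m = 20   # dimensions of the response matrix
p = 1; q = 1      # numbers of row- and column-specific covariates (illustrative)
d = 3             # rank of the latent factorization (ncomp)
sigma = 1         # standard deviation of the Gaussian draws

# Hypothetical helper: an nr x nc matrix of independent N(0, sigma^2) values
rand_mat = function(nr, nc) matrix(rnorm(nr * nc, mean = 0, sd = sigma), nr, nc)

init = list(
  B = rand_mat(m, p),  # coefficients of X
  A = rand_mat(n, q),  # coefficients of Z
  U = rand_mat(n, d),  # latent factors
  V = rand_mat(m, d)   # loadings
)
```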
If method = "values", the initialization is performed using user-specified values provided as an input, which must have compatible dimensions.
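What "compatible dimensions" means can be sketched with a small check. The function check_init_values is hypothetical and not part of the package, and the list names B, A, U and V are assumed to mirror the components of the returned object. The expected shapes follow from the linear predictor X B^\top + A Z^\top + U V^\top.

```r
# Hypothetical helper: verify that each matrix in `values` has the shape
# implied by an n x m response, p and q covariates, and rank d
check_init_values = function(values, n, m, p, q, d) {
  expected = list(B = c(m, p), A = c(n, q), U = c(n, d), V = c(m, d))
  for (name in names(expected)) {
    x = values[[name]]
    if (!is.matrix(x) || !all(dim(x) == expected[[name]])) {
      stop(sprintf("'%s' must be a %d x %d matrix",
                   name, expected[[name]][1], expected[[name]][2]))
    }
  }
  invisible(TRUE)
}

# Usage with illustrative dimensions
n = 10; m = 4; p = 1; q = 1; d = 2
values = list(
  B = matrix(0, m, p), A = matrix(0, n, q),
  U = matrix(0.1, n, d), V = matrix(0.1, m, d)
)
ok = check_init_values(values, n, m, p, q, d)
```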
An initgmf object, namely a list, containing the initial estimates of the GMF parameters.
In particular, the returned object collects the following information:
Y: response matrix (only if savedata=TRUE)
X: row-specific covariate matrix (only if savedata=TRUE)
Z: column-specific covariate matrix (only if savedata=TRUE)
B: the estimated column-specific coefficient matrix
A: the estimated row-specific coefficient matrix
U: the estimated factor matrix
V: the estimated loading matrix
phi: the estimated dispersion parameter
method: the selected estimation method
family: the model family
ncomp: rank of the latent matrix factorization
type: type of residuals used for the initialization of U
verbose: if TRUE, print the status of the initialization process
parallel: if TRUE, allows for parallel computing
nthreads: number of cores to be used in parallel
savedata: if TRUE, stores a copy of the input data
library(sgdGMF)
# Set the data dimensions
n = 100; m = 20; d = 5
# Generate data using Poisson, Binomial and Gamma models
data_pois = sim.gmf.data(n = n, m = m, ncomp = d, family = poisson())
data_bin = sim.gmf.data(n = n, m = m, ncomp = d, family = binomial())
data_gam = sim.gmf.data(n = n, m = m, ncomp = d, family = Gamma(link = "log"), dispersion = 0.25)
# Initialize the GMF parameters assuming 3 latent factors
init_pois = sgdgmf.init(data_pois$Y, ncomp = 3, family = poisson(), method = "ols")
init_bin = sgdgmf.init(data_bin$Y, ncomp = 3, family = binomial(), method = "ols")
init_gam = sgdgmf.init(data_gam$Y, ncomp = 3, family = Gamma(link = "log"), method = "ols")
# Get the fitted values on the response scale
mu_hat_pois = fitted(init_pois, type = "response")
mu_hat_bin = fitted(init_bin, type = "response")
mu_hat_gam = fitted(init_gam, type = "response")
# Compare the results
oldpar = par(no.readonly = TRUE)
par(mfrow = c(3,3), mar = c(1,1,3,1))
image(data_pois$Y, axes = FALSE, main = expression(Y[Pois]))
image(data_pois$mu, axes = FALSE, main = expression(mu[Pois]))
image(mu_hat_pois, axes = FALSE, main = expression(hat(mu)[Pois]))
image(data_bin$Y, axes = FALSE, main = expression(Y[Bin]))
image(data_bin$mu, axes = FALSE, main = expression(mu[Bin]))
image(mu_hat_bin, axes = FALSE, main = expression(hat(mu)[Bin]))
image(data_gam$Y, axes = FALSE, main = expression(Y[Gam]))
image(data_gam$mu, axes = FALSE, main = expression(mu[Gam]))
image(mu_hat_gam, axes = FALSE, main = expression(hat(mu)[Gam]))
par(oldpar)