sgdgmf.rank: Rank selection via eigenvalue-gap methods

sgdgmf.rank    R Documentation

Rank selection via eigenvalue-gap methods

Description

Select the number of significant principal components of a generalized matrix factorization (GMF) model using eigenvalue-gap methods.
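To show what an eigenvalue-gap rule does, here is a minimal base-R sketch of the edge-distribution (ED) estimator of Onatski (2010), which is the reference behind method = "onatski". The helper onatski_sketch, its arguments and its stopping rule are hypothetical. It is not the sgdGMF implementation, and it is only an assumption that maxiter plays a similar role. The rule calibrates a gap threshold delta from the slope of the bulk eigenvalues against (j - 1)^(2/3). It then keeps the largest index whose gap to the next eigenvalue exceeds delta.

```r
# Hypothetical helper: sketch of Onatski's (2010) edge-distribution estimator.
# `lambdas` must be sorted in decreasing order; `rmax` is the largest rank
# considered, and the calibration regression needs rmax + 5 <= length(lambdas).
onatski_sketch <- function(lambdas, rmax, max_iter = 10) {
  stopifnot(!is.unsorted(rev(lambdas)), rmax + 5 <= length(lambdas))
  j <- rmax + 1
  rhat <- 0
  for (iter in seq_len(max_iter)) {
    # Regress lambda_j, ..., lambda_{j+4} on (j-1)^(2/3), ..., (j+3)^(2/3)
    y <- lambdas[j:(j + 4)]
    x <- ((j - 1):(j + 3))^(2 / 3)
    delta <- 2 * abs(coef(lm(y ~ x))[2])
    # Largest index i <= rmax whose gap to the next eigenvalue is >= delta
    gaps <- lambdas[1:rmax] - lambdas[2:(rmax + 1)]
    above <- which(gaps >= delta)
    rhat_new <- if (length(above) > 0) max(above) else 0
    # Stop once the estimate no longer changes between iterations
    if (rhat_new == rhat && iter > 1) break
    rhat <- rhat_new
    j <- rhat + 1
  }
  rhat
}

# Three spikes on top of a slowly decaying bulk
lambdas <- c(100, 80, 60, 10 - 0.1 * (0:26))
onatski_sketch(lambdas, rmax = 8)
```

The threshold is estimated from the eigenvalues themselves rather than fixed in advance. The same helper therefore returns 0 on a pure-bulk spectrum such as 10 - 0.1 * (0:29).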

Usage

sgdgmf.rank(
  Y,
  X = NULL,
  Z = NULL,
  maxcomp = ncol(Y),
  family = gaussian(),
  weights = NULL,
  offset = NULL,
  method = c("onatski", "act", "oht"),
  type.reg = c("ols", "glm"),
  type.res = c("deviance", "pearson", "working", "link"),
  normalize = FALSE,
  maxiter = 10,
  parallel = FALSE,
  nthreads = 1,
  return.eta = FALSE,
  return.mu = FALSE,
  return.res = FALSE,
  return.cov = FALSE
)

Arguments

Y

matrix of responses (n × m)

X

matrix of row-specific fixed effects (n × p)

Z

matrix of column-specific fixed effects (q × m)

maxcomp

maximum number of eigenvalues to compute

family

a family as in the glm interface (default gaussian())

weights

matrix of optional weights (n × m)

offset

matrix of optional offsets (n × m)

method

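rank selection method: "onatski" (Onatski, 2010), "act" (adjusted correlation thresholding; Fan et al., 2020) or "oht" (optimal hard thresholding; Gavish and Donoho, 2014)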

type.reg

regression method to be used to profile out the covariate effects

type.res

residual type to be decomposed

normalize

if TRUE, standardize the residual matrix column by column

maxiter

maximum number of iterations

parallel

if TRUE, allow parallel computing using the foreach package

nthreads

number of cores to be used in parallel (only if parallel=TRUE)

return.eta

if TRUE, return the linear predictor matrix

return.mu

if TRUE, return the matrix of fitted values

return.res

if TRUE, return the residual matrix

return.cov

if TRUE, return the covariance matrix of the residuals
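The arguments type.reg, type.res, normalize and return.cov describe a pre-processing pipeline. The covariate effects are profiled out, residuals are formed and optionally standardized, and their covariance is decomposed. The base-R sketch below mimics the type.reg = "ols", type.res = "pearson" path under several assumptions:

- The helper residual_eigen_sketch is hypothetical.
- It defaults to poisson(), whereas sgdgmf.rank defaults to gaussian().
- The link-scale pseudo-response linkfun(Y + 0.5) is an illustrative choice valid for non-negative counts, not binary data.
- Only row covariates X are handled.
- The covariance is the uncentered cross-product divided by n.

It is not how sgdGMF computes these quantities internally.

```r
# Hypothetical sketch of the pre-processing behind rank selection (not sgdGMF's
# code): profile out row covariates by OLS on the link scale, compute Pearson
# residuals, optionally standardize columns, and return the eigenvalues of the
# implied residual covariance matrix.
residual_eigen_sketch <- function(Y, X, family = poisson(), normalize = FALSE) {
  # Link-scale pseudo-response; the +0.5 shift (an assumption) avoids log(0)
  G <- family$linkfun(Y + 0.5)
  # OLS fit of every column of G on X at once (analogue of type.reg = "ols")
  B <- qr.coef(qr(X), G)
  eta <- X %*% B
  mu <- family$linkinv(eta)
  # Pearson residuals (analogue of type.res = "pearson")
  res <- (Y - mu) / sqrt(family$variance(mu))
  # Column-by-column standardization (analogue of normalize = TRUE)
  if (normalize) res <- scale(res)
  covmat <- crossprod(res) / nrow(res)
  list(eta = eta, mu = mu, res = res, covmat = covmat,
       lambdas = eigen(covmat, symmetric = TRUE, only.values = TRUE)$values)
}

set.seed(1)
n <- 50; m <- 8
X <- cbind(1, rnorm(n))                      # intercept plus one covariate
Y <- matrix(rpois(n * m, lambda = 3), n, m)  # count responses
out <- residual_eigen_sketch(Y, X, normalize = TRUE)
out$lambdas
```

With normalize = TRUE each column of res has unit sample variance, so every diagonal entry of covmat equals (n - 1)/n. The eigenvalues then sum to m(n - 1)/n.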

Value

A list containing the method, the selected latent rank ncomp, and the eigenvalues lambdas used to select the latent rank. If requested, the output list also contains the linear predictor eta, the predicted mean matrix mu, the residual matrix res, and the implied residual covariance matrix covmat.
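This relationship assumes covmat is the uncentered cross-product t(res) %*% res / n; the package may center or scale differently. Under that assumption, its eigenvalues are the squared singular values of res divided by n. A scree plot of lambdas is then equivalent to a plot of the residual matrix's singular spectrum. A quick base-R check on a random stand-in matrix:

```r
set.seed(42)
n <- 30; m <- 6
res <- matrix(rnorm(n * m), n, m)   # stand-in for a residual matrix
covmat <- crossprod(res) / n        # assumed form of the implied covariance
lambdas <- eigen(covmat, symmetric = TRUE, only.values = TRUE)$values
d <- svd(res)$d                     # singular values, in decreasing order
all.equal(lambdas, d^2 / n)
```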

References

Onatski, A. (2010). Determining the number of factors from empirical distribution of eigenvalues. Review of Economics and Statistics, 92(4): 1004–1016.

Gavish, M. and Donoho, D.L. (2014). The optimal hard threshold for singular values is 4/√3. IEEE Transactions on Information Theory, 60(8): 5040–5053.

Fan, J., Guo, J. and Zheng, S. (2020). Estimating number of factors by adjusted eigenvalues thresholding. Journal of the American Statistical Association, 117(538): 852–861.

Wang, L. and Carvalho, L. (2023). Deviance matrix factorization. Electronic Journal of Statistics, 17(2): 3762–3810.
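The constant 4/√3 in the Gavish and Donoho title is the square-matrix case of their optimal hard threshold. Consider an m × n matrix (m ≤ n, β = m/n) observed with i.i.d. Gaussian noise of known level σ. The threshold on singular values is then τ = λ*(β) √n σ, where λ*(β) = sqrt(2(β + 1) + 8β / ((β + 1) + sqrt(β² + 14β + 1))). The base-R sketch below covers only this known-σ case. The helpers gd_lambda_star and oht_rank_sketch are hypothetical. method = "oht" in sgdGMF may instead estimate the noise level, which this sketch does not attempt.

```r
# Optimal hard-threshold coefficient lambda*(beta) of Gavish and Donoho (2014)
# for an m x n matrix with m <= n and beta = m / n (known noise level).
gd_lambda_star <- function(beta) {
  sqrt(2 * (beta + 1) + 8 * beta / ((beta + 1) + sqrt(beta^2 + 14 * beta + 1)))
}

# Hypothetical helper: count singular values above lambda*(beta) * sqrt(n) * sigma
oht_rank_sketch <- function(A, sigma) {
  dims <- sort(dim(A))  # dims[1] = m (short side), dims[2] = n (long side)
  beta <- dims[1] / dims[2]
  tau <- gd_lambda_star(beta) * sqrt(dims[2]) * sigma
  sum(svd(A, nu = 0, nv = 0)$d > tau)
}

# Rank-2 signal plus standard Gaussian noise (sigma = 1)
set.seed(7)
U <- qr.Q(qr(matrix(rnorm(100 * 2), 100, 2)))
V <- qr.Q(qr(matrix(rnorm(50 * 2), 50, 2)))
A <- U %*% diag(c(200, 150)) %*% t(V) + matrix(rnorm(100 * 50), 100, 50)
oht_rank_sketch(A, sigma = 1)  # expected: 2
```

At β = 1 the coefficient reduces to sqrt(16/3) = 4/√3 ≈ 2.309, matching the title of the reference.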

Examples

library(sgdGMF)

# Set the data dimensions
n = 100; m = 20; d = 5

# Generate data using Poisson, Binomial and Gamma models
data_pois = sim.gmf.data(n = n, m = m, ncomp = d, family = poisson())
data_bin = sim.gmf.data(n = n, m = m, ncomp = d, family = binomial())
data_gam = sim.gmf.data(n = n, m = m, ncomp = d, family = Gamma(link = "log"), dispersion = 0.25)

# Select the number of latent factors for each dataset (the true rank is d = 5)
ncomp_pois = sgdgmf.rank(data_pois$Y, family = poisson(), normalize = TRUE)
ncomp_bin = sgdgmf.rank(data_bin$Y, family = binomial(), normalize = TRUE)
ncomp_gam = sgdgmf.rank(data_gam$Y, family = Gamma(link = "log"), normalize = TRUE)

# Get the selected number of components
print(paste("Poisson:", ncomp_pois$ncomp))
print(paste("Binomial:", ncomp_bin$ncomp))
print(paste("Gamma:", ncomp_gam$ncomp))

# Plot the scree plots used to determine the number of components
oldpar = par(no.readonly = TRUE)
par(mfrow = c(3,1))
barplot(ncomp_pois$lambdas, main = "Poisson screeplot")
barplot(ncomp_bin$lambdas, main = "Binomial screeplot")
barplot(ncomp_gam$lambdas, main = "Gamma screeplot")
par(oldpar)
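The scree plots above can also be read numerically. Below is a naive largest-gap rule that picks the index k where lambda_k - lambda_(k+1) is largest. It is not one of the methods offered by sgdgmf.rank. The helper largest_gap_rank is hypothetical and assumes lambdas is sorted in decreasing order.

```r
# Hypothetical helper: naive largest-gap rule on a decreasing eigenvalue vector.
# Returns the k in 1..kmax maximizing lambda_k - lambda_(k+1); ties go to the
# smallest k because which.max returns the first maximum.
largest_gap_rank <- function(lambdas, kmax = length(lambdas) - 1) {
  gaps <- -diff(lambdas[1:(kmax + 1)])
  which.max(gaps)
}

largest_gap_rank(c(9, 8, 7, 1, 0.9, 0.8))
```

With the example objects above, largest_gap_rank(ncomp_pois$lambdas) could be compared with ncomp_pois$ncomp. A raw largest-gap rule can select k = 1 whenever the leading eigenvalue dominates. This is one motivation for calibrated thresholds such as Onatski's.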


sgdGMF documentation built on April 3, 2025, 7:37 p.m.