generalizedMF: Exponential Family Matrix Factorization

Description

Fits the exponential family PCA of Collins et al. (2001), a matrix factorization for binary, proportion, count, or continuous data.

Usage

generalizedMF(x, k = 2, family = c("gaussian", "binomial", "poisson"),
  weights, quiet = TRUE, max_iters = 1000, conv_criteria = 1e-05,
  partial_decomp = FALSE, random_start = FALSE, start_A, start_B, mu,
  main_effects = TRUE, method = c("als", "svd"))

Arguments

x

matrix of either binary, proportions, count, or continuous data

k

number of dimensions of the factorization, i.e. the number of columns of A and B

family

exponential family distribution of data

weights

an optional matrix of data weights, the same size as x

quiet

logical; if FALSE, the function gives feedback while it runs

max_iters

maximum number of iterations

conv_criteria

convergence criteria

partial_decomp

logical; if TRUE, the function uses the RSpectra package to more quickly calculate the SVD. When the number of columns is small, the approximation may be less accurate and slower

random_start

whether to randomly initialize A and B

start_A

initial value for A

start_B

initial value for B

mu

specific value for mu, the mean vector of x

main_effects

logical; whether to include main effects in the model

method

which algorithm to use. "als" uses alternating least squares. It has the benefit of majorizing row-wise and column-wise for each of the updates. "svd" uses singular value decomposition (similar to de Leeuw, 2006). It uses a more general majorization, which may not work well for heterogeneous matrices.
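
To make the idea behind method = "als" concrete, here is a minimal sketch of alternating least squares in the simplest setting: squared-error (Gaussian) loss, no main effects, no weights. The helper als_sketch is hypothetical and is not the package's implementation; for the binomial and poisson families the package applies a majorization step that this sketch leaves out.

```r
set.seed(1)
n <- 100; d <- 10
x <- matrix(rnorm(n * d), n, d)

# Hypothetical helper (not part of generalizedPCA): alternating least squares
# for x ~ A %*% t(B) under squared-error loss.
als_sketch <- function(x, k, iters = 50) {
  B <- matrix(rnorm(ncol(x) * k), ncol(x), k)  # random start for the loadings
  loss <- numeric(iters)
  for (i in seq_len(iters)) {
    # Scores update, loadings fixed: each row of x is regressed on B
    A <- x %*% B %*% solve(crossprod(B))
    # Loadings update, scores fixed: each column of x is regressed on A
    B <- crossprod(x, A) %*% solve(crossprod(A))
    # Mean squared error after both updates
    loss[i] <- mean((x - tcrossprod(A, B))^2)
  }
  list(A = A, B = B, loss_trace = loss)
}

fit <- als_sketch(x, k = 2)
```

Each update solves a least-squares problem exactly while the other factor is held fixed, so the recorded loss cannot increase from one iteration to the next.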

Value

An S3 object of class gmf which is a list with the following components:

mu

the main effects for dimensionality reduction

A

the n x k matrix with the scores

B

the d x k matrix with the loadings

family

the exponential family of the data

iters

number of iterations required for convergence

loss_trace

the average deviance at each iteration of the algorithm. It should be non-increasing

prop_deviance_expl

the proportion of deviance explained by this model. If main_effects = TRUE, the null model is just the main effects, otherwise the null model estimates 0 for all natural parameters.
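
As an illustration of how the components above fit together, the following sketch rebuilds a matrix of natural parameters from mu, A, and B and computes a proportion of deviance explained for Poisson data. It assumes the model form theta = 1 mu^T + A B^T; all three helper functions are hypothetical names written for this example, and the package's own calculation may differ in detail.

```r
# Poisson deviance of counts x against fitted means m (0 * log(0) taken as 0)
poisson_deviance <- function(x, m) {
  log_term <- ifelse(x > 0, x * log(x / m), 0)
  2 * sum(log_term - (x - m))
}

# Assumed model form: row i, column j has natural parameter mu[j] + A[i, ] %*% B[j, ]
natural_params <- function(mu, A, B) {
  outer(rep(1, nrow(A)), mu) + tcrossprod(A, B)
}

# One minus the ratio of model deviance to null deviance (log link: mean = exp(theta))
prop_deviance_explained <- function(x, theta, theta_null) {
  1 - poisson_deviance(x, exp(theta)) / poisson_deviance(x, exp(theta_null))
}

set.seed(1)
a <- rnorm(100)
b <- rnorm(10)
x <- matrix(rpois(1000, c(exp(outer(a, b)))), 100, 10)

# Null model with main effects only: log of the column means
mu_null <- log(colMeans(x))
theta_null <- natural_params(mu_null, matrix(0, 100, 1), matrix(0, 10, 1))

# Natural parameters that generated the data (rank 1, no main effects)
theta_true <- natural_params(rep(0, 10), matrix(a), matrix(b))

prop_deviance_explained(x, theta_true, theta_null)
```

With main_effects = FALSE the null model described above would instead set every natural parameter to 0, which corresponds to passing a matrix of zeros as theta_null.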

References

de Leeuw, Jan, 2006. Principal component analysis of binary data by iterated singular value decomposition. Computational Statistics & Data Analysis 50 (1), 21–39.

Collins, M., Dasgupta, S., & Schapire, R. E., 2001. A generalization of principal components analysis to the exponential family. In NIPS, 617–624.

Examples

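library(generalizedPCA)

rows <- 100
cols <- 10
set.seed(1)

# rank-1 matrix of natural parameters (log of the Poisson means)
mat_np <- outer(rnorm(rows), rnorm(cols))

# generate a count matrix
mat <- matrix(rpois(rows * cols, c(exp(mat_np))), rows, cols)

# fit a one-dimensional Poisson factorization, printing progress
mod <- generalizedMF(mat, k = 1, family = "poisson", quiet = FALSE)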

andland/generalizedPCA documentation built on May 12, 2019, 2:42 a.m.