generalizedPCA: Generalized Principal Component Analysis

Description

Dimension reduction for exponential family data, obtained by extending Pearson's formulation of PCA.
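
A minimal base-R sketch of the projection idea follows. It assumes the model form from the generalized PCA literature: the fitted natural parameters are theta_hat = 1 mu' + (theta_tilde - 1 mu') U U', where theta_tilde holds the (approximate) saturated-model natural parameters and U has orthonormal columns. The helper gpca_fitted_theta is hypothetical and is not part of the package.

For the Gaussian family, the saturated natural parameters are the data themselves. In that case, choosing U as the top-k right singular vectors of the centered data recovers ordinary PCA, and the squared reconstruction error equals the sum of the discarded squared singular values.

```r
# Hypothetical helper (not a package function): fitted natural parameters
# under the assumed projection model theta_hat = 1 mu' + (theta_tilde - 1 mu') U U'
gpca_fitted_theta = function(theta_tilde, U, mu) {
  centered = sweep(theta_tilde, 2, mu)   # subtract the main effects from each row
  pcs = centered %*% U                   # n x k principal component scores
  fitted = pcs %*% t(U)                  # project back onto the span of U
  sweep(fitted, 2, mu, "+")              # add the main effects back
}

# Gaussian case: theta_tilde is just the data matrix
set.seed(1)
x = matrix(rnorm(50 * 6), 50, 6)
mu = colMeans(x)
sv = svd(sweep(x, 2, mu))
U = sv$v[, 1:2, drop = FALSE]            # top-2 right singular vectors
theta_hat = gpca_fitted_theta(x, U, mu)

# squared reconstruction error equals the sum of the discarded squared singular values
err = sum((x - theta_hat)^2)
all.equal(err, sum(sv$d[-(1:2)]^2))
```

For non-Gaussian families, theta_tilde is no longer the data, and U is chosen to minimize deviance rather than squared error. That is why the package needs an iterative algorithm instead of a single SVD.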

Usage

generalizedPCA(x, k = 2, M = 4, family = c("gaussian", "binomial",
  "poisson", "multinomial"), weights, quiet = TRUE, majorizer = c("row",
  "all"), partial_decomp = FALSE, max_iters = 1000, conv_criteria = 1e-05,
  random_start = FALSE, start_U, start_mu, main_effects = TRUE,
  normalize = FALSE, validation, val_weights)

Arguments

x

matrix of binary, proportion, count, or continuous data

k

number of principal components to return

M

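value used to approximate the saturated model. Natural parameters that are infinite under the saturated model (for example, the logit of a binary 0 or 1) are replaced by finite values of magnitude M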

family

exponential family distribution of the data

weights

an optional matrix with the same dimensions as x, giving a weight for each entry

quiet

logical; if FALSE, the function prints progress feedback during the calculation

majorizer

how to majorize the deviance. "row" gives a tighter majorization but may take longer per iteration; "all" may be faster per iteration but require more iterations

partial_decomp

logical; if TRUE, the function uses the RSpectra package to compute a partial SVD more quickly. When the number of columns is small, this approximation may be less accurate and slower than a full SVD

max_iters

maximum number of iterations

conv_criteria

convergence criterion: the algorithm stops when the difference in average deviance between successive iterations falls below this value

random_start

logical; whether to randomly initialize the parameters. If FALSE, the function uses an eigendecomposition as the starting value

start_U

starting value for the orthonormal loadings matrix U

start_mu

starting value for mu; only used if main_effects = TRUE

main_effects

logical; whether to include main effects in the model

normalize

logical; whether to weight the variables so they all have equal influence

validation

a validation dataset used to select M

val_weights

weights associated with the validation data
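
To make the role of M concrete: under the saturated model, some natural parameters are infinite. Examples are the logit of a binary 0 or 1, and log(0) for a zero count. A common approximation, used in logistic PCA and assumed here, replaces these with finite values of magnitude M, so a larger M moves the approximation closer to the saturated model. The helper saturated_theta is hypothetical. It only sketches this idea for three families, and the package's own handling may differ.

```r
# Hypothetical helper (not a package function): approximate saturated-model
# natural parameters, capping infinite values at magnitude M (M > 0 assumed)
saturated_theta = function(x, family = c("gaussian", "binomial", "poisson"), M = 4) {
  family = match.arg(family)
  if (family == "gaussian") return(x)    # natural parameter is the mean itself
  if (family == "binomial") {
    theta = M * (2 * x - 1)              # 0 -> -M, 1 -> +M
    inside = x > 0 & x < 1               # proportions strictly inside (0, 1)
    theta[inside] = qlogis(x[inside])    # exact logit where it is finite
    return(theta)
  }
  theta = log(x)                         # poisson: log of the mean
  theta[x == 0] = -M                     # log(0) = -Inf, so cap at -M
  theta
}

# binary and proportion entries: 0 -> -4, 1 -> 4, 0.25 -> logit(0.25)
saturated_theta(matrix(c(0, 1, 0.25, 1), 2, 2), "binomial", M = 4)
```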

Value

An S3 object of class "gpca", which is a list with the following components:

mu

the main effects

U

a k-dimensional orthonormal matrix with the loadings

PCs

the principal component scores

M

the value of M that was supplied

family

the exponential family used

iters

number of iterations required for convergence

loss_trace

the trace of the average deviance over the iterations of the algorithm; it should be non-increasing

prop_deviance_expl

the proportion of deviance explained by this model. If main_effects = TRUE, the null model contains only the main effects; otherwise, the null model estimates 0 for all natural parameters.
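
The prop_deviance_expl component compares the model's deviance with that of the null model. The sketch below does this for the Poisson family. It assumes unit weights and main_effects = TRUE, so the null model has one natural parameter per column, and the maximum-likelihood estimate of that parameter is the log of the column mean. The helpers poisson_deviance and prop_deviance_explained are hypothetical, not package functions.

```r
# Hypothetical helper: Poisson deviance of natural parameters theta for counts x
poisson_deviance = function(x, theta) {
  mean_hat = exp(theta)
  # x * log(x / mean) is taken as 0 where x == 0
  term = ifelse(x > 0, x * log(x / mean_hat), 0)
  2 * sum(term - (x - mean_hat))
}

# Hypothetical helper: 1 - deviance(model) / deviance(null), with a main-effects null
prop_deviance_explained = function(x, theta_hat) {
  # null model: one main effect per column, fitted by log(column mean)
  theta_null = matrix(log(colMeans(x)), nrow(x), ncol(x), byrow = TRUE)
  1 - poisson_deviance(x, theta_hat) / poisson_deviance(x, theta_null)
}

set.seed(2)
x = matrix(rpois(40, 5) + 1, 10, 4)      # strictly positive counts
prop_deviance_explained(x, log(x))       # saturated fit explains all deviance
```

The saturated fit (theta = log(x)) has zero deviance and scores 1, while the null fit itself scores 0. A fitted gpca model falls between the two.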

Examples

library(generalizedPCA)

# construct a low-rank matrix in the natural parameter space
rows = 100
cols = 10
set.seed(1)
mat_np = outer(rnorm(rows), rnorm(cols))

# generate a count matrix
mat = matrix(rpois(rows * cols, c(exp(mat_np))), rows, cols)

# run Poisson PCA on it
gpca = generalizedPCA(mat, k = 1, M = 4, family = "poisson")
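
The simulated natural-parameter matrix is an outer product of two vectors, so it has rank 1. This is why the call asks for a single component (k = 1). A quick base-R check confirms this by looking at the ratio of the singular values. It repeats the simulation so that it runs on its own:

```r
# repeat the example's simulation of the natural parameters
set.seed(1)
rows = 100
cols = 10
mat_np = outer(rnorm(rows), rnorm(cols))

# a rank-1 matrix has one nonzero singular value; the rest are numerically ~0
d = svd(mat_np)$d
d[2] / d[1]
```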

andland/generalizedPCA documentation built on May 12, 2019, 2:42 a.m.