convexGeneralizedPCA: Convex Generalized Principal Component Analysis


Description

Dimensionality reduction for exponential family data by extending Pearson's PCA formulation to minimize deviance. Instead of optimizing over rank-k projection matrices directly, the function optimizes over their convex relaxation, the Fantope.
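The Fantope of order k is the convex hull of the rank-k projection matrices, which is the set of symmetric matrices with all eigenvalues in [0, 1] and trace equal to k. A minimal base R sketch of that definition follows; in_fantope is a hypothetical helper written for illustration and is not part of generalizedPCA.

```r
# Hypothetical helper (not part of generalizedPCA): checks whether H is
# symmetric, has eigenvalues in [0, 1], and has trace k.
in_fantope <- function(H, k, tol = 1e-8) {
  ev <- eigen(H, symmetric = TRUE, only.values = TRUE)$values
  isSymmetric(H, tol = tol) &&
    all(ev >= -tol) && all(ev <= 1 + tol) &&
    abs(sum(diag(H)) - k) < tol
}

set.seed(1)
d <- 5
k <- 2

# Orthonormal columns from the QR decomposition of a random matrix
U1 <- qr.Q(qr(matrix(rnorm(d * k), d, k)))
U2 <- qr.Q(qr(matrix(rnorm(d * k), d, k)))

P1 <- tcrossprod(U1)        # rank-k projection matrix U1 %*% t(U1)
P2 <- tcrossprod(U2)        # a second rank-k projection matrix
H  <- 0.5 * P1 + 0.5 * P2   # convex combination of the two

in_fantope(P1, k)           # a projection matrix is in the Fantope
in_fantope(H, k)            # so is a convex combination of projections
in_fantope(2 * P1, k)       # eigenvalues of 2 and trace 2k: not in the Fantope
```

Every rank-k projection matrix passes the check, and so does any convex combination of them, which is what makes the Fantope a convex set to optimize over.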

Usage

convexGeneralizedPCA(x, k = 2, M = 4, family = c("gaussian", "binomial",
  "poisson", "multinomial"), weights, quiet = TRUE, partial_decomp = FALSE,
  max_iters = 1000, conv_criteria = 1e-06, random_start = FALSE, start_H,
  mu, main_effects = TRUE, normalize = FALSE, ss_factor = 1)

Arguments

x

matrix of binary, proportion, count, or continuous data

k

number of principal components to return

M

value to approximate the saturated model

family

exponential family distribution of data

weights

an optional matrix of non-negative weights with the same dimensions as x

quiet

logical; whether the calculation should give feedback

partial_decomp

logical; if TRUE, the function uses the RSpectra package to more quickly calculate the eigen-decomposition. When the number of columns is small, the approximation may be less accurate and slower

max_iters

maximum number of iterations

conv_criteria

convergence criterion. The algorithm stops when the difference between the average deviance in successive iterations falls below this value

random_start

logical; whether to randomly initialize the parameters. If FALSE, the function will use an eigen-decomposition as the starting value

start_H

starting value for the Fantope matrix

mu

main effects vector. Only used if main_effects = TRUE

main_effects

logical; whether to include main effects in the model

normalize

logical; whether to weight the variables so they all have equal influence

ss_factor

step size multiplier; the amount by which to multiply the step size. A quadratic convergence rate can be proven for ss_factor = 1, but higher values sometimes work better in practice. The default shown in Usage is ss_factor = 1. If the algorithm does not converge with a larger value, try ss_factor = 1.
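To make the role of M more concrete: for binary data the saturated model has infinite natural parameters (logit of 0 or 1), so a finite stand-in is needed. The sketch below assumes the binomial family handles this as logistic PCA does, replacing the saturated logits with +M and -M. approx_saturated_logit is a hypothetical helper for illustration only, not the package's internal code.

```r
# Hypothetical helper (assumption: binomial family, binary x).
# Maps x = 1 to +M and x = 0 to -M as a finite stand-in for logit(x).
approx_saturated_logit <- function(x, M) {
  M * (2 * x - 1)
}

x <- matrix(c(0, 1, 1, 0, 1, 0), nrow = 2)

theta <- approx_saturated_logit(x, M = 4)
theta

# Probabilities implied by the approximation; they approach x as M grows
p4  <- plogis(theta)
p10 <- plogis(approx_saturated_logit(x, M = 10))
max(abs(p4 - x))
max(abs(p10 - x))
```

Under this assumption a larger M tracks the saturated model more closely, while a smaller M keeps the natural parameters away from the extremes.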

Value

An S3 object of class cgpca which is a list with the following components:

mu

the main effects

H

a rank k Fantope matrix

U

a ceiling(k)-dimensional orthonormal matrix with the loadings

PCs

the principal component scores

M

the value of M that was supplied

iters

number of iterations required for convergence

loss_trace

the trace of the average deviance using the Fantope matrix

proj_loss_trace

the trace of the average deviance using the projection matrix

prop_deviance_expl

the proportion of deviance explained by this model. If main_effects = TRUE, the null model is just the main effects, otherwise the null model estimates 0 for all natural parameters.
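As a rough illustration of the last component, the proportion of deviance explained can be read as one minus the ratio of the model deviance to the null deviance. The sketch below works this out by hand for binary data with a main-effects null model. It is an assumption about the general form of the calculation rather than the package's code; binomial_deviance is a hypothetical helper and theta_fit is a stand-in for the fitted natural parameters.

```r
# Hypothetical helper: Bernoulli deviance for binary x given natural
# parameters theta. The saturated log-likelihood is 0 for binary data,
# so the deviance is -2 times the log-likelihood.
binomial_deviance <- function(x, theta) {
  -2 * sum(x * theta - log1p(exp(theta)))
}

set.seed(1)
rows <- 100
cols <- 10
theta_true <- outer(rnorm(rows), rnorm(cols))
x <- (matrix(runif(rows * cols), rows, cols) <= plogis(theta_true)) * 1

# Null model with main effects only: one logit per column
mu <- qlogis(colMeans(x))
theta_null <- matrix(mu, rows, cols, byrow = TRUE)

# Null model without main effects: all natural parameters are 0
theta_zero <- matrix(0, rows, cols)

# Stand-in for a fitted model; here simply the generating parameters
theta_fit <- theta_true

dev_null <- binomial_deviance(x, theta_null)
dev_zero <- binomial_deviance(x, theta_zero)
dev_fit  <- binomial_deviance(x, theta_fit)

prop_expl <- 1 - dev_fit / dev_null
prop_expl
```

Because the main-effects null model already fits the column means, its deviance is never larger than that of the all-zero null model, so the proportion reported with main_effects = TRUE is measured against a stricter baseline.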

Examples

# construct a low rank matrix in the logit scale
rows = 100
cols = 10
set.seed(1)
mat_logit = outer(rnorm(rows), rnorm(cols))

# generate a binary matrix
mat = (matrix(runif(rows * cols), rows, cols) <= inv.logit.mat(mat_logit)) * 1.0

# run convex generalized PCA on it
cgpca = convexGeneralizedPCA(mat, k = 1, M = 4, family = "binomial")

andland/generalizedPCA documentation built on May 12, 2019, 2:42 a.m.