convexGeneralizedPCA: Convex Generalized Principal Component Analysis
In andland/generalizedPCA: Generalized PCA


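Dimensionality reduction for exponential family data by extending Pearson's PCA formulation to minimize deviance instead of squared error. Rather than optimizing over rank-`k` projection matrices, the function uses their convex relaxation, the Fantope (the convex hull of the rank-`k` projection matrices), which makes the optimization problem convex.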
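The Fantope of order `k` is the set of symmetric matrices `H` with `0 <= H <= I` (in the positive semidefinite order) and `trace(H) = k`. Solvers that optimize over it typically need a Euclidean projection onto this set, which can be computed by shifting and clipping eigenvalues. The sketch below shows that standard construction in base R. `project_fantope` is a hypothetical helper for illustration only; it is not a documented function of this package, and the package may implement this step differently.

```r
# Hypothetical helper (not part of generalizedPCA): Euclidean projection of a
# symmetric matrix A onto the Fantope {H : 0 <= H <= I, trace(H) = k}.
project_fantope <- function(A, k) {
  eig <- eigen((A + t(A)) / 2, symmetric = TRUE)
  lambda <- eig$values
  # Shift the eigenvalues by theta and clip them to [0, 1].
  clipped <- function(theta) pmin(pmax(lambda - theta, 0), 1)
  # The clipped sum is continuous and non-increasing in theta: it equals
  # length(lambda) at the lower bound and 0 at the upper bound, so a root
  # exists whenever 0 < k <= length(lambda).
  f <- function(theta) sum(clipped(theta)) - k
  theta <- uniroot(f, lower = min(lambda) - 1, upper = max(lambda),
                   tol = 1e-12)$root
  gamma <- clipped(theta)
  # Rebuild the matrix with the same eigenvectors and clipped eigenvalues.
  eig$vectors %*% diag(gamma, nrow = length(gamma)) %*% t(eig$vectors)
}

set.seed(1)
A <- crossprod(matrix(rnorm(25), 5))
H <- project_fantope(A, k = 2)
sum(diag(H))                               # approximately 2
range(eigen(H, symmetric = TRUE)$values)   # within [0, 1], up to rounding error
```

Projecting a rank-`k` projection matrix returns it unchanged, since such matrices are the extreme points of the Fantope.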

convexGeneralizedPCA(x, k = 2, M = 4, family = c("gaussian", "binomial",
  "poisson", "multinomial"), weights, quiet = TRUE, partial_decomp = FALSE,
  max_iters = 1000, conv_criteria = 1e-06, random_start = FALSE, start_H,
  mu, main_effects = TRUE, normalize = FALSE, ss_factor = 1)

`x`	matrix of binary, proportion, count, or continuous data
`k`	number of principal components to return
`M`	value used to approximate the natural parameters of the saturated model
`family`	exponential family distribution of data
`weights`	an optional matrix with the same dimensions as `x`, containing non-negative weights
`quiet`	logical; whether to suppress feedback during the calculation
`partial_decomp`	logical; if `TRUE`, the function uses the RSpectra package to calculate the eigen-decomposition more quickly. When the number of columns is small, the partial decomposition may be less accurate and slower than a full one
`max_iters`	maximum number of iterations
`conv_criteria`	convergence criterion: the iterations stop when the difference in average deviance between successive iterations falls below this value
`random_start`	logical; whether to randomly initialize the parameters. If `FALSE`, the function uses an eigen-decomposition as the starting value
`start_H`	starting value for the Fantope matrix
`mu`	main effects vector. Only used if `main_effects = TRUE`
`main_effects`	logical; whether to include main effects in the model
`normalize`	logical; whether to weight the variables so that they all have equal influence
`ss_factor`	step size multiplier: the amount by which to multiply the step size. A quadratic convergence rate can be proven for `ss_factor = 1`, but I have found that higher values, such as `ss_factor = 4`, sometimes work better. If the algorithm is not converging with a larger value, use the default, `ss_factor = 1`.
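For binary data, the saturated model's natural parameters are infinite (`+Inf` for ones, `-Inf` for zeros), which is why a finite `M` is needed. The sketch below assumes the binomial case follows the logistic PCA convention of approximating them by `M * (2 * x - 1)`; that convention, and the helper names `saturated_approx` and `bernoulli_deviance`, are assumptions for illustration rather than this package's API. It shows how the deviance of the approximation shrinks toward 0 as `M` grows.

```r
# Hypothetical helpers for the binomial case; assumes the saturated natural
# parameters are approximated by M * (2 * x - 1), as in logistic PCA.
saturated_approx <- function(x, M) M * (2 * x - 1)

# Bernoulli deviance of a natural-parameter (logit) matrix theta for binary x.
# The saturated log-likelihood of 0/1 data is 0, so the deviance is
# -2 * log-likelihood; log1p(exp(theta)) computes log(1 + e^theta).
bernoulli_deviance <- function(x, theta) {
  -2 * sum(x * theta - log1p(exp(theta)))
}

set.seed(1)
x <- matrix(rbinom(100 * 10, 1, 0.5), 100, 10)
d <- sapply(c(1, 4, 10), function(M) bernoulli_deviance(x, saturated_approx(x, M)))
d
# Each value equals 2 * length(x) * log1p(exp(-M)), which decreases toward 0.
```

Every entry contributes `-log(1 + exp(-M))` to the log-likelihood whether it is 0 or 1, so the approximation error depends only on `M` and the number of entries.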

An S3 object of class `cgpca`, which is a list with the following components:

`mu`	the main effects
`H`	a rank `k` Fantope matrix
`U`	a matrix with `ceiling(k)` orthonormal columns containing the loadings
`PCs`	the principal component scores
`M`	the value of `M` supplied as input
`iters`	number of iterations required for convergence
`loss_trace`	the average deviance at each iteration, computed using the Fantope matrix
`proj_loss_trace`	the average deviance at each iteration, computed using the projection matrix
`prop_deviance_expl`	the proportion of deviance explained by this model. If `main_effects = TRUE`, the null model is just the main effects, otherwise the null model estimates 0 for all natural parameters.
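To show how `H`, `U`, and `prop_deviance_expl` relate, the sketch below works through the binomial case in base R. It assumes fitted natural parameters of the form `1 mu^T + (theta_sat - 1 mu^T) P`, with `theta_sat = M * (2 * x - 1)` and `P` either the Fantope matrix `H` or the projection `U U^T`. It also assumes `U` consists of the leading `ceiling(k)` eigenvectors of `H` and uses column logits as illustrative main effects. All helper names are hypothetical and are not package functions.

```r
# Hypothetical sketch (binomial case). None of these helpers are package functions.
bernoulli_deviance <- function(x, theta) -2 * sum(x * theta - log1p(exp(theta)))

# Assumed model: theta_hat = 1 mu^T + (theta_sat - 1 mu^T) %*% P
fitted_theta <- function(x, mu, P, M) {
  theta_sat <- M * (2 * x - 1)
  centered <- sweep(theta_sat, 2, mu)   # subtract mu[j] from column j
  sweep(centered %*% P, 2, mu, "+")     # add the main effects back
}

# Assumed loadings: the leading ceiling(k) eigenvectors of the Fantope matrix H.
loadings_from_H <- function(H, k) {
  eigen(H, symmetric = TRUE)$vectors[, seq_len(ceiling(k)), drop = FALSE]
}

# Proportion of deviance explained relative to the main-effects-only null model.
prop_deviance_explained <- function(x, theta_hat, mu) {
  theta_null <- matrix(mu, nrow(x), ncol(x), byrow = TRUE)
  1 - bernoulli_deviance(x, theta_hat) / bernoulli_deviance(x, theta_null)
}

set.seed(1)
x <- matrix(rbinom(100 * 10, 1, 0.5), 100, 10)
mu <- qlogis(colMeans(x))                          # illustrative main effects
V <- qr.Q(qr(matrix(rnorm(100), 10, 10)))          # random orthonormal basis
H <- V %*% diag(c(0.7, 0.3, rep(0, 8))) %*% t(V)   # Fantope matrix with trace 1
U <- loadings_from_H(H, k = 1)

# Analogues of the final loss_trace and proj_loss_trace entries (as proportions).
prop_deviance_explained(x, fitted_theta(x, mu, H, M = 4), mu)
prop_deviance_explained(x, fitted_theta(x, mu, U %*% t(U), M = 4), mu)
```

With `P = 0` the fit collapses to the null model and the proportion is 0; with `P = I` it reproduces the saturated approximation and the proportion is close to 1.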

library(generalizedPCA)

# construct a low-rank matrix on the logit scale
rows = 100
cols = 10
set.seed(1)
mat_logit = outer(rnorm(rows), rnorm(cols))

# generate a binary matrix
mat = (matrix(runif(rows * cols), rows, cols) <= inv.logit.mat(mat_logit)) * 1.0

# run convex generalized PCA on it
cgpca = convexGeneralizedPCA(mat, k = 1, M = 4, family = "binomial")
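The simulated data above can also be generated without the package helper. `stats::plogis` computes the elementwise inverse logit, and is assumed here to match `inv.logit.mat` with its default arguments. The sketch below also checks that the latent logit matrix is exactly rank one, which matches the choice of `k = 1` in the call above.

```r
rows <- 100
cols <- 10
set.seed(1)
mat_logit <- outer(rnorm(rows), rnorm(cols))   # rank-1 matrix on the logit scale

# Bernoulli draws: entry (i, j) is 1 with probability plogis(mat_logit[i, j]).
prob <- plogis(mat_logit)
mat <- (matrix(runif(rows * cols), rows, cols) <= prob) * 1.0

qr(mat_logit)$rank   # 1: the latent structure is exactly rank one
mean(mat)            # overall proportion of ones
```

Because the random draws are made in the same order after `set.seed(1)`, this should reproduce the same `mat` as the example, provided the assumption about `inv.logit.mat` holds.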