convexLogisticPCA: Convex Logistic Principal Component Analysis
In andland/logisticPCA: Binary Dimensionality Reduction


Dimensionality reduction for binary data by extending Pearson's PCA formulation to minimize Binomial deviance. The set of projection matrices is replaced by its convex relaxation, the Fantope.
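As a rough sketch of the formulation described in the reference below (the notation is an assumption of this note, not taken from the function itself): for an n x d binary matrix X, with the saturated natural parameters approximated by \tilde{\theta}_{ij} = m(2x_{ij} - 1) and the main effects \mu held fixed, the fit can be written as a deviance minimization over the Fantope \mathcal{F}_k.

```latex
\begin{aligned}
\hat{H} &= \arg\min_{H \in \mathcal{F}_k} \; -2 \sum_{i=1}^{n} \sum_{j=1}^{d}
  \left[ x_{ij}\,\hat{\theta}_{ij} - \log\!\left(1 + e^{\hat{\theta}_{ij}}\right) \right], \\
\hat{\Theta} &= \mathbf{1}_n \mu^{T} + \left(\tilde{\Theta} - \mathbf{1}_n \mu^{T}\right) H, \\
\mathcal{F}_k &= \left\{ H \in \mathbb{R}^{d \times d} :\; H = H^{T},\; 0 \preceq H \preceq I_d,\; \operatorname{tr}(H) = k \right\}.
\end{aligned}
```

The Fantope is the convex hull of the rank-k projection matrices, which is why the problem is convex in H.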

convexLogisticPCA(
  x,
  k = 2,
  m = 4,
  quiet = TRUE,
  partial_decomp = FALSE,
  max_iters = 1000,
  conv_criteria = 1e-06,
  random_start = FALSE,
  start_H,
  mu,
  main_effects = TRUE,
  ss_factor = 4,
  weights,
  M
)

`x`	matrix with all binary entries
`k`	number of principal components to return
`m`	value to approximate the saturated model
`quiet`	logical; whether the calculation should give feedback
`partial_decomp`	logical; if `TRUE`, the function uses the RSpectra package to quickly initialize `H` and project onto the Fantope when `ncol(x)` is large and `k` is small
`max_iters`	maximum number of iterations
`conv_criteria`	convergence criterion; the difference between the average deviance in successive iterations
`random_start`	logical; whether to randomly initialize the parameters. If `FALSE`, the function will use an eigen-decomposition as the starting value
`start_H`	starting value for the Fantope matrix
`mu`	main effects vector. Only used if `main_effects = TRUE`
`main_effects`	logical; whether to include main effects in the model
`ss_factor`	step size multiplier. Amount by which to multiply the step size. Quadratic convergence rate can be proven for `ss_factor = 1`, but I have found higher values sometimes work better. The default is `ss_factor = 4`. If it is not converging, try `ss_factor = 1`.
`weights`	an optional matrix of the same size as `x` with non-negative weights
`M`	deprecated. Use `m` instead
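To make the role of `m` concrete: for binary data the saturated model has natural parameters of plus or minus infinity, so a finite value is substituted. The base-R sketch below assumes the approximation takes the form m * (2x - 1), as in the reference; the variable names `q`, `theta_tilde`, and `p_tilde` are illustrative and are not part of the package API.

```r
# Small binary matrix for illustration
x <- matrix(c(1, 0, 0, 1, 1, 0), nrow = 3)
m <- 4

# Recode 0/1 as -1/+1, then scale by m (assumed form of the approximation)
q <- 2 * x - 1
theta_tilde <- m * q

# Probabilities implied by the approximate saturated model
p_tilde <- plogis(theta_tilde)
print(round(p_tilde, 3))
```

Larger values of `m` push these probabilities closer to 0 and 1, so the approximation to the saturated model becomes tighter.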

An S3 object of class `clpca`, which is a list with the following components:

`mu`	the main effects
`H`	a rank `k` Fantope matrix
`U`	a `ceiling(k)`-dimensional orthonormal matrix with the loadings
`PCs`	the principal component scores
`m`	the parameter inputted
`iters`	number of iterations required for convergence
`loss_trace`	the trace of the average negative log likelihood using the Fantope matrix
`proj_loss_trace`	the trace of the average negative log likelihood using the projection matrix
`prop_deviance_expl`	the proportion of deviance explained by this model. If `main_effects = TRUE`, the null model is just the main effects, otherwise the null model estimates 0 for all natural parameters.
`rank`	the rank of the Fantope matrix `H`
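The relationship between `H`, `U`, and `rank` can be illustrated without the package. The base-R sketch below builds a hypothetical Fantope matrix as a convex combination of two random rank-k projection matrices, checks the defining properties (symmetric, eigenvalues in [0, 1], trace equal to k), and shows that its rank can exceed k. The helper `rand_proj` is made up for this example, and taking the loadings as the leading eigenvectors of `H` is an assumption, not a statement about the package internals.

```r
set.seed(1)
d <- 5
k <- 2

# Hypothetical helper: a random rank-k projection matrix in d dimensions
rand_proj <- function(d, k) {
  Q <- qr.Q(qr(matrix(rnorm(d * k), d, k)))
  Q %*% t(Q)
}

# A convex combination of rank-k projections lies in the Fantope
H <- 0.5 * rand_proj(d, k) + 0.5 * rand_proj(d, k)

eig <- eigen(H, symmetric = TRUE)
tol <- 1e-8

# Fantope membership: eigenvalues in [0, 1] and trace equal to k
in_fantope <- all(eig$values > -tol) &&
  all(eig$values < 1 + tol) &&
  abs(sum(diag(H)) - k) < tol

# Numerical rank of H; it can be larger than k
rank_H <- sum(eig$values > tol)

# Assumed construction of loadings: the leading ceiling(k) eigenvectors
U <- eig$vectors[, seq_len(ceiling(k)), drop = FALSE]
```

A true projection matrix would have rank exactly k; a rank above k indicates that the Fantope solution is not itself a projection.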

Landgraf, A.J. & Lee, Y., 2020. Dimensionality reduction for binary data through the projection of natural parameters. Journal of Multivariate Analysis, 180, p.104668. https://arxiv.org/abs/1510.06112 https://doi.org/10.1016/j.jmva.2020.104668

# load the package (provides convexLogisticPCA and inv.logit.mat)
library(logisticPCA)

# construct a low rank matrix in the logit scale
rows = 100
cols = 10
set.seed(1)
mat_logit = outer(rnorm(rows), rnorm(cols))

# generate a binary matrix
mat = (matrix(runif(rows * cols), rows, cols) <= inv.logit.mat(mat_logit)) * 1.0

# run convex logistic PCA on it
clpca = convexLogisticPCA(mat, k = 1, m = 4)