logisticPCA: Logistic Principal Component Analysis


View source: R/logisticPCA.R

Description

Dimensionality reduction for binary data by extending Pearson's PCA formulation to minimize Binomial deviance

Usage

logisticPCA(
  x,
  k = 2,
  m = 4,
  quiet = TRUE,
  partial_decomp = FALSE,
  max_iters = 1000,
  conv_criteria = 1e-05,
  random_start = FALSE,
  start_U,
  start_mu,
  main_effects = TRUE,
  validation,
  M,
  use_irlba
)

Arguments

x

matrix with all binary entries

k

number of principal components to return

m

value used to approximate the natural parameters of the saturated model. If m = 0, m is solved for

quiet

logical; if FALSE, the calculation prints progress feedback

partial_decomp

logical; if TRUE, the function uses the RSpectra package to calculate the eigen-decomposition more quickly. This is usually faster than the standard eigen-decomposition when ncol(x) > 100 and k is small

max_iters

maximum number of iterations

conv_criteria

convergence criterion: the algorithm stops when the difference between the average deviance in successive iterations falls below this value

random_start

logical; whether to randomly initialize the parameters. If FALSE, the function uses an eigen-decomposition as the starting value

start_U

starting value for the orthogonal matrix

start_mu

starting value for mu. Only used if main_effects = TRUE

main_effects

logical; whether to include main effects in the model

validation

optional validation matrix. If supplied and m = 0, the validation data is used to solve for m

M

deprecated. Use m instead

use_irlba

deprecated. Use partial_decomp instead
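The role of m can be illustrated with a minimal base-R sketch. It assumes, following the cited reference, that the saturated model's natural parameters (plus or minus infinity for entries of 1 or 0) are approximated by m * (2 * x - 1). The toy matrix and variable names are illustrative only and are not part of the package.

```r
# Toy binary matrix (hypothetical data, for illustration only)
x <- matrix(c(1, 0, 1,
              0, 0, 1), nrow = 2, byrow = TRUE)
m <- 4

# The saturated model fits every entry perfectly, which requires natural
# parameters of +Inf (for 1) and -Inf (for 0). A finite m replaces these
# with +m and -m.
theta_sat <- m * (2 * x - 1)

# Probabilities implied by the approximate saturated model
p_sat <- plogis(theta_sat)

print(theta_sat)
print(round(p_sat, 3))
```

A larger m pushes the implied probabilities closer to 0 and 1, so the approximation to the saturated model becomes tighter.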

Value

An S3 object of class lpca which is a list with the following components:

mu

the main effects

U

a k-dimensional orthonormal matrix with the loadings

PCs

the principal component scores

m

the parameter input or solved for

iters

number of iterations required for convergence

loss_trace

the trace of the average negative log likelihood over the iterations of the algorithm. It should be non-increasing

prop_deviance_expl

the proportion of deviance explained by this model. If main_effects = TRUE, the null model is just the main effects, otherwise the null model estimates 0 for all natural parameters.
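The two null models mentioned for prop_deviance_expl can be made concrete with a small base-R sketch. It assumes the usual definition, one minus the ratio of the model deviance to the null deviance; the helper names bernoulli_deviance and prop_deviance_explained are hypothetical and this is not the package's internal code.

```r
# Bernoulli deviance: -2 * log likelihood of binary x given natural
# parameters theta (logits). Assumes all theta are finite.
bernoulli_deviance <- function(x, theta) {
  -2 * sum(x * plogis(theta, log.p = TRUE) +
             (1 - x) * plogis(-theta, log.p = TRUE))
}

# Proportion of deviance explained relative to a null model
prop_deviance_explained <- function(x, theta_fit, theta_null) {
  1 - bernoulli_deviance(x, theta_fit) / bernoulli_deviance(x, theta_null)
}

# Toy 4 x 3 binary matrix (hypothetical data)
x <- matrix(c(1, 0, 1, 1,
              0, 0, 1, 0,
              1, 1, 0, 1), nrow = 4)

# Null model without main effects: every natural parameter is 0
theta_zero <- matrix(0, nrow(x), ncol(x))

# Null model with main effects only: column-wise logit of the column mean
theta_main <- matrix(qlogis(colMeans(x)), nrow(x), ncol(x), byrow = TRUE)

# How much of the zero-model deviance the main effects alone explain
print(prop_deviance_explained(x, theta_main, theta_zero))
```

Because the main-effects model already absorbs column-level differences, the proportion reported with main_effects = TRUE measures only what the principal components add beyond those effects.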

References

Landgraf, A.J. & Lee, Y., 2020. Dimensionality reduction for binary data through the projection of natural parameters. Journal of Multivariate Analysis, 180, p.104668. https://arxiv.org/abs/1510.06112 https://doi.org/10.1016/j.jmva.2020.104668

Examples

library(logisticPCA)

# construct a low-rank matrix on the logit scale
rows = 100
cols = 10
set.seed(1)
mat_logit = outer(rnorm(rows), rnorm(cols))

# generate a binary matrix
mat = (matrix(runif(rows * cols), rows, cols) <= inv.logit.mat(mat_logit)) * 1.0

# run logistic PCA on it
lpca = logisticPCA(mat, k = 1, m = 4, main_effects = FALSE)

# Logistic PCA likely does a better job finding latent features
# than standard PCA
plot(svd(mat_logit)$u[, 1], lpca$PCs[, 1])
plot(svd(mat_logit)$u[, 1], svd(mat)$u[, 1])
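After fitting, the components listed under Value can be inspected to check that the algorithm behaved as expected. The sketch below is a hedged illustration: summarize_lpca is a hypothetical helper, not a package function, and the fit only runs if the logisticPCA package is installed. It uses base R's plogis in place of inv.logit.mat to build the simulated data.

```r
# Hypothetical helper: collect a few diagnostics from a fitted object that
# has the components described under Value.
summarize_lpca <- function(fit, tol = 1e-8) {
  list(
    iters = fit$iters,
    m = fit$m,
    prop_deviance_expl = fit$prop_deviance_expl,
    # the loss trace is documented as non-increasing; allow tiny numerical noise
    loss_non_increasing = all(diff(fit$loss_trace) <= tol)
  )
}

if (requireNamespace("logisticPCA", quietly = TRUE)) {
  set.seed(1)
  mat_logit <- outer(rnorm(100), rnorm(10))
  mat <- (matrix(runif(100 * 10), 100, 10) <= plogis(mat_logit)) * 1.0

  fit <- logisticPCA::logisticPCA(mat, k = 1, m = 4, main_effects = FALSE)
  str(summarize_lpca(fit))
}
```

If loss_non_increasing comes back FALSE, or iters equals max_iters, it may be worth rerunning with quiet = FALSE to see the per-iteration feedback.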

andland/logisticPCA documentation built on Sept. 13, 2020, 12:24 a.m.