pesel: Automatic estimation of number of principal components in PCA...

View source: R/pesel.R

pesel R Documentation

Automatic estimation of number of principal components in PCA with PEnalized SEmi-integrated Likelihood (PESEL)

Description

The underlying assumption is that only a small number of principal components, those associated with the largest singular values, are relevant, while the rest are noise. For a given numeric data set, the function estimates the number of PCs according to a penalized likelihood criterion. The function adjusts the model used to the case when the number of variables is larger than the number of observations.
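As a rough illustration of this assumption, the singular values of a low-rank signal plus noise can be inspected directly. The sketch below uses base R only and is not part of pesel; the names n, p, k, signal and sv, as well as the chosen sizes and noise level, are assumptions made for this example.

```r
set.seed(1)
n <- 100  # observations
p <- 10   # variables
k <- 3    # assumed number of relevant components (illustrative)

# Rank-k signal: n x k factors times k x p coefficients
signal <- matrix(rnorm(n * k), n, k) %*% matrix(rnorm(k * p), k, p)
# Observed data: signal plus Gaussian noise
X <- signal + matrix(rnorm(n * p, sd = 0.5), n, p)

# Singular values of the column-centered data, in decreasing order
sv <- svd(scale(X, center = TRUE, scale = FALSE))$d
print(round(sv, 2))
```

Under the assumption above, one would expect the leading singular values to stand apart from the remaining ones; choosing where that separation lies is the task the function automates.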

Usage

pesel(
  X,
  npc.min = 0,
  npc.max = 10,
  prior = NULL,
  scale = TRUE,
  method = c("heterogenous", "homogenous"),
  asymptotics = NULL
)

Arguments

X

a data frame or a matrix containing only continuous variables

npc.min

minimal number of principal components. The criterion is computed for every possible number of PCs between npc.min and npc.max

npc.max

maximal number of principal components. If greater than the dimensions of X, min(ncol(X), nrow(X))-1 is used. The criterion is computed for every possible number of PCs between npc.min and npc.max

prior

a numeric positive vector of length npc.max-npc.min+1 giving the prior distribution on the number of principal components. Defaults to the uniform distribution

scale

a boolean; if TRUE (the default), the data are scaled before the criterion is applied

method

name of the criterion to be used, either "heterogenous" or "homogenous"

asymptotics

a character, the asymptotics ('n' or 'p') to be used. Defaults to NULL, in which case the asymptotics are selected based on the dimensions of X
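The rules stated for npc.max and prior can be sketched in base R. This only illustrates the documented behaviour and is not the package's internal code. The helper names effective_npc_max and uniform_prior are hypothetical, "dimensions of X" is read here as min(ncol(X), nrow(X)), and normalising the prior to sum to one is an assumption.

```r
# Hypothetical helper: cap npc.max as described for the npc.max argument.
# Assumption: "dimensions of X" means min(ncol(X), nrow(X)).
effective_npc_max <- function(X, npc.max = 10) {
  smaller_dim <- min(ncol(X), nrow(X))
  if (npc.max > smaller_dim) smaller_dim - 1 else npc.max
}

# Hypothetical helper: uniform prior over npc.min, ..., npc.max.
# Assumption: the weights are normalised to sum to one.
uniform_prior <- function(npc.min, npc.max) {
  n_models <- npc.max - npc.min + 1
  rep(1 / n_models, n_models)
}

X <- matrix(rnorm(8 * 5), nrow = 8, ncol = 5)
npc_max <- effective_npc_max(X, npc.max = 10)  # 10 exceeds 5, so it is capped
prior <- uniform_prior(npc.min = 0, npc.max = npc_max)
```

A vector built this way has one entry per candidate number of PCs, which is the length the prior argument requires.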

Details

Please note that neither categorical variables nor missing values are allowed.
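A data set can be screened for both problems before it is passed to the function. The helper below, find_input_problems, is a hypothetical base R sketch and not part of pesel. It treats every numeric column as acceptable, which is an assumption: integer-coded categories would pass this check unnoticed.

```r
# Hypothetical pre-check: report non-numeric columns and missing values.
find_input_problems <- function(X) {
  X <- as.data.frame(X)
  problems <- character(0)
  is_num <- vapply(X, is.numeric, logical(1))
  if (!all(is_num)) {
    problems <- c(problems, paste("non-numeric columns:",
                                  paste(names(X)[!is_num], collapse = ", ")))
  }
  if (any(is.na(X))) {
    problems <- c(problems, "missing values present")
  }
  problems
}

good <- data.frame(a = c(1.2, 3.4, 5.6), b = c(0.1, 0.2, 0.3))
bad <- data.frame(a = c(1.2, NA, 5.6), g = factor(c("u", "v", "u")))

find_input_problems(good)  # empty character vector
find_input_problems(bad)   # two messages: factor column g and the NA in a
```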

Value

number of components

Examples

# EXAMPLE 1 - noise
# fix the random seed so the example is reproducible
set.seed(23)
pesel(matrix(rnorm(10000), ncol = 100), npc.min = 0)

# EXAMPLE 2 - fixed effects PCA model
sigma <- 0.5
k <- 5
n <- 100
numb.vars <- 10
# factors are drawn from normal distribution
factors <- replicate(k, rnorm(n, 0, 1))
# coefficients are drawn from normal distribution
coeff <- replicate(numb.vars, rnorm(k, 0, 1))
SIGNAL <- scale(factors %*% coeff)
X <- SIGNAL + replicate(numb.vars, sigma * rnorm(n))
pesel(X)


pesel documentation built on Oct. 17, 2023, 5:14 p.m.