dimLRT is a function that evaluates a likelihood ratio test on the factor model.
It can be used to choose the number of latent factors.
R |
(Regularized) correlation |
X |
A (possibly centered and scaled and possibly subsetted) data |
maxdim |
A |
rankDOF |
A |
graph |
A |
alpha |
A |
Bartlett |
A |
verbose |
A |
The most formal approach to factor analytic dimensionality assessment is through likelihood ratio (LR) testing.
The basic idea is to test the m-factor model against the saturated model.
Evaluated at the sample correlation matrix and the maximum-likelihood parameter estimates under m factors, the corresponding LR criterion reduces to (n - 1) times a discrepancy function, where n is the number of observations.
This quantity is approximately χ^{2}-distributed under certain regularity conditions (Amemiya & Anderson, 1990).
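As a rough illustration of the statistic described above, the sketch below computes the standard maximum-likelihood discrepancy between a correlation matrix and a model-implied matrix. It assumes that this standard discrepancy is the one meant here; the names mlDiscrepancy, Lambda, Psi and Sigma are hypothetical and not part of the package.

```r
## Sketch: standard ML discrepancy between a correlation matrix R and a
## model-implied matrix Sigma (assumed to be the discrepancy meant in the text).
mlDiscrepancy <- function(R, Sigma) {
  p <- nrow(R)
  ## determinant(..., logarithm = TRUE) returns the log-determinant directly
  logDetSigma <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  logDetR     <- as.numeric(determinant(R, logarithm = TRUE)$modulus)
  ## tr(R Sigma^{-1}) equals tr(Sigma^{-1} R)
  logDetSigma - logDetR + sum(diag(solve(Sigma, R))) - p
}

## Hypothetical one-factor model on p = 4 features
Lambda <- matrix(c(0.8, 0.7, 0.6, 0.5), ncol = 1)
Psi    <- diag(1 - as.numeric(Lambda^2))
Sigma  <- Lambda %*% t(Lambda) + Psi

Fzero <- mlDiscrepancy(Sigma, Sigma)    # model reproduces R exactly
Fpos  <- mlDiscrepancy(diag(4), Sigma)  # R = identity: model does not fit

n <- 100
statistic <- (n - 1) * Fpos             # uncorrected LRT statistic
```

The log-determinant of R is only finite when R is of full rank, which is one reason a regularized correlation matrix is useful when p exceeds n.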
The general strategy would then be to sequentially test solutions of increasing dimensionality m = 1, …, maxdim until the null hypothesis (stating that the m-factor model holds) is not rejected at Type-I error level alpha.
The degrees of freedom for the LRT under the m-factor model equals the number of parameters in the saturated model (i.e., the unstructured sample correlation) minus the number of freely estimable parameters in the m-factor model.
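The parameter count can be sketched as follows. This uses the textbook counting for the m-factor model (p × m loadings plus p unique variances, minus m(m - 1)/2 rotational constraints); counting on the correlation scale gives the same difference. The helper names lrtDOF and lrtPvalue are hypothetical, and the sketch does not cover the rankDOF adjustment.

```r
## Sketch: textbook degrees of freedom for the LRT of an m-factor model
## on p features (not necessarily the exact internals of dimLRT).
lrtDOF <- function(p, m) {
  saturated <- p * (p + 1) / 2              # unstructured covariance
  factorPar <- p * m + p - m * (m - 1) / 2  # free parameters, m-factor model
  saturated - factorPar                     # equals ((p - m)^2 - (p + m)) / 2
}

## Upper-tail chi-square p-value for an observed LRT statistic
lrtPvalue <- function(statistic, p, m) {
  pchisq(statistic, df = lrtDOF(p, m), lower.tail = FALSE)
}

lrtDOF(p = 50, m = 5)   # 985
```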
Note that the general strategy above makes use of asymptotic results.
In our setting, however, the number of observations (n) is usually small relative to the number of features (p).
Hence, the standard test will in a sense overestimate the degrees of freedom.
One simple way of dealing with this is to adapt the degrees of freedom to incorporate the rank deficiency of R.
This road is taken when rankDOF = TRUE.
Bartlett (1950) proposed a correction factor when the sample size is small to make the test statistic behave more χ^{2}-like.
This correction factor is used when Bartlett = TRUE.
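For orientation, the commonly used form of Bartlett's correction replaces the multiplier (n - 1) by n - 1 - (2p + 5)/6 - 2m/3; this is the form that stats::factanal applies. Whether dimLRT uses exactly this expression is an assumption here, and bartlettMultiplier is a hypothetical helper name.

```r
## Sketch: Bartlett-corrected multiplier for the LRT statistic
## (common textbook form; assumed, not confirmed, to match dimLRT).
bartlettMultiplier <- function(n, p, m) {
  n - 1 - (2 * p + 5) / 6 - (2 * m) / 3
}

## The corrected statistic would be this multiplier times the discrepancy,
## instead of (n - 1) times the discrepancy.
uncorrected <- 500 - 1
corrected   <- bartlettMultiplier(n = 500, p = 50, m = 5)
```

The correction always shrinks the multiplier, so the corrected statistic is smaller and the test less liberal, with the difference growing as p and m grow relative to n.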
When graph = TRUE, the LRT results are visualized.
The graph plots the LRT p-values against the consecutive dimensions of the factor solution.
A horizontal line is plotted at the value provided in the alpha argument.
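A comparable plot can be sketched with base graphics from any three-column result of the kind the function returns. The helper plotLRT and the toy values in lrtToy are hypothetical; columns are accessed by position because the column names are not stated here.

```r
## Sketch: p-values against factor dimension, with a reference line at alpha.
## Assumes column 1 holds the dimensions and column 3 the p-values.
plotLRT <- function(lrt, alpha = 0.05) {
  plot(lrt[[1]], lrt[[3]], type = "b", ylim = c(0, 1),
       xlab = "Dimension of factor solution", ylab = "p-value")
  abline(h = alpha, lty = 2)   # horizontal line at the chosen alpha
  invisible(lrt)
}

## Toy values for illustration only (not output of dimLRT)
lrtToy <- data.frame(dimension = 1:5,
                     statistic = c(900, 400, 150, 20, 8),
                     pvalue    = c(0, 0, 0.001, 0.21, 0.64))
```

Calling plotLRT(lrtToy, alpha = 0.05) draws the curve; the first point above the dashed line marks the smallest dimension that is not rejected.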
Unless the number of observations is much larger than the number of features, the LRT is not recommended for inference in general. In Peeters et al. (2019) the LRT was assessed in a comparative setting involving high-dimensional factor models.
The function returns an object of class data.frame.
The first column represents the assessed dimensions running from 1 to maxdim.
The second column represents the observed values of the LRT statistic.
The third column represents the corresponding p-values.
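Given a result of this shape, the sequential strategy (take the smallest dimension whose null hypothesis is not rejected) could be applied along the following lines. The helper selectDimension and the toy data frame are hypothetical; columns are accessed by position because the column names are not stated here.

```r
## Sketch: smallest dimension whose m-factor model is not rejected at alpha.
## Assumes column 1 holds the dimensions and column 3 the p-values.
selectDimension <- function(lrt, alpha = 0.05) {
  notRejected <- which(lrt[[3]] > alpha)
  if (length(notRejected) == 0) return(NA_integer_)  # every model rejected
  as.integer(lrt[[1]][min(notRejected)])
}

## Toy values for illustration only (not output of dimLRT)
lrtExample <- data.frame(dimension = 1:5,
                         statistic = c(900, 400, 150, 20, 8),
                         pvalue    = c(0, 0, 0.001, 0.21, 0.64))

selectDimension(lrtExample, alpha = 0.05)   # 4
```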
Note that, for argument X, the observations are expected to be in the rows and the features are expected to be in the columns.
The argument maxdim cannot exceed the Ledermann-bound (Ledermann, 1937): ⌊[2p + 1 - (8p + 1)^{1/2}]/2⌋, where p indicates the observed-feature dimension.
Usually, one wants to set maxdim much lower than this bound.
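The bound is easy to evaluate before choosing maxdim. A minimal sketch, with ledermannBound as a hypothetical helper name:

```r
## Sketch: Ledermann bound on the number of factors for p observed features
ledermannBound <- function(p) {
  floor((2 * p + 1 - sqrt(8 * p + 1)) / 2)
}

ledermannBound(50)   # 40
```

For p = 50 the bound is 40, so a setting such as maxdim = 20 is well within the admissible range.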
Note that, if p > n, then the maximum rank of the raw correlation matrix is n - 1. In this case there is an alternative Ledermann-bound when rankDOF = TRUE. The number of information points in the correlation matrix is then given as n × (n - 1)/2 and this number must exceed p × maxdim + p - (maxdim × (maxdim - 1))/2, putting more restrictions on maxdim.
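The stricter condition for p > n can be checked numerically. The sketch below searches for the largest dimension satisfying the inequality stated above, capped at the ordinary Ledermann bound; rankAdjustedBound is a hypothetical helper name and the function's own internal check may differ.

```r
## Sketch: largest maxdim for which n(n - 1)/2 exceeds
## p * maxdim + p - maxdim * (maxdim - 1) / 2, capped at the Ledermann bound.
rankAdjustedBound <- function(p, n) {
  ledermann <- floor((2 * p + 1 - sqrt(8 * p + 1)) / 2)
  if (ledermann < 1) return(0)
  m <- seq_len(ledermann)
  infoPoints <- n * (n - 1) / 2
  nParams    <- p * m + p - m * (m - 1) / 2
  admissible <- m[infoPoints > nParams]
  if (length(admissible) == 0) 0 else max(admissible)
}

rankAdjustedBound(p = 100, n = 50)   # 11
```

With p = 100 and n = 50 the ordinary Ledermann bound is 86, while the rank-adjusted condition allows at most 11 factors.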
Other functions for factor analytic dimensionality assessment are dimGB and dimIC. In high-dimensional situations usage of dimGB on the regularized correlation matrix is recommended.
Carel F.W. Peeters <cf.peeters@vumc.nl>, Caroline Ubelhor
Amemiya, Y., & Anderson, T.W. (1990). Asymptotic chi-square tests for a large class of factor analysis models. The Annals of Statistics, 18:1453–1463.
Bartlett, M.S. (1950). Tests of significance in factor analysis. British Journal of Psychology (Statistics Section), 3:77–85.
Ledermann, W. (1937). On the rank of the reduced correlational matrix in multiple factor analysis. Psychometrika, 2:85–93.
Peeters, C.F.W. et al. (2019). Stable prediction with radiomics data. arXiv:1903.11696 [stat.ML].
## Simulate some data according to the factor model
## $cormatrix gives the correlation matrix on the generated data
simDAT <- FAsim(p = 50, m = 5, n = 500)
simDAT$cormatrix
## Calculate the LRT for models of factor dimension 1 to 20
LRT <- dimLRT(simDAT$cormatrix, simDAT$data, maxdim = 20, rankDOF = FALSE)
print(LRT)