```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.align = "center",
  fig.width = 5,
  fig.height = 4
)
library(l1rotation)
```
The `l1rotation` package offers functionality to simplify the loading matrix in factor models. It finds the sparsest rotation of the loading matrix using the l1-rotation criterion of Freyaldenhoven (2025).
If the true loading matrix is sparse, this also resolves the rotational indeterminacy inherent to factor models. Suppose the data follow a factor model:

$$ X = F \Lambda^{*\prime} + e, $$

where
- $X$ is a $T \times n$ data matrix with $T$ rows (observations) and $n$ columns (variables);
- $F$ is a $T \times r$ matrix of the $r$ factors into which the data are decomposed;
- $\Lambda^{*\prime}$ is an $r \times n$ matrix of true loadings (so $\Lambda^*$ is $n \times r$);
- $e$ is a $T \times n$ error matrix.
Sparsity is what pins down the rotation: $\Lambda^{*}$ is the sparsest rotation of the loading matrix and can be identified using the l1-rotation criterion of Freyaldenhoven (2025).
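To make the notation concrete, here is a minimal base-R sketch that simulates a small data set from this model. The sizes, the sparse $\Lambda^*$ and the noise level are illustrative assumptions of our own, unrelated to `example_data`. The last lines check numerically why an extra assumption is needed: rotating both $F$ and $\Lambda^*$ by the same orthogonal matrix $Q$ leaves the common component unchanged.

```r
# Illustrative simulation (assumed sizes and values, not example_data)
set.seed(1)
T_obs <- 100  # number of observations (rows of X)
n     <- 40   # number of variables (columns of X)
r     <- 2    # number of factors

# Sparse true loadings Lambda* (n x r): factor 1 loads only on variables 1-20,
# factor 2 only on variables 21-40
Lambda_star <- matrix(0, nrow = n, ncol = r)
Lambda_star[1:20, 1]  <- 1
Lambda_star[21:40, 2] <- 1

F_mat <- matrix(rnorm(T_obs * r), nrow = T_obs, ncol = r)           # T x r factors
e     <- matrix(rnorm(T_obs * n, sd = 0.5), nrow = T_obs, ncol = n) # T x n errors
X     <- F_mat %*% t(Lambda_star) + e                               # T x n data
dim(X)
#> [1] 100  40

# Rotational indeterminacy: for any orthogonal r x r matrix Q,
# (F Q)(Lambda* Q)' = F Lambda*', so X alone cannot tell the two apart
theta <- pi / 6
Q <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), nrow = 2)
all.equal((F_mat %*% Q) %*% t(Lambda_star %*% Q), F_mat %*% t(Lambda_star))
#> [1] TRUE
```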
We will use the `example_data` matrix that ships with the package to show its basic functionality. It is a numeric matrix with $T = 224$ rows (observations) and $n = 207$ columns (variables). In general, data.frames, tibbles, or other data types can also be used with `l1rotation` functions, as long as all columns are numeric. Note that the package cannot handle missing values in the data matrix.
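Because of these two requirements, it can help to validate the input first. The helper below, `check_factor_input()`, is a hypothetical sketch of our own and not part of `l1rotation`: it converts the input to a data frame, stops if any column is non-numeric or contains missing values, and otherwise returns a numeric matrix.

```r
# Hypothetical helper (not part of l1rotation): validate data before
# calling local_factors() and related functions
check_factor_input <- function(X) {
  X <- as.data.frame(X)
  # Every column must be numeric
  if (!all(vapply(X, is.numeric, logical(1)))) {
    stop("all columns must be numeric")
  }
  # The package cannot handle missing values
  if (any(vapply(X, anyNA, logical(1)))) {
    stop("X contains missing values; remove or impute them first")
  }
  invisible(as.matrix(X))
}

df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
check_factor_input(df)   # passes silently
df$b[2] <- NA
tryCatch(check_factor_input(df), error = conditionMessage)
#> [1] "X contains missing values; remove or impute them first"
```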
To start, let's look at the first seven columns of the example data:

```r
head(example_data[, 1:7])
```
We assume that the number of underlying factors can be learned from the data (e.g., following the procedure in Bai and Ng (2002) or Ahn and Horenstein (2013)). Note that this package does not include functionality to estimate the number of factors; we simply take the number of factors as a user input.
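For readers who need a quick starting value, here is a minimal sketch of the eigenvalue-ratio idea from Ahn and Horenstein (2013): choose the $k \le k_{\max}$ that maximizes $\mu_k / \mu_{k+1}$, where $\mu_k$ is the $k$-th largest eigenvalue of $X'X/(nT)$. The function `er_num_factors()` is our own illustration, not part of `l1rotation`, and it omits refinements discussed in the paper (such as the growth-ratio variant and demeaning choices).

```r
# Eigenvalue-ratio sketch after Ahn and Horenstein (2013); not part of l1rotation
er_num_factors <- function(X, kmax = 8) {
  X  <- as.matrix(X)
  nT <- nrow(X) * ncol(X)
  # Eigenvalues of X'X / (nT), largest first
  mu <- eigen(crossprod(X) / nT, symmetric = TRUE, only.values = TRUE)$values
  # Ratios of consecutive eigenvalues; the largest drop marks the number of factors
  ratios <- mu[1:kmax] / mu[2:(kmax + 1)]
  which.max(ratios)
}

# Usage on simulated data with two strong factors (illustrative values)
set.seed(2)
F_sim <- matrix(rnorm(100 * 2), nrow = 100, ncol = 2)
L_sim <- matrix(rnorm(50 * 2), nrow = 50, ncol = 2)
X_sim <- F_sim %*% t(L_sim) + matrix(rnorm(100 * 50, sd = 0.3), nrow = 100, ncol = 50)
er_num_factors(X_sim, kmax = 5)
```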
For `example_data`, we will use two factors. With just the data, $X$, and the number of factors, $r$, we can estimate the loadings with `local_factors()`. This function estimates $\Lambda^{*}$ and provides helpful diagnostics and figures.
Below is an example using `example_data` (note that estimation can also be run in parallel with a selected number of cores, `n_cores`):

```r
set.seed(916)
lf <- local_factors(
  X = example_data,
  r = 2,
  parallel = FALSE,
  n_cores = NULL # Runs non-parallel by default
)
```
The only required arguments for `local_factors()` are the data to be decomposed and the number of factors. We use the principal components estimator as the initial estimate of the loadings, $\Lambda^0$, which can be accessed via the `initial_loadings` item of the output. The function also computes a quick diagnostic to check whether local factors are present in the data; the result is given in the output item `has_local_factors`.
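As a rough illustration of what a principal components starting point looks like, the sketch below takes the first $r$ right singular vectors of $X$, which give an $n \times r$ matrix with orthonormal columns spanning the estimated loading space. The helper `pc_loadings()` is hypothetical; the centering and scaling used inside `local_factors()` are not reproduced here and may differ.

```r
# Hypothetical helper: PC-style initial loadings as the first r right
# singular vectors of X (n x r, orthonormal columns). The package's own
# normalization may differ.
pc_loadings <- function(X, r) {
  X <- as.matrix(X)
  svd(X, nu = 0, nv = r)$v
}

set.seed(3)
X_toy   <- matrix(rnorm(60 * 8), nrow = 60, ncol = 8)
Lambda0 <- pc_loadings(X_toy, r = 2)
dim(Lambda0)
#> [1] 8 2
round(crossprod(Lambda0), 10)  # the 2 x 2 identity: columns are orthonormal
```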
Additionally, several rotation diagnostics are accessible via `rotation_diagnostics`: (1) the rotation matrix, $R$, such that $\hat{\Lambda}^* = \Lambda^0 R$; (2) the value of the l1 norm for each loading vector; and (3) the frequency with which the minimization problem converges to each of the estimated loading vectors.
```r
lf$rotation_diagnostics
```
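To see what minimizing the l1 norm over rotations means, the sketch below handles the case $r = 2$ only. Every candidate loading vector is $\Lambda^0 (\cos t, \sin t)'$, so a brute-force grid over the angle $t$ traces out the objective. This grid search is our own stand-in, not the optimizer used by `local_factors()`, and the toy loadings are assumed for illustration.

```r
# Brute-force l1-rotation sketch for r = 2 (not the package's optimizer)
Lambda_star <- cbind(c(rep(1, 5), rep(0, 5)),   # factor 1: variables 1-5
                     c(rep(0, 5), rep(2, 5)))   # factor 2: variables 6-10
alpha   <- pi / 6
Q       <- matrix(c(cos(alpha), sin(alpha), -sin(alpha), cos(alpha)), nrow = 2)
Lambda0 <- Lambda_star %*% Q   # a dense rotation of the sparse loadings

# l1 norm of the candidate vector Lambda0 %*% (cos t, sin t) on a 1-degree grid;
# t in [0, pi] suffices because the sign of the vector does not matter
angles  <- seq(0, pi, length.out = 181)
l1_norm <- sapply(angles, function(t) sum(abs(Lambda0 %*% c(cos(t), sin(t)))))

best <- which.min(l1_norm)
angles[best] * 180 / pi   # angle of the minimizing unit vector, in degrees
l1_norm[best]             # equals the l1 norm of Lambda_star[, 1]
round(Lambda0 %*% c(cos(angles[best]), sin(angles[best])), 10)  # -Lambda_star[, 1]
```

The global minimum recovers the first column of `Lambda_star` up to sign. The objective also has a local minimum at 60 degrees that recovers the second column, so a local search started from different points can end in either basin.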
For a visual interpretation of this rotation, we provide tile plots contrasting the initial estimate $\Lambda^0$ (`pc_plot`) with the rotated estimate $\hat{\Lambda}^*$ (`rotated_plot`).
```r
lf$pc_plot
```
In the initial estimate, each factor is simply a principal component. Of the 207 variables in $X$, variables 90 to 120 have the most negative loadings on the first factor, and almost all other variables load slightly negatively on it. On the second factor, variables 122 to 207 have positive loadings and the remaining variables have slightly negative loadings. Because every loading is nonzero, it is difficult to interpret the relationship between variables and factors.
```r
lf$rotated_plot
```
The second estimate is the loading matrix rotated for sparsity using the l1-rotation criterion. It is easier to interpret: most variables between 90 and 207 load negatively on the first factor, variables 1 to 120 load negatively on the second factor, and all other loadings are close to zero.
`l1rotation` supplies two additional functions, `find_local_factors()` and `test_local_factors()`, which support the main `local_factors()` function.
### `find_local_factors()` {#flf}

This function takes the same inputs as `local_factors()`, `X` and `r`, and has an additional argument, `initial_loadings`, that allows the user to specify any orthonormal basis of the loadings rather than defaulting to the principal components estimator. Alternative initial estimates include, for example, maximum likelihood estimates or Sparse Orthogonal Factor Regression (Uematsu et al. (2019)).
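Alternative estimators do not always return orthonormal columns. A minimal sketch, assuming the estimate has full column rank: a QR decomposition yields an orthonormal basis of the same column space. The helper `orthonormalize()` and the random stand-in matrix are hypothetical, and whether `find_local_factors()` requires exactly this step is an assumption based only on the text above.

```r
# Hypothetical helper: orthonormal basis of the column space of a
# full-rank n x r loading estimate
orthonormalize <- function(L) qr.Q(qr(L))

set.seed(4)
L_alt  <- matrix(rnorm(10 * 2), nrow = 10, ncol = 2)  # stand-in for an alternative estimate
L_orth <- orthonormalize(L_alt)
round(crossprod(L_orth), 10)   # the 2 x 2 identity

# Same column space: the projection matrices A (A'A)^{-1} A' coincide
P <- function(A) A %*% solve(crossprod(A), t(A))
all.equal(P(L_alt), P(L_orth))
#> [1] TRUE
```

The result could then be supplied as `initial_loadings` to `find_local_factors()` together with the matching `X` and `r`.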
### `test_local_factors()` {#tlf}

This function tests for the presence of local factors given a sparse basis of the loading space. It takes `X` and `r` as input, plus an optional argument, `loadings`, that lets the user specify the loading matrix to be tested. This argument defaults to `NULL`, in which case the function estimates $\hat{\Lambda}^*$ (by minimizing the l1-rotation criterion) and tests it for local factors.
To construct this diagnostic for a given loading matrix estimate, $\hat{\Lambda}$, we find the column with the largest number of entries smaller in absolute value than some threshold $h_n$:

$$ \mathcal{L}(\hat{\Lambda}) = \max_k \left( \sum_{i=1}^n \mathbb{1}\{ |\hat{\lambda}_{ik}| < h_n \} \right), $$

where $\hat{\lambda}_{ik}$ is the $(i, k)$ entry of $\hat{\Lambda}$. We then check whether the number of these "small" loadings is at least $\gamma n$:

$$ \texttt{has_local_factors} = \mathbb{1}\{ \mathcal{L}(\hat{\Lambda}) \geq \gamma n \}. $$
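The two formulas translate directly into a few lines of R. In this sketch, `small_loadings_diagnostic()` is a hypothetical helper, and `h_n` and `gamma` are illustrative inputs; the values `test_local_factors()` uses internally are not reproduced here.

```r
# Hypothetical helper implementing L(Lambda_hat) and has_local_factors
small_loadings_diagnostic <- function(Lambda_hat, h_n, gamma) {
  n       <- nrow(Lambda_hat)
  n_small <- colSums(abs(Lambda_hat) < h_n)   # "small" entries per column
  L_stat  <- max(n_small)                      # L(Lambda_hat)
  list(n_small = n_small,
       L = L_stat,
       has_local_factors = as.integer(L_stat >= gamma * n))
}

# Sparse loadings: each column is zero for half of the 10 variables
Lambda_sparse <- cbind(c(rep(1, 5), rep(0, 5)), c(rep(0, 5), rep(2, 5)))
small_loadings_diagnostic(Lambda_sparse, h_n = 0.1, gamma = 0.3)$has_local_factors
#> [1] 1

# A dense rotation of the same loadings has no small entries
theta <- pi / 6
Q <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), nrow = 2)
small_loadings_diagnostic(Lambda_sparse %*% Q, h_n = 0.1, gamma = 0.3)$has_local_factors
#> [1] 0
```

This mirrors the comparison that follows: the same loading space can look free of local factors in one basis and clearly contain them in a sparser one.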
Returning to our `lf` results, we can look at the value of `has_local_factors`:

```r
lf$has_local_factors
```
This value is the result of `test_local_factors()`. To verify, we can call `test_local_factors()` on two different estimates: the principal components estimate, `initial_loadings`, and the l1-rotation estimate, `rotated_loadings`.
```r
# Check for local factors in the PC estimate...
test_pc_estimate <- test_local_factors(X = example_data, r = 2, loadings = lf$initial_loadings)

# ...and in the rotated estimate
test_rot_estimate <- test_local_factors(X = example_data, r = 2, loadings = lf$rotated_loadings)

test_pc_estimate$has_local_factors
test_rot_estimate$has_local_factors
```
Thus, no local factors are detected in the principal components estimate, while the rotated estimate reveals two local factors. We can also visualize the number of small loadings using the `small_loadings_plot` element of our `lf` results:
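```r
# Threshold h_n used by the test, rounded for display
round_hn <- round(test_rot_estimate$h_n, digits = 3)

# Add a title and a caption stating the threshold to the stored ggplot
lf$small_loadings_plot +
  ggplot2::labs(
    title = 'Number of "small" loadings per factor',
    caption = paste('"Small" is defined as loadings with absolute value less than', round_hn)
  )
```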