lsspca: Computes LS SPCA components using different variable...
In merolagio/LSSPCA: Computes Least Squares Sparse Principal Components

For each component, the variables are selected so that the sparse component explains a proportion alpha (for example, 95% when alpha = 0.95) of the variance explained by the corresponding principal component.
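The selection criterion can be illustrated with a minimal base R sketch. It is not the package's algorithm: it assumes that "explaining a proportion alpha" can be approximated by the R-squared of regressing the first PC scores on a subset of variables, and it uses a plain greedy forward search on the built-in `mtcars` data. The names `selected`, `r2` and `r2_try` are illustrative only.

```r
# Illustration only: greedy forward selection until the chosen variables
# explain at least a proportion alpha of the first principal component.
X <- scale(as.matrix(mtcars), center = TRUE, scale = TRUE)
pc1 <- prcomp(X)$x[, 1]          # scores of the first principal component
alpha <- 0.95

selected <- integer(0)           # indices of the variables chosen so far
r2 <- 0
while (r2 < alpha && length(selected) < ncol(X)) {
  candidates <- setdiff(seq_len(ncol(X)), selected)
  # R-squared of regressing the PC scores on each enlarged subset
  r2_try <- sapply(candidates, function(j) {
    summary(lm(pc1 ~ X[, c(selected, j)]))$r.squared
  })
  selected <- c(selected, candidates[which.max(r2_try)])
  r2 <- max(r2_try)
}

colnames(X)[selected]            # variables retained for the first component
r2                               # proportion of the PC explained by them
```

Raising alpha towards 1 makes the loop keep more variables, which is the trade-off between sparsity and variance explained that alpha controls.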

lsspca(X, alpha = 0.95, maxcard = 0, ncomps = 4,
spcaMethod = "u", scalex = FALSE,
variableSelection = c("exhaustive", "seqrep", "backward", "forward", "lasso"),
really.big = FALSE, force.in = NULL, force.out = NULL, selectfromthese = NULL,
lsspca_forLasso = TRUE, lasso_penalty = 0.5)

`X`	The data matrix.
`alpha`	Real in [0,1]. Proportion of the variance explained by each PC that the corresponding sparse component must explain.
`maxcard`	An integer or a vector of integers giving the maximum cardinality of the components. If the vector is shorter than ncomps, missing values are filled with the last value.
`ncomps`	Number of components to compute.
`spcaMethod`	Character vector specifying how the LS SPCA components are computed: "u" for uncorrelated, "c" for correlated and "p" for projection. If only one value is given, the same method is used for all components.
`scalex`	Logical, whether to scale the variables to unit variance. Variables are centered to zero mean (if needed) even if scalex = FALSE.
`variableSelection`	How the variables for each component are selected: 'exhaustive' (all subsets), 'seqrep' (stepwise), 'backward', 'forward' or 'lasso'.
`really.big`	Logical. Set to TRUE if the matrix is large, for faster variable selection; exhaustive search is not used in that case.
`force.in`	NULL or list of indices of the variables that must be in each component. Not used with lasso. Default NULL.
`force.out`	NULL or list of indices of the variables that cannot be in each component. Default NULL.
`selectfromthese`	NULL or list of indices of the variables from which each component is chosen. Default NULL.
`lsspca_forLasso`	Logical. If TRUE, LS SPCA is computed on the indices selected with the lasso; if FALSE, only the lasso regression is used.
`lasso_penalty`	Real between 0 and 1: 0 gives ridge regression, 1 gives the lasso.

For USPCA (spcaMethod = "u"), maxcard cannot be smaller than the order of the component being computed, so maxcard = c(1, 1, 1) is automatically changed to maxcard = c(1, 2, 3). Exhaustive search can be slow for matrices with 30 or more variables. See the documentation of leaps::regsubsets and glmnet::glmnet for the variable selection options.
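The handling of maxcard described above can be sketched in a few lines of base R. `expand_maxcard` is a hypothetical helper written for this illustration, not a function exported by LSSPCA. It assumes positive entries and does not cover the default maxcard = 0, whose meaning is not documented here.

```r
# Hypothetical helper (not part of LSSPCA) mimicking the documented behaviour.
expand_maxcard <- function(maxcard, ncomps, uncorrelated = TRUE) {
  # fill missing entries with the last supplied value
  if (length(maxcard) < ncomps) {
    last <- maxcard[length(maxcard)]
    maxcard <- c(maxcard, rep(last, ncomps - length(maxcard)))
  }
  maxcard <- maxcard[seq_len(ncomps)]
  # for uncorrelated components, the j-th component needs at least j variables
  if (uncorrelated) {
    maxcard <- pmax(maxcard, seq_len(ncomps))
  }
  maxcard
}

expand_maxcard(c(1, 1, 1), ncomps = 3)                    # 1 2 3
expand_maxcard(2, ncomps = 4)                             # 2 2 3 4
expand_maxcard(2, ncomps = 4, uncorrelated = FALSE)       # 2 2 2 2
```

The last call shows that, under this sketch, only the uncorrelated method raises the requested cardinalities.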

A list with the following elements:

loadings: Matrix with the loadings scaled to unit L_2 norm.
contributions: Matrix of loadings scaled to unit L_1 norm.
ncomps: integer number of components computed. Default is 4.
cardinality: Vector with the cardinality of each component's loadings.
ind: List with the indices of the non-zero loadings for each component.
loadingslist: A list with only the nonzero loadings for each component.
vexp: Vector with the % variance explained by each component.
vexpPC: Vector with the % variance explained by each principal component.
cvexp: Vector with the % cumulative variance explained by each component.
rcvexp: Vector with the cumulative variance explained by each component as a % of the cumulative variance explained by the PCs.
scores: Matrix with the SPC scores.
PCloadings: Matrix with the PCs loadings scaled to unit L_2 norm.
PCscores: Matrix with the PC scores.
spcaMethod: Method used to compute the sparse loadings.
corComp: Matrix of correlations among the sparse components. Only if spcaMethod != "u" and ncomps > 1.
Call: The call with its arguments.
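How the loadings-related elements relate to each other can be shown on a toy example. The matrix `A` below contains made-up numbers and the code only mirrors the definitions given in the list (unit L_2 norm, unit L_1 norm, cardinality, indices of non-zero loadings); it is a sketch and does not call lsspca.

```r
# Toy sparse loadings: 4 variables, 2 components (made-up numbers)
A <- cbind(c(3, 4, 0, 0), c(0, 0, 1, -1))

# loadings: columns scaled to unit L_2 norm
loadings <- sweep(A, 2, sqrt(colSums(A^2)), "/")
# contributions: columns scaled to unit L_1 norm
contributions <- sweep(A, 2, colSums(abs(A)), "/")

# cardinality: number of non-zero loadings in each component
cardinality <- colSums(A != 0)
# ind: indices of the non-zero loadings for each component
ind <- lapply(seq_len(ncol(A)), function(j) which(A[, j] != 0))
# loadingslist: only the non-zero loadings of each component
loadingslist <- lapply(seq_len(ncol(A)), function(j) loadings[ind[[j]], j])

round(loadings, 2)
round(contributions, 2)
cardinality
```

Contributions can be read as signed shares of each variable in a component, since their absolute values sum to one within each column.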

Giovanni Merola

Giovanni M. Merola. 2014. Least Squares Sparse Principal Component Analysis: a Backward Elimination approach to attain large loadings. Australian & New Zealand Journal of Statistics 57, pp. 391–429.

Giovanni M. Merola and Gemai Chen. 2019. Sparse Principal Component Analysis: an efficient Least Squares approach. Journal of Multivariate Analysis 173, pp. 366–382. http://arxiv.org/abs/1406.1381

## Not run: 
library(LSSPCA)
data(hitters)

dim(hitters)
## USPCA 95
hit_uspca95 = lsspca(X = hitters, alpha = 0.95, ncomps = 4,
                     spcaMethod = "u", variableSelection = "exhaustive")
#> Warning message:
#>  In log(vr) : NaNs produced
## this warning comes from the variable selection and can be ignored

##  print contributions (only.nonzero)
print_spca(hit_uspca95)

## summaries
summary_spca(hit_uspca95, contributions = TRUE, digits = 1)

## print loadings individually
lapply(hit_uspca95$loadingslist, function(x) round(x, 2))
## print contributions individually
lapply(hit_uspca95$loadingslist, function(x) round(x/sum(abs(x)), 2))

## plot PC and USPC loadings
par(mfrow = c(1, 2))
barplot(-hit_uspca95$PCloadings[, 1], main = "PCA")
barplot(-hit_uspca95$loadings[, 1], main = "USPCA")
par(mfrow = c(1,1))

## Holzinger data
data(holzinger)
dim(holzinger)

## CSPCA
hol_cspca95 = lsspca(X = holzinger, alpha = 0.95, ncomps = 4,
                     spcaMethod = "c", variableSelection = "exhaustive")

## summaries
t(data.frame(card = hol_cspca95$cardinality,
             cvexp = round(hol_cspca95$cvexp, 2),
             rcvexp = round(hol_cspca95$rcvexp, 2)))

## print loadings
lapply(hol_cspca95$loadingslist, function(x) round(x, 2))
## print contributions
lapply(hol_cspca95$loadingslist, function(x) round(x/sum(abs(x)), 2))

## correlation between SPCs
round(hol_cspca95$corComp, 2)

## plot contributions
barplot(-hol_cspca95$contributions[, 1])

## SPCs scores against PC scores
plot(hol_cspca95$scores[, 1], hol_cspca95$PCscores[, 1], pch = 16)
regline = coef(lm(hol_cspca95$PCscores[, 1] ~ hol_cspca95$scores[, 1] - 1))
abline(a = 0, b = regline, col = 2)


## SPCA on each ability separately
h_groups = lapply(seq(1, 10, 3), function(x) x:(x + 2))

## projection SPCA
hol_block_spca95 = lsspca(X = holzinger, alpha = 0.95, ncomps = 4,
                          spcaMethod = "p", variableSelection = "exhaustive",
                          selectfromthese = h_groups)

## summaries
t(data.frame(card = hol_block_spca95$cardinality,
             cvexp = round(hol_block_spca95$cvexp, 2),
             rcvexp = round(hol_block_spca95$rcvexp, 2)))

## print loadings
lapply(hol_block_spca95$loadingslist, function(x) round(x, 2))

## print contributions
lapply(hol_block_spca95$loadingslist, function(x) round(x/sum(abs(x)), 2))

## correlation between SPCs
round(hol_block_spca95$corComp, 2)

## plot the contributions for each SPC
par(mfrow = c(2, 2))
for(k in 1:4){
  barplot(-hol_block_spca95$contributions[, k])
}
par(mfrow = c(1, 1))

## End(Not run)