lsspca: Computes LS SPCA components using different variable...

View source: R/lsspca.R

Description

For each component, the variables are selected so as to explain a percentage alpha of the variance explained by the corresponding principal component.
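The selection rule above can be illustrated with a minimal base-R sketch. It assumes that "variance explained" is measured by the R² of regressing a principal component on the chosen variables, and it uses plain greedy forward selection; `select_for_pc()` is a hypothetical helper written for this illustration, not a function of the package, and the package's own search (via leaps or glmnet) may differ.

```r
# Illustrative only: greedy forward selection of variables until the
# regression of the first PC on them reaches R^2 >= alpha.
# select_for_pc() is a hypothetical helper, not part of LSSPCA.
select_for_pc <- function(X, alpha = 0.95) {
  X <- scale(X, center = TRUE, scale = FALSE)   # centre the columns
  pc1 <- prcomp(X)$x[, 1]                       # scores of the first PC
  chosen <- integer(0)
  r2 <- 0
  while (r2 < alpha && length(chosen) < ncol(X)) {
    candidates <- setdiff(seq_len(ncol(X)), chosen)
    # R^2 obtained by adding each remaining variable in turn
    r2_new <- sapply(candidates, function(j) {
      fit <- lm(pc1 ~ X[, c(chosen, j), drop = FALSE])
      suppressWarnings(summary(fit)$r.squared)
    })
    best <- which.max(r2_new)
    chosen <- c(chosen, candidates[best])
    r2 <- r2_new[best]
  }
  list(indices = chosen, r2 = r2)
}

sel <- select_for_pc(as.matrix(mtcars), alpha = 0.95)
sel$indices   # columns selected for the first component
sel$r2        # share of the PC's variance they reproduce
```

A larger alpha generally keeps more variables in the component, so alpha trades sparsity against fidelity to the principal component.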

Usage

lsspca(X, alpha = 0.95, maxcard = 0, ncomps = 4,
spcaMethod = "u", scalex = FALSE,
variableSelection = c("exhaustive", "seqrep", "backward", "forward", "lasso"),
really.big = FALSE, force.in = NULL, force.out = NULL, selectfromthese = NULL,
lsspca_forLasso = TRUE, lasso_penalty = 0.5)

Arguments

X

The data matrix.

alpha

Real in [0, 1]. Proportion of the variance explained by each PC that the corresponding sparse component must explain.

maxcard

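An integer or a vector of integers giving the maximum cardinality of the components (see Details). If fewer values than components are given, the missing values are filled with the last value.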

ncomps

Number of components to compute.

spcaMethod

Character vector specifying how the LS SPCA components are computed: "u" for uncorrelated, "c" for correlated and "p" for projection. If only one value is given, the same method is used for all components.

scalex

Logical, whether to scale the variables to unit variance. Default is FALSE. Variables are centred to zero mean (if needed) even if scalex = FALSE.

variableSelection

How the variables for each component are selected: "exhaustive" (all subsets), "seqrep" (stepwise), "backward", "forward" or "lasso".

really.big

Logical. Set to TRUE for faster variable selection when the matrix is large. Exhaustive search is then not available.

force.in

NULL or a list of indices of variables that must be in each component. Not used with lasso. Default is NULL.

force.out

NULL or a list of indices of variables that cannot be in each component. Default is NULL.

selectfromthese

NULL or a list of indices of variables from which each component is chosen. Default is NULL.

lsspca_forLasso

Logical. If TRUE, LS SPCA loadings are computed on the indices selected with the lasso; if FALSE, only the lasso regression is used.

lasso_penalty

Real between 0 and 1: 0 gives ridge regression, 1 gives the lasso.
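The 0-to-1 scale of lasso_penalty matches the usual elastic-net mixing parameter. The sketch below assumes, without confirmation from the source, that the value is used like the `alpha` argument of glmnet::glmnet, whose penalty is (1 - alpha)/2 times the squared L2 norm plus alpha times the L1 norm of the coefficients. `enet_penalty()` and `mix` are hypothetical names used only here.

```r
# Elastic-net penalty in the glmnet parameterisation (illustration only).
# 'mix' plays the role that lasso_penalty is assumed to play.
enet_penalty <- function(beta, mix) {
  (1 - mix) / 2 * sum(beta^2) + mix * sum(abs(beta))
}

beta <- c(0.5, -1, 0)          # made-up coefficients
enet_penalty(beta, mix = 0)    # ridge term only: 0.625
enet_penalty(beta, mix = 1)    # lasso term only: 1.5
enet_penalty(beta, mix = 0.5)  # the default, an equal mix of the two
```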

Details

For USPCA, maxcard cannot be smaller than the order of the component being computed, so maxcard = c(1, 1, 1) is automatically changed to maxcard = c(1, 2, 3). Exhaustive search can be slow for matrices with 30 or more variables. See the documentation for leaps::regsubsets and glmnet::glmnet for the options.
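The adjustment of maxcard described above can be sketched in a few lines. `expand_maxcard()` is a hypothetical helper that mirrors the documented behaviour (fill with the last value, then raise each entry to at least the order of its component); it is not the package's internal code, and it does not handle the default maxcard = 0.

```r
# Hypothetical helper mirroring the documented behaviour of maxcard.
expand_maxcard <- function(maxcard, ncomps, uncorrelated = TRUE) {
  # fill missing entries with the last value supplied
  if (length(maxcard) < ncomps) {
    last <- maxcard[length(maxcard)]
    maxcard <- c(maxcard, rep(last, ncomps - length(maxcard)))
  }
  maxcard <- maxcard[seq_len(ncomps)]
  # for USPCA the j-th component needs at least j variables
  if (uncorrelated) maxcard <- pmax(maxcard, seq_len(ncomps))
  maxcard
}

expand_maxcard(c(1, 1, 1), ncomps = 3)   # 1 2 3
expand_maxcard(2, ncomps = 4)            # 2 2 3 4
```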

Value

A list with the following elements:

loadings

Matrix with the loadings scaled to unit L_2 norm.

contributions

Matrix of loadings scaled to unit L_1 norm.

ncomps

Integer, the number of components computed. Default is 4.

cardinality

Vector with the cardinality of each component's loadings.

ind

List with the indices of the non-zero loadings for each component.

loadingslist

A list with only the nonzero loadings for each component.

vexp

Vector with the % variance explained by each component.

vexpPC

Vector with the % variance explained by each principal component.

cvexp

Vector with the % cumulative variance explained by each component.

rcvexp

Vector with the ratio (in %) of the cumulative variance explained by the sparse components to that explained by the corresponding PCs.

scores

The scores of the sparse components (SPCs).

PCloadings

Matrix with the PCs loadings scaled to unit L_2 norm.

PCscores

The scores of the principal components (PCs).

spcaMethod

Method used to compute the sparse loadings.

corComp

Matrix of correlations among the sparse components. Only if spcaMethod != "u" and ncomps > 1.

Call

The call with its arguments.
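How loadings, contributions, cardinality, ind and loadingslist relate to one another can be shown on a single toy component. The numbers below are made up for illustration and the variable names are not taken from the package.

```r
# Toy unnormalised loadings for one component (made-up values).
a <- c(3, 0, -4, 0)

loading      <- a / sqrt(sum(a^2))  # unit L2 norm: 0.6 0 -0.8 0
contribution <- a / sum(abs(a))     # unit L1 norm: absolute values sum to 1
ind          <- which(a != 0)       # indices of the nonzero loadings: 1 3
card         <- length(ind)         # cardinality: 2
nonzero      <- loading[ind]        # what one entry of loadingslist would hold

round(contribution, 2)
```

Because contributions have unit L1 norm, their absolute values can be read as each variable's share of the component.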

Author(s)

Giovanni Merola

References

Giovanni M. Merola. 2014. Least Squares Sparse Principal Component Analysis: a Backward Elimination approach to attain large loadings. Australian & New Zealand Journal of Statistics 57, pp. 391-429.

Giovanni M. Merola and Gemai Chen. 2019. Sparse Principal Component Analysis: an efficient Least Squares approach. Journal of Multivariate Analysis 173, pp. 366-382. http://arxiv.org/abs/1406.1381

Examples

## Not run: 
library(LSSPCA)
data(hitters)

dim(hitters)
## USPCA 95
hit_uspca95 = lsspca(X = hitters, alpha = 0.95, ncomps = 4,
                     spcaMethod = "u", variableSelection = "exhaustive")
#> Warning message:
#>  In log(vr) : NaNs produced
## this warning comes from the variable selection and can be ignored

##  print contributions (only.nonzero)
print_spca(hit_uspca95)

## summaries
summary_spca(hit_uspca95, contributions = TRUE, digits = 1)

## print loadings individually
lapply(hit_uspca95$loadingslist, function(x) round(x, 2))
## print contributions individually
lapply(hit_uspca95$loadingslist, function(x) round(x/sum(abs(x)), 2))

## plot PC and USPC loadings
par(mfrow = c(1, 2))
barplot(-hit_uspca95$PCloadings[, 1], main = "PCA")
barplot(-hit_uspca95$loadings[, 1], main = "USPCA")
par(mfrow = c(1,1))

## Holzinger data
data(holzinger)
dim(holzinger)

## CSPCA
hol_cspca95 = lsspca(X = holzinger, alpha = 0.95, ncomps = 4,
                     spcaMethod = "c", variableSelection = "exhaustive")

## summaries
t(data.frame(card = hol_cspca95$cardinality,
             cvexp = round(hol_cspca95$cvexp, 2),
             rcvexp = round(hol_cspca95$rcvexp, 2)))

## print loadings
lapply(hol_cspca95$loadingslist, function(x) round(x, 2))
## print contributions
lapply(hol_cspca95$loadingslist, function(x) round(x/sum(abs(x)), 2))

## correlation between SPCs
round(hol_cspca95$corComp, 2)

## plot contributions
barplot(-hol_cspca95$contributions[, 1])

## SPCs scores against PC scores
plot(hol_cspca95$scores[, 1], hol_cspca95$PCscores[, 1], pch = 16)
regline = lm(hol_cspca95$PCscores[, 1] ~ hol_cspca95$scores[, 1]- 1)$coef
abline(a = 0, b = regline, col = 2)


## SPCA on each ability separately
h_groups = lapply(seq(1, 10, 3), function(x) x:(x + 2))

## projection SPCA
hol_block_spca95 = lsspca(X = holzinger, alpha = 0.95, ncomps = 4,
                          spcaMethod = "p", variableSelection = "exhaustive",
                          selectfromthese = h_groups)

## summaries
t(data.frame(card = hol_block_spca95$cardinality,
             cvexp = round(hol_block_spca95$cvexp, 2),
             rcvexp = round(hol_block_spca95$rcvexp, 2)))

## print loadings
lapply(hol_block_spca95$loadingslist, function(x) round(x, 2))

## print contributions
lapply(hol_block_spca95$loadingslist, function(x) round(x/sum(abs(x)), 2))

## correlation between SPCs
round(hol_block_spca95$corComp, 2)

## plot the contributions for each SPC
par(mfrow = c(2, 2))
for(k in 1:4){
  barplot(-hol_block_spca95$contributions[, k])
}
par(mfrow = c(1, 1))

## End(Not run)

merolagio/LSSPCA documentation built on April 29, 2021, 4:17 p.m.