scads: Sparse SCA/PCA with constraints on the component weights and/or ridge and lasso regularization


View source: R/RcppExports.R

Description

This function performs sparse SCA/PCA with constraints on the component weights and/or ridge and lasso regularization.

Usage

scads(
  X,
  ncomp,
  ridge,
  lasso,
  constraints,
  itr,
  Wstart,
  tol = 1e-07,
  nStarts = 1L,
  printLoss = TRUE
)

Arguments

X

A data matrix of class matrix

ncomp

The number of components to estimate (an integer)

ridge

A numeric value containing the ridge parameter for ridge regularization on the component weight matrix W

lasso

A vector containing a lasso parameter for each column of W separately. To set the same lasso penalty for all component weights in W, specify: lasso = rep(value, ncomp)

constraints

A matrix of the same dimensions as the component weight matrix W (ncol(X) x ncomp). A zero entry in constraints corresponds to an element in the same location in W that is constrained to zero; a non-zero entry corresponds to an element in the same location in W that is freely estimated.
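For illustration, a minimal sketch of such a constraints matrix (hypothetical sizes, not taken from the package) that fixes the first two weights of the second component to zero:

J <- 4
ncomp <- 2
constraints <- matrix(1, J, ncomp)  # all weights are freely estimated
constraints[1:2, 2] <- 0            # W[1, 2] and W[2, 2] are constrained to zero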

itr

The maximum number of iterations (an integer)

Wstart

A matrix with ncol(X) rows and ncomp columns containing starting values for the component weight matrix W. If Wstart contains only zeros, a warm start is used: the first ncomp right singular vectors of X
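As a sketch of what the all-zero warm start amounts to (an illustration of the description above, not the package's internal code), with X and ncomp as in the Examples:

Wstart <- matrix(0, ncol(X), ncomp)          # all zeros: triggers the warm start
warmW  <- svd(X)$v[, 1:ncomp, drop = FALSE]  # first ncomp right singular vectors of X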

tol

Convergence is determined by comparing the loss function value after each iteration; if the difference is smaller than tol, the analysis has converged. The default value is 1e-7.

nStarts

The number of random starts the analysis should perform. The first start is performed with the values given in Wstart; each subsequent start uses Wstart plus a matrix of random uniform values multiplied by the current start number (the first start has index zero).
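A minimal sketch of this multi-start scheme (an illustration of the description above, not the package's internal code; makeStart is a hypothetical helper):

makeStart <- function(Wstart, k) {
  # start k = 0 returns Wstart unchanged; later starts add uniform noise scaled by k
  Wstart + k * matrix(runif(length(Wstart)), nrow(Wstart), ncol(Wstart))
}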

printLoss

A boolean: if TRUE, the loss function value is printed every 10th iteration.

Value

A list containing:
W: A matrix containing the component weights
P: A matrix containing the loadings
loss: A numeric value containing the minimum loss function value over all nStarts starts
converged: A boolean: TRUE if the analysis converged, FALSE if not
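Assuming the usual weight-based SCA/PCA model, in which the data are approximated by X W P' (an assumption consistent with the reference below, not a statement of the package's internals), the returned matrices can be used like this, with X and ncomp as in the Examples:

results <- scads(X, ncomp = ncomp, ridge = 0, lasso = rep(0, ncomp),
                 constraints = matrix(1, ncol(X), ncomp),
                 Wstart = matrix(0, ncol(X), ncomp), itr = 1e5)
scores <- X %*% results$W          # component scores
Xhat   <- scores %*% t(results$P)  # approximation of X under the assumed model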

References

De Schipper, N. C., & Van Deun, K. (2018). Revealing the Joint Mechanisms in Traditional Data Linked With Big Data. Zeitschrift Für Psychologie, 226(4), 212–231. doi:10.1027/2151-2604/a000341

Examples

J <- 30
X <- matrix(rnorm(100*J), 100, J)
ncomp <- 3 
constraints <- matrix(1, J, ncomp) # No constraints 

scads(X, ncomp = ncomp, ridge = 10e-8, lasso = rep(1, ncomp), 
        constraints = constraints, Wstart = matrix(0, J, ncomp), itr = 10e5)
        
# Extended examples:
# Example 1: Perform PCA with elastic net regularization, no constraints
#create sample dataset
ncomp <- 3 
J <- 30
comdis <- matrix(1, J, ncomp)
comdis <- sparsify(comdis, 0.7) #set 70% of the 1's to zero
variances <- makeVariance(varianceOfComps = c(100, 80, 70), J = J, error = 0.05) #create realistic eigenvalues
dat <- makeDat(n = 100, comdis = comdis, variances = variances)
X <- dat$X

results <- scads(X = X, ncomp = ncomp, ridge = 0.1, lasso = rep(0.1, ncomp),
                constraints = matrix(1, J, ncomp), Wstart = matrix(0, J, ncomp),
                itr = 100000, nStarts = 1, printLoss = TRUE , tol = 10^-8)

head(results$W) #inspect results of the estimation
head(dat$P[, 1:ncomp]) #inspect data generating model


# Example 2: Perform SCA with lasso regularization, trying out all common and distinctive structures
# create sample data, with common and distinctive structure
ncomp <- 3 
J <- 30
comdis <- matrix(1, J, ncomp)
comdis[1:15, 1] <- 0   # component 1: zero weights for the first block (variables 1-15)
comdis[16:30, 2] <- 0  # component 2: zero weights for the second block (variables 16-30)

comdis <- sparsify(comdis, 0.2) #set 20 percent of the 1's to zero
variances <- makeVariance(varianceOfComps = c(100, 80, 90), J = J, error = 0.05) #create realistic eigenvalues
dat <- makeDat(n = 100, comdis = comdis, variances = variances)
X <- dat$X

#generate all possible common and distinctive structures
allstructures <- allCommonDistinctive(vars = c(15, 15), ncomp = 3, allPermutations = TRUE, filterZeroSegments = TRUE)

#Use cross-validation to look for the data generating structure 
index <- rep(NA, length(allstructures))
for (i in 1:length(allstructures)) {
    print(i)
    index[i] <- CVforPCAwithSparseWeights(X = X, nrFolds = 10, FUN = scads, ncomp, ridge = 0, lasso = rep(0.01, ncomp),
                constraints = allstructures[[i]], Wstart = matrix(0, J, ncomp),
                itr = 100000, nStarts = 1, printLoss = FALSE, tol = 10^-5)$MSPE
}

#Do the analysis with the "winning" structure
results <- scads(X = X, ncomp = ncomp, ridge = 0.1, lasso = rep(0.1, ncomp),
                constraints = allstructures[[which.min(index)]], Wstart = matrix(0, J, ncomp),
                itr = 100000, nStarts = 1, printLoss = TRUE , tol = 10^-5)

head(results$W) #inspect results of the estimation
head(dat$P[, 1:ncomp]) #inspect data generating model
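
# Optional follow-up sketch (base R only; assumes the data-generating loadings in
# dat$P contain exact zeros where comdis is zero -- an assumption, not guaranteed):
estimatedZero <- results$W == 0
trueZero <- dat$P[, 1:ncomp] == 0
mean(estimatedZero == trueZero)  # proportion of matching zero/non-zero entries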
