scads: Sparse SCA/PCA with constraints on the component weights and/or ridge and lasso regularization


View source: R/RcppExports.R

Description

This function performs sparse SCA/PCA with constraints on the component weights and/or ridge and lasso regularization.

Usage

scads(
  X,
  ncomp,
  ridge,
  lasso,
  constraints,
  itr,
  Wstart,
  tol = 1e-07,
  nStarts = 1L,
  printLoss = TRUE
)

Arguments

X

A data matrix of class matrix

ncomp

The number of components to estimate (an integer)

ridge

A numeric value containing the ridge parameter for ridge regularization on the component weight matrix W

lasso

A vector containing a lasso parameter for each column of W separately. To set the same lasso penalty for all component weights in W, specify: lasso = rep(value, ncomp)

constraints

A matrix of the same dimensions as the component weight matrix W (ncol(X) x ncomp). A zero entry in constraints corresponds to an element in the same location in W that is constrained to zero; a non-zero entry corresponds to an element in the same location in W that is freely estimated.
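For illustration, a minimal sketch of such a constraints matrix (hypothetical sizes, not taken from the package) that fixes the first two weights of the second component to zero:

J <- 4
ncomp <- 2
constraints <- matrix(1, J, ncomp)  # all weights are freely estimated
constraints[1:2, 2] <- 0            # W[1, 2] and W[2, 2] are constrained to zero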

itr

The maximum number of iterations (an integer)

Wstart

A matrix with ncol(X) rows and ncomp columns containing starting values for the component weight matrix W. If Wstart contains only zeros, a warm start is used: the first ncomp right singular vectors of X
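As a sketch of what the all-zero warm start amounts to (an illustration of the description above, not the package's internal code), with X and ncomp as in the Examples:

Wstart <- matrix(0, ncol(X), ncomp)          # all zeros: triggers the warm start
warmW  <- svd(X)$v[, 1:ncomp, drop = FALSE]  # first ncomp right singular vectors of X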

tol

Convergence is determined by comparing the loss function value after each iteration; if the difference is smaller than tol, the analysis has converged. The default value is 1e-7.

nStarts

The number of random starts the analysis should perform. The first start is performed with the values given in Wstart; each subsequent start uses Wstart plus a matrix of random uniform values multiplied by the current start number (the first start has index zero).
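A minimal sketch of this multi-start scheme (an illustration of the description above, not the package's internal code; makeStart is a hypothetical helper):

makeStart <- function(Wstart, k) {
  # start k = 0 returns Wstart unchanged; later starts add uniform noise scaled by k
  Wstart + k * matrix(runif(length(Wstart)), nrow(Wstart), ncol(Wstart))
}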

printLoss

A boolean: if TRUE, the loss function value is printed every 10th iteration.

Value

A list containing:
W: A matrix containing the component weights
P: A matrix containing the loadings
loss: A numeric value containing the minimum loss function value over all nStarts starts
converged: A boolean: TRUE if the analysis converged, FALSE if not
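Assuming the usual weight-based SCA/PCA model, in which the data are approximated by X W P' (an assumption consistent with the reference below, not a statement of the package's internals), the returned matrices can be used like this, with X and ncomp as in the Examples:

results <- scads(X, ncomp = ncomp, ridge = 0, lasso = rep(0, ncomp),
                 constraints = matrix(1, ncol(X), ncomp),
                 Wstart = matrix(0, ncol(X), ncomp), itr = 1e5)
scores <- X %*% results$W          # component scores
Xhat   <- scores %*% t(results$P)  # approximation of X under the assumed model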

References

De Schipper, N. C., & Van Deun, K. (2018). Revealing the Joint Mechanisms in Traditional Data Linked With Big Data. Zeitschrift Für Psychologie, 226(4), 212–231. doi:10.1027/2151-2604/a000341

Examples

J <- 30
X <- matrix(rnorm(100*J), 100, J)
ncomp <- 3 
constraints <- matrix(1, J, ncomp) # No constraints 

scads(X, ncomp = ncomp, ridge = 10e-8, lasso = rep(1, ncomp), 
        constraints = constraints, Wstart = matrix(0, J, ncomp), itr = 10e5)
        
# Extended examples:
# Example 1: Perform PCA with elastic net regularization, no constraints
#create sample dataset
ncomp <- 3 
J <- 30
comdis <- matrix(1, J, ncomp)
comdis <- sparsify(comdis, 0.7) #set 70% of the 1's to zero
variances <- makeVariance(varianceOfComps = c(100, 80, 70), J = J, error = 0.05) #create realistic eigenvalues
dat <- makeDat(n = 100, comdis = comdis, variances = variances)
X <- dat$X

results <- scads(X = X, ncomp = ncomp, ridge = 0.1, lasso = rep(0.1, ncomp),
                constraints = matrix(1, J, ncomp), Wstart = matrix(0, J, ncomp),
                itr = 100000, nStarts = 1, printLoss = TRUE , tol = 10^-8)

head(results$W) #inspect results of the estimation
head(dat$P[, 1:ncomp]) #inspect data generating model


# Example 2: Perform SCA with lasso regularization, trying out all common and distinctive structures
# create sample data, with common and distinctive structure
ncomp <- 3 
J <- 30
comdis <- matrix(1, J, ncomp)
comdis[1:15, 1] <- 0   # component 1: zero weights for the first block (variables 1-15)
comdis[16:30, 2] <- 0  # component 2: zero weights for the second block (variables 16-30)

comdis <- sparsify(comdis, 0.2) #set 20 percent of the 1's to zero
variances <- makeVariance(varianceOfComps = c(100, 80, 90), J = J, error = 0.05) #create realistic eigenvalues
dat <- makeDat(n = 100, comdis = comdis, variances = variances)
X <- dat$X

#generate all possible common and distinctive structures
allstructures <- allCommonDistinctive(vars = c(15, 15), ncomp = 3, allPermutations = TRUE, filterZeroSegments = TRUE)

#Use cross-validation to look for the data generating structure 
index <- rep(NA, length(allstructures))
for (i in 1:length(allstructures)) {
    print(i)
    index[i] <- CVforPCAwithSparseWeights(X = X, nrFolds = 10, FUN = scads, ncomp, ridge = 0, lasso = rep(0.01, ncomp),
                constraints = allstructures[[i]], Wstart = matrix(0, J, ncomp),
                itr = 100000, nStarts = 1, printLoss = FALSE, tol = 10^-5)$MSPE
}

#Do the analysis with the "winning" structure
results <- scads(X = X, ncomp = ncomp, ridge = 0.1, lasso = rep(0.1, ncomp),
                constraints = allstructures[[which.min(index)]], Wstart = matrix(0, J, ncomp),
                itr = 100000, nStarts = 1, printLoss = TRUE , tol = 10^-5)

head(results$W) #inspect results of the estimation
head(dat$P[, 1:ncomp]) #inspect data generating model
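
# Optional follow-up sketch (base R only; assumes the data-generating loadings in
# dat$P contain exact zeros where comdis is zero -- an assumption, not guaranteed):
estimatedZero <- results$W == 0
trueZero <- dat$P[, 1:ncomp] == 0
mean(estimatedZero == trueZero)  # proportion of matching zero/non-zero entries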
