pca_full: A wrapper for PCAMV (MATLAB) function implementations

Description

Implements the PPCA algorithms from Ilin and Raiko (2010), previously available only in MATLAB. One element of the output is a pcaRes object, which provides an interface between PCAMV and pcaMethods.

Usage

pca_full(
  X,
  ncomp = NA,
  algorithm = "vb",
  maxiters = 1000,
  bias = TRUE,
  rotate2pca = TRUE,
  loglike = TRUE,
  verbose = TRUE
)

Arguments

X

matrix – Data matrix with observations in columns and variables in rows. The data may contain missing values, denoted as NA or NaN.

ncomp

numeric – Number of components used for re-estimation. Choosing too few components may decrease the estimation precision. Setting to NA results in ncomp = min(n, p) - 1, which will be slow for large data.

algorithm

c("ppca", "map", "vb") – the algorithm to be used for estimation, see Details.

maxiters

numeric – Maximum number of estimation steps.

bias

logical – should the mean be estimated?

rotate2pca

logical – should the solution be rotated to a PCA basis? See Details.

loglike

logical – should the log-likelihood of the estimated parameters be returned? See Details.

verbose

logical – verbose intermediary algorithm output.
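The layout expected for X and the default for ncomp can be illustrated with a small base R sketch. The matrix dat and its dimensions are made up for illustration, and n and p are assumed here to be the number of observations and variables respectively; the sketch does not call pca_full itself.

```r
# Hypothetical data in the common R layout:
# 6 observations (rows) by 4 variables (columns)
set.seed(1)
dat <- matrix(rnorm(24), nrow = 6, ncol = 4)
dat[2, 3] <- NA    # missing values may be coded as NA ...
dat[5, 1] <- NaN   # ... or as NaN

# The X argument is documented as variables in rows and
# observations in columns, so transpose before passing it on
X <- t(dat)
p <- nrow(X)  # number of variables (assumed meaning of p)
n <- ncol(X)  # number of observations (assumed meaning of n)

# Value documented for ncomp when it is left as NA
ncomp_default <- min(n, p) - 1

# is.na() is TRUE for both NA and NaN entries
n_missing <- sum(is.na(X))
```

For large data this default grows with the smaller dimension of X, which is why setting ncomp explicitly is usually preferable.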

Details

The algorithm argument selects the model: 'ppca' for PPCA, 'vb' for BPCA using a variational approximation, or 'map' for a variational approximation that ignores posterior uncertainty (for faster computation). See Ilin and Raiko (2010) for the full models. Setting rotate2pca = TRUE performs a post-estimation rotation of the scores and loadings matrices so that they satisfy the PCA conditions of orthonormality; see Ilin and Raiko (2010) for the derivations. loglike indicates whether log-likelihood values for the resulting estimates should be computed, which can be useful for comparing different algorithms.
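The idea behind rotating to a PCA basis can be sketched in base R. This is a generic SVD-based rotation under simplifying assumptions (zero mean, no missing values) and is not claimed to be the procedure pca_full uses internally; W, Z and their dimensions are made up for illustration.

```r
# Hypothetical loadings W (p x k) and scores Z (k x n)
set.seed(1)
p <- 8; k <- 3; n <- 20
W <- matrix(rnorm(p * k), p, k)
Z <- matrix(rnorm(k * n), k, n)

# The fitted values W %*% Z are unchanged if W and Z are rotated
# consistently, so a convenient basis can be picked with an SVD
s <- svd(W %*% Z, nu = k, nv = k)

W_rot <- s$u                           # loadings with orthonormal columns
Z_rot <- diag(s$d[1:k], k) %*% t(s$v)  # scores with mutually orthogonal rows

# Checks: same reconstruction, orthonormal loadings, orthogonal scores
same_fit    <- isTRUE(all.equal(W_rot %*% Z_rot, W %*% Z))
orthonormal <- isTRUE(all.equal(crossprod(W_rot), diag(k)))
G <- tcrossprod(Z_rot)
max_offdiag <- max(abs(G[upper.tri(G)]))
```

Because only the basis changes, the rotated and unrotated solutions describe the same fitted values; the rotated one is simply easier to compare with a standard PCA.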

Value

A list of 6 or 8 elements, depending on the value of loglike:

W

matrix – the estimated loadings.

sigmaSq

numeric – the estimated isotropic variance.

Sigma

matrix – the estimated covariance matrix.

m

numeric – the estimated mean vector.

logLikeObs

numeric – the log-likelihood value of the observed data given the estimated parameters.

logLikeImp

numeric – the log-likelihood value of the imputed data given the estimated parameters.

m

numeric – the number of iterations taken to converge.

pcaMethodsRes

class – see pcaRes.
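To make the meaning of an observed-data log-likelihood concrete, the following base R sketch evaluates a Gaussian log-likelihood over only the observed entries of each column, given a mean vector and covariance matrix. The helper loglik_observed is hypothetical and is not part of pcaNet; the package's own computation of logLikeObs may differ in detail.

```r
# Hypothetical helper (not a pcaNet function): Gaussian log-likelihood
# of the observed entries of X (variables in rows, observations in
# columns), given mean vector m and covariance matrix Sigma
loglik_observed <- function(X, m, Sigma) {
  total <- 0
  for (j in seq_len(ncol(X))) {
    obs <- !is.na(X[, j])            # observed variables in column j
    if (!any(obs)) next              # nothing observed: no contribution
    d <- X[obs, j] - m[obs]
    S <- Sigma[obs, obs, drop = FALSE]
    R <- chol(S)                     # S = t(R) %*% R
    # quadratic form t(d) %*% solve(S) %*% d via a triangular solve
    quad <- sum(backsolve(R, d, transpose = TRUE)^2)
    logdet <- 2 * sum(log(diag(R)))
    total <- total - 0.5 * (sum(obs) * log(2 * pi) + logdet + quad)
  }
  total
}

# Small usage example with one missing entry
X_small <- matrix(c(0.5, NA, 1, 2), nrow = 2, ncol = 2)
ll <- loglik_observed(X_small, m = c(0, 0), Sigma = diag(2))
```

With an identity covariance, as in the usage example, the result reduces to a sum of univariate standard normal log-densities over the observed entries, which gives a simple sanity check.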

References

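Ilin, A. and Raiko, T., 2010. Practical approaches to principal component analysis in the presence of missing values. Journal of Machine Learning Research, 11, pp. 1957-2000.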

Examples

# simulate a dataset from a zero mean factor model X = Wz + epsilon
# start off by generating a random binary connectivity matrix
n.factors <- 5
n.genes <- 200
# with dense connectivity
# set.seed(20)
conn.mat <- matrix(rbinom(n = n.genes*n.factors,
                          size = 1, prob = 0.7), c(n.genes, n.factors))

# now generate a loadings matrix from this connectivity
loading.gen <- function(x){
  ifelse(x==0, 0, rnorm(1, 0, 1))
}

W <- apply(conn.mat, c(1, 2), loading.gen)

# generate factor matrix
n.samples <- 100
z <- replicate(n.samples, rnorm(n.factors, 0, 1))

# generate a noise matrix
sigma.sq <- 0.1
epsilon <- replicate(n.samples, rnorm(n.genes, 0, sqrt(sigma.sq)))

# by the ppca equations this gives us the data matrix
X <- W%*%z + epsilon
WWt <- tcrossprod(W)
Sigma <- WWt + diag(sigma.sq, n.genes)

# select 10% of entries to make missing values
missFrac <- 0.1
inds <- sample(x = 1:length(X),
               size = ceiling(length(X)*missFrac),
               replace = FALSE)

# replace them with NAs in the dataset
missing.dataset <- X
missing.dataset[inds] <- NA

# run ppca
ppf <- pca_full(missing.dataset, ncomp = 5, algorithm = "vb", maxiters = 5,
                bias = TRUE, rotate2pca = FALSE, loglike = TRUE, verbose = TRUE)

HGray384/pcaNet documentation built on Nov. 14, 2020, 11:11 a.m.