bpcapM: Bayesian PCA ('pcaMethods' version)

Description Usage Arguments Details Value References See Also Examples

View source: R/bpcapM.R

Description

Implements a Bayesian PCA missing value estimator, as in pcaMethods. Use of Rcpp makes this version faster and the emphasised output is the covariance matrix Sigma, which can be used for network reconstruction.

Usage

1
2
3
4
5
6
7
8
bpcapM(
  myMat,
  nPcs = NA,
  threshold = 1e-04,
  maxIterations = 100,
  loglike = TRUE,
  verbose = TRUE
)

Arguments

myMat

matrix – Pre-processed matrix (centered, scaled) with variables in columns and observations in rows. The data may contain missing values, denoted as NA.

nPcs

numeric – Number of components used for re-estimation. Choosing few components may decrease the estimation precision.

threshold

numeric – Convergence threshold. If the increase in precision of an update falls below this then the algorithm is stopped.

maxIterations

numeric – Maximum number of estimation steps.

loglike

logical – should the log-likelihood of the estimated parameters be returned? See Details.

verbose

logical – verbose intermediary algorithm output.

Details

Details about the probabilistic model underlying BPCA are found in Oba et. al 2003. The algorithm uses an expectation maximation approach together with a Bayesian model to approximate the principal axes (eigenvectors of the covariance matrix in PCA). The estimation is done iteratively, the algorithm terminates if either the maximum number of iterations is reached or if the estimated increase in precision falls below 1e^-4.

Value

A list of 4 elements:

W

matrix – the estimated loadings.

sigmaSq

numeric – the estimated isotropic variance.

Sigma

matrix – the estimated covariance matrix.

pcaMethodsRes

class – see pcaRes.

References

Oba, S., Sato, M.A., Takemasa, I., Monden, M., Matsubara, K.I. and Ishii, S., 2003. doi.

Stacklies, W., Redestig, H., Scholz, M., Walther, D. and Selbig, J., 2007. doi.

See Also

pcapM

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
# simulate a dataset from a zero mean factor model X = Wz + epsilon
# start off by generating a random binary connectivity matrix
n.factors <- 5
n.genes <- 200
# with dense connectivity
# set.seed(20)
conn.mat <- matrix(rbinom(n = n.genes*n.factors,
                          size = 1, prob = 0.7), c(n.genes, n.factors))

# now generate a loadings matrix from this connectivity
loading.gen <- function(x){
  ifelse(x==0, 0, rnorm(1, 0, 1))
}

W <- apply(conn.mat, c(1, 2), loading.gen)

# generate factor matrix
n.samples <- 100
z <- replicate(n.samples, rnorm(n.factors, 0, 1))

# generate a noise matrix
sigma.sq <- 0.1
epsilon <- replicate(n.samples, rnorm(n.genes, 0, sqrt(sigma.sq)))

# by the ppca equations this gives us the data matrix
X <- W%*%z + epsilon
WWt <- tcrossprod(W)
Sigma <- WWt + diag(sigma.sq, n.genes)

# select 10% of entries to make missing values
missFrac <- 0.1
inds <- sample(x = 1:length(X),
               size = ceiling(length(X)*missFrac),
               replace = FALSE)

# replace them with NAs in the dataset
missing.dataset <- X
missing.dataset[inds] <- NA

# run bpca
bp <- bpcapM(t(missing.dataset), nPcs = 5)
names(bp)

# sigmasq estimation
abs(bp$sigmaSq-sigma.sq)

# X reconstruction
recon.X <- bp$pcaMethodsRes@loadings%*%t(bp$pcaMethodsRes@scores)
norm(recon.X-X, type="F")^2/(length(X))

# covariance estimation
norm(bp$Sigma-Sigma, type="F")^2/(length(X))

HGray384/pcaNet documentation built on Nov. 14, 2020, 11:11 a.m.