Description Usage Arguments Details Value References See Also Examples
Implements a Bayesian PCA missing value estimator, as in pcaMethods.
Use of Rcpp makes this version faster and
the emphasised output is the covariance matrix Sigma
, which
can be used for network reconstruction.
1 2 3 4 5 6 7 8 |
myMat |
|
nPcs |
|
threshold |
|
maxIterations |
|
loglike |
|
verbose |
|
Details about the probabilistic model underlying BPCA are found in Oba et. al 2003. The algorithm uses an expectation maximation approach together with a Bayesian model to approximate the principal axes (eigenvectors of the covariance matrix in PCA). The estimation is done iteratively, the algorithm terminates if either the maximum number of iterations is reached or if the estimated increase in precision falls below 1e^-4.
A list
of 4 elements:
matrix
– the estimated loadings.
numeric
– the estimated isotropic variance.
matrix
– the estimated covariance matrix.
class
–
see pcaRes
.
Oba, S., Sato, M.A., Takemasa, I., Monden, M., Matsubara, K.I. and Ishii, S., 2003. doi.
Stacklies, W., Redestig, H., Scholz, M., Walther, D. and Selbig, J., 2007. doi.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 | # simulate a dataset from a zero mean factor model X = Wz + epsilon
# start off by generating a random binary connectivity matrix
n.factors <- 5
n.genes <- 200
# with dense connectivity
# set.seed(20)
conn.mat <- matrix(rbinom(n = n.genes*n.factors,
size = 1, prob = 0.7), c(n.genes, n.factors))
# now generate a loadings matrix from this connectivity
loading.gen <- function(x){
ifelse(x==0, 0, rnorm(1, 0, 1))
}
W <- apply(conn.mat, c(1, 2), loading.gen)
# generate factor matrix
n.samples <- 100
z <- replicate(n.samples, rnorm(n.factors, 0, 1))
# generate a noise matrix
sigma.sq <- 0.1
epsilon <- replicate(n.samples, rnorm(n.genes, 0, sqrt(sigma.sq)))
# by the ppca equations this gives us the data matrix
X <- W%*%z + epsilon
WWt <- tcrossprod(W)
Sigma <- WWt + diag(sigma.sq, n.genes)
# select 10% of entries to make missing values
missFrac <- 0.1
inds <- sample(x = 1:length(X),
size = ceiling(length(X)*missFrac),
replace = FALSE)
# replace them with NAs in the dataset
missing.dataset <- X
missing.dataset[inds] <- NA
# run bpca
bp <- bpcapM(t(missing.dataset), nPcs = 5)
names(bp)
# sigmasq estimation
abs(bp$sigmaSq-sigma.sq)
# X reconstruction
recon.X <- bp$pcaMethodsRes@loadings%*%t(bp$pcaMethodsRes@scores)
norm(recon.X-X, type="F")^2/(length(X))
# covariance estimation
norm(bp$Sigma-Sigma, type="F")^2/(length(X))
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.