Description Usage Arguments Details Value References Examples
Implements a probabilistic PCA missing value estimator, as in pcaMethods.
Use of Rcpp makes this version faster and
the emphasised output is the covariance matrix Sigma
, which
can be used for network reconstruction.
1 2 3 4 5 6 7 8 9 |
myMat |
|
nPcs |
|
seed |
|
threshold |
|
maxIterations |
|
loglike |
|
verbose |
|
Details about the probabilistic model underlying PPCA are found in Bishop 1999. The algorithm (Porta, 2005) uses an expectation maximisation approach together with a probabilistic model to approximate the principal axes (eigenvectors of the covariance matrix in PCA). The estimation is done iteratively, the algorithm terminates if either the maximum number of iterations is reached or if the estimated increase in precision falls below 1e^-4.
A list
of 4 elements:
matrix
– the estimated loadings.
numeric
– the estimated isotropic variance.
matrix
– the estimated covariance matrix.
class
–
see pcaRes
.
Porta, J.M., Verbeek, J.J. and Kroese, B.J., 2005. link
Stacklies, W., Redestig, H., Scholz, M., Walther, D. and Selbig, J., 2007. doi.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 | # simulate a dataset from a zero mean factor model X = Wz + epsilon
# start off by generating a random binary connectivity matrix
n.factors <- 5
n.genes <- 200
# with dense connectivity
# set.seed(20)
conn.mat <- matrix(rbinom(n = n.genes*n.factors,
size = 1, prob = 0.7), c(n.genes, n.factors))
# now generate a loadings matrix from this connectivity
loading.gen <- function(x){
ifelse(x==0, 0, rnorm(1, 0, 1))
}
W <- apply(conn.mat, c(1, 2), loading.gen)
# generate factor matrix
n.samples <- 100
z <- replicate(n.samples, rnorm(n.factors, 0, 1))
# generate a noise matrix
sigma.sq <- 0.1
epsilon <- replicate(n.samples, rnorm(n.genes, 0, sqrt(sigma.sq)))
# by the ppca equations this gives us the data matrix
X <- W%*%z + epsilon
WWt <- tcrossprod(W)
Sigma <- WWt + diag(sigma.sq, n.genes)
# select 10% of entries to make missing values
missFrac <- 0.1
inds <- sample(x = 1:length(X),
size = ceiling(length(X)*missFrac),
replace = FALSE)
# replace them with NAs in the dataset
missing.dataset <- X
missing.dataset[inds] <- NA
# run ppca
pp <- ppcapM(t(missing.dataset), nPcs = 5)
names(pp)
# sigmasq estimation
abs(pp$sigmaSq-sigma.sq)
# X reconstruction
recon.X <- pp$pcaMethodsRes@loadings%*%t(pp$pcaMethodsRes@scores)
norm(recon.X-X, type="F")^2/(length(X))
# covariance estimation
norm(pp$Sigma-Sigma, type="F")^2/(length(X))
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.