pcapM: A wrapper for pcaMethods function implementations


View source: R/pcapM.R

Description

Implements the equivalent of the pcaMethods function pca. This function preprocesses the data as specified by the user, calls ppcapM or bpcapM, and returns the results as a list. One element of that list is a pcaRes object.

Usage

pcapM(
  myMat,
  nPcs = 2,
  method = "ppca",
  seed = NA,
  threshold = 1e-04,
  maxIterations = 1000,
  center = TRUE,
  scale = c("none"),
  loglike = TRUE,
  verbose = TRUE
)

Arguments

myMat

matrix – Data matrix with variables in columns and observations in rows. The data may contain missing values, denoted as NA.

nPcs

numeric – Number of components used for re-estimation. Choosing too few components may decrease the estimation precision.

method

c("ppca", "bpca") – frequentist or Bayesian estimation of model parameters.

seed

numeric – the random number seed. Specifying it is useful when comparing algorithms.

threshold

numeric – Convergence threshold. If the increase in precision of an update falls below this value, the algorithm stops.

maxIterations

numeric – Maximum number of estimation steps.

center

logical – should the data be centered?

scale

c("none", "pareto", "vector", "uv") – which method of scaling should be used? See pca. Defaults to "none".

loglike

logical – should the log-likelihood of the estimated parameters be returned? See Details.

verbose

logical – should intermediate algorithm output be printed?
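The center and scale arguments describe a preprocessing step applied before estimation. The sketch below illustrates what such a step could look like in base R. prep_sketch is a hypothetical helper, not part of the package, and the scaling definitions used here ("uv" divides by the column standard deviation, "pareto" by its square root, "vector" by the column's Euclidean norm) are assumptions based on common usage; the exact definitions and ordering used internally may differ.

```r
# Hypothetical helper (not part of pcaNet): centre and scale the columns of a
# matrix that may contain NAs, ignoring the missing entries.
# Assumed scaling definitions:
#   "uv"     -> divide each column by its standard deviation
#   "pareto" -> divide each column by the square root of its standard deviation
#   "vector" -> divide each column by its Euclidean norm
prep_sketch <- function(X, center = TRUE,
                        scale = c("none", "pareto", "vector", "uv")) {
  scale <- match.arg(scale)
  if (center) {
    # subtract the column means computed from the observed entries only
    X <- sweep(X, 2, colMeans(X, na.rm = TRUE), "-")
  }
  divisor <- switch(scale,
    none   = rep(1, ncol(X)),
    uv     = apply(X, 2, sd, na.rm = TRUE),
    pareto = sqrt(apply(X, 2, sd, na.rm = TRUE)),
    vector = sqrt(colSums(X^2, na.rm = TRUE))
  )
  sweep(X, 2, divisor, "/")
}

# observations in rows, variables in columns, one missing value
X <- matrix(c(1, 2, NA, 4,
              10, 20, 30, 40), ncol = 2)
X_uv <- prep_sketch(X, center = TRUE, scale = "uv")
```

Missing entries stay NA after preprocessing, so the result can still be passed to an estimation routine that handles missing values.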

Details

See ppcapM and bpcapM for the algorithm specifics. loglike indicates whether log-likelihood values for the resulting estimates should be computed. This can be useful to compare different algorithms.
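As an illustration of comparing algorithms through loglike, the following sketch fits both methods to the same small matrix and collects the observed-data log-likelihoods. It uses only the arguments and return elements documented on this page; the toy data are made up for the example, and the call is guarded so that it only runs when pcaNet is installed.

```r
# Toy data: 50 observations (rows) by 8 variables (columns)
set.seed(1)
X <- matrix(rnorm(50 * 8), nrow = 50, ncol = 8)

# Remove exactly one entry from each of the first 40 rows (10% missing),
# so that no observation is entirely missing
X[cbind(1:40, rep(1:8, times = 5))] <- NA

if (requireNamespace("pcaNet", quietly = TRUE)) {
  # fit the same data with the frequentist and the Bayesian method
  fits <- lapply(c(ppca = "ppca", bpca = "bpca"), function(meth) {
    pcaNet::pcapM(X, nPcs = 2, method = meth, seed = 2009,
                  loglike = TRUE, verbose = FALSE)
  })
  # observed-data log-likelihood of each fit, side by side
  print(sapply(fits, function(f) f$logLikeObs))
}
```

Fixing seed to the same value for both calls keeps the comparison reproducible.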

Value

A list of 5 or 7 elements, depending on the value of loglike:

W

matrix – the estimated loadings.

sigmaSq

numeric – the estimated isotropic variance.

Sigma

matrix – the estimated covariance matrix.

m

numeric – the estimated mean vector.

logLikeObs

numeric – the log-likelihood value of the observed data given the estimated parameters.

logLikeImp

numeric – the log-likelihood value of the imputed data given the estimated parameters.

pcaMethodsRes

pcaRes – the results as a pcaRes object; see pcaRes.
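To make the relationship between m, Sigma and logLikeObs concrete, the sketch below computes a Gaussian log-likelihood over the observed entries only, using the marginal distribution of each row's observed variables. obs_loglik is a hypothetical helper written for this illustration; it is an assumption about what an observed-data log-likelihood could look like, and the package's own computation may differ. The covariance is built as W W^T + sigmaSq I, as in the Examples section.

```r
# Hypothetical helper (not part of pcaNet): log-likelihood of the observed
# entries of X under a multivariate normal N(m, Sigma). For each row, only the
# sub-vector of m and the sub-block of Sigma for the observed variables are used.
obs_loglik <- function(X, m, Sigma) {
  total <- 0
  for (i in seq_len(nrow(X))) {
    o <- !is.na(X[i, ])
    k <- sum(o)
    if (k == 0) next                        # nothing observed in this row
    d <- X[i, o] - m[o]
    R <- chol(Sigma[o, o, drop = FALSE])    # Sigma_oo = t(R) %*% R
    z <- backsolve(R, d, transpose = TRUE)  # solves t(R) %*% z = d
    # log N(d; 0, Sigma_oo) = -0.5 * (k log(2 pi) + log det Sigma_oo + d' Sigma_oo^{-1} d)
    total <- total - 0.5 * (k * log(2 * pi) + 2 * sum(log(diag(R))) + sum(z^2))
  }
  total
}

# Small illustration with 3 variables and 1 component
W <- matrix(c(1, 0.5, -0.3), ncol = 1)
sigmaSq <- 0.1
Sigma <- tcrossprod(W) + diag(sigmaSq, 3)
m <- rep(0, 3)
X <- rbind(c(0.2, NA, -0.1),
           c(1.0, 0.4, NA))
ll <- obs_loglik(X, m, Sigma)
```

Working with the Cholesky factor avoids forming an explicit inverse of each covariance sub-block.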

Examples

# simulate a dataset from a zero mean factor model X = Wz + epsilon
# start off by generating a random binary connectivity matrix
n.factors <- 5
n.genes <- 200
# with dense connectivity
# set.seed(20)
conn.mat <- matrix(rbinom(n = n.genes*n.factors,
                          size = 1, prob = 0.7), c(n.genes, n.factors))

# now generate a loadings matrix from this connectivity
loading.gen <- function(x){
  ifelse(x==0, 0, rnorm(1, 0, 1))
}

W <- apply(conn.mat, c(1, 2), loading.gen)

# generate factor matrix
n.samples <- 100
z <- replicate(n.samples, rnorm(n.factors, 0, 1))

# generate a noise matrix
sigma.sq <- 0.1
epsilon <- replicate(n.samples, rnorm(n.genes, 0, sqrt(sigma.sq)))

# by the ppca equations this gives us the data matrix
X <- W%*%z + epsilon
WWt <- tcrossprod(W)
Sigma <- WWt + diag(sigma.sq, n.genes)

# select 10% of entries to make missing values
missFrac <- 0.1
inds <- sample(x = 1:length(X),
               size = ceiling(length(X)*missFrac),
               replace = FALSE)

# replace them with NAs in the dataset
missing.dataset <- X
missing.dataset[inds] <- NA

# run pcapM with Bayesian PCA (method = "bpca");
# the data are transposed so that observations are in rows
ppm <- pcapM(t(missing.dataset), nPcs=5, method="bpca", seed=2009,
             maxIterations=1000, center=TRUE, loglike=TRUE, verbose=TRUE)

HGray384/pcaNet documentation built on Nov. 14, 2020, 11:11 a.m.