vamf: Varying-Censoring Aware Matrix Factorization.


Description

VAMF is a probabilistic dimension reduction method intended for single cell RNA-Seq datasets.

Usage

vamf(Y, L, nrestarts = 4, log2trans = TRUE, pseudocount = 0,
  output_samples = 100, save_restarts = FALSE, svmult = 1)

Arguments

Y

Sparse matrix of gene expression measurements, with G genetic features (genes) in the rows and N samples (typically individual cells) in the columns.

L

Upper bound on the dimensionality of the latent space to be learned. Automatic relevance determination is used to shrink away unnecessary dimensions.

nrestarts

Number of independent random initializations of the algorithm. Restarts can be run in parallel by setting, e.g., options(mc.cores=4).

log2trans

Should the data be log2-transformed prior to analysis? Set to FALSE if the data have already been log transformed.

pseudocount

Optional small offset added to the data before log transformation.

output_samples

Number of samples drawn from the approximate posterior, used to estimate the posterior means of all parameters.

save_restarts

If multiple initializations are used, set to TRUE to return the list of results from all runs. If FALSE, only the best result, the one with the highest evidence lower bound (ELBO), is returned.

svmult

Scalar or vector of multipliers to increase or decrease the sigma_v scale hyperparameter.
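Because log2trans = FALSE expects data that have already been log transformed, you may prefer to do the transformation yourself. The sketch below is one way to do it. `log2_nonzero` is a hypothetical helper, not part of vamf, and the choice to leave zeros at exactly zero (instead of computing log2(0 + pseudocount)) is an assumption motivated by VAMF presumably treating zeros as censored values; it is not a description of what vamf does internally.

```r
# Hypothetical helper (not part of vamf): log2-transform only the nonzero
# entries of a genes x cells matrix, leaving zeros at exactly zero.
log2_nonzero <- function(Y, pseudocount = 1) {
  Y <- as.matrix(Y)                     # densify so logical indexing is simple
  nz <- Y > 0                           # positions of observed (nonzero) values
  Y[nz] <- log2(Y[nz] + pseudocount)    # offset then log2, nonzero entries only
  Y
}

# Toy 3 x 4 count matrix (genes in rows, cells in columns)
counts <- matrix(c(0, 1, 3, 0,
                   7, 0, 0, 15,
                   1, 1, 0, 0), nrow = 3, byrow = TRUE)
Ylog <- log2_nonzero(counts)
print(Ylog)
```

The result could then be passed as `vamf(Ylog, L = 5, log2trans = FALSE)`; to run restarts in parallel, set `options(mc.cores = 4)` beforehand, as described under nrestarts.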

Value

Named list of posterior means of the model parameters. The 'factors' and 'loadings' are analogous to PCA scores and loadings. Cell positions in the latent space can be plotted using the 'factors' matrix. If save_restarts is TRUE, a list of such lists is returned, one per VAMF run.
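When save_restarts = TRUE, you can reproduce the default "keep the best run" behavior yourself by comparing the elbo entries. A minimal sketch, assuming each element of the returned list carries an 'elbo' component as listed below; `best_restart` is a hypothetical helper and the toy list is placeholder data, not real vamf output.

```r
# Hypothetical helper: given the list returned when save_restarts = TRUE
# (one element per restart, each containing an 'elbo' entry), return the
# run with the highest evidence lower bound.
best_restart <- function(runs) {
  elbos <- vapply(runs, function(run) run$elbo, numeric(1))
  runs[[which.max(elbos)]]
}

# Toy stand-ins for vamf results (placeholder values, not real output)
runs <- list(
  list(elbo = -1520.4, effdim = 2),
  list(elbo = -1498.7, effdim = 2),
  list(elbo = -1611.2, effdim = 3)
)
best <- best_restart(runs)
best$elbo
```

Keeping all runs is mainly useful for checking whether different initializations agree on effdim before discarding the weaker ones.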

factors

NxL matrix whose columns are analogous to principal components. The L2 norm of each column indicates the importance (significance) of the component. The transposed orientation, with cells in rows, facilitates plotting.

loadings

LxG matrix whose rows are analogous to principal component loadings. The rows are orthonormal.

effdim

Effective dimensionality of the latent space, computed from the L2 norms of the columns of the 'factors' matrix.

elbo

Evidence lower bound, the objective function for variational inference. See the Stan user manual.

b0

Random intercepts of the censoring mechanism for each cell (vector of length N).

b1

Random slopes of the censoring mechanism for each cell (vector of length N).

U

Raw version of the factors matrix (without rotation or scaling), with dimensions LxN. Note that the 'factors' matrix is transposed relative to U.

V

Raw version of the loadings matrix (without rotation or scaling), with dimensions LxG.

w

Vector of length G with row-specific (gene-specific) random intercepts.

y0

Global intercept (scalar)

sy

Standard deviation of global noise (scalar)

sv

Standard deviations of each latent dimension (vector of length L). These are interpretable only together with U and V, not with 'factors' and 'loadings'.

svmult

Same as the input parameter 'svmult'.
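The two structural properties stated above, column L2 norms of 'factors' (from which effdim is computed) and orthonormal rows of 'loadings', can be checked directly on a result. The sketch below uses toy matrices with the documented shapes rather than real vamf output; `factor_norms` and `rows_orthonormal` are hypothetical helpers, and it does not attempt to reproduce vamf's exact effdim rule.

```r
# Column-wise L2 norms of an N x L 'factors' matrix: larger norms indicate
# more important latent dimensions.
factor_norms <- function(factors) sqrt(colSums(factors^2))

# TRUE if the rows of an L x G 'loadings' matrix are orthonormal,
# i.e. loadings %*% t(loadings) is (numerically) the L x L identity.
rows_orthonormal <- function(loadings, tol = 1e-8) {
  gram <- tcrossprod(loadings)
  isTRUE(all.equal(gram, diag(nrow(loadings)), tolerance = tol,
                   check.attributes = FALSE))
}

# Toy matrices with the documented shapes (N = 20 cells, L = 3, G = 60 genes)
set.seed(1)
loadings <- t(qr.Q(qr(matrix(rnorm(60 * 3), nrow = 60))))  # 3 x 60, orthonormal rows
factors  <- matrix(rnorm(20 * 3), nrow = 20) %*% diag(c(5, 2, 0.1))
factor_norms(factors)      # third dimension is much weaker than the others
rows_orthonormal(loadings)
```

With a real fit, `factor_norms(fit$factors)` would show which of the L requested dimensions survived the automatic relevance determination shrinkage.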

Examples

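library(vamf)
set.seed(100); N <- 20; G <- 60; Gsignal <- 20; Gnoise <- G - Gsignal
# True cell positions: N cells evenly spaced on a circle of radius 5
theta <- seq(from = 0, to = 2*pi, length.out = N)
true_cell_positions <- data.frame(dim1 = 5*cos(theta), dim2 = 5*sin(theta))
with(true_cell_positions, plot(dim1, dim2))
# Gsignal informative genes are random linear combinations of the positions;
# the remaining Gnoise genes are pure noise
informative_rows <- as.matrix(true_cell_positions) %*% matrix(2*rnorm(Gsignal*2), nrow = 2)
noise_rows <- matrix(.5*rnorm(Gnoise*N), nrow = Gnoise)
# G x N expression matrix with gene-specific offsets around 10
Y <- rbind(t(informative_rows), noise_rows) + rnorm(G) + 10
# Set roughly 20% of entries to zero to mimic dropouts
Z <- matrix(rbinom(G*N, 1, .8), nrow = G)
Y <- Y * Z
# PCA baseline (prcomp expects cells in rows)
pca_factors <- prcomp(t(Y), center = TRUE, scale. = TRUE)$x
plot(pca_factors[, 1:2])
# VAMF with up to 5 latent dimensions; the simulated values are used as-is
vamf_factors <- vamf(Y, 5, nrestarts = 2, log2trans = FALSE)$factors
# 'factors' is an N x L matrix, so plot its first two columns
plot(vamf_factors[, 1:2])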

willtownes/vamf documentation built on May 8, 2019, 9:31 a.m.