vamf: Varying-Censoring Aware Matrix Factorization.
Package: willtownes/vamf


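VAMF is a probabilistic dimension reduction method for single-cell RNA-seq data. It treats zero measurements as censored observations and allows the censoring mechanism to vary from cell to cell (see b0 and b1 below).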
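To make the idea of cell-specific ("varying") censoring concrete, here is a minimal simulation sketch. It assumes, for illustration only, that each cell n detects a value with a probability given by a logistic function of the latent log expression, using a cell-specific intercept b0[n] and slope b1[n], and that undetected values are recorded as zero. The helper simulate_censoring is hypothetical and is not part of the vamf package.

```r
# Hypothetical helper, not part of vamf: applies an assumed cell-specific
# logistic censoring mechanism to a G x N matrix of latent log expression.
simulate_censoring <- function(Ystar, b0, b1) {
  stopifnot(ncol(Ystar) == length(b0), length(b0) == length(b1))
  # linear predictor: column n uses intercept b0[n] and slope b1[n]
  eta <- sweep(sweep(Ystar, 2, b1, `*`), 2, b0, `+`)
  p_detect <- plogis(eta)                  # detection probability per entry
  Z <- matrix(rbinom(length(p_detect), 1, p_detect), nrow = nrow(Ystar))
  Ystar * Z                                # censored entries become zero
}

set.seed(1)
G <- 50; N <- 4
Ystar <- matrix(rnorm(G * N, mean = 2), nrow = G)
b0 <- c(-2, -1, 0, 1)   # cells differ in their baseline detection rate
b1 <- rep(1.5, N)       # higher expression -> more likely to be detected
Y <- simulate_censoring(Ystar, b0, b1)
colMeans(Y == 0)        # fraction of zeros per cell; tends to fall as b0 rises
```

Under this assumed mechanism, two cells with identical latent expression can show very different numbers of zeros, which is the kind of variation the per-cell b0 and b1 parameters are meant to absorb.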

vamf(Y, L, nrestarts = 4, log2trans = TRUE, pseudocount = 0, output_samples = 100, save_restarts = FALSE, svmult = 1)

`Y`	Sparse Matrix of gene expression measurements, with G genetic features (genes) in the rows and N samples (typically, individual cells) in the columns.
`L`	Upper bound on the dimensionality of the latent space to be learned. Automatic relevance determination is used to shrink away unnecessary dimensions.
`nrestarts`	Number of independent random initializations of the algorithm. Restarts can be run in parallel by setting, e.g., `options(mc.cores=4)`.
`log2trans`	Should the data be log2-transformed before analysis? Set to FALSE if the data have already been log transformed.
`pseudocount`	Optional small offset added to the data before log transformation.
`output_samples`	Number of samples drawn from the approximate (variational) posterior to estimate the posterior means of all parameters.
`save_restarts`	If multiple initializations are used, set this to TRUE to return the results of all runs. Set to FALSE to return only the best run, i.e. the one with the highest evidence lower bound (ELBO).
`svmult`	Scalar or vector of multipliers to increase or decrease the sigma_v scale hyperparameter.
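The interplay of log2trans and pseudocount can be sketched as follows. This is a minimal sketch under an assumption: that the log transform is applied only to the stored (nonzero) entries of the sparse matrix, so that zeros stay zero and remain censored, which would explain why a pseudocount of 0 is a workable default. The helper log2_nonzero is hypothetical, not the package's code; it uses the real `Matrix` package and the `x` slot of a compressed sparse column matrix.

```r
library(Matrix)

# Hypothetical helper, not part of vamf: log2-transform only the stored
# (nonzero) entries of a sparse matrix so that zeros stay zero.
log2_nonzero <- function(Y, pseudocount = 0) {
  Y <- as(Y, "CsparseMatrix")       # ensure compressed sparse column storage
  Y@x <- log2(Y@x + pseudocount)    # transform nonzero values only
  Y
}

# 2 genes x 3 cells of raw values; zeros are left untouched
counts <- Matrix(c(0, 4, 0, 2, 8, 0), nrow = 2, sparse = TRUE)
logY <- log2_nonzero(counts)
as.matrix(logY)
```

Under this assumption, a raw value of 1 becomes log2(1) = 0 when pseudocount = 0, i.e. the same value as a censored entry; a small positive pseudocount avoids that collision.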

Named list of posterior means for the model parameters. The 'factors' and 'loadings' are analogous to PCA output. Cell positions in the latent space can be plotted from the 'factors' matrix. If save_restarts is TRUE, a list of lists is returned, one per VAMF run.
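When save_restarts is TRUE, the best run can still be picked afterwards by its elbo element. The sketch below uses a hypothetical helper, best_restart, and a toy list standing in for real vamf output (the elbo values and the `run` field are made up for illustration); it presumably mirrors the ELBO-based choice that save_restarts = FALSE makes internally.

```r
# Hypothetical helper, not part of vamf: given the list returned with
# save_restarts = TRUE (one named list per run), keep the run whose
# evidence lower bound (elbo) is highest.
best_restart <- function(fits) {
  elbos <- vapply(fits, function(fit) fit$elbo, numeric(1))
  fits[[which.max(elbos)]]
}

# Toy stand-in for vamf output; only the elbo element is used here.
fits <- list(list(elbo = -1520.3, run = 1),
             list(elbo = -1498.7, run = 2),
             list(elbo = -1533.0, run = 3))
best_restart(fits)$run   # 2
```

With real output, `fits` would come from a call such as `vamf(Y, 5, nrestarts = 4, save_restarts = TRUE)`; keeping all runs is mainly useful for checking how consistent the restarts are.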

factors: NxL matrix whose columns are analogous to principal components. The L2 norm of each column indicates the relative importance of that component (a larger norm means a more important component). The transposed orientation (cells in rows) makes plotting easier.
loadings: LxG matrix whose rows are analogous to principal component loadings. The rows are orthonormal.
effdim: Effective dimensionality of the latent space, computed from the L2 norms of the columns of the 'factors' matrix.
elbo: Evidence lower bound, the objective function for variational inference. See the Stan user manual.
b0: Censoring mechanism random intercepts for each cell (vector of length N)
b1: Censoring mechanism random slopes for each cell (vector of length N)
U: Raw version of the factors matrix (without rotation and scaling), with dimension LxN. Note that the factors matrix has a transposed orientation relative to U.
V: Raw version of the loadings matrix (without rotation and scaling), with dimension LxG.
w: Vector of length G with gene-specific (row-specific) random intercepts.
y0: Global intercept (scalar)
sy: Standard deviation of global noise (scalar)
sv: Standard deviations of each latent dimension (vector of length L), interpretable only with U and V, not interpretable with 'factors' and 'loadings'
svmult: Same as the input parameter(s)
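The properties listed for 'factors' and 'loadings' (orthonormal loading rows, factor columns whose L2 norms rank the components) are exactly what a singular value decomposition of the low-rank product t(U) %*% V produces. The sketch below assumes that this is how the raw U and V are rotated and scaled; the package may do it differently, and the helper rotate_uv is hypothetical.

```r
# Hypothetical helper, not part of vamf: rotate raw U (L x N) and V (L x G)
# into PCA-like factors (N x L) and loadings (L x G) via an SVD.
rotate_uv <- function(U, V) {
  M <- crossprod(U, V)             # t(U) %*% V: N x G matrix of rank <= L
  L <- nrow(U)
  s <- svd(M, nu = L, nv = L)
  list(factors  = s$u %*% diag(s$d[1:L], nrow = L),  # N x L; column norms = singular values
       loadings = t(s$v))                             # L x G; orthonormal rows
}

set.seed(2)
L <- 3; N <- 10; G <- 8
U <- matrix(rnorm(L * N), nrow = L)
V <- matrix(rnorm(L * G), nrow = L)
rot <- rotate_uv(U, V)

norms <- sqrt(colSums(rot$factors^2))        # decreasing: importance of each component
round(rot$loadings %*% t(rot$loadings), 10)  # identity matrix: rows are orthonormal
```

Because the rotation leaves the product unchanged (factors %*% loadings equals t(U) %*% V), it only re-expresses the same fit in a PCA-like coordinate system; this is also why sv is interpretable with U and V but not with the rotated matrices.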

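library(vamf)
# Simulate N cells on a circle in a 2-D latent space; only Gsignal of the G genes carry signal
set.seed(100); N <- 20; G <- 60; Gsignal <- 20; Gnoise <- G - Gsignal
theta <- seq(from = 0, to = 2*pi, length.out = N)
true_cell_positions <- data.frame(dim1 = 5*cos(theta), dim2 = 5*sin(theta))
with(true_cell_positions, plot(dim1, dim2))
# Informative genes are random linear combinations of the two latent dimensions
informative_rows <- as.matrix(true_cell_positions) %*% matrix(2*rnorm(Gsignal*2), nrow = 2)
noise_rows <- matrix(.5*rnorm(Gnoise*N), nrow = Gnoise)
# Stack into a G x N matrix; rnorm(G) adds a random intercept per gene, plus a global offset of 10
Y <- rbind(t(informative_rows), noise_rows) + rnorm(G) + 10
# Censor about 20% of the entries at random by setting them to zero
Z <- matrix(rbinom(G*N, 1, .8), nrow = G)
Y <- Y*Z
# PCA for comparison (prcomp expects cells in rows)
pca_factors <- prcomp(t(Y), center = TRUE, scale = TRUE)$x
plot(pca_factors[, 1:2])
# Fit VAMF with at most L = 5 latent dimensions; the simulated data are already on a log scale
vamf_factors <- vamf(Y, 5, nrestarts = 2, log2trans = FALSE)$factors
plot(vamf_factors[, 1:2])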