VIPER: The main function that performs imputation for single cell...

Description

VIPER applies a weighted penalized regression model to actively select a sparse set of local neighborhood cells that are most predictive of the cell of interest. This sparse set is selected progressively. First, for each cell, a penalized regression model is fitted on a randomly sampled set of genes to identify candidate cells that are predictive of the expression of the cell of interest. Then a nonnegative regression model refines the list of neighborhood cells and estimates their imputation weights. In addition, VIPER explicitly accounts for the measurement uncertainty of zero values in scRNA-seq data by modeling the dropout probability in a cell-specific and gene-specific fashion.

Usage

VIPER(gene.expression, num = 5000, percentage.cutoff = 0.1,
  minbool = FALSE, alpha = 0.5, report = FALSE, outdir = NULL,
  prefix = NULL)
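
A minimal call might look like the sketch below. The simulated matrix, the seed, and the object names `counts` and `res` are illustrative assumptions, not part of the package, and the sketch does not establish how VIPER behaves on such toy data. The call is guarded so that the script still runs where VIPER is not installed.

```r
# Simulate a small p-by-n matrix (genes in rows, cells in columns).
# All values here are made up for illustration.
set.seed(1)
p <- 200   # number of genes
n <- 30    # number of cells
counts <- matrix(rpois(p * n, lambda = 2), nrow = p, ncol = n,
                 dimnames = list(paste0("gene", seq_len(p)),
                                 paste0("cell", seq_len(n))))

# Run VIPER only if the package is available; arguments not listed
# here keep the defaults shown in the usage above.
if (requireNamespace("VIPER", quietly = TRUE)) {
  res <- VIPER::VIPER(counts, num = 5000, percentage.cutoff = 0.1,
                      minbool = FALSE)
  str(res, max.level = 1)   # top-level elements of the returned list
}
```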

Arguments

gene.expression:

A p by n matrix of gene expression levels for p genes from n samples (cells). The method is not restricted to a particular unit of measurement and is applicable to all normalized measurements, such as RPM (reads per million mapped reads), TPM (transcripts per million) or RPKM (reads per kilobase per million mapped reads).

num:

The number of randomly sampled genes used to fit the penalized regression model that identifies the set of candidate cells. The default value is 5000. If the number of genes p in the dataset is less than the specified num, num will be set to 0.8*p.
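
The fallback rule for num can be written out as a small helper. The function name `effective_num` and the rounding down to a whole number are assumptions made for this sketch; the text above only states that num becomes 0.8*p.

```r
# Hypothetical helper: number of genes actually sampled for the
# penalized regression, given p genes in the data set.
effective_num <- function(p, num = 5000) {
  if (p < num) {
    # Fallback described above; floor() is an assumption so that the
    # result is a valid count of genes.
    floor(0.8 * p)
  } else {
    num
  }
}

effective_num(20000)  # 5000: enough genes, num is used as given
effective_num(3000)   # 2400: fewer genes than num, so 0.8 * 3000
```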

percentage.cutoff:

To reduce the influence of missing values on the weight estimation, the nonnegative regression model is fitted using only genes whose zero rate is below this threshold. The default value is 0.1 (10 percent).
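
The filtering step can be illustrated with base R. This sketch assumes that the zero rate of a gene is the fraction of cells in which its value is exactly zero; the toy matrix `expr` and the names `zero_rate` and `keep` are hypothetical and do not come from the package.

```r
# Toy p-by-n matrix: 4 genes (rows) by 10 cells (columns).
expr <- rbind(
  geneA = c(5, 3, 0, 2, 4, 6, 0, 3, 2, 5),  # 2 zeros in 10 cells: rate 0.2
  geneB = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 1),  # no zeros: rate 0
  geneC = c(0, 0, 0, 4, 0, 0, 7, 0, 0, 1),  # 7 zeros: rate 0.7
  geneD = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2)   # no zeros: rate 0
)
percentage.cutoff <- 0.1

zero_rate <- rowMeans(expr == 0)         # fraction of zero entries per gene
keep <- zero_rate < percentage.cutoff    # strict "less than", as in the text
rownames(expr)[keep]                     # "geneB" "geneD"
```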

minbool:

The criterion used to select the penalty level lambda in the penalized regression model. VIPER calls cv.glmnet() in glmnet to fit the model with cross-validation. Two penalty levels are available for selection: lambda.min, the value of lambda that gives the minimum mean cross-validated error, and lambda.1se, the value of lambda that gives the most regularized model whose error is within one standard error of the minimum. The default is lambda.1se, i.e., minbool = FALSE.
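
To make the two choices concrete, the sketch below fits cv.glmnet() on simulated data and extracts coefficients at the selected penalty level. It is not VIPER's internal code: the simulated matrix `x` (genes in rows, candidate cells in columns), the response `y`, and the name `lambda_rule` are assumptions for illustration, and the block only runs where glmnet is installed.

```r
# Map minbool to the name of the penalty level stored by cv.glmnet().
minbool <- FALSE
lambda_rule <- if (minbool) "lambda.min" else "lambda.1se"

if (requireNamespace("glmnet", quietly = TRUE)) {
  set.seed(1)
  # 100 genes (observations) by 20 candidate cells (predictors).
  x <- matrix(rnorm(100 * 20), nrow = 100, ncol = 20)
  # Expression of the cell of interest, driven by cells 1 and 2.
  y <- x[, 1] + 0.5 * x[, 2] + rnorm(100, sd = 0.5)

  cvfit <- glmnet::cv.glmnet(x, y, alpha = 0.5)
  # Coefficients (intercept plus 20 predictors) at the chosen lambda.
  beta <- as.matrix(coef(cvfit, s = lambda_rule))
  # Names of the terms with nonzero coefficients at that lambda.
  print(rownames(beta)[beta[, 1] != 0])
}
```

With minbool = TRUE the same code would read coefficients at lambda.min, which typically retains more predictors than lambda.1se.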

alpha:

The elastic net mixing parameter. The default value is 0.5; a value of 1 is equivalent to a lasso model.
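
For reference, the role of alpha can be read off the elastic net penalty in the form documented for glmnet. This formula is not stated on this page; the symbols \beta (regression coefficients) and \lambda (penalty level) are introduced here for illustration.

```latex
\lambda \left[ \frac{1 - \alpha}{2} \, \lVert \beta \rVert_2^2
             + \alpha \, \lVert \beta \rVert_1 \right],
\qquad 0 \le \alpha \le 1
```

Setting \alpha = 1 leaves only the \ell_1 term (lasso), and \alpha = 0 leaves only the squared \ell_2 term (ridge).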

report:

Whether to save the imputed data matrices as CSV files. The default value is FALSE.

outdir:

The directory in which to save the output.

prefix:

The prefix of the result files.

Value

A list of imputed data matrices and summary.

imputed_log

A p by n matrix of log transformed gene expression levels after imputation.

imputed

A p by n matrix of gene expression levels after imputation converted from log transformed values.

sample_weights

An n by n matrix of estimated imputation weights. Each row represents a cell.

outliers

The indexes of cells that have no selected candidate neighbors according to the penalized regression model. The zero values in these cells are not imputed.
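
The sketch below shows one way to read the returned list. It uses a mock object with two of the documented element names; all values are made up, and the interpretation that the nonzero entries in row i are the weights of cell i's selected neighbors is an assumption based on the description of sample_weights.

```r
# Mock result with the documented element names; values are invented.
n <- 4
res <- list(
  sample_weights = rbind(c(0,   0.6, 0.4, 0),
                         c(0.7, 0,   0.3, 0),
                         c(0.5, 0.5, 0,   0),
                         c(0,   0,   0,   0)),  # cell 4: no neighbors
  outliers = 4L
)

# Neighbors of cell 1 (columns with a positive weight in row 1).
which(res$sample_weights[1, ] > 0)   # 2 3

# Cells whose zeros were imputed: every cell not listed in outliers.
setdiff(seq_len(n), res$outliers)    # 1 2 3
```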


ChenMengjie/VIPER documentation built on June 15, 2019, 2:15 a.m.