EvaluateMethods: Imputation method evaluation on training set


View source: R/Wrap.R

Description

EvaluateMethods returns the best-performing imputation method for each gene in the dataset

Usage

EvaluateMethods(data, sce = NULL, do = c('Baseline', 'DrImpute',
'Network'), write = FALSE, train.ratio = .7, train.only = TRUE,
mask.ratio = .1, outdir = getwd(), scale = 1, pseudo.count = 1,
labels = NULL, cell.clusters = 2, drop_thre = NULL, type = 'count',
cores = BiocParallel::bpworkers(BPPARAM),
BPPARAM = BiocParallel::SnowParam(type = "SOCK"),
net.coef = ADImpute::network.coefficients, net.implementation = 'iteration',
tr.length = ADImpute::transcript_length, bulk = NULL, ...)

Arguments

data

matrix; normalized counts, not logged (genes as rows and samples as columns)

sce

SingleCellExperiment; normalized counts and associated metadata.

do

character; choice of methods to be used for imputation. Currently supported methods are 'Baseline', 'DrImpute' and 'Network'. Not case-sensitive. Can include one or more methods. Unsupported methods will be ignored.

write

logical; write intermediary and imputed objects to files?

train.ratio

numeric; ratio of samples to be used for training

train.only

logical; if TRUE, only a training dataset is defined; if FALSE, both training and validation sets are written and returned (defaults to TRUE)

mask.ratio

numeric; ratio of samples to be masked per gene

outdir

character; path to directory where output files are written. Defaults to working directory

scale

integer; scaling factor to divide all expression levels by (defaults to 1)

pseudo.count

integer; pseudo-count to be added to expression levels to avoid log(0) (defaults to 1)

labels

character; vector specifying the cell type of each column of data

cell.clusters

integer; number of cell subpopulations

drop_thre

numeric; between 0 and 1 specifying the threshold to determine dropout values

type

character; type of values in the expression matrix. Can be 'count' or 'TPM'

cores

integer; number of cores used for parallel computation

BPPARAM

parallel back-end to be used during parallel computation. See BiocParallelParam-class.

net.coef

matrix; network coefficients. Please provide if you don't want to use ADImpute's network model. The first column must be named 'O' and account for the intercept of the model; the rest must be an adjacency matrix with hgnc_symbols in rows and columns. Doesn't have to be square. See ADImpute::demo_net for a small example.

net.implementation

character; either 'iteration', for an iterative solution, or 'pseudoinv', to use Moore-Penrose pseudo-inversion as a solution. 'pseudoinv' is not advised for large datasets.

tr.length

matrix with at least 2 columns: 'hgnc_symbol' and 'transcript_length'

bulk

vector of reference bulk RNA-seq, if available (average across samples)

...

additional parameters to pass to network-based imputation
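
The layout described for net.coef can be sketched with a toy matrix. This is only an illustration: the gene symbols and coefficient values are made up, and reading rows as predicted genes and columns as predictors in a linear model with intercept 'O' is an assumption, not something this page states. ADImpute::demo_net remains the reference example.

```r
# Hypothetical toy network; all values are invented for illustration.
targets    <- c("TP53", "MYC")            # rows (assumed: genes being predicted)
predictors <- c("TP53", "MYC", "GAPDH")   # columns (assumed: predictor genes)

# First column 'O' holds the intercept; the rest is a non-square adjacency matrix.
net <- matrix(0,
              nrow = length(targets),
              ncol = length(predictors) + 1,
              dimnames = list(targets, c("O", predictors)))
net["TP53", c("O", "MYC", "GAPDH")] <- c(0.5, 0.2, -0.1)
net["MYC",  c("O", "TP53")]         <- c(1.0, 0.3)

# Assumed usage: intercept plus weighted sum of predictor expression levels.
expr_levels <- c(TP53 = 1.2, MYC = 0.4, GAPDH = 3.0)
pred <- net[, "O"] + drop(net[, predictors] %*% expr_levels[predictors])
pred
```

The matrix has 2 rows and 4 columns, so it is not square, and its first column is named 'O' as the argument description requires.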

Details

For each gene, a fraction (mask.ratio) of the quantified expression values is set to zero and imputed according to 3 different methods: DrImpute, Baseline (average gene expression across all cells) or a network-based method. For each method, the imputation error is computed on the values of the original dataset that were set to zero. For each gene, the method with the lowest imputation error is chosen.
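
The masking procedure can be sketched in a few lines of base R. This is a simplified illustration, not ADImpute's internal code: the toy data, the second method ("Median", a hypothetical stand-in) and the error measure (mean squared difference on the log2 scale with a pseudo-count) are all assumptions made for the example.

```r
# Illustrative sketch only; NOT the package's implementation.
set.seed(1)

# Toy normalized expression matrix: 4 genes (rows) x 10 cells (columns)
expr <- matrix(rpois(40, lambda = 20) + 1, nrow = 4,
               dimnames = list(paste0("gene", 1:4), paste0("cell", 1:10)))

mask.ratio   <- 0.2  # fraction of quantified values masked per gene
pseudo.count <- 1    # avoids log(0) in the error computation

# For each gene, choose which quantified (non-zero) entries to mask
mask <- t(apply(expr, 1, function(x) {
  quantified <- which(x > 0)
  n_mask <- max(1, floor(mask.ratio * length(quantified)))
  m <- logical(length(x))
  m[quantified[sample.int(length(quantified), n_mask)]] <- TRUE
  m
}))

# Set the chosen entries to zero
masked <- expr
masked[mask] <- 0

# Stand-in imputation methods, each estimating from the unmasked values of a gene.
# "Baseline" here is the mean of the remaining values (an assumption);
# "Median" is a hypothetical second method added only for comparison.
methods <- list(Baseline = function(x) mean(x),
                Median   = function(x) median(x))

# Assumed error measure: mean squared difference on the log2 scale
log_err <- function(truth, imputed) {
  mean((log2(truth + pseudo.count) - log2(imputed + pseudo.count))^2)
}

# Error per gene (rows) and method (columns), computed on masked entries only
errors <- sapply(methods, function(f) {
  sapply(seq_len(nrow(expr)), function(g) {
    m <- mask[g, ]
    log_err(expr[g, m], f(masked[g, !m]))
  })
})
rownames(errors) <- rownames(expr)

# For each gene, keep the method with the lowest error
best <- colnames(errors)[apply(errors, 1, which.min)]
names(best) <- rownames(expr)
best
```

The result is a named character vector with one method name per gene, which mirrors the per-gene choice described above.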

Value

The best-performing imputation method for each gene in the dataset (see Description).

See Also

ImputeBaseline, ImputeDrImpute, ImputeNetwork

Examples

# Normalize demo data
norm_data <- NormalizeRPM(ADImpute::demo_data)
method_choice <- EvaluateMethods(norm_data, do = c('Baseline','DrImpute'),
cores = 2)

ADImpute documentation built on Nov. 8, 2020, 5:30 p.m.