EvaluateMethods: Imputation method evaluation on training set
In anacarolinaleote/ADImpute: Adaptive Dropout Imputer (ADImpute)


EvaluateMethods returns the best-performing imputation method for each gene in the dataset.

EvaluateMethods(data, sce = NULL, do = c('Baseline', 'DrImpute',
    'Network'), write = FALSE, train.ratio = .7, train.only = TRUE,
    mask.ratio = .1, outdir = getwd(), scale = 1, pseudo.count = 1,
    labels = NULL, cell.clusters = 2, drop_thre = NULL, type = 'count',
    cores = BiocParallel::bpworkers(BPPARAM),
    BPPARAM = BiocParallel::SnowParam(type = "SOCK"),
    net.coef = ADImpute::network.coefficients, net.implementation = 'iteration',
    tr.length = ADImpute::transcript_length, bulk = NULL, ...)

`data`	matrix; normalized counts, not logged (genes as rows and samples as columns)
`sce`	SingleCellExperiment; normalized counts and associated metadata.
`do`	character; choice of methods to be used for imputation. Currently supported methods are `'Baseline'`, `'DrImpute'` and `'Network'`. Not case-sensitive. Can include one or more methods. Unsupported methods will be ignored.
`write`	logical; write intermediary and imputed objects to files?
`train.ratio`	numeric; ratio of samples to be used for training
`train.only`	logical; if TRUE, only a training dataset is defined; if FALSE, both training and validation sets are written and returned (defaults to TRUE)
`mask.ratio`	numeric; ratio of samples to be masked per gene
`outdir`	character; path to directory where output files are written. Defaults to working directory
`scale`	integer; scaling factor to divide all expression levels by (defaults to 1)
`pseudo.count`	integer; pseudo-count to be added to expression levels to avoid log(0) (defaults to 1)
`labels`	character; vector specifying the cell type of each column of `data`
`cell.clusters`	integer; number of cell subpopulations
`drop_thre`	numeric; between 0 and 1 specifying the threshold to determine dropout values
`type`	A character specifying the type of values in the expression matrix. Can be 'count' or 'TPM'
`cores`	integer; number of cores used for parallel computation
`BPPARAM`	parallel back-end to be used during parallel computation. See `BiocParallelParam-class`.
`net.coef`	matrix; network coefficients. Please provide if you don't want to use ADImpute's network model. Must contain one first column 'O' accounting for the intercept of the model and otherwise be an adjacency matrix with hgnc_symbols in rows and columns. Doesn't have to be square. See `ADImpute::demo_net` for a small example.
`net.implementation`	character; either 'iteration', for an iterative solution, or 'pseudoinv', to use Moore-Penrose pseudo-inversion as a solution. 'pseudoinv' is not advised for big data.
`tr.length`	matrix with at least 2 columns: 'hgnc_symbol' and 'transcript_length'
`bulk`	vector of reference bulk RNA-seq, if available (average across samples)
`...`	additional parameters to pass to network-based imputation
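
The shape described for `net.coef` can be illustrated with a toy matrix. This is only a sketch under assumptions: the gene names are placeholders, the rows are assumed to be the genes being predicted and the columns their predictors, and the prediction rule (intercept plus weighted sum of predictor expression) is an illustrative reading of "network coefficients", not a confirmed description of ADImpute's internals. `ADImpute::demo_net` remains the reference example.

```r
# Placeholder gene symbols (hypothetical, for illustration only)
genes <- c("GENE_A", "GENE_B", "GENE_C")

# First column 'O' holds the intercept; the remaining columns form the
# adjacency part. Two rows and three gene columns: the matrix is not square.
toy_net <- matrix(0, nrow = 2, ncol = 4,
                  dimnames = list(c("GENE_A", "GENE_B"), c("O", genes)))
toy_net["GENE_A", c("O", "GENE_B", "GENE_C")] <- c(0.5, 0.8, -0.2)
toy_net["GENE_B", c("O", "GENE_C")] <- c(1.0, 0.3)

# Expression of the predictor genes in a single cell
cell <- c(GENE_A = 0, GENE_B = 2, GENE_C = 1)

# Assumed linear model: intercept + sum(coefficient * predictor expression)
predicted <- toy_net[, "O"] + drop(toy_net[, genes] %*% cell[genes])
print(predicted)
```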

For each gene, a fraction (mask.ratio) of the quantified expression values is set to zero and imputed with up to 3 different methods: DrImpute, baseline (average gene expression across all cells) or a network-based method. For each method, the imputation error is computed on the values of the original dataset that were set to zero. For each gene, the method resulting in the lowest imputation error is chosen.
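
The masking procedure described above can be sketched on toy data. This is a minimal illustration, not ADImpute's implementation: the two imputers (`Mean` and `Median`) are hypothetical stand-ins for the real methods, and the error measure (mean squared error on log2 values with a pseudo-count of 1) is an assumption, since the exact error function is not specified here.

```r
set.seed(1)

# Toy normalized expression: 4 genes x 20 cells, every value quantified (> 0)
expr <- matrix(rexp(80, rate = 0.1) + 1, nrow = 4,
               dimnames = list(paste0("gene", 1:4), paste0("cell", 1:20)))
mask.ratio <- 0.1

# Set a fraction of the quantified values of each gene to zero
masked <- expr
mask <- matrix(FALSE, nrow(expr), ncol(expr), dimnames = dimnames(expr))
for (g in rownames(expr)) {
    quantified <- which(expr[g, ] > 0)
    n_mask <- max(1, round(mask.ratio * length(quantified)))
    idx <- quantified[sample.int(length(quantified), n_mask)]
    mask[g, idx] <- TRUE
    masked[g, idx] <- 0
}

# Stand-in imputer: replace the zeros of each gene by a summary statistic
# of that gene's remaining non-zero values
impute_with <- function(m, stat) {
    out <- m
    for (g in rownames(m)) {
        zero <- m[g, ] == 0
        out[g, zero] <- stat(m[g, !zero])
    }
    out
}

# Per-gene error, computed only on the entries that were masked
# (assumed metric: mean squared error on the log2(x + 1) scale)
gene_error <- function(imputed) {
    vapply(rownames(expr), function(g) {
        hidden <- mask[g, ]
        mean((log2(imputed[g, hidden] + 1) - log2(expr[g, hidden] + 1))^2)
    }, numeric(1))
}

errors <- cbind(Mean = gene_error(impute_with(masked, mean)),
                Median = gene_error(impute_with(masked, median)))

# For each gene, keep the method with the lowest error
best <- colnames(errors)[apply(errors, 1, which.min)]
names(best) <- rownames(errors)
print(best)
```

The result is a named character vector with one method per gene, which mirrors the form of the value returned when sce is not provided.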

If sce is provided: returns a SingleCellExperiment with the best-performing method per gene stored as row-features. Access via SingleCellExperiment::int_elementMetadata(sce)$ADImpute$methods.
If sce is not provided: returns a character vector with the best-performing method in the training set for each gene.

ImputeBaseline, ImputeDrImpute, ImputeNetwork

# Normalize demo data
norm_data <- NormalizeRPM(ADImpute::demo_data)
method_choice <- EvaluateMethods(norm_data, do = c('Baseline','DrImpute'),
    cores = 2)