EvaluateMethods
returns the best-performing imputation
method for each gene in the dataset
EvaluateMethods(data, sce = NULL, do = c('Baseline', 'DrImpute',
    'Network'), write = FALSE, train.ratio = .7, train.only = TRUE,
    mask.ratio = .1, outdir = getwd(), scale = 1, pseudo.count = 1,
    labels = NULL, cell.clusters = 2, drop_thre = NULL, type = 'count',
    cores = BiocParallel::bpworkers(BPPARAM),
    BPPARAM = BiocParallel::SnowParam(type = "SOCK"),
    net.coef = ADImpute::network.coefficients, net.implementation = 'iteration',
    tr.length = ADImpute::transcript_length, bulk = NULL, ...)
data |
matrix; normalized counts, not logged (genes as rows and samples as columns) |
sce |
SingleCellExperiment; normalized counts and associated metadata. |
do |
character; choice of methods to be used for imputation. Currently
supported methods are |
write |
logical; write intermediary and imputed objects to files? |
train.ratio |
numeric; ratio of samples to be used for training |
train.only |
logical; if TRUE, define only a training dataset; if FALSE, write and return both training and validation sets (defaults to TRUE) |
mask.ratio |
numeric; ratio of samples to be masked per gene |
outdir |
character; path to directory where output files are written. Defaults to working directory |
scale |
integer; scaling factor to divide all expression levels by (defaults to 1) |
pseudo.count |
integer; pseudo-count to be added to expression levels to avoid log(0) (defaults to 1) |
labels |
character; vector specifying the cell type of each column of
|
cell.clusters |
integer; number of cell subpopulations |
drop_thre |
numeric; between 0 and 1 specifying the threshold to determine dropout values |
type |
A character specifying the type of values in the expression matrix. Can be 'count' or 'TPM' |
cores |
integer; number of cores used for parallel computation |
BPPARAM |
parallel back-end to be used during parallel computation.
See |
net.coef |
matrix; network coefficients. Please provide if you don't
want to use ADImpute's network model. Must contain a first column 'O'
accounting for the intercept of the model and otherwise be an adjacency matrix
with hgnc_symbols in rows and columns. Doesn't have to be square. See
|
net.implementation |
character; either 'iteration', for an iterative solution, or 'pseudoinv', to use Moore-Penrose pseudo-inversion as a solution. 'pseudoinv' is not advised for big data. |
tr.length |
matrix with at least 2 columns: 'hgnc_symbol' and 'transcript_length' |
bulk |
vector of reference bulk RNA-seq, if available (average across samples) |
... |
additional parameters to pass to network-based imputation |
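The layout described for net.coef (an intercept column 'O' first, followed by an adjacency matrix that need not be square) can be sketched in base R as below. This is only an illustration of the shape: the gene names and coefficient values are hypothetical, and whether ADImpute accepts a dense base matrix like this one, rather than the class of its bundled ADImpute::network.coefficients object, is an assumption not confirmed by this page.

```r
# Hypothetical target and predictor genes; real use would need HGNC symbols
targets    <- c("GENE_A", "GENE_B")
predictors <- c("GENE_A", "GENE_B", "GENE_C")

# Rows: genes being predicted. Columns: intercept 'O', then predictor genes.
# 2 rows x 4 columns, so the matrix is deliberately not square.
net_coef <- matrix(0,
                   nrow = length(targets),
                   ncol = 1 + length(predictors),
                   dimnames = list(targets, c("O", predictors)))

# Made-up coefficients, for illustration only
net_coef["GENE_A", c("O", "GENE_B", "GENE_C")] <- c(0.5, 0.8, -0.2)
net_coef["GENE_B", c("O", "GENE_A")]           <- c(1.0, 0.3)

print(net_coef)
```

A matrix of this shape would then be supplied through the net.coef argument in place of the default.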
For each gene, a fraction (mask.ratio) of the quantified
expression values is set to zero and imputed according to 3 different
methods: scImpute, baseline (average gene expression across all cells) or a
network-based method. The imputation error is computed for each method on the
values in the original dataset that were set to zero. For each gene, the
method with the lowest imputation error is chosen.
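The mask, impute, and compare procedure can be sketched in base R as follows. This is a toy illustration, not ADImpute's internal code: the two imputers (per-gene mean and per-gene median of the unmasked values) are stand-ins for the real methods, and the error measure (mean squared error on log2 values) is an assumption, since this page does not state which error is used.

```r
set.seed(1)

# Toy normalized expression matrix: 4 genes (rows) x 20 cells (columns)
expr <- matrix(rexp(80, rate = 0.1), nrow = 4,
               dimnames = list(paste0("gene", 1:4), paste0("cell", 1:20)))

mask_ratio <- 0.1  # fraction of quantified values masked per gene

# Set a fraction of the quantified (non-zero) values of each gene to zero
masked <- expr
mask <- matrix(FALSE, nrow(expr), ncol(expr), dimnames = dimnames(expr))
for (g in rownames(expr)) {
  quantified <- which(expr[g, ] > 0)
  n_mask <- max(1, round(mask_ratio * length(quantified)))
  idx <- quantified[sample.int(length(quantified), n_mask)]
  mask[g, idx] <- TRUE
  masked[g, idx] <- 0
}

# Stand-in imputers: fill masked entries with a per-gene summary statistic
impute_with <- function(m, mask, stat) {
  out <- m
  for (g in rownames(m)) out[g, mask[g, ]] <- stat(m[g, !mask[g, ]])
  out
}
imputed <- list(Baseline = impute_with(masked, mask, mean),
                Median   = impute_with(masked, mask, median))

# Per-gene error, computed only on the entries that were masked
gene_error <- function(imp) {
  sapply(rownames(expr), function(g) {
    mean((log2(imp[g, mask[g, ]] + 1) - log2(expr[g, mask[g, ]] + 1))^2)
  })
}
errors <- sapply(imputed, gene_error)  # genes x methods matrix

# For each gene, keep the method with the lowest error
best_method <- colnames(errors)[apply(errors, 1, which.min)]
names(best_method) <- rownames(errors)
```

The result is one method name per gene, which mirrors the shape of the value described for the case where sce is not provided.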
If sce is provided: returns a SingleCellExperiment with the
best-performing method per gene stored as row-features. Access via
SingleCellExperiment::int_elementMetadata(sce)$ADImpute$methods.
If sce is not provided: returns a character vector with the
best-performing method in the training set for each gene.
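As a hedged illustration of working with the character result, the sketch below uses a hand-written stand-in vector instead of real EvaluateMethods output. It assumes the returned vector is named by gene, which this page does not state explicitly; the gene names are hypothetical.

```r
# Stand-in for the returned value: one method name per gene
# (assumed to be named by gene; gene names are made up)
method_choice <- c(GENE_A = "Baseline", GENE_B = "DrImpute", GENE_C = "Baseline")

# How many genes were assigned to each method
method_counts <- table(method_choice)
print(method_counts)

# Which genes were assigned to each method
genes_by_method <- split(names(method_choice), method_choice)
print(genes_by_method)
```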
ImputeBaseline, ImputeDrImpute, ImputeNetwork
# Normalize demo data
norm_data <- NormalizeRPM(ADImpute::demo_data)
method_choice <- EvaluateMethods(norm_data, do = c('Baseline','DrImpute'),
    cores = 2)