Impute performs dropout imputation on normalized data, based on the choice of imputation methods.
Impute(data, sce = NULL, do = 'Ensemble', write = FALSE,
    outdir = getwd(), method.choice = NULL, scale = 1, pseudo.count = 1,
    labels = NULL, cell.clusters = 2, drop_thre = NULL, type = 'count',
    tr.length = ADImpute::transcript_length,
    cores = BiocParallel::bpworkers(BPPARAM),
    BPPARAM = BiocParallel::SnowParam(type = "SOCK"),
    net.coef = ADImpute::network.coefficients, net.implementation = 'iteration',
    bulk = NULL, true.zero.thr = NULL, prob.mat = NULL, ...)
data | matrix; raw counts (genes as rows and samples as columns)
sce | SingleCellExperiment; normalized counts and associated metadata
do | character; choice of methods to be used for imputation (defaults to 'Ensemble'). The currently supported methods are listed in the details below
write | logical; write intermediary and imputed objects to files?
outdir | character; path to the directory where output files are written. Defaults to the working directory
method.choice | character; best performing method in training data for each gene
scale | integer; scaling factor to divide all expression levels by (defaults to 1)
pseudo.count | integer; pseudo-count to be added to expression levels to avoid log(0) (defaults to 1)
labels | character; vector specifying the cell type of each column of data
cell.clusters | integer; number of cell subpopulations
drop_thre | numeric; value between 0 and 1 specifying the threshold used to determine dropout values
type | character; type of values in the expression matrix. Can be 'count' or 'TPM'
tr.length | matrix with at least 2 columns: 'hgnc_symbol' and 'transcript_length'
cores | integer; number of cores used for parallel computation
BPPARAM | parallel back-end to be used during parallel computation. See the BiocParallel documentation
net.coef | matrix; network coefficients. Please provide if you don't want to use ADImpute's network model. Must contain a first column 'O' accounting for the intercept of the model and otherwise be an adjacency matrix with hgnc_symbols in rows and columns. Does not have to be square. See the examples, which pass ADImpute::demo_net
net.implementation | character; either 'iteration', for an iterative solution, or 'pseudoinv', to use Moore-Penrose pseudo-inversion as a solution. 'pseudoinv' is not advised for big data
bulk | vector of reference bulk RNA-seq, if available (average across samples)
true.zero.thr | if set to NULL (default), no true zero estimation is performed. Set to a numeric value between 0 and 1 for estimation. The value corresponds to the threshold used to determine true zeros: if the probability of dropout is lower than true.zero.thr, the imputed value is set back to 0
prob.mat | matrix of the same size as data, filled with the dropout probabilities for each gene in each cell
... | additional parameters to pass to network-based imputation
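The layout required of net.coef can be illustrated with a small base R sketch. Everything here is an assumption made for illustration: the gene names and coefficient values are made up, target genes are assumed to sit in rows and predictor genes in columns, and the single prediction step is a deliberate simplification rather than ADImpute's actual 'iteration' or 'pseudoinv' implementation.

```r
# Hypothetical coefficients: first column 'O' is the intercept,
# remaining columns are predictor genes (assumed layout, values made up)
net_coef <- matrix(c(0.5, 0.0, 2.0,
                     1.0, 0.5, 0.0),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("GENE1", "GENE2"),
                                   c("O", "GENE1", "GENE2")))

# Expression of one cell; GENE1 was not quantified (value 0)
cell <- c(GENE1 = 0, GENE2 = 3)

# One simplified prediction step: intercept + coefficients %*% expression
predictors <- setdiff(colnames(net_coef), "O")
predicted <- net_coef[, "O"] +
  as.vector(net_coef[, predictors, drop = FALSE] %*% cell[predictors])

# Only zero entries that have a row in net_coef are replaced
imputed_cell <- cell
dropouts <- names(cell)[cell == 0 & names(cell) %in% rownames(net_coef)]
imputed_cell[dropouts] <- predicted[dropouts]
```

The matrix need not be square because the set of target genes (rows) and the set of predictor genes (columns) can differ.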
Values that are 0 in data are imputed according to the best-performing methods indicated in method.choice. Currently supported methods are:
Baseline: imputation with average expression across all cells in the dataset. See ImputeBaseline.
Previously published approaches: DrImpute and SAVER.
Network: leverages information from a gene regulatory network to predict the expression of genes that are not quantified, based on quantified interacting genes in the same cell. See ImputeNetwork.
Ensemble: is based on results on a training subset of the data at hand, indicating which method best predicts the expression of each gene. These results are supplied via method.choice. Applies the imputation results of the best performing method to the zero entries of each gene.
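As a rough illustration of the Baseline idea, the following base R sketch replaces each zero with the mean of that gene across all cells. The helper impute_baseline_sketch and the toy matrix are hypothetical and are not part of ADImpute; ImputeBaseline may differ in details such as transformation or which cells enter the average.

```r
# Toy expression matrix: genes as rows, cells as columns (values made up)
expr <- matrix(c(2, 0, 4,
                 0, 0, 3,
                 1, 5, 0),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("geneA", "geneB", "geneC"),
                               c("cell1", "cell2", "cell3")))

# Hypothetical helper: fill zeros with the gene's mean over all cells
impute_baseline_sketch <- function(m) {
  gene_means <- rowMeans(m)                      # average per gene (row)
  zero_idx <- which(m == 0, arr.ind = TRUE)      # (row, col) of each zero
  m[zero_idx] <- gene_means[zero_idx[, "row"]]   # fill with that gene's mean
  m
}

baseline <- impute_baseline_sketch(expr)
```

Non-zero entries are left untouched, which matches the statement above that only values that are 0 in data are imputed.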
If 'Ensemble' is included in do, method.choice has to be provided (use output from EvaluateMethods()).
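The way method.choice drives the Ensemble result can be sketched in base R as follows. This assumes method.choice is a character vector named by gene, which is an assumption about its shape; combine_ensemble_sketch and all values are hypothetical and only mimic the described behaviour.

```r
genes <- c("geneA", "geneB")
cells <- c("cell1", "cell2")

# Observed values; zeros are the entries to be imputed (values made up)
observed <- matrix(c(3, 0,
                     0, 0),
                   nrow = 2, byrow = TRUE, dimnames = list(genes, cells))

# Made-up results of two methods on the same genes and cells
imputations <- list(
  Baseline = matrix(c(3, 1.5,
                      0.5, 0.5),
                    nrow = 2, byrow = TRUE, dimnames = list(genes, cells)),
  Network  = matrix(c(3, 2.0,
                      0.8, 1.2),
                    nrow = 2, byrow = TRUE, dimnames = list(genes, cells))
)

# Assumed shape of method.choice: best method per gene, named by gene
method_choice <- c(geneA = "Baseline", geneB = "Network")

# Hypothetical helper: per gene, copy the chosen method's values into zeros
combine_ensemble_sketch <- function(observed, imputations, method_choice) {
  out <- observed
  for (g in rownames(observed)) {
    zeros <- observed[g, ] == 0
    out[g, zeros] <- imputations[[method_choice[[g]]]][g, zeros]
  }
  out
}

ensemble <- combine_ensemble_sketch(observed, imputations, method_choice)
```

Here geneA takes its imputed value from the Baseline result and geneB from the Network result, while the observed non-zero entry stays as it was.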
Impute can create a directory imputation containing the imputation results of all methods in do.
If true.zero.thr is set, dropout probabilities are computed using scImpute's framework. Expression values with dropout probabilities below true.zero.thr will be set back to 0 if imputed, as they likely correspond to true biological zeros (genes not expressed in cell) rather than technical dropouts (genes expressed but not captured).
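The thresholding rule described above can be sketched in base R. The helper filter_true_zeros_sketch and all matrices are hypothetical; in particular, the dropout probabilities are made up here, whereas Impute derives them via scImpute's framework or takes them from prob.mat.

```r
dims <- list(c("geneA", "geneB"), c("cell1", "cell2"))

# Observed data, imputed data, and dropout probabilities (all made up)
observed <- matrix(c(0, 0,
                     5, 0), nrow = 2, byrow = TRUE, dimnames = dims)
imputed  <- matrix(c(1.2, 0.7,
                     5,   2.1), nrow = 2, byrow = TRUE, dimnames = dims)
prob_mat <- matrix(c(0.9, 0.1,
                     0.0, 0.6), nrow = 2, byrow = TRUE, dimnames = dims)
true_zero_thr <- 0.3

# Hypothetical helper: reset imputed entries whose dropout probability
# falls below the threshold, treating them as true biological zeros
filter_true_zeros_sketch <- function(observed, imputed, prob_mat, thr) {
  was_imputed <- observed == 0 & imputed != 0
  likely_true_zero <- prob_mat < thr
  imputed[was_imputed & likely_true_zero] <- 0
  imputed
}

zerofiltered <- filter_true_zeros_sketch(observed, imputed, prob_mat,
                                         true_zero_thr)
```

Only entries that were imputed are affected: geneB in cell1 has a low dropout probability but was observed as non-zero, so it is kept.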
If sce is set, imputed values by the different methods are added as new assays to sce. Each assay corresponds to one imputation method. If true.zero.thr is set, only the values after filtering for biological zeros will be added. This is different from the output if sce is not set, where the original values before filtering and the dropout probability matrix are returned.
If sce is not set: returns a list of imputation results (normalized, log-transformed) for all selected methods in do. If true.zero.thr is defined, returns a list of 3 elements: 1) a list, imputations, containing the direct imputation results from each method; 2) a list, zerofiltered, containing the results of imputation in imputations after setting biological zeros back to zero; 3) a matrix, dropoutprobabilities, containing the dropout probability matrix used to set biological zeros.
If sce is set: returns a SingleCellExperiment with new assays, each corresponding to one of the imputation methods applied. If true.zero.thr is defined, the assays will contain the results after imputation and setting biological zeros back to zero.
EvaluateMethods, ImputeBaseline, ImputeDrImpute, ImputeNetwork, ImputeSAVER
# Normalize demo data
norm_data <- NormalizeRPM(demo_data)
# Impute with particular method(s)
imputed_data <- Impute(do = 'Network', data = norm_data[,1:10],
    net.coef = ADImpute::demo_net)
imputed_data <- Impute(do = 'Network', data = norm_data[,1:10],
    net.implementation = 'pseudoinv', net.coef = ADImpute::demo_net)