Description Usage Arguments Details Value Handling batches Author(s) See Also Examples
Convenience function to determine which values in a numeric vector are outliers based on the median absolute deviation (MAD).
metric | Numeric vector of values.
nmads | A numeric scalar, specifying the minimum number of MADs away from median required for a value to be called an outlier.
type | String indicating whether outliers should be looked for at both tails (
log | Logical scalar, should the values of the metric be transformed to the log2 scale before computing MADs?
subset | Logical or integer vector, which subset of values should be used to calculate the median/MAD? If
batch | Factor of length equal to
share.medians | Logical scalar indicating whether the median calculation should be shared across batches. Only used if
share.mads | Logical scalar indicating whether the MAD calculation should be shared across batches. Only used if
share.missing | Logical scalar indicating whether a common MAD/median should be used for any batch that has no values left after subsetting. Only relevant when both
min.diff | A numeric scalar indicating the minimum difference from the median to consider as an outlier. Ignored if
share_medians, share_mads, share_missing, min_diff | Soft-deprecated equivalents of the arguments above.
Lower and upper thresholds are stored in the "threshold" attribute of the returned vector.
By default, this is a numeric vector of length 2 for the threshold on each side.
If type="lower", the higher limit is Inf, while if type="higher", the lower limit is -Inf.
If min.diff is not NA, the minimum distance from the median required to define an outlier is set as the larger of nmads MADs and min.diff.
This aims to avoid calling many outliers when the MAD is very small, e.g., due to discreteness of the metric.
If log=TRUE, this difference is defined on the log2 scale.
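The threshold rules above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: mad_outliers is a hypothetical helper, the MAD is assumed to be the scaled value returned by stats::mad(), and the thresholds are left on the working (possibly log2) scale.

```r
# Hypothetical helper for illustration only; it is not the package's isOutlier().
mad_outliers <- function(metric, nmads, type = c("both", "lower", "higher"),
                         log = FALSE, min.diff = NA) {
    type <- match.arg(type)
    if (log) metric <- log2(metric)

    med <- median(metric, na.rm = TRUE)
    # Assumption: the MAD is the scaled version returned by stats::mad().
    allowed <- nmads * mad(metric, center = med, na.rm = TRUE)
    # A non-NA min.diff acts as a floor on the allowed distance from the median.
    if (!is.na(min.diff)) allowed <- max(allowed, min.diff)

    # One-sided searches disable the opposite limit.
    lower <- if (type == "higher") -Inf else med - allowed
    higher <- if (type == "lower") Inf else med + allowed

    out <- metric < lower | metric > higher
    # Thresholds are kept on the working (possibly log2) scale in this sketch.
    attr(out, "threshold") <- c(lower = lower, higher = higher)
    out
}

x <- c(10, 11, 12, 13, 14, 100)
both <- mad_outliers(x, nmads = 3)
low_only <- mad_outliers(x, nmads = 3, type = "lower")
floored <- mad_outliers(x, nmads = 3, min.diff = 10)
print(attr(both, "threshold"))
print(which(both))
```

Here only the value 100 is flagged when both tails are searched, and nothing is flagged with type="lower" because the higher limit becomes Inf. With min.diff=10, the allowed distance widens from about 6.67 to 10.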
If subset is specified, the median and MAD are computed from a subset of cells and the values are used to define the outlier threshold that is applied to all cells.
In a quality control context, this can be handy for excluding groups of cells that are known to be low quality (e.g., failed plates) so that they do not distort the outlier definitions for the rest of the dataset.
Missing values trigger a warning and are automatically ignored during estimation of the median and MAD.
The corresponding entries of the output vector are also set to NA values.
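The effect of estimating the median and MAD from a subset can be seen in a small base R sketch. The data, the keep vector, and the choice of 3 MADs are made up for illustration, and stats::mad() is assumed to match the MAD used by the function.

```r
# Five cells from a good plate and five from a hypothetical failed plate.
values <- c(100, 105, 98, 102, 110, 5, 6, 4, 7, 3)
keep <- rep(c(TRUE, FALSE), each = 5)  # TRUE marks cells used for estimation

# Estimating from all cells: the failed plate inflates the MAD.
med_all <- median(values)
dev_all <- 3 * mad(values, center = med_all)
flag_all <- values < med_all - dev_all | values > med_all + dev_all

# Estimating from the subset only, then applying the thresholds to every cell.
ref <- values[keep]
med_sub <- median(ref)
dev_sub <- 3 * mad(ref, center = med_sub)
flag_subset <- values < med_sub - dev_sub | values > med_sub + dev_sub

print(flag_all)
print(flag_subset)
```

Without the subset, the failed plate makes up half of the data and widens the thresholds so much that nothing is flagged. With the subset, all five failed-plate cells fall below the lower threshold.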
A logical vector of the same length as the metric argument, specifying the observations that are considered as outliers.
If batch is specified, outliers are defined within each batch separately using batch-specific median and MAD values.
This gives the same results as if the input metrics were subsetted by batch and isOutlier was run on each subset, and is often useful when batches are known a priori to have technical differences (e.g., in sequencing depth).
If share.medians=TRUE, a shared median is computed across all cells.
If share.mads=TRUE, a shared MAD is computed using all cells (based on either a batch-specific or shared median, depending on share.medians).
These settings are useful to enforce a common location or spread across batches, e.g., we might set share.mads=TRUE for log-library sizes if coverage varies across batches but the variance across cells is expected to be consistent across batches.
If a batch does not have sufficient cells to compute the median or MAD (e.g., after applying subset), the default setting of share.missing=TRUE will set these values to the shared median and MAD.
This allows us to define thresholds for low-quality batches based on information in the rest of the dataset.
(Note that the use of shared values only affects this batch and not others unless share.medians and share.mads are also set.)
Otherwise, if share.missing=FALSE, all cells in that batch will have NA in the output.
If batch is specified, the "threshold" attribute in the returned vector is a matrix with one named column per level of batch and two rows (one per threshold).
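A rough base R sketch of the batch handling described above follows. batch_thresholds is a hypothetical helper, not the package's code. It assumes the scaled MAD from stats::mad() and, for share.mads=TRUE, pools each cell's absolute deviation from its own (batch-specific or shared) median. How the real function combines the two sharing options may differ.

```r
# Hypothetical helper for illustration only; returns a 2-row threshold matrix
# with one named column per batch level.
batch_thresholds <- function(metric, batch, nmads,
                             share.medians = FALSE, share.mads = FALSE) {
    batch <- factor(batch)
    by_batch <- split(metric, batch)

    # Batch-specific medians, optionally replaced by one shared median.
    meds <- vapply(by_batch, median, numeric(1), na.rm = TRUE)
    if (share.medians) meds[] <- median(metric, na.rm = TRUE)

    if (share.mads) {
        # Pool absolute deviations of every cell from its own center.
        devs <- abs(metric - meds[as.character(batch)])
        # 1.4826 is the default scaling constant of stats::mad().
        mads <- rep(1.4826 * median(devs, na.rm = TRUE), nlevels(batch))
    } else {
        mads <- mapply(function(x, m) mad(x, center = m, na.rm = TRUE),
                       by_batch, meds)
    }
    rbind(lower = meds - nmads * mads, higher = meds + nmads * mads)
}

metric <- c(10, 11, 12, 13, 14, 100, 101, 102, 103, 104)
batch <- rep(c("A", "B"), each = 5)

thr <- batch_thresholds(metric, batch, nmads = 3)
thr_shared <- batch_thresholds(metric, batch, nmads = 3, share.mads = TRUE)

# Each cell is compared against the thresholds of its own batch.
out <- metric < thr["lower", batch] | metric > thr["higher", batch]
print(thr)
print(unname(out))
```

In this toy data the two batches differ in location but not in spread, so batch-specific thresholds flag nothing, whereas a single threshold across all cells would treat the two batches very differently.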
Aaron Lun
quickPerCellQC, a convenience wrapper to perform outlier-based quality control.
perCellQCMetrics, to compute potential QC metrics.
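# Assumes the package providing mockSCE(), perCellQCMetrics() and isOutlier() is attached.
example_sce <- mockSCE()
stats <- perCellQCMetrics(example_sce)

# Outliers at both tails, then at each tail separately.
str(isOutlier(stats$sum))
str(isOutlier(stats$sum, type="lower"))
str(isOutlier(stats$sum, type="higher"))

# Outliers defined on the log2 scale.
str(isOutlier(stats$sum, log=TRUE))

# Batch-specific outliers, using randomly assigned batch labels.
b <- sample(LETTERS[1:3], ncol(example_sce), replace=TRUE)
str(isOutlier(stats$sum, log=TRUE, batch=b))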