Description Usage Arguments Details Value Handling batches Author(s) See Also Examples

Convenience function to determine which values in a numeric vector are outliers based on the median absolute deviation (MAD).

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |

`metric` |
Numeric vector of values. |

`nmads` |
A numeric scalar, specifying the minimum number of MADs away from median required for a value to be called an outlier. |

`type` |
String indicating whether outliers should be looked for at both tails ( |

`log` |
Logical scalar, should the values of the metric be transformed to the log2 scale before computing MADs? |

`subset` |
Logical or integer vector, which subset of values should be used to calculate the median/MAD?
If |

`batch` |
Factor of length equal to |

`share.medians` |
Logical scalar indicating whether the median calculation should be shared across batches.
Only used if |

`share.mads` |
Logical scalar indicating whether the MAD calculation should be shared across batches.
Only used if |

`share.missing` |
Logical scalar indicating whether a common MAD/median should be used
for any batch that has no values left after subsetting.
Only relevant when both |

`min.diff` |
A numeric scalar indicating the minimum difference from the median to consider as an outlier.
Ignored if |

`share_medians, share_mads, share_missing, min_diff` |
Soft-deprecated equivalents of the arguments above. |

Lower and upper thresholds are stored in the `"threshold"`

attribute of the returned vector.
By default, this is a numeric vector of length 2 for the threshold on each side.
If `type="lower"`

, the higher limit is `Inf`

, while if `type="higher"`

, the lower limit is `-Inf`

.

If `min.diff`

is not `NA`

, the minimum distance from the median required to define an outlier is set as the larger of `nmads`

MADs and `min.diff`

.
This aims to avoid calling many outliers when the MAD is very small, e.g., due to discreteness of the metric.
If `log=TRUE`

, this difference is defined on the log2 scale.

If `subset`

is specified, the median and MAD are computed from a subset of cells and the values are used to define the outlier threshold that is applied to all cells.
In a quality control context, this can be handy for excluding groups of cells that are known to be low quality (e.g., failed plates) so that they do not distort the outlier definitions for the rest of the dataset.

Missing values trigger a warning and are automatically ignored during estimation of the median and MAD.
The corresponding entries of the output vector are also set to `NA`

values.

A logical vector of the same length as the `metric`

argument, specifying the observations that are considered as outliers.

If `batch`

is specified, outliers are defined within each batch separately using batch-specific median and MAD values.
This gives the same results as if the input metrics were subsetted by batch and `isOutlier`

was run on each subset,
and is often useful when batches are known *a priori* to have technical differences (e.g., in sequencing depth).

If `share.medians=TRUE`

, a shared median is computed across all cells.
If `share.mads=TRUE`

, a shared MAD is computed using all cells
(based on either a batch-specific or shared median, depending on `share.medians`

).
These settings are useful to enforce a common location or spread across batches, e.g., we might set `share.mads=TRUE`

for log-library sizes if coverage varies across batches but the variance across cells is expected to be consistent across batches.

If a batch does not have sufficient cells to compute the median or MAD (e.g., after applying `subset`

),
the default setting of `share.missing=TRUE`

will set these values to the shared median and MAD.
This allows us to define thresholds for low-quality batches based on information in the rest of the dataset.
(Note that the use of shared values only affects this batch and not others unless `share.medians`

and `share.mads`

are also set.)
Otherwise, if `share.missing=FALSE`

, all cells in that batch will have `NA`

in the output.

If `batch`

is specified, the `"threshold"`

attribute in the returned vector is a matrix with one named column per level of `batch`

and two rows (one per threshold).

Aaron Lun

`quickPerCellQC`

, a convenience wrapper to perform outlier-based quality control.

`perCellQCMetrics`

, to compute potential QC metrics.

1 2 3 4 5 6 7 8 9 10 11 | ```
example_sce <- mockSCE()
stats <- perCellQCMetrics(example_sce)
str(isOutlier(stats$sum))
str(isOutlier(stats$sum, type="lower"))
str(isOutlier(stats$sum, type="higher"))
str(isOutlier(stats$sum, log=TRUE))
b <- sample(LETTERS[1:3], ncol(example_sce), replace=TRUE)
str(isOutlier(stats$sum, log=TRUE, batch=b))
``` |

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.