View source: R/detect_outliers.R
modern is a method to detect outliers in high-dimensional data
based on their impact on network reconstruction.
The core idea is that the topology of the network reconstructed from a
matrix of data should be robust to the inclusion or exclusion of each
individual data point in the matrix.
Single points that have a large impact on the global interaction profile
of a node (e.g., a gene, protein, or metabolite) compromise the robustness
of network inference, and are likely to be outliers.
detect_outliers(mat, min_pairs = 10, method = c("pearson", "kendall",
  "spearman"), bins = NA)
mat | a numeric matrix, with nodes (e.g., analytes such as genes, proteins, or metabolites) in columns, and samples in rows
min_pairs | minimum number of paired, non-missing observations to calculate a correlation coefficient; correlations between vectors with fewer than this number of paired observations will be replaced with
method | the correlation coefficient to be computed; one of "pearson", "kendall", or "spearman"
bins | optionally, the number of bins into which to group nodes on the basis of the number of observations
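The role of min_pairs can be illustrated with a short base R sketch. The helper name masked_cor is hypothetical and is not part of the package; the replacement value is not shown in the argument description, so the sketch assumes NA.

```r
# Hypothetical helper (not the package's code): correlation matrix between
# columns, with poorly supported pairs masked out.
masked_cor <- function(mat, min_pairs = 10, method = "pearson") {
  # Count, for every pair of columns, the samples in which both are observed
  present <- !is.na(mat)
  n_pairs <- crossprod(present * 1)
  # Correlations computed on pairwise-complete observations
  cors <- cor(mat, use = "pairwise.complete.obs", method = method)
  # Assumption: pairs with fewer than min_pairs shared observations become NA
  cors[n_pairs < min_pairs] <- NA
  cors
}

set.seed(1)
mat <- matrix(rnorm(60), nrow = 20, ncol = 3)
mat[1:15, 3] <- NA  # column 3 keeps only 5 observations
cors <- masked_cor(mat, min_pairs = 10)
```

In this toy matrix, every correlation involving column 3 rests on only 5 paired observations, so those entries are masked while the correlation between columns 1 and 2 is kept.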
The degree to which a single point compromises the robustness of the network inference is quantified using autocorrelation. For each observation of a given node, the correlations between that node and all of its possible neighbors in the network are calculated with and without the inclusion of that observation. This yields two vectors of correlation coefficients. The correlation between these vectors, or autocorrelation, reflects the impact of the observation on the global interaction profile of that node, where a low correlation is indicative of network inference that is strongly dependent on the inclusion or exclusion of that single data point. This situation is reflective of a likely outlier that compromises the robustness of network inference.
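The leave-one-out procedure described above can be sketched in base R for a single node. This is a minimal illustration, not the package's implementation: the helper name autocor_node is hypothetical, and the comparison of the two correlation vectors is assumed to use Pearson correlation.

```r
# Hypothetical helper: autocorrelation for every observation of one node
autocor_node <- function(mat, node, method = "pearson") {
  x <- mat[, node]
  others <- mat[, -node, drop = FALSE]
  # Correlations between the node and all other nodes, using every sample
  full <- as.vector(cor(x, others, use = "pairwise.complete.obs",
                        method = method))
  out <- rep(NA_real_, length(x))
  for (i in which(!is.na(x))) {
    # The same correlations with observation i excluded
    loo <- as.vector(cor(x[-i], others[-i, , drop = FALSE],
                         use = "pairwise.complete.obs", method = method))
    # Assumption: the two profiles are compared by Pearson correlation
    out[i] <- cor(full, loo, use = "pairwise.complete.obs")
  }
  out
}

# Simulated data: 20 nodes driven by a shared latent signal, 40 samples
set.seed(1)
n <- 40
p <- 20
latent <- rnorm(n)
mat <- sapply(seq_len(p), function(j) latent + rnorm(n))
mat[5, 1] <- 30  # inject an extreme value into node 1, sample 5
ac <- autocor_node(mat, node = 1)
```

Removing any ordinary sample leaves the correlation profile of node 1 almost unchanged, so its autocorrelation stays close to one. Removing the injected value changes the profile substantially, which is why that observation receives the lowest autocorrelation.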
The matrix of autocorrelations is subsequently converted to a matrix of
Z scores, such that the matrix has a mean of zero and a standard deviation
of one. If the matrix contains missing values, this scaling is performed for
each group of columns with equivalent numbers of missing values separately.
Optionally, if there are many possible numbers of missing values, the Z score
can be calculated for approximately equal-sized bins of missing value counts
using the bins parameter.
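The group-wise scaling can be sketched as follows. The helper name zscore_by_missing is hypothetical, and the sketch covers only the exact grouping by missing value count, not the optional binning.

```r
# Hypothetical helper: convert autocorrelations to Z scores separately for
# each group of columns sharing the same number of missing values
zscore_by_missing <- function(autocors, mat) {
  n_missing <- colSums(is.na(mat))
  z <- autocors
  for (k in unique(n_missing)) {
    cols <- which(n_missing == k)
    vals <- autocors[, cols, drop = FALSE]
    # Center and scale using all non-missing values in the group
    mu <- mean(vals, na.rm = TRUE)
    sigma <- sd(as.vector(vals), na.rm = TRUE)
    z[, cols] <- (vals - mu) / sigma
  }
  z
}

set.seed(2)
mat <- matrix(rnorm(40), nrow = 10, ncol = 4)
mat[1:2, 3:4] <- NA  # columns 3 and 4 each have two missing values
# Simulated autocorrelations, missing wherever the data are missing
autocors <- matrix(runif(40, 0.8, 1), nrow = 10, ncol = 4)
autocors[is.na(mat)] <- NA
z <- zscore_by_missing(autocors, mat)
```

Scaling within groups keeps columns with many missing values, whose autocorrelations are computed from fewer observations, from being compared directly against fully observed columns.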
a matrix with identical dimensions to the input matrix, containing the autocorrelation Z score assigned to each non-missing observation