detect_outliers: Detect outliers by network reconstruction


View source: R/detect_outliers.R

Description

modern is a method for detecting outliers in high-dimensional data based on their impact on network reconstruction. The core idea is that the topology of a network reconstructed from a data matrix should be robust to the inclusion or exclusion of any individual data point in that matrix. A single point that has a large impact on the global interaction profile of a node (e.g., a gene, protein, or metabolite) compromises the robustness of network inference and is likely to be an outlier.

Usage

detect_outliers(mat, min_pairs = 10, method = c("pearson", "kendall",
  "spearman"), bins = NA)

Arguments

mat

a numeric matrix, with nodes (e.g., analytes such as genes, proteins, or metabolites) in columns, and samples in rows

min_pairs

minimum number of paired, non-missing observations to calculate a correlation coefficient; correlations between vectors with fewer than this number of paired observations will be replaced with NA

method

the correlation coefficient to be computed; one of "pearson" (default), "kendall", or "spearman"; can be abbreviated

bins

optionally, the number of bins into which to group nodes on the basis of the number of observations
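
To make the role of min_pairs concrete, the following is a minimal sketch in base R of a pairwise correlation matrix in which any coefficient based on fewer than min_pairs paired, non-missing observations is replaced with NA. The helper name cor_min_pairs is hypothetical and is not part of the package; the package's own implementation may differ.

```r
# Hypothetical helper (not part of the package): pairwise correlations that
# are set to NA when too few paired, non-missing observations are available.
cor_min_pairs <- function(mat, min_pairs = 10,
                          method = c("pearson", "kendall", "spearman")) {
  method <- match.arg(method)
  # Correlate every pair of columns using only rows observed in both
  cors <- cor(mat, use = "pairwise.complete.obs", method = method)
  # Count, for every pair of columns, the rows where both values are present
  present <- (!is.na(mat)) + 0
  n_pairs <- crossprod(present)
  # Mask correlations supported by too few paired observations
  cors[n_pairs < min_pairs] <- NA
  cors
}
```

Note that in this sketch the diagonal entry of a node is also masked when the node itself has fewer than min_pairs observations.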

Details

The degree to which a single point compromises the robustness of network inference is quantified using autocorrelation. For each observation of a given node, the correlations between that node and all of its possible neighbors in the network are calculated twice: once with that observation included and once with it excluded. This yields two vectors of correlation coefficients. The correlation between these two vectors, or autocorrelation, reflects the impact of the observation on the global interaction profile of that node. A low autocorrelation indicates that network inference depends strongly on the inclusion or exclusion of that single data point, which marks the observation as a likely outlier.
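
The calculation described above can be sketched for a single observation as follows. This is an illustration in base R under stated assumptions, not the package's implementation: the helper name autocor_one is hypothetical, the two vectors are compared with a Pearson correlation, and no min_pairs filtering is applied.

```r
# Hypothetical helper (not part of the package): autocorrelation for the
# observation in row i (sample) and column j (node) of a numeric matrix.
autocor_one <- function(mat, i, j, method = "pearson") {
  # All possible neighbors of node j
  others <- mat[, -j, drop = FALSE]
  # Correlations between node j and its neighbors, with observation i
  with_obs <- cor(mat[, j], others,
                  use = "pairwise.complete.obs", method = method)
  # The same correlations, with observation i excluded
  without_obs <- cor(mat[-i, j], others[-i, , drop = FALSE],
                     use = "pairwise.complete.obs", method = method)
  # Assumption: the two profiles are compared with a Pearson correlation
  cor(as.vector(with_obs), as.vector(without_obs),
      use = "pairwise.complete.obs")
}
```

An observation whose removal barely changes the node's correlation profile yields a value close to one, whereas an extreme value that dominates the profile yields a much lower value.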

The matrix of autocorrelations is subsequently converted to a matrix of Z scores, such that the matrix has a mean of zero and a standard deviation of one. If the matrix contains missing values, this scaling is performed separately for each group of columns with the same number of missing values. Optionally, if the columns take many distinct missing value counts, the Z scores can instead be calculated within approximately equal-sized bins of missing value counts, using the bins parameter.
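
The group-wise scaling can be sketched as follows. This is a base R illustration under stated assumptions rather than the package's own code: the helper name zscore_by_missing is hypothetical, and the rank-based rule used here to form approximately equal-sized bins is one plausible reading of the bins parameter.

```r
# Hypothetical helper (not part of the package): convert a matrix of
# autocorrelations to Z scores, scaling columns group by group.
zscore_by_missing <- function(autocors, mat, bins = NA) {
  # Number of missing values in each column (node) of the input matrix
  n_missing <- colSums(is.na(mat))
  if (!is.na(bins)) {
    # Assumption: split columns, ordered by missing value count, into
    # `bins` groups of approximately equal size
    ranks <- rank(n_missing, ties.method = "first")
    groups <- ceiling(ranks * bins / length(n_missing))
  } else {
    # Default: one group per distinct missing value count
    groups <- n_missing
  }
  z <- autocors
  for (g in unique(groups)) {
    cols <- which(groups == g)
    vals <- autocors[, cols, drop = FALSE]
    # Scale all values in the group to mean zero and unit standard deviation
    z[, cols] <- (vals - mean(vals, na.rm = TRUE)) /
      sd(as.vector(vals), na.rm = TRUE)
  }
  z
}
```

Entries that are NA in the autocorrelation matrix remain NA in the result, so the output keeps the dimensions of the input.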

Value

a matrix with identical dimensions to the input matrix, containing the autocorrelation Z score assigned to each non-missing observation


skinnider/modern documentation built on Feb. 20, 2020, 1:52 p.m.