detect.outliers: Detect outliers


detect.outliers    R Documentation

Detect outliers

Description

Detect outliers in normalized RNA-seq data.

Usage

detect.outliers(
  data,
  num.null = 1000,
  initial.screen.method = c("fdr", "p.value"),
  p.value.threshold = 0.05,
  fdr.threshold = 0.01,
  kmeans.nstart = 1
)

Arguments

data

A matrix or data frame of normalized RNA-seq data, organized with transcripts on rows and samples on columns. Transcript identifiers should be stored as rownames(data).
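
As a minimal sketch of the layout described above, the following builds a small simulated matrix with transcripts on rows and samples on columns. The transcript and sample names are hypothetical, and the gamma-distributed values only stand in for normalized expression.

```r
# Simulated stand-in for normalized RNA-seq data (values are illustrative only).
set.seed(1);
n.transcripts <- 5;
n.samples <- 8;

expr <- matrix(
    data = rgamma(n.transcripts * n.samples, shape = 2, rate = 0.5),
    nrow = n.transcripts,
    ncol = n.samples
    );

# Transcript identifiers go in the row names; sample identifiers in the column names.
rownames(expr) <- paste0('transcript', seq_len(n.transcripts));
colnames(expr) <- paste0('sample', seq_len(n.samples));

dim(expr);
```

A matrix or data frame shaped like expr is the kind of object the data argument expects.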

num.null

The number of transcripts to generate when simulating from null distributions; default is 1000. We recommend simulating at least 10,000 transcripts for publication-level results, with 100,000 or even one million providing more robust estimates.

initial.screen.method

The statistical criterion for initial gene selection; valid options are 'fdr' and 'p.value'.

p.value.threshold

The p-value threshold for the outlier test; default is 0.05. Once the p-value for a sample exceeds p.value.threshold, testing for that transcript ceases, and all remaining samples will have p-values equal to NA.
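
The stopping rule can be illustrated with a small base R sketch. This is not the package's internal code: the p-values are made up, the samples are assumed to be tested in order from most to least extreme, and the sample that first exceeds the threshold is assumed to keep its own p-value.

```r
# Hypothetical unadjusted p-values for one transcript, in the order the samples
# would be tested. The last value would never actually be computed.
p.values <- c(0.001, 0.020, 0.300, 0.040);
p.value.threshold <- 0.05;

# Find the first sample whose p-value exceeds the threshold.
first.failure <- which(p.values > p.value.threshold)[1];

# Testing stops there, so every later sample is reported as NA.
reported <- p.values;
if (!is.na(first.failure) && first.failure < length(reported)) {
    reported[(first.failure + 1):length(reported)] <- NA;
    }

reported;
```

In this sketch the third sample exceeds the threshold, so only the fourth sample is reported as NA.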

fdr.threshold

The false discovery rate (FDR)-adjusted p-value threshold for determining the final count of outliers; default is 0.01.
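
To show how an FDR threshold differs from an unadjusted one, here is a hedged sketch using stats::p.adjust. The p-values are hypothetical, and both the choice of the Benjamini-Hochberg method and the strict "less than" comparison are assumptions rather than confirmed details of the package.

```r
# Hypothetical unadjusted p-values, one per transcript.
p.values <- c(0.0001, 0.0004, 0.006, 0.02, 0.2, 0.6);
fdr.threshold <- 0.01;

# Benjamini-Hochberg adjustment; 'fdr' is an alias for 'BH' in stats::p.adjust.
fdr <- p.adjust(p.values, method = 'fdr');

# Count entries passing the threshold after adjustment and before adjustment.
n.passing.fdr <- sum(fdr < fdr.threshold);
n.passing.unadjusted <- sum(p.values < fdr.threshold);

c(fdr = n.passing.fdr, unadjusted = n.passing.unadjusted);
```

Fewer entries pass after adjustment, which is why the final outlier count depends on fdr.threshold rather than on the unadjusted p-values.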

kmeans.nstart

The number of random starts used when computing the k-means fraction; default is 1. See ?stats::kmeans for further details.
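
The effect of nstart can be seen by calling stats::kmeans directly. The sketch below uses simulated values for a single hypothetical transcript, and it assumes that the "k-means fraction" refers to the share of samples in the smaller of two clusters; the package may define it differently.

```r
set.seed(42);

# Hypothetical expression values: ten typical samples and two unusually high ones.
x <- c(rnorm(10, mean = 5, sd = 0.5), 12, 13);

# nstart = 10 tries ten random sets of initial centers and keeps the best fit.
fit <- kmeans(x, centers = 2, nstart = 10);

# Share of samples in the smaller cluster (assumed meaning of 'k-means fraction').
smaller.fraction <- min(fit$size) / length(x);
smaller.fraction;
```

Larger values of nstart make the clustering less sensitive to the random initial centers, at the cost of extra computation.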

Value

A list consisting of the following entries:

  • p.values: a matrix of unadjusted p-values for the outlier test run on each transcript in data.

  • fdr: a matrix of FDR-adjusted p-values for the outlier test run on each transcript in data.

  • num.outliers: a vector giving the number of outliers detected for each transcript, based on fdr.threshold.

  • outlier.test.results.list: a list of length max(num.outliers) + 1 containing entries roundN, where N is between one and max(num.outliers) + 1. roundN is the data frame of results for the outlier test after excluding the (N-1)th outlier sample, with round1 being for the original data set (i.e., before excluding any outlier samples).

  • distributions: a numeric vector indicating the optimal distribution for each transcript. Possible values are 1 (normal), 2 (log-normal), 3 (exponential), and 4 (gamma).

  • initial.screen.method: the statistical criterion used for initial feature selection, as supplied in the initial.screen.method argument ('fdr' or 'p.value').
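
The following sketch shows one way to read two of these entries. It does not call the package: num.outliers and distributions are hypothetical vectors standing in for the corresponding entries of the returned list, and the helper names are illustrative.

```r
# Hypothetical stand-ins for results$num.outliers and results$distributions.
num.outliers <- c(transcript1 = 0, transcript2 = 2, transcript3 = 1);
distributions <- c(transcript1 = 1, transcript2 = 4, transcript3 = 2);

# outlier.test.results.list is described as holding max(num.outliers) + 1 rounds.
n.rounds <- max(num.outliers) + 1;
round.names <- paste0('round', seq_len(n.rounds));

# Translate the numeric distribution codes into the labels listed above.
distribution.labels <- c('normal', 'log-normal', 'exponential', 'gamma');
decoded <- distribution.labels[distributions];

# Transcripts with at least one detected outlier.
with.outliers <- names(num.outliers)[num.outliers > 0];

round.names;
decoded;
with.outliers;
```

With real output, the same indexing would apply to the entries of the list returned by detect.outliers.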

Examples

data(outliers);
outliers.subset <- outliers[1:10,];
results <- detect.outliers(
   data = outliers.subset,
   num.null = 10
   );

OutSeekR documentation built on April 11, 2025, 5:39 p.m.