testDA: Comparing differential abundance/expression methods by...

View source: R/testDA.R

testDAR Documentation

Comparing differential abundance/expression methods by Empirical power and False Discovery Rate

Description

Calculating Power, False Discovery Rates, False Positive Rates and AUC (Area Under the Receiver Operating Characteristic (ROC) Curve) for various differential abundance and expression methods

Usage

testDA(
  data,
  predictor,
  paired = NULL,
  covars = NULL,
  R = 20,
  tests = c("bay", "ds2", "ds2x", "per", "adx", "znb", "zpo", "msf", "zig", "erq",
    "erq2", "neb", "qpo", "poi", "sam", "lrm", "llm", "llm2", "lma", "lmc", "ere",
    "ere2", "pea", "spe", "wil", "kru", "qua", "fri", "abc", "ttt", "ltt", "ltt2", "tta",
    "ttc", "ttr", "aov", "lao", "lao2", "aoa", "aoc", "vli", "lim", "lli", "lli2", "lia",
    "lic"),
  relative = TRUE,
  effectSize = 5,
  k = NULL,
  cores = (detectCores() - 1),
  p.adj = "fdr",
  args = list(),
  out.all = NULL,
  alpha = 0.1,
  core.check = TRUE,
  verbose = TRUE
)

Arguments

data

Either a data.frame with counts/abundances, OR a phyloseq object. If a data.frame is provided rows should be taxa/genes/proteins and columns samples, and there should be rownames

predictor

The predictor of interest. Either a Factor or Numeric, OR if data is a phyloseq object the name of the variable in sample_data(data) in quotation. If the predictor is numeric it will be treated as such in the analyses

paired

For paired/blocked experimental designs. Either a Factor with Subject/Block ID for running paired/blocked analysis, OR if data is a phyloseq object the name of the variable in sample_data(data) in quotation.

covars

Either a named list with covariates, OR if data is a phyloseq object a character vector with names of the variables in sample_data(data)

R

Integer. Number of times to run the tests. Default 20

tests

Character. Which tests to include. Default all

relative

Logical. TRUE (default) for compositional data. FALSE for absolute abundances or pre-normalized data.

effectSize

Numeric. The effect size for the spike-ins. Default 5

k

Vector of length 3. Number of Features to spike in each tertile (lower, mid, upper). E.g. k=c(5,10,15): 5 features spiked in low abundance tertile, 10 features spiked in mid abundance tertile and 15 features spiked in high abundance tertile. Default NULL, which will spike 2 percent of the total amount of features in each tertile (a total of 6 percent), but minimum c(5,5,5)

cores

Integer. Number of cores to use for parallel computing. Default one less than available. Set to 1 for sequential computing.

p.adj

Character. Method for p-value adjustment. See p.adjust for details. Default "fdr"

args

List. A list with lists of arguments passed to the different methods. See details for more.

out.all

If TRUE linear models will output results and p-values from anova/drop1, ds2/ds2x will run LRT and not Wald test, erq and erq2 will produce one p-value for the predictor, and limma will run F-tests. If FALSE will output results for 2. level of the predictor. If NULL (default) set as TRUE for multi-class predictors and FALSE otherwise

alpha

q-value threshold for determining significance for Power. Default 0.1

core.check

If TRUE will make an interactive check that the amount of cores specified are desired. Only if cores>20. This is to ensure that the function doesn't automatically overloads a server with workers.

verbose

If TRUE will print informative messages

Details

mva is excluded by default, as it is slow.

Value

An object of class DA, which contains a list of results:

  • table - FPR, AUC and spike detection rate for each run

  • results - A complete list of output from all the methods. Example: Get wilcoxon results from 2. run as such: $results[[2]]["wil"]

  • details - A dataframe with details from the run

  • run.times - A dataframe with average run times of the different methods

Examples

# Creating random count_table and predictor
set.seed(5)
mat <- matrix(rnbinom(1000, size = 0.5, mu = 500), nrow = 50, ncol = 20)
rownames(mat) <- 1:50
pred <- c(rep("Control", 10), rep("Treatment", 10))

# Running testDA to find the best method
# This example only repeats the test 1 time (R = 1). 
# R should be increased to at least 20 for real test.
# It also uses 1 core (cores = 1). 
# Remove this argument to get it as high (and thereby fast) as possible.
res <- testDA(data = mat, predictor = pred, cores = 1, R = 1)
summary(res)


# Include a paired variable for dependent/blocked samples
subject <- rep(1:10, 2)
res <- testDA(data = mat, predictor = pred, paired = subject, cores = 1, R = 1)

# Include covariates
covar1 <- rnorm(20)
covar2 <- rep(c("A","B"), 10)
res <- testDA(data = mat, predictor = pred, 
              covars = list(FirstCovar = covar1, CallItWhatYouWant = covar2), cores = 1, R = 1)

# Data is absolute abundance
res <- testDA(data = mat, predictor = pred, relative = FALSE, cores = 1, R = 1)



Russel88/DAtest documentation built on March 24, 2022, 3:50 p.m.