runtimeDA

R Documentation

Estimate runtime of `testDA` on large datasets

Description

Estimate the runtime of `testDA` on the full dataset by timing it on subsets of the features. Intended for datasets with at least 2000 features.

Usage

runtimeDA(
  data,
  predictor,
  paired = NULL,
  covars = NULL,
  subsamples = c(500, 1000, 1500, 2000),
  subsamples.slow = c(100, 150, 200, 250),
  tests = c("abc", "sam", "qua", "fri", "vli", "qpo", "pea", "wil", "ttt", "ttr",
    "ltt", "ltt2", "ere", "ere2", "msf", "zig", "lim", "lli", "lli2", "aov", "lao",
    "lao2", "kru", "lrm", "llm", "llm2", "spe", "aoa", "aoc", "tta", "ttc", "lma", "lmc",
    "lia", "lic"),
  tests.slow = c("mva", "neb", "bay", "per", "ds2", "ds2x", "zpo", "znb", "adx", "poi",
    "erq", "erq2"),
  cores = (detectCores() - 1),
  ...
)

Arguments

`data`	Either a matrix with counts/abundances, OR a `phyloseq` object. If a matrix/data.frame is provided, rows should be taxa/genes/proteins and columns should be samples, and rownames must be set
`predictor`	The predictor of interest. Either a Factor or Numeric, OR if `data` is a `phyloseq` object the name of the variable in `sample_data(data)` in quotation marks. If the `predictor` is numeric it will be treated as such in the analyses
`paired`	For paired/blocked experimental designs. Either a Factor with Subject/Block ID for running paired/blocked analysis, OR if `data` is a `phyloseq` object the name of the variable in `sample_data(data)` in quotation marks. Only used by "poi", "per", "ttt", "ltt", "ltt2", "neb", "wil", "erq", "ds2", "lrm", "llm", "llm2", "lim", "lli", "lli2" and "zig"
`covars`	Either a named list with covariates, OR if `data` is a `phyloseq` object a character vector with names of the variables in `sample_data(data)`
`subsamples`	Vector with numbers of features to subsample to estimate runtime for fast methods
`subsamples.slow`	Vector with numbers of features to subsample to estimate runtime for slow methods
`tests`	Fast methods to include
`tests.slow`	Slow methods to include
`cores`	Integer. Number of cores to use for parallel computing. Defaults to one less than the number available. Set to 1 for sequential computing.
`...`	Additional arguments for the `testDA` function
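
As an illustration of the input shapes described above, the following sketch builds a small count matrix, a predictor, a pairing factor and a named covariate list using base R only. The object names (`counts`, `subject`, `age`) and all simulated values are made up for this example. The commented-out call assumes the DAtest package is installed and loaded.

```r
# Hypothetical toy data: 2000 features (rows) x 10 samples (columns)
set.seed(1)
counts <- matrix(rnbinom(2000 * 10, size = 0.5, mu = 500), nrow = 2000, ncol = 10)
rownames(counts) <- paste0("feature", seq_len(nrow(counts)))  # rownames must be set

# Predictor of interest: one value per sample (column)
pred <- factor(rep(c("A", "B"), each = 5))

# Paired design: samples 1 and 6 share subject 1, samples 2 and 7 share subject 2, ...
subject <- factor(rep(1:5, times = 2))

# Covariates as a named list, each entry with one value per sample (invented ages)
covars <- list(age = c(31, 45, 28, 52, 39, 31, 45, 28, 52, 39))

# Sanity checks before passing the objects on
stopifnot(
  !is.null(rownames(counts)),
  length(pred) == ncol(counts),
  length(subject) == ncol(counts),
  all(lengths(covars) == ncol(counts))
)

# Assumes library(DAtest) has been loaded; "ttt", "wil" and "neb" are all
# listed above as methods that use `paired`.
# runtimeDA(counts, pred, paired = subject, cores = 1,
#           tests = c("ttt", "wil"), tests.slow = c("neb"))
```

The checks only confirm that every per-sample object has one value per column of the matrix. They do not validate the data any further.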

Details

Outputs the estimated time for running each method once. With cores=1 the total runtime will be the sum of these estimates. With more cores the actual runtime will decrease asymptotically towards the runtime of the slowest test.
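
To make the relationship between cores and total runtime concrete, here is a small arithmetic sketch. The per-method estimates in `est_minutes` are invented numbers, not real output of `runtimeDA`, and nothing is assumed about the layout of the returned data.frame.

```r
# Hypothetical estimated runtimes (minutes) for one run of each method
est_minutes <- c(ttt = 0.2, wil = 0.5, neb = 12.0)

# cores = 1: methods run one after another, so the estimates add up
sequential_total <- sum(est_minutes)

# Many cores: methods run side by side, so the total cannot drop
# below the runtime of the slowest single method
parallel_lower_bound <- max(est_minutes)

sequential_total      # 12.7
parallel_lower_bound  # 12
```

In this made-up case the slow method dominates, so adding cores would save at most 0.7 minutes.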

The runtime of all methods is expected to scale linearly with the number of features, except "anc" and "bay", which are modelled with a second-order polynomial.
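
The extrapolation idea can be sketched with base R's `lm`: time a method on a few subsample sizes, fit a model, and predict the runtime at the full feature count. The timings below are invented for illustration, and this is not necessarily how `runtimeDA` performs the fit internally.

```r
# Hypothetical timings (seconds) measured on feature subsamples
timings <- data.frame(
  features = c(500, 1000, 1500, 2000),
  seconds  = c(1.0, 2.0, 3.0, 4.0)      # invented: 0.002 s per feature
)

# Linear model: runtime grows in proportion to the number of features
fit_linear <- lm(seconds ~ features, data = timings)

# Second-order polynomial, for methods whose runtime grows faster than linearly
slow <- data.frame(
  features = c(100, 150, 200, 250),
  seconds  = c(1.0, 2.25, 4.0, 6.25)    # invented: (features / 100)^2
)
fit_quadratic <- lm(seconds ~ poly(features, 2, raw = TRUE), data = slow)

# Extrapolate both fits to a full dataset of 10000 features
full <- data.frame(features = 10000)
pred_linear    <- unname(predict(fit_linear, newdata = full))
pred_quadratic <- unname(predict(fit_quadratic, newdata = full))
```

Because the invented timings follow their models exactly, the linear fit extrapolates to about 20 seconds and the quadratic fit to about 10000 seconds. This shows how much the choice of model matters when predicting far beyond the sampled range.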

Value

A data.frame with the estimated runtimes for one run

Examples

# Creating large random count_table and predictor
set.seed(5)
mat <- matrix(rnbinom(100000, size = 0.5, mu = 500), nrow = 10000, ncol = 10)
rownames(mat) <- 1:10000
pred <- c(rep("A", 5), rep("B", 5))

# Use runtimeDA to predict total runtime for all features
# This example uses 1 core (cores = 1).
# Remove the cores argument to use the default number of cores (and thereby run faster).
# Also, in this example only a subset of tests are run.
runtimeDA(mat, pred, cores = 1, tests = c("ttt","wil"), tests.slow = c("neb"))

Russel88/DAtest documentation built on March 24, 2022, 3:50 p.m.