runAnalysis: Run an integrated co-occurrence analysis for a microbial...

View source: R/significance_pipeline.R

runAnalysis  R Documentation

Run an integrated co-occurrence analysis for a microbial dataset

Description

Runs automated processing of an OTU table or phyloseq object, passes the jobs to ccrepe, and saves the results.

Usage

runAnalysis(
  OTU_table,
  abundance_cutoff = 1e-04,
  q_crit = 0.05,
  parallel = FALSE,
  ncpus = getOption("micInt.ncpus", 1L),
  cl = NULL,
  returnVariables = NULL,
  subset = NULL,
  sim.scores = NULL,
  file = FALSE,
  magnitude_factor = 10,
  prefix = NULL,
  metadataCols = c("OTU Id", "taxonomy"),
  postfix = "",
  renormalize = TRUE,
  iterations = 1000,
  ccrepe_args = list()
)

Arguments

OTU_table

The raw OTU table to process (a data.frame or a phyloseq otu_table), or an experiment-level phyloseq object containing the data (the latter is recommended). Note that taxonomy cannot be handled when a phyloseq otu_table is supplied.

abundance_cutoff

The mean abundance cutoff for the OTUs. If NULL, no filtering is applied.

q_crit

Numeric, the q-value cutoff used when constructing interaction tables.

parallel

Should the analysis be run in parallel?

ncpus

If parallel = TRUE, how many cores should be used? Defaults to the value of the option micInt.ncpus, or one if that option is not set.

cl

Custom cluster to use if parallel = TRUE.

returnVariables

Which variables should the function return (character vector)? Available options are:

  • similarity_measures_significance: The interaction_table of significant interactions

  • refined_table: The processed OTU table

  • min_dataset: The smallest non-zero entity in the refined table

  • taxonomy: A named numeric containing the taxonomy of each OTU (collapsed into a single string)

  • outputargs: A list (with an element for each similarity measure) containing the arguments to be passed to output_ccrepe_data for each similarity measure

  • common_outputargs: Like outputargs, but these arguments stay the same for all similarity measures in order to avoid duplicates.

In addition, all parameters of this function are available. Other internal variables found by inspecting the source code may also be returned, but they are intended for advanced users only. If NULL, the variables listed in this section are returned together with an echo of the parameters. Note: If OTU_table is a phyloseq object, the returned variable is a data frame corresponding to the phyloseq object, because the object is internally converted into a data frame.

subset

Character, the subset of similarity measures to use, denoted by their names in the list (not necessarily their strings) returned from similarity_measures or from a function that modifies similarity measures, such as noisify. If NULL, all available measures will be used.

sim.scores

The similarity measures of class sim.measure to use. If NULL, all measures available in the package will be used (recommended for most purposes).

file

Should the tables of significant interactions be written to file? If so, they are written to CSV files whose names contain the name of the similarity measure.

magnitude_factor

When making noisified functions, the magnitude of the noise will be this number multiplied with min_dataset

prefix

The prefix of the file names being written. Ignored if file=FALSE.

metadataCols

The names (character vector) or positions (integer vector) of the metadata columns to remove from the table before analyzing it. Ignored if a phyloseq object is supplied.

postfix

The postfix of the file names being written. Ignored if file=FALSE.

renormalize

Should the data be renormalized during the filtering process and permutation? Should be TRUE when used on relative abundances, but must be FALSE if absolute abundances are used.

iterations

Integer of length one, the number of iterations to run

ccrepe_args

A named list of custom arguments passed to ccrepe, for cases where its behavior needs fine-tuning. These arguments override the effects of the other arguments.
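
The interplay between abundance_cutoff, renormalize and magnitude_factor can be illustrated on a toy table. The following is a minimal base-R sketch under stated assumptions, not micInt's actual implementation: it assumes that filtering keeps the OTUs whose mean abundance across samples is at least the cutoff, and that renormalizing means rescaling each sample to sum to one. The helper filter_by_mean_abundance and the toy data are hypothetical.

```r
# Toy relative-abundance table: OTUs in rows, samples in columns (hypothetical data)
otu <- matrix(
  c(0.50, 0.60, 0.40,
    0.30, 0.20, 0.35,
    0.19, 0.20, 0.25,
    0.01, 0.00, 0.00),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("OTU", 1:4), paste0("S", 1:3))
)

# Hypothetical helper (assumed behavior, not the package's code):
# keep OTUs whose mean abundance reaches the cutoff, then optionally
# rescale every sample (column) so that it sums to one again.
filter_by_mean_abundance <- function(x, abundance_cutoff = 1e-04, renormalize = TRUE) {
  if (!is.null(abundance_cutoff)) {
    x <- x[rowMeans(x) >= abundance_cutoff, , drop = FALSE]
  }
  if (renormalize) {
    x <- sweep(x, 2, colSums(x), "/")
  }
  x
}

refined <- filter_by_mean_abundance(otu, abundance_cutoff = 0.01)

# min_dataset as documented: the smallest non-zero entry of the refined table
min_dataset <- min(refined[refined > 0])

# Noise magnitude as documented: magnitude_factor multiplied by min_dataset
magnitude_factor <- 10
noise_magnitude <- magnitude_factor * min_dataset
```

With absolute abundances, the same helper would be called with renormalize = FALSE so that the counts are left untouched after filtering.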

Details

If the function is told to write files and no prefix is given, the CSV files all share a common prefix of the form q_crit=(critical q-value)_cutoff=(mean abundance cutoff)_magfac=(magnitude factor), where all numbers are in scientific notation. The sim.score name follows, then the postfix, and finally the csv extension. The postfix is empty by default.
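
As an illustration of this naming scheme, the sketch below assembles such a file name in base R. The helper default_file_name is hypothetical, and the exact number formatting and the separator before the similarity measure name are assumptions; the file names actually produced by the package may differ in detail.

```r
# Hypothetical helper mimicking the documented naming scheme.
# Assumed layout: <prefix>_<similarity measure name><postfix>.csv
default_file_name <- function(sim_name, q_crit = 0.05, abundance_cutoff = 1e-04,
                              magnitude_factor = 10, prefix = NULL, postfix = "") {
  # Render a number in scientific notation (formatting details are assumed)
  sci <- function(x) format(x, scientific = TRUE)
  if (is.null(prefix)) {
    # Default prefix built from the three documented parameters
    prefix <- paste0("q_crit=", sci(q_crit),
                     "_cutoff=", sci(abundance_cutoff),
                     "_magfac=", sci(magnitude_factor))
  }
  paste0(prefix, "_", sim_name, postfix, ".csv")
}

# Default prefix derived from the parameters
default_file_name("spearman")

# Custom prefix and postfix
default_file_name("pearson", prefix = "seawater", postfix = "_run1")
```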

In order for an OTU-table to be valid when the argument OTU_table is a data.frame, the following criteria must hold:

  • The data points (samples) are in columns and the abundances for each OTU are in rows.

  • The rows may only hold OTU abundances

  • There may be as many metadata columns as you like. However, they all need to be declared in the metadataCols argument, and the column taxonomy has to be present in order for the output file to contain the taxonomy.

  • The row names of the table are the OTU names and the column names are the sample names

For phyloseq objects (both experiment level and otu_table), you do not need to take care of this; it is handled automatically.
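
To make the data.frame criteria concrete, the following base-R sketch builds a small table in the expected layout and checks it. The abundances, OTU names and taxonomy strings are made up for illustration, and the check is a hypothetical sanity test rather than the validation micInt performs internally.

```r
# Hypothetical toy table: OTUs in rows, samples in columns, plus two metadata columns.
# check.names = FALSE keeps the column name "OTU Id" intact (it contains a space).
otu_df <- data.frame(
  "OTU Id" = c("OTU1", "OTU2", "OTU3"),
  S1       = c(0.6, 0.3, 0.1),
  S2       = c(0.5, 0.2, 0.3),
  S3       = c(0.7, 0.2, 0.1),
  taxonomy = c("k__Bacteria; p__Proteobacteria",
               "k__Bacteria; p__Bacteroidetes",
               "k__Bacteria; p__Firmicutes"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# The row names of the table are the OTU names
rownames(otu_df) <- otu_df[["OTU Id"]]

# All metadata columns must be declared (these are the documented defaults)
metadataCols <- c("OTU Id", "taxonomy")

# Every column that is not declared as metadata must hold numeric abundances
abundance_cols <- setdiff(colnames(otu_df), metadataCols)
all_numeric <- all(vapply(otu_df[abundance_cols], is.numeric, logical(1)))
stopifnot(all_numeric, all(metadataCols %in% colnames(otu_df)))
```

A table laid out like otu_df could then be supplied as OTU_table together with the default metadataCols.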

Value

A list of the variables requested through the returnVariables parameter.

See Also

output_ccrepe_data

Examples

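library(micInt)
data(seawater)
# Restrict the analysis to the Spearman and Pearson similarity measures
sim.scores <- similarity_measures(subset = c("spearman", "pearson"))
# Run on two cores with a reduced number of iterations
runAnalysis(OTU_table = seawater, sim.scores = sim.scores, parallel = TRUE, ncpus = 2,
            iterations = 100)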


AlmaasLab/micInt documentation built on April 1, 2022, 10:37 a.m.