dacomp.select_references: Select a set of reference taxa, used for testing differential...



Description

The function receives a table of microbiome counts and selects a set of reference taxa used for normalization in dacomp.test. The counts matrix should be formatted with samples as rows and taxa as columns. No rarefaction or preliminary normalization is required.

The first step of the computation consists of computing the standard deviation of the ratio of each pair of taxa, across subjects. Each taxon is involved in the computation of m-1 pairwise standard deviations, with m being the number of taxa. The second step consists of finding, for each taxon, the median of its m-1 pairwise standard deviations. This value is named the 'median SD' statistic of the taxon.

Taxa with values of the 'median SD' statistic below median_SD_threshold are selected as the reference set of taxa. By default, median_SD_threshold is set to 0, meaning the function selects the smallest set of reference taxa for which at least minimal_TA counts are available under the reference taxa in every sample. See vignette('dacomp_main_vignette') for additional details and formulas. The full algorithm is described in Brill et al. (2019).
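The two steps described above can be sketched in base R. This is a minimal illustration, not the package's internal code: the helper name median_sd_scores is hypothetical, and the sketch assumes the ratio is taken on the log scale after adding the pseudo count (see the vignette for the exact formula).

```r
# Sketch of the 'median SD' statistic; NOT the dacomp implementation.
# Assumption: ratios are compared on the log10 scale after adding a pseudo count.
median_sd_scores <- function(X, pseudo_count = 1) {
  log_counts <- log10(X + pseudo_count)  # samples in rows, taxa in columns
  m <- ncol(log_counts)
  scores <- numeric(m)
  for (j in seq_len(m)) {
    # Log ratio of taxon j to each of the other m - 1 taxa, one column per partner
    log_ratios <- log_counts[, j] - log_counts[, -j, drop = FALSE]
    # Step 1: standard deviation of each pairwise log ratio across samples
    pairwise_sd <- apply(log_ratios, 2, sd)
    # Step 2: the median of the m - 1 pairwise standard deviations
    scores[j] <- median(pairwise_sd)
  }
  scores
}

# Toy data: 20 samples, 5 taxa
set.seed(1)
toy_counts <- matrix(rpois(20 * 5, lambda = 30), nrow = 20, ncol = 5)
toy_scores <- median_sd_scores(toy_counts)
toy_scores
```

Taxa with low scores vary little relative to most other taxa, which is why they are candidates for the reference set.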

Usage

dacomp.select_references(
  X,
  median_SD_threshold = 0,
  minimal_TA = 100,
  maximal_TA = minimal_TA + 100,
  Pseudo_Count_used = 1,
  verbose = F,
  select_from = NULL,
  Previous_Reference_Selection_Object = NULL,
  run_in_parallel = F,
  Nr.Cores = 4
)

Arguments

X

Counts matrix, with rows representing samples, columns representing different taxa.

median_SD_threshold

Critical value for the 'median SD' statistic. Taxa with a 'median SD' statistic smaller than this value will be taken as reference. In any case, the threshold value used will be increased until at least minimal_TA counts are available for the reference taxa in all samples. With the default value, the function selects the minimal number of reference taxa for which the minimal_TA criterion is met.

minimal_TA

The minimal number of counts required in each sample, in the taxa selected as reference. If any sample has fewer than minimal_TA reads in the selected set of reference taxa, the function will increase the 'median SD' threshold until all samples have at least minimal_TA reads in the selected set of reference taxa.

maximal_TA

Relevant only if median_SD_threshold is set to a non-default value larger than zero. If all samples have more than maximal_TA reads available under the selected set of reference taxa, the 'median SD' threshold for classifying taxa as references will be lowered until this is no longer the case.

Pseudo_Count_used

Pseudo count added to all count values, to avoid dividing by zero.

verbose

If set to TRUE, messages will be displayed, as computation progresses.

select_from

The default value of NULL indicates that all taxa are valid candidates for selection. The user may limit the set of possible candidates for the reference set, by supplying a list of candidates (by indices) using this argument.

Previous_Reference_Selection_Object

If the user previously selected a set of reference taxa for the data with one threshold, and would like to select a new set of reference taxa with another threshold, the output of a previous run of dacomp.select_references can be supplied as an argument to speed up computations. See usage example in code snippet below.

run_in_parallel

Should computation be parallelized.

Nr.Cores

If computation is parallelized, the number of parallel workers to use.

Details

Target Abundance (TA) limits: The function will attempt to use the argument median_SD_threshold as the classification threshold. If needed, the function will increase the threshold actually used so that each sample has at least minimal_TA counts under the taxa selected as reference. The function will lower the classification threshold if all samples have more than maximal_TA reads under the selected set of reference taxa. Computation may take up to a minute or two for large datasets.
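The minimal_TA side of this adjustment can be illustrated with a short base R sketch. It is an assumption-laden illustration, not the package's code: the helper smallest_reference_set and its return fields are hypothetical, ties between scores are ignored, and the maximal_TA side is not covered. Taxa are added in order of increasing 'median SD' score until every sample has at least minimal_TA reads under the reference.

```r
# Illustration only; NOT the dacomp implementation.
# Grow the reference set in order of increasing score until every sample
# has at least `minimal_TA` reads in the selected reference taxa.
smallest_reference_set <- function(X, scores, minimal_TA = 100) {
  taxa_by_score <- order(scores)
  for (k in seq_along(taxa_by_score)) {
    selected <- taxa_by_score[seq_len(k)]
    reads_in_reference <- rowSums(X[, selected, drop = FALSE])
    if (min(reads_in_reference) >= minimal_TA) {
      return(list(selected = selected,
                  largest_selected_score = scores[taxa_by_score[k]]))
    }
  }
  stop("minimal_TA cannot be reached even with all taxa as reference")
}

# Toy data: 3 samples (rows), 3 taxa (columns), with made-up scores
toy_counts <- matrix(c(60, 50, 10,
                       40, 70, 90,
                       80, 30, 20), nrow = 3, byrow = TRUE)
toy_scores <- c(0.2, 0.5, 0.9)
res <- smallest_reference_set(toy_counts, toy_scores, minimal_TA = 100)
res$selected
```

In this toy case the lowest-scoring taxon alone leaves a sample with only 40 reference reads, so a second taxon is added, which brings every sample to 110 reads.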

Value

The function returns an object of type "dacomp.reference.selection.object", which is a list with the following fields:

References

Brill, Barak, Amnon Amir, and Ruth Heller. 2019. Testing for Differential Abundance in Compositional Counts Data, with Application to Microbiome Studies. arXiv Preprint arXiv:1904.08937.

Examples

library(dacomp)

set.seed(1)

# Simulate a two-sample example dataset
data = dacomp.generate_example_dataset.two_sample(m1 = 100,
       n_X = 50,
       n_Y = 50,
       signal_strength_as_change_in_microbial_load = 0.1)

# Select references (may take a minute).
# minimal_TA = 100 chooses the minimal number of reference taxa such that
# at least 100 reads are available under the reference for all samples.
result.selected.references = dacomp.select_references(X = data$counts,
                                                     minimal_TA = 100,
                                                     verbose = TRUE)

# Number of taxa selected as reference
length(result.selected.references$selected_references)

# Plot the reference selection scores (can also be used to better set the median SD threshold)
dacomp.plot_reference_scores(result.selected.references)

# Select a reference set with a different (lower) threshold. The argument
# 'Previous_Reference_Selection_Object' takes a previous reference selection
# object for this data, to speed up computation.
result.selected.references.different.threshold = dacomp.select_references(X = data$counts,
          median_SD_threshold = 0.5,
          verbose = FALSE,
          Previous_Reference_Selection_Object = result.selected.references)

barakbri/dacomp documentation built on June 17, 2021, 11:20 p.m.