dacomp.validate_references: Diagnostic procedure for DACOMP reference set.


View source: R/dacomp_Validation_and_reselection_procedure.R

Description

The function receives a data set (through X and Y) and a chosen reference set for the data (in ref_obj), evaluates whether the reference set contains differentially abundant taxa, and if so, reselects the reference set.

Usage

dacomp.validate_references(
  X,
  Y,
  ref_obj,
  test,
  Q_validation = 0.1,
  Minimal_Counts_in_ref_threshold = 10,
  Reduction_Factor = 0.9,
  NR_perm = ceiling(1/(Q_validation/ncol(X))),
  select_from = NULL,
  Verbose = T,
  disable_DSFDR = F
)

Arguments

X

Matrix of 16S counts; rows are samples, columns are taxa.

Y

Vector of trait values. Entries in this vector should correspond to the rows of X.

ref_obj

A reference object, returned from dacomp.select_references.

test

The type of the test to be used, passed to dacomp.test when testing reference taxa.

Q_validation

The level of the Simes test (α) used to determine if reference taxa show a signal.

Minimal_Counts_in_ref_threshold

When shrinking the reference set, the set will not be shrunk past the point where at least Minimal_Counts_in_ref_threshold counts are available in the reference set for each sample.

Reduction_Factor

When a contamination is found in the reference set, a new reference set is selected with the following rule. If the tested reference set has at least λ_{min} counts in all samples, the new reference set selected is the smallest set with at least Reduction_Factor \times λ_{min} counts. When constructing the new reference set, taxa are inserted into the reference set based on their reference scores (as given by the object ref_obj), until the target minimal abundance is reached. This reselection happens at each iteration of the algorithm.

NR_perm

Number of permutations used for computing the P-values for the Simes intersection test.

select_from

Can be used to limit the set of taxa that the reference set is selected from. Receives a numeric vector detailing which taxa are valid. The default value of NULL means no taxa are excluded from the selection procedure.

Verbose

Logical value indicating if messages should be printed. Messages include: the iteration of the algorithm; if a contamination has been found using the Simes test at each iteration; the number of taxa remaining in the reference set after reselection; and a message describing if the final reference included no signal, or iterations were halted due to the parameter Minimal_Counts_in_ref_threshold.

disable_DSFDR

Logical value indicating whether testing should be done using the Simes procedure (TRUE) or the DS-FDR procedure (FALSE, experimental feature). The signature under Usage sets this argument to F.
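The default for NR_perm in the Usage signature can be read as a simple piece of arithmetic. The sketch below evaluates it for hypothetical values of Q_validation and of the number of taxa (n_taxa stands in for ncol(X)). The final line assumes the common permutation P-value estimator (1 + number of exceedances) / (1 + NR_perm); that estimator is an assumption made for illustration and is not taken from this page.

```r
# Hypothetical values, for illustration only
Q_validation <- 0.1
n_taxa <- 200   # stands in for ncol(X)

# Default expression from the Usage signature
NR_perm <- ceiling(1 / (Q_validation / n_taxa))

# Smallest P-value a permutation test could report with NR_perm permutations,
# ASSUMING the estimator (1 + exceedances) / (1 + NR_perm)
min_perm_pvalue <- 1 / (NR_perm + 1)

# Under that assumption, the resolution is fine enough to fall below Q_validation / n_taxa
min_perm_pvalue < Q_validation / n_taxa
```

With more taxa or a smaller Q_validation, the default number of permutations grows proportionally, which is worth keeping in mind for run time.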

Details

At each iteration of the algorithm, all taxa in the reference set are tested for differential abundance in a leave-one-out manner: each taxon is excluded from the reference set in turn and tested for differential abundance with respect to the remaining taxa in the reference set. The parameter test determines the test performed and should be selected according to the values of Y; see additional details in dacomp.test.

The P-values for the r reference taxa are combined using the Simes test statistic (Simes, 1986): p_{Simes} = \min_{j=1,\ldots,r} \frac{r \times p_{(j)}}{j}, where p_{(1)} \le \ldots \le p_{(r)} are the ordered P-values.

If p_{Simes} is smaller than Q_validation, the reference set is reselected as follows. If the current reference set has at least λ_{min} counts in all samples, the new reference set selected is the smallest set with at least Reduction_Factor \times λ_{min} counts. Taxa are inserted into the reference set based on their reference scores (as given by the object ref_obj), until the target minimal abundance is reached. The algorithm iterates over testing and reselection steps until either p_{Simes} is larger than Q_validation, or the algorithm cannot select a reference set with at least Minimal_Counts_in_ref_threshold counts under the reference taxa for all samples.
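The Simes combination described above can be sketched in a few lines of base R. This is an illustration of the formula only, not the package's internal code; the function name simes_pvalue and the example P-values are hypothetical.

```r
# Simes combined P-value: min over j of r * p_(j) / j,
# where p_(1) <= ... <= p_(r) are the ordered P-values
simes_pvalue <- function(p) {
  r <- length(p)
  p_sorted <- sort(p)
  min(r * p_sorted / seq_len(r))
}

# Hypothetical leave-one-out P-values for four reference taxa
p <- c(0.04, 0.30, 0.01, 0.75)
p_simes <- simes_pvalue(p)

# Compare against the validation level to decide whether to reselect
Q_validation <- 0.1
p_simes < Q_validation
```

For these four values the candidate terms are 0.04, 0.08, 0.40 and 0.75, so the combined P-value is 0.04 and, at Q_validation = 0.1, a reselection step would be triggered.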

Setting the parameter disable_DSFDR to F uses an experimental feature, in which the P-values are combined using the DS-FDR multiple testing procedure rather than the Simes test.
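The reselection rule from the Details section (smallest set reaching Reduction_Factor \times λ_{min} counts in every sample) can be sketched as follows. This is a minimal illustration under stated assumptions, not the package implementation: the helper reselect_reference, the toy count matrix, and the ordering taxa_order are hypothetical, and how ref_obj stores or orders its reference scores is not shown here.

```r
# Add taxa in a given order until every sample has at least
# target_min_counts counts in the reference set
reselect_reference <- function(X, taxa_order, target_min_counts) {
  counts_in_ref <- rep(0, nrow(X))
  for (k in seq_along(taxa_order)) {
    counts_in_ref <- counts_in_ref + X[, taxa_order[k]]
    if (min(counts_in_ref) >= target_min_counts) {
      return(taxa_order[seq_len(k)])
    }
  }
  NULL  # target cannot be reached with the available taxa
}

# Toy data: 3 samples (rows) by 4 taxa (columns), filled column by column
X <- matrix(c(5, 3, 8,
              2, 6, 1,
              4, 4, 9,
              7, 1, 3),
            nrow = 3)

current_ref <- 1:4
taxa_order <- c(2, 1, 4, 3)   # hypothetical ordering by reference score
Reduction_Factor <- 0.5

# Minimal number of reference counts across samples for the current set
lambda_min <- min(rowSums(X[, current_ref, drop = FALSE]))

new_ref <- reselect_reference(X, taxa_order, Reduction_Factor * lambda_min)
new_ref
```

In this toy example the current reference set has λ_{min} = 14, the target is 7, and the first two taxa in the ordering are enough to reach it. In the full procedure this step would alternate with the Simes test until no signal is detected or the Minimal_Counts_in_ref_threshold limit is hit.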

Value

A numeric vector containing the indices of the taxa selected as the reference set by the function (see the description of the algorithm under Details). Indices are based on the columns of X. See the example under dacomp.test on how to use the result with the testing procedure.

Examples

## Not run: 
set.seed(1)
library(dacomp)
# generate data with two study groups
data = dacomp.generate_example_dataset.two_sample(n_X = 30,
                                                  n_Y = 30,
                                                  m1 = 50,
                                                  signal_strength_as_change_in_microbial_load = 0.1)

# select references. We purposely select reference taxa so that
# differentially abundant taxa enter the reference set.
# In general, select using median_SD_threshold=0, minimal_TA=100;
# see paper for discussion of this selection strategy.
result.selected.references = dacomp.select_references(X = data$counts,
                                                     median_SD_threshold = 1.3,
                                                     maximal_TA = 1000,
                                                     verbose = T)
                                                     
# some differentially abundant taxa entered the reference set:                                                    
sum(result.selected.references$selected_references %in% data$select_diff_abundant)

#run the sensitivity analysis.
cleaned_references = dacomp.validate_references(X =  data$counts,
                                               Y =  data$group_labels,
                                               ref_obj = result.selected.references,
                                               test =DACOMP.TEST.NAME.WILCOXON,
                                               Q_validation = 0.1,
                                               Minimal_Counts_in_ref_threshold = 100,
                                               Reduction_Factor = 0.5,
                                               Verbose = T,
                                               disable_DSFDR = T,
                                               NR_perm = 1000)
                                               
#now the reduced reference has no differentially abundant taxa inside....
sum(cleaned_references %in% data$select_diff_abundant)

# when testing, report results both for the original reference set selected, as well as the reduced reference.

## End(Not run)
 

barakbri/dacomp documentation built on June 17, 2021, 11:20 p.m.