dacomp.test: Test taxa for differential abundance, using a given set of...

Description Usage Arguments Details Value References Examples

View source: R/dacomp_main_function.R

Description

The function tests taxa for differential abundance, given a set of reference taxa used for normalization, as described in Brill et. al. (2019). The function supports several tests, by type of phenotype: two sample tests, K-sample tests, tests for association with a continuous phenotype and user defined tests. See 'details' below on how the input argument y shold be formatted for different tests.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
dacomp.test(
  X,
  y,
  ind_reference_taxa,
  test,
  q = 0.05,
  nr_perm = max(1/(q/(ncol(X))), 1000),
  disable_DSFDR = F,
  user_defined_test_function = NULL,
  compute_ratio_normalization = F,
  verbose = F,
  DSFDR_Filter = rep(T, ncol(X)),
  Test_All = F,
  return_rarefied_values = F
)

Arguments

X

Matrix of counts, with rows representing samples, columns representing taxa. See 'details' for additional information on how to format this matrix for paired study designs.

y

Vector of phenotype values by sample. See details on different formats used by different tests.

ind_reference_taxa

One of two options: Object of type 'dacomp.reference.selection.object' returned from dacomp.select_references, or a vector of indices of taxa selected as a reference set for normalization. See package vignette for additional details.

test

One of the values in the vector DACOMP.POSSIBLE.TEST.NAMES, or one of the constants available as DACOMP.TEST.NAME.* with asterisk representing the name of the test. See 'details' for additional details and inputs required, by test.

q

The required FDR level for the DS-FDR algorithm, see Jiang et. al. (2017) for details.

nr_perm

Number of permuations used for testing and computing P-values. The default number of permutations is set high enough to provide powerful inference after adjusting for multiplicity. Change the number of permutations only if you know how it effects power after correcting for multiplicity.

disable_DSFDR

Can be used to disable the DS-FDR computation (which may take up to a minute on large datasets). The default value is FALSE.

user_defined_test_function

Argument for inputing a function for a custom test of association supplied by the user, see 'details' below.

compute_ratio_normalization

Argument determines if in addition to the test described above, the function should a run test based on normalization by division rather than rarefaction. See description under details.

verbose

Should messages be printed to console, indicating computation progress. The default value is FALSE.

DSFDR_Filter

A logical (boolean) vector specifying for which taxa should the DSFDR adjusted P-values be computed. Taxa whose corresponding entries in this vector are set to are excluded from testing, and will not have DSFDR adjusted Pvalues computed. If a taxon is set to be in the reference set, and its corresponding entry is set to T in this vector, it will still be excluded from testing.

Test_All

A logical value specifying if all test (including the reference taxa) should be tested for differential abundance. When testing a reference taxon for differential abundance, it is excluded from the reference set.

return_rarefied_values

a logical value indicating if the rarefied draws used for testing should be returned as a matrix under the entry rarefied_counts in the returned object.

Details

The function tests each taxon not in the reference set for differential abundance as follows. For each taxon, the following procedure is performed. First, an identical number of reads is taken from each sample, from the reads available under the reference set of taxa, and the taxon being tested. Next, a test of association is performed between the rarefied reads and the phenotype given by y. P-values are computed by permutations. Finally, after all P-values are computed, the DS-FDR threshold for rejection is computed. Hypotheses with P-values lower than the DS-FDR rejection threshold are rejected. Reference taxa are not tested for differential abundance. See vignette('dacomp_main_vignette') for additional details and examples.

The function supports several tests. Different tests require different formats for the arguments X and y.

Two Sample Tests (treatment vs. control, healthy vs.sick):

The parameter compute_ratio_normalization computes in addition to the test described above, a test of association between the phenotype y and the proportion of a tested taxon out of a subvector containing only the tested taxon and a set of reference taxa. This procedure, described as DACOMP-ratio in Brill et al. (2019), may enjoy higher power to detect differentially abundant taxa with a low number of counts, due to avoiding rarefaction. However, for scenarios with an extreme change in the microbial load of the measured ecology, e.g., a four-fold change in the microbial load between different study groups, this variant of DACOMP may have a slightly inflated FDR. See method description and simulation results in the paper for additional details.

We suggest using this function toghether with the function dacomp.validate_references. The function dacomp.validate_references is meant to detect if differentially abundant taxa have entered the reference set, and if so, reselect the reference set. We suggest that results be reported twice: once using the original reference selected, and once using the re-selected reference set. If results agree, than it is more likely that the initial reference set selected did not feature a contamination. If results show disagreement, this could be due to a signal entering the reference set.

Value

An object of type "dacomp.result.object", which is a list with the follow fields:

References

Brill, Barak, Amnon Amir, and Ruth Heller. 2019. Testing for Differential Abundance in Compositional Counts Data, with Application to Microbiome Studies. arXiv Preprint arXiv:1904.08937.

Jiang, L, A Amir, JT Morton, R Heller, E Arias-Castro, and R Knight. 2017. Discrete False-Discovery Rate Improves Identification of Differentially Abundant Microbes. mSystems 2: E00092-17. Am Soc Microbiol.

Wagner, Brandie D, Charles E Robertson, and J Kirk Harris. 2011. Application of Two-Part Statistics for Comparison of Sequence Variant Counts. PloS One 6 (5). Public Library of Science: e20296.

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
## Not run: 


library(dacomp)

set.seed(1)

# example for a study with two groups:

data = dacomp.generate_example_dataset.two_sample(m1 = 100,
       n_X = 50,
       n_Y = 50,
       signal_strength_as_change_in_microbial_load = 0.1)

#select references: (may take a minute)
result.selected.references = dacomp.select_references(X = data$counts,
                                                     minimal_TA = 50, #Choosing the minimal number of reference taxa so that at least 50 reads are available under the reference for all samples
                                                     verbose = T)

length(result.selected.references$selected_references)

#plot the reference selection scores (can also be used to better set the median SD threshold)
dacomp.plot_reference_scores(result.selected.references)

#multiplicity correction levels for the BH and DS-FDR methods
q_BH = q_DSFDR = 0.1

#Perform testing:
result.test = dacomp.test(X = data$counts,
                      y = data$group_labels,
                      ind_reference_taxa = result.selected.references,
                      test = DACOMP.TEST.NAME.WILCOXON,nr_perm = 1000,
                      verbose = T,q = q_DSFDR)

rejected_BH = which(p.adjust(result.test$p.values.test,method = 'BH')<=q_BH)
rejected_DSFDR = result.test$dsfdr_rejected

#We note that our reference set may be contaminated, i.e. include some differentially abundant taxa. Therefore, we run a diagnostic check, trying to remove possible signals from the reference, and retesting:
Cleaned_references = dacomp.validate_references(X =  data$counts,
                                               Y =  data$group_labels,
                                               ref_obj = result.selected.references,
                                               test =DACOMP.TEST.NAME.WILCOXON,
                                               Q_validation = 0.1,
                                               Minimal_Counts_in_ref_threshold = 10,
                                               Reduction_Factor = 0.9,
                                               Verbose = T,
                                               disable_DSFDR = T,
                                               NR_perm = 10000)
                                                
result.test.with.reduced.reference = dacomp.test(X = data$counts,
                       y = data$group_labels,
                      ind_reference_taxa = result.selected.references,
                      test = DACOMP.TEST.NAME.WILCOXON,nr_perm = 1000,
                      verbose = T,q = q_DSFDR)

rejected_BH = which(p.adjust(result.test.with.reduced.reference$p.values.test,method = 'BH')<=q_BH)
rejected_DSFDR = result.test.with.reduced.reference$dsfdr_rejected

# example with continous outcome
set.seed(1)

data = dacomp.generate_example_dataset_continuous(n = 100,m1 = 30,
signal_strength_as_change_in_microbial_load = 0.2)


result.selected.references = dacomp.select_references(X = data$counts,
                                                     minimal_TA = 50, #Choosing the minimal number of reference taxa so that at least 50 reads are available under the reference for all samples
                                                     verbose = T)
#number of selected references
length(result.selected.references$selected_references)

#plot the reference selection scores (can also be used to better set the median SD threshold)
dacomp.plot_reference_scores(result.selected.references)

#multiplicity correction levels for the BH and DS-FDR methods
q_BH = q_DSFDR = 0.1

#Perform testing:
result.test = dacomp.test(X = data$counts,
                         y = data$covariate,test = DACOMP.TEST.NAME.SPEARMAN,
                         ind_reference_taxa = result.selected.references,nr_perm = 1000,
                         verbose = T,q = q_DSFDR)

rejected_BH = which(p.adjust(result.test$p.values.test,method = 'BH')<=q_BH)
rejected_DSFDR = result.test$dsfdr_rejected

#We note that our reference set may be contaminated, i.e. include some differentially abundant taxa. Therefore, we run a diagnostic check, trying to remove possible signals from the reference, and retesting:
Cleaned_references = dacomp.validate_references(X =  data$counts,
                                               Y =  data$covariate,
                                               ref_obj = result.selected.references,
                                               test = DACOMP.TEST.NAME.SPEARMAN,
                                               Q_validation = 0.1,
                                               Minimal_Counts_in_ref_threshold = 10,
                                               Reduction_Factor = 0.9,
                                               Verbose = T,
                                               disable_DSFDR = T,
                                               NR_perm = 1000)
result.test.with.reduced.reference = dacomp.test(X = data$counts,
                                                y = data$covariate,
                                                ind_reference_taxa = Cleaned_references,
                                                test = DACOMP.TEST.NAME.SPEARMAN,nr_perm = 1000,
                                                verbose = T,q = q_DSFDR)

rejected_BH = which(p.adjust(result.test.with.reduced.reference$p.values.test,method = 'BH')<=q_BH)
rejected_DSFDR = result.test.with.reduced.reference$dsfdr_rejected



## End(Not run)

barakbri/dacomp documentation built on June 17, 2021, 11:20 p.m.