View source: R/dacomp_select_references.R
The function receives a table of microbiome counts, and selects a set of taxa used for normalization in dacomp.test.
The counts matrix should be formatted with taxa as columns, samples as rows. No rarefaction or preliminary normalization is required.
The first step of the computation consists of computing the standard deviation of the ratio of each pair of taxa, across subjects. Each taxon is involved in the computation of m-1 pairwise standard deviations, with m being the number of taxa. The second step consists of finding the median pairwise standard deviation of each taxon, across all pairwise standard deviations computed.
This value, computed for each taxon, is named the 'median SD' statistic. Taxa with values of the 'median SD' statistic below median_SD_threshold are selected as the reference set of taxa.
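The two steps above can be sketched in a few lines of base R. This is only an illustrative sketch, not the package's implementation: the function name median_sd_scores is hypothetical, the log10 transform of the ratio is an assumption (the text above only says "ratio"), and the pseudo count mirrors the Pseudo_Count_used argument.

```r
# Hypothetical helper, not part of dacomp: computes a 'median SD' style score per taxon.
# X: counts matrix, samples as rows, taxa as columns.
median_sd_scores <- function(X, pseudo_count = 1) {
  m <- ncol(X)
  Xp <- X + pseudo_count               # avoid dividing by zero
  sd_mat <- matrix(0, m, m)            # pairwise SD matrix, diagonal left at 0
  for (j in seq_len(m - 1)) {
    for (k in (j + 1):m) {
      # Assumption: the SD is taken over the log10 ratio, across samples
      s <- stats::sd(log10(Xp[, j] / Xp[, k]))
      sd_mat[j, k] <- s
      sd_mat[k, j] <- s
    }
  }
  # Step 2: median of the m - 1 pairwise SDs of each taxon (self pair excluded)
  scores <- vapply(seq_len(m),
                   function(j) stats::median(sd_mat[j, -j]),
                   numeric(1))
  list(ratio_matrix = sd_mat, scores = scores)
}

# Toy data: taxa 1 and 2 are identical, so their pairwise SD is zero
X_toy <- cbind(c(1, 2, 3), c(1, 2, 3), c(5, 1, 9))
res_toy <- median_sd_scores(X_toy)
```

Taxa whose ratio to most other taxa is stable across samples get low scores, which is why low-scoring taxa are the candidates for the reference set.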
By default, median_SD_threshold is set to 0, meaning that the smallest number of reference taxa is selected such that at least 'minimal_TA' counts are available under the reference taxa for all samples.
See vignette('dacomp_main_vignette') for additional details and formulas.
The full algorithm and additional details are in Brill et al. (2019).
dacomp.select_references(
  X,
  median_SD_threshold = 0,
  minimal_TA = 100,
  maximal_TA = minimal_TA + 100,
  Pseudo_Count_used = 1,
  verbose = F,
  select_from = NULL,
  Previous_Reference_Selection_Object = NULL,
  run_in_parallel = F,
  Nr.Cores = 4
)
X - Counts matrix, with rows representing samples, columns representing different taxa.
median_SD_threshold - Critical value for the 'median SD' statistic. Taxa with a 'median SD' statistic smaller than this value will be taken as reference. In any case, the threshold value used will be increased until at least 'minimal_TA' counts are available for the reference taxa under all samples. The default parameter will select the minimal number of reference taxa until the 'minimal_TA' criterion is met.
minimal_TA - The minimal number of counts required in each sample, in the taxa selected as reference. If the set of reference taxa has a sample with fewer than 'minimal_TA' reads in the set of reference taxa selected, the function will increase the 'median SD' threshold until all samples have at least
maximal_TA - Relevant only if median_SD_threshold is set to a non-default value larger than zero. If all samples have more than
Pseudo_Count_used - Pseudo count added to all count values, to avoid dividing by zero.
verbose - If set to
select_from - The default value of
Previous_Reference_Selection_Object - If the user previously selected a set of reference taxa for the data with one threshold, and would like to select a new set of reference taxa with another threshold, the output of a previous run of
run_in_parallel - Should computation be parallelized.
Nr.Cores - If computation is parallelized, how many parallel workers should be used.
Target Abundance (TA) limits: The function will attempt to use the argument median_SD_threshold as a threshold for classification. If needed, the function will increase the actual threshold used so that each sample has at least minimal_TA counts under taxa selected as reference. The function will lower the classification threshold if all samples have more than maximal_TA reads under the selected set of reference taxa.
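The interplay between the threshold and the Target Abundance limits can be illustrated with a small base R sketch. It is a sketch under stated assumptions, not the package's code: the function name select_by_threshold is hypothetical, the scores are taken as given, and giving minimal_TA precedence over maximal_TA when the two conflict is an assumption.

```r
# Hypothetical helper, not part of dacomp.
# X: counts matrix (samples x taxa); scores: one 'median SD' style score per taxon.
select_by_threshold <- function(X, scores, threshold = 0,
                                minimal_TA = 100,
                                maximal_TA = minimal_TA + 100) {
  ord <- order(scores)                     # taxa sorted from lowest to highest score
  running <- numeric(nrow(X))
  min_abundance <- numeric(length(ord))
  for (i in seq_along(ord)) {
    running <- running + X[, ord[i]]       # reference counts per sample, i lowest taxa
    min_abundance[i] <- min(running)       # worst-off sample
  }
  meets_min <- which(min_abundance >= minimal_TA)
  if (length(meets_min) == 0) {
    stop("minimal_TA cannot be met even with all taxa as reference")
  }
  n_min <- meets_min[1]                        # fewest taxa satisfying minimal_TA
  n_max <- sum(min_abundance <= maximal_TA)    # valid because min_abundance is non-decreasing
  n <- sum(scores < threshold)                 # taxa below the requested threshold
  n <- min(n, n_max)                           # lower the threshold if above maximal_TA
  n <- max(n, n_min)                           # raise it if below minimal_TA (assumed precedence)
  sort(ord[seq_len(n)])
}

# Toy data: 3 samples, 4 taxa with constant counts 60, 50, 70 and 500
X_toy <- matrix(c(60, 60, 60, 50, 50, 50, 70, 70, 70, 500, 500, 500), nrow = 3)
scores_toy <- c(0.2, 0.1, 0.3, 0.9)
ref_default <- select_by_threshold(X_toy, scores_toy, threshold = 0,
                                   minimal_TA = 100, maximal_TA = 200)
ref_high <- select_by_threshold(X_toy, scores_toy, threshold = 1,
                                minimal_TA = 100, maximal_TA = 200)
```

In the toy data, a threshold of 0 is raised until two taxa are selected, while a threshold of 1 (which would admit all four taxa) is lowered until the worst-off sample has no more than maximal_TA reference reads.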
Computation may take up to a minute or two, for large datasets.
The function returns an object of type "dacomp.reference.selection.object", which is a list with the following fields:
selected_references - A vector with the indices of selected reference taxa.
mean_prevalence_over_the_sorted - A vector containing the fraction of zero counts in the reference set of taxa, across samples, when the taxon with the lowest median SD is taken as reference, when the two taxa with the lowest median SD are taken as reference, the three lowest, and so on.
min_abundance_over_the_sorted - A vector containing the minimal number of counts observed in the reference set of taxa, across samples, when the taxon with the lowest median SD is taken as reference, when the two taxa with the lowest median SD are taken as reference, the three lowest, and so on.
which_is_min_abundance_over_the_sorted - A vector containing the sample index for which the minimum was achieved, for each entry of min_abundance_over_the_sorted.
ratio_matrix - The matrix of SD_j,k as defined in the paper and the package vignette.
scores - the median SD scores, S_j as defined in the package vignette and paper.
selected_MinAbundance - The minimal number of counts, available under the reference set of taxa, across subjects.
median_SD_threshold - The input supplied under the function argument with the same name, without modification by the Target Abundance feature (see 'details').
minimal_TA - The input supplied under the function argument with the same name.
maximal_TA - The input supplied under the function argument with the same name.
Brill, Barak, Amnon Amir, and Ruth Heller. 2019. Testing for Differential Abundance in Compositional Counts Data, with Application to Microbiome Studies. arXiv Preprint arXiv:1904.08937.
library(dacomp)
set.seed(1)

# Generate an example two-sample dataset
data = dacomp.generate_example_dataset.two_sample(m1 = 100,
        n_X = 50,
        n_Y = 50,
        signal_strength_as_change_in_microbial_load = 0.1)

# Select references: (may take a minute)
# minimal_TA = 100 chooses the minimal number of reference taxa so that
# at least 100 reads are available under the reference for all samples
result.selected.references = dacomp.select_references(X = data$counts,
        minimal_TA = 100,
        verbose = TRUE)

# Number of selected reference taxa
length(result.selected.references$selected_references)

# Plot the reference selection scores (can also be used to better set the median SD threshold)
dacomp.plot_reference_scores(result.selected.references)

# Select a reference set with a different (lower) threshold. The user can use the function argument
# 'Previous_Reference_Selection_Object' to provide a previous reference selection object for this data, to speed up computation.
result.selected.references.different.threshold = dacomp.select_references(X = data$counts,
        median_SD_threshold = 0.5,
        verbose = FALSE,
        Previous_Reference_Selection_Object = result.selected.references)