trans_diff | R Documentation |
trans_diff
object for the differential analysis on the taxonomic abundanceThis class is a wrapper for a series of differential abundance test and indicator analysis methods, including
LEfSe based on the Segata et al. (2011) <doi:10.1186/gb-2011-12-6-r60>,
random forest <doi:10.1016/j.geoderma.2018.09.035>, metastat based on White et al. (2009) <doi:10.1371/journal.pcbi.1000352>,
non-parametric Kruskal-Wallis Rank Sum Test,
Dunn's Kruskal-Wallis Multiple Comparisons based on the FSA
package, Wilcoxon Rank Sum and Signed Rank Tests, t-test, anova,
Scheirer Ray Hare test,
R package metagenomeSeq
Paulson et al. (2013) <doi:10.1038/nmeth.2658>,
R package ANCOMBC
<doi:10.1038/s41467-020-17041-7>, R package ALDEx2
<doi:10.1371/journal.pone.0067019; 10.1186/2049-2618-2-15>,
R package MicrobiomeStat
<doi:10.1186/s13059-022-02655-5>, beta regression <doi:10.18637/jss.v034.i02>, R package maaslin2
,
linear mixed-effects model and generalized linear mixed model.
new()
trans_diff$new( dataset = NULL, method = c("lefse", "rf", "metastat", "metagenomeSeq", "KW", "KW_dunn", "wilcox", "t.test", "anova", "scheirerRayHare", "ancombc2", "ALDEx2_t", "ALDEx2_kw", "DESeq2", "linda", "maaslin2", "betareg", "lme", "glmm")[1], group = NULL, taxa_level = "all", filter_thres = 0, alpha = 0.05, p_adjust_method = "fdr", transformation = NULL, remove_unknown = TRUE, lefse_subgroup = NULL, lefse_min_subsam = 10, lefse_norm = 1e+06, nresam = 0.6667, boots = 30, rf_ntree = 1000, group_choose_paired = NULL, metagenomeSeq_count = 1, ALDEx2_sig = c("wi.eBH", "kw.eBH"), by_group = NULL, by_ID = NULL, ... )
dataset
default NULL; microtable
object.
method
default "lefse"; see the following available options:
LEfSe method based on Segata et al. (2011) <doi:10.1186/gb-2011-12-6-r60>
random forest and non-parametric test method based on An et al. (2019) <doi:10.1016/j.geoderma.2018.09.035>
Metastat method for all paired groups based on White et al. (2009) <doi:10.1371/journal.pcbi.1000352>
zero-inflated log-normal model-based differential test method from metagenomeSeq
package.
KW: Kruskal-Wallis Rank Sum Test for all groups (>= 2)
Dunn's Kruskal-Wallis Multiple Comparisons when group number > 2; see dunnTest function in FSA
package
Wilcoxon Rank Sum and Signed Rank Tests for all paired groups
Student's t-Test for all paired groups
ANOVA for one-way or multi-factor analysis; see cal_diff
function of trans_alpha
class
Scheirer Ray Hare test for nonparametric test used for a two-way factorial experiment;
see scheirerRayHare
function of rcompanion
package
Analysis of Compositions of Microbiomes with Bias Correction (ANCOM-BC)
based on the ancombc2
function from ANCOMBC
package;
only support the case that group
is same with fix_formula
parameter in order to get well-organized output table;
For more flexible usages, please see and use ancombc2
function directly;
Reference: <doi:10.1038/s41467-020-17041-7>; Require ANCOMBC
package to be installed
(https://bioconductor.org/packages/release/bioc/html/ANCOMBC.html)
runs Welch's t and Wilcoxon tests with ALDEx2
package; see also the test parameter in ALDEx2::aldex
function;
ALDEx2 uses the centred log-ratio (clr) transformation and estimates per-feature technical variation within each sample using Monte-Carlo instances
drawn from the Dirichlet distribution; Reference: <doi:10.1371/journal.pone.0067019> and <doi:10.1186/2049-2618-2-15>;
require ALDEx2
package to be installed
(https://bioconductor.org/packages/release/bioc/html/ALDEx2.html)
runs Kruskal-Wallace and generalized linear model (glm) test with ALDEx2
package;
see also the test
parameter in ALDEx2::aldex
function.
Differential expression analysis based on the Negative Binomial (a.k.a. Gamma-Poisson) distribution based on the DESeq2
package.
Linear Model for Differential Abundance Analysis of High-dimensional Compositional Data
based on the linda
function of MicrobiomeStat
package.
Here the group parameter is passed to formula parameter in linda
function with the prefix '~'.
The parameter feature.dat.type = 'count'
has been fixed. Other parameters can be passed to the linda
function.
Reference: <doi:10.1186/s13059-022-02655-5>
finding associations between metadata and potentially high-dimensional microbial multi-omics data based on the Maaslin2 package.
Using this option can invoke the trans_env$cal_cor
function with cor_method = "maaslin2"
.
Beta Regression based on the betareg
package
Linear Mixed Effect Model based on the lmerTest
package.
In the return table, the significance of fixed factors are tested by function anova
.
The significance of 'Estimate' in each term of fixed factors comes from the model.
Generalized linear mixed model (GLMM) based on the glmmTMB
package.
If the relative abundance is applied, it is recommended to use beta family function, i.e. family = glmmTMB::beta_family(link = "logit")
.
Note that beta family function limits 0 < response value < 1.
If an error 'y values must be 0 < y < 1' occurs, please first transform input table like: (dataset$taxa_abund$Phylum + 1e-10)/(1 + 2e-10)
.
For more parameters, please see glmmTMB::glmmTMB
function.
In the return table, Conditional_R2 and Marginal_R2 represent total variance (explained by both fixed and random effects) and the variance explained by
fixed effects, respectively. The significance of fixed factors are tested by Chi-square test from function car::Anova
.
The significance of 'Estimate' in each term of fixed factors comes from the model.
group
default NULL; sample group used for the comparision; a colname of input microtable$sample_table
;
It is necessary when method is not "anova" or method is "anova" but formula is not provided.
Once group is provided, the return res_abund will have mean and sd values for group.
taxa_level
default "all"; 'all' represents using abundance data at all taxonomic ranks;
For testing at a specific rank, provide taxonomic rank name, such as "Genus".
If the provided taxonomic name is neither 'all' nor a colname in tax_table of input dataset,
the function will use the features in input microtable$otu_table
automatically.
filter_thres
default 0; the abundance threshold, such as 0.0005 when the input is relative abundance; only available when method != "metastat". The features with abundances lower than filter_thres will be filtered.
alpha
default 0.05; significance threshold to select taxa when method is "lefse" or "rf";
or used to generate significance letters when method is 'anova' or 'KW_dunn' like the alpha parameter in cal_diff
of trans_alpha
class.
p_adjust_method
default "fdr"; p.adjust method; see method parameter of p.adjust
function for other available options;
"none" means disable p value adjustment; So when p_adjust_method = "none"
, P.adj is same with P.unadj.
transformation
default NULL; feature abundance transformation method based on the mecodev package (https://github.com/ChiLiubio/mecodev),
such as 'AST' for the arc sine square root transformation. Please see the trans_norm
class of mecodev package.
Only available when method
is one of "KW", "KW_dunn", "wilcox", "t.test", "anova", "scheirerRayHare", "betareg" and "lme".
remove_unknown
default TRUE; whether remove unknown features that donot have clear classification information.
lefse_subgroup
default NULL; sample sub group used for sub-comparision in lefse; Segata et al. (2011) <doi:10.1186/gb-2011-12-6-r60>.
lefse_min_subsam
default 10; sample numbers required in the subgroup test.
lefse_norm
default 1000000; scale value in lefse.
nresam
default 0.6667; sample number ratio used in each bootstrap for method = "lefse" or "rf".
boots
default 30; bootstrap test number for method = "lefse" or "rf".
rf_ntree
default 1000; see ntree in randomForest function of randomForest package when method = "rf".
group_choose_paired
default NULL; a vector used for selecting the required groups for paired testing, only used for method = "metastat" or "metagenomeSeq".
metagenomeSeq_count
default 1; Filter features to have at least 'counts' counts.; see the count parameter in MRcoefs function of metagenomeSeq
package.
ALDEx2_sig
default c("wi.eBH", "kw.eBH"); which column of the final result is used as the significance asterisk assignment; applied to method = "ALDEx2_t" or "ALDEx2_kw"; the first element is provided to "ALDEx2_t"; the second is provided to "ALDEx2_kw"; for "ALDEx2_t", the available choice is "wi.eBH" (Expected Benjamini-Hochberg corrected P value of Wilcoxon test) and "we.eBH" (Expected BH corrected P value of Welch's t test); for "ALDEx2_kw"; for "ALDEx2_t", the available choice is "kw.eBH" (Expected BH corrected P value of Kruskal-Wallace test) and "glm.eBH" (Expected BH corrected P value of glm test).
by_group
default NULL; a column of sample_table used to perform the differential test
among groups (group
parameter) for each group (by_group
parameter). So by_group
has a higher level than group
parameter.
Same with the by_group
parameter in trans_alpha
class.
Only available when method is one of c("KW", "KW_dunn", "wilcox", "t.test", "anova", "scheirerRayHare")
.
by_ID
default NULL; a column of sample_table used to perform paired t test or paired wilcox test for the paired data,
such as the data of plant compartments for different plant species (ID).
So by_ID
in sample_table should be the smallest unit of sample collection without any repetition in it.
Same with the by_ID
parameter in trans_alpha class.
...
parameters passed to cal_diff
function of trans_alpha
class when method is one of
"KW", "KW_dunn", "wilcox", "t.test", "anova", "betareg", "lme" or "glmm";
passed to ANCOMBC::ancombc2
function when method is "ancombc2" (except tax_level, global and fix_formula parameters);
passed to ALDEx2::aldex
function when method = "ALDEx2_t" or "ALDEx2_kw";
passed to DESeq2::DESeq
function when method = "DESeq2";
passed to MicrobiomeStat::linda
function when method = "linda";
passed to trans_env$cal_cor
function when method = "maaslin2".
res_diff and res_abund.
res_abund includes mean abundance of each taxa (Mean), standard deviation (SD), standard error (SE) and sample number (N) in the group (Group).
res_diff is the detailed differential test result, may containing:
"Comparison": The groups for the comparision, maybe all groups or paired groups. If this column is not found, all groups are used;
"Group": Which group has the maximum median or mean value across the test groups;
For non-parametric methods, median value; For t.test, mean value;
"Taxa": which taxa is used in this comparision;
"Method": Test method used in the analysis depending on the method input;
"LDA" or "MeanDecreaseGini": LDA: linear discriminant score in LEfSe; MeanDecreaseGini: mean decreasing gini index in random forest;
"P.unadj": original p value;
"P.adj": adjusted p value;
Others: qvalue: qvalue in metastat analysis.
\donttest{ data(dataset) t1 <- trans_diff$new(dataset = dataset, method = "lefse", group = "Group") t1 <- trans_diff$new(dataset = dataset, method = "rf", group = "Group") t1 <- trans_diff$new(dataset = dataset, method = "metastat", group = "Group", taxa_level = "Genus") t1 <- trans_diff$new(dataset = dataset, method = "wilcox", group = "Group") }
plot_diff_abund()
Plot the abundance of differential taxa
trans_diff$plot_diff_abund( use_number = 1:20, color_values = RColorBrewer::brewer.pal(8, "Dark2"), select_group = NULL, select_taxa = NULL, simplify_names = TRUE, keep_prefix = TRUE, group_order = NULL, barwidth = 0.9, use_se = TRUE, add_sig = FALSE, add_sig_label = "Significance", add_sig_label_color = "black", add_sig_tip_length = 0.01, y_start = 1.01, y_increase = 0.05, text_y_size = 10, coord_flip = TRUE, xtext_angle = 45, ... )
use_number
default 1:20; numeric vector; the taxa numbers (1:n) selected in the plot; If the n is larger than the number of total significant taxa, automatically use all the taxa.
color_values
default RColorBrewer::brewer.pal
(8, "Dark2"); colors palette.
select_group
default NULL; this is used to select the paired groups.
This parameter is especially useful when the comparision methods is applied to paired groups;
The input select_group must be one of object$res_diff$Comparison
.
select_taxa
default NULL; character vector to provide taxa names.
The taxa names should be same with the names shown in the plot, not the 'Taxa' column names in object$res_diff$Taxa
.
simplify_names
default TRUE; whether use the simplified taxonomic name.
keep_prefix
default TRUE; whether retain the taxonomic prefix.
group_order
default NULL; a vector to order groups, i.e. reorder the legend and colors in plot; If NULL, the function can first check whether the group column of sample_table is factor. If yes, use the levels in it. If provided, overlook the levels in the group of sample_table.
barwidth
default 0.9; the bar width in plot.
use_se
default TRUE; whether use SE in plot, if FALSE, use SD.
add_sig
default FALSE; whether add the significance label to the plot.
add_sig_label
default "Significance"; select a colname of object$res_diff for the label text, such as 'P.adj' or 'Significance'.
add_sig_label_color
default "black"; the color for the label text when add_sig = TRUE.
add_sig_tip_length
default 0.01; the tip length for the added line when add_sig = TRUE.
y_start
default 1.01; the y axis position from which to add the label; the default 1.01 means 1.01 * Value; For method != "anova", all the start positions are same, i.e. Value = max(Mean+SD or Mean+SE); For method = "anova"; the stat position is calculated for each point, i.e. Value = Mean+SD or Mean+SE.
y_increase
default 0.05; the increasing y axia space to add label for paired groups; the default 0.05 means 0.05 * y_start * Value; In addition, this parameter is also used to label the letters of anova result with the fixed (1 + y_increase) * y_start * Value.
text_y_size
default 10; the size for the y axis text, i.e. feature text.
coord_flip
default TRUE; whether flip cartesian coordinates so that horizontal becomes vertical, and vertical becomes horizontal.
xtext_angle
default 45; number ranging from 0 to 90; used to make x axis text generate angle to reduce text overlap; only available when coord_flip = FALSE.
...
parameters passed to ggsignif::stat_signif
when add_sig = TRUE.
ggplot.
\donttest{ t1 <- trans_diff$new(dataset = dataset, method = "anova", group = "Group", taxa_level = "Genus") t1$plot_diff_abund(use_number = 1:10) t1$plot_diff_abund(use_number = 1:10, add_sig = TRUE) t1 <- trans_diff$new(dataset = dataset, method = "wilcox", group = "Group") t1$plot_diff_abund(use_number = 1:20) t1$plot_diff_abund(use_number = 1:20, add_sig = TRUE) t1 <- trans_diff$new(dataset = dataset, method = "lefse", group = "Group") t1$plot_diff_abund(use_number = 1:20) t1$plot_diff_abund(use_number = 1:20, add_sig = TRUE) }
plot_diff_bar()
Bar plot for indicator index, such as LDA score and P value.
trans_diff$plot_diff_bar( color_values = RColorBrewer::brewer.pal(8, "Dark2"), color_group_map = FALSE, use_number = 1:10, threshold = NULL, select_group = NULL, keep_full_name = FALSE, keep_prefix = TRUE, group_order = NULL, axis_text_y = 12, coord_flip = TRUE, xtext_angle = 45, xtext_size = 10, heatmap_cell = "P.unadj", heatmap_sig = "Significance", heatmap_x = "Factors", heatmap_y = "Taxa", heatmap_lab_fill = "P value", ... )
color_values
default RColorBrewer::brewer.pal
(8, "Dark2"); colors palette for different groups.
color_group_map
default FALSE; whether match the colors to groups in order to fix the color in each group when part of groups are not shown in the plot.
When color_group_map = TRUE
, the group_order inside the object will be used as full groups set to guide the color extraction.
use_number
default 1:10; numeric vector; the taxa numbers used in the plot, i.e. 1:n.
threshold
default NULL; threshold value of indicators for selecting taxa, such as 3 for LDA score of LEfSe.
select_group
default NULL; this is used to select the paired group when multiple comparisions are generated;
The input select_group must be one of object$res_diff$Comparison
.
keep_full_name
default FALSE; whether keep the taxonomic full lineage names.
keep_prefix
default TRUE; whether retain the taxonomic prefix, such as "g__".
group_order
default NULL; a vector to order the legend and colors in plot;
If NULL, the function can first determine whether the group column of microtable$sample_table
is factor. If yes, use the levels in it.
If provided, this parameter can overwrite the levels in the group of microtable$sample_table
.
axis_text_y
default 12; the size for the y axis text.
coord_flip
default TRUE; whether flip cartesian coordinates so that horizontal becomes vertical, and vertical becomes horizontal.
xtext_angle
default 45; number ranging from 0 to 90; used to make x axis text generate angle to reduce text overlap; only available when coord_flip = FALSE.
xtext_size
default 10; the text size of x axis.
heatmap_cell
default "P.unadj"; the column of data for the cell of heatmap when formula with multiple factors is found in the method.
heatmap_sig
default "Significance"; the column of data for the significance label of heatmap.
heatmap_x
default "Factors"; the column of data for the x axis of heatmap.
heatmap_y
default "Taxa"; the column of data for the y axis of heatmap.
heatmap_lab_fill
default "P value"; legend title of heatmap.
...
parameters passing to geom_bar
for the bar plot or
the plot_cor
function in trans_env
class for the heatmap of multiple factors when formula is found in the method.
ggplot.
\donttest{ t1$plot_diff_bar(use_number = 1:20) }
plot_diff_cladogram()
Plot the cladogram using taxa with significant difference.
trans_diff$plot_diff_cladogram( color = RColorBrewer::brewer.pal(8, "Dark2"), group_order = NULL, use_taxa_num = 200, filter_taxa = NULL, use_feature_num = NULL, clade_label_level = 4, select_show_labels = NULL, only_select_show = FALSE, sep = "|", branch_size = 0.2, alpha = 0.2, clade_label_size = 2, clade_label_size_add = 5, clade_label_size_log = exp(1), node_size_scale = 1, node_size_offset = 1, annotation_shape = 22, annotation_shape_size = 5 )
color
default RColorBrewer::brewer.pal
(8, "Dark2"); color palette used in the plot.
group_order
default NULL; a vector to order the legend in plot;
If NULL, the function can first check whether the group column of sample_table is factor. If yes, use the levels in it.
If provided, this parameter can overwrite the levels in the group of sample_table.
If the number of provided group_order is less than the number of groups in res_diff$Group
, the function will select the groups of group_order automatically.
use_taxa_num
default 200; integer; The taxa number used in the background tree plot; select the taxa according to the mean abundance .
filter_taxa
default NULL; The mean relative abundance used to filter the taxa with low abundance.
use_feature_num
default NULL; integer; The feature number used in the plot; select the features according to the LDA score (method = "lefse") or MeanDecreaseGini (method = "rf") from high to low.
clade_label_level
default 4; the taxonomic level for marking the label with letters, root is the largest.
select_show_labels
default NULL; character vector; The features to show in the plot with full label names, not the letters.
only_select_show
default FALSE; whether only use the the select features in the parameter select_show_labels
.
sep
default "|"; the seperate character in the taxonomic information.
branch_size
default 0.2; numberic, size of branch.
alpha
default 0.2; shading of the color.
clade_label_size
default 2; basic size for the clade label; please also see clade_label_size_add
and clade_label_size_log
.
clade_label_size_add
default 5; added basic size for the clade label; see the formula in clade_label_size_log
parameter.
clade_label_size_log
default exp(1)
; the base of log
function for added size of the clade label; the size formula:
clade_label_size + log(clade_label_level + clade_label_size_add, base = clade_label_size_log)
;
so use clade_label_size_log
, clade_label_size_add
and clade_label_size
can totally control the label size for different taxonomic levels.
node_size_scale
default 1; scale for the node size.
node_size_offset
default 1; offset for the node size.
annotation_shape
default 22; shape used in the annotation legend.
annotation_shape_size
default 5; size used in the annotation legend.
ggplot.
\donttest{ t1$plot_diff_cladogram(use_taxa_num = 100, use_feature_num = 30, select_show_labels = NULL) }
print()
Print the trans_alpha object.
trans_diff$print()
clone()
The objects of this class are cloneable with this method.
trans_diff$clone(deep = FALSE)
deep
Whether to make a deep clone.
## ------------------------------------------------
## Method `trans_diff$new`
## ------------------------------------------------
data(dataset)
t1 <- trans_diff$new(dataset = dataset, method = "lefse", group = "Group")
t1 <- trans_diff$new(dataset = dataset, method = "rf", group = "Group")
t1 <- trans_diff$new(dataset = dataset, method = "metastat", group = "Group", taxa_level = "Genus")
t1 <- trans_diff$new(dataset = dataset, method = "wilcox", group = "Group")
## ------------------------------------------------
## Method `trans_diff$plot_diff_abund`
## ------------------------------------------------
t1 <- trans_diff$new(dataset = dataset, method = "anova", group = "Group", taxa_level = "Genus")
t1$plot_diff_abund(use_number = 1:10)
t1$plot_diff_abund(use_number = 1:10, add_sig = TRUE)
t1 <- trans_diff$new(dataset = dataset, method = "wilcox", group = "Group")
t1$plot_diff_abund(use_number = 1:20)
t1$plot_diff_abund(use_number = 1:20, add_sig = TRUE)
t1 <- trans_diff$new(dataset = dataset, method = "lefse", group = "Group")
t1$plot_diff_abund(use_number = 1:20)
t1$plot_diff_abund(use_number = 1:20, add_sig = TRUE)
## ------------------------------------------------
## Method `trans_diff$plot_diff_bar`
## ------------------------------------------------
t1$plot_diff_bar(use_number = 1:20)
## ------------------------------------------------
## Method `trans_diff$plot_diff_cladogram`
## ------------------------------------------------
t1$plot_diff_cladogram(use_taxa_num = 100, use_feature_num = 30, select_show_labels = NULL)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.