trans_diff | R Documentation |
trans_diff
object for the differential analysis on the taxonomic abundanceThis class is a wrapper for a series of differential abundance test and indicator analysis methods, including
LEfSe based on the Segata et al. (2011) <doi:10.1186/gb-2011-12-6-r60>,
random forest <doi:10.1016/j.geoderma.2018.09.035>, metastat based on White et al. (2009) <doi:10.1371/journal.pcbi.1000352>,
non-parametric Kruskal-Wallis Rank Sum Test,
Dunn's Kruskal-Wallis Multiple Comparisons based on the FSA
package, Wilcoxon Rank Sum and Signed Rank Tests, t-test, anova,
Scheirer Ray Hare test,
R package metagenomeSeq
Paulson et al. (2013) <doi:10.1038/nmeth.2658>,
R package ANCOMBC
<doi:10.1038/s41467-020-17041-7>, R package ALDEx2
<doi:10.1371/journal.pone.0067019; 10.1186/2049-2618-2-15>,
R package MicrobiomeStat
<doi:10.1186/s13059-022-02655-5>, beta regression <doi:10.18637/jss.v034.i02>, R package maaslin2
,
linear mixed-effects model and generalized linear mixed model.
new()
trans_diff$new( dataset = NULL, method = c("lefse", "rf", "metastat", "metagenomeSeq", "KW", "KW_dunn", "wilcox", "t.test", "anova", "scheirerRayHare", "lm", "ancombc2", "ALDEx2_t", "ALDEx2_kw", "DESeq2", "edgeR", "linda", "maaslin2", "betareg", "lme", "glmm", "glmm_beta")[1], group = NULL, taxa_level = "all", filter_thres = 0, alpha = 0.05, p_adjust_method = "fdr", transformation = NULL, remove_unknown = TRUE, lefse_subgroup = NULL, lefse_min_subsam = 10, lefse_norm = 1e+06, nresam = 0.6667, boots = 30, rf_imp_type = 2, group_choose_paired = NULL, metagenomeSeq_count = 1, ALDEx2_sig = c("wi.eBH", "kw.eBH"), by_group = NULL, by_ID = NULL, beta_pseudo = .Machine$double.eps, ... )
dataset
default NULL; microtable
object.
method
default "lefse"; see the following available options:
LEfSe method based on Segata et al. (2011) <doi:10.1186/gb-2011-12-6-r60>
random forest and non-parametric test method based on An et al. (2019) <doi:10.1016/j.geoderma.2018.09.035>
Metastat method for all paired groups based on White et al. (2009) <doi:10.1371/journal.pcbi.1000352>
zero-inflated log-normal model-based differential test method from metagenomeSeq
package.
KW: Kruskal-Wallis Rank Sum Test for all groups (>= 2)
Dunn's Kruskal-Wallis Multiple Comparisons when group number > 2; see dunnTest function in FSA
package
Wilcoxon Rank Sum and Signed Rank Tests for all paired groups
Student's t-Test for all paired groups
ANOVA for one-way or multi-factor analysis; see cal_diff
function of trans_alpha
class
Scheirer Ray Hare test for nonparametric test used for a two-way factorial experiment;
see scheirerRayHare
function of rcompanion
package
Linear Model based on the lm
function
runs Welch's t and Wilcoxon tests with ALDEx2
package; see also the test parameter in ALDEx2::aldex
function;
ALDEx2 uses the centred log-ratio (clr) transformation and estimates per-feature technical variation within each sample using Monte-Carlo instances
drawn from the Dirichlet distribution; Reference: <doi:10.1371/journal.pone.0067019> and <doi:10.1186/2049-2618-2-15>;
require ALDEx2
package to be installed
(https://bioconductor.org/packages/release/bioc/html/ALDEx2.html)
runs Kruskal-Wallace and generalized linear model (glm) test with ALDEx2
package;
see also the test
parameter in ALDEx2::aldex
function.
Differential expression analysis based on the Negative Binomial (a.k.a. Gamma-Poisson) distribution based on the DESeq2
package.
The exactTest
method of edgeR
package is implemented.
Analysis of Compositions of Microbiomes with Bias Correction (ANCOM-BC)
based on the ancombc2
function from ANCOMBC
package.
If the fix_formula
parameter is not provided, the function can automatically assign it by using group parameter.
For this method, the group
parameter is directly passed to the group parameter of ancombc2
function.
Reference: <doi:10.1038/s41467-020-17041-7><10.1038/s41592-023-02092-7>; Require ANCOMBC
package to be installed
(https://bioconductor.org/packages/release/bioc/html/ANCOMBC.html)
Linear Model for Differential Abundance Analysis of High-dimensional Compositional Data
based on the linda
function of MicrobiomeStat
package.
For linda method, please provide either the group parameter or the formula parameter.
When the formula parameter is provided, it should start with '~' as it is directly used by the linda function.
If the group parameter is used, the prefix '~' is not necessary as the function can automatically add it.
The parameter feature.dat.type = 'count'
has been fixed. Other parameters can be passed to the linda
function.
Reference: <doi:10.1186/s13059-022-02655-5>
finding associations between metadata and potentially high-dimensional microbial multi-omics data based on the Maaslin2 package.
Using this option can invoke the trans_env$cal_cor
function with cor_method = "maaslin2"
.
Beta Regression based on the betareg
package.
Please see the beta_pseudo
parameter for the use of pseudo value when there is 0 or 1 in the data
Linear Mixed Effect Model based on the lmerTest
package.
In the return table, the significance of fixed factors are tested by function anova
.
The significance of 'Estimate' in each term of fixed factors comes from the model.
Generalized linear mixed model (GLMM) based on the glmmTMB
package.
The formula
and family
parameters are needed.
Please refer to glmmTMB package to select the family function, e.g. family = glmmTMB::lognormal(link = "log")
.
The usage of formula is similar with that in 'lme' method.
For more available parameters, please see glmmTMB::glmmTMB
function and use parameter passing.
In the result, Conditional R2 and Marginal R2 represent the variance explained by both fixed and random effects and the variance explained by
fixed effects, respectively. For more details on R2 calculation, please refer to the article <doi: 10.1098/rsif.2017.0213>.
The significance of fixed factors are tested by Chi-square test from function car::Anova
.
The significance of 'Estimate' in each term of fixed factors comes from the model.
Generalized linear mixed model with a family function of beta distribution,
developed for the relative abundance (ranging from 0 to 1) of taxa specifically.
This is an extension of the GLMM model in 'glmm'
option.
The only difference is in glmm_beta
the family function is fixed with the beta distribution function,
i.e. family = glmmTMB::beta_family(link = "logit")
.
Please see the beta_pseudo
parameter for the use of pseudo value when there is 0 or 1 in the data
group
default NULL; sample group used for the comparision; a colname of input microtable$sample_table
;
It is necessary when method is not "anova" or method is "anova" but formula is not provided.
Once group is provided, the return res_abund will have mean and sd values for group.
taxa_level
default "all"; 'all' represents using abundance data at all taxonomic ranks;
For testing at a specific rank, provide taxonomic rank name, such as "Genus".
If the provided taxonomic name is neither 'all' nor a colname in tax_table of input dataset,
the function will use the features in input microtable$otu_table
automatically.
filter_thres
default 0; the abundance threshold, such as 0.0005 when the input is relative abundance; only available when method != "metastat". The features with abundances lower than filter_thres will be filtered.
alpha
default 0.05; significance threshold to select taxa when method is "lefse" or "rf";
or used to generate significance letters when method is 'anova' or 'KW_dunn' like the alpha parameter in cal_diff
of trans_alpha
class.
p_adjust_method
default "fdr"; p.adjust method; see method parameter of p.adjust
function for other available options;
"none" means disable p value adjustment; So when p_adjust_method = "none"
, P.adj is same with P.unadj.
transformation
default NULL; feature abundance transformation method in the class trans_norm
,
such as 'AST' for the arc sine square root transformation.
Only available when method
is one of "KW", "KW_dunn", "wilcox", "t.test", "anova", "scheirerRayHare", "betareg" and "lme".
remove_unknown
default TRUE; whether remove unknown features that donot have clear classification information.
lefse_subgroup
default NULL; sample sub group used for sub-comparision in lefse; Segata et al. (2011) <doi:10.1186/gb-2011-12-6-r60>.
lefse_min_subsam
default 10; sample numbers required in the subgroup test.
lefse_norm
default 1000000; normalization value used in lefse to scale abundances for each level.
A lefse_norm
value < 0 (e.g., -1) means no normalization same with the LEfSe python version.
nresam
default 0.6667; sample number ratio used in each bootstrap for method = "lefse" or "rf".
boots
default 30; bootstrap test number for method = "lefse" or "rf".
rf_imp_type
default 2; the type of feature importance in random forest when method = "rf"
.
Same with type
parameter in importance
function of randomForest
package.
1=mean decrease in accuracy (MeanDecreaseAccuracy), 2=mean decrease in node impurity (MeanDecreaseGini).
group_choose_paired
default NULL; a vector used for selecting the required groups for paired testing instead of all paired combinations across groups; Available when method is "metastat", "metagenomeSeq", "ALDEx2_t" or "edgeR".
metagenomeSeq_count
default 1; Filter features to have at least 'counts' counts.; see the count parameter in MRcoefs function of metagenomeSeq
package.
ALDEx2_sig
default c("wi.eBH", "kw.eBH"); which column of the final result is used as the significance asterisk assignment; applied to method = "ALDEx2_t" or "ALDEx2_kw"; the first element is provided to "ALDEx2_t"; the second is provided to "ALDEx2_kw"; for "ALDEx2_t", the available choice is "wi.eBH" (Expected Benjamini-Hochberg corrected P value of Wilcoxon test) and "we.eBH" (Expected BH corrected P value of Welch's t test); for "ALDEx2_kw"; for "ALDEx2_t", the available choice is "kw.eBH" (Expected BH corrected P value of Kruskal-Wallace test) and "glm.eBH" (Expected BH corrected P value of glm test).
by_group
default NULL; a column of sample_table used to perform the differential test
among groups (group
parameter) for each group (by_group
parameter). So by_group
has a higher level than group
parameter.
Same with the by_group
parameter in trans_alpha
class.
Only available when method is one of c("KW", "KW_dunn", "wilcox", "t.test", "anova", "scheirerRayHare")
.
by_ID
default NULL; a column of sample_table used to perform paired t test or paired wilcox test for the paired data,
such as the data of plant compartments for different plant species (ID).
So by_ID
in sample_table should be the smallest unit of sample collection without any repetition in it.
Same with the by_ID
parameter in trans_alpha class.
beta_pseudo
default .Machine$double.eps; the pseudo value used when the parameter method
is 'betareg'
or 'glmm_beta'
.
As the beta distribution function limits 0 < response value < 1, a pseudo value will be added for the data that equal to 0.
The data that equal to 1 will be replaced by 1/(1 + beta_pseudo)
.
...
parameters passed to cal_diff
function of trans_alpha
class when method is one of
"KW", "KW_dunn", "wilcox", "t.test", "anova", "betareg", "lme", "glmm" or "glmm_beta";
passed to randomForest::randomForest
function when method = "rf";
passed to ANCOMBC::ancombc2
function when method is "ancombc2" (except tax_level, global and fix_formula parameters);
passed to ALDEx2::aldex
function when method = "ALDEx2_t" or "ALDEx2_kw";
passed to DESeq2::DESeq
function when method = "DESeq2";
passed to MicrobiomeStat::linda
function when method = "linda";
passed to trans_env$cal_cor
function when method = "maaslin2".
res_diff and res_abund.
res_abund includes mean abundance of each taxa (Mean), standard deviation (SD), standard error (SE) and sample number (N) in the group (Group).
res_diff is the detailed differential test result depending on the method choice, may containing:
"Comparison": The groups for the comparision, maybe all groups or paired groups. If this column is not found, all groups are used;
"Group": Which group has the maximum median or mean value across the test groups;
For non-parametric methods, median value; For t.test, mean value;
"Taxa": which taxa is used in this comparision;
"Method": Test method used in the analysis depending on the method input;
"LDA" or others: LDA: linear discriminant score in LEfSe;
MeanDecreaseAccuracy and MeanDecreaseGini: mean decreasing in accuracy or in node impurity (gini index) in random forest;
"P.unadj": original p value;
"P.adj": adjusted p value;
"Estimate" and "Std.Error": When method is "betareg", "lm", "lme" or "glmm",
"Estimate" and "Std.Error" represent fitted coefficient and its standard error, respectively;
Others: qvalue: qvalue in metastat analysis.
\donttest{ data(dataset) t1 <- trans_diff$new(dataset = dataset, method = "lefse", group = "Group") t1 <- trans_diff$new(dataset = dataset, method = "rf", group = "Group") t1 <- trans_diff$new(dataset = dataset, method = "metastat", group = "Group", taxa_level = "Genus") t1 <- trans_diff$new(dataset = dataset, method = "wilcox", group = "Group") }
plot_diff_abund()
Plot the abundance of taxa.
The significance can be optionally added in the plot. The taxa displayed are based on the taxa in the 'res_diff' table, selected using parameters. If the user filters out the non-significant taxa from the 'res_diff' table, these taxa will also be filtered from the plot.
trans_diff$plot_diff_abund( use_number = 1:10, color_values = RColorBrewer::brewer.pal(8, "Dark2"), select_taxa = NULL, simplify_names = TRUE, keep_prefix = TRUE, group_order = NULL, order_x_mean = TRUE, coord_flip = TRUE, add_sig = TRUE, xtext_angle = 45, xtext_size = 13, ytitle_size = 17, ... )
use_number
default 1:10; numeric vector; the sequences of taxa (1:n) selected in the plot; If n is larger than the number of total significant taxa, automatically use the total number as n.
color_values
default RColorBrewer::brewer.pal
(8, "Dark2"); color pallete for groups.
select_taxa
default NULL; character vector to provide taxa names.
The taxa names should be same with the names shown in the plot, not the 'Taxa' column names in object$res_diff$Taxa
.
simplify_names
default TRUE; whether use the simplified taxonomic name.
keep_prefix
default TRUE; whether retain the taxonomic prefix.
group_order
default NULL; a vector to order groups, i.e. reorder the legend and colors in plot; If NULL, the function can first check whether the group column of sample_table is factor. If yes, use the levels in it. If provided, overlook the levels in the group of sample_table.
order_x_mean
default TRUE; whether order x axis by the means of groups from large to small.
coord_flip
default TRUE; whether flip cartesian coordinates so that horizontal becomes vertical, and vertical becomes horizontal.
add_sig
default TRUE; whether add the significance label to the plot.
xtext_angle
default 45; number (e.g. 45). Angle of text in x axis.
xtext_size
default 13; x axis text size. NULL means the default size in ggplot2. If coord_flip = TRUE
, it represents the text size of the y axis.
ytitle_size
default 17; y axis title size. If coord_flip = TRUE
, it represents the title size of the x axis (i.e. "Relative abundance").
...
parameters passed to plot_alpha
function of trans_alpha
class.
ggplot.
\donttest{ t1 <- trans_diff$new(dataset = dataset, method = "anova", group = "Group", taxa_level = "Genus") t1$plot_diff_abund(use_number = 1:10) t1$plot_diff_abund(use_number = 1:10, add_sig = TRUE) t1 <- trans_diff$new(dataset = dataset, method = "wilcox", group = "Group") t1$plot_diff_abund(use_number = 1:20) t1$plot_diff_abund(use_number = 1:20, add_sig = TRUE) t1 <- trans_diff$new(dataset = dataset, method = "lefse", group = "Group") t1$plot_diff_abund(use_number = 1:20) t1$plot_diff_abund(use_number = 1:20, add_sig = TRUE) }
plot_diff_bar()
Bar plot for indicator index, such as LDA score and P value.
trans_diff$plot_diff_bar( color_values = RColorBrewer::brewer.pal(8, "Dark2"), color_group_map = FALSE, use_number = 1:10, threshold = NULL, select_group = NULL, keep_full_name = FALSE, keep_prefix = TRUE, group_order = NULL, group_aggre = TRUE, group_two_sep = TRUE, coord_flip = TRUE, add_sig = FALSE, add_sig_increase = 0.1, add_sig_text_size = 5, xtext_angle = 45, xtext_size = 10, axis_text_y = 12, heatmap_cell = "P.unadj", heatmap_sig = "Significance", heatmap_x = "Factors", heatmap_y = "Taxa", heatmap_lab_fill = "P value", ... )
color_values
default RColorBrewer::brewer.pal
(8, "Dark2"); colors palette for different groups.
color_group_map
default FALSE; whether match the colors to groups in order to fix the color in each group when part of groups are not shown in the plot.
When color_group_map = TRUE
, the group_order inside the object will be used as full groups set to guide the color extraction.
use_number
default 1:10; numeric vector; the taxa numbers used in the plot, i.e. 1:n.
threshold
default NULL; threshold value of indicators for selecting taxa, such as 3 for LDA score of LEfSe.
select_group
default NULL; this is used to select the paired group when multiple comparisions are generated;
The input select_group must be one of object$res_diff$Comparison
.
keep_full_name
default FALSE; whether keep the taxonomic full lineage names.
keep_prefix
default TRUE; whether retain the taxonomic prefix, such as "g__".
group_order
default NULL; a vector to order the legend and colors in plot;
If NULL, the function can first determine whether the group column of microtable$sample_table
is factor. If yes, use the levels in it.
If provided, this parameter can overwrite the levels in the group of microtable$sample_table
.
group_aggre
default TRUE; whether aggregate the features for each group.
group_two_sep
default TRUE; whether display the features of two groups on opposite sides of the coordinate axes when there are only two groups in total.
coord_flip
default TRUE; whether flip cartesian coordinates so that horizontal becomes vertical, and vertical becomes horizontal.
add_sig
default FALSE; whether add significance label (asterisk) above the bar.
add_sig_increase
default 0.1; the axis position (Value + add_sig_increase * max(Value)
) from which to add the significance label;
only available when add_sig = TRUE
.
add_sig_text_size
default 5; the size of added significance label; only available when add_sig = TRUE
.
xtext_angle
default 45; number ranging from 0 to 90; used to make x axis text generate angle to reduce text overlap; only available when coord_flip = FALSE.
xtext_size
default 10; the text size of x axis.
axis_text_y
default 12; the size for the y axis text.
heatmap_cell
default "P.unadj"; the column of data for the cell of heatmap when formula with multiple factors is found in the method.
heatmap_sig
default "Significance"; the column of data for the significance label of heatmap.
heatmap_x
default "Factors"; the column of data for the x axis of heatmap.
heatmap_y
default "Taxa"; the column of data for the y axis of heatmap.
heatmap_lab_fill
default "P value"; legend title of heatmap.
...
parameters passing to geom_bar
for the bar plot or
plot_cor
function in trans_env
class for the heatmap of multiple factors when formula is found in the method.
ggplot.
\donttest{ t1$plot_diff_bar(use_number = 1:20) }
plot_diff_cladogram()
Plot the cladogram using taxa with significant difference.
trans_diff$plot_diff_cladogram( color = RColorBrewer::brewer.pal(8, "Dark2"), group_order = NULL, use_taxa_num = 200, filter_taxa = NULL, use_feature_num = NULL, clade_label_level = 4, select_show_labels = NULL, only_select_show = FALSE, sep = "|", branch_size = 0.2, alpha = 0.2, clade_label_size = 2, clade_label_size_add = 5, clade_label_size_log = exp(1), node_size_scale = 1, node_size_offset = 1, annotation_shape = 22, annotation_shape_size = 5 )
color
default RColorBrewer::brewer.pal
(8, "Dark2"); color palette used in the plot.
group_order
default NULL; a vector to order the legend in plot;
If NULL, the function can first check whether the group column of sample_table is factor. If yes, use the levels in it.
If provided, this parameter can overwrite the levels in the group of sample_table.
If the number of provided group_order is less than the number of groups in res_diff$Group
, the function will select the groups of group_order automatically.
use_taxa_num
default 200; integer; The taxa number used in the background tree plot; select the taxa according to the mean abundance .
filter_taxa
default NULL; The mean relative abundance used to filter the taxa with low abundance.
use_feature_num
default NULL; integer; The feature number used in the plot; select the features according to the metric (method = "lefse" or "rf") from high to low.
clade_label_level
default 4; the taxonomic level for marking the label with letters, root is the largest.
select_show_labels
default NULL; character vector; The features to show in the plot with full label names, not the letters.
only_select_show
default FALSE; whether only use the the select features in the parameter select_show_labels
.
sep
default "|"; the seperate character in the taxonomic information.
branch_size
default 0.2; numberic, size of branch.
alpha
default 0.2; shading of the color.
clade_label_size
default 2; basic size for the clade label; please also see clade_label_size_add
and clade_label_size_log
.
clade_label_size_add
default 5; added basic size for the clade label; see the formula in clade_label_size_log
parameter.
clade_label_size_log
default exp(1)
; the base of log
function for added size of the clade label; the size formula:
clade_label_size + log(clade_label_level + clade_label_size_add, base = clade_label_size_log)
;
so use clade_label_size_log
, clade_label_size_add
and clade_label_size
can totally control the label size for different taxonomic levels.
node_size_scale
default 1; scale for the node size.
node_size_offset
default 1; offset for the node size.
annotation_shape
default 22; shape used in the annotation legend.
annotation_shape_size
default 5; size used in the annotation legend.
ggplot.
\dontrun{ t1$plot_diff_cladogram(use_taxa_num = 100, use_feature_num = 30, select_show_labels = NULL) }
print()
Print the trans_alpha object.
trans_diff$print()
clone()
The objects of this class are cloneable with this method.
trans_diff$clone(deep = FALSE)
deep
Whether to make a deep clone.
## ------------------------------------------------
## Method `trans_diff$new`
## ------------------------------------------------
data(dataset)
t1 <- trans_diff$new(dataset = dataset, method = "lefse", group = "Group")
t1 <- trans_diff$new(dataset = dataset, method = "rf", group = "Group")
t1 <- trans_diff$new(dataset = dataset, method = "metastat", group = "Group", taxa_level = "Genus")
t1 <- trans_diff$new(dataset = dataset, method = "wilcox", group = "Group")
## ------------------------------------------------
## Method `trans_diff$plot_diff_abund`
## ------------------------------------------------
t1 <- trans_diff$new(dataset = dataset, method = "anova", group = "Group", taxa_level = "Genus")
t1$plot_diff_abund(use_number = 1:10)
t1$plot_diff_abund(use_number = 1:10, add_sig = TRUE)
t1 <- trans_diff$new(dataset = dataset, method = "wilcox", group = "Group")
t1$plot_diff_abund(use_number = 1:20)
t1$plot_diff_abund(use_number = 1:20, add_sig = TRUE)
t1 <- trans_diff$new(dataset = dataset, method = "lefse", group = "Group")
t1$plot_diff_abund(use_number = 1:20)
t1$plot_diff_abund(use_number = 1:20, add_sig = TRUE)
## ------------------------------------------------
## Method `trans_diff$plot_diff_bar`
## ------------------------------------------------
t1$plot_diff_bar(use_number = 1:20)
## ------------------------------------------------
## Method `trans_diff$plot_diff_cladogram`
## ------------------------------------------------
## Not run:
t1$plot_diff_cladogram(use_taxa_num = 100, use_feature_num = 30, select_show_labels = NULL)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.