View source: R/calculate_diff_abundance.R
calculate_diff_abundance | R Documentation |
Performs differential abundance calculations and statistical hypothesis tests on data frames with protein, peptide or precursor data. Different methods for statistical testing are available.
calculate_diff_abundance(
  data,
  sample,
  condition,
  grouping,
  intensity_log2,
  missingness = missingness,
  comparison = comparison,
  mean = NULL,
  sd = NULL,
  n_samples = NULL,
  ref_condition = "all",
  filter_NA_missingness = TRUE,
  method = c("moderated_t-test", "t-test", "t-test_mean_sd", "proDA"),
  p_adj_method = "BH",
  retain_columns = NULL
)
data |
a data frame containing at least the input variables that are required for the
selected method. Ideally the output of |
sample |
a character column in the |
condition |
a character or numeric column in the |
grouping |
a character column in the |
intensity_log2 |
a numeric column in the |
missingness |
a character column in the |
comparison |
a character column in the |
mean |
a numeric column in the |
sd |
a numeric column in the |
n_samples |
a numeric column in the |
ref_condition |
optional, character value providing the condition that is used as a
reference for differential abundance calculation. Only required for |
filter_NA_missingness |
a logical value, default is |
method |
a character value, specifies the method used for statistical hypothesis testing.
Methods include Welch test ( |
p_adj_method |
a character value, specifies the p-value correction method. Possible
methods are c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none"). Default
method is |
retain_columns |
a vector indicating if certain columns should be retained from the input
data frame. Default is not retaining additional columns |
A data frame that contains differential abundances (diff), p-values (pval) and adjusted p-values (adj_pval) for each protein, peptide or precursor (depending on the grouping variable) and the associated treatment/reference pair. Depending on the method, the data frame contains additional columns:

"t-test": The std_error column contains the standard error of the differential abundances. n_obs contains the number of observations for the specific protein, peptide or precursor (depending on the grouping variable) and the associated treatment/reference pair.

"t-test_mean_sd": Columns labeled as control refer to the second condition of the comparison pairs. Treated refers to the first condition. The mean_control and mean_treated columns contain the means for the reference and treatment condition, respectively. The sd_control and sd_treated columns contain the standard deviations for the reference and treatment condition, respectively. The n_control and n_treated columns contain the numbers of samples for the reference and treatment condition, respectively. The std_error column contains the standard error of the differential abundances. t_statistic contains the t-statistic for the t-test.

"moderated_t-test": CI_2.5 and CI_97.5 contain the 2.5% and 97.5% confidence interval borders for differential abundances. avg_abundance contains average abundances for treatment/reference pairs (mean of the two group means). t_statistic contains the t-statistic for the t-test. B contains the B-statistic, which is the log-odds that the protein, peptide or precursor (depending on grouping) has a differential abundance between the two groups. Suppose B = 1.5. The odds of differential abundance are exp(1.5) = 4.48, i.e., about four and a half to one. The probability that there is a differential abundance is 4.48/(1 + 4.48) = 0.82, i.e., the probability is about 82% that this group is differentially abundant. A B-statistic of zero corresponds to a 50-50 chance that the group is differentially abundant. n_obs contains the number of observations for the specific protein, peptide or precursor (depending on the grouping variable) and the associated treatment/reference pair.

"proDA": The std_error column contains the standard error of the differential abundances. avg_abundance contains average abundances for treatment/reference pairs (mean of the two group means). t_statistic contains the t-statistic for the t-test. n_obs contains the number of observations for the specific protein, peptide or precursor (depending on the grouping variable) and the associated treatment/reference pair.
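Two of the quantities described above can be reproduced by hand in base R. The following is a minimal sketch and not the package's implementation. It assumes that "t-test_mean_sd" is an unequal-variance (Welch) test and that the differential abundance is treated minus control. The helper names welch_from_summary() and b_to_probability(), and all input numbers, are hypothetical.

```r
# Welch t-test from summary statistics (means, standard deviations, sample sizes).
# Assumption: diff = treated - control; all names here are illustrative only.
welch_from_summary <- function(mean_treated, sd_treated, n_treated,
                               mean_control, sd_control, n_control) {
  diff_abundance <- mean_treated - mean_control

  # Per-group variance of the mean
  v_treated <- sd_treated^2 / n_treated
  v_control <- sd_control^2 / n_control

  # Standard error of the difference and the resulting t-statistic
  std_error <- sqrt(v_treated + v_control)
  t_statistic <- diff_abundance / std_error

  # Welch-Satterthwaite degrees of freedom and two-sided p-value
  df <- (v_treated + v_control)^2 /
    (v_treated^2 / (n_treated - 1) + v_control^2 / (n_control - 1))
  pval <- 2 * pt(abs(t_statistic), df = df, lower.tail = FALSE)

  list(diff = diff_abundance, std_error = std_error,
       t_statistic = t_statistic, df = df, pval = pval)
}

# Hypothetical log2 summary statistics for one peptide
welch_from_summary(
  mean_treated = 15.2, sd_treated = 0.40, n_treated = 4,
  mean_control = 14.1, sd_control = 0.55, n_control = 4
)

# Convert a B-statistic (log-odds) into a probability of differential abundance
b_to_probability <- function(b) {
  odds <- exp(b)
  odds / (1 + odds)
}

round(exp(1.5), 2)               # 4.48, the odds from the worked example
round(b_to_probability(1.5), 2)  # 0.82
b_to_probability(0)              # 0.5, the 50-50 case
```

The conversion in b_to_probability() is the logistic function, so stats::plogis(b) gives the same result.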
For all methods except "proDA", the p-value adjustment is performed only on the portion of the data that contains a p-value that is not NA. For "proDA", the p-value adjustment is performed either on the complete dataset (filter_NA_missingness = TRUE) or on the subset of the dataset with missingness that is not NA (filter_NA_missingness = FALSE).
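Restricting the adjustment to non-NA p-values can be illustrated with stats::p.adjust(). This is a sketch with made-up p-values, not the package's internal code. It shows that rows without a p-value do not count towards the number of tests and keep NA as their adjusted value.

```r
# Hypothetical p-values; the second entry could not be tested
pval <- c(0.01, NA, 0.04, 0.03)

# Adjust only the entries that have a p-value, keep NA elsewhere
not_na <- !is.na(pval)
adj_pval <- rep(NA_real_, length(pval))
adj_pval[not_na] <- p.adjust(pval[not_na], method = "BH")

adj_pval
# 0.03   NA 0.04 0.04
```

Any of the methods listed for p_adj_method can be passed as the method argument of p.adjust() in the same way.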
set.seed(123) # Makes example reproducible

# Create synthetic data
data <- create_synthetic_data(
  n_proteins = 10,
  frac_change = 0.5,
  n_replicates = 4,
  n_conditions = 2,
  method = "effect_random",
  additional_metadata = FALSE
)

# Assign missingness information
data_missing <- assign_missingness(
  data,
  sample = sample,
  condition = condition,
  grouping = peptide,
  intensity = peptide_intensity_missing,
  ref_condition = "all",
  retain_columns = c(protein, change_peptide)
)

# Calculate differential abundances
# Using "moderated_t-test" and "proDA" improves
# true positive recovery progressively
diff <- calculate_diff_abundance(
  data = data_missing,
  sample = sample,
  condition = condition,
  grouping = peptide,
  intensity_log2 = peptide_intensity_missing,
  missingness = missingness,
  comparison = comparison,
  method = "t-test",
  retain_columns = c(protein, change_peptide)
)

head(diff, n = 10)
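A common next step is to select candidates from the returned columns diff and adj_pval. The sketch below uses a small hand-made data frame as a stand-in for the function's output, so it runs without the package. The values and the cutoffs (adjusted p-value below 0.05, absolute log2 difference above 1) are assumptions chosen for illustration only.

```r
# Hypothetical result table with the documented columns diff, pval and adj_pval
result <- data.frame(
  peptide = c("peptide_1", "peptide_2", "peptide_3", "peptide_4"),
  diff = c(1.8, -0.2, -1.4, 0.9),
  pval = c(0.001, 0.600, 0.004, NA),
  adj_pval = c(0.003, 0.600, 0.006, NA)
)

# Assumed cutoffs; rows without an adjusted p-value are dropped explicitly
hits <- subset(result, !is.na(adj_pval) & adj_pval < 0.05 & abs(diff) > 1)

hits$peptide
```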