View source: R/calculate_diff_abundance.R
calculate_diff_abundance | R Documentation |
Performs differential abundance calculations and statistical hypothesis tests on data frames with protein, peptide or precursor data. Different methods for statistical testing are available.
calculate_diff_abundance(
  data,
  sample,
  condition,
  grouping,
  intensity_log2,
  missingness = missingness,
  comparison = comparison,
  mean = NULL,
  sd = NULL,
  n_samples = NULL,
  ref_condition = "all",
  filter_NA_missingness = TRUE,
  method = c("moderated_t-test", "t-test", "t-test_mean_sd", "proDA"),
  p_adj_method = "BH",
  retain_columns = NULL
)
data |
a data frame containing at least the input variables that are required for the
selected method. Ideally the output of |
sample |
a character column in the |
condition |
a character or numeric column in the |
grouping |
a character column in the |
intensity_log2 |
a numeric column in the |
missingness |
a character column in the |
comparison |
a character column in the |
mean |
a numeric column in the |
sd |
a numeric column in the |
n_samples |
a numeric column in the |
ref_condition |
optional, character value providing the condition that is used as a
reference for differential abundance calculation. Only required for |
filter_NA_missingness |
a logical value, default is |
method |
a character value, specifies the method used for statistical hypothesis testing.
Methods include Welch test ( |
p_adj_method |
a character value, specifies the p-value correction method. Possible
methods are c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none"). Default
method is |
retain_columns |
a vector indicating if certain columns should be retained from the input
data frame. Default is not retaining additional columns |
A data frame that contains differential abundances (diff), p-values (pval) and adjusted p-values (adj_pval) for each protein, peptide or precursor (depending on the grouping variable) and the associated treatment/reference pair. Depending on the method, the data frame contains additional columns:
"t-test": The std_error column contains the standard error of the differential abundances. n_obs contains the number of observations for the specific protein, peptide or precursor (depending on the grouping variable) and the associated treatment/reference pair.
"t-test_mean_sd": Columns labeled as control refer to the second condition of the comparison pairs; columns labeled as treated refer to the first condition. The mean_control and mean_treated columns contain the means for the reference and treatment condition, respectively. The sd_control and sd_treated columns contain the standard deviations for the reference and treatment condition, respectively. The n_control and n_treated columns contain the numbers of samples for the reference and treatment condition, respectively. The std_error column contains the standard error of the differential abundances. t_statistic contains the t-statistic for the t-test.
"moderated_t-test": CI_2.5 and CI_97.5 contain the 2.5% and 97.5% confidence interval borders for differential abundances. avg_abundance contains average abundances for treatment/reference pairs (mean of the two group means). t_statistic contains the t-statistic for the t-test. B contains the B-statistic, which is the log-odds that the protein, peptide or precursor (depending on grouping) has a differential abundance between the two groups. Suppose B = 1.5. The odds of differential abundance are exp(1.5) = 4.48, i.e., about four and a half to one. The probability that there is a differential abundance is 4.48/(1 + 4.48) = 0.82, i.e., the probability is about 82% that this group is differentially abundant. A B-statistic of zero corresponds to a 50-50 chance that the group is differentially abundant. n_obs contains the number of observations for the specific protein, peptide or precursor (depending on the grouping variable) and the associated treatment/reference pair.
"proDA": The std_error column contains the standard error of the differential abundances. avg_abundance contains average abundances for treatment/reference pairs (mean of the two group means). t_statistic contains the t-statistic for the t-test. n_obs contains the number of observations for the specific protein, peptide or precursor (depending on the grouping variable) and the associated treatment/reference pair.
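The B-statistic arithmetic given for "moderated_t-test" can be reproduced in base R. The following is a minimal sketch and not part of the package: the helper name b_to_probability is hypothetical, and it only converts a log-odds value into a probability.

```r
# Hypothetical helper (not part of the package): convert a B-statistic
# (log-odds of differential abundance) into a probability.
b_to_probability <- function(b) {
  odds <- exp(b)      # log-odds -> odds
  odds / (1 + odds)   # odds -> probability
}

b_to_probability(1.5) # about 0.82
b_to_probability(0)   # 0.5, i.e. a 50-50 chance

# stats::plogis() is the logistic function and performs the same conversion
all.equal(b_to_probability(1.5), plogis(1.5))
```

Negative B values give probabilities below 0.5, so sorting by B and sorting by this probability produce the same ranking.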
For all methods except "proDA", the p-value adjustment is performed only on the subset of the data that contains a p-value that is not NA. For "proDA" the p-value adjustment is either performed on the complete dataset (filter_NA_missingness = TRUE) or on the subset of the dataset with missingness that is not NA (filter_NA_missingness = FALSE).
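To illustrate why the set of p-values entering the adjustment matters, the sketch below uses base R's p.adjust() on made-up values. It is an assumption about the general idea, not the package's internal code; the vector pval and the result names are hypothetical.

```r
# Made-up p-values for illustration; NA marks a test that could not be run
pval <- c(0.001, 0.02, NA, 0.04, NA, 0.30)

# Adjust only over the p-values that are not NA.
# By default p.adjust() counts only non-NA values as tests and leaves NA in place.
adj_non_na <- p.adjust(pval, method = "BH")

# For comparison: count all six entries as tests, which is more conservative
adj_all <- p.adjust(pval, method = "BH", n = length(pval))

data.frame(pval, adj_non_na, adj_all)
```

Counting more tests can only increase the adjusted p-values, so the choice of subset directly affects how many groups pass a given significance cutoff.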
set.seed(123) # Makes example reproducible
# Create synthetic data
data <- create_synthetic_data(
  n_proteins = 10,
  frac_change = 0.5,
  n_replicates = 4,
  n_conditions = 2,
  method = "effect_random",
  additional_metadata = FALSE
)
# Assign missingness information
data_missing <- assign_missingness(
  data,
  sample = sample,
  condition = condition,
  grouping = peptide,
  intensity = peptide_intensity_missing,
  ref_condition = "all",
  retain_columns = c(protein, change_peptide)
)
# Calculate differential abundances
# Using "moderated_t-test" and "proDA" improves
# true positive recovery progressively
diff <- calculate_diff_abundance(
  data = data_missing,
  sample = sample,
  condition = condition,
  grouping = peptide,
  intensity_log2 = peptide_intensity_missing,
  missingness = missingness,
  comparison = comparison,
  method = "t-test",
  retain_columns = c(protein, change_peptide)
)
head(diff, n = 10)