test_diff: Identify differentially abundant proteins

View source: R/test_diff.R

test_diffR Documentation

Identify differentially abundant proteins

Description

The ‘test_diff()' function is used to test coefficients of a ’proDAFit' object. It provides a Wald test to test individual coefficients and a likelihood ratio F-test to compare the original model with a reduced model. The result_names method provides a quick overview which coefficients are available for testing.

Usage

test_diff(
  fit,
  contrast,
  reduced_model = ~1,
  alternative = c("two.sided", "greater", "less"),
  pval_adjust_method = "BH",
  sort_by = NULL,
  decreasing = FALSE,
  n_max = Inf,
  verbose = FALSE
)

## S4 method for signature 'proDAFit'
result_names(fit)

Arguments

fit

an object of class 'proDAFit'. Usually, this is produced by calling proDA()

contrast

an expression or a string specifying which contrast is tested. It can be a single coefficient (to see the available options use result_names(fit)) or any linear combination of them. The contrast is always compared against zero. Thus, to find out if two coefficients differ use coef1 - coef2. Remember if the coefficient is not a valid identifier in R, to escape it using back ticks. For example if you test the interaction of A and B use `A:B`.

reduced_model

If you don't want to test an individual coefficient, you can can specify a reduced model and compare it with the original model using an F-test. This is useful to find out how a set of parameters affect the goodness of the fit. If neither a contrast, nor a reduced_model is specified, by default a comparison with an intercept model (ie. just the average across conditions) is done. Default: ~ 1.

alternative

a string that decides how the hypothesis test is done. This parameter is only relevant for the Wald-test specified using the 'contrast' argument. Default: "two.sided"

pval_adjust_method

a string the indicates the method that is used to adjust the p-value for the multiple testing. It must match the options in p.adjust. Default: "BH"

sort_by

a string that specifies the column that is used to sort the resulting data.frame. Default: NULL which means the result is sorted by the order of the input matrix.

decreasing

a boolean to indicate if the order is reversed. Default: FALSE

n_max

the maximum number of rows returned by the method. Default: Inf

verbose

boolean that signals if the method prints informative messages. Default: FALSE.

Details

To test if coefficient is different from zero with a Wald test use the contrast function argument. To test if two models differ with an F-test use the reduced_model argument. Depending on the test that is conducted, the functions returns slightly different data.frames.

The function is designed to follow the principles of the base R test functions (ie. t.test and wilcox.test) and the functions designed for collecting the results of high-throughput testing (ie. limma::topTable and DESeq2::results).

Value

The 'result_names()' function returns a character vector.

The 'test_diff()' function returns a data.frame with one row per protein with the key parameters of the statistical test. Depending what kind of test (Wald or F test) the content of the 'data.frame' differs.

The Wald test, which can considered equivalent to a t-test, returns a 'data.frame' with the following columns:

name

the name of the protein, extracted from the rowname of the input matrix

pval

the p-value of the statistical test

adj_pval

the multiple testing adjusted p-value

diff

the difference that particular coefficient makes. In differential expression analysis this value is also called log fold change, which is equivalent to the difference on the log scale.

t_statistic

the diff divided by the standard error se

se

the standard error associated with the diff

df

the degrees of freedom, which describe the amount of available information for estimating the se. They are the sum of the number of samples the protein was observed in, the amount of information contained in the missing values, and the degrees of freedom of the variance prior.

avg_abundance

the estimate of the average abundance of the protein across all samples.

n_approx

the approximated information available for estimating the protein features, expressed as multiple of the information contained in one observed value.

n_obs

the number of samples a protein was observed in

The F-test returns a 'data.frame' with the following columns

name

the name of the protein, extracted from the rowname of the input matrix

pval

the p-value of the statistical test

adj_pval

the multiple testing adjusted p-value

f_statistic

the ratio of difference of normalized deviances from original model and the reduced model, divided by the standard deviation.

df1

the difference of the number of coefficients in the original model and the number of coefficients in the reduced model

df2

the degrees of freedom, which describe the amount of available information for estimating the se. They are the sum of the number of samples the protein was observed in, the amount of information contained in the missing values, and the degrees of freedom of the variance prior.

avg_abundance

the estimate of the average abundance of the protein across all samples.

n_approx

the information available for estimating the protein features, expressed as multiple of the information contained in one observed value.

n_obs

the number of samples a protein was observed in

See Also

The contrast argument is inspired by limma::makeContrasts.

Examples

  # "t-test"
  syn_data <- generate_synthetic_data(n_proteins = 10)
  fit <- proDA(syn_data$Y, design = syn_data$groups)
  result_names(fit)
  test_diff(fit, Condition_1 - Condition_2)


  suppressPackageStartupMessages(library(SummarizedExperiment))
  se <- generate_synthetic_data(n_proteins = 10,
                                n_conditions = 3,
                                return_summarized_experiment = TRUE)
  colData(se)$age <- rnorm(9, mean=45, sd=5)
  colData(se)
  fit <- proDA(se, design = ~ group + age)
  result_names(fit)
  test_diff(fit, "groupCondition_2",
            n_max = 3, sort_by = "pval")

  # F-test
  test_diff(fit, reduced_model = ~ group)



const-ae/proDA documentation built on Oct. 31, 2023, 9:39 p.m.