fdge: Peform a differential expression analysis.
In facilebio/FacileAnalysis: Modularized and interactive analyses over a FacileDataStore

View source: R/fdge-base.R

fdge	R Documentation

Peform a differential expression analysis.

Description

Use flm_def() to define the design matrix and contrast to test and pass the FacileLinearModelDefinition object returned from that to fdge() to run the desired differential testing framework (dictated by the method parameter) over the data. flm_def accepts a

Usage

fdge(
  x,
  assay_name = NULL,
  method = NULL,
  features = NULL,
  filter = "default",
  with_sample_weights = FALSE,
  treat_lfc = NULL,
  ...,
  metadata = list(),
  verbose = FALSE
)

## S3 method for class 'FacileAnovaModelDefinition'
fdge(
  x,
  assay_name = NULL,
  method = NULL,
  features = NULL,
  filter = "default",
  with_sample_weights = FALSE,
  ...,
  metadata = list(),
  verbose = FALSE
)

## S3 method for class 'FacileTtestDGEModelDefinition'
fdge(
  x,
  assay_name = NULL,
  method = NULL,
  features = NULL,
  filter = "default",
  with_sample_weights = FALSE,
  treat_lfc = NULL,
  ...,
  metadata = list(),
  verbose = FALSE
)

## S3 method for class 'FacileLinearModelDefinition'
fdge(
  x,
  assay_name = NULL,
  method = NULL,
  features = NULL,
  filter = "default",
  with_sample_weights = FALSE,
  treat_lfc = NULL,
  weights = NULL,
  with_box = FALSE,
  ...,
  biocbox = NULL,
  trend.eBayes = FALSE,
  robust.eBayes = FALSE,
  metadata = list(),
  verbose = FALSE
)

## S3 method for class 'FacileTtestAnalysisResult'
compare(x, y, treat_lfc = param(x, "treat_lfc"), rerun = TRUE, ...)

## S3 method for class 'FacileTtestComparisonAnalysisResult'
tidy(
  x,
  max_padj = 0.1,
  min_logFC = NULL,
  labels = NULL,
  ...,
  max_padj_x = max_padj,
  max_padj_y = max_padj,
  min_logFC_x = min_logFC,
  min_logFC_y = min_logFC
)

## S3 method for class 'FacileTtestAnalysisResult'
report(
  x,
  type = c("dge", "features"),
  ntop = 200,
  max_padj = 0.1,
  min_logFC = 1,
  features = NULL,
  highlight = NULL,
  round_digits = 3,
  event_source = "A",
  webgl = TRUE,
  caption = NULL,
  ...
)

## S3 method for class 'FacileTtestComparisonAnalysisResult'
ffsea(
  x,
  fsets,
  methods = NULL,
  type = c("interaction", "quadrants"),
  features = NULL,
  min_logFC = param(x, "treat_lfc"),
  max_padj = 0.1,
  rank_by = "logFC",
  signed = TRUE,
  biased_by = NULL,
  ...,
  rank_order = "ranked",
  group_by = "direction",
  select_by = "significant"
)

Arguments

`x`	a data source
`assay_name`	the name of the assay that holds the measurements for test. Defaults to `default_assay(x)`.
`method`	The differential testing framework to use over the data. The valid choices are defined by the type of assay `assay_name`is. Refer to the Differential Expression Testing Methods section for more details.
`features`	Explicitly request the features to test. If this is provided, then the `filter` criteria (to remove low abundance features, for instance) is skipped. By default this is `NULL`
`filter`, `with_sample_weights`	Passed into `biocbox()` to determine which features (genes) are removed from the dataset for testing, as well as if to use `limma::arrayWeights()` or `limma::voomWithQualityWeights()` (where appropriate) when testing (default is not to).
`treat_lfc`	If this is numeric, this activates limma's "treat" functionality and tests for differential expression against this specified log2 fold change threshold.
`...`	passed down into inner methods, such as `biocbox` to tweak filtering criteria, for instance
`weights`	a `⁠sample_id,feature_id,weight⁠` table of observation weights to use when `method == "limma"`.
`rerun`	When comparing two results, the features analyzed in each may differ, making comparisons between the two objects sparse, at times. When `rerun = TRUE` (default), the original linear models are rerun with their features set to `union(features(x), features(y))`.

Differential Expression Testing Methods

The appropriate statistical framework to use for differential expression testing is defined by the type of data that is recorded in the assay assay_name, ie. assay_info(x, assay_name)$assay_type.

The fdge_methods() function returns a tibble of appropriate assay_type -> dge_method associations. The first entry for each dge_method is the default method used if one isn't provided by the caller.

The available methods are:

"voom": For count data, uses limma::voomWithQualityWeights() when with_sample_weights = TRUE.
"edgeR-qlf": The edgeR quasi-likelihood method, for count data.
"limma-trend": Usable for log-transformed data that "looks like" it came from count data, or where there is a "trend" of the variance with the mean, uses limma::arrayWeights() when with_sample_weights = TRUE.
"limma": Straightup limma, this expects log2-normal like data, with (largely) no trend of the variance to the mean worth modeling. Uses limma::arrayWeights() when with_sample_weights = TRUE

Feature Filtering Strategy

You will almost always want to filter out lowly abundant features before performing differential expression analysis. You can either do this by explicitly requesting which features to test via the features parameter, or by setting ⁠filter = "default⁠.

When filter == "default", the filtering strategy is largely based on the logic found in edgeR::filterByExpr().

When fdge analysis is performed on count data, the filtering is precisely executed using this function, using design(x) as the design parameter to filterByExpr. You can modify the filtering behavior by passing some named parameters found in the edgeR::filterByExpr() function down to it via fdge's ... parameter, with the exception of the design paremeter, because this is already defined. Use:

filter_min_count for the edgeR::filterByExpr(min.count) parameter
filter_min_total_count for edgeR::filterByExpr(min.total.count)
filter_large_n for edgeR::filterByExpr(large.n); and
filter_min_prop for edgeR::filterByExpr(min.prop).

There are times when you want to tweak this behavior in ways that aren't exactly supported by filterByExpr. You can pass in a "feature descriptor" (a character vector of feature ids, or a data.frame with a "feature_id" column) into the following parameters:

filter_universe: The features enumerated in this parameter will restrict the universe of features that can potentially be included in the downstream analysis. The filterByExpr() logic will happen downstream of this universe. The default value is NULL, which specifies the universe of features to be all of the ones measured using this assay.
filter_require: The filterByExpr logic happens first on the universe of features as parameterized. All features enumerated here will be forcibly included in the analysis, irrespective of whether they would have passed the perscribed filter criteria or not. The defalut value for this argument is NULL, which means no genes are forcibly included in the analysis when they do not pass muster given the filtering criteria.

Comparing DGE Results (interaction effect)

It is often useful to compare the results of two t-tests, and for many experimental designs, this can be a an intuitive way to perform test an interaction effect.

The filtering strategy in the interaction model dictates that the union of all features found in x are y are used in the test.

Statistics Tables

The stats table from the differential expression analysis can be retrieved using the tidy() function. Depending upon the type of analysis run, the exact columns of the returned table may differ.

Interaction Statistics Calling tidy() on an interaction test result (FacileTtestComparisonAnalysisResult), returns the statistics for the interaction test itself (if it was performed), as well as the statistics for the individual DGE results that were run vai compare(x, y) to get the interaction results itself. The columns of statistics related to the individual tests will be suffixed with ⁠*.x⁠ and ⁠*.y⁠, respectively.

An interaction_group column will also be added to indicate what type of statisticaly significance was found for each gene. The values in here can be:

"both": this gene was statistically significant in both tests
"x": this gene was only significant in the x dge result
"y": this gene was only significant in the y dge result
"none": was not statistically significant in either test result

The genes selected for significance from the input results x and y are based on their padj and logFC values. The thresholds are tweaked by the following parameters in the call to tidy(compare(x,y)):

max_padj_(x|y): if not specified, defaults to 0.10
min_logFC_(x|y): if not specified, we will take the value that was used in the treat_lfc parameters to x and y if those tests were run against a threshold, otherwise defaults to 1.

Interacting with results

The report function will create an htmlwidget which can be explored by the analyst or dropped into an Rmarkdown report.

report(result, "dge", max_padj = 0.05, min_logFC = 1) will create a side-by-side volcano and datatable for differential expression results.

Gene Set Enrichment Analysis

There are a few ways you may consider running a gene set analysis over an interaction analysis.

On the statistics of the interaction itself; or
On the statistics of the different "quadrants" the features are binned into that are found in the sigclass columns of the tidied result table, ie. tidy.FacileTtestComparisonAnalysisResult(); or
Both.

Note that an analysis on (2) only lends itself to an overrepresentation analysis, ie. methods = "ora".

Examples

efds <- FacileData::exampleFacileDataSet()
samples <- efds |>
  FacileData::filter_samples(indication == "BLCA") |>
  FacileData::with_sample_covariates() |> 
  dplyr::mutate(something = sample(c("a", "b"), dplyr::n(), replace = TRUE))
mdef <- flm_def(samples, covariate = "sample_type",
                numer = "tumor", denom = "normal",
                batch = "sex",
                metadata = list(label = "test flm"))
dge <- fdge(mdef, method = "voom", metadata = list(label = "test dge"))
if (interactive()) {
  viz(dge)
  viz(dge, "146909")
  shine(dge)
}
dge.stats <- tidy(dge)
dge.sig <- signature(dge)

stage.anova <- samples |>
  flm_def(covariate = "stage", batch = "sex") |>
  fdge(method = "voom")
anova.sig <- signature(stage.anova)
# Comparing two T-test results ----------------------------------------------
# Let's compare the tumor vs normal DGE results in CRC vs BLCA

efds <- FacileData::exampleFacileDataSet()
dge.crc <- FacileData::filter_samples(efds, indication == "CRC") |>
  flm_def("sample_type", "tumor", "normal", "sex") |>
  fdge()
dge.blca <- FacileData::filter_samples(efds, indication == "BLCA") |>
  flm_def("sample_type", "tumor", "normal", "sex") |>
  fdge()
dge.comp <- compare(dge.crc, dge.blca)
comp.hi <- tidy(dge.comp) |>
  dplyr::group_by(interaction_group) |>
  dplyr::slice(1:3) |>
  dplyr::ungroup()
# Static visualization generates the main "4-way" plot, as well as the
# facets for each category.
sviz <- viz(dge.comp, labels = c(x = "CRC", y = "BLCA"),
            subtitle = "Tumor vs normal comparisons across indications",
            highlight = comp.hi)
# highlight some of them
s.hi <- sviz$input_data |>
  dplyr::group_by(interaction_group) |>
  dplyr::slice(1:3) |>
  dplyr::ungroup()
if (requireNamespace("patchwork")) {
  patchwork::wrap_plots(
    sviz$plot + ggplot2::theme(legend.position = "bottom"),
    sviz$plot_facets + ggplot2::theme(legend.position = "none"),
    nrow = 1)
  viz(dge.comp, labels = c(x = "CRC", y = "BLCA"),
      colors = "darkgrey")$plot_facets
}

facilebio/FacileAnalysis documentation built on April 5, 2025, 2:42 p.m.