MiniMax: Find Gene Set / Pathway Significance across Multi-Omics Data


View source: R/wrapper_MiniMax.R

Description

Given a data frame of pathway-level p-values across multiple -omics platforms, use the MiniMax technique to assign statistical significance to concordant or cascading pathway-level biological effects.

Usage

MiniMax(
  pValues_df,
  pValuesNull_df = NULL,
  orderStat = 2L,
  method = c("parametric", "MLE", "MoM"),
  annotateResults = TRUE,
  ...
)

Arguments

pValues_df

A data frame of pathway / gene set p-values under true responses (this data set should contain true biological signal). The rows correspond to gene sets / pathways, and the columns correspond to the data platforms for the disease of interest.

pValuesNull_df

A data frame of pathway / gene set p-values under the null hypothesis, most likely constructed from randomly permuting the response and re-estimating all significance levels (this data set should NOT contain any true biological signal). As with pValues_df, the rows correspond to gene sets / pathways, and the columns correspond to the data platforms for the disease of interest. NOTE: if this data set is not provided, only method = "parametric" will be available.

orderStat

How many platforms should show a biological signal for a pathway / gene set to have multi-omic "enrichment"? Defaults to 2. See "Details" for more information.

method

If pValuesNull_df is provided, which estimation method will be used to find the parameters of the Beta Distribution? Options are "parametric" (no estimation from the data; this should be used only in cases where no MiniMax statistics under the null hypothesis are available, such as in the case of pure meta-analysis approaches), "MLE" (Maximum Likelihood Estimates), or "MoM" (Method of Moments estimates). Using "MLE" or "MoM" requires the user to provide pValuesNull_df. See "Details" for more information.

annotateResults

Should the platforms driving each result be marked? Defaults to TRUE. See MiniMax_calculateDrivers for more information.

...

Additional arguments passed to the MiniMax_calculateDrivers function.

Details

Concerning Parameter Estimation Methods: We currently support 3 options to estimate the parameters of the Beta Distribution. The "parametric" option does not use the data, and it is therefore the only option available if pValuesNull_df is not provided. Instead, it assumes that the MiniMax statistics follow a Beta(k, n + 1 - k) distribution, where k is the value of orderStat and n is the number of data platforms. This is the distribution of the k-th order statistic of n independent Uniform(0, 1) p-values. See https://en.wikipedia.org/wiki/Order_statistic.
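As a minimal sketch of the "parametric" option described above (not the package's internal code), the p-value of a MiniMax statistic can be read off the Beta(k, n + 1 - k) distribution function with base R's pbeta(). The statistic values and the object names below are hypothetical and chosen only for illustration.

```r
k <- 2L   # orderStat
n <- 3L   # number of data platforms

# Hypothetical MiniMax statistics for three pathways (illustration only)
miniMax_num <- c(0.04, 0.20, 0.65)

# Under the null, the k-th smallest of n independent Uniform(0, 1) p-values
# follows a Beta(k, n + 1 - k) distribution, so the p-value of each statistic
# is the lower tail of that distribution.
miniMaxP_num <- pbeta(miniMax_num, shape1 = k, shape2 = n + 1L - k)
miniMaxP_num
```

With k = 2 and n = 3 this is the Beta(2, 2) distribution function, 3x^2 - 2x^3, so small statistics map to much smaller p-values.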

The next two estimation options make use of the pValuesNull_df data frame. This data frame should be built by re-running the same statistical tests used on the real data (for each pathway and data platform), but on a random permutation of the outcome of interest instead of the real values; more permutations are better. The "MLE" option uses the beta.mle function to find the Maximum Likelihood Estimates of α and β. The "MoM" option uses the closed-form Method of Moments estimators of α and β as shown in https://en.wikipedia.org/wiki/Beta_distribution#Method_of_moments.
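The closed-form Method of Moments estimators mentioned above can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's implementation: the null MiniMax statistics are simulated stand-ins (real ones would come from pValuesNull_df), all object names are hypothetical, and the sample variance from var() is used, which may differ from the variance estimator the package uses.

```r
# Simulated stand-in for MiniMax statistics computed under permuted responses
set.seed(1)
nullMiniMax_num <- rbeta(500, shape1 = 2, shape2 = 2)

m <- mean(nullMiniMax_num)
v <- var(nullMiniMax_num)

# Method of Moments estimators of the Beta parameters; these are valid
# only when v < m * (1 - m)
common_num <- m * (1 - m) / v - 1
alphaHat <- m * common_num
betaHat  <- (1 - m) * common_num

# P-value of a hypothetical observed MiniMax statistic under the fitted null
obsP_num <- pbeta(0.04, shape1 = alphaHat, shape2 = betaHat)
```

By construction the fitted distribution reproduces the sample mean, since alphaHat / (alphaHat + betaHat) equals m.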

Concerning Appropriate Order Statistics: With the default orderStat = 2, the MiniMax operation is equivalent to sorting the p-values for each pathway and taking the second smallest; in general, it takes the orderStat-th smallest. In our experience, setting this "order statistic" cutoff to 2 is appropriate for 5 or fewer data platforms. Biologically, this is equivalent to saying "if this pathway is dysregulated in at least two data types for this disease / condition, it is worthy of additional consideration". In situations where more than 5 data platforms are available for the disease of interest, we recommend increasing the orderStat value to 3.
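The order-statistic operation described above can be sketched in base R as shown below. The data frame layout, the p-values, and the helper kthSmallest() are hypothetical and exist only for this illustration; they are not part of the package.

```r
# Hypothetical pathway-level p-values across three platforms
pValues_df <- data.frame(
  pathway = c("pathA", "pathB", "pathC"),
  cnv     = c(0.04, 0.50, 0.01),
  rnaSeq  = c(0.01, 0.60, 0.90),
  protein = c(0.30, 0.02, 0.80)
)
platforms_char <- c("cnv", "rnaSeq", "protein")

# Hypothetical helper: the k-th smallest value of a numeric vector
kthSmallest <- function(p_num, k) sort(p_num)[[k]]

# Row-wise second-smallest p-value (k = 2 mirrors the default orderStat)
pValues_df$MiniMax <- unname(apply(
  as.matrix(pValues_df[, platforms_char]),
  MARGIN = 1,
  FUN = kthSmallest,
  k = 2L
))
pValues_df
```

In this toy data, pathC has one very small p-value (0.01 for cnv) but a large second-smallest p-value, so a signal in a single platform does not by itself yield a small statistic.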

Value

A copy of the pValues_df data frame with two additional columns: MiniMax (the statistic values for each gene set) and MiniMaxP (the p-values of these statistics). This data frame is sorted by ascending MiniMax p-value.

Examples

 data("multiOmicsMedSignalResults_df")
 data("nullMiniMaxResults_df")

 MiniMax(
   pValues_df = multiOmicsMedSignalResults_df,
   pValuesNull_df = nullMiniMaxResults_df[, -5],
   method = "MLE",
   # Passed to the MiniMax_calculateDrivers() function
   drivers_char = c("cnv", "rnaSeq", "protein")
 )

TransBioInfoLab/pathwayMultiomics documentation built on Dec. 18, 2021, 5:12 p.m.