geva.finalize: Concatenating GEVA calculations into the final results
In sbcblab/geva: Gene Expression Variation Analysis (GEVA)


Merges the obtained information (Summarization, Clustering, and Quantiles), then applies the final steps to produce the classification results for the SV points (genes).

geva.finalize(
  gsummary,
  ...,
  p.value = 0.05,
  p.val.adjust = options.factoring.p.adjust,
  constraint.factors = TRUE
)

options.factoring.p.adjust
# c("partial.quantiles", "holm", "hochberg", "hommel", 
#   "bonferroni", "BH", "BY", "fdr", "none")

`gsummary`	a `GEVASummary` object
`...`	Intermediate results produced from the `gsummary` object, such as clusters (`GEVACluster`), quantiles (`GEVAQuantiles`), or any other object inherited from `GEVAGroupSet`
`p.value`	`numeric` (0 to 1), p-value cutoff used in the ANOVA procedures (factor analysis only)
`p.val.adjust`	`character`, p-value correction method (factor analysis only). Possible values are: "partial.quantiles", "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none"
`constraint.factors`	`logical`. If `TRUE`, the S values are restricted to the range within the quantile centroids (factor analysis only)

In this procedure, the SV points (i.e., each row in the GEVASummary object) are classified according to the detected quantiles (see geva.quantiles). These results can be adjusted using other grouping analysis results, such as clusters (see geva.cluster). Both arguments are optional, but for the best statistical accuracy a GEVAQuantiles and a GEVACluster object should both be passed through the ... argument. If no GEVAQuantiles argument is given, the quantiles are calculated automatically using the default parameters.

If multiple factors are present in the GEVASummary object (retrieved by factors(gsummary)), a factor analysis is also performed, giving two additional possible classifications (factor-dependent and factor-specific) besides the default ones (similar, basal, and sparse).

In factor analysis, an ANOVA is applied for each gene using Fisher's and Levene's tests to distinguish genes whose logFC (differential expression) variation is dependent or specific to the analyzed factors based on the p-value cutoff. The p.val.adjust argument defines how these p-values will be adjusted: by quantile separation between each factor ("partial.quantiles" method); or by one of the default methods listed in stats::p.adjust.methods.
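As a rough illustration of the standard corrections, the sketch below applies stats::p.adjust to four made-up p-values (hypothetical numbers, not GEVA output) and counts how many stay below the default cutoff of 0.05. The "partial.quantiles" method is specific to GEVA and is not available through stats::p.adjust, so it is left out here.

```r
# Hypothetical raw p-values, one per gene (illustration only)
pvals <- c(0.01, 0.02, 0.03, 0.04)

# Methods accepted by stats::p.adjust; "partial.quantiles" is GEVA-specific
std.methods <- stats::p.adjust.methods

# Adjust the same p-values with every standard method, one column per method
adjusted <- sapply(std.methods, function(m) stats::p.adjust(pvals, method = m))

# Number of genes that stay at or below the cutoff under each method
passing <- colSums(adjusted <= 0.05)

print(round(adjusted, 4))
print(passing)
```

Stricter corrections such as "bonferroni" leave fewer genes below the cutoff than "none", which is why the choice of p.val.adjust can change how many genes are reported as factor-dependent or factor-specific.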

The constraint.factors argument determines whether the S values (summarized logFC) are limited to the range between the quantile centroids during factor analysis. For example, if the quantile centroids were -0.90, 0.00, and 0.90 on the S axis, values such as -1.53 and 2.96 would be converted to -0.90 and 0.90, respectively. This constraint prevents ANOVA from reporting a significant result merely because the observations differ in their degree of differential expression.
As a further illustration, consider two sets of values, A = (-1.00, -1.10, 0.00, 0.20, 1.00, 1.15) and B = (0.00, 0.12, 1.11, 1.00, 1.95, 2.00), with the centroids located at C = (-0.90, 0.00, 0.90) and the factors F = (Cond1, Cond1, Cond2, Cond2, Cond3, Cond3). If constraint.factors is FALSE, both A and B are considered to show significantly separated factors. If it is TRUE, only A presents a significant separation, since in B the values 1.11, 1.00, 1.95, and 2.00 are all converted to 0.90. In qualitative terms, when constraint.factors is TRUE, all values above 0.90 are treated as the same over-expressed value, so that they fall into the same degree of differential expression. Hence, with the constrained values, B would not represent a significant separation between the factors Cond1, Cond2, and Cond3.
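The conversion described above can be sketched in base R. The helper constrain.s is hypothetical and not part of GEVA; it only assumes that the constraint clamps each value to the range spanned by the outermost centroids, as the numbers in the text suggest.

```r
# Quantile centroids on the S axis, taken from the example in the text
centroids <- c(-0.90, 0.00, 0.90)

# Hypothetical helper (not part of GEVA): clamp values to the centroid range
constrain.s <- function(x, centroids) {
  pmin(pmax(x, min(centroids)), max(centroids))
}

a <- c(-1.00, -1.10, 0.00, 0.20, 1.00, 1.15)
b <- c(0.00, 0.12, 1.11, 1.00, 1.95, 2.00)
f <- factor(c("Cond1", "Cond1", "Cond2", "Cond2", "Cond3", "Cond3"))

a.con <- constrain.s(a, centroids)
b.con <- constrain.s(b, centroids)

# Constrained values grouped by factor
print(split(a.con, f))
print(split(b.con, f))
```

After clamping, the Cond2 and Cond3 values of B are all 0.90 and can no longer be told apart, whereas A still holds distinct values in each of the three conditions.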

Returns a GEVAResults object containing the entire set of results. The relevant genes can be retrieved using top.genes().

To perform factor analysis, the following observations must be considered:

The factors must be defined in the provided data. They can be retrieved using the factors accessor. If factors are not present or consist entirely of NA, they can be assigned through factors<- by providing a factor or character vector with the same length as the number of input columns;
Each factor must include two or more values, since the factor analysis is based on ANOVA and at least two values are needed to calculate a variance;
Columns whose factor value is NA are not considered.
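These requirements can be checked ahead of time with a few lines of base R. The sketch below works on a made-up factor vector (col.factors is a hypothetical name, one entry per input column) and is not taken from the GEVA source. Whether the package runs an equivalent check internally is an assumption left open here.

```r
# Hypothetical factor assignment for six input columns (illustration only)
col.factors <- factor(c("Cond1", "Cond1", "Cond2", NA, "Cond3", "Cond3"))

# Columns with NA factors are not considered, so count only non-NA levels
level.counts <- table(col.factors, useNA = "no")

# Levels with fewer than two columns cannot provide a variance estimate
too.small <- names(level.counts)[as.vector(level.counts) < 2]

# Usable only if some factor is defined and every level has two or more columns
usable <- !all(is.na(col.factors)) && length(too.small) == 0

print(level.counts)
print(too.small)
print(usable)
```

In this made-up case Cond2 is left with a single column once the NA column is dropped, so the vector would need another Cond2 column (or the removal of Cond2) before factor analysis.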

See also: p.adjust.methods

## Finalizing example using a randomly generated input
ginput <- geva.ideal.example()       # Generates a random input (for testing purposes only)
gsummary <- geva.summarize(ginput)   # Summarizes the input
gquant <- geva.quantiles(gsummary)   # Calculates the quantiles
gclust <- geva.cluster(gsummary)     # Calculates the clusters
gresults <- geva.finalize(gsummary, gquant, gclust)  # Finishes the results

head(top.genes(gresults))            # Prints the final results
plot(gresults)                       # Plots the final SV-plot