calculate_difference: Calculate splicing diversity changes between two conditions.
In esebesty/SplicingFactory: Splicing Diversity Analysis for Transcriptome Data

Calculate splicing diversity changes between two conditions.

calculate_difference(
  x,
  samples,
  control,
  method = "mean",
  test = "wilcoxon",
  randomizations = 100,
  pcorr = "BH",
  assayno = 1,
  verbose = FALSE,
  ...
)

`x`	A `SummarizedExperiment` with splicing diversity values for each gene in each sample or a `data.frame` with gene names in the first column and splicing diversity values for each sample in additional columns.
`samples`	For a `SummarizedExperiment` input, a character vector of length one naming the `colData` annotation column to use as the sample category. For a `data.frame` input, a character vector giving the category of each sample, with one element per sample column.
`control`	Name of the control sample category, defined in the `samples` vector, e.g. `control = 'Normal'` or `control = 'WT'`.
`method`	Method to use for calculating the average splicing diversity value in a condition. Can be `'mean'` or `'median'`.
`test`	Method to use for p-value calculation: use `'wilcoxon'` for Wilcoxon rank sum test or `'shuffle'` for a label shuffling test.
`randomizations`	Number of random shuffles, used for the label shuffling test (default = 100).
`pcorr`	P-value correction method applied to the Wilcoxon rank sum test or label shuffling test results, as defined in the `p.adjust` function.
`assayno`	An integer value. In case of multiple assays in a `SummarizedExperiment` input, the argument specifies the assay number to use for difference calculations.
`verbose`	If `TRUE`, the function will print additional diagnostic messages.
`...`	Further arguments to be passed on for other methods.
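
To make the `test = 'shuffle'` and `randomizations` arguments more concrete, here is a minimal base R sketch of a label shuffling test for a single gene. It is not the package's implementation: the helper name `shuffle_p_value`, the two-sided comparison, and the `+ 1` correction are assumptions chosen for illustration.

```r
# Hypothetical helper, for illustration only (not part of SplicingFactory).
# Estimates a p-value for the difference in means between two conditions
# by repeatedly shuffling the sample labels.
shuffle_p_value <- function(values, labels, control, randomizations = 100) {
  is_control <- labels == control

  # Observed difference between the non-control and control means
  observed <- mean(values[!is_control]) - mean(values[is_control])

  # Differences obtained after randomly reassigning the labels
  shuffled <- replicate(randomizations, {
    perm <- sample(is_control)
    mean(values[!perm]) - mean(values[perm])
  })

  # Two-sided p-value; the +1 terms keep the estimate above zero (assumption)
  (sum(abs(shuffled) >= abs(observed)) + 1) / (randomizations + 1)
}

set.seed(42)
values <- c(0.10, 0.15, 0.12, 0.11, 0.80, 0.85, 0.90, 0.82)
labels <- c(rep("Healthy", 4), rep("Pathogenic", 4))
p <- shuffle_p_value(values, labels, control = "Healthy", randomizations = 100)
```

With this estimator the smallest attainable p-value is `1 / (randomizations + 1)`, so a larger number of shuffles gives finer resolution at the cost of run time.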

The function calculates diversity changes between two sample conditions. It takes the output of the diversity calculation function (`calculate_diversity`), which is a `SummarizedExperiment` object of splicing diversity values. Alternatively, it accepts a `data.frame` in which the first column contains gene names and each additional column contains the splicing diversity values of one sample. A vector of sample conditions is also required; it is used to aggregate the samples by condition.

For each gene, the function calculates the mean or median splicing diversity per sample condition, the difference between these two values, and the log2 fold change between the two conditions. The user can also select a statistical method to assess the significance of the changes: p-values are calculated with either a Wilcoxon rank sum test or a label shuffling test, and are then adjusted using the method given in `pcorr`.
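
The per-gene quantities described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the variable and column names are made up, and the direction of the difference and fold change (non-control relative to control) is assumed.

```r
# Illustrative sketch in base R; not the SplicingFactory implementation.
set.seed(1)
div <- matrix(runif(80), ncol = 8,
              dimnames = list(letters[1:10], paste0("S", 1:8)))
samples <- c(rep("Healthy", 4), rep("Pathogenic", 4))
control <- "Healthy"

is_control <- samples == control

# Average diversity per condition (a row-wise median could be used instead)
control_mean <- rowMeans(div[, is_control, drop = FALSE])
case_mean <- rowMeans(div[, !is_control, drop = FALSE])

# Difference and log2 fold change; the direction is an assumption
mean_difference <- case_mean - control_mean
log2_fold_change <- log2(case_mean / control_mean)

# Wilcoxon rank sum test per gene, followed by Benjamini-Hochberg correction
raw_p <- apply(div, 1, function(v) {
  wilcox.test(v[!is_control], v[is_control])$p.value
})
adjusted_p <- p.adjust(raw_p, method = "BH")

result <- data.frame(genes = rownames(div), control_mean, case_mean,
                     mean_difference, log2_fold_change,
                     raw_p_values = raw_p, adjusted_p_values = adjusted_p,
                     row.names = NULL)
```

Because the input here is random noise, no gene is expected to show a meaningful change; the point is only the shape of the calculation and of the resulting table.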

The function will exclude genes of low sample size from the significance calculation, depending on which statistical test is applied.

A data.frame with the mean or median values of splicing diversity across sample categories and all samples, log2(fold change) of the two different conditions, raw and corrected p-values.

library(SplicingFactory)

# data.frame with splicing diversity values
x <- data.frame(Genes = letters[seq_len(10)], matrix(runif(80), ncol = 8))

# sample categories
samples <- c(rep('Healthy', 4), rep('Pathogenic', 4))

# To calculate the splicing diversity changes between the 'Healthy' and
# 'Pathogenic' conditions, together with their significance values, using
# the mean and the Wilcoxon rank sum test, use:
calculate_difference(x, samples, control = 'Healthy', method = 'mean', test = 'wilcoxon')