calculate_difference: Calculate splicing diversity changes between two conditions.

View source: R/calculate_difference.R

calculate_difference    R Documentation

Calculate splicing diversity changes between two conditions.

Description

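Calculate splicing diversity changes between two conditions: the per-condition mean or median, their difference and log2 fold change, and raw and adjusted p-values for each gene.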

Usage

calculate_difference(
  x,
  samples,
  control,
  method = "mean",
  test = "wilcoxon",
  randomizations = 100,
  pcorr = "BH",
  assayno = 1,
  verbose = FALSE,
  ...
)

Arguments

x

A SummarizedExperiment with splicing diversity values for each gene in each sample or a data.frame with gene names in the first column and splicing diversity values for each sample in additional columns.

samples

For a SummarizedExperiment input, a vector of length one giving the name of the colData annotation column to use as the category column. For a data.frame input, a character vector with one entry per sample column, giving the category of each sample.

control

Name of the control sample category, defined in the samples vector, e.g. control = 'Normal' or control = 'WT'.

method

Method to use for calculating the average splicing diversity value in a condition. Can be 'mean' or 'median'.

test

Method to use for p-value calculation: use 'wilcoxon' for Wilcoxon rank sum test or 'shuffle' for a label shuffling test.

randomizations

Number of random shuffles, used for the label shuffling test (default = 100).

pcorr

P-value correction method applied to the Wilcoxon rank sum test or label shuffling test results, as defined in the p.adjust function.

assayno

An integer value. In case of multiple assays in a SummarizedExperiment input, the argument specifies the assay number to use for difference calculations.

verbose

If TRUE, the function will print additional diagnostic messages.

...

Further arguments to be passed on for other methods.

Details

The function calculates diversity changes between two sample conditions. It uses the output of the diversity calculation function, which is a SummarizedExperiment object of splicing diversity values. Additionally, it can use a data.frame as input, where the first column contains gene names, and all additional columns contain splicing diversity values for each sample. A vector of sample conditions also serves as input, used for aggregating the samples by condition.

It calculates the mean or median of the splicing diversity data per sample condition, the difference between these values, and the log2 fold change between the two conditions. The user can also select a statistical method to assess the significance of the changes: p-values are calculated using a Wilcoxon rank sum test or a label shuffling test, and are then adjusted with the method given in pcorr.
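The calculation described above can be sketched in base R. This is an illustrative sketch only, not the package's implementation: the variable names, the direction of the difference and fold change (non-control relative to control), and the output column names are assumptions made for the example.

```r
# Illustrative sketch; names and the direction of the comparison are assumptions.
set.seed(1)
x <- data.frame(Genes = letters[seq_len(10)], matrix(runif(80), ncol = 8))
samples <- c(rep("Healthy", 4), rep("Pathogenic", 4))
control <- "Healthy"

values <- as.matrix(x[, -1])
is_control <- samples == control

# Per-condition averages (use a row-wise median instead to mimic method = "median").
control_mean <- rowMeans(values[, is_control, drop = FALSE])
case_mean <- rowMeans(values[, !is_control, drop = FALSE])

# Assumed direction: non-control condition relative to control.
mean_difference <- case_mean - control_mean
log2_fold_change <- log2(case_mean / control_mean)

# Wilcoxon rank sum test per gene, then Benjamini-Hochberg correction.
raw_p <- apply(values, 1, function(v) {
  wilcox.test(v[!is_control], v[is_control])$p.value
})
adjusted_p <- p.adjust(raw_p, method = "BH")

sketch <- data.frame(genes = x$Genes, control_mean, case_mean,
                     mean_difference, log2_fold_change, raw_p, adjusted_p)
```

Any method accepted by p.adjust can be substituted for "BH" in the last step, which mirrors the role of the pcorr argument.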

The function will exclude genes of low sample size from the significance calculation, depending on which statistical test is applied.
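To illustrate why sample size matters for a label shuffling test, the sketch below runs a generic permutation test on one gene. It is not the package's code: the diversity values are made up, and the test statistic (difference of means) and the two-sided empirical p-value formula are assumptions chosen for the example. With four samples per condition there are only choose(8, 4) = 70 distinct ways to assign the labels, which limits how small an empirical p-value can get.

```r
# Illustrative permutation sketch; the package's statistic and p-value formula may differ.
set.seed(42)
values <- c(0.21, 0.35, 0.28, 0.30, 0.62, 0.71, 0.58, 0.66)  # made-up values for one gene
samples <- c(rep("Healthy", 4), rep("Pathogenic", 4))
control <- "Healthy"
randomizations <- 100

# Difference of condition means for a given labelling of the samples.
mean_diff <- function(v, labels) {
  mean(v[labels != control]) - mean(v[labels == control])
}

observed <- mean_diff(values, samples)

# Recompute the statistic after randomly shuffling the condition labels.
shuffled <- replicate(randomizations, mean_diff(values, sample(samples)))

# Two-sided empirical p-value; the +1 terms keep it from being exactly zero.
p_value <- (sum(abs(shuffled) >= abs(observed)) + 1) / (randomizations + 1)

# Number of distinct label assignments for 4 vs 4 samples.
n_assignments <- choose(length(samples), sum(samples == control))
```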

Value

A data.frame with the mean or median values of splicing diversity across sample categories and all samples, the log2 fold change between the two conditions, and the raw and corrected p-values.

Examples

# data.frame with splicing diversity values
x <- data.frame(Genes = letters[seq_len(10)], matrix(runif(80), ncol = 8))

# sample categories
samples <- c(rep('Healthy', 4), rep('Pathogenic', 4))

# To calculate the change in splicing diversity between the 'Healthy' and
# 'Pathogenic' conditions, together with the significance values, using the
# mean and the Wilcoxon rank sum test, use:
calculate_difference(x, samples, control = 'Healthy', method = 'mean', test = 'wilcoxon')

SU-CompBio/SplicingFactory documentation built on March 28, 2022, 4:39 a.m.