calculate_diversity: Main function for calculating splicing diversity


View source: R/calculate_diversity.R

Description

Main function for calculating splicing diversity

Usage

calculate_diversity(
  x,
  genes = NULL,
  method = "laplace",
  norm = TRUE,
  tpm = FALSE,
  assayno = 1,
  verbose = FALSE
)

Arguments

x

A numeric matrix, data.frame, tximport list, DGEList, SummarizedExperiment or ExpressionSet.

genes

Character vector of gene identifiers, with one element per row of the transcript-level expression dataset. The rows of x are grouped into genes based on this vector.

method

Method to use for splicing diversity calculation, including naive entropy (naive), Laplace entropy (laplace), Gini index (gini), Simpson index (simpson) and inverse Simpson index (invsimpson). The default method is Laplace entropy.

norm

If TRUE, the entropy values are normalized to the number of transcripts for each gene. The normalized entropy values are always between 0 and 1. If FALSE, the values are not directly comparable between genes, because the maximum possible entropy depends on the number of transcripts of each gene.

tpm

In the case of a tximport list, either TPM values or raw read counts can serve as input. If TRUE, TPM values are used; if FALSE, read counts are used.

assayno

An integer value. In case of multiple assays in a SummarizedExperiment input, the argument specifies the assay number to use for diversity calculations.

verbose

If TRUE, the function will print additional diagnostic messages, besides the warnings and errors.

Details

The function is intended to process transcript-level expression data from RNA-seq or similar datasets.

Given an N x M matrix or similar data structure, where the N rows are transcripts and the M columns are samples, and a vector of gene IDs used for aggregating the transcript-level data, the function calculates a transcript diversity value for each gene in each sample. These diversity values can be used to investigate the dominance of a specific transcript within a gene, to assess the diversity of a gene's transcripts, and to analyze changes in diversity.
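The grouping step described above can be sketched in base R. This is an illustrative sketch only, not the package's own code: `naive_entropy` and `diversity_by_gene` are hypothetical helper names, and the sketch assumes that naive entropy is the Shannon entropy (base 2) of the within-gene transcript proportions, normalized by log2 of the number of transcripts. It does not implement the default Laplace method.

```r
# Hypothetical helper: Shannon entropy (base 2) of transcript proportions
# for one gene in one sample.
naive_entropy <- function(counts, norm = TRUE) {
  total <- sum(counts)
  # Undefined for unexpressed genes and for genes with a single transcript.
  if (total == 0 || length(counts) < 2) return(NA_real_)
  p <- counts / total
  p <- p[p > 0]                      # treat 0 * log2(0) as 0
  h <- -sum(p * log2(p))
  if (norm) h <- h / log2(length(counts))
  h
}

# Hypothetical helper: returns a genes x samples matrix of diversity values.
diversity_by_gene <- function(x, genes, norm = TRUE) {
  idx <- split(seq_len(nrow(x)), genes)   # row indices of each gene's transcripts
  vals <- vapply(idx, function(rows) {
    apply(x[rows, , drop = FALSE], 2, naive_entropy, norm = norm)
  }, numeric(ncol(x)))
  matrix(vals, nrow = length(idx), ncol = ncol(x), byrow = TRUE,
         dimnames = list(names(idx), colnames(x)))
}

# Toy input: 4 transcripts (rows) in 2 samples (columns), 2 genes.
x <- matrix(c(10, 10, 5,  0,
              10, 30, 5, 20), ncol = 2,
            dimnames = list(NULL, c("Sample1", "Sample2")))
genes <- c("Gene1", "Gene1", "Gene2", "Gene2")

res <- diversity_by_gene(x, genes)
res
```

In this toy input, Gene1 has two equally expressed transcripts in Sample1, so its normalized entropy there is 1, while Gene2 is dominated by a single transcript in Sample1 and gets 0.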

Several diversity measures are implemented in the package: naive entropy, Laplace entropy, the Gini index, the Simpson index and the inverse Simpson index.

The function can calculate the gene-level diversity index from any kind of expression measure, including raw read counts, FPKM, RPKM or TPM values, although the resulting values may differ between expression measures.
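One reason the choice of expression measure can matter is that entropy-type indices depend only on the within-gene transcript proportions. The sketch below illustrates this in base R. It is not the package's code: `shannon` is a hypothetical helper and the transcript lengths are made-up values. Scaling all values by a constant leaves the proportions unchanged, whereas length normalization (as in FPKM or TPM) shifts them.

```r
# Hypothetical helper: Shannon entropy (base 2) of within-gene proportions.
shannon <- function(v) {
  p <- v[v > 0] / sum(v)
  -sum(p * log2(p))
}

counts  <- c(tx1 = 100, tx2 = 100)      # raw read counts for one gene
lengths <- c(tx1 = 1000, tx2 = 4000)    # made-up transcript lengths (bp)

h_counts <- shannon(counts)             # equal counts: maximal entropy for 2 transcripts
h_scaled <- shannon(counts * 3)         # uniform scaling keeps the proportions
h_length <- shannon(counts / lengths)   # length-normalized values change the proportions

c(counts = h_counts, scaled = h_scaled, length_normalized = h_length)
```

Here the raw and uniformly scaled values give the same entropy, while the length-normalized values give a lower one, because the shorter transcript now dominates.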

Value

Gene-level splicing diversity values in a SummarizedExperiment object.

Examples

# matrix with RNA-seq read counts
x <- matrix(rpois(60, 10), ncol = 6)
colnames(x) <- paste0("Sample", 1:6)

# gene names used for grouping the transcript level data
gene <- c(rep("Gene1", 3), rep("Gene2", 2), rep("Gene3", 3), rep("Gene4", 2))

# calculating normalized Laplace entropy
result <- calculate_diversity(x, gene, method = "laplace", norm = TRUE)
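Since the return value is a SummarizedExperiment object, the diversity values can be pulled out with the standard accessors of that class. The following sketch repeats the example above and assumes that the SplicingFactory and SummarizedExperiment packages are installed and that the diversity values are stored in the first assay. The `set.seed()` call is added here only to make the simulated counts reproducible.

```r
library(SplicingFactory)
library(SummarizedExperiment)

set.seed(1)  # reproducible simulated counts (not part of the original example)
x <- matrix(rpois(60, 10), ncol = 6)
colnames(x) <- paste0("Sample", 1:6)
gene <- c(rep("Gene1", 3), rep("Gene2", 2), rep("Gene3", 3), rep("Gene4", 2))

result <- calculate_diversity(x, gene, method = "laplace", norm = TRUE)

# assay() returns the first assay as a matrix (assumed: genes x samples).
div <- assay(result)
dim(div)
head(div)
```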

esebesty/SplicingFactory documentation built on Feb. 27, 2022, 12:08 a.m.