Source: R/calculate_diversity.R
calculate_diversity (R Documentation)
Main function for calculating splicing diversity
calculate_diversity(x, genes = NULL, method = "laplace", norm = TRUE, tpm = FALSE, assayno = 1, verbose = FALSE)
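x | A numeric matrix or similar data structure, such as a SummarizedExperiment object or a tximport list, containing transcript-level expression values, with transcripts in rows and samples in columns. |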
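genes | Character vector with the same length as the number of rows of the input dataset of transcript-level expression values. The values in genes are the gene IDs used to aggregate the transcript-level data: each element names the gene to which the transcript in the corresponding row belongs. |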
method | Method to use for the splicing diversity calculation: naive entropy, Laplace entropy ("laplace", the default), Gini index, Simpson index or inverse Simpson index. |
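norm | Logical. If TRUE (the default), entropy values are normalized by the maximum possible entropy given the number of transcripts of each gene, so that they lie between 0 and 1. If FALSE, unnormalized entropy values are returned. |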
tpm | In the case of a tximport list input, either TPM values or raw read counts can serve as input. If TRUE, TPM values are used; if FALSE (the default), read counts are used. |
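assayno | An integer value. If a SummarizedExperiment input contains multiple assays, this specifies the number of the assay to use for the diversity calculation (default 1). |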
verbose | Logical. If TRUE, the function prints additional diagnostic messages; the default is FALSE. |
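A minimal sketch of what normalizing an entropy value can look like, assuming the common convention of dividing the Shannon entropy by its maximum, log2(n), for a gene with n transcripts. The helper name `normalized_entropy` is hypothetical and this is not necessarily the package's exact formula. It illustrates why unnormalized values are hard to compare across genes: evenly expressed genes with 2 and 4 transcripts have raw entropies of 1 and 2 bits, but both reach 1 after normalization.

```r
# Hypothetical helper: Shannon entropy of transcript frequencies divided by
# its maximum possible value, log2(n), for a gene with n transcripts.
# Assumed normalization, for illustration only.
normalized_entropy <- function(expr) {
  n <- length(expr)
  # one transcript (log2(1) = 0) or no expression: nothing to normalize
  if (n < 2 || sum(expr) == 0) return(NA_real_)
  p <- expr / sum(expr)        # transcript frequencies
  p <- p[p > 0]                # treat 0 * log2(0) as 0
  -sum(p * log2(p)) / log2(n)
}

normalized_entropy(c(5, 5))        # 1 (raw entropy 1 bit)
normalized_entropy(c(5, 5, 5, 5))  # 1 (raw entropy 2 bits)
normalized_entropy(c(10, 0, 0))    # 0 (a single expressed transcript)
```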
The function is intended to process transcript-level expression data from RNA-seq or similar datasets.
Given an N x M matrix or similar data structure, where the N rows are transcripts and the M columns are samples, together with a vector of gene IDs used to aggregate the transcript-level data, the function calculates a transcript diversity value for each gene in each sample. These values can be used to investigate the dominance of a specific transcript of a gene, to assess the diversity of a gene's transcripts, and to analyze changes in diversity.
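The per-gene, per-sample aggregation described above can be sketched in base R. `gene_diversity` is a hypothetical helper, not part of the package: it splits each sample's column by gene ID and applies a per-gene diversity function, returning a plain genes x samples matrix rather than the SummarizedExperiment object the package returns. The inverse Simpson function in the usage lines is only a placeholder measure.

```r
# Hypothetical helper: apply a per-gene diversity function `fun` to every
# gene (groups of rows sharing a gene ID) in every sample (column) of `x`.
gene_diversity <- function(x, genes, fun) {
  stopifnot(is.matrix(x), length(genes) == nrow(x))
  gene_ids <- unique(genes)
  grp <- factor(genes, levels = gene_ids)  # keep genes in order of appearance
  res <- vapply(
    seq_len(ncol(x)),
    # split one sample's transcript values by gene, then summarize each gene
    function(j) vapply(split(x[, j], grp), fun, numeric(1)),
    numeric(length(gene_ids))
  )
  # reshape to genes x samples (also correct when there is only one gene)
  matrix(res, nrow = length(gene_ids),
         dimnames = list(gene_ids, colnames(x)))
}

# placeholder measure: inverse Simpson index of transcript frequencies
inv_simpson <- function(e) { p <- e / sum(e); 1 / sum(p^2) }

x <- cbind(S1 = c(10, 0, 5, 5), S2 = c(4, 4, 9, 1))  # 4 transcripts, 2 samples
gene_diversity(x, c("G1", "G1", "G2", "G2"), inv_simpson)
```

Using `split()` on a factor means transcripts of the same gene do not need to be in adjacent rows.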
The package implements several diversity measures:
Naive entropy: Shannon entropy using the transcript frequencies as probabilities. An entropy of 0 means that a single transcript accounts for all of the gene's expression; higher values mean a more diverse set of transcripts.
Laplace entropy: Shannon entropy where the transcript frequencies are replaced by a Bayesian estimate, using Laplace's prior.
Gini index: a measure of statistical dispersion originally used in economics. It ranges from 0 (complete equality, all transcripts equally expressed) to 1 (complete inequality, a single dominant transcript).
Simpson index: a measure of diversity characterizing the number of different species (here, transcripts of a gene) in a dataset. It was originally defined as the probability that two randomly selected individuals belong to different species. The Simpson index ranges between 0 and 1; the higher the value, the higher the diversity.
Inverse Simpson index: a related measure; as with the Simpson index, higher values mean greater diversity. It ranges from 1 to the total number of transcripts of the gene.
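The measures listed above can be sketched for a single gene in a single sample using their textbook definitions. This is not the package's own code: the function names, the add-one pseudocount used for Laplace entropy, and the n/(n - 1) correction that makes the Gini index reach exactly 1 for a single dominant transcript are assumptions made for illustration.

```r
# Textbook definitions for ONE gene in ONE sample; `expr` holds the
# expression values of the gene's transcripts. Illustrative only.

freq <- function(expr) expr / sum(expr)   # transcript frequencies

naive_entropy <- function(expr) {
  p <- freq(expr)
  p <- p[p > 0]                # treat 0 * log2(0) as 0
  -sum(p * log2(p))
}

laplace_entropy <- function(expr) {
  # assumed add-one (Laplace) estimate of the frequencies
  p <- (expr + 1) / (sum(expr) + length(expr))
  -sum(p * log2(p))
}

gini_index <- function(expr) {
  n <- length(expr)
  if (n < 2) return(NA_real_)  # undefined for a single transcript
  x <- sort(expr)
  g <- sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
  g * n / (n - 1)              # assumed correction: single dominant transcript -> 1
}

# probability that two randomly drawn reads come from different transcripts
simpson_index <- function(expr) 1 - sum(freq(expr)^2)

inverse_simpson <- function(expr) 1 / sum(freq(expr)^2)

even <- c(5, 5, 5, 5)     # four equally expressed transcripts
dominant <- c(10, 0, 0)   # one transcript carries all expression
c(naive_entropy(even), simpson_index(even), inverse_simpson(even))          # 2, 0.75, 4
c(naive_entropy(dominant), gini_index(dominant), inverse_simpson(dominant)) # 0, 1, 1
```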
The function can calculate gene-level diversity from any kind of expression measure, including raw read counts, FPKM, RPKM or TPM values, although the results may differ depending on the measure used.
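One reason results can vary between expression measures can be sketched with a hypothetical helper, `entropy_pc`, that estimates transcript frequencies with an additive pseudocount (0 gives naive entropy; 1 gives the add-one Laplace form assumed here). Uniformly rescaling a sample leaves the naive entropy unchanged but changes the Laplace entropy, because a fixed pseudocount weighs more when values are small. Length-normalized measures such as FPKM or TPM additionally change the proportions between transcripts of different lengths, which affects every measure.

```r
# Hypothetical helper: Shannon entropy (bits) of transcript frequencies
# estimated with an additive pseudocount.
entropy_pc <- function(expr, pseudocount = 0) {
  p <- (expr + pseudocount) / (sum(expr) + pseudocount * length(expr))
  p <- p[p > 0]                # treat 0 * log2(0) as 0
  -sum(p * log2(p))
}

counts <- c(8, 2, 0)
scaled <- counts * 100         # same proportions, 100x larger values

entropy_pc(counts); entropy_pc(scaled)        # identical, about 0.72
entropy_pc(counts, 1); entropy_pc(scaled, 1)  # differ, about 1.14 vs 0.73
```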
The function returns gene-level splicing diversity values in a SummarizedExperiment object.
library(SplicingFactory)

# matrix with RNA-seq read counts: 10 transcripts (rows) x 6 samples (columns)
x <- matrix(rpois(60, 10), ncol = 6)
colnames(x) <- paste0("Sample", 1:6)

# gene names used for grouping the transcript-level data (one per row of x)
gene <- c(rep("Gene1", 3), rep("Gene2", 2), rep("Gene3", 3), rep("Gene4", 2))

# calculating normalized Laplace entropy
result <- calculate_diversity(x, gene, method = "laplace", norm = TRUE)