View source: R/shannon_entropy.R
shannon_entropy | R Documentation |
Takes a rearranged vcf dataframe and calculates the Shannon entropy
shannon_entropy(df, genome_size)
df |
A rearranged vcf dataframe (arrange_data) |
genome_size |
Size of whole genome being used |
Shannon entropy is a commonly used metric to describe the amount of genetic diversity in sequencing data. It is calculated by considering the frequency of the ALT and REF allele at every position and then summing those values over 1) a segment or 2) the entire genome. These values can then be normalized by sequence length (kb) in order to compare across different segments or samples.
A dataframe with Shannon entropy/kb calculations for the chroms and entire genome
# Sample dataframe
df <- data.frame(sample = c("m1", "m2", "m1", "m2", "m1"),
CHROM = c("PB1", "PB1", "PB2", "PB2", "NP"),
minorfreq = c(0.010, 0.022, 0.043, 0.055, 0.011),
majorfreq = c(0.990, 0.978, 0.957, 0.945, 0.989),
SegmentSize = c(2280, 2280, 2274, 2274, 1809)
)
df
genome_size = 13133
# MOdify the dataframe to add 5 new columns of shannon entropy data:
# 1. shannon_ntpos
# 2. chrom_shannon
# 3. genome_shannon
# 4. shannon_chrom_perkb
# 5. genome_shannon_perkb
shannon_entropy(df, genome_size)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.