View source: R/getEntropySignature.R
| getEntropySignature | R Documentation |
Calculates genome-wide Shannon entropies from SNV data.
getEntropySignature(
  polymorphisms,
  position = "position",
  linkage = "linkage",
  ref = "ref",
  alt = "alt",
  protein = "protein",
  aa_position = "aa_position",
  ref_aa = "ref_aa",
  alt_aa = "alt_aa",
  alt_aa_freq = "alt_aa_freq",
  categories = "robust",
  genome = mn908947.3
)
| polymorphisms | A data frame. Please see Details and Examples. |
| position | Name of the column giving the genomic positions of the SNVs. |
| linkage | Name of the column with information on linked positions (see Details). |
| ref | Name of the column with reference bases. |
| alt | Name of the column with the alternative bases observed in the metagenome. |
| protein | Name of the column carrying protein names. |
| aa_position | Name of the column that indicates the protein positions of the mutated amino acids. |
| ref_aa | Name of the column that carries the reference amino acids. |
| alt_aa | Name of the column carrying alternative amino acids observed in the metagenome. |
| alt_aa_freq | Name of the column giving the frequencies of alternative amino acids in the metagenome. |
| categories | Whether each amino acid is treated as its own class ("sensitive") or amino acids are grouped into aliphatic, aromatic, polar, positively charged, negatively charged, and special ("robust") (Mirny and Shakhnovich, 1999). |
| genome | A list providing CDS data and length of the reference genome. |
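The "robust" grouping described for categories can be pictured as a lookup table from amino acids to classes. The sketch below uses the six classes of Mirny and Shakhnovich (1999); whether the package assigns residues in exactly this way is an assumption, and aa_class and changes_class are hypothetical names used only for illustration.

```r
# Illustrative lookup table (an assumption, not taken from the package source):
# the six classes of Mirny and Shakhnovich (1999), by one-letter code.
groups <- list(
  aliphatic = c("A", "V", "L", "I", "M", "C"),
  aromatic  = c("F", "W", "Y", "H"),
  polar     = c("S", "T", "N", "Q"),
  positive  = c("K", "R"),
  negative  = c("D", "E"),
  special   = c("G", "P")
)

# Named character vector: names are amino acids, values are class labels.
aa_class <- setNames(
  rep(names(groups), lengths(groups)),
  unlist(groups, use.names = FALSE)
)

# Hypothetical helper: does a substitution move the residue to another class?
changes_class <- function(ref_aa, alt_aa) {
  unname(aa_class[ref_aa] != aa_class[alt_aa])
}

changes_class("L", "I")  # both aliphatic
changes_class("S", "L")  # polar to aliphatic
```

Under such a grouping, a substitution within a class adds no new class at that position, which is why "robust" is less sensitive to conservative changes than "sensitive".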
You provide a data frame with SNV information, including reference
and alternative amino acids, their frequencies, and corresponding positions
relative to a reference sequence.
This type of data can be generated by numerous programs and pipelines.
The objective is to assess the biological impact of nonsynonymous
variation within a viral population, such as an environmental sample (e.g.
wastewater) or a single infection (i.e. a quasispecies).
Entropy is calculated within the metagenome and is therefore independent
of the reference sequence.
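As a rough illustration of the quantity involved, the sketch below computes a Shannon entropy for a single amino acid position from the frequencies of its alternative amino acids. It is not the package's implementation: the helper name site_entropy, the use of base-2 logarithms, and the assumption that the remaining frequency belongs to the reference amino acid are all choices made for this example.

```r
# Hypothetical helper, not part of the package: Shannon entropy (in bits)
# of the amino acid distribution at one protein position.
site_entropy <- function(alt_freq) {
  # Assumption: whatever frequency is not assigned to an alternative
  # amino acid belongs to the reference amino acid.
  p <- c(1 - sum(alt_freq), alt_freq)
  p <- p[p > 0]  # terms with zero frequency contribute nothing
  -sum(p * log2(p))
}

site_entropy(0.5)            # one alternative at 50%
site_entropy(c(0.25, 0.25))  # two alternatives at 25% each
site_entropy(0)              # no variation at this position
```

Only the shape of the frequency distribution enters the calculation, so swapping the reference and alternative labels leaves the result unchanged.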
Some mutations may be part of the same codon.
Indicate this in the linkage column by giving the next downstream
linked position or, if no downstream position belongs to the same codon,
the closest upstream one.
For example, in the wWater dataset, mutations T22673C and C22674T are linked
to each other and affect codon 371 of the S gene:
| | wave | position | linkage | ref | alt | protein | ... |
| ... | | | | | | | |
| 105 | third | 22599 | NA | G | A | S | ... |
| 106 | third | 22673 | 22674 | T | C | S | ... |
| 107 | third | 22674 | 22673 | C | T | S | ... |
| 108 | third | 22679 | NA | T | C | S | ... |
| ... | | | | | | | |
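The rows in the excerpt can be written as a small data frame to show how the linkage column is filled in. Only the columns visible in the excerpt are included; the amino acid columns that getEntropySignature() also expects are omitted, so this fragment illustrates the layout and is not a complete input. The consistency check at the end is a hypothetical convenience, not a package function.

```r
# The four rows shown in the excerpt, restricted to the visible columns.
snvs <- data.frame(
  wave     = "third",
  position = c(22599, 22673, 22674, 22679),
  linkage  = c(NA, 22674, 22673, NA),  # NA: no other SNV in the same codon
  ref      = c("G", "T", "C", "T"),
  alt      = c("A", "C", "T", "C"),
  protein  = "S",
  stringsAsFactors = FALSE
)

# Hypothetical sanity check: every linked position should itself
# appear as an SNV position in the table.
linked <- snvs[!is.na(snvs$linkage), ]
all(linked$linkage %in% snvs$position)
```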
The genome parameter is a list that provides data on the topology of
protein-coding regions in the genome and its length, used internally
primarily for graphical and summary purposes.
The package provides an example (mn908947.3) of how this
information is to be organized.
An object of class entropyProfile. It contains a tidy,
summarized version of the SNV table, a data frame with
information on genome-wide entropy, a data frame with
information on each CDS and corresponding mutations observed in the
virome, and a list with CDS data and length of the reference
genome used in variant calling.
Mirny and Shakhnovich, 1999. J Mol Biol 291:177-196. doi:10.1006/jmbi.1999.2911.
Shannon, 1948. Bell System Technical Journal 27:379-423. doi:10.1002/j.1538-7305.1948.tb01338.x.
# Entropy across the genome in ancestral lineages
ancestral <- getEntropySignature(wWater[wWater$wave == "first", ], categories = "sensitive")
# Inspect profile
plot(ancestral, chartType = "entroScan")