getEntropySignature: Infer Entropy Signature

View source: R/getEntropySignature.R

getEntropySignature R Documentation

Description

Calculates genome-wide Shannon entropies from SNV data.

Usage

getEntropySignature(
  polymorphisms,
  position = "position",
  linkage = "linkage",
  ref = "ref",
  alt = "alt",
  protein = "protein",
  aa_position = "aa_position",
  ref_aa = "ref_aa",
  alt_aa = "alt_aa",
  alt_aa_freq = "alt_aa_freq",
  categories = "robust",
  genome = mn908947.3
)

Arguments

polymorphisms

A data frame. Please see Details and Examples.

position

Name of the column in polymorphisms that indicates SNV locations in the genome.

linkage

Name of the column with information on linked positions (see Details).

ref

Column name with reference bases.

alt

Column name with the alternative bases observed in the metagenome.

protein

Name of the column carrying protein names.

aa_position

Name of the column that indicates the protein positions of the mutated amino acids.

ref_aa

Name of the column that carries the reference amino acids.

alt_aa

Name of the column carrying alternative amino acids observed in the metagenome.

alt_aa_freq

Name of the column giving the frequencies of alternative amino acids in the metagenome.

categories

Either "sensitive" (one class per amino acid) or "robust" (amino acids grouped into aliphatic, aromatic, polar, positively charged, negatively charged, and special classes; Mirny and Shakhnovich, 1999).

genome

A list providing CDS data and length of the reference genome.

Details

You provide a data frame of SNV information that includes reference and alternative amino acids, their frequencies, and the corresponding positions relative to a reference sequence. This type of data can be generated by numerous programs and pipelines. The objective is to assess the biological impact of nonsynonymous variation within a viral population, such as an environmental sample (e.g. wastewater) or a single infection (also known as a quasispecies). Entropy is calculated within the metagenome and is therefore independent of the reference sequence.

Some mutations may be part of the same codon. This is indicated in the linkage column, which gives a downstream linked position, or the closest upstream position if no downstream position is part of the same codon. For example, in the wWater dataset, mutations T22673C and C22674T are linked to each other and affect codon 371 of the S gene:

     wave position linkage ref alt protein ...
...
105 third    22599      NA   G   A       S ...
106 third    22673   22674   T   C       S ...
107 third    22674   22673   C   T       S ...
108 third    22679      NA   T   C       S ...
...
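
The linkage convention can be checked with a few lines of base R. The sketch below is not part of the package: it rebuilds only the columns shown in the excerpt and assumes that the S coding sequence starts at nucleotide 21563 of MN908947.3 (a value not stated on this page). The names snvs, s_start, codon, codon_pos, and same_codon are illustrative.

```r
# Rows copied from the wWater excerpt (only the columns shown there)
snvs <- data.frame(
  position = c(22599, 22673, 22674, 22679),
  linkage  = c(NA, 22674, 22673, NA),
  ref      = c("G", "T", "C", "T"),
  alt      = c("A", "C", "T", "C"),
  protein  = "S",
  stringsAsFactors = FALSE
)

# Assumption: first nucleotide of the S CDS in MN908947.3
s_start <- 21563

# 1-based codon index and position within the codon (1, 2 or 3)
snvs$codon     <- (snvs$position - s_start) %/% 3 + 1
snvs$codon_pos <- (snvs$position - s_start) %% 3 + 1

# A linked row should point to another row that falls in the same codon;
# rows without linkage (NA) yield NA here
partner <- match(snvs$linkage, snvs$position)
snvs$same_codon <- snvs$codon == snvs$codon[partner]

snvs
```

Under that assumption, positions 22673 and 22674 both map to codon 371, which agrees with the description of the linked pair.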

The genome parameter is a list that provides data on the topology of protein-coding regions in the genome and its length, used internally primarily for graphical and summary purposes. The package provides an example (mn908947.3) of how this information is to be organized.
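
To illustrate how the choice of categories can change an entropy value, here is a minimal sketch of Shannon entropy at a single site. It is not the package's implementation: the log base (bits), the six-class grouping as commonly cited from Mirny and Shakhnovich (1999), and the frequencies in site are assumptions made for illustration, and the names shannon, classes, and site are hypothetical.

```r
# Shannon entropy (in bits) of a vector of frequencies;
# zero frequencies are dropped so they contribute nothing
shannon <- function(p) {
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Assumed six residue classes (after Mirny and Shakhnovich, 1999)
classes <- c(
  A = "aliphatic", V = "aliphatic", L = "aliphatic", I = "aliphatic",
  M = "aliphatic", C = "aliphatic",
  F = "aromatic", W = "aromatic", Y = "aromatic", H = "aromatic",
  S = "polar", T = "polar", N = "polar", Q = "polar",
  K = "positive", R = "positive",
  D = "negative", E = "negative",
  G = "special", P = "special"
)

# Hypothetical amino acid frequencies at one site of the metagenome
site <- c(S = 0.5, T = 0.25, L = 0.25)

# "sensitive": one class per amino acid
h_sensitive <- shannon(site)

# "robust": sum the frequencies within each class before computing entropy
h_robust <- shannon(tapply(site, classes[names(site)], sum))

c(sensitive = h_sensitive, robust = h_robust)
```

In this toy case S and T fall into the same polar class, so the grouped entropy is lower than the per-residue entropy: substitutions within a class do not add to the "robust" value.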

Value

An object of class entropyProfile. It contains a tidy, summarized version of the SNV table, a data frame with information on genome-wide entropy, a data frame with information on each CDS and corresponding mutations observed in the virome, and a list with CDS data and length of the reference genome used in variant calling.

References

Mirny and Shakhnovich, 1999. J Mol Biol 291:177-196. doi:10.1006/jmbi.1999.2911.

Shannon, 1948. Bell System Technical Journal 27:379-423. doi:10.1002/j.1538-7305.1948.tb01338.x.

Examples


library(MetaEntropy)

# Entropy across the genome in ancestral lineages (first wave of wWater)
ancestral <- getEntropySignature(
  wWater[wWater$wave == "first", ],
  categories = "sensitive"
)

# Inspect profile
plot(ancestral, chartType = "entroScan")



MetaEntropy documentation built on March 3, 2026, 5:08 p.m.