Scoring: Scoring Functions.

Scoring R Documentation

Scoring Functions.

Description

score_degen

Determines the degeneration score of a sequence.

score_conservation

Determines the sequence conservation scores of a set of templates using Shannon entropy.

score_primers

Computes scores for a set of primers based on the deviations of the primers from the constraints.

Usage

score_conservation(template.df, gap.char = "-", win.len = 30, by.group = TRUE)

score_degen(seq, gap.char = "-")

score_primers(
  primer.df,
  settings,
  active.constraints = names(constraints(settings)),
  alpha = 0.5
)

Arguments

template.df

A Templates object providing the set of templates.

gap.char

The gap character in the sequences. The default is "-".

win.len

The size of a window for evaluating conservation. The default window size is set to 30.

by.group

Whether the determination of binding regions should be stratified according to the groups defined in template.df. The default is TRUE.

seq

A list of vectors containing individual characters of a nucleotide sequence.

primer.df

A Primers object containing the primers.

settings

A DesignSettings object containing the analysis settings.

active.constraints

A character vector of constraint identifiers that are considered for scoring the primers.

alpha

A numeric that determines the trade-off between the impact of the maximal observed deviation and the total deviation. By default, alpha is 0.5, so that the maximal deviation and the total deviation have equal weight when the penalties are computed.

Details

score_degen computes the degeneration of an ambiguous sequence by considering the number of unambiguous sequences that are represented by the ambiguous sequence. Let a sequence S of length n be represented by a collection of sets such that

S = {s_1, s_2, \ldots, s_n}

where s_i indicates the set of unambiguous bases found at position i of the primer. Then the degeneracy D of a primer can be defined as

D = \prod_i{|s_i|}

where |s_i| provides the number of disambiguated bases at position i.
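The product above can be illustrated with a minimal sketch. This is not the implementation used by score_degen: the function name degeneracy_sketch and the lookup table iupac_size are hypothetical, the table follows the standard IUPAC nucleotide codes, and ignoring gap characters is an assumption.

```r
# Number of unambiguous bases represented by each IUPAC nucleotide code.
iupac_size <- c(
  a = 1, c = 1, g = 1, t = 1, u = 1,
  r = 2, y = 2, s = 2, w = 2, k = 2, m = 2,
  b = 3, d = 3, h = 3, v = 3,
  n = 4
)

# Hypothetical helper: degeneracy D = prod_i |s_i| for each sequence in a list
# of character vectors (the same input shape as the seq argument).
degeneracy_sketch <- function(seq, gap.char = "-") {
  vapply(seq, function(chars) {
    chars <- tolower(chars)
    chars <- chars[chars != gap.char]  # assumption: gaps do not contribute
    sizes <- iupac_size[chars]
    if (anyNA(sizes)) stop("unknown nucleotide code in sequence")
    prod(sizes)
  }, numeric(1))
}

seqs <- strsplit(c("ctggaattacggtacc", "taggaaccggrtaagc", "rtaaasrygtar"), split = "")
degeneracy_sketch(seqs)  # 1 2 32
```

The third sequence contains five two-fold ambiguous positions (r, s, r, y, r), so its degeneracy is 2^5 = 32.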

score_primers determines the penalty of a primer in the following way. Let d be a vector indicating the absolute deviations from individual constraints and let p be the scalar penalty that is assigned to a primer. We define

p = \alpha \cdot \max_i d_i + \sum_i (1 - \alpha) \cdot d_i

such that for large values of alpha the maximal deviation dominates, giving rise to a local penalty (reflecting the largest absolute deviation), and for small values of alpha the total deviation dominates, giving rise to a global penalty (reflecting the sum of constraint deviations). When alpha is 1, only the most extreme absolute deviation is considered; when alpha is 0, the sum of all absolute deviations is computed.
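The penalty formula can be checked numerically with a small sketch. The function penalty_sketch and the deviation vector d are hypothetical; how score_primers derives the deviations from the constraint settings is not described here and is not modeled.

```r
# Hypothetical helper: p = alpha * max(d) + (1 - alpha) * sum(d)
penalty_sketch <- function(d, alpha = 0.5) {
  stopifnot(is.numeric(d), length(d) > 0, alpha >= 0, alpha <= 1)
  d <- abs(d)  # the formula is stated for absolute deviations
  alpha * max(d) + (1 - alpha) * sum(d)
}

d <- c(1, 2, 3)               # made-up deviations from three constraints
penalty_sketch(d, alpha = 1)  # 3: only the largest deviation counts
penalty_sketch(d, alpha = 0)  # 6: the sum of all deviations
penalty_sketch(d)             # 4.5: 0.5 * 3 + 0.5 * 6
```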

Value

score_conservation returns a list containing Entropies and Alignments. Entropies is a data frame with conservation scores. Each column indicates a position in the alignment of template sequences and each row gives the entropies of the sequences belonging to a specific group of template sequences. Alignments is a list of DNAbin objects, where each object gives the alignment corresponding to one group of template sequences.
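To give an intuition for entropy-based conservation scores, the following sketch computes the Shannon entropy of each alignment column. It is an illustration only and makes several assumptions: the function column_entropy_sketch is hypothetical, the alignment is a plain character matrix rather than the output of MAFFT, entropies use base-2 logarithms, and neither the win.len windowing nor any normalization applied by score_conservation is modeled.

```r
# Hypothetical helper: Shannon entropy (in bits) of every alignment column.
# aln is a character matrix with one row per sequence and one column per position.
column_entropy_sketch <- function(aln) {
  apply(aln, 2, function(col) {
    p <- table(col) / length(col)  # relative frequency of each symbol
    -sum(p * log2(p))
  })
}

# Toy alignment of four sequences of length four.
aln <- do.call(rbind, strsplit(c("acga", "acgc", "aagg", "aact"), split = ""))
column_entropy_sketch(aln)
```

A fully conserved column (the first one here) has an entropy of 0, while a column in which all four bases occur equally often (the last one) reaches the maximum of 2 bits.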

score_degen returns the number of unambiguous sequences that are represented by seq.

score_primers returns a data frame containing scores for individual primers.

Note

score_conservation requires the MAFFT software for multiple alignments (http://mafft.cbrc.jp/alignment/software/).

Examples

## Not run: 
data(Ippolito)
entropy.data <- score_conservation(template.df, gap.char = "-", win.len = 18, by.group = TRUE)

## End(Not run)
# Compute degeneration for sequences with differing numbers of ambiguous bases
seq <- strsplit(c("ctggaattacggtacc", "taggaaccggrtaagc", "rtaaasrygtar"), split = "")
degen <- score_degen(seq)

# Score the primers
data(Ippolito)
primer.scores <- score_primers(primer.df, settings)

matdoering/openPrimeR documentation built on Feb. 11, 2024, 9:22 p.m.