PrimerDesign: Primer Design Functionalities.

PrimerDesign R Documentation

Primer Design Functionalities.

Description

design_primers

Designs a primer set maximizing the number of covered templates using the smallest possible number of primers. The algorithm tries to ensure that the designed set of primers achieves a coverage ratio not lower than required.cvg. To this end, the constraints for designing primers may be relaxed.

get_initial_primers

Creates a set of primer candidates based on the input template sequences. This set of primers can be used to create custom primer design algorithms.

Usage

classify_design_problem(
  template.df,
  mode.directionality = c("both", "fw", "rev"),
  primer.length = 18,
  primer.estimate = FALSE,
  required.cvg = 1
)

get_initial_primers(
  sample,
  template.df,
  primer.lengths,
  mode.directionality = c("fw", "rev"),
  allowed.region.definition = c("within", "any"),
  init.algo = c("naive", "tree"),
  max.degen = 16,
  conservation = 1,
  updateProgress = NULL
)

design_primers(
  template.df,
  mode.directionality = c("both", "fw", "rev"),
  settings,
  init.algo = c("naive", "tree"),
  opti.algo = c("Greedy", "ILP"),
  required.cvg = 1,
  timeout = Inf,
  max.degen = 16,
  conservation = 1,
  sample.name = NULL,
  cur.results.loc = NULL,
  primer.df = NULL,
  updateProgress = NULL
)

Arguments

template.df

A Templates object containing the template sequences with annotated primer target binding regions.

mode.directionality

The template strand for which primers shall be designed. Primers can be designed either for forward strands ("fw"), for reverse strands ("rev"), or for both strands ("both"). The default setting is "both".

primer.length

A scalar numeric providing the target length of the designed primers. The default length of generated primers is set to 18.

primer.estimate

Whether the number of required primers shall be estimated. By default (FALSE), the number of required primers is not estimated.

required.cvg

The desired ratio of covered template sequences. If the target coverage ratio cannot be reached, the constraint settings are relaxed according to the constraint limits in order to reach the target coverage. The default required.cvg is set to 1, indicating that 100% of the templates are to be covered.

sample

Character vector providing an identifier for the templates.

primer.lengths

Numeric vector of length 2 providing the minimal and maximal allowed lengths for generated primers.

allowed.region.definition

A character vector defining the region where primers are to be constructed. If allowed.region.definition is "within", constructed primers lie within the allowed binding region. If allowed.region.definition is "any", primers only need to overlap with the allowed binding region. The default is "within".

init.algo

The algorithm to be used for initializing primers. If init.algo is "naive", then primers are constructed from substrings of the input template sequences. If init.algo is "tree", phylogenetic trees are used to form degenerate primers whose degeneracy is bounded by max.degen. This option requires an installation of MAFFT (see notes). The default init.algo is "naive".

max.degen

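The maximal degeneracy of primer candidates, that is, the maximal number of unambiguous sequences a single degenerate primer may represent. This setting is particularly relevant when init.algo is set to "tree". The default setting is 16, which permits, for example, at most 2 fully degenerate positions (N, four possible bases each, 4^2 = 16) or at most 4 two-fold degenerate positions (2^4 = 16) per primer.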

conservation

Restrict the percentile of considered regions according to their conservation. Only applicable for the tree-based primer initialization. At the default of 1, all available binding regions are considered.

updateProgress

Shiny progress callback function. The default is NULL such that no progress is logged.

settings

A DesignSettings object specifying the constraint settings for designing primers.

opti.algo

The algorithm to be used for solving the primer set covering problem. If opti.algo is "Greedy" a greedy algorithm is used to solve the set cover problem. If opti.algo is "ILP" an integer linear programming formulation is used. The default opti.algo is "Greedy".

timeout

Timeout in seconds. Only applicable when opti.algo is "ILP". The default is Inf, which does not limit the runtime.

sample.name

An identifier for the primer design task. The default setting is NULL, which means that the run identifier provided in template.df is used.

cur.results.loc

Directory for storing the results of the primer design procedure. The default setting is NULL such that no output is stored.

primer.df

An optional Primers object. If an evaluated primer.df is provided, the primer design procedure only optimizes primer.df and does not perform the initialization and filtering steps. The default is NULL such that primers are initialized and filtered from scratch.

Details

classify_design_problem determines the difficulty of a primer design task. It first estimates the distribution of coverage ratios per primer by exact string matching with primers of length primer.length, which are constructed by extracting template subsequences. Next, a beta distribution is fitted to the estimated coverage distribution and compared to reference distributions representing primer design problems of different difficulties via the total variation distance. The difficulty of the input primer design problem is the class of the reference distribution that has the smallest distance to the fitted distribution. An estimate of the number of primers required to reach a given required.cvg can be computed by setting primer.estimate to TRUE. Since this estimate is based solely on perfectly matching primers, the number of primers actually required is typically lower.
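The classification idea described above can be sketched in base R. This is a minimal illustration and not the code used by classify_design_problem: the method-of-moments fit, the grid-based distance computation, the reference parameters, and the coverage values are all assumptions chosen for the example.

```r
# Method-of-moments fit of a beta distribution to ratios in (0, 1).
# (Assumption: the package may use a different fitting procedure.)
fit_beta_moments <- function(x) {
  m <- mean(x)
  v <- var(x)
  stopifnot(v > 0, v < m * (1 - m))
  common <- m * (1 - m) / v - 1
  c(shape1 = m * common, shape2 = (1 - m) * common)
}

# Distance between two beta densities: half the integral of the
# absolute density difference on [0, 1], approximated on a grid.
tv_distance_beta <- function(p, q, n.grid = 10000) {
  # Midpoints avoid evaluating the densities at 0 and 1,
  # where they may be infinite.
  x <- (seq_len(n.grid) - 0.5) / n.grid
  d <- abs(dbeta(x, p[["shape1"]], p[["shape2"]]) -
           dbeta(x, q[["shape1"]], q[["shape2"]]))
  0.5 * sum(d) / n.grid
}

# Hypothetical reference classes (illustrative parameters only).
references <- list(
  easy   = c(shape1 = 5, shape2 = 2),
  medium = c(shape1 = 2, shape2 = 5),
  hard   = c(shape1 = 1, shape2 = 20)
)

# Made-up per-primer coverage ratios.
coverage <- c(0.02, 0.05, 0.01, 0.10, 0.03, 0.08, 0.04, 0.02)

fitted.beta <- fit_beta_moments(coverage)
distances <- vapply(references, tv_distance_beta, numeric(1), q = fitted.beta)

# The class whose reference distribution is closest to the fit.
classification <- names(which.min(distances))
```

Low coverage ratios per primer, as in the made-up data, place most of the fitted probability mass near zero, so the closest reference is the one describing problems where each primer covers only few templates.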

The primer design algorithm used by design_primers consists of three steps: primer initialization, filtering, and optimization. The method for initializing a set of candidate primers is determined via init.algo. If init.algo is set to "naive", primers are created by extracting substrings from all input template sequences. If init.algo is set to "tree", degenerate primers are created by merging similar subsequences into their consensus sequence, up to a degeneracy of at most max.degen. The tree-based initialization is recommended for related sequences.
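The two initialization ideas can be illustrated with a small base R sketch. The function names naive_candidates and consensus are hypothetical and not part of openPrimeR. The sketch also assumes that degeneracy is the product of the number of bases allowed at each position, and it merges the given sequences directly instead of clustering them with a phylogenetic tree.

```r
# Naive initialization: all substrings of a given length from each template.
naive_candidates <- function(templates, primer.length) {
  candidates <- unlist(lapply(templates, function(s) {
    n <- nchar(s) - primer.length + 1
    if (n < 1) return(character(0))
    starts <- seq_len(n)
    substring(s, starts, starts + primer.length - 1)
  }))
  unique(candidates)
}

# IUPAC ambiguity codes, keyed by the sorted set of bases they represent.
iupac <- c(A = "A", C = "C", G = "G", T = "T",
           AC = "M", AG = "R", AT = "W", CG = "S", CT = "Y", GT = "K",
           ACG = "V", ACT = "H", AGT = "D", CGT = "B", ACGT = "N")

# Consensus of equal-length sequences together with its degeneracy
# (number of unambiguous sequences the consensus represents).
consensus <- function(seqs) {
  chars <- do.call(rbind, strsplit(seqs, ""))
  sets <- apply(chars, 2, function(col) paste(sort(unique(col)), collapse = ""))
  list(sequence = paste(iupac[sets], collapse = ""),
       degeneracy = prod(nchar(sets)))
}

templates <- c("ACGTACGT", "ACGTTCGA")
candidates <- naive_candidates(templates, primer.length = 6)

# Merge two similar candidates into one degenerate primer.
merged <- consensus(c("ACGTAC", "ACGTTC"))

# A merge is acceptable only if it respects the degeneracy bound.
max.degen <- 16
accepted <- merged$degeneracy <= max.degen
```

In a tree-based scheme, the sequences passed to a function like consensus would be groups of similar subsequences, and a merge whose degeneracy exceeds max.degen would be rejected.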

The candidate primer set is filtered according to the constraints specified in the settings object. In some cases, it is necessary to relax the constraints in order to reach the desired required.cvg. In these cases, primers that fail the input constraints may be selected. If you would like to skip the initialization and filtering stages, you can provide an evaluated Primers object via primer.df.
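The interplay of filtering and constraint relaxation can be sketched as follows. This is a simplified illustration with a single constraint and is not the package's procedure: the function filter_with_relaxation, the GC ratio column gc, the step size, and the example data are all hypothetical.

```r
# Fraction of templates covered by at least one primer.
coverage_ratio <- function(covered, n.templates) {
  length(unique(unlist(covered))) / n.templates
}

# Filter on one constraint (here: GC ratio). While the target coverage is
# not reached, widen the allowed range step by step, but never beyond `limit`.
filter_with_relaxation <- function(primers, n.templates, setting, limit,
                                   required.cvg = 1, step = 0.05) {
  stopifnot(step > 0)
  repeat {
    keep <- primers$gc >= setting[["min"]] & primers$gc <= setting[["max"]]
    cvg <- coverage_ratio(primers$covered[keep], n.templates)
    at.limit <- setting[["min"]] <= limit[["min"]] &&
                setting[["max"]] >= limit[["max"]]
    if (cvg >= required.cvg || at.limit) break
    setting[["min"]] <- max(limit[["min"]], setting[["min"]] - step)
    setting[["max"]] <- min(limit[["max"]], setting[["max"]] + step)
  }
  list(primers = primers[keep, ], used.setting = setting, coverage = cvg)
}

# Made-up candidates: GC ratio and indices of the templates each one covers.
primers <- data.frame(gc = c(0.50, 0.45, 0.38, 0.70))
primers$covered <- list(c(1, 2), 2, 3, 4)

res <- filter_with_relaxation(
  primers, n.templates = 4,
  setting = c(min = 0.40, max = 0.60),  # desired constraint range
  limit = c(min = 0.30, max = 0.65),    # relaxation boundary
  required.cvg = 0.75
)
```

In this example the third candidate fails the input setting but is retained after one relaxation step, which mirrors the statement that primers failing the input constraints may be selected. With required.cvg = 1 the loop stops at the limit without reaching the target, because the fourth candidate lies outside the relaxation boundary.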

Optimizing a primer set entails finding the smallest subset of primers maximizing the coverage, which is done by solving the set cover problem. If melting temperature differences are a constraint, the optimization procedure automatically samples ranges of melting temperatures to find optimal sets for all possible temperatures. You can select the optimization algorithm via opti.algo, where you can set "Greedy" for a greedy algorithm or "ILP" for an integer linear program (ILP) formulation. While the worst-case runtime of the greedy algorithm is shorter than the worst-case runtime of the ILP, the greedy solution may yield larger primer sets than the ILP solution.
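A textbook greedy heuristic for the set cover problem can be written in a few lines of base R. This is a generic sketch, not the implementation behind opti.algo = "Greedy"; the function greedy_set_cover and the example coverage sets are hypothetical.

```r
# Greedy set cover: repeatedly pick the primer covering the most
# still-uncovered templates until the required coverage is reached.
greedy_set_cover <- function(covered, n.templates, required.cvg = 1) {
  selected <- integer(0)
  uncovered <- seq_len(n.templates)
  cvg <- 0
  while (cvg < required.cvg) {
    gain <- vapply(covered, function(s) length(intersect(s, uncovered)),
                   integer(1))
    best <- which.max(gain)
    # Stop if there are no primers or none adds coverage.
    if (length(best) == 0 || gain[best] == 0) break
    selected <- c(selected, best)
    uncovered <- setdiff(uncovered, covered[[best]])
    cvg <- (n.templates - length(uncovered)) / n.templates
  }
  list(selected = selected, coverage = cvg)
}

# Made-up example: templates 1..6 and the templates covered by each primer.
covered <- list(A = c(1, 2, 3, 4), B = c(1, 2, 5), C = c(3, 4, 6))
result <- greedy_set_cover(covered, n.templates = 6)
chosen <- names(covered)[result$selected]
```

In this example the greedy heuristic first takes primer A because it covers the most templates, and then needs both B and C for the remaining two templates. Primers B and C alone already cover all six templates, so an exact method can return a set of size two where the greedy heuristic returns three.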

Value

classify_design_problem returns a list with the following fields:

Classification

The estimated difficulty of the primer design task.

Class-Distances

The total variation distances of the fitted beta distribution to the reference distributions.

Confidence

The confidence in the estimate of the design task's difficulty, based on the class distances.

Uncertain

Whether the classification is highly uncertain, that is, of low confidence.

Nbr_primers_fw and Nbr_primers_rev

The respective number of required forward and reverse primers if primer.estimate was set to TRUE.

get_initial_primers returns a data frame with candidate primers for optimization.

design_primers returns a list with the following fields:

opti:

A Primers object providing the designed primer set.

used_constraints:

A list with DesignSettings objects for each primer direction providing the (possibly relaxed) constraints used for designing the optimal primers.

all_results:

A list containing objects of class Primers. Each list entry corresponds to an optimal primer set for a given melting temperature.

all_used_constraints:

A list containing a DesignSettings object for each optimized set in all_results.

filtered:

A list with data providing information on the results of the filtering procedure.

Note

Some constraints can only be computed if additional software is installed. Please see the documentation of DesignSettings for more information. The usage of init.algo = "tree" requires an installation of the multiple alignment program MAFFT (http://mafft.cbrc.jp/alignment/software/).
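One way to guard against a missing MAFFT installation is to look for the executable before choosing the initialization algorithm. The sketch below assumes that the executable is named "mafft" and is discoverable on the PATH; how openPrimeR itself locates MAFFT is not covered here.

```r
# TRUE if an executable called "mafft" is found on the PATH.
# (Assumption: the executable name and lookup via PATH.)
mafft.available <- unname(nzchar(Sys.which("mafft")))

# Fall back to the naive initialization when MAFFT is not available.
init.algo <- if (mafft.available) "tree" else "naive"
```

The resulting value can then be passed as the init.algo argument of get_initial_primers or design_primers.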

Examples


data(Ippolito)
# Naive primer initialization
init.primers <- get_initial_primers("InitialPrimers", template.df, 
                         c(18,18), "fw", init.algo = "naive")
# Tree-based primer initialization (requires MAFFT)
## Not run: 
init.primers <- get_initial_primers("InitialPrimers", template.df, 
                         c(18,18), "fw", init.algo = "tree")

## End(Not run)

# Define PCR settings and primer criteria
data(Ippolito)
# design only with minimal set of constraints
constraints(settings)$primer_length <- c("min" = 18, "max" = 18)
constraints(settings) <- constraints(settings)[c("primer_length", "primer_coverage")]
# Design forward and reverse primers using a greedy algorithm
optimal.primers.greedy <- design_primers(template.df[1:2,], "both", settings, init.algo = "naive")
# Usage of the tree-based initialization strategy (requires MAFFT)
## Not run: 
out.dir <- tempdir()
optimal.primers.tree <- design_primers(template.df[1:2,], "both", settings,
                         init.algo = "tree", opti.algo = "ILP",
                         max.degen = 16,
                         cur.results.loc = out.dir)

## End(Not run)

matdoering/openPrimeR documentation built on Feb. 11, 2024, 9:22 p.m.