CPSdesign.dual: Gene design with two codon pair bias references


Description

Shuffle codons to generate sequences with different average codon pair scores relative to two poorly correlated reference codon pair biases.

Usage

CPSdesign.dual(sequence, reference, referenceTwo, score, scoreTwo, start = 1,
  end = NULL, cycles = NULL, buffer = 0.5, bind = c(1, 1, 1, 1),
  transTable = standardTranslation, restrictSeqs = NULL,
  complementary = FALSE, windowSize = NULL, save = FALSE, name = NULL,
  location = NULL, draw = TRUE, silent = FALSE)

Arguments

sequence

Sequences can be input directly as a character string or as the file path to a fasta file. All sequences must be in the correct reading frame. Stop codons and codons not defined in the translation table are not allowed and will generate an error.

reference

CPB reference table (see CPBtable).

referenceTwo

A second reference table for recoding relative to two CPBs.

score

Ideal score relative to the first reference. Input can be numeric, ‘min’, or ‘max’.

scoreTwo

Ideal score relative to the second reference. Format same as score.

start

Nucleotide position in the sequence at which recoding starts.

end

Nucleotide position in the sequence at which recoding ends. If NULL, the last in-frame nucleotide position is used.

cycles

Optional input designating the number of recoding cycles. If NULL, a minimal number of cycles is determined. Increasing the number of cycles may result in scores closer to the ideal value.

buffer

Controls the probability of alternating recoding preference toward the other reference during a single round of codon shuffling. Accepts numeric input.

bind

Controls the permissivity of scores greater than or less than the ideal score for each reference. Input is a four element numeric vector interpreted relative to each other. Position in the vector designates which direction and which reference. The first position controls scores less than the ideal first score, the second position controls scores greater than the ideal first score, and the third and fourth numbers control scores less than, and greater than the desired second score. Default values represent no bias toward either reference and direction.

transTable

Alternative translation tables can be used.

restrictSeqs

A string of comma separated sequences to remove or avoid in the input sequence while recoding. Search is performed 5' to 3' on the given strand. R regular expressions are allowed.

complementary

If TRUE, the search for restricted sequences is also performed 5' to 3' on the complementary strand.

windowSize

CPS line plots are smoothed by locally weighted polynomial regression where windowSize designates the number of nucleotides over which individual codon pair scores are smoothed. If NULL, smoothing spans 7.5% of the sequence length.

save

If TRUE, the recoded sequence is saved as a fasta file.

name

Output sequence name and fasta file name.

location

Save location of output sequence.

draw

If TRUE a line plot showing the local CPS along the length of the sequence is output to the graphics device during recoding.

silent

If TRUE output to the console is suppressed.
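Because the sequence argument rejects stop codons and undefined codons, it can help to screen an input before recoding. The following is a minimal base R sketch, not part of CPBias: the helper name findProblemCodons is hypothetical, and it assumes a DNA alphabet (A, C, G, T) and the standard genetic code stop codons. It reports the positions of complete codons that are stops or contain unexpected characters.

```r
# Hypothetical helper (not a CPBias function): list codon positions that
# are stop codons or contain characters other than A, C, G, T.
# Assumes the standard genetic code; adjust `stops` for other tables.
findProblemCodons <- function(sequence, stops = c("TAA", "TAG", "TGA")) {
  s <- toupper(sequence)
  nCodons <- nchar(s) %/% 3          # only complete, in-frame codons
  if (nCodons == 0) return(integer(0))
  starts <- seq(1, by = 3, length.out = nCodons)
  codons <- substring(s, starts, starts + 2)
  which(codons %in% stops | !grepl("^[ACGT]{3}$", codons))
}

findProblemCodons("ATGGCTAAATGA")  # codon 4 (TGA) is a stop codon
findProblemCodons("ATGGCTAAA")     # no problem codons
```

An empty result suggests the sequence passes this particular screen; it does not guarantee that the package's own validation will accept it.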

Details

An input sequence can be differentially recoded between two poorly correlated CPB references. A correlation test on two CPB reference tables can be computed directly with CPBcorr. CPSdesign.dual will attempt to create a recoded sequence characterized by two ideal codon pair scores relative to two different CPBs. See listCPB for a list of available CPB reference tables.

For best results with dual reference recoding, avoid 'max' and 'min' as ideal scores. Instead, first determine the range of possible scores relative to each reference alone, then use specific numeric scores when recoding for two references. Use the bind argument if the 'max' or 'min' possible score is desired.

The bind argument controls the preference and direction of recoding for either reference. It takes a four element numeric vector: the first position controls the permissivity of scores less than the ideal score for the first reference, the second position controls scores greater than ideal for the first reference, and the third and fourth positions control scores less than and greater than ideal for the second reference, respectively. Preference between directions and references is decided relative to each other, so any four identical numbers result in no biased preference. Increasing one value relative to the others allows recoding to explore more sequences in the direction and reference designated by that position in the bind vector.

Output scores can be further optimized by increasing the buffer value when the ideal scores for the two references are dramatically different or there is very little correlation between the reference CPBs. Normally each round of codon shuffling is performed relative to one reference at a time; increasing the buffer increases the probability of alternating between references during a single permutation.
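Since the bind values are interpreted relative to each other, one way to read a bind vector is as a set of proportions. The sketch below is only an illustration in base R: the helper bindShare and its element names are hypothetical, and dividing by the sum is an assumption made for readability, not a description of how CPSdesign.dual uses bind internally.

```r
# Hypothetical helper (not a CPBias function): express a bind vector as
# relative proportions so that different settings can be compared.
bindShare <- function(bind) {
  stopifnot(is.numeric(bind), length(bind) == 4, all(bind > 0))
  names(bind) <- c("ref1_below", "ref1_above", "ref2_below", "ref2_above")
  bind / sum(bind)
}

# Any four identical numbers give the same, unbiased proportions.
bindShare(c(1, 1, 1, 1))
bindShare(c(5, 5, 5, 5))

# The setting used in the Examples section: scores above the ideal are
# strongly favored for reference 1 and strongly disfavored for reference 2.
bindShare(c(1, 1000, 1, 0.001))
```

Viewed this way, c(1, 1, 1, 1) and c(5, 5, 5, 5) are equivalent, which matches the statement that only the relative sizes of the four values matter.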

Recoded sequences can be saved as a fasta file with the save argument. Additional information about the recoded sequence is returned as an invisible list.

Value

firstoldCPS

Average codon pair score of the input sequence relative to reference 1.

firstnewCPS

Average codon pair score of recoded sequence relative to reference 1.

secondoldCPS

Average codon pair score of the input sequence relative to reference 2.

secondnewCPS

Average codon pair score of recoded sequence relative to reference 2.

codonchanges

If codons were changed, the function will return an error.

mutations

Number of point mutations generated by recoding.

oldCPSarray

Vector of individual codon pair scores for the input sequence.

newCPSarray

Vector of individual codon pair scores for the recoded sequence.

returnSeq

The recoded sequence.

If restricted sequences are given an additional item is returned:

restrSeqs

The number of matches in the recoded sequence to the restricted sequences. The value includes matches on the complementary strand if complementary is TRUE.
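As a rough cross-check of the mutations value, the input and recoded sequences can be compared position by position. This base R sketch assumes that mutations corresponds to the number of differing nucleotides between two sequences of equal length; the helper countMutations is hypothetical and not part of CPBias.

```r
# Hypothetical helper (not a CPBias function): count nucleotide positions
# that differ between two equal-length sequences.
countMutations <- function(original, recoded) {
  stopifnot(nchar(original) == nchar(recoded))
  a <- strsplit(toupper(original), "")[[1]]
  b <- strsplit(toupper(recoded), "")[[1]]
  sum(a != b)
}

# Two synonymous changes (GCT -> GCC, AAA -> AAG) give two point mutations.
countMutations("ATGGCTAAA", "ATGGCCAAG")
```

In practice the second argument would be the returnSeq element of the returned list, converted to a single character string if necessary.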

Note

Both single and dual reference recoding offer the option of removing certain types of sequence elements by restricting which codons can be paired together. For example, restriction enzyme recognition sequences can be removed or prevented from appearing in the recoded sequence. These restricted sequences are passed to the restrictSeqs argument as a character string with sequences separated by commas. A comprehensive list of restriction enzyme recognition sequences expressed as R regular expressions is provided in this package; see REseqs. By default, restricted sequences are searched only on the input strand; setting the complementary argument to TRUE also searches the reverse complement strand. Depending on the type and number of restricted sequences, this functionality can dramatically slow down recoding and restrict the achievable CPS.
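The restricted sequence search described above can be approximated in base R to preview how many sites a sequence contains before recoding. This is a sketch under stated assumptions, not the package's implementation: the helpers revComp and countRestricted are hypothetical, the input is assumed to be DNA, and matches are counted without overlap, as gregexpr does.

```r
# Hypothetical helpers (not CPBias functions).

# Reverse complement of a DNA string.
revComp <- function(s) {
  comp <- chartr("ACGT", "TGCA", toupper(s))
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# Count matches of comma-separated regular expressions in a sequence,
# optionally also on the reverse complement strand.
countRestricted <- function(sequence, restrictSeqs, complementary = FALSE) {
  patterns <- trimws(strsplit(restrictSeqs, ",")[[1]])
  strands <- toupper(sequence)
  if (complementary) strands <- c(strands, revComp(sequence))
  total <- 0
  for (strand in strands) {
    for (p in patterns) {
      hits <- gregexpr(p, strand, perl = TRUE)[[1]]
      total <- total + sum(hits > 0)   # gregexpr returns -1 when no match
    }
  }
  total
}

tbevFragment <- "ATGGAATTCGCTGGTCTCAAA"  # made-up sequence for illustration
countRestricted(tbevFragment, "GAATTC,GAGACC")
countRestricted(tbevFragment, "GAATTC,GAGACC", complementary = TRUE)
```

In this sketch a palindromic site such as GAATTC is counted once per strand when complementary is TRUE; whether the restrSeqs value reported by the package counts such sites the same way is not stated in this documentation.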

Examples

fastaLocation <- system.file('tbevns5.fasta', package = 'CPBias')
tbev <- importFasta(fastaLocation)[[1]]

# A dual CPS recoding of TBE virus relative to the CPB of two of its natural hosts,
# Homo.sapiens and Aedes aegypti. Design strategy is to make Homo.sapiens CPS as high
# as possible and less than WT CPS in Aedes.aegypti.

# If correlation between codon pair biases is too high, dual differential recoding may not
# be possible.
CPBcorr(Homo.sapiens, Aedes.aegypti)

# First estimate the possible range of scores relative to both hosts
HumMax <- CPSdesign.single(tbev, Homo.sapiens, 'max', silent=TRUE, draw=FALSE)[[2]]
AedMin <- CPSdesign.single(tbev, Aedes.aegypti, 'min', silent=TRUE, draw=FALSE)[[2]]

# Bind is set to prefer greater CPS in Homo.sapiens, while greater CPS in Aedes.aegypti
# is strongly prohibited. There is room to play with these settings.
tbevDual <- CPSdesign.dual(tbev, Homo.sapiens, Aedes.aegypti, .25, -.02,
                           bind=c(1,1000,1,.001))
tbevDual

alex-sbu/CPBias documentation built on May 11, 2019, 11:24 p.m.