CPSdesign.single: Gene design with a single codon pair bias reference


Description

Shuffle codons to generate sequences with altered average codon pair scores, or random permutations, relative to one reference codon pair bias.

Usage

CPSdesign.single(sequence, reference, score, start = 1, end = NULL,
  cycles = NULL, scramble = FALSE, maxmutations = FALSE,
  transTable = standardTranslation, restrictSeqs = NULL,
  complementary = FALSE, windowSize = NULL, save = FALSE, name = NULL,
  location = NULL, draw = TRUE, silent = FALSE)

Arguments

sequence

Sequences can be input directly as a character string or as the file path to a fasta file. All sequences must be in the correct reading frame. Stop codons and codons not defined in the translation table are not allowed and will generate an error.

reference

CPB reference table (see CPBtable).

score

Ideal score relative to the reference. Input can be numeric, 'min', 'max', or 'random'.

start

Nucleotide position in the sequence at which recoding starts.

end

Nucleotide position in the sequence at which recoding ends. If NULL, the last in-frame nucleotide position is used.

cycles

Optional input designating the number of recoding cycles. If NULL, a minimal number of cycles is determined. Increasing the number of cycles may result in scores closer to the ideal value.

scramble

Optional TRUE or FALSE input designating whether priority should be given to increasing the number of mutations. If NULL, scramble is set to FALSE.

maxmutations

If scramble and maxmutations are TRUE, the sequence generated with the greatest number of mutations is returned, with little control over the CPS.

transTable

Alternative translation tables can be used.

restrictSeqs

A string of comma-separated sequences to remove or avoid in the input sequence while recoding. The search is performed 5' to 3' on the given strand. R regular expressions are allowed.

complementary

If TRUE, the search for restricted sequences is also performed 5' to 3' on the complementary strand.

windowSize

CPS line plots are smoothed by locally weighted polynomial regression where windowSize designates the number of nucleotides over which individual codon pair scores are smoothed. If NULL, smoothing spans 7.5% of the sequence length.

save

Save recoded sequence in fasta file.

name

Output sequence name and fasta file name.

location

Save location of output sequence.

draw

If TRUE, a line plot showing the local CPS along the length of the sequence is output to the graphics device during recoding.

silent

If TRUE, output to the console is suppressed.
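The smoothing described for the windowSize argument can be illustrated with base R. The sketch below is not the package's plotting code: it assumes that stats::lowess is an acceptable stand-in for the locally weighted regression, that the scores are made-up values, and that a window given in nucleotides corresponds to windowSize / 3 codon pairs. The helper name smoothCPS is hypothetical.

```r
# Hypothetical helper: smooth a vector of individual codon pair scores.
# Assumption: windowSize is in nucleotides, so it spans windowSize / 3 scores.
smoothCPS <- function(pairScores, windowSize = NULL) {
  n <- length(pairScores)
  span <- if (is.null(windowSize)) {
    0.075                          # default: 7.5% of the sequence length
  } else {
    min(1, (windowSize / 3) / n)   # fraction of points used at each position
  }
  lowess(seq_len(n), pairScores, f = span)
}

set.seed(42)
toyScores <- rnorm(200, mean = 0, sd = 0.3)  # made-up scores, not real CPS data
smoothedDefault <- smoothCPS(toyScores)
smoothedWide <- smoothCPS(toyScores, windowSize = 150)
```

A larger window averages over more codon pairs, so the resulting line is flatter and local extremes are less visible.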

Details

This function optimizes the shuffling of existing codons in a protein coding sequence while preserving the order of amino acids. Codon usage is not changed by recoding because codons are not added or removed in the process. Shuffling is directed toward an ideal average codon pair score relative to a reference codon pair bias (CPB). CPB references are calculations of observed to expected codon pair frequencies performed on a large number of CDS sequences. See listCPB for a list of available CPB reference tables.
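As an illustration of what an average codon pair score is, the base R sketch below splits a sequence into codons, looks up each adjacent codon pair in a score table, and takes the mean. The score table here is a made-up named vector, not the structure of a real CPB reference table, and the function names splitCodons and averageCPS are hypothetical.

```r
# Split an in-frame nucleotide string into codons.
splitCodons <- function(nt) {
  starts <- seq(1, nchar(nt) - 2, by = 3)
  substring(nt, starts, starts + 2)
}

# Mean of the scores of all adjacent codon pairs.
# Assumption: scoreTable is a named numeric vector keyed by the six-letter pair.
averageCPS <- function(nt, scoreTable) {
  codons <- splitCodons(nt)
  pairs <- paste0(codons[-length(codons)], codons[-1])
  mean(scoreTable[pairs])
}

# Made-up scores for the three pairs in the toy sequence below.
toyScores <- c(GCTGAA = 0.2, GAAGCC = -0.4, GCCGAG = 0.1)
toyCPS <- averageCPS("GCTGAAGCCGAG", toyScores)
```

Recoding changes which codons sit next to each other, so the set of pairs looked up changes while the set of codons does not.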

The CPSdesign algorithm generates multiple sequence permutations. By default, the returned sequence is the one with a score (CPS) closest to the ideal (designated by the score argument). The shuffling algorithm can also be set to favor sequences dissimilar to the original sequence at any possible CPS by setting scramble to TRUE. Scrambling preferentially selects codon positions different from the original sequence; however, extreme scores may not be possible. To fully maximize the number of codon position differences without regard to the ideal score, set maxmutations to TRUE.

Scrambling the sequence is not the same as randomly shuffling codons. To generate a true random permutation of existing codons, enter 'random' for the score argument.
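The idea of permuting existing codons while preserving the amino acid order can be sketched in base R as follows. This is only an illustration under stated assumptions, not the CPSdesign algorithm: the translation table is a hypothetical five-codon subset of the standard code, and shuffleSynonymous is a made-up helper that permutes codons only among positions encoding the same amino acid.

```r
# Hypothetical toy translation table (subset of the standard genetic code).
toyTable <- c(GCT = "A", GCC = "A", GCA = "A", GAA = "E", GAG = "E")

# Randomly permute codons among positions that encode the same amino acid.
shuffleSynonymous <- function(codons, codeTable) {
  aa <- unname(codeTable[codons])
  shuffled <- codons
  for (residue in unique(aa)) {
    idx <- which(aa == residue)
    # Guard: sample() on a single number would sample from 1:n instead.
    if (length(idx) > 1) shuffled[idx] <- codons[sample(idx)]
  }
  shuffled
}

set.seed(1)
original <- c("GCT", "GAA", "GCC", "GAG", "GCA", "GAA")
recoded <- shuffleSynonymous(original, toyTable)
```

Because codons are only moved between synonymous positions, the encoded protein and the overall codon counts stay the same, while the codon pairs (and therefore the CPS) can change.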

Recoded sequences can be saved as a fasta file with the save argument. Additional information about the recoded sequence is returned as an invisible list.

Value

oldCPS

Average codon pair score of the input sequence.

newCPS

Average codon pair score of recoded sequence.

codonchanges

If codons were changed, the function will return an error.

mutations

Number of point mutations generated by recoding.

oldCPSarray

Vector of individual codon pair scores for the input sequence.

newCPSarray

Vector of individual codon pair scores for the recoded sequence.

returnSeq

The recoded sequence.

If restricted sequences are given an additional item is returned:

restrSeqs

The number of matches in the recoded sequence to the restricted sequences. The value includes matches on the complementary strand if complementary is TRUE.

Note

Both single and dual reference recoding offer the option of removing certain types of sequence elements by restricting which codons can be paired together. For example, restriction enzyme recognition sequences can be removed or prevented from appearing in the recoded sequence. These restricted sequences can be passed to the restrictSeqs argument as a character string with sequences separated by commas. A comprehensive list of restriction enzyme recognition sequences expressed as R regular expressions is provided in this package (see REseqs). By default, restricted sequences are searched only on the input strand; setting the complementary argument to TRUE also searches the reverse complement strand. Depending on the type and number of sequences that are restricted, this functionality can dramatically slow down recoding and limit the range of achievable CPS values.
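The search described in this note can be approximated in base R. The sketch below is an assumption about the general approach, not the package's implementation: countRestricted and reverseComplement are hypothetical helpers, matches are counted without overlap by gregexpr, and a palindromic site is counted once per strand. Splitting on commas also means a pattern in this sketch cannot itself contain a comma.

```r
# Reverse complement of an uppercase DNA string.
reverseComplement <- function(nt) {
  paste(rev(strsplit(chartr("ACGT", "TGCA", nt), "")[[1]]), collapse = "")
}

# Count regex matches for a comma-separated list of restricted sequences.
countRestricted <- function(nt, restrictSeqs, complementary = FALSE) {
  patterns <- strsplit(restrictSeqs, ",", fixed = TRUE)[[1]]
  strands <- if (complementary) c(nt, reverseComplement(nt)) else nt
  total <- 0
  for (strand in strands) {
    for (pattern in patterns) {
      hits <- gregexpr(pattern, strand)[[1]]
      total <- total + sum(hits > 0)  # gregexpr returns -1 when nothing matches
    }
  }
  total
}

toySeq <- "ATGGAATTCAAAGAGACCGCT"
sitesToAvoid <- "GAATTC,GGTCTC"  # one palindromic and one non-palindromic site
forwardOnly <- countRestricted(toySeq, sitesToAvoid)
bothStrands <- countRestricted(toySeq, sitesToAvoid, complementary = TRUE)
```

In this toy sequence GGTCTC occurs only on the reverse complement strand, so it is missed unless both strands are searched.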

Examples

library(CPBias)

# Load the demo sequence shipped with the package
fastaLocation <- system.file('tbevns5.fasta', package = 'CPBias')
tbev <- importFasta(fastaLocation)[[1]]

# Create a Homo.sapiens 'min' sequence using 300 cycles, and save the recoded sequence.
CPSdesign.single(tbev, Homo.sapiens, 'min', cycles = 300,
  name = 'demoseq CPS min.fasta', save = TRUE)

# Create a scrambled sequence with a wild-type Homo.sapiens average CPS and omit some
# restriction enzyme sequences.

# Find restriction enzyme recognition sequences in REseqs
selEnz <- which(REseqs[,1] %in% c('PfoI','SmlI','PflFI'))

# Create comma separated string containing regex versions of the recognition sequences
omitRE <- paste0(REseqs[selEnz, 5], collapse = ',')

# Get WT CPS relative to Homo.sapiens CPB by running CPScalc in silent mode
wtCPS <- CPScalc(tbev, Homo.sapiens, silent = TRUE, draw = FALSE)[[1]]

# Scramble the sequence while targeting the wild-type CPS
tbevScrambled <- CPSdesign.single(tbev, Homo.sapiens, wtCPS, scramble = TRUE,
  restrictSeqs = omitRE)

alex-sbu/CPBias documentation built on May 11, 2019, 11:24 p.m.