Shuffle codons to generate sequences with altered average codon pair scores, or random permutations, relative to one reference codon pair bias.
sequence |
Sequences can be input directly as a character string or as the file path to a fasta file. All sequences must be in the correct reading frame. Stop codons and codons not defined in the translation table are not allowed and will generate an error. |
reference |
CPB reference table (see |
score |
Ideal score relative to the first reference. Input can be numeric, ‘min’, ‘max’, or ‘random’. |
start |
Nucleotide position in the sequence. |
end |
Nucleotide position in the sequence. If NULL, the last in-frame nucleotide position is used. |
cycles |
Optional input designating the number of recoding cycles. If empty, a minimal number of cycles is determined. Increasing the number of cycles may result in scores closer to the ideal value. |
scramble |
Optional TRUE or FALSE input designating whether priority should be given to increasing the number of mutations. If NULL, |
maxmutations |
If |
transTable |
Alternative translation tables can be used. |
restrictSeqs |
A string of comma separated sequences to remove or avoid in the input sequence while recoding. Search is performed 5' to 3' on the given strand. R regular expressions are allowed. |
complementary |
If TRUE, the search for restricted sequences is also performed on the complementary strand, 5' to 3'. |
windowSize |
CPS line plots are smoothed by locally weighted polynomial regression where |
save |
Save recoded sequence in fasta file. |
name |
Output sequence name and fasta file name. |
location |
Save location of output sequence. |
draw |
If TRUE, a line plot showing the local CPS along the length of the sequence is output to the graphics device during recoding. |
silent |
If TRUE, output to the console is suppressed. |
This function optimizes the shuffling of existing codons in a protein coding sequence while preserving the order of amino acids. Codon usage is not changed by recoding because codons are not added or removed in the process. Shuffling is directed toward an ideal average codon pair score relative to a reference codon pair bias (CPB). CPB references are calculations of observed to expected codon pair frequencies performed on a large number of CDS sequences. See listCPB for a list of available CPB reference tables.
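The constraint described above (amino acid order and codon usage both preserved) can be illustrated with a minimal base R sketch. This is not the CPSdesign algorithm: it performs an undirected shuffle only, and codon_to_aa, split_codons, and shuffle_synonymous are hypothetical names introduced for illustration, with a toy codon table standing in for a full translation table.

```r
# Toy subset of the standard genetic code (assumption: a real run would
# use a complete translation table).
codon_to_aa <- c(GCT = "A", GCC = "A", GCA = "A", GCG = "A",
                 AAA = "K", AAG = "K",
                 GAA = "E", GAG = "E",
                 ATG = "M")

# Split an in-frame nucleotide string into codons.
split_codons <- function(dna) {
  starts <- seq(1, nchar(dna), by = 3)
  substring(dna, starts, starts + 2)
}

# Hypothetical helper: permute codons only among positions that encode
# the same amino acid, so the protein and the codon usage are unchanged.
shuffle_synonymous <- function(dna) {
  codons <- split_codons(dna)
  aa <- codon_to_aa[codons]
  stopifnot(!anyNA(aa))  # codons missing from the table are an error
  for (a in unique(aa)) {
    idx <- which(aa == a)
    if (length(idx) > 1) {
      codons[idx] <- codons[sample(idx)]
    }
  }
  paste(codons, collapse = "")
}

set.seed(1)
original <- "ATGGCTAAAGCCGAAGCGAAGGAGGCA"
shuffled <- shuffle_synonymous(original)

# Same protein, same multiset of codons.
same_protein <- identical(unname(codon_to_aa[split_codons(shuffled)]),
                          unname(codon_to_aa[split_codons(original)]))
same_usage <- identical(sort(split_codons(shuffled)),
                        sort(split_codons(original)))
```

Because codons are only moved between synonymous positions, both checks hold for any random seed. Only the codon pairs, and therefore the codon pair scores, can change.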
The CPSdesign algorithm generates multiple sequence permutations. By default, the returned sequences are those with a score (CPS) closest to the ideal designated by the score argument. The shuffling algorithm can also be set to favor sequences dissimilar to the original sequence at any possible CPS by setting scramble to TRUE. Scrambling preferentially selects codon positions that differ from the original sequence; however, extreme scores may not be possible. To fully maximize the number of codon position differences without regard to the ideal score, set maxmutations to TRUE.
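The default selection rule, keeping the permutation whose average CPS is closest to the ideal, can be sketched in base R as follows. The score table toy_cps holds invented values (real scores come from a CPB reference table), and pair_scores, average_cps, and pick_candidate are hypothetical helpers, not functions from the package.

```r
# Hypothetical codon pair scores, keyed by the six nucleotides of each pair.
toy_cps <- c(GCTAAA = 0.20, GCTAAG = -0.30, GCCAAA = -0.10, GCCAAG = 0.40,
             AAAGCT = 0.10, AAAGCC = -0.20, AAGGCT = -0.40, AAGGCC = 0.30)

# Scores of all adjacent codon pairs in an in-frame sequence.
pair_scores <- function(dna, table) {
  starts <- seq(1, nchar(dna), by = 3)
  codons <- substring(dna, starts, starts + 2)
  pairs <- paste0(head(codons, -1), tail(codons, -1))
  unname(table[pairs])
}

# Average codon pair score of a sequence.
average_cps <- function(dna, table) mean(pair_scores(dna, table))

# Choose the candidate whose average CPS is closest to the ideal, where
# the ideal is a number, "min", or "max".
pick_candidate <- function(candidates, table, ideal) {
  scores <- vapply(candidates, average_cps, numeric(1), table = table)
  best <- if (identical(ideal, "min")) {
    which.min(scores)
  } else if (identical(ideal, "max")) {
    which.max(scores)
  } else {
    which.min(abs(scores - ideal))
  }
  names(scores)[best]
}

# Four permutations of the same codons, all encoding the same peptide.
candidates <- c("GCTAAAGCCAAG", "GCCAAAGCTAAG", "GCTAAGGCCAAA", "GCCAAGGCTAAA")

lowest  <- pick_candidate(candidates, toy_cps, "min")
highest <- pick_candidate(candidates, toy_cps, "max")
nearest <- pick_candidate(candidates, toy_cps, 0)
```

The sketch scores a fixed list of candidates. How the package generates and refines permutations across cycles is not shown here.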
Scrambling the sequence is not the same as randomly shuffling codons. To generate a true random permutation of existing codons, enter “random” for the score argument.
Recoded sequences can be saved as a fasta file with the save argument. Additional information about the recoded sequence is returned as an invisible list.
oldCPS |
Average codon pair score of the input sequence. |
newCPS |
Average codon pair score of recoded sequence. |
codonchanges |
If codons were changed, the function will return an error. |
mutations |
Number of point mutations generated by recoding. |
oldCPSarray |
Vector of individual codon pair scores for the input sequence. |
newCPSarray |
Vector of individual codon pair scores for the recoded sequence. |
returnSeq |
The recoded sequence. |
If restricted sequences are given an additional item is returned:
restrSeqs |
The number of matches in the recoded sequence to the restricted sequences, value will include the complementary sequence is |
Both single and dual reference recoding offer the option of removing certain types of sequence elements by restricting which codons can be paired together. For example, restriction enzyme recognition sequences can be removed or prevented from appearing in the recoded sequence. These restricted sequences can be passed to the restrictSeqs argument as a character string with sequences separated by commas. A comprehensive list of restriction enzyme recognition sequences expressed as R regular expressions is provided in this package, see REseqs. By default, restricted sequences are searched only on the input strand; the complementary argument extends the search to the reverse complement strand. Depending on the type and number of restricted sequences, this functionality can dramatically slow down recoding and restrict the achievable CPS.
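As a rough illustration of the strand behavior described above, the following base R sketch counts regular expression matches for a comma separated pattern string on the input strand and, optionally, on the reverse complement. The helpers reverse_complement and count_restricted are hypothetical and only mimic the meaning of the restrictSeqs and complementary arguments. The package's own matching and counting may differ, for example in how overlapping or palindromic sites are treated.

```r
# Reverse complement of an uppercase DNA string.
reverse_complement <- function(dna) {
  comp <- chartr("ACGT", "TGCA", dna)
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# Hypothetical helper: count non-overlapping regex matches, 5' to 3',
# on the given strand and optionally on the reverse complement strand.
count_restricted <- function(dna, restrictSeqs, complementary = FALSE) {
  patterns <- trimws(strsplit(restrictSeqs, ",")[[1]])
  strands <- if (complementary) c(dna, reverse_complement(dna)) else dna
  total <- 0
  for (strand in strands) {
    for (p in patterns) {
      hits <- gregexpr(p, strand, perl = TRUE)[[1]]
      total <- total + sum(hits > 0)  # gregexpr returns -1 when no match
    }
  }
  total
}

dna <- "ATGGAATTCGCTGGTCTCAAA"
motifs <- "GAATTC,GAG[AT]CC"  # example motifs, the second written as a regex

forward_only <- count_restricted(dna, motifs)
both_strands <- count_restricted(dna, motifs, complementary = TRUE)
```

In this sketch a palindromic motif such as GAATTC is counted once per strand, so the two-strand total can exceed the number of distinct sites.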
# Load the CPBias package
library(CPBias)
fastaLocation <- system.file('tbevns5.fasta', package = 'CPBias')
tbev <- importFasta(fastaLocation)[[1]]
# Create a sequence with the minimum CPS relative to Homo.sapiens using 300 cycles,
# and save the recoded sequence.
CPSdesign.single(tbev, Homo.sapiens, 'min', cycles = 300, name = 'demoseq CPS min.fasta',
save = TRUE)
# Create a scrambled sequence with a wild-type Homo.sapiens average CPS and omit some
# restriction enzyme sequences.
# Find restriction enzyme recognition sequences in REseqs
selEnz <- which(REseqs[,1] %in% c('PfoI','SmlI','PflFI'))
# Create comma separated string containing regex versions of the recognition sequences
omitRE <- paste0(REseqs[selEnz,5], collapse=',')
# Get WT CPS relative to Homo.sapiens CPB by running CPScalc in silent mode
tbevScrambled <- CPSdesign.single(tbev, Homo.sapiens, CPScalc(tbev, Homo.sapiens, silent=TRUE,
draw= FALSE)[[1]], scramble = TRUE, restrictSeqs = omitRE)