CPBtable: Determine the codon pair bias

Description

Calculate the observed to expected frequency of all codon pairs for a given set of protein coding gene sequences.

Usage

CPBtable(sequences, dnfControl = FALSE, transTable = standardTranslation,
  name = NULL, save = FALSE, location = NULL, silent = FALSE)

Arguments

sequences

Either the location of a FASTA file or a character vector of sequences.

dnfControl

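If TRUE, the III-I dinucleotide bias (the dinucleotide formed by the third position of the first codon and the first position of the second codon) is factored out.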

transTable

Translation table to use for identifying codons. See standardTranslation.

name

Name given to the table; used when save is TRUE.

save

If TRUE, the CPB reference table is saved as a comma-delimited CSV file.

location

File path to save the csv file.

silent

If TRUE, the progress bar is suppressed.

Details

With a standard translation table there are 3,721 coding codon pairs (61 x 61 sense codons). The codon pair score (CPS) of each individual codon pair is the natural log of the ratio of its observed to its expected frequency:

CPS = ln( (codon pair[ab] x amino acid[a] x amino acid[b]) / (codon[a] x codon[b] x amino acid pair[ab]) )

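Each value in the formula is a relative frequency, that is, a count divided by the total for that kind of component. Tandem codon positions are labeled a and b. A codon pair spans 6 nucleotides, and the counting window advances three nucleotides at a time along the sequence, so consecutive codon pairs overlap by one codon.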
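As a rough illustration of the score, the following base R sketch computes the CPS for two codon pairs from three toy sequences, reading the formula as the log of the full observed/expected ratio. It is not the CPBias implementation: the sequences, the reduced codon map aa, and the helpers split_codons, rel_freq and cps are hypothetical names introduced only for this example.

```r
# Toy in-frame coding sequences (hypothetical data, not from the package).
seqs <- c("ATGGCTGCTAAA", "ATGGCCGCTAAG", "ATGGCTGCCAAA")

# Reduced codon -> amino acid map covering only the codons used above.
aa <- c(ATG = "M", GCT = "A", GCC = "A", AAA = "K", AAG = "K")

# Split a sequence into consecutive, non-overlapping codons.
split_codons <- function(s) {
  starts <- seq(1, nchar(s) - 2, by = 3)
  substring(s, starts, starts + 2)
}
codon_list <- lapply(seqs, split_codons)
codons <- unlist(codon_list)

# Tandem codon pairs: codon i (position a) with codon i + 1 (position b).
pairs <- do.call(rbind, lapply(codon_list, function(cd) {
  data.frame(a = head(cd, -1), b = tail(cd, -1), stringsAsFactors = FALSE)
}))

# Relative frequency of each distinct value, as a named numeric vector.
rel_freq <- function(x) {
  tab <- table(x)
  setNames(as.numeric(tab) / length(x), names(tab))
}

f_codon   <- rel_freq(codons)
f_aa      <- rel_freq(unname(aa[codons]))
f_pair    <- rel_freq(paste(pairs$a, pairs$b))
f_aa_pair <- rel_freq(paste(aa[pairs$a], aa[pairs$b]))

# CPS = ln(observed pair frequency / expected pair frequency).
cps <- function(a, b) {
  num <- f_pair[[paste(a, b)]] * f_aa[[aa[[a]]]] * f_aa[[aa[[b]]]]
  den <- f_codon[[a]] * f_codon[[b]] * f_aa_pair[[paste(aa[[a]], aa[[b]])]]
  log(num / den)
}

cps("GCT", "GCC")  # log(1.5), about 0.405: pair seen more often than expected
cps("GCT", "GCT")  # log(0.75), about -0.288: pair seen less often than expected
```

A positive score marks a codon pair that occurs more often than its codon and amino acid frequencies predict, and a negative score marks one that occurs less often.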

Sequences containing nucleotides not found in the translation table, and sequences whose length is not divisible by three, are excluded. Codon pairs containing codons undefined in the translation table, and codon pairs containing stop codons, generate NA values and are not included in the CPB calculation. By default, the standard translation table is used (see standardTranslation). All input sequences should be in-frame, protein-coding (CDS) sequences.
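The exclusion rules above can be sketched in base R as follows. This is only an approximation of the described behaviour, not the package's code: it assumes the translation table contains only the nucleotides A, C, G and T, and the names seqs, is_usable, split_codons and scored are hypothetical.

```r
# Hypothetical input: only the first entry satisfies both sequence-level rules.
seqs <- c(ok = "ATGGCTAAATGA", bad_len = "ATGGCTAA", bad_nt = "ATGNCTAAA")

# Assumption: the translation table only defines codons built from A, C, G, T.
is_usable <- function(s) {
  nchar(s) %% 3 == 0 & grepl("^[ACGT]+$", s)
}
usable <- seqs[is_usable(seqs)]
names(usable)  # "ok"

# Split a sequence into consecutive, non-overlapping codons.
split_codons <- function(s) {
  starts <- seq(1, nchar(s) - 2, by = 3)
  substring(s, starts, starts + 2)
}

# Stop codons of the standard genetic code.
stops <- c("TAA", "TAG", "TGA")

cd <- split_codons(usable[["ok"]])  # ATG GCT AAA TGA
pair_a <- head(cd, -1)
pair_b <- tail(cd, -1)

# A pair is scored only if neither of its codons is a stop codon.
scored <- !(pair_a %in% stops | pair_b %in% stops)
data.frame(a = pair_a, b = pair_b, scored = scored)
```

Here the last pair (AAA followed by TGA) is flagged as not scored, which corresponds to the NA case described above.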

Value

A list with two elements is returned invisibly:

CPBtable

CPB reference table containing all coding codon pairs and their individual CPS calculated with the above formula. This format can be used as the CPB reference input to the CPSdesign functions.

complete.CPBtable

Larger CPBtable containing the frequencies of all components of the codon pair score calculation.
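Because the list is returned invisibly, calling the function without assigning the result prints nothing. The sketch below shows that R behaviour with a stand-in function; make_tables is hypothetical, its contents are placeholders, and it assumes the list elements are named as in the headings above.

```r
# Hypothetical stand-in for a function that returns a two-element list invisibly.
make_tables <- function() {
  res <- list(CPBtable = "reference table placeholder",
              complete.CPBtable = "complete table placeholder")
  invisible(res)
}

make_tables()                        # nothing is printed at top level
out <- make_tables()                 # assignment still captures the list
names(out)                           # "CPBtable" "complete.CPBtable"
withVisible(make_tables())$visible   # FALSE
```

In practice this means the result should be assigned to a variable, or wrapped in print(), to inspect it.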

Note

Codon pair bias, also called codon context, is typically regarded as species-specific. For this reason, CPB reference tables have been pre-calculated for organisms with whole-genome CDS sequence data available. See listCPB for a list of pre-calculated CPB reference tables.

References

Coleman JR, et al. (2008). Virus attenuation by genome-scale changes in codon pair bias. Science 320(5884):1784–1787.

Examples

fastaLocation <- system.file('ccds.fasta', package = 'CPBias')
ccds <- importFasta(fastaLocation, sepSequences = TRUE)[[1]]

ccdsCPB <- CPBtable(ccds)
# First element in returned list is the CPB reference table
ccds.sample <- ccdsCPB[[1]]

# CPBtable will import sequences automatically if fasta location is given
ccdsCPB <- CPBtable(fastaLocation)[[1]]
head(ccdsCPB)

# Factor out dinucleotide bias between codons
dnCPB <- CPBtable(fastaLocation, dnfControl=TRUE)[[1]]
plot(ccdsCPB[,2], dnCPB[,2])

alex-sbu/CPBias documentation built on May 11, 2019, 11:24 p.m.